<stereo-img>: a web component to display 3D stereo pictures on web pages and in VR

2021-11-13

Stereo imaging is an old technique used to capture and view pictures with a sense of depth: simply capture what the left and right eyes see, and then ensure each image reaches the corresponding eye of the viewer. Consumer VR headsets are an ideal way to view stereoscopic pictures, since they have a wide angle of view and have solved the spatial tracking problem. Google Cardboard is probably the device that introduced me to both stereo photography and VR (though it was only a bit later, after trying an HTC Vive, that I realized the true immersive value proposition of VR). It was around this time that I started to capture stereo pictures, at first for my wedding in 2016 with a rig made of two GoPros triggered by a remote, and later with the Lenovo Mirage Camera (a camera with two lenses, producing VR180 pictures).

Picture of a home-made stereo rig using two GoPros

Picture of a Lenovo Mirage Camera

A home-made stereo rig using two GoPros and a Lenovo Mirage Camera.

Sadly, neither VR nor stereo photography has really taken off (yet), and most viewers are proprietary native apps, when they haven't been discontinued.
So to ensure I'll still be able to view my stereo pictures in the future, I decided to build a stereo picture viewer on top of the web platform. See a demo and the GitHub repo.

From a developer standpoint, it's quite convenient to use. Just add this HTML element to your page or app:
<stereo-img src="vr180-lenovo-mirage.vr.jpg"></stereo-img>

Here's a demo (I recommend clicking the "Enter VR" button if you have a VR headset):

It supports "VR Photos" ("VR180", Google Camera panorama, Cardboard Camera, etc.), left-right stereo images, and anaglyph 3D (you know, the pictures you usually view with red / green glasses).

How is it made?

The viewer is basically made of three technical pieces:

A stereo picture parser
A VR viewer
A simple web component

Let's talk parsing

Left-right stereo simply requires extracting the left and right halves of the image.
Anaglyph requires extracting the green and red color channels of the image to reconstruct the left and right eye images. Note that in both cases, the image doesn't render well when seen in a regular image viewer.
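
As a rough sketch of these two cases (not the viewer's actual code), the functions below work on raw RGBA pixels shaped like a canvas `ImageData` object (`data`, `width`, `height`). The function names are hypothetical, and mapping red to the left eye and green to the right eye is an assumption; some anaglyph pictures use the opposite convention.

```javascript
// Split a side-by-side stereo image into its left and right halves.
// `image` is { data: Uint8ClampedArray (RGBA), width, height }.
function splitLeftRight({ data, width, height }) {
  const half = Math.floor(width / 2);
  const left = new Uint8ClampedArray(half * height * 4);
  const right = new Uint8ClampedArray(half * height * 4);
  for (let y = 0; y < height; y++) {
    const rowStart = y * width * 4;
    const middle = rowStart + half * 4;
    // Copy the first half of the row to the left eye, the second to the right.
    left.set(data.subarray(rowStart, middle), y * half * 4);
    right.set(data.subarray(middle, middle + half * 4), y * half * 4);
  }
  return {
    left: { data: left, width: half, height },
    right: { data: right, width: half, height },
  };
}

// Rebuild two grayscale images from an anaglyph picture.
// Assumption: red channel = left eye, green channel = right eye.
function splitAnaglyph({ data, width, height }) {
  const left = new Uint8ClampedArray(data.length);
  const right = new Uint8ClampedArray(data.length);
  for (let i = 0; i < data.length; i += 4) {
    const red = data[i];
    const green = data[i + 1];
    const alpha = data[i + 3];
    left.set([red, red, red, alpha], i);
    right.set([green, green, green, alpha], i);
  }
  return {
    left: { data: left, width, height },
    right: { data: right, width, height },
  };
}
```

In a browser, the input could come from `getImageData()` on a 2D canvas context after drawing the picture onto it.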
This is where "VR pictures" (like the "VR180" format) have an advantage: they store the right eye pixels in the image metadata, which makes them look a bit more normal when seen in a regular image viewer. I find this pretty clever: jam another JPEG image (or even sound!) into the XMP metadata of a JPEG image. Because the right eye and other metadata are stored in the XMP metadata, writing a parser by hand would be quite some work (I started, but there is a lot to do to parse the metadata of a JPEG file properly), so I rely on exifr, a powerful and maintained JavaScript module for parsing JPEG metadata. The spec of VR Photos is simple and well documented, so I was able to extract all the metadata I needed (right eye pixels, orientation, angle of view, etc.).
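
To give an idea of what happens once the metadata has been parsed, here is a minimal sketch. It assumes the XMP fields are already available as a plain object; the exact object shape returned by exifr may differ, and the function names are hypothetical. As far as I know, the Cardboard Camera VR Photo format stores the right eye as a base64-encoded JPEG in `GImage:Data`, and the `GPano` fields describe which part of a full 360°×180° panorama the picture covers.

```javascript
// Decode a base64 payload (such as the content of GImage:Data) into bytes.
function decodeBase64(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Derive the angle of view from GPano cropping fields.
// Assumption: `gpano` is a plain object keyed by the GPano property names,
// and the full panorama is equirectangular (360° wide, 180° tall).
function angleOfView(gpano) {
  return {
    horizontalDeg:
      (360 * gpano.CroppedAreaImageWidthPixels) / gpano.FullPanoWidthPixels,
    verticalDeg:
      (180 * gpano.CroppedAreaImageHeightPixels) / gpano.FullPanoHeightPixels,
  };
}
```

In a browser, the decoded bytes could then be wrapped in a `Blob` with the `image/jpeg` type and loaded like any other picture.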

A VR viewer for stereo pictures

Now that I had extracted the pixels for both eyes and some additional metadata like orientation and angle of view, I needed to render them to each eye of the VR headset. Using Three.js, I rendered the images on a sphere in 3D, properly oriented and truncated using the orientation and angle of view information. Three.js has built-in support for WebXR: it starts the VR session and creates the appropriate camera for each eye. Using layers, I am able to show each eye of the VR headset the corresponding image.
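
The "truncated sphere" part can be sketched as a small computation (an illustration under assumptions, not the viewer's actual code). The hypothetical `sphereSegment` function turns an angle of view into the `phiStart`, `phiLength`, `thetaStart` and `thetaLength` arguments that the Three.js `SphereGeometry` constructor accepts, centering the picture on the -Z axis, which is where a default Three.js camera looks. It ignores the orientation metadata.

```javascript
// Compute the portion of a sphere covered by a picture, given its
// horizontal and vertical angles of view in degrees.
function sphereSegment(horizontalDeg, verticalDeg) {
  const phiLength = (horizontalDeg / 180) * Math.PI;
  const thetaLength = (verticalDeg / 180) * Math.PI;
  return {
    // In SphereGeometry, phi = 3π/2 points toward -Z, so center around it.
    phiStart: 1.5 * Math.PI - phiLength / 2,
    phiLength,
    // theta is measured from the top pole; π/2 is the horizon.
    thetaStart: Math.PI / 2 - thetaLength / 2,
    thetaLength,
  };
}
```

With Three.js, one mesh per eye would be built from such a geometry (mirrored with `geometry.scale(-1, 1, 1)` so the texture reads correctly from inside the sphere). Calling `mesh.layers.set(1)` on the left-eye mesh and `mesh.layers.set(2)` on the right-eye mesh makes each one visible only to the matching WebXR camera, since Three.js enables layer 1 on the left-eye camera and layer 2 on the right-eye camera.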

Wrapping it all in a web component

To simplify the life of developers wanting to embed this viewer on their web pages, I wanted to abstract all of this complexity so that the "API" ends up exactly like using a regular <img src="image.jpg"> HTML tag. And this is where web components come into play. They are a framework-agnostic way to declare custom HTML elements. I sincerely believe they are the future of the web platform, the unifying standard between rapidly changing frameworks. While I could have used Lit to create my web component, I just coded it "by hand", as it turns out that defining a new custom element is actually just a few lines of code. The 3D renderer and "Enter VR" button are all rendered in a "shadow DOM", i.e. a DOM subtree that is encapsulated in the <stereo-img> element.
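
To show how little code a hand-written custom element needs, here is a minimal sketch. It is not the actual <stereo-img> source: the `stereo-sketch` tag name, the `StereoSketch` class and its `load` method are hypothetical, and the parsing and rendering are left as a placeholder.

```javascript
// Hypothetical tag name, chosen to avoid clashing with the real <stereo-img>.
const TAG = 'stereo-sketch';

// HTMLElement only exists in browsers; fall back to an empty class so the
// file can still be loaded (but not used) in other environments.
const Base = globalThis.HTMLElement ?? class {};

class StereoSketch extends Base {
  // Ask the browser to notify us when the `src` attribute changes.
  static get observedAttributes() {
    return ['src'];
  }

  constructor() {
    super();
    // Everything the element renders lives in its own shadow DOM.
    const root = this.attachShadow({ mode: 'open' });
    this.canvas = document.createElement('canvas');
    this.button = document.createElement('button');
    this.button.textContent = 'Enter VR';
    root.append(this.canvas, this.button);
  }

  attributeChangedCallback(name, oldValue, newValue) {
    if (name === 'src' && newValue !== oldValue) {
      this.load(newValue);
    }
  }

  // Placeholder: a real viewer would fetch, parse and render the picture.
  load(src) {
    this.canvas.dataset.src = src;
  }
}

// Register the element when the Custom Elements API is available.
if (globalThis.customElements && !globalThis.customElements.get(TAG)) {
  globalThis.customElements.define(TAG, StereoSketch);
}
```

Once this script is loaded in a page, `<stereo-sketch src="photo.jpg"></stereo-sketch>` behaves like any other HTML tag, whatever framework (or lack of one) the page uses.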

Give it a try

Take a look at the demo, and feel free to contribute.