The pursuit of the perfect VR experience has been depicted in countless sci-fi books and movies over at least the last 50 years. The most intuitive and commonly shared vision of what “real VR” should look and feel like came from Star Trek’s famous Holodeck, a concept inspired by inventor and holographer Gene Dolgoff, who built the first digital projector (see: http://www.startrek.com/article/meet-the-man-behind-the-holodeck-part-1). Fans love this incredible vision of a virtual simulation with true presence, where participants can interact with synthetic content that is indistinguishable from reality in what seems like an endless universe, and who can blame them? I am one of them.
Holodeck: the generic name, especially in use aboard Federation starships, for the “smart” virtual reality system as it had evolved by the 2360s, a technology combining transporter, replicator, and holographic systems.
I find myself reflecting on this vision and thinking of ways to make it a reality. As unattainable as it may sound, I take solace in the fact that we at Lytro move the ball forward step by step, overcoming countless technical challenges and inventing an entire ecosystem of solutions through methodical experimentation and imagination. At a top level, we know that both outward and inward facing Light Field solutions are likely to contribute to this vision. However, each technique has creative and technical advantages and challenges due to current hardware and software limitations. So what can be done now that allows everyone to experience VR with true presence while forging toward a solution that brings the Holodeck to reality?
The most common technique for capturing live action for VR is often referred to as “spherical video.” There are multiple flavors of this technique but they all have one thing in common – the viewer is in the center of a virtual sphere and live action footage is projected around them. This can provide a great experience from a single point of view, but it lacks any ability to truly move in that space (aside from jumping from one spot to another with pre-captured content for “known locations,” as some applications do). The VR flavor of today is “360° video,” essentially flat 2D video frames stitched and displayed on a sphere (very much like a planetarium) with a single point of view.
Cross section diagram of outward facing VR capture with four lenses
The common approach for producing spherical video uses outward facing VR capture where any number of cameras are pointing outward with a single common center. Using Light Field technology, Lytro is able to take this approach to a completely new level. Instead of “360° video” with a single point of view, Lytro provides infinite points of view by projecting the entire live action 3D scene as viewed from within a volume of space. How large of a space? That depends on the experience you want to provide, and the amount of data that you are willing to deal with (capture, process, edit, store, download, stream, etc). This also determines the number of cameras and size of the camera configuration required.
This Light Field view volume can be adjusted and even sculpted to meet very specific artistic or practical needs. It can also be scaled up or down to capture any desired volume of space, subject to two challenges: a) data size and b) no object can be placed inside of the Light Field view volume. Allow me to explain:
A) Data Size
Like any volume of space, a Light Field view volume grows with the cube of its linear dimensions. This sounds completely intuitive until you start reflecting on what that means. Let’s assume that a 10cm x 10cm x 10cm cube of space takes X amount of storage. Now, let’s scale it to a 0.5m x 0.5m x 0.5m cube of space – that’s already 125X larger. Continuing with that growth, a 1m x 1m x 1m cube of space is 1,000X larger. Considering that most VR content is already considered large while capturing the visual information for only a single point in space, even a 10cm x 10cm x 10cm cube contains vastly more visual information (arguably infinitely more).
Volume growing with the cube of its side length – 1X, 125X, 1,000X
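The cubic growth above can be sketched in a few lines of Python. The 10cm baseline and the notion of a fixed data cost per unit of volume are illustrative assumptions, not Lytro figures; only the relative growth matters.

```python
# Back-of-the-envelope: how a Light Field view volume's data footprint
# scales as the cube's side length grows. The absolute storage cost of
# the 10 cm baseline is unknown here; only the ratio is the point.

def relative_data_size(base_side_m: float, new_side_m: float) -> float:
    """Relative volume (and hence data) when the cube's side scales."""
    return (new_side_m / base_side_m) ** 3

for side in (0.1, 0.5, 1.0):
    factor = relative_data_size(0.1, side)
    print(f"{side:.1f} m cube -> {factor:,.0f}x the baseline data")
```

Running this reproduces the 1X, 125X, 1,000X progression from the figure.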
Give it a try – look at an object a few feet away from you and move your head slightly from side to side a few inches – notice what moves and what changes. Now move your head in a different direction. Note how everything changes slightly: position of object, sizes, angles, reflections, highlights, colors, etc. That’s the challenge of the Light Field view volume – it is incredibly intricate and our eyes and brain take it completely for granted – that’s how the world around us behaves effortlessly and when it doesn’t, we notice!
B) Object Placement
As stated earlier, outward facing VR capture has all of its cameras pointing outward from a single common center. This means that no object can be placed behind the cameras (inside of the Light Field view volume), as no camera will be observing it. This is a very counterintuitive tradeoff to consider. We all wish to capture the largest possible volume of space, enabling the largest range of viewer motion, but the larger the view volume, the further away objects need to be.
Outward facing VR capture with viewer inside of the view volume and the closest objects just outside of it
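A toy model of this tradeoff, assuming an idealized spherical rig (the radii below are made-up values): since every camera points outward from a common center, an object is observed only if it lies outside the rig’s radius.

```python
# Toy model of the outward-facing placement constraint: cameras on a
# sphere of radius rig_radius_m all look outward, so any object closer
# to the center than that radius is seen by no camera at all.
# All numbers below are hypothetical.

def is_capturable(object_distance_m: float, rig_radius_m: float) -> bool:
    """True if the object lies outside the (unobserved) view volume."""
    return object_distance_m > rig_radius_m

# Growing the view volume to allow more viewer motion pushes the
# nearest allowable object further away:
for rig_radius in (0.1, 0.5):
    print(f"rig radius {rig_radius} m: object at 0.3 m capturable? "
          f"{is_capturable(0.3, rig_radius)}")
```

The same object at 0.3m is capturable by the small rig but falls inside the larger one’s blind volume, which is exactly the tradeoff described above.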
So, how do we build the Holodeck?
Outward facing VR capture can produce an incredibly intricate and realistic view of a world as long as the viewer’s point of view stays separated from that world (in it, but not intersecting with its objects). For example, a viewer observing the moon landing can be placed just at arm’s length next to a lander with the astronaut leaving footprints on the moon’s surface, but not close enough to walk around the lander or climb its ladder. To do that, we need visual information that comes from other angles. For that we need inward facing VR capture.
Inward facing VR capture with cameras pointing toward a common center
Inward facing VR capture is exactly as it sounds – an arrangement of cameras, pointing toward a common center, covering a very wide range of possible angles around a given volume of space. As we think of what the Holodeck experience ought to be, this sounds like a fantastic solution. If we want to capture an experience inside of a room, we should cover that room with cameras and capture every ray of light that is inside of it! But wait a minute … how many cameras will that take?
As described in an earlier blog post (The On Set Experience of “Moon”), Lytro’s outward facing VR capture solution used more than 300 cameras to create a 360° spherical Light Field experience, which required us to develop a special server architecture to synchronize and store the data. To cover a room scale experience, many thousands of cameras would need to be involved and the sheer amount of data required (captured, processed, edited, etc.) would present an entirely new level of complication.
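To get a feel for why the camera count explodes, here is a rough, purely illustrative estimate (the room size and camera spacing are invented numbers, not Lytro’s): if cameras must tile the walls of a room at some fixed grid spacing, the count grows with the room’s surface area.

```python
import math

# Purely illustrative: cameras tiling the six walls of a cubic room at
# a fixed grid spacing. Both the room size and the 15 cm spacing are
# invented for the example; the point is only the order of magnitude.

def wall_camera_count(room_side_m: float, spacing_m: float) -> int:
    wall_area_m2 = 6 * room_side_m ** 2   # six faces of a cubic room
    return math.ceil(wall_area_m2 / spacing_m ** 2)

print(wall_camera_count(4.0, 0.15))  # thousands of cameras for one room
```

Even under these generous assumptions, a single modest room lands in the thousands of cameras, an order of magnitude beyond the 300-camera outward rig.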
Even if such a system were attempted, you can imagine many scenarios where portions of a scene cannot be seen or captured from any point along the edges of that imaginary room (where we just assumed these cameras would be placed). If the goal is to allow the viewer to walk freely inside of the room, the viewer is more than likely to encounter areas where no camera on the walls had an opportunity to “see.” For example, imagine a person at a desk, leaning toward her computer monitor while reading her email. Cameras placed behind the monitor would not see her face. Cameras on the sides would see the side of her face but not her front. Cameras on the top would see her forehead, but again not her front. A VR viewer walking in that environment might easily attempt to look at the face of that person and not have any real view of that angle.
Furthermore, traditional inward facing camera setups introduce another interesting challenge – the cameras can suddenly see each other. This is an obvious challenge considering that an inward facing setup doesn’t have any “infinity” (as outward facing capture has). It always captures a defined volume of space, so the cameras define the edge of that space. As a result, inward capture requires extra steps for trimming off the edges and/or substituting them with something else (a replacement background, like on a theater stage).
All that said, more practical approaches to inward facing capture are already in use in the VFX community. Motion capture stages use tens and even hundreds of cameras to track actors in 3D space. The resulting motion vectors are then mapped to computer graphics models to generate more realistic motion in computer animation. Some VR pioneers build inward facing capture rigs for isolated objects (one actor at a time), translate their shape and position into 3D information (meshes or point clouds) and then use them in gaming engines – not achieving cinematic quality and realism, but great for games.
For Lytro, inward facing Light Field capture is a very interesting proposition and we are pursuing practical solutions in this field. The goal is to maintain the Light Field (reflections, specular highlights, etc.) in a completely photo-realistic environment, while enabling free range of motion for the viewer.
It is important to note that Lytro’s technology is not designed specifically for inward or outward facing capture. It is generalized enough to handle an infinite number of configurations, including hybrid models. Lytro’s technology can also accommodate the merger of data from multiple sources, including live action Light Field and ray-traced CG. The key is merging the information into a unified Light Field that creates a true sense of presence. The current distinction between inward and outward facing capture is merely a stage in the evolution of both VR and Light Field.
Very few companies have brought together the right mix of people, skills and innovation required to think through these challenges and push the envelope of what’s currently possible. By providing true presence in a defined volume of space, Lytro helps content creators build immersive new worlds and experiences that are transformative. This is a major stepping stone towards the vision of the Holodeck, which will remain an inspiration to us all.