In today’s blog post, we’re going to talk about how we leveraged cloud processing to produce Hallelujah. Because we shoot with 475 cameras (95 cameras in each of five wedges) to get a full 360º Light Field, Lytro videos contain hundreds of times more data than a standard 360º video production. Our second VR feature, Hallelujah, was ten times longer than our first and required significant render resources. There was only one way to meet this challenge: embrace cloud rendering.
To process at this scale, we had to be mindful of compute costs. To manage them, we leveraged “preemptible instances” on Google’s Cloud Platform. These instances are roughly 25% of the cost of their non-preemptible counterparts, but Google can shut them down at any time and will always shut them down after they have run for 24 hours. To make the best use of these low-cost virtual machines, we tailored our jobs to run in 10 to 15 minutes. We called this the “Goldilocks” duration because we would lose at most those few minutes of work if a job was preempted, but the jobs were long enough to reduce the load on our render management systems, which allocate work to waiting servers.
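To make the “Goldilocks” idea concrete, here is a minimal sketch of how work could be grouped into jobs that stay under a 15-minute budget. It is an illustration only, not our production scheduler: the function name `batch_frames`, the per-frame time estimates, and the greedy grouping strategy are all assumptions made for this example.

```python
from typing import List


def batch_frames(frame_seconds: List[float],
                 max_job_seconds: float = 15 * 60) -> List[List[int]]:
    """Greedily group consecutive frame indices into jobs whose estimated
    runtime stays at or below max_job_seconds (hypothetical helper)."""
    jobs: List[List[int]] = []
    current: List[int] = []
    elapsed = 0.0
    for index, seconds in enumerate(frame_seconds):
        # Close the current job if adding this frame would exceed the budget.
        if current and elapsed + seconds > max_job_seconds:
            jobs.append(current)
            current, elapsed = [], 0.0
        current.append(index)
        elapsed += seconds
    if current:
        jobs.append(current)
    return jobs


# Hypothetical workload: 20 frames estimated at 90 seconds each.
jobs = batch_frames([90.0] * 20)
print(len(jobs), [len(job) for job in jobs])
```

With these made-up estimates, each job holds ten frames (900 seconds), so a preemption costs at most one 15-minute job. A frame that exceeds the budget on its own simply gets a job to itself.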
But there’s more to processing in the cloud than just managing costs. Our cloud operations team developed rendering solutions for each Lytro pipeline stage to maximize efficiency while generating Light Field images in the cloud. Let’s dive into each in detail: pre-processing, post-production rendering, and Light Field volume generation.
To optimize our pre-processing renders, which reconstruct captured scene information, we quickly realized that we would need to focus on reducing I/O times (in particular write output on shared filers). We chose to use the more powerful 32-core SSD filers to ensure that as we added more machines to our cloud render farm, their job progress would not be slowed by I/O issues. We also combined nearly a dozen large filers into a cluster, which allowed us to achieve a total write throughput of 3.5 GB per second, with all jobs communicating through a single file cluster interface. We got a bit lucky when a mid-production increase in Google’s I/O throughput on those filesystems also helped us achieve these high write speeds.
The preprocessed point cloud from the interior of St. Ignatius church.
Post-production is the only part of our Light Field rendering process that uses third-party rendering tools, and our goal at this stage was to maximize job output while minimizing license usage. Using high-memory 32-core machines enabled us to quadruple the number of active job sessions for each license consumed, a big improvement over the numbers we achieved during our first production using smaller, on-premises hardware (8-core, 64GB RAM machines). In the months since Hallelujah was completed, powerful machines have become more widely available and we anticipate running our third-party jobs on 64-core or higher machines during future productions.
To optimize our Light Field volume generation process, we rewrote some of our rendering software to be CPU-focused rather than GPU-focused. CPU-based algorithms had the potential to produce better quality images and, just as important, would allow us to access more and cheaper machines in the cloud. We knew that file reading would be a bottleneck during Light Field volume generation, so we decided to switch some of our filers to use Google Cloud Platform’s SSD persistent disk offering in read-only mode. Using these machines in read-only mode let us bypass the SSD restriction of 800 megabytes per second of reads per file system, and instead achieve speeds of hundreds of megabytes per second for every client of the read-only disk. This allowed us to run 24,000 cores in parallel using 1,500 16-core virtual machines reading from just one SSD persistent disk.
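A quick back-of-the-envelope check shows why the per-file-system read cap mattered at this scale. The sketch below uses only the figures quoted above. The assumption that the 800 MB/s cap would have been split evenly across all virtual machines is ours and is made purely for illustration.

```python
# Figures quoted in the post.
VM_COUNT = 1_500
CORES_PER_VM = 16
PER_FILESYSTEM_READ_CAP_MB_S = 800

# Total cores running in parallel.
total_cores = VM_COUNT * CORES_PER_VM

# Assumption for illustration: the capped bandwidth is shared evenly by all VMs.
capped_share_mb_s = PER_FILESYSTEM_READ_CAP_MB_S / VM_COUNT

print(f"total cores: {total_cores:,}")
print(f"per-VM share under the cap: {capped_share_mb_s:.2f} MB/s")
```

Under that even-split assumption, each 16-core machine would have been left with roughly half a megabyte per second of reads, compared with the hundreds of megabytes per second each client saw from the read-only disk.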
Balancing render time and cost to achieve the highest possible visual quality is not easy, and our cloud operations team did an amazing job building this end-to-end pipeline for Hallelujah. At Lytro, we’re “all in” for cloud rendering – it’s the only way to process our data sets for aggressive post-production timelines. If this all sounds really exciting, it is, and we’re hiring. 😉