Recording and mixing audio for virtual reality
As VR applications and cinematic experiences grow in popularity, so does the need to design, create and mix audio for these virtual spaces.
Ambisonic mixing requires a different understanding of how (and where) your audience will hear the various elements of your track.
Technically speaking, VR audio is binaural: in other words, a different audio signal is fed to each ear in order to create the perception of a three-dimensional sound field. In many senses, stereophonic sound is also binaural, but the terms aren’t synonymous: stereo audio can’t recreate a complete and natural three-dimensional sound field, whereas that’s what binaural audio is all about. Stereo works very well, obviously, for creating left-right positioning and, if you know the tricks, can create the impression of near-far as well.
When coupled with conventional visual media, this works just fine, because the action on a two-dimensional screen maps naturally to a stereo sound field. Surround sound creates a more immersive aural experience than stereo, but just like stereo it is locked to a fixed, two-dimensional point of view, and just like stereo, it has no way of capturing or simulating the up-down position of a sound source.
What’s needed for cinematic VR, then, is a method of capturing panoramic three-dimensional sound, and a way of mixing and reproducing that sound independently of the listening position and speaker configuration – surprisingly, the technology for doing this, Ambisonics, has been around since the 1970s.
Ambisonics is a full-sphere surround-sound format that works on much the same principles as mid-side stereo. With mid-side, the mid channel carries the overall sound at the listening position, while the side channel carries left-right positional information; in a first-order Ambisonics system, three side channels are employed to capture positional information in all three dimensions (‘higher-order’ Ambisonics uses more channels for greater positional accuracy).
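The parallel between mid-side and Ambisonics can be sketched in a few lines of Python. This is a minimal illustration, not how any particular microphone or plug-in does it: the function names are made up for this example, and the Ambisonic part assumes the traditional first-order B-format convention (channels W, X, Y and Z, with W scaled by 1/√2, azimuth measured anticlockwise from straight ahead and elevation measured upwards).

```python
import math

def ms_encode(left, right):
    """Mid-side encode one stereo sample: mid is the sum, side the difference."""
    return (left + right) / 2.0, (left - right) / 2.0

def ms_decode(mid, side):
    """Recover left and right from a mid-side pair."""
    return mid + side, mid - side

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: W is the omnidirectional 'mid' scaled by 1/sqrt(2);
    X points front, Y points left, Z points up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)  # front-back
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    return w, x, y, z

# A sound directly to the listener's left, at ear height
w, x, y, z = encode_first_order(1.0, azimuth_deg=90, elevation_deg=0)
```

Here W plays the role of the mid channel, while X, Y and Z behave like three side channels, one per axis.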
And because Ambisonics captures a full-sphere sound field, it’s possible for the apparent listening direction within this field to be modified by, in effect, rotating the sphere – just what’s needed for VR! Similarly, when mixing for VR, Ambisonics panners allow you to position non-Ambisonic parts within the 3D sound field, in much the same way as a stereo panner lets you set the left-right position of a mono part within a stereo sound field.
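Rotating the sphere is simple arithmetic on a first-order signal. The sketch below handles only yaw (turning the head left or right) and assumes B-format channels ordered W, X, Y, Z with X pointing front and Y pointing left; the function names are hypothetical, and a real VR player would also handle pitch and roll.

```python
import math

def rotate_yaw(bformat, angle_deg):
    """Rotate a first-order sound field anticlockwise about the vertical axis.

    bformat is a (w, x, y, z) tuple. W and Z are unaffected by a yaw rotation.
    """
    w, x, y, z = bformat
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z

def compensate_head_yaw(bformat, head_yaw_deg):
    """Counter-rotate the field so sounds stay put when the listener turns."""
    return rotate_yaw(bformat, -head_yaw_deg)

# A source hard left (x=0, y=1); the listener turns 90 degrees to the left,
# so the source should now sit straight ahead (x=1, y=0).
turned = compensate_head_yaw((0.7071, 0.0, 1.0, 0.0), head_yaw_deg=90)
```

Because only X and Y change, the rotation can be applied to the whole mix at playback time, which is why a single Ambisonic master can follow the viewer’s head.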
There are now quite a few Ambisonic microphones on the market, including the Sennheiser Ambeo, RØDE’s NT-SF1 (with its Soundfield app), Core Sound’s TetraMic and the Zoom H3-VR; custom-built rigs of omni- and bi-directional mics are also common. Most mainstream DAWs support Ambisonics either natively (Pro Tools, Nuendo and Cubase, for example) or via plug-ins such as the Waves B360 Ambisonics Encoder. This support allows you to create a multi-channel Ambisonics mix while monitoring via stereo headphones, and allows the listening direction to be modified, either by a head-mounted display (HMD) or via onscreen controls.
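To give a feel for how a multi-channel Ambisonic mix can be folded down for stereo monitoring, here is a rough sketch that points two virtual cardioid microphones into the sound field. It is a simplification: DAWs and plug-ins typically use proper binaural (HRTF-based) rendering for headphones, which this does not attempt. The function names and the default 45-degree mic angle are assumptions for illustration, and the maths assumes traditional B-format with W scaled by 1/√2.

```python
import math

def virtual_cardioid(bformat, azimuth_deg):
    """Output of a horizontal virtual cardioid mic aimed at azimuth_deg."""
    w, x, y, _z = bformat  # height (Z) is ignored by a horizontal mic
    az = math.radians(azimuth_deg)
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(az) + y * math.sin(az))

def decode_to_stereo(bformat, mic_angle_deg=45.0):
    """Fold a first-order field down to (left, right) with two virtual mics."""
    left = virtual_cardioid(bformat, +mic_angle_deg)
    right = virtual_cardioid(bformat, -mic_angle_deg)
    return left, right

# A unit-level source hard left: w = 1/sqrt(2), x = 0, y = 1, z = 0
source_left = (1.0 / math.sqrt(2.0), 0.0, 1.0, 0.0)
left, right = decode_to_stereo(source_left, mic_angle_deg=90.0)
```

With the virtual mics aimed straight left and right, the source lands entirely in the left channel; narrower angles give a gentler stereo spread.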
There are also tools for syncing your DAW with a VR video player, although this is a clumsy way to work because you can’t see both your DAW’s controls and the scene you’re working on at the same time, and when viewing the VR scene, there’s no guarantee you’ll even be facing your workstation. Solutions will come, but for now, this clumsiness is par for the course.
Once you’ve got DAW and HMD playing nicely together, you then have to face the biggest challenge of VR audio: nothing you know is relevant any more! Reverbs don’t really work in this all-encompassing context, and create a confusing and unrealistic aural environment; positioning point-source sounds, such as a voice or car horn, is easy enough, but convincingly simulating their reflections and echoes within the virtual environment is not; positioning wider, diffuse sounds, like ambience and Foley, and making them move realistically with the point of view, is also difficult but not impossible. For example, the Ambisonics mode of Steinberg’s MultiPanner has a number of tools for controlling the three-dimensional position, source size and field size of a sound.
However, capturing natural sound in a real space using a full-sphere mic will always produce a more convincing result than trying to simulate sound effects, ambience and Foley with samples and reverb; in the absence of an Ambisonic mic, a mid-side pair will capture a sound that works reasonably well within an Ambisonic system.
As with most things, you can find plenty of discussions on the web about VR audio, and these are certainly useful in helping you set off on the right course. But if you’re planning on moving more seriously into VR audio production then the best thing to do is experiment, fail, and then experiment some more! Every innovation in technology has been essentially founded on that principle. With VR technology accelerating at the rate it is, it makes sense to arm yourself with the technical knowledge that could potentially become the norm for audio production in years to come.