Sound design for Virtual Reality, and for its close cousins Augmented and Mixed Reality, presents some new challenges that are either absent or less important in more traditional interactive media and games. By keeping in mind how our hearing system works under the hood, and understanding the key elements of 3D sound perception (delays, filtering, expectations and interaction with our other senses), we can create more compelling and natural-sounding VR soundscapes. Here are nine things to keep in mind when creating sound for virtual reality experiences.
1) Broadband sounds will localize better than narrowband
The wider the frequency content in the original source, especially at high frequencies, the better the audio spatialization technologies will operate. This is because HRTF processing, which is at the core of most spatialization algorithms, filters the sounds particularly strongly at high frequencies (above around 6-8kHz). If there is no frequency content there to filter, the HRTF filters have nothing to ‘grab onto’, and the spatialization effect will be weaker, particularly for elevation/declination and front/rear effects.
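As a rough way to check a source before sending it to a spatializer, you can measure how much of its energy sits in the band where HRTF filtering is strongest. The sketch below is only an illustration: the function names are hypothetical, the 6 kHz cutoff is simply the lower end of the range mentioned above, and the naive DFT is meant for short analysis frames rather than whole files.

```python
import cmath
import math


def sine_frame(freq_hz, num_samples, sample_rate):
    """Generate a short sine-wave test frame (hypothetical helper)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]


def high_band_energy_ratio(samples, sample_rate, cutoff_hz=6000.0):
    """Return the fraction (0.0 to 1.0) of spectral energy at or above cutoff_hz.

    The cutoff is an assumption taken from the 6-8 kHz figure in the text.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    # Naive DFT over the positive-frequency bins (DC is skipped).
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        power = abs(acc) ** 2
        total += power
        if freq >= cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


# A narrowband low tone versus a tone in the HRTF-sensitive band.
low_tone = sine_frame(1000.0, 480, 48000)
high_tone = sine_frame(10000.0, 480, 48000)
print(high_band_energy_ratio(low_tone, 48000))
print(high_band_energy_ratio(high_tone, 48000))
```

A 1 kHz tone scores close to 0 and a 10 kHz tone close to 1. A source that scores near 0 gives the HRTF filters very little to work with, so it may be worth layering in some high-frequency content before relying on it for elevation or front/rear cues.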
2) Be wary of going against expectations!
Because much of sound localization is learned, going against expectations can be dangerous. Both Microsoft and Google, for example, have independently stated that they have found it very difficult to process a “bird song” and have it robustly appear to come from below you, no matter what kind of processing they do. You can take “whitenoise.wav”, and process it to make it convincingly sound like it is coming from below. But if you take “birdsong.wav” and do the exact same signal processing, the bird will sound like it is above you. Why? Because in virtually all our human experience, bird sounds come from above you, and it is very difficult to override that preconception. In this case, expectations appear to override signal processing.
3) Don’t rely on the “spatialization processing” to do the whole job of spatialization.
Current-day spatialization algorithms are very powerful, and can do a good job. But there is more to 3D hearing than “the big 4” (HRTF, ILD, ITD, reverb). Have a look at this three-minute video, a demonstration of audio and visual changes as objects are near and far from the listener.
Part of the reason we perceive the character's voice as being “far” is a) he is shouting and b) he is simultaneously relatively quiet. That is, a ‘far away’ voice isn’t just a near voice with a “far” distance effect applied, but is a complex interaction of the sound source itself, plus the environment it exists in and our learned expectations of the timbre of normal speech vs shouted speech and how loud we expect them to be. The nature of the sound itself is critical in determining its perceived distance. Even the best HRTF plug-in can’t do that.
4) Don’t go against visuals
This is a refinement of the above point. Our hearing works in conjunction with (not in isolation from) our other senses. If you create a mismatch between the aural location of a sound source and its visual representation, you create a conflict in the brain, which it will try to resolve, more often than not in favor of the visuals.
5) Use natural, rather than synthetic sound sources
Much of our ability to properly localize sounds in 3D space around us is learned. We spend a lifetime learning what the tone of a human voice is like, what various environmental sounds sound like at various azimuths, elevations and distances. Because we localize sounds by detecting changes in frequency content, it is easier for us to recognize the position of sounds we are familiar with.
6) Use the visuals to your advantage
You can use visuals to help augment the quality of sound localization. For example, if you see a jet flying towards you, and hear and see it fly past your left shoulder behind you, that will be a more robust “rear audio image” than placing the same static jet sound behind you.
Note: using visuals is not ‘cheating’! Your brain is trying to make a single coherent picture of its world based on aural and visual (or other) inputs. Since it just saw a jet fly out of its field of view, it will do a very good job of making you think the jet still exists and is behind you.
7) Tread carefully when creating a gameplay mechanic that relies on audio spatialization accuracy
It turns out that even in nature, we aren’t very good at determining the location of a sound in 3D space. (See this GDC video, starting at about 15:15 and running for about 3 minutes.)
Since we aren’t exactly perfect in the real world, creating a gameplay mechanic which requires the player to use pinpoint sound location accuracy will likely result in a frustrating experience for the player. This is especially important because, since everyone’s ears are different, the same spatialization algorithm can sound vastly different to different people.
I learned this the hard way when I created a simple iOS game that relied on 3D sound for its main gameplay mechanic.
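One way to soften such a mechanic is to accept any guess that lands within a generous angular window around the true source direction, rather than demanding a pinpoint match. The following is a minimal sketch under stated assumptions: the function names are hypothetical, angles are azimuths in degrees, and the 30 degree default tolerance is an arbitrary placeholder that you would tune through playtesting.

```python
def angular_error_deg(guess_deg, target_deg):
    """Smallest absolute angle between two azimuths, in degrees (0 to 180)."""
    # Wrap the difference into [-180, 180) so that 350 and 10 are 20 apart.
    diff = (guess_deg - target_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def is_hit(guess_deg, target_deg, tolerance_deg=30.0):
    """Accept the guess if it falls inside the tolerance window.

    tolerance_deg is an assumed placeholder, not a measured threshold.
    """
    return angular_error_deg(guess_deg, target_deg) <= tolerance_deg


# The player points at 350 degrees while the source is at 10 degrees.
print(angular_error_deg(350.0, 10.0))
print(is_hit(350.0, 10.0))
```

Because the same spatialization can sound different from one listener to the next, it may also make sense to expose the tolerance as a difficulty or accessibility setting instead of hard-coding it.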
8) Localization on the horizontal plane is much better than above/below
Our ability to determine the position of a sound source varies greatly with where it is. We are the most precise on the horizontal plane, directly in front. We are the worst directly overhead. So if your game has sounds primarily on the same plane as the listener, you’ll get the best effect. By contrast, elevated sounds will be harder for a listener to locate with any accuracy.
9) It’s very difficult to properly judge 3D sound processing without head tracking
One of the great advantages of Virtual Reality is that it finally allows us to include head tracking in the 3D audio processing. Head movements greatly assist with 3D audio perception, and head tracking is an extremely powerful addition, particularly for front/rear differentiation. So be wary of judging your 3D audio implementation in your DAW, or while running an application without actually wearing your VR/AR/MR device: the audio experience may be drastically different.
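To see why head movement helps so much with front/rear confusion, consider a deliberately crude model in which the left/right cue is just the sine of the source angle relative to the nose. This is an illustrative assumption, not how any particular spatializer computes its cues, and the function names are hypothetical. Angles are in degrees, with positive values to the listener's right.

```python
import math


def relative_azimuth_deg(source_azimuth_deg, head_yaw_deg):
    """Source direction relative to the nose, wrapped to [-180, 180)."""
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def lateral_cue(source_azimuth_deg, head_yaw_deg):
    """Crude left/right cue in [-1, 1]; positive means the right ear is favored.

    A sine-law simplification, assumed here purely for illustration.
    """
    rel = relative_azimuth_deg(source_azimuth_deg, head_yaw_deg)
    return math.sin(math.radians(rel))


# Head still: a source dead ahead (0) and one dead behind (180) look the same.
print(lateral_cue(0.0, 0.0), lateral_cue(180.0, 0.0))

# Head turned 20 degrees to the right: the two cases now move in opposite directions.
print(lateral_cue(0.0, 20.0), lateral_cue(180.0, 20.0))
```

With the head still, both sources produce a cue of essentially zero, so the cue alone cannot separate them. After a small turn to the right, the front source shifts toward the left ear and the rear source toward the right ear. Without head tracking that disambiguating change never happens, which is one reason a mix auditioned on static headphones can mislead you.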