Stereo Imaging, Amplitude Differences, Time Arrival Differences, and the Precedence Effect

Introduction

How “Stereo” is heard (for most people):

When humans hear two simultaneous arrivals of the same sound from different directions at the same amplitude, the “stereo” illusion takes hold: we don’t perceive two sources (i.e., two separate speakers), but a single “phantom” source located between them.

Anybody who’s twiddled with a balance control has figured out that changing the relative amplitude between two sound sources alters the perceived direction of that “phantom” source, moving it towards, and ultimately directly to, the louder source. Recording engineers have used such balance controls, which they call “pan pots,” in this way for many years, panning sounds across the “stereo” sound field from left to right and back again, allowing them to “place” different tracks at different perceived locations across the soundstage.

When sounds happen in real space, they also create reflections, which arrive later because their paths are always longer than the straight shot of the direct sound. Our sophisticated auditory processing uses these reflections in conjunction with the direct sound to figure out things about the environment, the relationship of the source to the environment, our relationship to the environment, and ultimately our relationship to the source. Details about the size of the environment, the nature of surfaces around the listener and the source, and the location of nearby reflective boundaries can easily be discerned by those inclined to pay attention, much like an unconscious sonar.

In fact, those people who are graced with both functioning sight and hearing might be interested to know that much of the unconscious processing of both auditory and visual information happens right on top of each other, literally, suggesting that the habits of audiophiles describing much of what they hear in a visual context might be due to something more than a lack of vocabulary.

How We React When We Hear Something

Upon hearing a sound, we look for a source, and if none exists, we construct one in our mind’s eye. This offers one reason why things sound better with the lights off: we can look for sounds without finding a visual contradiction. It also suggests that double-blind tests are the only valid means of evaluating subtle sonic differences, but that’s an entirely different tangent, and fortunately the topic of this section, like most anything that has to do with loudspeakers, is hardly subtle.

So, anyway, delayed sound, caused by reflections, tells us something about the creation of the sound: its environment, its relationship to that environment, and by association, to ours. Reflected sound delayed by, say, 20 ms, roughly what a wall ten feet behind the source produces (twenty feet of extra path, at about a millisecond per foot), can’t be discerned by humans as a separate arrival, but it can dramatically change the character of what we hear, quite apart from the frequency response changes caused by comb filtering. (Play with a good surround processor’s configuration for an audible demonstration of response changes due to different arrival times.) Add floors, ceilings, the rest of the walls, and the absorption rates and absorption spectra of the materials, and it gets really interesting, but extremely complex. It’s enough to know that the reflected sound, in terms of its multiple arrival times, particularly the first reflection, is important.
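For the arithmetically inclined, here is a minimal Python sketch of the numbers behind that claim. It assumes the common figure of about 1130 ft/s for the speed of sound in air; the wall-ten-feet-behind scenario is just the example from above, and the notch formula is the standard one for an equal-amplitude delayed copy summed with the direct sound.

    SPEED_OF_SOUND_FT_S = 1130.0  # approximate speed of sound in air at room temperature

    def reflection_delay_ms(extra_path_ft):
        """Delay of a reflection relative to the direct sound, in milliseconds."""
        return extra_path_ft / SPEED_OF_SOUND_FT_S * 1000.0

    def comb_notch_frequencies(delay_ms, count=5):
        """First few comb-filter notch frequencies (Hz) for a given delay.

        An equal-amplitude delayed copy summed with the direct sound
        cancels at f_n = (2n - 1) / (2 * delay).
        """
        delay_s = delay_ms / 1000.0
        return [(2 * n - 1) / (2 * delay_s) for n in range(1, count + 1)]

    # A wall ten feet behind the source adds a 20-foot round trip:
    delay = reflection_delay_ms(20.0)  # ~17.7 ms (the 1-ms-per-foot rule rounds it to ~20 ms)
    print("delay: %.1f ms" % delay)
    print("first notches (Hz):", [round(f) for f in comb_notch_frequencies(delay)])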

Even more significant for this particular topic is to recognize that in real life, “stereo” as two-channel “purists” have come to hold as correct doesn’t happen. There is no simultaneous arrival of a single source of sound from more than one direction, certainly not with the 40-60 degree spread that two-channel reproduction depends on. Two channels are not inherently correct simply because we’ve got two ears; two-channel is just easy, and people are used to it. It’s certainly not natural, and that’s precisely why it can fool us into hearing what’s not there: we haven’t evolved to accommodate artificial illusions.

In natural contexts, and with our auditory system, the direction of the first arrival is the true direction of the source, and any other information is used to establish context. By comparing the time-arrival difference between our ears and combining that with the Head-Related Transfer Function (HRTF), our auditory system localizes the pattern of frequencies that makes up the sound based on the first arrival of that pattern.

The Head-Related Transfer Function, put simply, describes the effect that the shape of our head and outer ears (pinnae) has on a sound before it enters the ear canal. Sound passing over or around the head is altered differently depending on the direction it comes from, whether we’re listening with one ear or two, so things sound different coming from different directions.

The difference in arrival time between your ears, due to one ear being closer to the sound, is the inter-aural time delay; it helps establish the angle of the source relative to the axis running between the ear canals. The HRTF refines that left/right information, and also supplies the front/back and up/down cues.
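To put rough numbers on it, here is a small Python sketch using Woodworth’s classic spherical-head approximation of the inter-aural time delay; the head radius and the formula are textbook values, not anything specific to this article.

    import math

    SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air, m/s
    HEAD_RADIUS_M = 0.0875      # typical adult head radius, about 8.75 cm

    def itd_seconds(azimuth_deg):
        """Woodworth's approximation of the inter-aural time delay for a
        distant source: ITD = (r / c) * (theta + sin(theta)), where
        azimuth 0 is straight ahead and 90 is directly to one side."""
        theta = math.radians(azimuth_deg)
        return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))

    for az in (0, 15, 45, 90):
        print("%3d deg -> ITD ~ %4.0f microseconds" % (az, itd_seconds(az) * 1e6))

At 90 degrees this lands around 650 microseconds, the familiar maximum ITD for an adult head.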

If you’d like to hear the time delay at work, snap your fingers behind your head and listen to where the snap seems to come from. Then hold a cardboard tube over one ear with your free hand, increasing the travel distance of the sound waves to that ear, and repeat: the snap moves toward the non-tubed ear. The combination of the HRTF and the inter-aural time delay, along with amplitude differences at each ear, helps us discern the direction of the first arrival. Our auditory processing then locks out subsequent arrivals of the same sound as directional information, and instead uses them to establish distance, depth, and environment.

Perceptual psychologists have documented this filtering by arrival time as the “Precedence Effect.” With multiple sources of identical content (such as two loudspeakers), if two identical impulses are played back with a slight delay inserted between the sources, subjects cannot discern the delay, yet perceive the sound as originating toward the direction of the first arrival, rather than from between the sources.

To the Point

For those who’d like to try the experiment at home, the Chesky DVD Audio demonstration disc has tracks that play clicks divided equally between two channels in amplitude, alternating between simultaneous arrival and an inter-channel delay of 1 ms. If your setup is truly time-aligned, one click sits dead center between the two speakers, and the other clumps completely to a single speaker. This demonstrates several points of our long-winded topic.
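If you don’t have the disc, a signal along the same lines is easy to synthesize. The sketch below is my reconstruction of such a test, not the actual Chesky material: equal-amplitude clicks in both channels, first simultaneous, then with the right channel delayed by 1 ms.

    import numpy as np
    from scipy.io import wavfile

    RATE = 48000               # sample rate, Hz
    DELAY = int(0.001 * RATE)  # 1 ms = 48 samples at 48 kHz

    def click(length, position):
        """A single-sample impulse at `position` in an otherwise silent buffer."""
        buf = np.zeros(length, dtype=np.float32)
        buf[position] = 0.8
        return buf

    n = RATE  # one second per event
    # Event 1: simultaneous clicks -- the phantom image sits dead center.
    left, right = click(n, 1000), click(n, 1000)
    # Event 2: right channel delayed 1 ms -- the image clumps to the left speaker.
    left = np.concatenate([left, click(n, 1000)])
    right = np.concatenate([right, click(n, 1000 + DELAY)])

    wavfile.write("precedence_test.wav", RATE, np.stack([left, right], axis=1))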

First, time-alignment between channels is critical wherever depth and location information are to be retrieved from recorded material. If you wish to get the most spatial resolution out of your playback system, it is out and out imperative. If you have time-alignment via DSP, great, though the delay adjustments may not provide small enough increments if they’re limited to whole feet, or 1 ms steps, at which point you’ll have to resort to physically moving the loudspeakers a few inches this way and that to make up the difference. Since some DVD Audio and SACD configurations have no provision for time-alignment, even when Dolby Digital and DTS playback do, it may mean forgoing time-alignment in the surround processor and aligning the speakers entirely by physical position.

Stacey Spears (who has a processor that will time-align DVD Audio) demonstrated differences in alignment by switching between delay times for the center speaker instantaneously while we were listening, and when the material shared much content between the three front speakers, the differences were very far from subtle. Time-alignment of the surround channels is problematic for those without electronic delay in a multi-channel DVD Audio or SACD situation, in that most listening rooms don’t allow anywhere near the same physical distance to the surround channels as to the front.
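The arithmetic is simple enough to sketch in Python. The fragment below assumes a hypothetical processor whose delay adjusts only in fixed steps (1 ms here; yours may differ), and shows how much physical movement is left over once the coarse electronic delay is applied; the speaker distances are made up for illustration.

    SPEED_OF_SOUND_FT_S = 1130.0

    def alignment(distances_ft, step_ms=1.0):
        """For each speaker: the coarse electronic delay (multiples of step_ms)
        plus the residual to be made up by physically moving the speaker.
        Positive inches = move the speaker farther away (adds delay)."""
        farthest = max(distances_ft.values())
        for name, d in distances_ft.items():
            exact_ms = (farthest - d) / SPEED_OF_SOUND_FT_S * 1000.0
            steps = round(exact_ms / step_ms)
            residual_ms = exact_ms - steps * step_ms
            move_in = residual_ms / 1000.0 * SPEED_OF_SOUND_FT_S * 12.0
            print("%s: delay %.1f ms, then move %+.1f inches"
                  % (name, steps * step_ms, move_in))

    alignment({"left": 10.0, "center": 9.0, "right": 10.2})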

Often, the surrounds are too close to our seating position. While their output levels may be set perfectly, the perceived result may be that the surround channels are too prominent in producing what should be, for the most part, ambient, environmental information. It’s not necessarily a disaster, but it’s not optimal. In such situations, multi-speaker arrays or dipolar surround speakers, which inherently force a diffuse sound field, may prove handy: the arrivals and directions are so spread out that the image couldn’t clump even if the recording engineer wanted it to.

Secondly, in two-channel, “purist” scenarios, there’s no such thing as a wide sweet spot that extends much at all off center for critical listeners. If it doesn’t sound much different when you move your head a foot or two to the left or right of center, you either aren’t sensitive to, or aren’t paying attention to, directional information, or your system/room combination may be mucking things up so badly that worse isn’t really much worse in relative terms. If the “image stayed solidly in the center even when I moved off-center” in a two-channel listening context, there was never a solid, genuine image to begin with.

Third, assuming you achieve time-alignment: while the use of a center channel may stabilize the front soundstage, allowing a presentation more forgiving of off-center listeners, there is still only one ideal location in a multi-channel playback scenario, since by simple geometry there can be only one point equidistant (or equivalent, factoring in DSP correction) from the left, center, and right channels. In two-channel playback, you’ve got whatever space you can fit equidistant from the left and right speakers that still satisfies the 40-60 degree spread two-channel reproduction requires. In either scenario, if you had it right to begin with, moving away from that small listening area collapses the spread toward the closest loudspeaker and flattens the perception of any depth captured or generated in the recording.
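The geometry is easy to verify. Here’s a quick Python check of the “only one spot” claim: the point equidistant from three non-collinear speakers is unique (it’s the circumcenter of the triangle they form). The coordinates below are made up for illustration.

    import numpy as np

    def circumcenter(a, b, c):
        """The unique point equidistant from three non-collinear 2-D points,
        found by solving the two perpendicular-bisector equations."""
        a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
        m = 2.0 * np.array([b - a, c - a])
        rhs = np.array([b @ b - a @ a, c @ c - a @ a])
        return np.linalg.solve(m, rhs)

    # Left, center, right speakers in feet, center slightly forward:
    seat = circumcenter((-4.0, 0.0), (0.0, 1.0), (4.0, 0.0))
    print(seat)  # the single equidistant listening position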

Fourth, a simple balance control is nothing more than a poor band-aid of last resort. If you’re closer to one speaker than another, the balance control compensates only for amplitude, not for time differences, and may well lead you to over-compensate amplitude in an attempt to make up for a time difference, messing up your listening experience twice. If you’ve got DSP available for a simple linear delay, use it. If not, sit in the right spot.
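In code terms the difference between the two tools is stark. A minimal sketch, with hypothetical helper names: a balance control can only scale a channel, while a linear delay actually shifts it in time.

    import numpy as np

    RATE = 48000
    SPEED_OF_SOUND_FT_S = 1130.0

    def balance(channel, gain_db):
        """All a balance control can do: change a channel's amplitude."""
        return channel * 10.0 ** (gain_db / 20.0)

    def linear_delay(channel, extra_distance_ft):
        """What DSP time-alignment does: shift the channel in time by the
        amount a listener's extra distance to that speaker would cause."""
        samples = int(round(extra_distance_ft / SPEED_OF_SOUND_FT_S * RATE))
        return np.concatenate([np.zeros(samples, channel.dtype), channel])[:len(channel)]

No amount of gain from balance() moves the arrival time; only linear_delay() does, which is the whole point.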

Fifth, “pan pots” are alright in the context of most studio recordings, which try to capture sounds in a single channel with as close to no acoustical environment as possible (a sound-proofed booth), mix the sounds with pans amidst a variety of processing, and then fill out the result with artificial reverberation (usually) to compensate for the lack of ambient information. But anybody who’s compared even the best, very well done studio recording to a decent recording of a performance captured in real time, without mixing, through a handful of microphones, can tell you that there’s no comparison in terms of imitating reality. Hey, I like listening to plenty of recordings manufactured exclusively in a recording studio too, even to instruments that have no acoustic event at all, modified electronically to sound, hopefully, “clean” and “tight” and however else the recording engineer can creatively enhance the work for my pleasure. But such things aren’t real, and interestingly enough, the more accurate the monitoring system, the easier it is to hear the recording itself, in addition to the musical event.

Studio mixes can still sound really good, but making them sound realistic is much more difficult, because the techniques habitually reject real ambient information for the sake of a controlled environment for the mix, where ambient information can be added by machine. A good idea? Considering the levels of compression and the plain poor taste in mixing that abound in the majority of popular recordings, it’s not at the top of my recording-engineer complaint list. Outlawing the NS-10 near-field monitor (and replacing it with the much better NS-1000) for anything beyond a reference for how a mix sounds on a lousy speaker would be a good start. Imagine how much better recordings would be if all the engineers could hear what they’re making.

Conclusions

Bottom line: Be careful with the time-delay features for the various channels on your receiver, and with where you plop down your speakers.

Just thought I’d mention it.