Introduction
How “Stereo” is heard (for most people):
When humans hear the same sound arrive simultaneously from two
different directions at the same amplitude, the “stereo” illusion is
that the sound comes not from two different directions (i.e., from two
separate speakers), but from a “phantom” location between those
speakers.
Anybody who's twiddled with the balance control has figured out that
changing the relative amplitude between two sound sources will alter the
perception of the direction of that “phantom” source, moving it towards,
and ultimately right onto, the source with the higher amplitude.
Recording engineers have for many years used “balance” controls, which
they refer to as “Pan Pots,” to pan sounds across the “stereo” sound
field, from left to right or back again, allowing them to “place”
different tracks in different perceived locations across the soundstage.
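As a rough illustration of how a pan pot trades amplitude between the
two channels, here is a minimal Python sketch. The constant-power
(sine/cosine) law and the function name pan_gains are assumptions made
for illustration; real consoles use a variety of pan laws.

```python
import math

def pan_gains(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Assumes a constant-power sine/cosine law, one common choice among several.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    angle = (position + 1.0) * math.pi / 4.0  # map -1..+1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)

for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = pan_gains(pos)
    print(f"pan {pos:+.1f}: left {left:.3f}, right {right:.3f}")
```

At center both gains are equal, which corresponds to the “phantom”
image sitting between the speakers; at either extreme one channel
carries everything.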
When sounds happen in real space, they also create reflections, which
arrive later because their paths are always longer than the straight
shot of the direct sound. Our sophisticated auditory
processing uses these reflections in conjunction with the direct sound
to figure out things about the environment, the relationship of the
source of the direct sound to the environment, our relationship to the
environment, and ultimately, our relationship to the source. Details
about the size of the environment, the nature of surfaces around the
listener and the sound source, as well as the location of nearby
reflective boundaries can easily be discerned by those inclined to pay
attention, much like an unconscious sonar.
In fact, those people who are graced with both functioning sight and
hearing might be interested to know that much of the unconscious
processing of auditory and visual information happens right on top of
each other, literally, suggesting that the habit audiophiles have of
describing much of what they hear in a visual context might be due to
something more than a lack of vocabulary.
How We React When We Hear Something
Upon hearing a sound, we look for a source, and if none exists, we
construct one in our mind's eye. This offers one reason why things sound
better with the lights off, as we can look for sounds without finding a
visual contradiction. It also suggests double-blind tests as the only
valid means of evaluating subtle sonic differences, but that's an
entirely different tangent, and fortunately the topic of this section,
and in fact most anything that has to do with loudspeakers, is hardly
subtle.
So, anyway, delayed sound, caused by reflections, tells us something
about the creation of the sound, in terms of its environment, its
relationship to that environment, and by association, to ours. Reflected
sound delayed by, say, 20 ms, roughly equivalent to a reflection from a
wall ten feet behind the source, can't be discerned by humans as a
separate arrival, but it can dramatically change the character of what
we hear, quite apart from the frequency response changes caused by comb
filtering. (Check out a good surround processor configuration for an
audible demonstration of frequency response changes due to different
arrival times.)
Add floors, ceilings, the rest of the walls, the absorption rates and
absorption spectrum of the materials, and it gets really interesting,
but extremely complex. I think that it's enough to know that the
reflected sound, in terms of its multiple time arrivals, particularly
the first reflection, is important.
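For a feel for the numbers, the delay of a single reflection and the
comb-filter notches it can produce may be sketched as follows. The
speed of sound (about 1130 feet per second), the assumption of an
equal-level reflection, and both function names are illustrative
assumptions, not measurements of any particular room.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed: dry air at roughly room temperature

def reflection_delay_ms(extra_path_ft):
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    return extra_path_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0

def comb_notches_hz(delay_ms, count=5):
    """First few notch frequencies when a signal is summed with an
    equal-level copy of itself delayed by delay_ms.

    Notches fall at odd multiples of 1 / (2 * delay).
    """
    delay_s = delay_ms / 1000.0
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]

# A wall ten feet behind the source adds about twenty feet of travel
# (ten feet back to the wall, ten feet forward again).
delay = reflection_delay_ms(20.0)
print(f"delay: {delay:.1f} ms")
print("first notches (Hz):", [round(f, 1) for f in comb_notches_hz(delay, 3)])
```

Under these assumptions the ten-foot wall works out to a little under
18 ms, in the neighborhood of the 20 ms figure used above.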
Even more significant for this particular topic is recognizing that in
real life, “stereo,” as two-channel “purists” have come to hold correct,
doesn't happen. There is no simultaneous arrival of a single source of
sound from more than one direction, certainly not with the 40-60 degree
spread that two-channel reproduction depends on. Two channels are not
inherently correct simply because we've got two ears. It's just easy, and people
are used to it. It's certainly not natural, which is the only reason it
can fool us into hearing what's not there, because we haven't evolved to
accommodate artificial illusions.
In natural contexts, and with our auditory system, the direction of the
first arrival is the true direction of the source, and any other
information is used to establish context. By comparing the time arrival
difference between our ears, and combining that with the Head Related
Transfer Function (HRTF), our
auditory system localizes the pattern of frequencies that make up the
sound based on the first arrival of that pattern.
The Head Related Transfer Function, put simply, describes the effect
that the shape of our head and outer ears, or pinnae, has on the sound
before it enters the ear canal. Things sound different coming from
different directions, whether listening with two ears or not, because
passing over or around our head changes the sound on its way to our
ears, and does so differently from different directions.
The difference in arrival time between your ears, due to one ear being
closer to the sound, is the inter-aural time delay, and it helps
establish the angle in relation to the axis running between the ear
canals. The HRTF refines the left/right information, and also supplies
front/back and up/down information.
If you'd like to try out the effect of the time delay, you can snap your
fingers behind your head, and listen to where it's coming from. Then put
a cardboard tube over an ear with your free hand, increasing the travel
distance of the sound waves to one ear, and repeat. The snap appears to
move toward the non-tubed ear. The combination of the HRTF and the
inter-aural time delays, in addition to amplitude differences at each
ear location, helps us to discern the direction of the first arrival.
Our auditory processing then locks out subsequent arrivals of the same
sound as directional information, and uses them to establish distance,
depth, and environment.
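To put rough numbers on the inter-aural time delay, here is a small
sketch using Woodworth's spherical-head approximation, a textbook
simplification that is not part of the discussion above. The head
radius, the speed of sound, and the function name itd_ms are all
assumptions chosen for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 C

def itd_ms(azimuth_deg):
    """Approximate inter-aural time delay for a distant source.

    azimuth_deg is measured from straight ahead (0) toward one side (90).
    Uses Woodworth's spherical-head approximation, which ignores the
    pinnae and any frequency dependence.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    seconds = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))
    return seconds * 1000.0

for angle in (0, 15, 30, 60, 90):
    print(f"{angle:2d} degrees: {itd_ms(angle):.3f} ms")
```

With these assumed values the delay stays well under a millisecond even
for a source directly to one side, which gives some sense of how small
the timing differences involved are.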
Perceptual psychologists have documented this filtering based on time
arrivals as the “Precedence Effect.” With multiple sources of identical
content (such as two loudspeakers), if two identical impulses are
played back with a slight delay inserted between the sources, subjects
cannot discern a delay, yet perceive the sound to originate
more towards the direction of the first arrival, as opposed to between
the sources.
To the Point
For those who'd like to try the experiment at home, the Chesky DVD Audio
demonstration disc has tracks that play clicks at equal amplitude in
two channels, alternating between simultaneous playback and an
inter-channel delay of 1 ms. If your setup is truly time-aligned, one
click sits dead center between the two speakers, and the other clumps
completely to a single speaker. This demonstrates several points of our
long-winded topic.
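For anyone without the disc, a comparable pair of test signals could be
generated along these lines. This is only a sketch of the idea, not a
reproduction of the disc's tracks; the 48 kHz sample rate, the buffer
length, and the function name click_pair are assumptions.

```python
SAMPLE_RATE_HZ = 48000  # assumed sample rate

def click_pair(delay_ms, length_ms=10.0, amplitude=0.5):
    """Return (left, right) sample lists holding one click per channel.

    Both clicks have the same amplitude; the right-channel click is
    delayed by delay_ms relative to the left.
    """
    total = int(round(length_ms * SAMPLE_RATE_HZ / 1000.0))
    offset = int(round(delay_ms * SAMPLE_RATE_HZ / 1000.0))
    if offset < 0 or offset >= total:
        raise ValueError("delay must fit inside the buffer")
    left = [0.0] * total
    right = [0.0] * total
    left[0] = amplitude
    right[offset] = amplitude
    return left, right

simultaneous = click_pair(0.0)  # should image dead center
delayed = click_pair(1.0)       # 1 ms inter-channel delay
print("right click sample index:", delayed[1].index(0.5))
```

The sample lists would still need to be written out through whatever
audio library or file format you prefer before they can be played.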
First, time-alignment is critical between channels where depth and
location information are to be ascertained from recorded material. If
you wish to get the most spatial resolution out of your playback system,
it is out and out imperative. If you have time-alignment via DSP, great,
though the delay adjustments may not provide small enough increments if
they're limited to whole feet or 1 ms steps, at which point you'll have
to resort to physically moving the loudspeakers a few inches this way
and that to make up the difference. Since some DVD Audio and SACD
configurations don't have
provisions for time-alignment, even if Dolby-Digital and DTS playback
do, it may mean forgoing time-alignment in the surround processor and
aligning the speakers completely by physical position. Stacey Spears (who has a
processor that will do time-alignment for DVD Audio) did a demo of
differences in alignment by switching between delay times for the center
speaker instantaneously, while we were listening, and if the material
shared much content between the three front speakers, the differences
were very far from subtle. The time-alignment of surround channels is
problematic for those without electronic delay in a DVD Audio or SACD
multi-channel situation, in that most listening rooms don't allow
anywhere near the same physical distance to the surround channels as to
the front.
Often, the surrounds are too close to our sitting position. While their output levels may be set
perfectly, the perceived result may be that the surround channels are
too prominent in producing what should be, for the most part, ambient,
environmental information. It's not necessarily a disaster, but not
optimal. In such situations, multi-speaker arrays, or dipolar surround
speakers, which inherently force a diffuse sound field, may prove handy,
in which case the arrivals and direction are so spread out anyway that
the image couldn't clump even if the recording engineer wanted it to.
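The arithmetic behind the time-alignment advice above can be sketched as
follows: delay every speaker to match the farthest one, then see how
much error a coarse delay step leaves behind, expressed in inches of
speaker movement. The speed of sound, the example distances, and the
function names are assumptions for illustration only.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed: dry air at roughly room temperature

def alignment_delays_ms(distances_ft):
    """Delay each speaker so every arrival matches the farthest speaker."""
    farthest = max(distances_ft.values())
    return {
        name: (farthest - d) / SPEED_OF_SOUND_FT_PER_S * 1000.0
        for name, d in distances_ft.items()
    }

def residual_inches(ideal_ms, step_ms=1.0):
    """Distance error left over after rounding a delay to the processor's step.

    Positive means the rounded delay falls short of the ideal one;
    negative means it overshoots.
    """
    rounded = round(ideal_ms / step_ms) * step_ms
    return (ideal_ms - rounded) / 1000.0 * SPEED_OF_SOUND_FT_PER_S * 12.0

# Hypothetical room: distances from the listening seat in feet.
distances = {"left": 10.0, "center": 9.0, "right": 10.0, "surround": 6.5}
for name, ideal in alignment_delays_ms(distances).items():
    print(f"{name}: ideal {ideal:.2f} ms, "
          f"leftover after 1 ms steps {residual_inches(ideal):+.1f} in")
```

At the assumed speed of sound, 1 ms corresponds to roughly 13.6 inches,
so a processor limited to 1 ms steps can leave several inches to make up
by moving the speakers.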
Secondly, in two-channel, “purist” scenarios, there's no such thing as a
wide sweet spot that expands much at all off center for critical
listeners. If it doesn't sound much different to you when you move your
head a foot or two to the left or right from center, you either aren't
sensitive to directional information, aren't paying attention to it for
one reason or another, or have a system/room combination that is mucking
it up so badly that worse isn't really much worse in a relative context.
If
the “image stayed solidly in the center even when I moved off-center”
in a two-channel listening context, there was never a solid, genuine
image to begin with.
Third, assuming you achieve time-alignment, while the use of a center
channel may stabilize the front sonic soundstage, allowing a
presentation more forgiving of listeners off center, there is only one
ideal location in a multi-channel playback scenario, as there can be
only one location, by simple geometry, which is equidistant (or
equivalent factoring in DSP correction) from the left, center, and right
channels. In two-channel playback, you've got whatever space you can fit
equidistant from the left and right speakers that still satisfies the
40-60 degree spread that two-channel reproduction requires. In either
scenario, if you had it right to begin with, moving away from that small
listening area collapses the spread towards the closest loudspeaker, and
flattens the perception of any depth captured or generated in the
recording.
Fourth, simple balance controls don't work as anything other than a poor
band-aid of last resort. If you're closer to one speaker than another,
the balance control compensates only for amplitude, not time
differences, and may very well lead to overcompensating amplitude in an
attempt to make up for time differences, messing up your listening
experience twice. If you've got DSP available for a simple linear delay,
use it. If not, sit in the right spot.
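To see why a balance control can't fix an off-center seat, it helps to
compute both differences at once. The sketch below assumes free-field
inverse-distance level falloff (no room contribution), a speed of sound
of about 1130 feet per second, and a hypothetical layout; the function
name off_center_differences is likewise an assumption.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed: dry air at roughly room temperature

def off_center_differences(speaker_spacing_ft, listener_distance_ft, offset_ft):
    """Time and level differences between two speakers for a listener
    shifted sideways by offset_ft (positive = toward the right speaker).

    listener_distance_ft is measured from the line joining the speakers.
    Level uses free-field inverse-distance falloff, ignoring the room.
    Returns (right_leads_by_ms, right_louder_by_db).
    """
    half = speaker_spacing_ft / 2.0
    d_left = math.hypot(offset_ft + half, listener_distance_ft)
    d_right = math.hypot(offset_ft - half, listener_distance_ft)
    lead_ms = (d_left - d_right) / SPEED_OF_SOUND_FT_PER_S * 1000.0
    louder_db = 20.0 * math.log10(d_left / d_right)
    return lead_ms, louder_db

# Hypothetical setup: speakers 8 ft apart, seat 8 ft back, 1 ft off center.
lead, louder = off_center_differences(8.0, 8.0, 1.0)
print(f"right speaker leads by {lead:.2f} ms and is {louder:.2f} dB louder")
```

In this hypothetical setup, sitting one foot off center makes the near
speaker lead by roughly 0.8 ms while being less than 1 dB louder, so
the timing error is the larger problem and the balance knob only
addresses the level.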
Fifth, “Pan Pots” are all right in the context of most studio
recordings, which try to capture each sound in a single channel with as
close to no acoustical environment as possible (a sound-proof booth),
then mix the sounds with pans amidst a variety of processing, and then
(usually) fill them out with artificial reverberation to compensate for
the lack of ambient information. Still, anybody who's compared even the
best studio recording to a decent recording of a performance captured
in real time, without mixing, through a handful of microphones, can
tell you that there's no comparison in terms of imitating reality. Hey,
I like to listen to a lot of recordings manufactured exclusively in a
recording studio too, even those of instruments that have no acoustic
event, modified electronically to sound hopefully “clean, tight,” and
however else the recording engineer can creatively enhance the work for
my pleasure. But such things aren't real, and interestingly enough, the
more accurate the monitoring system, the easier it is to hear the
recording itself, in addition to the musical event.
Studio mixes can still sound really good, but making them sound realistic
is actually much more difficult because the techniques habitually reject real
ambient information for the sake of providing a controlled environment
for the mix, where ambient information can be added by machine. A good
idea? Considering the levels of compression and plain poor taste in
mixing that abound in the majority of popular recordings, it's not at
the top of my recording engineer complaint list. Outlawing the use of
the NS-10 near-field monitor for anything beyond a reference for how a
mix sounds on a lousy speaker (and replacing it with the much better
NS-1000) would be a good start. Imagine how much better recordings
would be if all the engineers could hear what they're making.
Conclusions
Bottom line: Be careful with the time delay features for the various
channels on your receiver, and with where you plop down your speakers.
Just thought I'd mention it.
- Colin Miller -