There is more to Dolby Digital than just raw audio information. That
digital stream flowing into your decoder has other bits of information - called
"Metadata" - along for the ride. Metadata is information your decoder can use to
do certain things better, like downmixing the soundtrack to two channels or changing the dynamic
range. One piece of Metadata which has gotten a pretty bad rap is Dialogue
Normalization. In their 30 year history, Dolby has never been known to add something
that would deteriorate audio quality, so I started to investigate. Like so many things in
this world, Dialnorm, as we'll call it for short,
is somewhat misunderstood.
From the name, you might think Dialnorm affects the level of
dialogue with respect to the other channels or content. This is not at all true. As
we will learn, the balance of the mix (loudness) from channel to channel, sound element to
sound element, is entirely the result of the sound engineer's efforts, and Dialnorm does
not affect this relationship.
To begin, let's lay some ground rules. First and
foremost, metadata, like Dialnorm, is ancillary to the audio data. In other words, it's in
the bitstream, not in the sound. For any Metadata to have any effect at all, your
processor needs to use it. One example of Metadata is down-mix information. Any DVD
player can make a 2-channel Pro Logic mix from a 5.1 soundtrack, and they can do it on the
fly. Metadata for downmixing tells the player what the relative level of the
channels should be in the fabricated Pro Logic mix. Again, this is information
for your decoder's benefit, and the sound information itself is not affected by its
presence.
Downmix metadata is optional, used at the sound
engineer's discretion. The only piece of metadata that is actually mandatory for
consumer delivery is Dialnorm.
Let's think of a soundtrack as a vertical bar. The bottom
represents the quietest sound, and the top represents the loudest. The difference between
the two - how tall our bar is - is called the "dynamic range", and how
wide that range is depends
on the sound format and/or media. Conventional cassette tape can have a dynamic range of 60 dB, CD
audio 80 dB, and Dolby Digital 105 dB. That is quite a difference between soft and loud.
Sound engineers can be very creative with this sort of latitude, giving you whisper quiet
ambiance, intelligible dialogue, involving music, and then eye opening effects like
explosions or the last bang of an orchestral movement (Fig.1). And, with wide dynamic
range, this is all possible without you touching your volume control.
While the above example works great for cinema sound, not all
sources make such responsible use of dynamic range. For years, we have all been witness to
the classic abuse of dynamic range: The television commercial (notwithstanding the
fact that, without the commercials, we would not have any TV programs). A typical
commercial makes all of its sounds as loud as possible to get your attention.
The worst instances prompt us to reach for the remote and turn the volume down until the
show is back on. The Advanced Television Systems Committee (or ATSC) selected Dolby
Digital for all digital TV transmissions, so while it can be annoying now, the problem
could get worse. Consider the following: You're watching a movie at a level
you are comfortable with. You can understand people talking, and the gunfights are nice
and loud (isn't it great to be male?) The commercial interrupts, and a guy trying to sell
you a car is as loud as the gun fight! (Fig. 2).
Dialnorm can help! It
lets our Dolby Digital decoders know how loud dialogue is for each show or program.
Dialnorm does this by expressing where on our dynamic range scale most of the talking
occurs. It's just a value and is not "in" the soundtrack
itself. What your consumer decoder has been instructed to do with this information
is adjust your volume for you such that all sound content at the Dialnorm level plays at a
consistent level through your system. Consider again our evening of watching a movie.
We've set our volume control at the desired level for the show. When the commercial
comes on, the decoder sees that there is a different Dialnorm value for the commercial and
overrides the volume so that sounds at each program's Dialnorm level play out at the
same loudness. When the movie is back on, the decoder sees the change again and readjusts the volume
to where it was (Fig. 3).
The graph shows a pretty dramatic scenario, but you
get the point. If implemented right, your days of riding the volume during
commercials could be over. Dialnorm permits your decoder to make volume adjustments
for you when the program changes. One very important note: You will never find
it changing in the middle of a program so there is no need to worry about the artist's
vision for a soundtrack being compromised.
But why is dialogue the reference from program to
program? Of all the sounds you are likely to hear through your home theater system,
dialogue is the one you have the most experience in hearing. Gun shots, car crashes,
ray-guns, and even live music, though exciting, are not nearly as familiar to us as a simple
conversation. Dialogue is something we hear every day and we know what it should
sound like and how loud it should be. It's no wonder then that the level of
dialogue is most important. When you adjust your volume, most often you are
subconsciously setting it so that dialogue sounds natural and intelligible. Once
you've struck that balance, Dialnorm keeps dialogue at a constant level, program to
program.
Dispelling the Myths
Before we address some of the concerns people have, let's
look at a few technical details. When we speak of how loud sounds are in a Dolby
Digital soundtrack, we express the loudest level as "0 dB" and the quietest as
"-105 dB". The Dialnorm value expresses the level of dialogue as how much
lower it is than the peak (0 dB). So a value of "-31" indicates a point 31
dB below the peak and, incidentally, is the value at which no volume adjustment is
performed by your consumer decoder. A Dialnorm value of -27 would indicate to your
decoder that the dialogue is at a point 27 dB below the peak, or 4 dB higher than a program
with a Dialnorm value of -31. Your decoder would then turn things down by 4
dB. A Dialnorm value of -25 would call for a 6 dB reduction and so on.
The -27 setting "fits" movie soundtracks perfectly in that it
yields a very natural level for talking and is likely the most common for movies.
For decades this has been the standard level for dialogue in motion picture soundtracks.
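The arithmetic above is simple enough to sketch in a few lines of Python. This is only an illustration of the rule as described here, not Dolby's decoder code: the function names are made up for the example, the -31 to -1 range check is an assumption about which values are valid, and the -15 commercial is a hypothetical value.

```python
# Illustrative sketch only; names and the range check are assumptions.
NO_ADJUST_DIALNORM_DB = -31  # value at which the decoder makes no adjustment


def dialnorm_attenuation_db(dialnorm_db: int) -> int:
    """Return how many dB the decoder turns the program down."""
    if not -31 <= dialnorm_db <= -1:  # assumed valid range
        raise ValueError("dialnorm expected between -31 and -1 dB")
    return dialnorm_db - NO_ADJUST_DIALNORM_DB


def dialogue_level_after_decoder_db(dialnorm_db: int) -> int:
    """Dialogue level relative to peak once the attenuation is applied."""
    return dialnorm_db - dialnorm_attenuation_db(dialnorm_db)


if __name__ == "__main__":
    # A movie at -27 and a hypothetical loud commercial at -15.
    for name, value in (("movie", -27), ("commercial", -15)):
        print(name,
              "turned down by", dialnorm_attenuation_db(value), "dB;",
              "dialogue lands at", dialogue_level_after_decoder_db(value), "dB")
```

Whatever value the program carries, the dialogue ends up at the same point, which is the whole idea: the movie is turned down 4 dB, the hypothetical commercial 16 dB, and the talking in both plays at the same level.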
Myth #1: Dialnorm reduces dynamic range.
The main criticism that you are likely to encounter
is that Dialnorm affects dynamic range, specifically that it reduces it. Indeed, from Fig.
3, it looks like we've chopped off the bottom third of the commercial's
soundtrack. In truth we haven't: we've only turned the volume down.
Lowering the volume knob on your system would have the exact same effect. If you really want that car commercial to blast you in the face, turn up
your volume (though I doubt you'd want to), and you will still have all 105 dB of dynamic
range. Consider again that most films will use the same -27 setting, and that other
values are appropriate for other material. In other words, if you've set your
playback level to one that you are comfortable watching a movie at, Dialnorm is going to
maintain that comfort level for you as the program changes, but there is nothing stopping
you from further adjusting your volume one way or the other. "Controlled" values
of Dialnorm may someday be imposed in such areas as broadcast TV (Dialnorm was a
major point of attraction when Dolby Digital was chosen as the audio format for HDTV), but
you always have the final say with your volume knob.
Myth #2: Dialnorm reduces everything by 4 dB, altering reference level playback of a movie.
A common criticism is that Dialogue
Normalization "normally" reduces the level of the soundtrack by about 4
dB. Reduces it as compared to what? You have to compare it to something else first,
and then the question becomes: is the Dolby Digital soundtrack 4 dB too low, or is the
other material 4 dB too high? Follow me on this one.
A lot of home theater
enthusiasts are concerned with what is called "reference level playback".
In a nutshell, you use test-tones (as may be found on such DVDs as AVIA) to
set the volume to the same standard levels used in cinemas. The reason
to do this is to hear the soundtrack at the level the movie makers intended. A
concern naturally arises that if volume is being altered by Dialnorm, the sound engineer's
vision is compromised. Reference level playback is in practice very loud in the
relatively small acoustic spaces of a home,
and we must caution you against it at this point. Not only do most find it
uncomfortably loud, but as we noted in our article explaining the LFE channel, it can
quickly bring a subwoofer to its knees. But for the record, let's press on.
The default power-on setting
for Dialnorm on Dolby's professional AC-3 encoder, the DP569, is -27 because as we noted,
that value is a perfect fit for movie soundtracks. True, this value calls for your
decoder to attenuate its output by 4 dB. Fact is, the two most common reference
DVDs, Video Essentials and AVIA, were encoded with the same -27 Dialnorm value, so their test noises are also being
attenuated by 4 dB, making them a perfect reference for Dolby Digital movies.
If you've set up a system with either of these tools, then any
movie you play will not be "reduced" by 4 dB as compared to
the reference.
DTS soundtracks, unlike Dolby
Digital, are not attenuated by 4 dB by your decoder. This means that if you've set up
your system using AVIA or Video Essentials, the DTS soundtrack is actually going to play
4 dB too high. Yes, you read that right: on a system
calibrated for reference level playback with Video Essentials or AVIA, DTS soundtracks
play 4 dB too loud. Conversely (and to be fair), if you set up a system using DTS test
noise, the Dolby Digital soundtrack will be 4 dB too low. Yet what is important here,
and what I really want you to take away from this, is that regardless of what actual
level you watch a movie at, relative to one another, there exists this 4 dB difference
between DTS and Dolby Digital movie soundtracks played over consumer equipment. If
at any time you are comparing soundtracks, you must turn your volume down when listening to
the DTS track and/or raise it when listening to the Dolby Digital track (as the case may
be) in order to hear the same level from both.
We should note that most THX-certified receivers and processors address this
by attenuating DTS material by 4 dB after the decode stage, effectively putting
everything on level ground.
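To make the bookkeeping concrete, here is a small Python sketch of the comparison above. It assumes, as the article reports, that the test discs and the movies carry a Dialnorm of -27 and that a decoder leaves DTS untouched. The function names and format labels are invented for the illustration, and the THX correction is not modeled.

```python
# Illustrative sketch only; names and format labels are assumptions.
NO_ADJUST_DIALNORM_DB = -31
TYPICAL_DIALNORM_DB = -27  # value reported for most movies and the test discs


def decoder_gain_db(fmt: str, dialnorm_db: int = TYPICAL_DIALNORM_DB) -> int:
    """Gain the decoder applies: Dolby Digital follows Dialnorm, DTS does not."""
    if fmt == "dolby_digital":
        return NO_ADJUST_DIALNORM_DB - dialnorm_db  # -31 - (-27) = -4
    if fmt == "dts":
        return 0
    raise ValueError(f"unknown format: {fmt}")


def level_vs_calibration_db(movie_fmt: str, calibration_fmt: str) -> int:
    """How far a movie plays above (+) or below (-) the calibrated level."""
    return decoder_gain_db(movie_fmt) - decoder_gain_db(calibration_fmt)


if __name__ == "__main__":
    for cal in ("dolby_digital", "dts"):
        for movie in ("dolby_digital", "dts"):
            print(f"calibrated with {cal}, playing {movie}: "
                  f"{level_vs_calibration_db(movie, cal):+d} dB")
```

Either way you calibrate, the two formats end up 4 dB apart; only the question of which one sits "at reference" changes.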
Myth #3: Dialnorm
adversely affects S/N (signal-to-noise) ratio.
Another concern that comes up is the
notion that if the volume is being adjusted by the decoder (for any reason) in the digital
domain, there is a reduction in quality (S/N ratio) from bit-width reduction.
Audiophiles should actually appreciate that, when performed in the digital domain, the
Dialnorm adjustment is extremely accurate. Dolby Digital is capable of 24-bit
resolution. Each bit is worth roughly 6 dB, so a volume reduction of 4 dB costs less than
1 bit of resolution. IF (big if) the D/A converters were silent to -144 dB, you might be able to
measure this. In the real world where the D/A's performance is less than that, these
sorts of level changes at the decoder stage will not have an effect on S/N ratio for a
given volume level. The same holds true for dynamic range.
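For those who like to see the numbers, the "less than 1 bit" figure follows from the usual rule of thumb that each bit of resolution is worth about 6 dB. The Python below is just that back-of-the-envelope calculation with made-up helper names; it says nothing about how a particular decoder does its math internally.

```python
import math

# Each doubling of amplitude (one bit) is 20*log10(2), about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def bits_used_by_attenuation(attenuation_db: float) -> float:
    """Resolution, in bits, given up by a digital volume reduction."""
    return attenuation_db / DB_PER_BIT


def theoretical_range_db(bits: int) -> float:
    """Theoretical range of a word of the given width, in dB."""
    return bits * DB_PER_BIT


if __name__ == "__main__":
    print(f"4 dB cut costs about {bits_used_by_attenuation(4):.2f} bits")
    print(f"24 bits span about {theoretical_range_db(24):.1f} dB")
```

A 4 dB cut works out to roughly two thirds of a bit, and 24 bits span a little over 144 dB, which is where the -144 dB figure comes from.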
Other considerations
The vast majority of DVD movie releases use
the -27 Dialnorm value, but we cannot categorically say that other values are
'wrong'. There have been reports of DVDs, mostly music titles, which were encoded in
error with oddball Dialnorm values. Dolby continues to watch for these mistakes and
alert the studios, but ultimately, it's not Dolby's fault or that of their system.
Some other functions of Dialnorm:
One has to consider that not all playback systems have state-of-the-art
audio hardware (portable DVD players come to mind). Using Dialnorm to bring
down the decoder's output a bit creates "virtual" digital headroom and
safeguards against digital clipping in consumer digital electronics.
The Dialnorm value is also used as the reference for Dynamic Range
Control.*
For many purists, 'dynamic range control' is a dirty little phrase.
Let's talk about it for just a moment. Most, if not all, Dolby Digital
decoders have the option of reducing dynamic range. Reducing dynamic range in simple terms
means raising the level of quiet sounds and lowering the level of loud ones such that
there is less of a delta. The classic example is watching a film while others are
trying to sleep. If you just turn the volume down so that explosions won't bother the
sleepers, it will likely be too low for you to hear the dialogue. By invoking dynamic
range control, you will hear all of the soundtrack but not disturb others with loud
peaks. Quiet playback is not the only use for DRC. When you are watching a movie on an
airplane, you are bombarded with the noise of the engines. A wide dynamic range would
leave quiet sounds to be drowned out by the engines. By compressing dynamic range you
bring the soundtrack together to a point where it can be raised above the engine noise but
not ruin your hearing (Fig. 4). Of course, the airline food might ruin your stomach, but
that is not our domain.
Dialnorm's role here is that its value represents the
"center" of compression. That is, sounds under it are raised, sounds above it
are lowered, but sounds at its level are unchanged. An associated value that
goes along with Dialnorm, and is used at this point, is the Dynamic Range Control
Preset. Dialnorm marks the center of the zone where sound levels are not
adjusted, and the selected DRC preset tells the decoder how wide this
"null-zone" is; it can range from 5 to 20 dB (Fig. 5).
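The idea of a compression curve centered on Dialnorm can be sketched in Python. Please treat this as a toy model only: the symmetric null-zone, the single 2:1 ratio, and the function name are all assumptions made for the illustration, and real DRC presets use their own curves which this does not attempt to reproduce.

```python
# Toy model only; the ratio, symmetry, and names are assumptions.
def drc_output_level_db(level_db: float,
                        dialnorm_db: float = -27.0,
                        null_width_db: float = 10.0,
                        ratio: float = 2.0) -> float:
    """Map an input level to an output level around the Dialnorm value."""
    upper = dialnorm_db + null_width_db / 2  # top of the null-zone
    lower = dialnorm_db - null_width_db / 2  # bottom of the null-zone
    if level_db > upper:
        # Loud sounds are pulled down toward the null-zone.
        return upper + (level_db - upper) / ratio
    if level_db < lower:
        # Quiet sounds are pulled up toward the null-zone.
        return lower - (lower - level_db) / ratio
    # Inside the null-zone (where the dialogue lives) nothing changes.
    return level_db


if __name__ == "__main__":
    for name, level in (("explosion", 0.0), ("dialogue", -27.0), ("whisper", -60.0)):
        print(f"{name}: {level} dB in, {drc_output_level_db(level)} dB out")
```

With these made-up settings the explosion at 0 dB comes out at -11 dB, the whisper at -60 dB comes up to -46 dB, and the dialogue at -27 dB is left exactly where it was.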
Again, it's no coincidence that dialogue is the center of DRC,
its level going unaffected. Next time you are watching something which is mostly
talking, try turning on Dynamic Range Control. If the soundtrack was assembled
properly, you should not hear any change in the level of the dialogue. Now
watch something which has talking AND loud explosions. When you engage Dynamic Range
Control, the dialogue stays the same, but the explosion is not nearly as loud. This is
why it's important for the Dialnorm value to be set not arbitrarily, but exactly where most
of the talking occurs in the dynamic range scale.
Here's a wild, radical thought (dangerous, I know). Consider
again "reference" level playback that we talked about earlier. It is the
volume at which each channel would play the loudest sound in a soundtrack at 105 dB.
That is a very loud level (but, it keeps you from thinking about the sticky floors
in the theater). It's time to put the macho image aside and admit we don't watch an
entire movie at that level at home. I don't anyway. But when we play a
soundtrack at anything less than reference, look at what happens (Fig. 6). The
quietest sounds in the passage drop below the threshold of hearing. If you set your
volume too low, you might miss some important stuff. If you invoke a mild dynamic
range control, from a certain point of view, you would be better off because you would
once again be hearing all the soundtrack.
In conclusion, the purpose of Dialnorm is to empower your
decoder to keep the volume of dialogue content consistent from program to program.
It does not limit the volume you listen at or impose a certain volume level,
nor does it rob you of dynamic range. With it and other Metadata that we will talk
about in future articles, Dolby Digital decoders can take one soundtrack and play it back
with equal aplomb be it over headphones, the 3" speaker of a TV, or a full blown home
theater sound system.
Cheers and happy listening gang!
- Brian Florian -
(Last Updated - 8/2001)
I would like to thank Roger Dressler & Mike Babbitt of
Dolby Labs for the generous contribution of their knowledge for this article.
Notes:
* Dynamic range control is not
to be confused with the "Compressor" or "Compression" used in the
music recording industry. This practice cuts down the transients of a string
pluck, for example, to make the instrument sound "fat and funky".
Mixlev, another piece of AC-3 metadata, is not
directly related to Dialnorm, despite some similarities.