Introduction
This article is about a piece of software called an equalizer (EQ). Removing certain frequencies with an EQ enables ATRAC, the codec every MiniDisc recorder relies on, to encode any piece of music in a superior way. Substantial proof will be offered by means of measurements as well as musical examples.
A warning: if you are lazy or if PC-based audio is a complete mystery to you, this article isn't for you. You also won't be happy if you think that an equalizer is one of the devil's minions (an opinion clearly not based upon facts). Furthermore, the significant sound improvements described here require you to spend some time with your music; it means work. If you can't afford the necessary time, you probably aren't interested in good audio quality coming from MiniDisc at all. Or are you? ;)
Basics of lossy encoding
Lossy (or perceptual) codecs were developed for only one thing: making audio smaller without you, dear constant reader, noticing it. Using a datarate of, say,
128 kBit/s you´ll receive a file the size of only
5 megabytes (before:
50 megabytes). Every lossy codec available on the market performs this 'shrinking', no matter whether it's
ATRAC,
MP3,
AAC,
OGG or
WMA. How do these codecs achieve this? They
'erase' or
'alter' parts of any musical material, getting rid of things our ear cannot perceive anyway. Erasing certain parts of any object reduces its size, which is most convenient for portable players not having
terabytes of available space; they can store more music that way. In what follows I will compare ATRAC (necessary for MiniDisc) to MP3. Both codecs are equally old (20 years), yet MP3 is still in development. ATRAC, however, was declared dead in 2004 with the introduction of
Hi-MD; it hasn't been improved since. ATRAC is effectively one of the worst codecs around, and it doesn't help that its datarate is comparatively high (
292 kBit/s). It could have been more effective if Sony had been willing to give up its power constraints (a superior ATRAC IC would have drained the battery faster). Well, we have to work with what we've got and see how far we can improve it. Anyway, the following chapters will show you the most important basic instrument any perceptual (lossy) codec employs to fool your ear... or more precisely, your brain (
Fig. I):
|
Fig. I: Equal-loudness contour, logarithmic scale (copyright: Wikipedia) |
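As a quick sanity check on the file sizes quoted earlier in this section, the arithmetic can be sketched in a few lines of Python. The five-minute track length and the 16 bit / 44.1 kHz stereo source are my assumptions for illustration; the text only gives the rounded figures of 5 and 50 megabytes.

```python
def file_size_mb(bitrate_kbit_s: float, seconds: float) -> float:
    """Size in megabytes (10^6 bytes) of a constant-bitrate stream."""
    return bitrate_kbit_s * 1000 * seconds / 8 / 1_000_000


def pcm_bitrate_kbit_s(sample_rate_hz: int = 44_100,
                       bit_depth: int = 16,
                       channels: int = 2) -> float:
    """Datarate of uncompressed PCM audio in kBit/s."""
    return sample_rate_hz * bit_depth * channels / 1000


TRACK_SECONDS = 300  # assumed track length (five minutes)

lossy_mb = file_size_mb(128, TRACK_SECONDS)
pcm_mb = file_size_mb(pcm_bitrate_kbit_s(), TRACK_SECONDS)

print(f"128 kBit/s file:  {lossy_mb:.1f} MB")
print(f"uncompressed PCM: {pcm_mb:.1f} MB")
print(f"ratio:            {pcm_mb / lossy_mb:.1f} : 1")
```

Under these assumptions the lossy file comes out a little under 5 MB and the uncompressed one a little over 50 MB, a ratio of roughly 11:1.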
On the left of
Fig. I you can see the
Sound Pressure Level, measured in
dB. The horizontal axis covers the entire frequency range our ear can hear (
20 Hz to
20,000 Hz), measured in
Hz. Now take the highest frequency at
16,000 Hz; it would need a level of roughly
25 dB to be perceived as loud as the
4,000 Hz tone. The key thing you have to understand is the difference between
actual levels and
perceived levels. Our ear itself listens with almost perfect precision, but our brain decides what part of the received audio material is kept. Our brain has been 'tuned' over the course of several millennia to recognize the human voice with immaculate precision; the area from
2,000 to
4,000 Hz is the most important frequency band with the highest sensitivity. So listening sensitivity of the ear/brain combination sucks at low and - especially - at high frequencies. Those frequencies weren´t necessary for survival, so they have been
'dropped' as a result. Which means that we have to make signals at those areas much louder. But it also means that we´re unable to hear grave errors at low and high frequencies - and that´s where perceptual (lossy) codecs come in:
|
Fig. II: Multitone signal, -6 dB, .WAV, 24/44.1 (linear scale, Hanning) |
Above (
Fig. II) you can see a multitone signal (
.WAV) in
24 bit and
44.1 kHz which consists of more than 100 separate sines. Frequency response ranges from
0 to
22,050 Hz. There aren't any errors to witness (the thickened bases of the individual sines are caused by imperfections of the Hanning window function used to display the results). Compare that to the same signal, encoded with MP3:
|
Fig. III: Multitone signal, -6 dB, MP3 (Lame, 256 kBit/s), 24/44.1 (linear scale, Hanning) |
It´s evident that something has been added by the MP3 compression (
Fig. III). This something is simply quantization noise, inherent to every digital system. In this case it's at roughly
-90 dB (original .WAV file:
-144 dB). This amount of noise also reveals how lossy codecs work. It is always said (even I did so above) that they
'remove' or
'erase' parts of the music with surgical precision. Well, it isn´t true, they don´t really
'remove' anything (ignore the cut-off at
19 kHz for a sec). What any lossy codec does instead is decrease bit depth dynamically, taking into account complexity, gain and composition of the signal. Example: a clarinet, formerly having a resolution of
24 bits now has a resolution of perhaps only
6 bits. The remaining
18 bits of resolution no longer exist, which -
BINGO - creates quantization noise. Effectively, the quality of any lossy compression is partly determined by how well it is able to
hide this noise. MP3 is very good at this, a wonder given the fact that this codec is over 20 years old. You can also see that the quantization noise floor is shaped according to the Equal-Loudness contour (
Fig. I). It is fairly low at
4,000 Hz (I failed to point this out with my red line) and rises slightly towards higher frequencies.
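The link between bit depth and noise floor can be tried out with a toy experiment. The sketch below rounds a plain sine wave to a given number of bits and measures the resulting signal-to-noise ratio. This is only an illustration of the principle: a real codec requantizes spectral coefficients band by band, not time-domain samples, and the test tone, sample rate and function names are my own choices.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Round x (in the range -1..1) to the nearest step of a signed grid with `bits` bits."""
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps


def snr_db(bits: int, n: int = 48_000,
           freq_hz: float = 997.0, rate_hz: float = 48_000.0) -> float:
    """Signal-to-noise ratio of a full-scale sine after rounding to `bits` bits."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        s = math.sin(2 * math.pi * freq_hz * i / rate_hz)
        e = quantize(s, bits) - s          # quantization error
        signal_power += s * s
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for bits in (6, 16, 24):
        rule_of_thumb = 6.02 * bits + 1.76   # textbook estimate for a full-scale sine
        print(f"{bits:2d} bits: measured {snr_db(bits):6.1f} dB, "
              f"rule of thumb {rule_of_thumb:6.1f} dB")
```

Every bit taken away raises the noise by roughly 6 dB, which is why a drop from 24 to 6 bits would be disastrous if the codec did not hide the noise where the ear is least sensitive.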
Disadvantages of ATRAC compared to MP3
|
Fig. IV: Multitone signal, -6 dB, ATRAC 4.0, 24/44.1 (linear scale, Hanning) |
ATRAC (in this case
4.0;
DSP Type-R doesn´t differ) performs much worse. Quantization noise is stronger (
-70 dB, Fig. IV; MP3:
-90 dB)...
WideBitStream my ass! ATRAC has a datarate of
292 kBit/s at its disposal, the MP3 example has even less (
256 kBit/s), yet it yields superior performance. But it should be clear by now that ATRAC also follows the Equal-Loudness contour (
Fig. I), just like every other lossy codec. However, ATRAC & MP3 do a bit more: they completely remove high frequencies. MP3 did it in my example at
19 kHz, ATRAC at
17.5 kHz. They get rid of those frequencies because
A) we don´t hear them well and
B) because their encoding hurts the rest of the frequency spectrum. How can the presence of high frequencies distort frequencies below them? Because both codecs use a
constant bit-rate, they cannot adapt this bit-rate (and therefore possible quality) to changing complexity patterns. MP3 is of course able to encode with
variable bit-rate (VBR). But I used constant bit-rate only (CBR) to create fair comparison conditions. ATRAC is forced to encode everything with
292 kBit/s, no matter what. Should a certain signal require a higher bit-rate, tough luck: ATRAC simply can't do it and has to take away more information in other areas, information it might want to keep instead.
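To put the noise floors quoted in this section into perspective (about -70 dB for ATRAC against about -90 dB for MP3), a dB difference can be converted into an amplitude ratio and into an equivalent number of bits. The helper names below are mine, and the conversion is the standard rule of thumb of roughly 6 dB per bit rather than anything specific to either codec.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude ratio that corresponds to a level difference in dB."""
    return 10 ** (db / 20)


def db_to_bits(db: float) -> float:
    """Equivalent resolution in bits (one bit is about 6.02 dB)."""
    return db / (20 * math.log10(2))


atrac_floor_db = -70.0   # noise floor read from Fig. IV
mp3_floor_db = -90.0     # noise floor read from Fig. III
gap_db = atrac_floor_db - mp3_floor_db

print(f"gap:             {gap_db:.0f} dB")
print(f"amplitude ratio: {db_to_amplitude_ratio(gap_db):.1f}x")
print(f"equivalent bits: {db_to_bits(gap_db):.2f}")
```

A gap of 20 dB means the ATRAC noise has ten times the amplitude, or a bit more than three bits of resolution lost compared to the MP3 example.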
Advantages of ATRAC compared to MP3
|
Fig. V: 1 kHz sine, MP3 (Lame, 256 kBit/s), 24/44.1 (logarithmic scale, Hanning) |
|
Fig. VI: 1 kHz sine, ATRAC 4.0, 24/44.1 (logarithmic scale, Hanning) |
The situation reverses with less complex signals.
Fig. V and
Fig. VI show that ATRAC outperforms MP3 on simple signals. Quantization noise is more evenly distributed, and a resolution of 20 bits is retained. Which means: the less complex a signal is, the better it can be encoded by a lossy codec; apparently, this holds even more for ATRAC. But it has another advantage: its time/frequency resolution, expressed in block length. Perceptual codecs need to decide between
long mode and
short mode. The latter enables any lossy codec to encode transients (very short & loud signals, for example the attack of a piano or handclapping). For short mode, MP3 uses a window size of
192 samples (or
4.3 ms). Now if a signal is shorter than that it is ignored, it ceases to exist because
it's effectively 'invisible'. In comparison, ATRAC has not one but two short modes. For lower frequencies it's
128 samples (
2.9 ms), for higher frequencies
64 samples (or
1.45 ms). In general, ATRAC is better suited to encode very loud and short transients, helping dynamics and precision.
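The relation between block length in samples and block length in milliseconds is simple arithmetic, sketched here for a sample rate of 44.1 kHz (my assumption, matching the 24/44.1 material used throughout). The conversion works in both directions, so the sample counts and durations quoted above can be checked against each other.

```python
SAMPLE_RATE_HZ = 44_100  # assumed sample rate


def block_duration_ms(samples: float, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration in milliseconds of a block with the given number of samples."""
    return samples / rate_hz * 1000


def samples_for_duration(ms: float, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Number of samples that fit into the given duration."""
    return ms / 1000 * rate_hz


print(f"MP3 short block, 192 samples: {block_duration_ms(192):.2f} ms")
print(f"ATRAC short block, 2.9 ms:    {samples_for_duration(2.9):.1f} samples")
print(f"ATRAC short block, 1.45 ms:   {samples_for_duration(1.45):.1f} samples")
```

At 44.1 kHz, 2.9 ms and 1.45 ms work out to just under 128 and 64 samples, so the shortest ATRAC block is about a third as long as the MP3 short block.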
Sonic differences between ATRAC & MP3
Ask
'normal' people (not audiophiles) how they perceive the sound of MP3 and you´ll most likely receive the answer that it sounds slightly warmer to them, that is, if they can hear any difference at all. Yet your basic audiophile will call its sound
'cold' and
'digital' when in reality nothing could be further from the truth (their reasoning: it
must sound that way because it uses lossy compression). Listen, all you audiophiles out there: MP3 encoded music sounds a tiny bit warmer compared to the original. The reason is
not that it removes any frequency content above
16 kHz (also stated by audiophiles), the one and only
true reason is that it fails at encoding short transients responsible for dynamics, attack and precision. When ATRAC was still used regularly, it too was described as sounding
'cold' (for example in German
STEREOPLAY magazine 15 years ago). Again, this isn´t true. At least not for
ATRAC 4.0 and
ATRAC 4.5. Both sound significantly more pleasant and warmer than the original (
ATRAC DSP Type-R changed the situation somewhat). A too-short window size isn't responsible for this mellow signature; it is caused by the ATRAC codec attempting to encode signals up to
20 kHz. All those years ago, magazines and audiophiles alike (the German
STEREO magazine paramount among them) constantly pressed Sony to improve rendering of high frequencies. They believed that if you could retain frequencies from
16 to
22.05 kHz it would yield true audiophile sound. Bullshit! ATRAC would have profited enormously if Sony hadn't listened to audiophiles; I will show you how.
Tweaking ATRAC
|
Fig. VII: ATRAC standard encoding (24/44.1, linear scale) |
By trying to encode signals up to 22,050 Hz (see
Fig. VII) ATRAC is losing precious available bits better reserved for lower, more readily audible frequency bands. This creates an overall pleasant sound signature by producing soft compression artifacts. Of course, this sound is far away from the truth. The final
critical band ATRAC encodes is the frequency area from
15,500 to
22,050 Hz. If that band were no longer present, ATRAC would have more bits to spare for the lower frequency bands
(0 to
15,500 Hz). Remember: the less complex the music is (i.e. the fewer frequency bands it occupies), the better it will be encoded by ATRAC.
For that reason we will erase frequencies beyond 15,500 Hz with an equalizer so that ATRAC 4.0 or higher doesn´t need to concern itself with these frequencies anymore! Will this sound muffled? No, it won´t since ATRAC encodes transient responses with utter precision (compared to MP3 which always sounds slightly blanketed depending on the material). Getting rid of frequencies beyond
15,500 Hz improves quantization noise as a result:
|
Fig. VIII: Multitone signal, -6 dB, ATRAC 4.0, STANDARD ENCODING, 24/44.1 (linear scale, Hanning) |
|
Fig. IX: Multitone signal, -6 dB, ATRAC 4.0, 15.5 kHz CUTOFF, 24/44.1 (linear scale, Hanning) |
Look at Fig. IX and compare it to Fig. VIII. The quantization noise floor has been lowered by roughly 5 dB - and only because frequencies beyond 15,500 Hz have been removed. Stunning result, isn't it?
|
Fig. X: RMAA frequency response, ATRAC 4.0 STANDARD ENCODING, four passes |
|
Fig. XI: RMAA frequency response, ATRAC 4.0, 15.5 kHz CUTOFF, four passes |
Even RMAA recognizes the effect. Fig. X & XI depict that the hole around 4,000 Hz, typical for any ATRAC version, has almost disappeared along with the odd response at subsonic frequencies (20 Hz). Increasing levels from 10,000 to 15,000 Hz on Fig. XI are caused by my equalizer setting (see below at 'Equalizing ATRAC (costly option)').
Equalizing ATRAC (free option)
I told you that this tweak is free, so I searched for, found and measured a suitable equalizer. This was difficult: not many free equalizers are able to process with high quality. I will however also tell you about costly alternatives, namely
SoundForge and
iZotope Ozone. SoundForge is the VST host while Ozone is the equalizer I work with in that case. I will talk about them because they yield slightly superior quality.
Never, I repeat, NEVER use built-in equalizers (SoundForge, WaveLab, Adobe Audition, foobar2000, Winamp). I´ve measured them and they create so many errors that it´s shocking. Anyway, to achieve the tweak without paying any money while still retaining high quality you´ll need these things:
- foobar2000
- a VST wrapper
- the equalizer EngineersFilter from RS-MET
or
- Audacity
- the equalizer EngineersFilter from RS-MET
|
Fig. XII: RS-MET EngineersFilter setting for ATRAC Cutoff |
Fig. XII reveals my configuration for the cutoff filter. The EngineersFilter offers several other filtering methods, but I decided to keep it simple so that less tech-savvy people can use it as well. I cannot help you with the installation/setup of foobar2000, its VST wrapper and the EngineersFilter, however; you need to figure that out for yourself, and the same goes for Audacity. Other recommendations: keep the signal at 32 bit floating-point, regardless of whether you're working with foobar2000, SoundForge or any other digital audio editor. As you know, the MiniDisc is capable of working with high resolution material, so if you're recording from a PC just keep it at that high resolution. If you don't want to use a PC I'd recommend a CD-RW (which can be erased and rewritten). In that case, decrease bit-depth to 16 bit without using noise-shaped dither (the shaped and dithered quantization noise would otherwise confuse ATRAC again) and burn the results to CD-RW.
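If you are curious what a steep cutoff at 15.5 kHz does numerically, here is a minimal sketch of a generic low-pass filter. To be clear, this is not the algorithm used by the EngineersFilter or by iZotope Ozone; it is a textbook windowed-sinc FIR design, and the Blackman window, the 255 taps and all function names are assumptions of mine chosen for illustration.

```python
import cmath
import math


def lowpass_taps(cutoff_hz: float, rate_hz: float, num_taps: int = 255) -> list:
    """Blackman-windowed sinc low-pass filter; num_taps should be odd."""
    fc = cutoff_hz / rate_hz              # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # ideal (infinitely long) low-pass impulse response
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # the Blackman window tapers the response to a finite length
        window = (0.42
                  - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]      # normalize to unity gain at 0 Hz


def gain_db(taps: list, freq_hz: float, rate_hz: float) -> float:
    """Magnitude of the filter's frequency response at freq_hz, in dB."""
    w = 2 * math.pi * freq_hz / rate_hz
    response = sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(abs(response), 1e-12))


RATE_HZ = 44_100
taps = lowpass_taps(15_500, RATE_HZ)

if __name__ == "__main__":
    for f in (1_000, 10_000, 15_000, 15_500, 16_500, 18_000):
        print(f"{f:6d} Hz: {gain_db(taps, f, RATE_HZ):8.2f} dB")
```

With this kind of design the nominal cutoff is the point where the level has dropped by about 6 dB, so the content right at 15.5 kHz is attenuated rather than gone, and everything well above it is suppressed strongly. The taps are symmetrical around their centre, which is what gives such a filter a symmetrical impulse response and a linear phase.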
Equalizing ATRAC (costly option)
|
Fig. XIII: iZotope Ozone 4.0 settings for ATRAC cutoff |
|
Fig. XIV: iZotope Ozone 4.0 general setup (-> click 'Option') |
First of all you'll need a digital audio editor like SoundForge, WaveLab, Adobe Audition or Audacity. With these you'll be able to load iZotope Ozone (in my case, version 4.0), which you will configure to the specifications pictured in Figs. XIII & XIV. The 1.5 dB amplification of frequencies at 20,000 Hz is optional; I use it to fool my ear into not recognizing that certain frequencies are altogether absent. Why would you even use the iZotope Ozone EQ? Because in my experience it's the best equalizer on the market: it creates neither phase distortions nor other distortions or errors and generally performs perfectly. Have a look:
|
Fig. XV: iZotope Ozone, phase response |
|
Fig. XVI: EngineersFilter, phase response |
The phase response sadly is very underrepresented when it comes to sonic differences between DSPs or units playing audio material. In this case it´s evident that iZotope Ozone has superior phase performance (Fig. XV) compared to the EngineersFilter (Fig. XVI), yet it is debatable if this is audible at all. Let´s be fair: the EngineersFilter EQ performs admirably compared to all the other free EQs I´ve tested. Impulses play a role too:
|
Fig. XVII: iZotope Ozone, impulse response |
|
Fig. XVIII: EngineersFilter, impulse response |
Fig. XVII depicts a perfectly symmetrical impulse response for iZotope Ozone. A high steepness of the cutoff filter produces better frequency resolution at the expense of perfect impulse response. The same is true for the EngineersFilter, although here the impulse response (
Fig. XVIII) is modeled after the first CD players with analogue anti-aliasing filtering. In the end you have to decide; I wrote
years ago that the effects of impulses are overrated. BTW, the settings I´ve described yield the following measurable results:
|
Fig. XIX: frequency response, ATRAC 4.0, 15.5 kHz cutoff (logarithmic scale) |
|
Fig. XX: frequency response detail, ATRAC 4.0, 15.5 kHz cutoff |
As you can see in both examples above, which were created by RMAA, I've achieved the desired effect - without frequency deviations created by crappy equalizers and (almost, for the EngineersFilter) without phase distortions. The graph depicting the zoomed-in frequency response (
Fig. XX) reveals a not too steep cutoff, yet it´s precise enough to get rid of frequencies beyond
15.5 kHz. The result of my procedure is evidenced by
Fig. XXI: the picture shows a spectrogram derived from an ATRAC encoded/decoded recording (compare to
Fig. VII). The precise
15.5 kHz cutoff is clearly visible.
|
Fig. XXI: ATRAC encoding with 15.5 kHz cutoff (24/44.1, linear scale) |
The sound
When I first heard the results I couldn't believe my ears: the sound had improved by such a margin that I wondered how I had been able to listen to it before. Precision, attack, stability and holographic impression of the stage sounded so much better now... But listen for yourself. The following files were recorded digitally with the
Sony MZ-R 55 featuring
A) the
standard full-frequency and
B) the
15.5 kHz cutoff discovered by me. In both cases, the original files were at
24/44.1. After recording I played them back using the
Kenwood DM-5090 (also digitally) and recorded its output with the optical input of my
Creative Labs Soundblaster X-Fi HD USB. After that I merged three 30-second examples and uploaded them to
soundcloud.
INSTEAD OF LISTENING TO THEM ONLINE, DOWNLOAD THEM! Reason: both are ATRAC-encoded/decoded PCM-files, encoded again with MP3 by soundcloud (at
128 kBit/s). Should you just press
'play' you'd only hear a transcoded file, with compression artifacts clouding possible differences. If you download them, however, you'll be able to listen to the pure, ATRAC-encoded/decoded, Kenwood-derived, digital files in pristine 24 bit quality without further influence from soundcloud. You would even be able, should you desire, to perform a
DBT listening test; these two examples were edited with sample precision.
Three ATRAC 4.0 encoded samples, standard encoding
Three ATRAC 4.0 encoded samples, encoded using my 15.5 kHz cutoff filter
Epilogue
And? What do you say? I feel that the result speaks for itself. I admit that this tweak requires some effort, but I think that it's worth it. I can now use ATRAC 4.0 again! Oh yes, I almost forgot... why didn't I use a more recent ATRAC version? While the effects will be superior using ATRAC 4.5 or higher, the ATRAC 4.0 equipped recorders I own (MZ-R 30, MZ-R 50, MZ-R 35, MZ-R 37, MZ-R 55) have high quality drives, producing MiniDiscs that run without flaw on any other MD recorder / player. Later units (MZ-R 900, MZ-R 909) featuring superior ATRAC ICs fail to do this. With the exception of the Sony MZ-N 510 they all record with unreliable results. I also admit that it isn't very convenient to use MiniDisc these days. The reason to use them, for me at least, isn't their sound. Every other lossy codec employed today around the world is superior. I'm sorry, but it's a fact. Still, I love those little discs. The players/recorders are of high build quality, sound good enough (in some cases more than good) and you get the joy of bringing some amount of 'slowness' into your musical life by occupying yourself with media you can actually touch. Let's face it, I'm an idiot. An idiot... just like people still listening to vinyl. Like them I believe in an ancient and deceased format. But have I mentioned yet that it's pure joy? Oh, I did? Never mind! Anyway, with my tweak you're able to prolong the lifetime of MiniDisc before it's completely replaced by superior codecs and playback devices. And while you're at it, use it in combination with the FiiO E07K; it'll sound even better this way. Use this chance well and enjoy the results!
Last update: 06.09.2013