Wednesday, September 04, 2013

Spectacular sounding MiniDisc tweaking - FOR FREE!


Introduction

The following article concerns a piece of software called an equalizer. Removing certain frequencies with this EQ enables the ATRAC codec, which every MiniDisc recorder relies on, to encode any piece of music in a superior way. Substantial proof will be offered by means of measurements as well as musical examples.
A warning: if you are lazy or if PC-based audio is a complete mystery to you, this article isn't for you. You also won't be happy if you think that an equalizer is one of the devil's minions (an opinion clearly not based upon facts). Furthermore, the significant sound improvements described here require you to spend some time with your music; it means work. If you can't afford the necessary time, you probably aren't interested in good audio quality coming from MiniDisc at all. Or are you? ;)

Basics of lossy encoding

Lossy (or perceptual) codecs were developed for only one thing: making audio smaller without you, dear constant reader, noticing it. Using a datarate of, say, 128 kBit/s you'll receive a file the size of only 5 megabytes (before: 50 megabytes). Every lossy codec available on the market performs this 'shrinking'; it doesn't matter if it's ATRAC, MP3, AAC, OGG or WMA. How do these codecs achieve this? They 'erase' or 'alter' parts of any musical material, getting rid of things our ear cannot perceive anyway. Erasing certain parts of any object reduces its size, which is most convenient for portable players that don't have terabytes of available space; they can store more music that way. In the article that follows I will compare ATRAC (necessary for MiniDisc) to MP3. Both codecs are equally old (20 years), yet MP3 is still in development. ATRAC, however, was declared dead in 2004 with the introduction of Hi-MD; it hasn't been improved since. ATRAC is effectively one of the worst codecs around, and it doesn't help that its datarate is comparatively high (292 kBit/s). It could have been more effective if Sony had decided to fuck their desire for power constraints (a superior ATRAC IC would have drained battery power faster). Well, we have to work with what we've got and see how we are able to improve it. Anyway, the following chapters will show you the most important basic instrument any perceptual (lossy) codec employs to fool your ear... or more precisely, your brain (Fig. I):

Fig. I: Equal-loudness contour, logarithmic scale (copyright: Wikipedia)
On the left of Fig. I you can see the Sound Pressure Level, measured in dB. The axis below presents several frequency points of the entire frequency range our ear can hear (20 Hz to 20,000 Hz), measured in Hz. Now take the highest frequency at 16,000 Hz; it would need a level of roughly 25 dB to be perceived as being as loud as the 4,000 Hz tone. The key thing you have to understand is the difference between actual levels and perceived levels. Our ear itself listens with almost perfect precision, but our brain decides what part of the received audio material is kept. Our brain has been 'tuned' over the course of several millennia to recognize the human voice with immaculate precision; the area from 2,000 to 4,000 Hz is the most important frequency band with the highest sensitivity. So listening sensitivity of the ear/brain combination sucks at low and - especially - at high frequencies. Those frequencies weren't necessary for survival, so they have been 'dropped' as a result. Which means that we have to make signals in those areas much louder. But it also means that we're unable to hear grave errors at low and high frequencies - and that's where perceptual (lossy) codecs come in:

Fig. II: Multitone signal, -6 dB, .WAV, 24/44.1 (linear scale, Hanning)
Above (Fig. II) you can see a multitone signal (.WAV) in 24 bit and 44.1 kHz which consists of more than 100 separate sines. Frequency response ranges from 0 to 22,050 Hz. There aren't any errors to witness (the thickened bases of the individual sines are caused by imperfections of the Hanning window function used to display the results). Compare that to the same signal, encoded with MP3:

Fig. III: Multitone signal, -6 dB, MP3 (Lame, 256 kBit/s), 24/44.1 (linear scale, Hanning)
It's evident that something has been added by the MP3 compression (Fig. III). This something is simple quantization noise, inherent to every digital system. In this case it's at roughly -90 dB (original .WAV file: -144 dB). This amount of noise also reveals how lossy codecs work. It is always said (even I did so above) that they 'remove' or 'erase' parts of the music with surgical precision. Well, it isn't true, they don't really 'remove' anything (ignore the cut-off at 19 kHz for a sec'). What any lossy codec does instead is decrease bit-depth dynamically, taking into account complexity, gain and composition of the signal. Example: a clarinet that formerly had a resolution of 24 bits now has a resolution of perhaps only 6 bits. The remaining 18 bits of resolution no longer exist, which - BINGO - creates quantization noise. Effectively, the quality of any lossy compression is partly determined by how well it is able to hide this noise. MP3 is very good at this, a wonder given the fact that this codec is over 20 years old. You can also see that the quantization noise floor is shaped according to the equal-loudness contour (Fig. I). It is fairly low at 4,000 Hz (I failed to point this out with my red line) and rises slightly towards higher frequencies.
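The link between bit-depth and noise floor can be checked numerically. What follows is only a minimal sketch, assuming a full-scale sine and plain rounding to a fixed number of bits; a real codec changes the bit-depth per band and per block, which this does not model. The function names are made up for illustration. The textbook rule of thumb for an ideal quantizer is roughly 6.02 dB per bit plus 1.76 dB; 6.02 x 24 is about 144.5, which lines up with the -144 dB quoted above for the 24-bit .WAV file.

```python
import math

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to a signed grid with the given bit depth."""
    steps = 2 ** (bits - 1) - 1  # largest positive integer code
    return round(sample * steps) / steps

def measured_snr_db(bits, n=44100, freq=997.0, rate=44100.0):
    """Signal-to-noise ratio of a full-scale sine after rounding to `bits` bits."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2.0 * math.pi * freq * i / rate)
        err = quantize(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)

def theoretical_snr_db(bits):
    """Rule of thumb for an ideal quantizer driven by a full-scale sine."""
    return 6.02 * bits + 1.76

# Compare measurement and rule of thumb for a few bit depths.
for bits in (6, 8, 16):
    print(bits, round(measured_snr_db(bits), 1), round(theoretical_snr_db(bits), 1))
```

Every bit that is taken away raises the noise floor by about 6 dB, so the 6-bit clarinet from the example has a far higher noise floor than the 24-bit original.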

Disadvantages of ATRAC compared to MP3

Fig. IV: Multitone signal, -6 dB, ATRAC 4.0, 24/44.1 (linear scale, Hanning)
ATRAC (in this case 4.0; DSP Type-R doesn't differ) performs much worse. Quantization noise is stronger (-70 dB, Fig. IV; MP3: -90 dB)... WideBitStream my ass! ATRAC has a datarate of 292 kBit/s at its disposal, the MP3 example has even less (256 kBit/s), yet it yields superior performance. But it should be clear by now that ATRAC also follows the equal-loudness contour (Fig. I), just like every other lossy codec. However, ATRAC & MP3 do a bit more: they completely remove high frequencies. MP3 did it in my example at 19 kHz, ATRAC at 17.5 kHz. They get rid of those frequencies because A) we don't hear them well and B) their encoding hurts the rest of the frequency spectrum. How can the presence of high frequencies distort frequencies below them? Because both codecs use a constant bit-rate, they cannot adapt this bit-rate (and therefore the possible quality) to changing complexity patterns. MP3 is of course able to encode with variable bit-rate (VBR), but I used constant bit-rate only (CBR) to create fair comparison conditions. ATRAC is forced to encode everything with 292 kBit/s, no matter what. Should a certain signal require a higher bit-rate, pity: ATRAC simply can't do it and has to take away more information in other areas, information it might have wanted to keep.
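To put the constant bit-rate argument into numbers, here is a small sketch. The first function only divides the bit-rates quoted above by the number of samples per second (44.1 kHz stereo assumed). The second one is a deliberately crude toy model: it splits a fixed budget across bands in proportion to made-up 'demand' figures. Neither the band names nor the demand values come from ATRAC or MP3; they are hypothetical and only illustrate why a band that claims bits leaves fewer for the others.

```python
def bits_per_sample(bitrate_bps, sample_rate=44100, channels=2):
    """Average number of bits a constant bit-rate codec can spend per sample."""
    return bitrate_bps / (sample_rate * channels)

def share_budget(budget_bits, demands):
    """Toy model: split a fixed budget across bands in proportion to demand."""
    total = sum(demands.values())
    return {band: budget_bits * d / total for band, d in demands.items()}

print("ATRAC, 292 kBit/s:", round(bits_per_sample(292_000), 2), "bits per sample")
print("MP3, 256 kBit/s:  ", round(bits_per_sample(256_000), 2), "bits per sample")

# Hypothetical demand figures, chosen only for illustration.
full_band = {"0-5 kHz": 50, "5-15.5 kHz": 30, "15.5-22 kHz": 20}
cut_band = {"0-5 kHz": 50, "5-15.5 kHz": 30}

budget = 1000  # arbitrary fixed budget for one block
print(share_budget(budget, full_band))
print(share_budget(budget, cut_band))
```

In this toy model the two lower bands receive more bits as soon as the top band no longer asks for any, which is the effect the text describes for a codec that cannot raise its bit-rate.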

Advantages of ATRAC compared to MP3

Fig. V: 1 kHz sine, MP3 (Lame, 256 kBit/s), 24/44.1 (logarithmic scale, Hanning)
Fig. VI: 1 kHz sine, ATRAC 4.0, 24/44.1 (logarithmic scale, Hanning)
The situation reverses with less complex signals. Fig. V and Fig. VI show that ATRAC outperforms MP3 with simple signals. Quantization noise is more evenly distributed, and a resolution of 20 bits is retained. Which means: the less complex a signal is, the better it can be encoded by a lossy codec; apparently, this is even more valid for ATRAC. But it has another advantage: its time/frequency resolution, expressed in block length. Perceptual codecs need to decide between long mode and short mode. The latter enables any lossy codec to encode transients (very short & loud signals, for example the attack of a piano or handclapping). For short mode, MP3 uses a window size of 192 samples (or 4.3 ms). Now if a signal is shorter than that it is ignored; it ceases to exist because it's effectively 'invisible'. In comparison, ATRAC has not one but two short modes. For lower frequencies it's 130 samples (2.9 ms), for higher frequencies 65 samples (or 1.45 ms). In general, ATRAC is better suited to encode very loud and short transients, helping dynamics and precision.
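The window lengths above convert to milliseconds with a single division. A short sketch, assuming a sample rate of 44.1 kHz and taking the sample counts exactly as quoted in the text (the labels are only descriptive):

```python
def window_ms(samples, sample_rate=44100):
    """Duration of a block of `samples` samples in milliseconds."""
    return 1000.0 * samples / sample_rate

windows = (
    ("MP3 short mode", 192),
    ("ATRAC short mode, lower frequencies", 130),
    ("ATRAC short mode, higher frequencies", 65),
)

for label, samples in windows:
    print(label, samples, "samples =", round(window_ms(samples), 2), "ms")
```

The results land close to the rounded figures given above; the shorter the window, the shorter the transient a codec can still place correctly in time.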

Sonic differences between ATRAC & MP3

Ask 'normal' people (not audiophiles) how they perceive the sound of MP3 and you'll most likely receive the answer that it sounds slightly warmer to them, that is, if they can hear any difference at all. Yet your basic audiophile will call its sound 'cold' and 'digital' when in reality nothing could be further from the truth (their reasoning: it must sound that way because it uses lossy compression). Listen, all you audiophiles out there: MP3 encoded music sounds a tiny bit warmer compared to the original. The reason is not that it removes any frequency content above 16 kHz (also stated by audiophiles); the one and only true reason is that it fails at encoding short transients responsible for dynamics, attack and precision. When ATRAC was still used regularly it too was described as sounding 'cold' (for example in the German STEREOPLAY magazine 15 years ago). Again, this isn't true. At least not for ATRAC 4.0 and ATRAC 4.5. Both sound significantly more pleasant and warmer than the original (ATRAC DSP Type-R changed the situation somewhat). This mellow signature isn't caused by the window size. The shortcomings are caused by the ATRAC codec attempting to encode signals up to 20 kHz. All those years ago, magazines and audiophiles alike (the German STEREO magazine paramount among them) constantly pressed Sony to improve the rendering of high frequencies. They believed that if you could retain frequencies from 16 to 22.05 kHz it would yield true audiophile sound. Bullshit! ATRAC would have profited extremely if Sony hadn't listened to audiophiles; I will show you how.

Tweaking ATRAC

Fig. VII: ATRAC standard encoding (24/44.1, linear scale)
By trying to encode signals up to 22,050 Hz (see Fig. VII) ATRAC is losing precious available bits better reserved for lower, more readily audible frequency bands. This creates an overall pleasant sound signature by producing soft compression artifacts. Of course, this sound is far away from the truth. The final critical band ATRAC encodes is the frequency area from 15,500 to 22,050 Hz; if it weren't present anymore, ATRAC would have more bits to spare for the lower frequency bands (0 to 15,500 Hz). Remember: the less complex the music is (i.e. the fewer frequency bands it occupies), the better it will be encoded by ATRAC. For that reason we will erase frequencies beyond 15,500 Hz with an equalizer so that ATRAC 4.0 or higher doesn't need to concern itself with these frequencies anymore! Will this sound muffled? No, it won't, since ATRAC encodes transient responses with utter precision (compared to MP3 which always sounds slightly blanketed depending on the material). Getting rid of frequencies beyond 15,500 Hz improves quantization noise as a result:

Fig. VIII: Multitone signal, -6 dB, ATRAC 4.0, STANDARD ENCODING, 24/44.1 (linear scale, Hanning)
Fig. IX: Multitone signal, -6 dB, ATRAC 4.0, 15.5 kHz CUTOFF, 24/44.1 (linear scale, Hanning)
Look at Fig. IX and compare it to Fig. VIII. The quantization noise floor has been lowered by roughly 5 dB - and only because frequencies beyond 15,500 Hz have been removed. Stunning result, isn't it?
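For readers who want to see what such a cutoff looks like in code, here is a dependency-free sketch of a low-pass filter at 15,500 Hz. It is a generic Blackman-windowed sinc FIR filter, not the filter used for the measurements in this article; the tap count of 255 and all function names are assumptions chosen for illustration.

```python
import cmath
import math

def lowpass_taps(cutoff_hz, sample_rate=44100, num_taps=255):
    """Blackman-windowed sinc low-pass filter; num_taps should be odd."""
    fc = cutoff_hz / sample_rate  # normalised cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = (0.42
                  - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4.0 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at 0 Hz

def gain_db(taps, freq_hz, sample_rate=44100):
    """Magnitude response of the filter at a single frequency, in dB."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    h = sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps))
    return 20.0 * math.log10(max(abs(h), 1e-12))

def apply_fir(samples, taps):
    """Direct convolution of a list of samples with the taps (slow but simple)."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * samples[i - j]
        out.append(acc)
    return out

taps = lowpass_taps(15500)
for freq in (1000, 10000, 15500, 18000, 20000):
    print(freq, "Hz:", round(gain_db(taps, freq), 1), "dB")
```

The taps are symmetric, so this particular design is linear-phase. A real equalizer plug-in will be far faster and may use a different filter type; the sketch only shows the principle of passing everything below the cutoff and suppressing everything above it.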

Fig. X: RMAA frequency response, ATRAC 4.0 STANDARD ENCODING, four passes
Fig. XI: RMAA frequency response, ATRAC 4.0, 15.5 kHz CUTOFF, four passes
Even RMAA recognizes the effect. Fig. X & XI depict that the hole around 4,000 Hz, typical for any ATRAC version, has almost disappeared along with the odd response at subsonic frequencies (20 Hz). Increasing levels from 10,000 to 15,000 Hz on Fig. XI are caused by my equalizer setting (see below at 'Equalizing ATRAC (costly option)').

Equalizing ATRAC (free option)

I told you that this tweak is free, so I searched for, found and measured a suitable equalizer. This was difficult; not many free equalizers around are able to process with high quality. I will however also tell you about costly alternatives, namely SoundForge and iZotope Ozone. SoundForge is the VST host while Ozone is the equalizer I work with in that case. I will talk about them because they yield slightly superior quality. Never, I repeat, NEVER use built-in equalizers (SoundForge, WaveLab, Adobe Audition, foobar2000, Winamp). I've measured them and they create so many errors that it's shocking. Anyway, to achieve the tweak without paying any money while still retaining high quality you'll need these things:
- foobar2000 (get it here)
- a VST-wrapper (get it here)
- the equalizer EngineersFilter from RS-MET (get it here)
or
- Audacity (get it here)
- the equalizer EngineersFilter from RS-MET

Fig. XII: RS-MET EngineersFilter setting for ATRAC Cutoff
Fig. XII reveals my configuration for the cutoff filter. The EngineersFilter offers several other filtering methods but I decided to keep it simple so that less tech-savvy people can use it as well. I cannot help you with the installation/setup of foobar2000, its VST wrapper and the EngineersFilter, however; you need to figure that out for yourself, and the same goes for Audacity. Other recommendations are: keep the signal at 32 bit floating-point, regardless of whether you're working with foobar2000, SoundForge or any other digital audio editor. As you know, the MiniDisc is capable of working with high resolution material, so if you're recording from a PC just keep it at that high resolution. If you don't want to use a PC I'd recommend a CD-RW (which can be erased and rewritten). In that case, decrease bit-depth to 16 bit without using noise-shaped dither (the shaped and dithered quantization noise would otherwise confuse ATRAC again) and burn the results to CD-RW.
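What 'decrease bit-depth to 16 bit without noise-shaped dither' means in practice can be sketched in a few lines. This is only an illustration, assuming floating-point samples in the range -1.0 to 1.0 and plain rounding with no dither at all; any audio editor does this for you in its export dialog. The scaling factor of 32767 is one common convention, and the function names are made up for this example.

```python
import struct

def float_to_int16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit integers by plain rounding."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip anything outside the legal range
        out.append(int(round(x * 32767)))
    return out

def pack_pcm16(ints):
    """Pack integers as little-endian 16-bit PCM, the layout used in WAV files."""
    return struct.pack("<%dh" % len(ints), *ints)

floats = [0.0, 0.25, -0.25, 1.0, -1.0, 1.5]
ints = float_to_int16(floats)
print(ints)
print(len(pack_pcm16(ints)), "bytes")
```

Because nothing is added before rounding, the error stays plain, unshaped quantization noise, which is the behaviour the recommendation above asks for.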

Equalizing ATRAC (costly option)

Fig. XIII: iZotope Ozone 4.0 settings for ATRAC cutoff
Fig. XIV: iZotope Ozone 4.0 general setup (-> click 'Option')
First of all you'll need a digital audio editor like SoundForge, WaveLab, Adobe Audition or Audacity. With these you'll be able to load iZotope Ozone (in my case, version 4.0) which you will configure to the specifications pictured in Figs. XIII & XIV. The 1.5 dB amplification of frequencies at 20,000 Hz is optional and used by me to fool my ear into not recognizing that certain frequencies are altogether absent. Why would you even use the iZotope Ozone EQ? Because it's in my experience the best equalizer on the market; it creates neither phase distortions nor other distortions or errors and generally performs perfectly. Have a look:

Fig. XV: iZotope Ozone, phase response
Fig. XVI: EngineersFilter, phase response
The phase response is sadly very underrepresented when it comes to sonic differences between DSPs or units playing audio material. In this case it's evident that iZotope Ozone has superior phase performance (Fig. XV) compared to the EngineersFilter (Fig. XVI), yet it is debatable whether this is audible at all. Let's be fair: the EngineersFilter EQ performs admirably compared to all the other free EQs I've tested. Impulses play a role too:

Fig. XVII: iZotope Ozone, impulse response
Fig. XVIII: EngineersFilter, impulse response
Fig. XVII depicts a perfectly symmetrical impulse response for iZotope Ozone. A high steepness of the cutoff filter produces better frequency resolution at the expense of a perfect impulse response. The same is true for the EngineersFilter, although here the impulse response (Fig. XVIII) is modeled after the first CD players with analogue anti-aliasing filtering. In the end you have to decide; I wrote years ago that the effects of impulses are overrated. BTW, the settings I've described yield the following measurable results:

Fig. XIX: frequency response, ATRAC 4.0, 15.5 kHz cutoff (logarithmic scale)
Fig. XX: frequency response detail, ATRAC 4.0, 15.5 kHz cutoff
As you can see in both examples above, which were created by RMAA, I've achieved the desired effect - without frequency deviations created by crappy equalizers and (for the EngineersFilter) almost without phase distortions. The graph depicting the zoomed-in frequency response (Fig. XX) reveals a not too steep cutoff, yet it's precise enough to get rid of frequencies beyond 15.5 kHz. The result of my procedure is evidenced by Fig. XXI: the picture shows a spectrogram derived from an ATRAC encoded/decoded recording (compare to Fig. VII). The precise 15.5 kHz cutoff is clearly visible.

Fig. XXI: ATRAC encoding with 15.5 kHz cutoff (24/44.1, linear scale)

The sound

When I first heard the results I couldn't believe my ears; the sound had improved by such a margin that I was wondering how I had been able to listen to it before. Precision, attack, stability and holographic impression of the stage were sounding so much better now... But listen for yourself. The following files were recorded digitally with the Sony MZ-R 55 featuring A) the standard full-frequency signal and B) the 15.5 kHz cutoff discovered by me. In both cases, the original files were at 24/44.1. After recording I played them back using the Kenwood DM-5090 (also digitally) and recorded its output with the optical input of my Creative Labs Soundblaster X-Fi HD USB. After that I merged three 30-second examples and uploaded them to SoundCloud. INSTEAD OF LISTENING TO THEM ONLINE, DOWNLOAD THEM! Reason: both are ATRAC-encoded/decoded PCM files, encoded again with MP3 by SoundCloud (at 128 kBit/s). Should you just press 'play' you'd only hear a transcoded file, with compression artifacts clouding possible differences. By downloading them, however, you'll be able to listen to the pure, ATRAC-encoded/decoded, Kenwood-derived, digital files in pristine 24 bit quality without further influence from SoundCloud. You would even be able, should you desire, to perform a DBT listening test; these two examples were edited with sample precision.


Three ATRAC 4.0 encoded samples, standard encoding



Three ATRAC 4.0 encoded samples, encoded using my 15.5 kHz cutoff filter
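
If you do run a blind comparison of the two files, it helps to know how many correct answers are needed before luck becomes an unlikely explanation. A small sketch, assuming an ABX-style protocol with two-way forced-choice trials (the text above does not specify a protocol, so this is an assumption); the function name is made up for illustration.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

# Chance of reaching each score in a 16-trial session purely by luck.
for correct in (8, 10, 12, 14, 16):
    print(correct, "of 16:", round(abx_p_value(correct, 16), 4))
```

A common convention is to treat a result as meaningful once this probability drops below 0.05, but that threshold is a convention and not a property of the files.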

Epilogue

And? What do you say? I feel that the result speaks for itself. I admit that this tweak requires some effort but I think that it's worth it. I can now use ATRAC 4.0 again! Oh yes, I almost forgot... why didn't I use a more recent ATRAC version? While the effects will be superior using ATRAC 4.5 or higher, the ATRAC 4.0 equipped recorders I own (MZ-R 30, MZ-R 50, MZ-R 35, MZ-R 37, MZ-R 55) have high quality drives, producing MiniDiscs that run without flaw on any other MD recorder / player. Later units (MZ-R 900, MZ-R 909) featuring superior ATRAC ICs fail to do this. With the exception of the Sony MZ-N 510 they all record with unreliable results. I also admit that it isn't very convenient to use MiniDisc these days. The reason to use them, for me at least, isn't their sound. Every other lossy codec employed around the world today is superior. I'm sorry, but it's a fact. Still, I love those little discs. The players/recorders are of high build quality, sound good enough (in some cases more than good) and you get the joy of bringing some amount of 'slowness' into your musical life by occupying yourself with media you can actually touch. Let's face it, I'm an idiot. An idiot... just like people still listening to vinyl. Like them I believe in an ancient and deceased format. But have I mentioned yet that it's pure joy? Oh, I did? Never mind! Anyway, with my tweak you're able to prolong the lifetime of MiniDisc before it's completely replaced by superior codecs and playback devices. And while you're at it, use it in combination with the FiiO E07K; it'll sound even better this way. Use this chance well and enjoy the results!


Last update: 06.09.2013

11 comments:

  1. CHET BAKER LIVERECORDING??!! MAYBE ONE DAY JE TE SUIS DE LOIN NICE /COTE D°AZUR

    SALUTATIONS A MUENSTER JEAN G.

    1. german humor sent from france.... jean G. used to be marlenes???! thought she would be enchantee to hear from him! and wtf was not her STYLE; en plus M maitrisait tres bien le francais!! a translater would be very helpful, but MUNSTER?? i didnt get the test-differences and tried to suggest a chet b live recording for MUSICfans not for i.e. ROBOTS: DAC TEST MAGNIFIQUE very good german quality!! tanx from NICE/FR

  2. I can't tell the difference between the two. I don't know, maybe the difference is more at an inaudible technical level? Anyways, I recently failed to resist buying a few of these recorders after seeing one in a pawn shop. They look and feel great, and they're proving handy in recording things out of my computer. I have an audio interface, and I'm really confused how I could record the master out of the audio interface back into the computer.

    The quality of the inputs of these recorders are absolutely great, to my ears anyway. They are very clean and low on noise. That's what I am really enjoying about these recorders. I am wondering if the quality of the pre-amps is better than let's say a modern Zoom H2n, which has very noisy line/mic inputs. I haven't tested this yet, but I am thinking maybe even of the reduced ATRAC quality compared to PCM recording, the shortcomings of the MD could be rectified by its quality inputs. I don't know yet...

    Great article by the way. Any input appreciated.

    1. Most probably the inputs of modern devices are better. With line-in signals anyway. You have to remember that a microphone demands power to be able to work. The Sonys always had very low noise when used with microphones but the sound rarely held up to the lack of noise.

      In any case, for recording something, a PC or a portable recorder equipped with memory cards is infinitely superior. For one, they record in lossless quality; secondly, you can easily move the resulting files to the PC. With MD this isn't as easy (as long as you're unable to use the MZ-RH1). I don't think that MD technology is in any way convenient when compared to today's technology.

      I like it nonetheless, maybe because it's so ancient. But one thing is for sure: they may sound better to your ear, but in reality the files on your computer are closer to the source they came from.

  3. Wow ! Very nice publication !
    I found your website because I get back my sony MZ-R700 from my parents house, after teen years !
    I am interested to use it again, and was wondering if ATRAC is better than MP3. You provide me with (part of) an answer !

    I will hack my player to update from ATRAC 4.5 to ATRAC DSP Type-R (you can find this process over internet), and use an equalizer.

    Thanks and cheers

    Hugo D.

    1. I've described that hack, too. It's here on my blog.

      To answer your question: ATRAC has theoretical advantages... but MP3 has been in development constantly. The last ATRAC 1 version was released in 1998 (Type-R) and it shows. Today, MP3 is easily superior.

  4. You miss a lot of information like most people when it comes to digital. It's not just the wave model that compression produces that counts. The fact that a Digital Audio Converter is required to get the sound out to the speakers no one ever seems to talk about much. Basically digital is being converted to analog so a speaker can play the music. The higher the quality this DAC process happens (or doesn't happen at all in the case of pure analog) then the richer the sound coming out of the speakers. It's ALL and I repeat ALL about this analog process otherwise you would hear nothing. It's impossible to hear digital anything. In fact, I usually edit digital wave files listening to no sound at all. I know exactly what is happening visually by looking at the waves. I edit by sight I listen by analog.

    1. DACs are so good these days that most of them perform audibly transparent, their error rates way below the point of audibility. Their general quality is the reason why no one who knows anything meaningful about DACs, still talks about them. So why the bullshit talk about superiority?

      And the waves you're talking about? A visual approximation of a digitally stored signal converted back to analog. So, in fact, you're editing something that looks like an analogue wave. Oh, and btw, this approximation of waves, spectrograms and whatnot was not designed to "guess" the music. It was designed to spot defects and errors; this method harks back to completely analog days.

      Your text doesn't make much sense. What is it really you want to say?

