Autotune and extreme compression

Post by **Jeremy Threlfall** » 5 Sep 2019 6:28 pm

A theory has come to me in the night…

Note: I am not a qualified audio engineer, or audio anything really, but this notion has some intuitive appeal to me.

It came to me after reading a book on the development of MP2/3/4 compressed files, and on how that extreme form of compression works by deleting the notes (or sounds) that the brain would infer for itself. For example, in a song in a major key, if your brain hears the root and the 3rd, it might not need to hear the 5th to know it's there.

(I don't really know what they take out in that compression process, but that's what I took away from my reading.)

So if your brain hears a root note and a slightly off-key 3rd, it might not be able to infer the deleted info, and the 3rd would have to be pitch-corrected for the compression scheme to work.

Is this a deluded notion? Is this why even good singers use auto-tune these days?

Post by **b0b** » 5 Sep 2019 10:29 pm

I think you misunderstood. What they're removing isn't pitches, it's dynamic range. A compressor doesn't know anything about musical intervals or pitch. It's working with volume levels, that's all.
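To illustrate b0b's point that a studio compressor only looks at level, here is a minimal sketch of a hard-knee downward compressor's static gain curve. The threshold (-20 dB) and ratio (4:1) are arbitrary example values, and real compressors also smooth the level detection with attack and release times, which this sketch leaves out. Nothing in it knows about pitch or intervals.

```python
import math


def compressor_gain_db(level_db: float, threshold_db: float = -20.0, ratio: float = 4.0) -> float:
    """Static gain (dB) of a simple hard-knee downward compressor.

    Below the threshold the signal passes unchanged; above it, each `ratio` dB
    of extra input level comes out as only 1 dB of extra output level.
    """
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    return -(over - over / ratio)


def compress_samples(samples, threshold_db=-20.0, ratio=4.0):
    """Apply the static curve to each sample's instantaneous level.

    No attack/release smoothing here; a real compressor follows an envelope.
    """
    out = []
    for x in samples:
        level_db = 20 * math.log10(abs(x)) if x != 0 else -math.inf
        gain = 10 ** (compressor_gain_db(level_db, threshold_db, ratio) / 20)
        out.append(x * gain)
    return out


# A peak 12 dB over the threshold is turned down by 9 dB at 4:1.
print(compressor_gain_db(-8.0))
print(compress_samples([0.0, 0.05, 0.5]))
```

Quiet samples pass through untouched and only the loud one is turned down, which is the "volume levels, that's all" behaviour b0b describes.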

Post by **Jeremy Threlfall** » 5 Sep 2019 11:48 pm

I have most likely misunderstood.

Is the so-called 'compression' that they use to create mp3 and mp4 files the same as the compression we use in studios?

It was explained to me as reducing file size by removing 'unnecessary' information, or information we can 'do without'.

Post by **Jeremy Threlfall** » 6 Sep 2019 12:13 am

https://www.retromanufacturing.com/blog ... ac-wma-mp3

Post by **Ian Rae** » 6 Sep 2019 1:06 am

It's digital compression. A computer decides what ones and zeros it can leave out without degrading the information below a set level. It doesn't know whether it's an audio recording or your holiday snaps.

Post by **Georg Sørtun** » 6 Sep 2019 1:33 am

You got "data compression" – used to create mp3/mp4 etc. files – right…

Jeremy Threlfall wrote: it was explained to me as diminishing file size by removing 'unnecessary' information, or information we can 'do without'

https://en.wikipedia.org/wiki/Data_compression
… but all of that is applied after the recording is done. Such file-size reduction works on the principle that neither the ear nor the brain can distinguish minute details when they are masked by "noise" (stronger, more dominant sounds) in a complex piece.
Of course, that depends on how well trained the ear/brain in question is at listening, but most listeners are not trained, so with the right degree of "fuzziness" it can work quite well.

Audio compression refers to the more "ordinary" (and/or "old") studio processes that keep audio levels comfortably between the noise floor and the distortion ceiling, by analogue and/or digital means. It is used on its own and as part of data compression.

Post by **Georg Sørtun** » 6 Sep 2019 1:42 am

To add: "autotune" is mainly used to keep singers/musicians on key or to create special (vocoder-like) effects. A "side effect" is that such processing also reduces "off-key details", and thereby reduces the amount of data left to "data-compress" when file sizes are reduced later on.

Post by **Jeremy Threlfall** » 6 Sep 2019 1:54 am

Excellent, so I’m sort of right. Thanks, Georg!

Post by **b0b** » 6 Sep 2019 8:02 am

Jeremy Threlfall wrote: I have most likely misunderstood

Is the so-called 'compression' that they use to create mp3 and mp4 files the same as the compression we use in studios?

it was explained to me as diminishing file size by removing 'unnecessary' information, or information we can 'do without'

The compression of MP3 files is not audio compression; it's file-size compression. There are several algorithms involved in determining what is 'unnecessary information', but I doubt that any of them have to do with musical pitch. Most of it is tiny variations between the levels of adjacent samples (very small slices of time). Once those little details are removed, standard data compression algorithms can reduce the file size considerably.
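A rough way to see the two stages b0b describes (throw away small details, then let a standard lossless compressor do the rest) is to drop the lowest bits of some 16-bit samples and compare zlib output sizes. This is only an analogy under stated assumptions: the synthetic sine-plus-noise signal, the helper names and the choice of 4 bits are made up for illustration, and real MP3 encoders decide what to discard in the frequency domain using a psychoacoustic model rather than by masking off bits.

```python
import math
import random
import struct
import zlib


def make_signal(n=20000, seed=1):
    """Hypothetical test signal: a 16-bit sine plus small random detail."""
    rng = random.Random(seed)
    return [int(10000 * math.sin(2 * math.pi * i / 100)) + rng.randint(-8, 8)
            for i in range(n)]


def to_bytes(samples):
    """Pack as little-endian signed 16-bit integers, like PCM audio."""
    return struct.pack(f"<{len(samples)}h", *samples)


def drop_low_bits(samples, bits=4):
    """Crude lossy step: zero the lowest `bits` bits of every sample."""
    mask = ~((1 << bits) - 1)
    return [s & mask for s in samples]


signal = make_signal()
raw = to_bytes(signal)
lossy = to_bytes(drop_low_bits(signal))

# zlib on its own is lossless: the raw bytes come back exactly.
assert zlib.decompress(zlib.compress(raw)) == raw

print("raw, zlib-compressed:  ", len(zlib.compress(raw)))
print("lossy, zlib-compressed:", len(zlib.compress(lossy)))
```

The random low bits are what the lossless stage struggles with; once they are gone, the same zlib call produces a noticeably smaller output, but those bits cannot be recovered.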

Post by **Ian Rae** » 6 Sep 2019 10:53 am

Take the example of a video of someone talking to camera in front of a stationary background. The only thing that changes from frame to frame is the speaker's face, so if instead of repeatedly recording or sending the complete frame you just code the differences, there is much less information.

Although it's less easy to visualise for audio, it's on the same principle.
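For audio, the nearest equivalent to Ian's "just code the differences" idea is delta coding: store each sample as its change from the previous one. A minimal, lossless sketch follows; the function names are hypothetical. On smooth signals the differences are small numbers, which a later entropy coder can store in fewer bits. Lossless audio codecs such as FLAC build on a more elaborate version of this idea (linear prediction).

```python
def delta_encode(samples):
    """Keep the first value, then only the change from each sample to the next."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Rebuild the original samples as a running sum of the differences."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


samples = [100, 102, 103, 103, 101]
deltas = delta_encode(samples)  # [100, 2, 1, 0, -2]
assert delta_decode(deltas) == samples  # nothing is lost
```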

Post by **Jeremy Threlfall** » 7 Sep 2019 7:01 am

OK - I’ll discard my whim (thanks for the updates)

So, there’s no excuse for auto-tune, at least coming from the engineer/producer

Good, I didn’t miss anything

Post by **Dom Franco** » 7 Sep 2019 8:56 am

To clarify a bit... Analog audio recording (Tape, direct to disc etc.) captures everything and plays it back with an infinite amount of variability. (However, also including tape hiss, record groove noise clicks and pops etc.)

Digital recording converts the analog signal to "1's" and "0's" (binary computer numbers) and, depending upon the quality of the A to D converters and the resolution (number of binary bits allowed), will make a very good copy of the analog without the noise added upon playback.

HOWEVER: at some point digital information will be limited to the number of bits that can be stored for each note.

So in effect all digital audio suffers from some compression; however, the highest sample rates are nearly perfect. When a smaller bit rate ("sample size") is used there will be more degradation of the audio on playback.

Often this digital compression is undetectable to human ears, but lesser-quality recordings can sometimes sound "sterile" or "harsh" compared to analog.
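The link Dom draws between the number of bits and degradation can be measured directly: quantize a sine wave to N bits and compare it with the original. In this sketch the test tone (997 Hz at 44.1 kHz, amplitude 0.99) and the function names are arbitrary choices. For a full-scale sine the textbook figure is roughly 6.02·N + 1.76 dB, i.e. about 6 dB per bit, so the printed values should land near that. Note that this measures bit depth, which is separate from sample rate and from MP3-style data compression.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Round a value in [-1.0, 1.0) to the nearest level of a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1)
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))  # clip to the representable range
    return q / levels


def quantization_snr_db(bits: int, n: int = 10000) -> float:
    """Signal-to-quantization-noise ratio of a near-full-scale sine."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = 0.99 * math.sin(2 * math.pi * 997 * i / 44100)
        e = quantize(x, bits) - x  # the error introduced by rounding
        signal_power += x * x
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 16):
    print(bits, "bits:", round(quantization_snr_db(bits), 1), "dB")
```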

Post by **werner althaus** » 10 Sep 2019 7:04 pm

Ian Rae wrote: Take the example of a video of someone talking to camera in front of a stationary background. The only thing that changes from frame to frame is the speaker's face, so if instead of repeatedly recording or sending the complete frame you just code the differences, there is much less information.

Although it's less easy to visualise for audio, it's on the same principle.

I love this example, especially when you compare it to the same person talking in front of a tree on a windy day, but I think what you're describing is lossless data compression. It's considered lossless despite reducing file size because instead of encoding a number of identical pixels one by one, it just says "X times red pixel"; therefore it is reversible.
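werner's "X times red pixel" description is run-length encoding. A minimal sketch (the names are hypothetical) shows why it counts as lossless: decoding reproduces the input exactly.

```python
def rle_encode(pixels):
    """Run-length encode: store (value, count) pairs instead of every repeat."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Expand each (value, count) pair; the original sequence comes back exactly."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


row = ["red"] * 6 + ["blue"] * 2 + ["red"] * 3
encoded = rle_encode(row)  # [("red", 6), ("blue", 2), ("red", 3)]
assert rle_decode(encoded) == row
```

The windy-day tree is the worst case for this scheme: when neighbouring pixels rarely repeat, the runs are short and there is little to gain.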

On the other hand, lossy compression reduces data by also eliminating non-redundant data points that are deemed unnecessary, most of them based on psychoacoustic phenomena such as masked sounds and reflections/reverbs/delays that fall within the Haas effect and therefore do not contribute to spatial localization, but also on things like upper frequency response, resolution of the mid signal vs. the side signal, etc. This is irreversible.
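By contrast, a lossy step throws information away for good. The toy below drops any component that sits close in frequency to a much louder one. The 20 dB margin and the 1.2 frequency ratio are invented numbers and the names are hypothetical; this is not a real psychoacoustic model. Actual codecs such as MP3 estimate masking thresholds per frequency band and then quantize coarsely enough that the added noise stays below them. The point is only that once the 1100 Hz component is gone, nothing in the output says it was ever there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    freq_hz: float
    level_db: float


def drop_masked(components, bandwidth_ratio=1.2, margin_db=20.0):
    """Toy lossy step: discard a component when another one within
    `bandwidth_ratio` in frequency is at least `margin_db` louder.

    The thresholds are illustrative only, not a real masking model.
    """
    kept = []
    for c in components:
        masked = any(
            other.level_db - c.level_db >= margin_db
            and max(other.freq_hz, c.freq_hz) / min(other.freq_hz, c.freq_hz) <= bandwidth_ratio
            for other in components
            if other is not c
        )
        if not masked:
            kept.append(c)
    return kept


mix = [Component(1000.0, -10.0), Component(1100.0, -40.0), Component(5000.0, -40.0)]
print(drop_masked(mix))  # the quiet 1100 Hz component next to the loud 1000 Hz one is dropped
```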

Post by **werner althaus** » 10 Sep 2019 7:42 pm

Dom Franco wrote: To clarify a bit... Analog audio recording (Tape, direct to disc etc.) captures everything and plays it back with an infinite amount of variability. (However, also including tape hiss, record groove noise clicks and pops etc.)

Digital recording converts the analog signal to "1's" and "0's" (binary computer numbers) and, depending upon the quality of the A to D converters and the resolution (number of binary bits allowed), will make a very good copy of the analog without the noise added upon playback.

HOWEVER: at some point digital information will be limited to the number of bits that can be stored for each note.

So in effect all digital audio suffers from some compression; however, the highest sample rates are nearly perfect. When a smaller bit rate ("sample size") is used there will be more degradation of the audio on playback.

Often this digital compression is undetectable to human ears, but lesser-quality recordings can sometimes sound "sterile" or "harsh" compared to analog.

I'm not sure this clarifies anything. For starters, analog recording doesn't capture "everything", neither in the time domain nor in terms of amplitude. That's just not how microphones work, it's not how amplifiers work, and it's not how tape or direct-to-disc works. Each of these components in a recording chain has limitations regarding frequency response, distortion, signal-to-noise ratio etc., which is why honest specs are published in a way that shows values within a given and hopefully agreeable operational range only, like ±3 dB 20 Hz to 20 kHz, or S/N = 90 dB re +4 dBu, 22 kHz BW, unity gain. Those are the ranges within which the gear performs close to capturing faithfully.
If you are referring to analog audio capturing "everything" vs. digital audio missing some information due to "stair steps", then you've fallen victim to the digital myths floating around on the internet.
Digital, like high-quality, honestly spec'd analog, captures faithfully across a predetermined operational range, with sample rate defining the frequency range and bit depth defining the dynamic range (in theory).
Why am I pointing this out? Because, contrary to what you say, digital audio does NOT inherently suffer from some compression, and higher sample rates are no more perfect than lower ones. They only expand the frequency range of audio that can be captured, and only in theory; in reality, all but the best converters introduce more problems at higher rates due to decreased clock accuracy, added intermodulation distortion and a host of other issues.
And as for what level of lossy data compression is detectable, IMO that hasn't been determined yet, because not all audio encodes the same. You can actually prep your audio to be more resilient to even very lossy data compression. Some mixes sound great at 128 kbps, while others sound lossy, swirly, dull and whatnot even at 320 kbps.
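werner's point that the sample rate defines the frequency range can be checked numerically. Any frequency above half the sample rate (the Nyquist frequency) produces exactly the same samples as a lower "alias" frequency, so it cannot be represented at that rate. This sketch assumes a 44.1 kHz rate and a 30 kHz cosine; it is also why converters filter out such content before sampling.

```python
import math


def sample_tone(freq_hz, sample_rate_hz, n):
    """Sample a cosine of the given frequency at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


fs = 44100
nyquist = fs / 2  # highest representable frequency: 22050 Hz

above = sample_tone(30000, fs, 64)       # a tone above Nyquist ...
alias = sample_tone(fs - 30000, fs, 64)  # ... and its 14100 Hz alias

# The two sample sequences agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(above, alias))
print(nyquist, max_diff)
```

Raising the sample rate moves the Nyquist frequency up, which is all it does in theory; it does not make the in-band samples any "more perfect".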

Post by **Godfrey Arthur** » 25 Sep 2019 12:22 pm

Jeremy Threlfall wrote: OK - I’ll discard my whim (thanks for the updates)

So, there’s no excuse for auto-tune, at least coming from the engineer/producer

Good, I didn’t miss anything

Pro recording studio schools have a prerequisite course in Autotune. Each engineer-student hopeful spending thousands in school fees is expected to master Autotune.

There is a use for the software. You can dial in trace amounts and choose the notes it works over. It doesn't all have to sound like T-Pain.
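As a sketch of what "dialing in trace amounts" and "choosing the notes" could mean numerically, the function below computes a target pitch: it finds the nearest allowed note on a 12-tone equal-tempered grid (A4 = 440 Hz assumed) and moves only part of the way toward it. This is not how Autotune or Melodyne are actually implemented; pitch detection and resynthesis are left out entirely, and `amount` and `allowed_pitch_classes` are hypothetical parameters.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def hz_to_midi(freq_hz: float) -> float:
    """Continuous MIDI note number (69 = A4, 12 steps per octave)."""
    return 69 + 12 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(note: float) -> float:
    return A4_HZ * 2 ** ((note - 69) / 12)


def correction_target(freq_hz, allowed_pitch_classes=range(12), amount=1.0):
    """Target frequency for a detected pitch.

    allowed_pitch_classes: pitch classes (0 = C ... 11 = B) it may snap to.
    amount: 0.0 leaves the pitch alone, 1.0 snaps fully to the nearest allowed note.
    """
    note = hz_to_midi(freq_hz)
    candidates = [n for n in range(math.floor(note) - 12, math.ceil(note) + 13)
                  if n % 12 in allowed_pitch_classes]
    nearest = min(candidates, key=lambda n: abs(n - note))
    return midi_to_hz(note + amount * (nearest - note))


c_major = {0, 2, 4, 5, 7, 9, 11}
print(correction_target(452.0))              # a sharp A4 pulled fully to 440 Hz
print(correction_target(452.0, amount=0.3))  # only a trace of correction
print(correction_target(470.0, c_major))     # A#4 is not allowed, so it goes to B4
```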

Here, #1 on Billboard's 2018 list of the 25 best rock albums, is The 1975, with its delve into pitch modulation.

Serious Autotune happening here.

TOOTIMETOOTIMETOOTIME

https://www.youtube.com/watch?time_cont ... fxPQUKfim4

As has been pointed out, your premise misses that converting a file to MP3 compresses the file size, not the audio.

What you posture compression has to do with Autotune is not clear. All Autotune does is pitch correct the same way Melodyne does.

If we listen to songs made decades before Autotune, even the best singers of the day have songs on vinyl with out-of-tune notes, and today we can pick those out-of-tune notes out.

Maybe singers like Karen Carpenter were really spot on in the note pitch department.

Post by **werner althaus** » 25 Sep 2019 6:46 pm

Godfrey Arthur wrote: ...What you posture compression has to do with Autotune is not clear...

The way I understand the OP, he is wondering whether a pitch-corrected harmony part consisting of a root, a third and a fifth would be easier and better to data-compress (by removing the fifth) than an imperfect harmony part because, as he stipulates (incorrectly, IMO), the brain can then "infer" the missing fifth if perfect intervals between the root, the third and the fifth are present. The assumption is that the codec algorithm can safely remove the fifth for psychoacoustic reasons. I don't believe it works that way. For starters, autotune will pitch-correct to tempered tuning, which is different from the overtones or combination tones that we hear, which vary a few cents up or down from tempered tuning. I don't believe that the psychoacoustic toolkit removes any fundamentals or first harmonics of any given sound at all, unless they are masked by other sounds or fall within the Haas effect's window due to delay. We should be able to test this by encoding a single note with its series of overtones vs. all those overtones played as separate notes. Would the algorithm remove the played overtones, since those frequencies are already present in the single note? I don't believe it would, unless maybe they share the exact same envelope (attack, decay, etc.).
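werner's point about tempered tuning versus the harmonic series can be put in numbers. Assuming a 12-tone equal-tempered target (which is what werner says autotune corrects to), the sketch compares the intervals found between overtones (5:4 and 3:2) with their tempered equivalents. The tempered fifth is about 2 cents flat of 3:2, while the tempered major third is about 14 cents sharp of 5:4.

```python
import math


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


# Intervals as they occur in the harmonic series (just intonation) vs.
# their sizes on the 12-tone equal-tempered grid.
just = {"major third (5:4)": 5 / 4, "perfect fifth (3:2)": 3 / 2}
tempered = {"major third (5:4)": 400.0, "perfect fifth (3:2)": 700.0}

for name, ratio in just.items():
    diff = tempered[name] - cents(ratio)
    print(f"{name}: just {cents(ratio):.2f} c, tempered {tempered[name]:.0f} c, diff {diff:+.2f} c")
```

So a pitch-corrected third does not land on the 5:4 that the overtone series of the root contains, which is one more reason to doubt that a codec could treat it as redundant.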