Baphy Ruins Sample Rate

Baphometrix explains why 44100 is the 420 for most music producers

Baphometrix is a Producer-DJ specializing in festival-oriented bass music genres (with some hip-hop on the side). She is a student of ill.Gates and a member of Producer Dojo’s Class of 808.

I have probably burst a few bubbles in my previous two deep-dives about loudness and dithering here in the Producer Dojo blog. But that was just to soften you up for this deep-dive about sample rate. I’m going to take a blow torch to the golden calf that “48000 (or 96000) is the best sample rate for music production”. Move over, Adam Conover. Baphy is in the house! (Cue the Adam Ruins Everything theme song)

In the same vein as my previous deep-dives, I’m going to start with some assertions that might surprise you. And as always, stick with me to the end before deciding I’m wrong!

1 – All music producers who routinely use producer-oriented sample libraries (drums, loops, FX, foley, etc.) should ALWAYS use a 44100 sample rate for their projects.

2 – The only exceptions to the preceding assertion are producers that work with 100% original recorded tracks, and specialized producers/composers who routinely use cinematic sample libraries designed expressly for film scoring work. For these two types of producers, it’s desirable to work at project rates of 48000, but they must pay special attention to the next assertion, which is…

3 – You should never mix samples of differing sample rates inside of a project. For example, if your project is at 44100 and you want to bring in some random 48000 atmospheric or impact sample from a cinematic library, you should first convert that sample to 44100 outside of your DAW with some 3rd party sample rate conversion (SRC) tool, and then bring the converted sample into your project.

4 – Recording engineers have good reasons to work at 96000 or 192000 sample rates when recording organic instruments. Mixing engineers have good reasons to work in 48000 or higher project rates. Mastering engineers have good reasons to work at 192000 sample rates during at least some of their processing. But none of these good reasons translate to the typical music producer’s environment! Just because they do it does not mean it’s a good idea for you to do it!

5 – Your typical YouTube content creator does NOT need a 48000 audio file for their Adobe Premiere project. (Or similar prosumer video editing suites.) Any legit requirement for a 48000 audio file comes only from the high end movie studios who use very specialized video suites and specialized 7.1 surround audio tools.

Do any of these surprise you? Read on to understand why!

The endless debate

If you think the debate (and confusion and bad advice) about bit depth is bad, it’s even worse for sample rate. You can spend hours googling and digging for information about sample rate and sample rate conversion (SRC) and see nothing but a mass of conflicting advice and assertions.

There are broadly three camps in the debate, that go something like this:

A – “I’ve heard that big studios use 96,000 sample rate–aka “studio quality” sample rate (per iZotope documentation)–so I do too because obviously that’s the best, right?”

B – “Just use 44100, nobody will hear the difference. Blind tests have proven that people who think they can hear a difference between 44100, 48000, and 96000 are no more accurate than if they simply were guessing”

C – “96000 is too much of a drain on typical bedroom producer systems, so I use 48000 because clearly that’s better than 44100, right? I mean, pitch shifters and warping work a little better with 48000 because they have finer grains to work with, right?”
–OR–
“I use 48000 because YouTube uses 48000” (which is actually kinda sorta wrong)
–OR–
“I use 48000 because I want to put up my music for sync licensing too” (because sync licensing is for videos, and the standard for audio tracks used in professional video editing suites is 48000)

The thing is… wait for it… All of these three camps are wrong when it comes to music production. Or perhaps more accurately, they are all only half-right. Or getting some small part right, but then drawing the wrong conclusions from that small part.

The one camp that is closest to being right is the one that argues 48000 (or 96000) is better than 44100 for pitch shifting and warping. That is an absolute fact. But I’m willing to bet that most producers operating from that correct understanding still pull a bunch of 24-bit 44100 samples into their 48000 (or 96000) projects and… all that great benefit from working at a project rate of 48000/96000 suddenly goes right down the toilet! Read on to learn why.

The one thing nearly everyone gets wrong and never talks about

There is literally only one place I have ever seen this fundamental problem described since Tarekith and Bob Katz took down their excellent essays and analyses of sample rate from back around 2010, which was the first time I dived hard into this subject and decided what the best approach to sample rate was. (And no technical advances in DAW software since that time have changed the facts.)

That one place is the Audio Fact Sheet for Ableton Live. Go take a brief look at section 32.3.2 and then come back here. I spent literally 4 hours googling for any other source that laid it out as plainly as Ableton, and nope. Not even iZotope does.

So Ableton flatly says “Do not mix files of different sample rates inside a project! Instead, use some high-quality SRC tool to convert files to the project sample rate, and THEN bring them into the project.”

Why do they say this? Especially if you’ve done a little digging yourself, and you’ve discovered (through one arcane source) that Ableton’s SRC algorithm seems to look very strong and clean? I’m going to help you read between the lines here.

The main issue revolves around linear-phase lowpass filters versus minimum-phase lowpass filters. If you’ve ever done your homework about when to use linear-phase EQ filters versus when not to use them, you know that linear-phase filters have two downsides:

A – They are CPU-intensive and cause significant project lag

B – They cause pre-ringing around the filter point, which in plain English means any transients happening near the filter point become smeared sounding.

So here’s the deal: good-quality SRC algorithms all use a linear-phase lowpass filter to limit aliasing foldback from the spectrum above the Nyquist frequency during sample rate conversion. You don’t have to understand what aliasing or the Nyquist frequency is, or why SRC algorithms need to try and cut everything happening above the Nyquist frequency. Just focus on that keyword “linear-phase”.
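If you do want a feel for what "aliasing foldback" means, here is a minimal sketch in Python. The function name `alias_frequency` is made up for this illustration, and it models the idealized case of a pure tone resampled with no lowpass filter at all. It is not how any particular SRC tool works; it only shows where unfiltered content above the Nyquist frequency (half the sample rate) would land.

```python
def alias_frequency(freq_hz, sample_rate):
    """Return the frequency a pure tone appears at after sampling.

    Idealized model: anything above Nyquist (sample_rate / 2) is mirrored
    back down into the band between 0 and Nyquist.
    """
    nyquist = sample_rate / 2
    folded = freq_hz % sample_rate
    return sample_rate - folded if folded > nyquist else folded


# A 23 kHz component fits in a 48000 file (Nyquist is 24000 Hz), but at
# 44100 (Nyquist is 22050 Hz) it would fold back into the audible range.
print(alias_frequency(23000, 48000))  # 23000
print(alias_frequency(23000, 44100))  # 21100
print(alias_frequency(30000, 44100))  # 14100
```

That folded-back content is what the lowpass filter inside an SRC algorithm is there to remove before it can land in the audible band.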

The other important thing to understand is that not a single DAW (or Sampler) out there performs SRC when you pull new samples into a project (or into a Sampler device). They leave your imported samples alone and untouched. But the DAW has a dilemma! When you press play, it has to make sure all the audio streams from all the tracks/channels/groups/returns (and Samplers) are running at the same sample rate that the DAW project itself is set to.

So the DAW solves this problem by performing SRC in real-time, while the playhead is running, on every audio stream that isn’t running at the project’s configured sample rate. And the BIG problem is that if the DAW were to use high-quality linear-phase SRC algorithms in real time on potentially dozens of audio streams, it would eat a huge amount of CPU and incur a sizeable chunk of latency. So the DAW uses lower-quality real-time SRC that uses a minimum-phase lowpass filter! It’s a simple trade-off of efficiency versus quality.
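To get a sense of why converting between 44100 and 48000 is real work and not a trivial copy, here is a small sketch using only Python's standard library. The helper names `src_ratio` and `aligned_outputs_per_cycle` are hypothetical, invented for this example. It only counts how many output samples land exactly on an input sample; everything else has to be computed by the SRC filter.

```python
from fractions import Fraction


def src_ratio(source_rate, target_rate):
    """Output samples produced per input sample, as an exact fraction."""
    return Fraction(target_rate, source_rate)


def aligned_outputs_per_cycle(source_rate, target_rate):
    """Count output samples that coincide exactly with an input sample.

    Returns (aligned, cycle), where cycle is the number of output samples
    before the alignment pattern repeats.
    """
    ratio = src_ratio(source_rate, target_rate)
    step = 1 / ratio          # distance between outputs, in input samples
    cycle = ratio.numerator   # outputs per repeating pattern
    aligned = sum(1 for k in range(cycle) if (k * step).denominator == 1)
    return aligned, cycle


print(src_ratio(44100, 48000))                  # 160/147
print(aligned_outputs_per_cycle(44100, 48000))  # (1, 160)
```

Under these assumptions, only 1 out of every 160 output samples sits on an original sample when going from 44100 to 48000. The other 159 are interpolated, and the quality of that interpolation is exactly where the filter choice matters.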

Ableton is the only DAW I’ve run across that comes close to spelling this out the way I just have. They hint at this in their section 32.3.2. If you understand how differences among SRC algorithms across DACs and software all hinge around efficient but low-quality minimum-phase filters versus intensive but high-quality linear-phase filters, you can put two and two together when reading Ableton’s clear and unambiguous warning to not mix sample rates inside of a project.

Still not convinced? Ableton uses the SoX Resampler library as of Ableton 9.1. If you look at the SoX documentation they clearly state that

Note that for real-time resampling, libsoxr may have a higher latency
than non-FFT based resamplers. For example, when using the `High Quality’
configuration to resample between 44100Hz and 48000Hz, the latency is
around 1000 output samples, i.e. roughly 20ms (though passband and FFT-
size configuration parameters may be used to reduce this figure).

In plain English, this means that if Ableton were using high-quality resampling during real-time SRC, every audio stream being SRCed in real-time would be incurring roughly 20 ms of latency. I don’t think they would make that choice. Because SoX is highly configurable, Ableton clearly made the choice to go for lower quality and (near?) zero-latency during real-time SRC, and this is why Ableton strongly advises to not mix sample rates in a project.
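The "roughly 20ms" figure in that quote is simple arithmetic, sketched below. The function name `latency_ms` is just a label for this example; the 1000-sample figure and the 48000 output rate come straight from the quoted SoX note.

```python
def latency_ms(latency_samples, sample_rate):
    """Convert a latency given in samples to milliseconds."""
    return 1000.0 * latency_samples / sample_rate


# About 1000 output samples of latency at a 48000 output rate
print(round(latency_ms(1000, 48000), 1))  # 20.8
```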

If you’re going to work at 48000 or 96000, here’s how to do it RIGHT

There is nothing at all wrong with working your music projects at a configured sample rate of 48000 or 96000. But it takes more effort and time to do it right.

The goal is simply to prevent your DAW from needing to do any real-time SRC during playback.

So what this means is that if you have a legit reason to work at 48000 or 96000, you should do exactly what Ableton advises, and use some other high-quality SRC tool to pre-convert every sample that isn’t at the native sample rate of your project. Probably the easiest and cheapest high-quality tool for most of us to get our hands on is iZotope RX7 Standard (or Advanced). The Resample module in RX7 is very very good. Just use the default values, and you’ll get very clean SRC.

So…. is this feasible? It’s definitely extra work. Most of the sample packs you buy from Loopmasters, ADSR, Native Instruments, Black Octopus, Cymatics, and so on are all 24-bit 44100. That means every kick sample, every snare sample, every loop–they ALL need to be pre-converted to 24-bit 48000 (or 96000) before you drag them into your project.

And for those of us who use 128s? Yep. You need to pre-convert every sample to 24-bit 48000 (or 96000) before you drag them into the DAW’s native Sampler device and build the 128. Yep. That’s a big PITA. Yep. You’ve already got a User Library full of 128s, some of which were very painstaking to make. Or you have some 128s that somebody else made, and you’d be looking at exporting all those individual samples out of it just so you could convert them, then rebuilding that 128 from scratch all over again.

Yes. THAT right there is why I personally work at 24-bit 44100. Because 95% (or more) of the samples I reach for every single day are at 24-bit 44100. And the last thing I want to do is waste time manually converting samples before I pull them into my project.

But But But… I have a LEGIT need to work at 48000 (or 96000)

There are only two legit needs for working at 48000 (or 96000):

A – You are doing audio specifically for professional film scoring (not for YouTube creators), and you rely mostly on cinematic sample libs (like Spitfire stuff) that are all natively 24-bit 48000 to begin with.

B – You are a recording engineer in a big studio with serious computing power, and you need to solve some real-world issues with getting the cleanest signal possible recorded from some problematic instrument sources that spit out a LOT of ultrasonic frequencies above 20 kHz. A typical use case is certain types of horn instruments being close-miked. You’re solving a real-world issue with hardware analog-to-digital converters in your recording signal path. To do this, you’re using a 96000 sample rate on the sound card during recording to ensure that as little audible high-frequency smearing is happening as possible. From there, the studio producer and mixing engineers might continue to work with a project all at 96000 for a while (or all the way through mastering), or they might simply downsample that original 96000 raw file to 48000 or 44100 (using good tools) so that the project engineers can work at 48000/44100 during arrangement, mixing, and mastering.

What’s not a legit need to work at 48000 or 96000? “I want my pitch shifters or warping to sound better”. Why isn’t this a legit need? Because you can do this special-purpose work in a side project (not your main project), and then when you’ve got the sound you want, you can export a stem out of that side project, convert it to the 44100 your main project is running at, and then import it into your main project.

“That’s too much trouble!” you say? Is it really? Isn’t working in multiple side projects actually a “best practice” even among bedroom producers? Think about the advice we commonly hear about “vocals are a special case, they take a LOT of intensive processing, so it’s best to do all your vocals in a separate side project and then bring a few stems back into your main project from there.”

I’m not going to elaborate on why professional recording engineers in big studios might choose to record something at 96000 through their AD converters. It gets into some funky physics. The bottom line is that all AD converters do oversampling internally, and perform sample rate conversion internally. So by working at a higher sample rate that internal low-pass filter is set up so high that most of the “bad sound” (aliasing foldback and pre-ringing smearing) happens waaaaaayyyyyyy above any audible frequencies that get “printed” to the sample file. Just remember that big studios have special gear that is far better equipped than your PC or Mac for working with huge, CPU-intensive 96000 sample rate files.

But… I might shop my music around for sync licensing–that needs to be at 48000, right?

Yes, most sync-licensing services will want you to upload 24-bit 48000 files, because that’s the standard for nearly all movie studio professional video editing suites (not Adobe Premiere Pro, which is a prosumer video tool). Why do they want 48000 sample rate files? It revolves around the standard frame rates for the video itself. 48000 is just a cleaner match for the audio to sync up nicely with the video frames and playhead tracking. Again, though, this is a nuanced detail that only professional video editors might care about. Prosumer software like Adobe Premiere Pro doesn’t care, and can work with 44100 audio tracks just fine.
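One way to sanity-check the frame-rate argument is to count audio samples per video frame. The sketch below is only an illustration: the three frame rates (24, 25, and 24000/1001, often written as 23.976) are assumptions picked as common examples, not something taken from any sync-licensing spec, and `samples_per_frame` is a hypothetical helper name.

```python
from fractions import Fraction


def samples_per_frame(sample_rate, frame_rate):
    """Exact number of audio samples that span one video frame."""
    return Fraction(sample_rate) / Fraction(frame_rate)


# Assumed example frame rates; 24000/1001 is the exact value of "23.976"
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "23.976": Fraction(24000, 1001),
}

for label, fps in FRAME_RATES.items():
    for rate in (44100, 48000):
        spf = samples_per_frame(rate, fps)
        kind = "whole" if spf.denominator == 1 else "fractional"
        print(f"{rate} Hz at {label} fps: {float(spf)} samples per frame ({kind})")
```

With these example rates, 48000 gives a whole number of samples per frame in all three cases, while 44100 only does so at 25 fps. Other frame rates can behave differently, so treat this as a rough picture and not a rule.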

But…. You know you can simply take a 44100 master WAV export and put it through a one-time SRC up to 48000, right? You know that a one-time SRC from 44100 to 48000 is incredibly clean with today’s software, right? You know that Ableton Live 9.1 and onward has incredibly clean SRC on export, right? So if you have Ableton 9.1 or higher, you don’t even need iZotope RX7 to get a clean upsample to 48000, right?

Yes. Ableton 9 and 10 have really really really clean SRC on export (it uses linear-phase SRC during audio export). In fact, Ableton’s export SRC is almost as good as iZotope RX7’s Resample module. Every other DAW falls way short of Ableton. Not Logic, not ProTools, not Reaper, not FL Studio, and definitely not my own beloved Bitwig (lol). How do I know this? Go experiment with this fun (and fairly up-to-date) tool for a while. Read the Help button to understand what it’s showing you. http://src.infinitewave.ca

TL;DR – why not work your music projects at 24-bit 44100 and avoid nearly all of the day to day hassle and sound degradation described in the preceding sections? Then export a 24-bit 44100 master WAV for upload to your music distributor (and SoundCloud/Bandcamp, etc.), and export a 24-bit 48000 master WAV for upload to any sync libraries you use?

A – If you’re on Ableton Live, you can do both exports right out of Ableton itself and trust that the 48000 version is really clean.

B – If you’re on any other DAW, just take your 24-bit 44100 export and run it through RX7 to get your 24-bit 48000 version.

But… I might shop my music for sync licensing, and I don’t have Ableton and I can’t afford RX7 Standard

Fair point. So… first off, I’d be surprised if most sync licensing services don’t accept 24-bit 44100 files anyway, and perform the SRC to 48000 for you. But even if your preferred sync licensing service doesn’t do that, you can always find a pro mastering engineer (or a friend with RX7 Standard) to do it for you. For free or for cheap.

What does Baphy recommend and do herself?

I’m all about speed and simplicity, and 98% or more of the samples I reach for are 24-bit 44100. So my projects are all set to 24-bit 44100. If I notice that some rare sample I grabbed is a 48000, I’ll stop and convert it with RX7 and then bring it in. The one thing I consciously do is avoid buying sample packs made at 48000. In short, I try to end up with as few 48000 samples in my sample library as possible. And if I accidentally bring a 48000 into my project and don’t notice it, I don’t lose sleep over it. I don’t really care about the tiny impact of only one or two tracks at most that my DAW is performing crappy realtime SRC on.
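Spotting the odd 48000 sample by eye gets old fast, so here is a minimal sketch of a script that lists WAV files whose sample rate differs from the project rate. It uses only Python's standard `wave` module, which reads plain PCM WAV files; files it cannot parse are skipped, not flagged. The function name `find_mismatched_wavs` and the 44100 default are assumptions for this example, and the `*.wav` pattern may miss files with an uppercase extension on case-sensitive file systems.

```python
import wave
from pathlib import Path


def find_mismatched_wavs(folder, project_rate=44100):
    """Return (path, sample_rate) for WAVs not at the project sample rate."""
    mismatched = []
    for path in sorted(Path(folder).rglob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
        except (wave.Error, EOFError):
            # The stdlib reader could not parse this file; skip it
            continue
        if rate != project_rate:
            mismatched.append((path, rate))
    return mismatched
```

Calling `find_mismatched_wavs("path/to/your/samples")` returns the files you would want to run through an offline SRC tool before dragging them into a 44100 project. The script only reports; it does not convert anything.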

I export 24-bit 44100 master WAVs. If I ever need to provide a 48000 WAV to anyone, I’ll simply run my 24-bit 44100 master through RX7 to make a 24-bit 48000 as needed.

Yeah but what about Kontakt libraries or Maschine libraries? Aren’t they 48000? I use those all the time in my music projects

The samples used in most Kontakt libraries are packed up in a compressed file and you cannot directly re-use any of those samples by dragging them into your DAW project. Instead, those samples are played through Kontakt itself. And Kontakt does its own internal sample rate conversion as needed to match the configured sample rate of your DAW project. Again, because of latency, it’s almost certain that Kontakt is using lower-quality minimum-phase realtime SRC, just like your DAW. So the big question is whether the Kontakt library you’re using comprises 48000 or 44100 samples?

That’s a tough question to answer with certainty, because Native Instruments is very careful to never specify the sample rate used in its Kontakt instruments. Not in the product pages, not in the documentation. Nowhere. And because the sample libraries are packed up, it’s not a simple matter to just examine them. My strong hunch is that all the Kontakt instruments sold by NI are using 44100 samples internally. For one, vendors that actually make sample packs and sample-based instruments using 48000 samples usually brag about that fact. (And it’s also a typical marker of a “cinematic” sample pack.) For two, as mentioned in the next paragraph, all of the Maschine expansions are using 44100 samples, so it’s IMO a safe bet that the Kontakt instruments are too.

Maschine is a different story because the samples in Maschine expansions (and the Factory Library) can all be browsed and dragged directly into your DAW projects entirely outside of Maschine itself. I can verify for a certainty that all the samples in Maschine libraries/expansions are 24-bit 44100.

Hey you haven’t mentioned YouTube yet–Doesn’t YouTube want 48000 music files?

Nope. Or, Yep? YouTube is inconsistent on their recommendations. In some places they say music files should be at 24-bit 44100 (in their docs for ContentID). In other places they throw out the party line that videos with music soundtracks should be at 48000. Which all videos coming out of Adobe Premiere (or whatever prosumer video software you might use) are already at! All these video software tools internally handle the upsampling of audio from 44100 to 48000 when you export videos from them.

What is certain is that the audio stream from YouTube playback is 44100. Take that for what it’s worth. My 44100 songs turned into YouTube Music videos by DistroKid and my music videos made in Premiere and uploaded as 48000 mp4 movies all sound the same to me, so…. don’t sweat it.

You still haven’t convinced me, because I know that frequencies ABOVE the audible range also change frequencies IN the audible range

Yes. You’re absolutely correct. And this is the main reason that pro recording engineers and big studios with the right equipment will use the computing horsepower necessary to capture as much of those ultrasonic (above the audible range) frequencies as possible during the initial analog to digital conversion. And why engineers and studios with computing horsepower prefer to work in 96000 for as much of the project lifecycle as possible. If you’re not sure what I mean by horsepower, just look at specialized (and expensive) studio solutions like Waves Soundgrid processors.

To save me the whys and wherefores of explaining this concept here, I’ll pass you off to a very short and well-written essay by Dave Askew. Pay close attention to the diagrams down in section 6 and keep those in mind as I continue after you come back.

The TL;DR is that yes, working in the highest sample rate possible is theoretically desirable, but practically undesirable for the average bedroom producer working on relatively low-powered home computers and laptops. And made worse by FAR when you consider that most of the samples you reach for and drag into your projects and samplers all the time are mostly 24-bit 44100 samples.

In short, the upsides of that tiny impact on the “accuracy” of your high sample rate frequencies in the audible range are FAR outweighed by the downsides of too much low-quality real-time SRC inside your DAW. Those diagrams in section 6 of that essay? Any type of SRC turns a single peaky “tick” in a sound wave into something that has a reduced amplitude for that “tick” and entirely new ripples on one or both sides of that tick. This is called ringing, aka smearing. In SRC that uses linear-phase filters, the ringing is smaller, but it spreads out to both sides of the original tick. This is why you hear the common advice/caution that “linear-phase EQs can smear your transients and cause pre-ringing”.

And in lower-quality (but more efficient) SRC that uses minimum-phase filters, all the ringing occurs on the right side of the original tick and the ringing is much louder near the tick.
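The "both sides versus right side only" difference can be seen in a toy example. The sketch below is a simplified stand-in, not the filter any DAW or SRC tool actually uses: the linear-phase case is a Hann-windowed sinc lowpass, and the minimum-phase case is a simple two-pole resonator (an all-pole filter, which is minimum-phase). All function names and parameter values are made up for this illustration. It feeds each filter a single "tick" and measures how much energy shows up before and after the main peak.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase FIR lowpass (Hann-windowed sinc).

    cutoff is a fraction of the sample rate (0 to 0.5); num_taps is odd.
    """
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - center
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hann)
    return taps


def resonator_impulse_response(length, freq, radius):
    """Response of a two-pole (all-pole, minimum-phase) filter to one tick."""
    a1 = 2 * radius * math.cos(2 * math.pi * freq)
    a2 = -radius * radius
    out = []
    y1 = y2 = 0.0
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def energy_before_and_after_peak(response):
    """Sum of squared values on each side of the largest sample."""
    peak = max(range(len(response)), key=lambda i: abs(response[i]))
    before = sum(v * v for v in response[:peak])
    after = sum(v * v for v in response[peak + 1:])
    return before, after


linear = windowed_sinc_lowpass(31, 0.25)
minimum = resonator_impulse_response(31, 0.2, 0.8)
print("linear-phase  (before, after):", energy_before_and_after_peak(linear))
print("minimum-phase (before, after):", energy_before_and_after_peak(minimum))
```

In this toy setup the linear-phase response is symmetric, so the ringing energy is split evenly before and after the peak (that "before" part is the pre-ringing). The minimum-phase response has nothing before the peak and all of its ringing after it.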

Neither result is desirable, but it’s inescapable. And this is why professional film/video composers will often work in 48000 projects and stick to expensive cinematic libraries that have 24-bit 48000 samples. They try to work in the sample rate needed by their primary customer: a movie studio professional film/video editor. They do this so that they never have to perform any type of SRC at any step of the process.

But I’m writing this essay for music producers. Especially those of the “bedroom” variety, which most of us are these days, at least in the electronic dance music world. Our customers are SoundCloud, Beatport and all other streaming platforms and stores, USBs destined to be played on CDJs at clubs and festivals, and YouTube. Nearly all of these customers want you to hand them 24-bit 44100 WAV files. Sure, they might accept other formats, but then they themselves SRC what you hand them to either lossy formats that sound way worse than any 44100 vs 48000/96000 debate, or they convert what you hand them to 44100. To me, at the end of the day, it just makes more sense for us to stick with 24-bit 44100 end-to-end in our DAWs and primary master outputs. If you need to hand someone a 48000 master for sync licensing, just do a one-time SRC on the master WAV itself.

Thanks for hanging with me until the very end! I know this is chewy stuff. (phew!) Next month I’ll talk about clipping and clipper plugins, and why you sometimes want to reach for a clipper instead of a compressor or limiter. Stay tuned!