I’m a musician who has played loud music live for over fifty years and I hear the a and b examples 24/7. Thanks for identifying the frequency of my tinnitus!
Wave scientist, here (not audio). I agree this is an excellent demonstration of aliasing. However, I think this video seems like an argument primarily for *mastering* above 44.1 kHz, particularly if you're generating a lot of synthetic sounds, rather than recording or playing back audio above 44.1 kHz. I wouldn't expect human voices or musical instruments to produce a lot of power in frequencies above the human range of hearing, so you're probably not going to get a lot of audible aliasing if you record audio at 44.1 kHz. And then if that aliasing hasn't been baked into your digital audio file to begin with, then you won't be hearing it. An exception I could imagine would be if you're recording in a noisy environment where the "noise" isn't Gaussian--in which case, perhaps you could get some beat-like pattern of "noise" in your audible range. Edit: the other caveat would be that if you have high-fidelity audio and you're playing at back at a lower sampling rate, it's anyone's guess how that downsampling/resampling algorithm is working. It might introduce it's own wonkiness. But then if you're trying to drive speakers at a frequency beyond which they've been tested, you might get non-linear weirdness, too.
I'm really focusing on the capture side of things. My interest is really in the "making" side of things. Some cymbals can shimmer up in the high range of human hearing - I've heard tell of some loss of the "brightness" of cymbals because recording at 44.1kHz and compounding of a bunch of low pass filters causes that upper range to just lose power... but that's theoretical to me. But I really never answer the question in the title "What is the Optimal Sampling Rate"... haha. I think my purpose was really to understand what the argument is, not necessarily advocate for it. That's my impression when I was trying to work through Lavry's paper.
@@FilmmakerIQ this is an interesting discussion and has me sitting here on a Saturday afternoon tinkering with MATLAB and Audacity.... :-) So what I just tried was I created a 7 kHz square wave sampled at 44.1 kHz. It looks like a typical square wave. Sounds like your video Then I generated a 7 kHz square wave in a 192 kHz track, and applied a bunch of aggressive 22.05 kHz lowpass filters so it has no frequency content above 22.05 kHz. Then I made another 7 kHz square wave in a 192 kHz track, and just told Audacity to resample it to 44.1 kHz The two 7 kHz square waves generated in 192 kHz tracks, one lowpass filtered, and one resampled, sound like square waves, and sound almost.the same. The one 7 kHz square wave generated in the 44.1 kHz space has dozens of overtones and sounds totally different. I don't have an easy to way generate a square sweep, but this would be an interesting experiment for you to try on a square wave sweep and see what happens.
So here's the deal with 7 kHz square wave and any flavor of 48kHz (96kHz, 192kHz) - they will all sound the same! Remember in my video where I did the frequency analysis? There was the fundamental at 7kHz - that first harmonic at 21kHz and then aliases separated by 2kHz starting at 1kHz and going up. That works out because cycle of reflections caused by the 24khz Nyquist limit cycle around and around on odd numbers. Doubling the sample rate to 96 or quadrupling it to 192 doesn't change the cycle, it just changes where the cycle ends and picks up again! And since a square wave is suppose to be an infinite series, it doesn't matter where the cycle picks up.... Change the fundamental frequency of that square wave to 7001hz - and it won't cycle around like that and you'll hear the difference between 48 and 192khz.
@@FilmmakerIQ "aliases separated by 2kHz starting at 1kHz and going up. That works out because cycle of reflections caused by the 24khz Nyquist limit cycle around and around on odd numbers" - sorry, don't understand this: where does the 2kHz separation come from ? How do you get from 7x5, 7x7, 7x9, ... to all those frequencies ?
I have no problem letting the "Hz"/"KHz" misstep slide, as it's a common-enough slip-up. But sorry, I gotta call you out on repeatedly referring to a low-pass filter as a "limiter". Otherwise, this explanation - like all of your tutorials - strikes a comfortable balance between technical accuracy and accessibility to a broad audience. Well done! Btw, be forewarned: you just know somebody's going to link to this video in a Gearslutz post.
I used to work in sonar engineering in which we used digital signal processing. To avoid aliasing the first 'processing' step was an analog filter which would cut off frequencies that could cause trouble because of this aliasing.
A well designed delta modulation ADC does part of filtering for you. A part like the AD7768 uses a modesty high order integrator so you get more than one pole of anti-alias for free.
@@kensmith5694 I've never been able to fully wrap my head around delta-sigma converters. Like... I can sort of follow the math line by line, but I can't really develop and intuition for the "heart" of how they work.
@@kensmith5694 When I mentioned working in sonar engineering I was talking about 1979 and 1980. We had to construct our own processing unit by using a 5 MHz 32 bit 'high speed' multiplier as the heart of our system.
What you explained is just that, either have a good recorder (A/D converter) which can do a good job at filtering out > Nyquist frequency signal, or, record at higher sampling rate (for example, 96kHz is much more than enough), and then, down sample to 44.1kHz/48kHz when you do your post. In digital domain, you can do (very close to) exact calculation, and at the end, save a few bytes on the final product (without jeopardizing quality). However, to those crazy guys who insist to get so called high res files for PLAY BACK, they are just crazy, forget them!
I mean, even cheap equipment like 100$ DVD Players in 2007 had already 192kHz DACs avoiding any problems like this at all. But for the final media, more than 44,1kHz doesnt make much sense since anyways most released music is still in 44,1/16Bit. Even most(or all!) vinyl records are made from 44,1kHz samples. Tidal even dare to upsample/"remaster" 44,1kHz/16Bit originals to expand their "HiRes" collection... Since anyways every HiFi gear have HPF for anything above 20kHz, in combination with internal 96kHz+ processing, more like 384kHz nowadays, no. 44,1 is just fine... more is acceptable in cases of digitalized vinyl records or yeah - why not. 44,1/48 vs 96+ is like comparing 4K vs 8K... it doesnt make practical sense, probably a bit with the perfect circumstances... but hey its possible. Thats why my AVR has 9x(or 11x idk) 384kHz/32Bit (32Bit!!! wtf?!) DACs, by numbers even better than my Hi End Stereo Gear with "only" 192kHz/24Bit Wolfson DACs. Only in recording and mastering more than 44kHz are needed, and these are anyways at 96kHz+ since its possible. I dont get people when they complain about "only CD quality"/44,1kHz... damn! Thats at least completely uncompressed, not like the lossy! MQA garbage for example. In fact (and already proven...) CD quality is better and more accurate than MQA (which is another compression format like mp3 - but worse, and with high license fees haha). Some of my friends are completely addicted to HiRes and/or Tidal/MQA, only because they see any blue light or 96/192kHz on their receivers screen... despite having absolutely the same sound as a 44,1kHz CD with the same mastering. Damn, they use soundbars, garbage "HiFi" gear, BT headphones and they dare to complain about 44,1 kHz only! I also prefer HiRes source material, but mostly because of different masterings, less loudness, more audiophile/dynamic, easily for the "demanding" people mastered.
@@harrison00xXx I believe you are incorrect about vinyl masters. Mastering for vinyl is a separate master than the master for CD. Professional mastering engineers want to work with the highest quality mix which means NOT 44.1kHz/16bit. And most likely the vinyl press wants to make their master from the highest quality version available. At least for the major label artists. Independent artists, well ya know they get what they pay for and can’t be reasonably used to make statements about what’s used to make vinyl records.
@@arsenicjones9125 Ofc its differently mastered for vinyl, but still, the samples used to make the "negative" are for probably 99,9% of the (non quadrophonic) records 44,1kHz/16Bit, as CD quality, that was my point As if CD Quality is "bad"....cmon, thats the most accurate and "lossless" quality standard we ever got. Ofc there is now "HiRes" but thats more of a voodoo/too much...
@@harrison00xXx no I’m afraid you’re again incorrect. In major studio albums they regularly record at high sample rates then down sample to 48khz 24bit to edit & mix. Some major studios do all their editing and mixing work in 96khz/32bit floating. Then it will be down sampled again after mastering. Again we can dismiss what independents do because they don’t do anything in any standardized format. CD quality is not the most accurate, lossless standard available. 🤦♂️🤣 An original recording made in a 96khz/32bit wav file is more a more accurate representation of the analog signal. If there are more samples w greater bit depth it MUST be more accurate than a lower sample rate and bit depth. Just because you cannot discern a difference in every piece of music you hear doesn’t mean there is no difference or that there is no difference which affects the experience. Just to be clear I don’t think CD quality is bad just that it’s not without flaws either. Upsampling won’t increase fidelity in anyway but a higher sampled recording is higher fidelity
@@arsenicjones9125 So you have proof that the source material for making vinyls ins more than 44,1kHz? Sure they edit and master at higher bitrates, but the end result is mostly 44,1/16Bit sampled. This change probably with HiRes for the customers slowly, but its a known fact that 44,1kHz were used for vinyls FOR DECADES at least.
The problem with the 1st group mentioned (44.1 vs 48 etc) reminded me of "Complex Problems have simple, easy to understand, wrong answers." The same is true for Flat Earthers, young earth creationists etc. They have a very simple solution that seems to work because the [majority of the] people they are talking to don't understand the complexities. The problem Group 3, the Audio Engineers, have is the majority don't understand the solution as presented mathematically and say "that is just your opinion!" and no more important than their opinion.... You see a lot of this these days. It is Great to have videos like this one that go far enough to explain simply the problem for the majority without going of in to deep (group 3) Audio Engineer Geek speak of MSc maths.
4:50 a tiny bit of correction on this part. If you actually activate the "stats for nerds" option, you would see that TH-cam actually uses a much newer audio compression format called Opus, developed by the same xiph foundation that Monty himself works for. And what's interesting about this audio codec is that the developers have decided to restrict the sampling frequency to 48 kHz (44.1 kHz sources get upsampled upon conversion, hi-res sources get downsampled and 48 kHz sources are essentially no-op and passes through). The reason for this is exactly the same reason you mentioned a few seconds ago, the math is just easier that way. You will only get 44.1 kHz if for whatever reason, your device requests TH-cam to fall back on to the old AAC or Vorbis codecs for compatibility reasons which will almost never happen especially if you're watching from a web browser or using an Android phone. But considering that Opus is still a lossy format, it's still gonna cut off any frequency above 20 kHz anyways.
There's a lot that gets said about "TH-cam compression" and how it affects audio. Generally, the degree to which it affects the sound of any given audio demo is nearly moot. These days, few of us are hearing _anything_ that hasn't already passed through a perceptual audio encoder of some sort (MP3, AAC, Bluetooth audio codecs, Netflix / Hulu / YT, and so on...) and nearly all of those codecs are going to brick-wall filter the highest of the high frequencies to avoid wasting data bandwidth on stuff only our pets will hear anyway. The exception to this rule is the rare fabricated audio example like in this video, which uses a signal that is rarely something you'll encounter in a typical audio presentation of any sort. Yep. Those are affected by compression. Sure enough. But most of the time, when somebody is comparing a direct feed of a source audio file with one picked up through a lavalier microphone from sound being played through a 3" cube smart speaker, and then says "you won't get the full impact of this because of TH-cam audio compression", I just roll my eyes. haha I _think_ that 128kbps Ogg stream can adequately capture the sonic differences you were trying to convey, don't you worry about that.
don't underestimate the degree to which lossy compression might actually be doing a better job of preserving the signal than you think - eg, check out dan worrall "wtf is dither"; it's a long video and I don't remember exactly where in the video he does it, but somewhere in the middle he compares mp3 to 16bit wav in a situation where the mp3 *unequivocally destroys wav* in terms of which one represented the data better. wav was more lossy than mp3. That's because quantizing to 16bit integer naively actually introduces more noise than mp3 compression, if your signal is simple enough. it's all about what bitrate mp3 or ogg needs in order to near-losslessly compress a given section; and ogg vorbis is based on wavelets, not discrete cosine, which was why ogg vorbis can handle certain kinds of phasing sounds much better than mp3. so - yeah, as long as you're in a high enough quality mode that the bitrate compression is in the -100db range, you'll probably be able to hear whatever -70db effect they're trying to show. it's only when you turn down to 240p and your mp3 noise is -10db that we have a serious problem from audio compression. now, video on the other hand... :D
In my experience watching a movie/TV show on Netflix and watching it on Bluray is usually night/day difference. Its not so much that you obviously lose highs, you seem to lose dynamic range, it sounds flat and dull. Of course its not always enough to spoil the experience, but sometimes it definitely is. Same with the picture quality.
@@jhoughjr1 Music videos strangely are often the worst offenders, whereas some youtubers use music and it sounds fine. I'm very sensitive to lossy codecs too. Hated Bluetooth audio until LDAC and Samsung Scalable came along.
@@jhoughjr1 i dont know super exactly what youre taking about. but ive heard youtube uses aac codec. Imho for certain bassheavy generes, youtube is miserable. bass just doesent translate well on it. guitars are ok, but i still preffere mp3s of mine.. Apple music also uses aac i heard. But i found it bit better, dont know if its a specialized aac version they use. Other than that i seen a test video that compared waveforms to lok for normal compression (audioplugin compression) , and nothing was found.
Excellent and interesting video .. I would to like one term here, that is 'oversampling' .. When digitising an analog waveform it is quite normal to have a relatively tame analog filter, but run the sampling at a much higher frequency than the output requires, 8x or 16x oversampling is common. The next step is to have digital filter operating on this high frequency sampled signal and then downsample to the required frequency eg. 44.1kHz. The 10kHz squarewave has audible undertones because it was simply generated mathematically - there is no oversampling or anti-aliasing going on at all - if the signal was filtered properly before being recorded the 10kHz squarewave and 10kHz sinewave would, of course, sound exactly the same (since the next harmonic is not captured).
An exceptional video, sir, especially for going the extra mile and looking into TH-cam's own codec shenanigans with your own examples. I regret to say, I didn't hear much difference in the 7kHz files, but considering I'm getting older and adults lose top end in their hearing range over time, I'm not surprised anyways. (I can barely hear CRT yoke noise anymore, which I definitely could as a kid) Aside from pure monophonic sound, I think higher sampling rates have a dedicated purpose when doing any kind of stereo/surround real-time recording, or doing any audio processing involving pitch/duration manipulation. In the first case, human hearing becoming more sensitive to phase differences between ears as frequency increases, such differences in phase and arrival time contribute to our sense of a physical space the audio is occurring in. (Worth noting here that the Nyquist-Shannon sampling theorem assumes a linear and time-invariant process, where it doesn't matter how much or how little the signal is delayed from any arbitrary start point-human hearing, however, is definitely NOT a time-invariant process) When dealing with sampled audio, at higher frequencies, the number of discrete phases a wave can take drops off considerably: assuming a wave at exactly half the sampling frequency, you can have it however loud you want (within the limits of bit depth), but you can only have two phases of the signal (0° and 180°). One octave down, you only have 4 available phases (0, 90, 180, 270), and so on. This might contribute to the sense of "sterility" and "coldness" associated with older digital recordings that didn't take this into account. So if you're mixing audio that relies heavily on original recordings of live, reverberant spaces (drum kit distant-miked in a big room, on-set XY pair, etc.), it's an advantage to get the highest sample rate you can afford when recording/mixing, then downsample your audio for mastering/publishing, if needed. This way, you can preserve as much detail as possible, and give your audio the best shot at being considered realistic. In the second case, having extra audio samples helps when you want to pitch audio up/down or time compress/stretch. Since some of the algorithms for doing these techniques involves deletion of arbitrary samples or otherwise bring normally inaudible frequencies into human hearing range, having that extra information can also be a benefit for cleaner processing, depending on your artistic intent.
That's not entirely true, actually. The Xiph video mentioned in the content here covers the waveform phase topic as well. The reconstruction filter post-DAC is basically turning discrete samples into a Bezier curve. Just sliding the sampled points around on the X/Y axis (if X is sample, and Y is word value -- i.e., the amplitude of an individual sample) will alter the resulting wave's phase. Another way to think of this is to imagine using a strobe light to capture an object moving in a circle. If the speed of the object rotating about the circumference was perfectly aligned with the flashing frequency such that there are exactly two flashes per revolution, it would look like the object is appearing in one spot, then another spot 180 degrees from the first, and repeating indefinitely. This is basically the Nyquist frequency. From that, you could construct a perfect circle because you have the diameter. So now, imagine altering the "phase" of that object so that the strobed captures place those objects at different places around that circumference. You can still construct a perfect circle. Same with audio samples. It doesn't matter if the phase changes. As the Xiph video says (I'm paraphrasing because it has been a while since I watched it), there is one and only one solution to the waveform created by a series of samples, _provided that the input waveform and output waveform have both been band-limited to below the Nyquist frequency._
@@nickwallette6201 Well, yes, for any arbitrary signal, you can still reconstruct it with sampling, but I was mostly thinking psychoacoustically, where delay and phase variations between ears plays such a big deal in stereo sound. And one of the side effects of sampling is that you get phase constraints, like I described above. For example, with a signal at half the Nyquist frequency, how do you distinguish between a full-amplitude sine wave and a cosine of -3dB intensity, when they both share the exact same sample representation (alternating between .707 and -.707)? Since that phase information can spell the difference between a centered (in-phase) or diffused (out-of-phase) stereo sound space, preserving phase and delay information is super important, and with finite sample intervals, there's only so many phase states you can have at high frequencies. I also acknowledge, however, that bandlimiting filters induce their own phase delays as well, which can have a significant effect on the perceived audio-hence one of the other advantages of higher sample rate is to relax the requirements of bandlimiting and reconstruction filters to minimize their coloration of the audio.
@@eddievhfan1984 With two samples per cycle, you can reconstruct a waveform with any phase you want. You could indeed have phase and anti-phase waveforms at 20kHz with a 44kHz sample rate. Try it. Use an audio editor to create 20kHz sine, then invert the phase. Zoom in to the sample level and look at the waveform it draws. This is a representation of what the reconstruction filter does. I think it would be an academic exercise though, as 1) who's going to be able to determine relative phase between channels at the theoretical threshold of human hearing?, and 2) that's going to be in the knee of the low-pass filter curve, where any passive components on the output are going to affect the signal. It would not be unlikely to have a mismatch between L and R channels. High-end stuff might try to match capacitors to 1% or so, but there's plenty of gear out there (even respectable gear) that uses electrolytics rated at +/-20%. There's a lot of concern over perfection that is not at all practically relevant.
So here’s the thing, TH-cam does support 48 kHz audio, and it does support higher frequencies than 16 kHz... sometimes. Every time you upload a video to TH-cam, the encoder creates about 6 different versions of the audio with different codecs, sample rates, bitrates, etc. On playback, it will automatically choose the audio based on your network, decoding capabilities, etc. Just because the video was ruined after you checked the download, that doesn’t mean it would have been ruined for all listeners. Really it’s TH-cam’s technical inconsistency you have to worry about (I think that might also be true for your video about cutting the video 1 frame early) TLDR; Your description of TH-cam’s capabilities wasn’t strictly true, but you were still right to cater to the worst case scenario. Very interesting video!
19:07 "I just want to cover some interesting notes" Clever ... John, Thanks for sending me down the rabbit hole. It took me 5 days to finish your video. Your instruction is always good, because of the practical examples you provide. Your videos inspire conversations outside of TH-cam and outside of film making. Thanks for that too. edit: sorry wrong time stamp, could not find original ...
You switched it up between A and B lmao. Interestingly, the frequency of the harmonic you used is really close to NTSC horizontal refresh rate (15734Hz), which a CRT’s flyback makes audible as it deflects the electron gun left to right and back. I’m 41 and so far I’ve always been able to hear 15kHz flyback
39 and oh gods do I NOT miss working on TVs and that wretched noise. I can only imagine how horrific that nose must be to cats and dogs. We practically used to torture our pets with those damnable things.
I remember as a kid freaking want to smash all school tv’s what a trash they let us watch in the first place and then the fucking beep, will hear even now I think, I did run out the classroom sometimes and told the teacher to blast herself with this earpiercing beep! She was like: what beep!? Bitch.. the older the crt, the more chance you may use it to deflect vermin out of your garden..
I think it is interesting how many people rag on CD quality, CD sound pretty good and I think most people have a colored memory of it. It is the same thing that Techmoan talks about in his video about cassettes most people where not listening on quality equipment and I know for my generation we most used CDs that we burned which had mp3s which are lower quality then CD audio. Spotify only recently got "CD quality" audio but people don't complain about there quality.
My earliest memories from the early 90's regarding CDs is that, a) they sounded really, really good, and b) my mom will get REALLY mad if we play with her discs (they were expensive)! My dad had a Panasonic component stereo setup, nothing high-end or audiophile grade but it was half-decent at least. He had some Type-II cassettes too which sounded really good on that player. By the mid to late 90's CDs were starting to replace cassettes as the on-the-go medium for portable players, boomboxes, and car audio, which tended to sound bad to start with, but no matter how good your system is all of these are frankly crappy listening environments. Whereas vinyl was never a portable medium so even now if you had a vinyl player you'd probably have it in a dedicated listening room at the very least.
@@peteblazar5515 the components of a square wave are the sum of infinite _odd_ harmonics. So the first harmonic is 3x the fundamental frequency, the next is 5x, and then 7x, etc.
I'm a professor of Electrical and Computer Engineering at Georgia Tech, and have taught courses in signal processing for 20 years. Besides an excellent tutorial by Dan Worrall, this is the only video on the topic I've seen on TH-cam that doesn't make me cringe. In fact, your video is superb. :)
Love the deliberate error 🥳 also thought my hearing was failing with the sine sweep until you pointed out TH-cam hard cuts at 16khz. I'm one of those weirdos in their 40s who can still hear when shopping malls have a mosquito device... Or could during the before times at least .. haven't been to a mall in 2 years
@@MyRackley Hmm, sadly i know mine doesn't at 65, but then i've played in too many bands with overloud guitarists, and in one case, a drummer who overhit his cymbals all the time, where we rehearsed in a small room. Still have a low level of tinnitus in my right ear, but luckily it's not really noticeable unless things are really quiet, and i guess i've become quite good (or at least my brain has!) at filtering it out of consciousness!
My electrical communications systems prof literally just covered the sampling theorem in class today, and by chance I saw this on my recommended. This video is an EXCELLENT demonstration of aliasing. Thanks so much for making this. BTW: I can totally hear the difference between A and B on YT, but I can't tell the difference on the 7kHz one. But that could be my Bluetooth headphones. I'll edit this comment when I get home and try my corded headphones/speakers.
So, after you showed the example at 4:40, my first thought was, "well, what if you instead choose a frequency that exactly divides the sampling rate?". So I opened up audacity, made sure both my audio device and the project were set to 48KHz, and tried generating a 12KHz tone - in that case, a square wave sounds just like a sine, but slightly louder. It's easy to make sense of it if you think about it in terms of generated samples - you just get two high ones followed by two low ones, and that pattern repeats *exactly* at a rate of 12KHz. If you choose a frequency that doesn't cleanly divide your sampling rate, you have to resort to an approximation - some runs of high/low samples will be longer, some shorter, so that over a longer period, they average out to the frequency that you're trying to achieve. But in that case, you're essentially creating a longer pattern of samples that takes more time before it repeats, which creates a bunch of other spurious (aliased) frequencies in your signal. I think the real takeaway here is that mathematically ideal square waves are awkward and don't work out that great in reality. Sines are way nicer.
You choose a special case which is square wave with the frequency of the sample rate divided by four! There's two ways to think about that. Either the mathematical sum as you described or as a visual graph. Only one sinusoidal wave can fit the given samples... Instead of the sample defining the top of the square wave, it defines each side of the crest and trough of a sine wave with greater amplitude!
Another great video. One thing about your sine/square test, you can simulate what would happen in a real-world situation by generating your waves at a sample rate like 3,072KHz (64x48K) and convert to 48KHz to listen to it. That's because all modern ADCs sample at at least 64fs, often 128 or 256fs, filter out everything above 20KHz, then down-sample to your capture rate. Another experiment I ran a few years ago was record a series of sweep tones to my blackface ADAT, which allows the sample rate to be continuously varied from about 40KHz to 53KHz. At 53KHz, aliasing is *almost* eliminated where it's quite audible at 40KHz. Yes, those converters are out of date, but it's still a valuable learning tool. That said, I'm a huge proponent of 96KHz in digital mixers, where the ADCs are working low-latency mode. At 48Khz, an unacceptable amount of aliasing is allowed to keep latency through the mixer below, say 1ms (not a problem in analogue mixers). At 96Khz, the converters can run in low-latency mode and have no audible aliasing. When I'm working in the box on material that was captured by dedicated recording devices (latency is not an issue), 48KHz is fine.
As a mixing engineer for over a decade I'm glad to see you got this right. I'm also glad that at over 50 years old I can still hear the difference between waves A and B. And for the vast majority of people listening to audio on crappy playback systems it doesn't matter one bit.
Double blind tests of Redbook 16 bit 44.1kHz digital audio vs. high res 24bit, 96kHz digital audio, played for average listeners, audiophiles, and high res audio 'experts'....all couldn't accurately pick out the the high res files. The average listeners had a 50/50 probability, while the rest of the audiophile/experts scored even lower! As an EE, and music lover, I've always stressed the importance of the master recording being the great deciding factor on the quality. Quality in, quality out. No amount of oversampling, upscaling, or bit rate will improve a crappy initial master source.
The Fourier Transform tells you how loud each sine wave in your signal is - a spectrogram, if you plot it. It also can tell you the phase, so all 3 parameters - frequency, amplitude, and phase - of a sine wave are covered. The Inverse Fourier Transform puts all those sine waves back together. In computers we use Discrete Fourier Transforms, and usually a "fast" implementation known as an FFT for "Fast Fourier Transform." (Which BTW is one of the top 3-5 hacks in all of computer science.)
Project and storage samplerate at 48k with each processing stage using oversampling has been proven to be optimal. You have to increase project sample rate to 384kHz to get the same. The trick is in the oversampling, allowing for wider bandwidth while processing to reduce artifacts and then filtering the unnecessary frequencies out keeps it cleaner. 48k is not enough for some signal processing, while it is plenty for other. A gain change can be done in 48k but compressing, anything that modifies the phase or time domain in anyway has to be oversampled to decrease overall antialiasing. The strangest thing is that despite having additional filtering stage at each processing block (for ex, each plugin in a project) and converting back and forth, it is less CPU intensive. Higher samplerates by far most of the time run "empty" signal, the entire bandwidth is processed at each stage while oversampling is not needed for linear operations. This is not very known thing, which is a bit odd in my opinion. You can test this at any point, device antialiasing stress tests and compare 192k project rate to same processing done in 48k base and oversampling. The latter has less artifacts.
When you first brought up harmonics and square waves, I thought about posting a correction cause it sounded like you were about to make a big mistake by ignoring band limiting filtering, but I watched the rest of the video…and you handled it all. Well done, including your edit post TH-cam processing. Yes, I did hear a tiny difference between your 5.2khz sine wave and the 5.2/15.6khz additive construction square wave synthesis. I do have exceptionally good high frequency hearing for a 55yr old, however, it’s also important to note that music is never a pure sine wave, nor a square wave, so you would never hear even the tiny (barely noticeable even to excellent hearing and only because it was a pure note of extended duration) differences I heard in an actual piece of music. The important part, as other have pointed out as that your waveform must have and appropriate low pass filter applied. That could be a 20khz analog filter, with sampling at 48khz or higher, or 20-24khz filter before 57.6khz, or 20-25khz before 60’hz, or a 20-35khz analog filter and sampling at 88.2 or higher sampling. And it’s always good to lower the noise floor by recording at 20 or 24 bit depth. Do all your editing and mixing at something above 48khz and above 20bot depth, then master for 44.1/48 16/18/20 bit sure, you can master for 24bit depth, but no one will actually be able to tell the difference.
Hey John, I’ve been doing digital signal processing since 1980, 41 years, including spatial digital signals. Nyquist can be grasped with knowing one concept: that sampling at the Nyquist frequency there is no phase information. Phase information is restored as the sample rate is increased above Nyquist. To differentiate a square wave from a sine wave, both still have to be faithfully reproduced, including the phase information. At 10 KHz, a 44.1kHz sample rate only produces 4 samples her sine wave, partially preserving the phase of the signal. Since a square wave is made up of more than one frequency, the phase information becomes important, as it affects the sound not just the amplitude of the sound. 44.1 kHz works because most of what we listen to is under 8kHz. If you want to preserve phase up to 15kHz, really should sample above 60kHz. Now, if you are listening to stereo, you really want to preserve more phase information, so makes even more sense to go 60kHz or higher. Even though to me 44.1 kHz seems fine enough for me. I always wanted to make a spatial audio standard that recorded phase information as well as sampling information, a transformation rather than sledgehammer sampling. This has been done commercially outside the audio industry for over 35 years.
You are totally ignoring the sound reproduction equipment's role in this. Sure at 10 KHz, a 44.1 kHz sample only produces 4 samples. So? The signal recreated by the DAC sent to the vibrating membrane or paper cone of your headphones or speakers while reconstructing this 4 sample pulse of 1 second, it's plenty. 60 kHz may be useful during mastering of the original, but at the consumer level, we don't benefit from it with proper noise cancelling and anti-aliasing applied.
Another thing to consider is that at exactly the Nyquist limit, the signal contains no information whatsoever on the phase of the signal, so if you had a 90 degree phase shift between the left and right channel (or multiple channels in a multi track recording), that information would not register correctly in the audio samples. This may not be so important when listening to the audio as our hearing is not so sensitive to the phase of such short wavelengths, but if you start to do addition of the channels or other signal processing where the different channels interact, the same signals oversampled vs sampled at the Nyquist limit can produce a different sounding result, even after the result has been downsampled back to the Nyquist limit.
Nyquist will accurately reproduce the sound, if you THEN add extra modifications on that then that in no way implies anything about nyquist not being 100% correct.
@@TurboBaldur Yes, of course, but that in no way has any effect on what us humans can actually hear, and there the 44kHz 16 bit is enough. if the mastering of the audio is done poorly that is not the fault of the medium not does it make Nyquist any less correct.
@@ABaumstumpf exactly, if the sampling is being done for playback to a human only then 44.1k is fine. But if you plan to edit the audio it makes sense to get more samples, even if the final export is to 44.1k
This is a great point, and I believe it may be why many digital recordings made in the early 90s sound "flat" compared to late-generation analog recordings. Too many engineers just relied blindly on the digital technology without thinking of consequences like this. Nowadays of course studios work with much higher bitrates and bit-depths for processing and mastering before producing the 44.1kHz or 48kHz files for release.
Hey John, this is great seeing you do some new technical and concise teaching videos. Your work is so helpful for anyone digging in a bit in the subjects you tackle, so thank you for that!
Aliasing is pretty much a non issue when going though a modern codec. The generated square wave example was not filtered, as it would be on any DAC. If you recorded that wave and then displayed it, it would sound the same but not look square anymore, but look like 2 sines mixed together. Codecs sample at a much higher rate (>1Mhz) with fewer bits of resolution then down sample using a CIC filter and multiple half band filters. Through the magic of poly phase filtering, an 18th order elliptical halfband filter is only 4 multiplies to drop the rate by 2 with a very steep cutoff. You chain multiple half bands together, maybe a 3 or 5 phase if needed, to drop down to 44.1 or 48K rate. Its pretty easy to knock out any audible aliasing with a chain of tuned 18th order filters.
A truly great video about this complex subject with an appropriate amount of humor concerning the state of the commenting on TH-cam in these times. Thank you for your efforts, they are well appreciated.
Great video, thanks! FWIW (and that’s not much) at 5:00 you say that TH-cam samples everything to 44.1. But actually, TH-cam uses the opus codec for the audio channels of videos, and that format is locked to 48. I think a few older vids might also have ogg or m4a which may be in 44.1, but “most” are sent in 48. It’s certainly not substantive for the point you’re making, more just trivia. Thanks!
Aha. Interesting. Using youtube-dl, here are the streams available for your video (limited to audio): 249 webm audio only tiny 52k , webm_dash container, opus @ 52k (48000Hz), 7.13MiB 250 webm audio only tiny 61k , webm_dash container, opus @ 61k (48000Hz), 8.37MiB 251 webm audio only tiny 108k , webm_dash container, opus @108k (48000Hz), 14.81MiB 140 m4a audio only tiny 129k , m4a_dash container, mp4a.40.2@129k (44100Hz), 17.71MiB I'm on a mac here (but not an iOS device); in Firefox, the youtube web app uses stream #251 (as visible in the "stats for nerds" right-click; in Safari it uses #140, so you are indeed correct! Again, thanks for the excellent video.
This is so cool. As a former TV audio mixer, this just rocs. And, by the way, the square wave sweep reminded me of some unknown 60s era Saul Bass movie credit animation.
Is anyone else not able to hear the 10kHz sine wave at all, and the 7kHz sine wave only barely? I really hope it's something in my hardware configuration, rathen than me having lost that much hearing. 😢 (FWIW, I'm on a Framework laptop on Ubuntu GNU/Linux... could probably go into more details on what audio system, but don't know off-hand.) Edit: P.S. In the sweep, the audio cuts out for me at about 7:46, so whatever frequency that is.
Interesting! I never thought about *not* having a low-pass filter (to cut out higher frequencies) in front of an AD converter - because it would sound really, really ugly! (There are some tricks to get away with weak analog filters, but they involve oversampling and digital filtering, aka signal processing.) As an engineer it was always clear that you would need this high-cut filter. And on your 5.2 kHz demonstration - I can only hear the switching itself. There's a discontinuity in many switching events, but when the switching was continuous (on crossing the zero-line I'd guess) I couldn't hear it at all. Yes, my hearing is already that bad (but nearing 60 this is quite normal). Where it does make sense to use higher sampling rates (and 24 bit) is in audio processing, because higher "resolution" (in amplitude and time) makes it easier to manipulate signals. Same as in image processing: It makes perfect sense to use 16 bit per channel (or even 32 bit float) images in high resolution when doing advanced image editing, but the end result could be distributed in much lower resolution with just 8 bpc (this is common practice); yes, there's still a chance that you run into issues with color management, but there are ways to deal with that on the "output" side.
I'm 47, suffer from tinnitus and use $25 wireless Logitech headphones but even I could hear the difference between the two 5.2kHz samples. The aliased one sounds 'dirty' to me. Not sure what this proves though.
Yup, for audio processing it always makes sense to use float. You get * A higher dynamic range (145 dB vs 96 dB), which gives you more headroom before clipping * Simpler (and possibly faster on anything newer than a Pentium II) code when working with normalized range -1 to 1 For image editing, it depends on your purpose, but VFX requires the higher dynamic range of 16 or 32 bits per channel. Editing for a website or printer may work with less headroom.
It is a similar story with image resolution where people claim that a 4K TV is way better than their old 1080p TV - but the difference was not really due to resolution but size. You need a rather large screen at a close distance for any visual difference between 1080p and 4K, and now with 8K.... you need like 60" monitor at 1m distance for there to be any visual difference. 44 kHz 16 bit is enough for humans - for us that can be called "perfect". There has not been a single human that has ever shown to be able to accurately hear anything above 21kHz. For the bit-depth - kinda debatable as without noise-shaping, dithering or anything like that this is "only" ~96 dB SNR - so from the faintest sound perceivable (you'd need to be literally dead to not have the sound of blood flowing through your veins) up to soundlevels that cause permanent hearing-damage with just half an hour of exposure per day. You could literally have an audio-track with the drop of a needle and being on a busy road - and both things would be fully captured. Doing ANYTHING but listening to the audio is a different beast. Just image taking a photo with a resolution just high enough that it looks perfect to you (doesn't even matter what actual size/resolution) - ok. Now take the same image and stretch it to say 5 times the size - oh, it suddenly is no longer perfect. When you want to manipulate any data, be it image, sound, or anything else - you end up introducing distortions and losing some precision, so you'd better make sure that the initial data you got is way more than you actually want to deliver at the end, and do all your manipulations with as much USEFUL data as possible. With audio that often means capturing >20bits of depth at 96 kHz - which allows you to squeeze and stretch the sound a lot before any unwanted distortions become audible. Useful as in like this video is showing the problem of aliasing.You do NOT want that in your data so you better just use >96kHz during manipulation and then filter all the high-frequency stuff out before it ends up getting folded into the audible range. Cause once it is there you are not getting ridd of that anymore.
I do know that MP3 compression cuts out at 16khz because of the way the standard was designed. Also, I think some devices start to roll off frequencies in the last octave or so, so even if you have speakers and ears that can reproduce and perceive those frequencies, your hardware may be reducing their amplitude.
Not all MP3 encoders cut at 16KHz though . LAME Encoder does not beyond a certain bitrate. And anyway, TH-cam does not use MP3 compression. It uses either AAC or Opus.
48kHz is enough for playback, usually the steepness of the LP filter to bandwith is 45 to 55% of the sample output, it gives almost non-existent phase errors and ripple within maximum audible range. 96kHz can provide benefits when it comes to pitch shifting of high frequency information and lower latency. Softer filters can also be used with 96kHz and possibly less ringing and phaseshifts, but it is rare for different applied filter to be used between different sampling frequencies, in addition, higher sampling frequencies often results in an extra component-instability. Basically all DACs and ADCs use delta sigma modulation with multiple bits (often 2-6bits). This involves a sampling frequency of several MHz, but they utilize another more effective type of modulation for the purpose, this modulation arise from a sawtooth that follow the analogue tone frequency which provides a pulse density/width that is digitized with 1-bit for a bitstream, partly and continuously analog for a certain period and compared to the analog input signal with differential circuits which results in different high frequency pulses designed to add or remove the energy in certain frequency bands, the distorsion energy is in this way increased in higher frequency bands and decreased in lower frequency bands, which is continued until the noise is satisfactorily reduced within the desired frequency band, this is done in several steps by several circuits, divided by amplitude for more effective noise shaping while maintaining stability, after this the process of demodulation and decimation takes part from several 1bit PDM bitstreams divided by amplitude to one 24bit PCM, with applied digital filters and downsampling.
Nice video. There's an interesting tidbit that I have noticed with the whole 44.1 v 48.8 - you need to be consistent even though it doesn't matter. If you playback a 44.1 file inside a 48.8 project (or vice versa) you get pitch drift phenomena. This is why consistency is key even though sampling rate doesn't matter. The real key is "mastering for your platform" as it were. Understanding the playback limitations of youtube and making sure you sort your audio for playback. Tis redundant to do all your audio at 88/-8dB if youtube is going to down sample to 44.1./ -15dB.
What I do and it works flawlessly is use 48 k 16 bit with a limiter ceiling of -0.1 dB, with a compression of 2-3dB and I do not get any compression on my tracks and its actually below the loudness maximum. It seems like it actually counteracts the effects from yt
I’ve been recording and mixing for years, and the only time sample rate matters is on the recording. 24-bit audio @ 192KHz is indistinguishable from analog tape, and if you can record your audio at that sample rate, that will give you the option to master it for any format you want, with the least amount of degradation to the sound. For folks that understand how film and video work, it’s similar to folks that are shooting video in 4k if they plan on making 1080p content or in 8k if they’re planning to release something in 4k, because even though they never plan to release anything at that higher resolution, it gives them more options for cropping the footage and doing other stuff that you wouldn’t be able to if you shot video at the intended output resolution of the finished product. Applying high, low or bandpass filtering to audio is essentially the same as cropping an image, and the more detail you have to crop, the better it’s going to look or sound. Just think about an image, if it’s the size of the file you want the final output to be, and you decide to trim off the edges to reframe the photo, and then if you increase the image size so it matches the output resolution you started with, then you’re gonna be looking at something larger and far less detailed and blurry than you would have if the image had started out at a much higher resolution. I will be the first to admit my recordings are all at 44.1KHz or 48Khz, but that’s because I couldn’t afford the hardware (or it didn’t exist when I made the recordings) so the end results that I got with those mixes never sounded as clear or crisp as the stuff you hear that’s been stamped with the official “Mastered for iTunes” label. Another interesting topic I think that builds on this lesson would be to discuss the process of dithering when mastering audio. Some folks might be surprised to find out that the best sounding digital masters deliberately introduce white noise into the file as part of the mastering process, especially when downsampling from something like 192KHz audio to 44.1KHz.
@@FilmmakerIQ analog tape has way more dynamic range and headroom than 16-bit audio at 44.1KHz. That's why everyone was still recording to analog tape, well after the CD, DAT and other forms of digital audio were invented. Believe me, they didn't do it because it was easier or saved money. Maintaining an analog recording studio with massive tape reels was an expensive and fiddly endeavor, so anyone running a studio back in the day would've jumped on the latest technology if it would've simplified that process. It wasn't until everyone eventually converted to digital recordings in the 2000's, when sample rates and quality of studio gear were high enough to record 24-bit audio, at sample rates well above 44.1KHz. You don't have to like my analogy, because it's not exactly perfect, but people know more about editing photos and videos these days than they do about audio--and they just need something they can wrap their heads around, to know why people choose to record at higher sample rates than what we hear as the finished product. However, my explanation and analogy are not wrong--let alone completely wrong. I not only studied digital audio in college, I also worked in radio, and even helped teach a class on digital audio production. The professor wasn't the most skilled at recording and editing, because he came up in the analog era, and just used the computer like a tape deck and did everything old school. So, I helped him teach students one-on-one, how to actually use a DAW in one of the studios, so that they could record their assignments. I still record, mix and produce music for myself and others in my spare time, so I might not be a TH-camr but I know what I'm talking about, and I'm not sure you know what I'm talking about, because if you did, you wouldn't call me "wrong" and use that as the catalyst for making a video to correct me. I have no idea what your credentials are or experience in this field is, but I got the impression that you're someone who has some technical understanding, and just learned all of this shit in the process of making your video, and you really don't have more than a decade of actual knowledge. It's funny, because this video was actually lacking some pretty basic information about the topic. You didn't even explain why someone would want to record anything at 44.1KHz, when there are much higher sample rates. You brought up using 48KHz as the sample rate, but didn't explain where that comes from. I think your viewers are even more ignorant than you on the subject, and might not know that CDs happen to use 16-bit @ 44.1KHz, and that DVD audio uses 48KHz. For anyone else reading this that actually cares to learn something, CD's compromised on the sound quality, because they couldn't make players that played back compressed audio without making them super expensive, and that was the highest quality sound they could use and still fit an entire symphony onto a single disc. (Audiophiles are historically fans of classical music, and when you're launching a new music format that's only going to be be affordable to the wealthy and/or those with "discerning taste", you kinda want to make sure you can cater to them a bit. It was a huge selling point for anyone sick of flipping albums to hear the second half of the performance, and I'm sure that without the support of those snooty weridos, CDs might never have taken off.) DVD's used 48KHz because it was the base sample rate used by DAT, which was one of the original digital recording formats, and because it was what people were using in studios, it got adopted by Mpeg2, DVD and digital broadcast formats. It only sounds slightly better, and it's almost imperceptible if someone uses proper dithering when creating the final audio file. It was simply a matter of compatibility with existing pro-audio equipment, which also supported higher sample rates like 96KHz. Good studios would record at the higher sample rate, and then downsample their work for the finished product. DVD-A used 24-bit audio @ 48KHz, because they were purely an audio experience, so they could use up more of the space on the disc for higher quality sound. Newer formats like BD (and the new dead HD) DVD used 96KHz, again, because of the larger amount of space available. Which is still really good sounding, but it's still only half the sample rate of the highest quality digital recordings, which is 24-bit @ 192KHz. There may eventually come a time when there's equipment that can capture audio at a higher sample rate, but even the obnoxious audiophile community that would typically support anything that's higher quality, just for the sake of it being measurably better (even if it wasn't perceptibly better) hasn't been pushing for anything higher. Turns out, even they can't tell the difference between 24-bit audio @ 192KHz, when compared to a super clean analog recording, from a well maintained deck with Dolby noise reduction. If you don't overdrive the tape, or have it distort in the upper frequencies, and you play it back on equipment that doesn't have any ground hum, it sounds fucking amazing--and so does 24-bit audio @ 192KHz, which I guarantee you've never heard in your life. Unless you're in a legit recording studio with high end gear to hear the difference, you can't tell. You can absolutely hear the difference between analog tape and the much lower quality audio used by CDs, because the dynamic range is reduced to 96 dB (which is a non-trivial 48 dB less than 24-bit audio) and more importantly, it's less than the 110 dB range of analog tape when recorded using a Dolby SR noise reduction system. 32-bit audio hasn't really taken off, because 24-bit audio already overkill with a wide dynamic range of 144 dB, which is already higher than the theoretical dynamic range of human hearing, which taps out at 140 dB--so 192 dB is just needlessly wasting storage space. That said, 16-bit audio with proper noise shaped dithering can have a perceived dynamic range of 120 dB, but again pure analog tape also has an effectively infinite sample rate, so that combined with the actually greater dynamic makes it sounds better than CD audio. Honestly, I'm not even sure what the point of your video even was, because TH-cam isn't the platform capable of even showing the subtle differences between audio using sample rates of 44.1KHz and 48KHz, especially when TH-cam already filters out everything over 15KHz. You may not be able to hear sounds over 15KHz, but I still can, and at this point if your hearing is already damaged enough to the point you can't even hear a sine wave between 15-20KHz, then you're clearly not the guy who should even care, because those sounds are for you, and I would agree that you shouldn't invest in anything better than CD audio, because it's completely lost on you. For those of us that actually understand digital audio, and have fully functional ears that can hear everything from 20Hz to 20KHz, there's plenty of reasons to record or listen to music that's using a higher sample rate and bit depth than CD audio. Of course, that's just a simplified explanation of some of the vast amounts of information your video was lacking, because I didn't even discuss the bit rate of digital audio (mostly because we were discussing uncompressed digital audio, and it's only when compressing audio files that bit rate becomes an issue, because that's where the sound quality gets drastically reduced.) But hey, you're just a guy who doesn't really have a background in this stuff, so I don't expect you to talk shop on the fine points of all this. Those of us who work with this stuff for real actually need to know about how our recording medium actually works, and we have to know how audio works, so that when we're mixing it for your consumption, that it sounds right--so we don't expect laypeople to know how the Fletcher-Munson curve affects our hearing during the process of recording and mixing, or on playback over a sound system of any kind. So, while they title of your video isn't wrong--the work you showed to get to the right answer is, because nobody in the history of the music and recording industry, or tangentially film and television, ever said 44.1KHz was optimal. The reason it's not optimal, is because the low pass filter is still attenuating frequencies within the audible range. So when Harry Nyquist figured all this shit out, he was merely pointing out the bare minimum that audio had to be sampled at to reproduce the full range of human hearing. He wasn't wrong, it's just that there's no perfect low pass filter that exists, capable of attenuating frequencies outside the range of human hearing, without attenuating audible signals. So, even with the best possible filter, you're still going to cut things off well above what we can hear, just to make sure nothing gets cut. In the real world, I typically don't allow my mixes to contain very much above 15KHz, because as you've noted, it's not supported by TH-cam, and most people won't hear that stuff anyway. However, I do allow reverb to contain as much high end content or "air" as we call it in the business, because those are the subtle things your ears will detect and miss if it's unnaturally chopped. It's like bad lighting in a poorly edited photo, or CGI--you have to be an expert to know what you're looking for to see it, but we instinctively know when those subtleties are lost and it will seem wrong or fake. Anyway, good luck with your channel. Hopefully you spend some time learning and doing some research before you go off and make something that's going to confuse or misinform your viewers.
I'm not reading this novel especially when you start with a completely false statement that tape has more dynamic range... There's no point when you're so off base from the start.
@@FilmmakerIQ Maybe if you read what I wrote, you'd actually learn something smart guy. Feel free to look it up. Analog tape recorded with Dolby SR noise reduction, which was the standard in professional studios, had a dynamic range of 110dB, while 16-bit digital audio has a dynamic range of 96dB. I'm not talking about cassette tapes here bud, I'm talking about 1/2 tape used in professional studios to make multi-track recordings. So, please just STOP with your nonsense, because you don't know what the hell you're even talking about. You looked some things up on Wikipedia, and think that you're a professional because you make TH-cam videos. How many professional studios have you been in that actually had 1/2 tape machines? I guarantee you've never even seen a 1/2 inch tape in your life, let alone heard one played back over the studio monitors in a real studio. Clearly, you seem to fancy yourself a "Filmmaker" and not a recording engineer, or producer--so why don't you go make your silly little videos about lenses, or light meters, because you don't know shit about digital audio or recording.
4:51 TH-cam actually converts to 48kHz. The reason is that the developers of Opus audio codec decide to support 48kHz but not 44.1kHz. (They have an FAQ for this.) But if you watch TH-cam on an Apple device, TH-cam will deliver an MP4 format with AAC audio codec, that will be either 44.1kHz or 48kHz.
Well when I download the video from my own TH-cam Studio - it's 44.1 - so I think most everything is delivered at that sample rate and it conforms with everything I've read so far.
@@FilmmakerIQ That's probably because you are downloading it as MP4 format (H.264+AAC). For Chrome / Firefox / Edge streaming, TH-cam defaults to use WebM format (VP9+Opus), which uses 48kHz sample rate.
Thank you for this excellent explanation. I am an audio engineer for a living, for many years I used a digital mixing console (a Panasonic Ramsa WR-DA7) which can operate at both 44.1k and 48k. I was always able to hear the difference between the two even when only recording voiceover, which I've done a lot of. I also have read Lavry's work in the past, when he previously insisted that there was no difference whatsoever between the two sampling rates and no need to ever use above 44.1K, and knew something had to be wrong. I also have used high sample rates, particularly 96k, and agree that they require a LOT of processing power, which translates into a lower track count and fewer native plugins that can be used, which makes those high rates inconvenient at best, at least for now. Coincidentally, it always seemed to me that the best compromise between computing power and the audio problems I was hearing would a sample rate of 64kHz (since in computing we like to use powers of 2 as factors, mostly because it's easy to clock-divide by 2 or 4, etc.). It's interesting that Lavry's proposed sample rate of 60K is very close to my own thoughts, and personally I'm glad to see that he has come around from his prior position that 44.1k was just fine. I also knew that when using wave generation software just like you illustrated in Adobe Audition, when generating a 16K sine wave at 48k sampling rate, the result is a wave with only three data points per cycle: one at zero, one near the peak, and one near the trough - which is of course a 16K TRIANGLE wave, not a sine wave, albeit a someone oblique one. Yes, those overtones are outside the range of hearing, and yet you could hear that something was wrong - it definitely was not a sine wave that was playing back. Aliasing is exactly the problem - there was no anti-aliasing applied to the data generated by Audition or any other similar program, or any anti-aliasing generated by the WR-DA7 that was outputting it and that the computer was digitally connected to - and there still isn't today on most high-end professional equipment. So there's just no question that the VAST majority of digital playback equipment out there simply applies no anti-aliasing filtering at all and never did. To my trained ear, this has been quite annoying indeed. I also remember the very early days of CDs, and the first CD player I bought, a Sony. I didn't like it, because the top end sounded "brittle", which was a common complaint in those days. And in fact it wasn't until CD players introduced "oversampling" that the problem went away - basically moving the aliasing frequencies so they are all hypersonic, by extrapolating and outputting one or three "samples between the samples" caused later generation CD players to sound significantly better. The bottom line is that Nyquist really doesn't handle the concept of aliasing very well, as you aptly point out. And what is needed, particularly for audio production, is a sampling rate that allows all of the alias frequencies to be moved above the 20kHz threshold of hearing. Computing power is a temporary problem, so I have a feeling that in the not too distant future all professional audio production will be done at 96k, even though we don't really need it to be quite that high. Thank you for what I believe settles this issue hopefully for good.
Sorry but three sample points do not produce a sawtooth wave, it's produces a sine wave. You don't connect the dots with straight lines, you draw a sine wave through the dots. A saw tooth wave has integer harmonics, it would need to be constructed with many sine waves which works probably be above Nyquist if the wave is only 3 samples wide. Lastly, I don't think you understand why Lavry suggests 60. He stated in the paper that 44.1 is if not perfect, close to perfect.
@@FilmmakerIQ I think you misunderstood what I said - "triangle", not "sawtooth". And I wasn't referring to an actual triangle wave, I was only referring to the shape created by the three points if you connect them, which isn't exactly what's going to happen in the DAC anyway, because DACs don't transition from one point to the next in any smooth way, they simply jump to the next value. The bottom line is that for a 16kHz sine wave, only three data points are created, and only three data points are going to be output by a DAC. The DAC itself is not going to "draw a sine wave through the dots". It's just going to output stairsteps at three data points and that's it (unless of course we're talking about oversampling, which would instead use spline interpolation or some similar approach to approximate where the additional samples would be. But to my knowledge no production hardware - such as Pro Tools or UAD Apollo etc. - utilizes oversampling on output). For example, if you create a 16kHz 24-bit sine wave at -3.0db, each cycle will have exactly three points - one at zero, one at -4.2 db above zero (sample value 5,143,049) and one at -4.2 db below zero (sample value -5,143,049). The DAC isn't going to transition smoothly between those points, it's simply going to output a zero for 20.83 milliseconds, followed by a sample value of 5,143,049 for 20.83 ms, and then a sample value of -5,143,049 for 20.83 ms. If DACs did indeed "draw a sine wave through the dots", then aliasing wouldn't be a problem, because the DAC itself would be reacting perfectly to the INTENTION of the data - just as analog tape used to do. But the problem is of course, as with many things computer-related, DACs simply don't do that. They just output a voltage corresponding to a number for a specified number of milliseconds as dictated by the sampling rate. It is of course this behavior that causes the alias frequencies to result, as you have very correctly and articulately described. As for Lavry's 60, correct me if I'm wrong, but my understanding is that the advantage here is twofold: 1) it pushes the vast majority of alias frequencies into the supersonic range, making them a non-problem, and 2) it provides more headroom for creating anti-aliasing filters, should a playback hardware developer choose to do so, which sadly, very few ever seem to. My point was merely to essentially agree with Lavry, but I'm suggesting that when taking into account the fact that digital hardware designers prefer to do things in powers of 2, that a better choice for "optimal sampling rate" should be 64kHz specifically. Personally, I wish hardware developers provided that option in addition to 48k and 96k because that's what I would use for production instead of 48k or 96k. It would be quite a good compromise.
That's completely incorrect. Yes the DAC does draw a sine wave because it's coverting it back to analog. The speakers cone is a physical object and it moves through space with inertia, it can just jump to each sample point and hold for the next one. So If you produced three samples you will not get a triangle, you will get a sine wave. Watch Monty's video in my description. Samples are not stair steps, they define the points of a sinusoidal wave. This is the key to Fourier transform and Nyquest theorem. Aliasing has nothing to do with stair steps (because there aren't any stairsteps). Aliasing is the result of frequencies that are higher than the sampling frequency. Your understanding of Lavry's 60 is incorrect as well. It doesn't push alias frequencies in the ultrasonic... You don't push alias frequencies... it provides enough headroom for anti aliasing filters to work without affecting the audible range. Lastly clock speed has zip to do with binary. 64khz is meaningless because time is irrelevant construct. Look at the history of computing you will not see any clock speed correlating with any binary numbers... Because it's been simply not how that works... Also 64khz isn't a binary number. The closest is 2^16 which is 65.636khz
I'm glad someone GETS IT, regarding aliasing. I've had this argument with so many tone-deaf wanna be engineers that do not understand why percussion sampled at 44 Khz sounds like so much white noise but sampled at 192 Khz sounds like percussion instruments.
I once studied electrical engineering with a bit of signal processing but then went into energy (the big kilo volts stuff) and finally computer science. And in computer graphics I was right back in the fourier transform again, because yep.. its exactly the same thing in computer graphics. And while all the theory has been ages ago so I really need a refresher myself, I find this discussion everywhere: This debate of higher sampling rate, completely ignoring aliasing is going on in graphics just as well. Just look at all the "graphics mods" for games that upload huge textures for absolutely everything and then change the engine settings so bigger textures are being sampled for small objects, then wonder why performance goes down the toilet while aliasing artifacts appear and make things look worse instead of better. Its almost as if game and engine developers know about these engineering principles and optimize for them. Like... as if they know what they're doing :D Same goes for mesh level of detail too btw. Rendering a triangulated mesh is nothing but sampling. The sampling rate is your screen resolution. If you make an insanely detailed mesh that will show up small on your screen, you'll get mesh aliasing which will also look crap. People always thing smaller textures, mipmaps, and LODs are only used for performance, and if my PC is kick-ass, I should always load everything at the biggest size (bigger/more is better), completely ignoring signal processing principles and aliasing.
Glad to see you dug in a little more to check out the difference between the theoretical "ideal", and what actually works in practice. There are still, of course, many other variables, but the answer to "which sample rate?" is always "it depends". Jumping back to the last video, my comment was only that I found it interesting that the original concept sample rate being 60K was almost a happy accident of ending up with that ideal range suggested by folks like Dan Lavry. It would likely have radically changed the course of digital audio development as we all know it.
Great video. LOVE the Monty video. It is awesome in it's clarity. 13:10 "If we had an infinite sampling rate ..." Isn't an infinite sampling called 'Analog'? :oP Kinda defeats the purpose of digital which can be MUCH smaller in storage size. 16:50 I could pick out that the frequency was transitioning from sine wave to square wave, but the tone was indistinguishable to my ears. (yes I listened to your linked video) Thank you for the time and effort to produce this video. It is appreciated!
There is a short cross fade. I couldn't get the waves to cut exactly at the same amplitude so it was either a crossfade or a pop on the switch and I chose the cross fade.
Analog is not "infinite sample", your tape has a frequency range based on how fast you run it, normal copper wires will struggle with RF frequency, even the air has a frequency range because it's made of individual molecules. Natural is more "digital" than "analog" in the sense that energy comes in discrete packets because of quantum mechanics.
One time you said the right words "four ninety-three" while the numbers on screen said "439". I was not expecting to hear the difference between 440 and 439!
Wow this video is so interesting!~ I was (thinking) sure it's not just twice the frequency because if i downsampled some audio file to just 22.1 kHz (after checking that the treble was well below 10kHz) to save space on my CDs, it just didn't sound right, almost like sandpaper trebles. Well, now i know, thanks to your helpful explanations. Harmonics do affect the timbre of the sound, even though we can't hear them directly.
Another engineer here. In my laziness I was edging into camp 2. Thank you for showing me the error of my ways, and reminding me of what I knew 40 years ago. Nyquist's sampling theorem is correct, and it assumes a perfectly band-limited signal. You band-limit a wider bandwidth signal using a low-pass (anti-aliasing) filter. Precision analogue filters can be expensive and difficult to create. Further, if you have a sharp transition in the filter, you introduce artefacts which are visible on transients in the signal, and might be audible, although I really don't know. To allow an easy-to-implement gentle roll-off filter without attenuating your wanted signal in the passband, you need a lot of headroom. BTW, to me this is all theoretical. As somebody of retirement ago, with loud tinnitus, 20 KHz sampling rate would be just fine.
Thank you for also talking about and checking the audio uploaded to YT. Years ago, some science documentary on NatGeo or Discovery was being broadcast on TV, explaining how adults can't hear above 16kHz and to test it out with your friendly adult (or parent) nearby. To my shock, i myself couldn't hear the 16kHz wave they were "playing". Not wanting to age so quickly (and good thing i had a computer as well), i generated a 16 kHz sine wave, and i was _so relived_ to know that i could hear it lol. And sadly the TV didn't have a comment section like here to complain. Rant: Then, wanting to check "how old i was", i tried with higher frequencies, and found out that i couldn't hear more than 18 kHz. Still not wanting to age so quickly, i was sure something was amiss. Then i found out. My speaker system itself had a frequency response range from 18 Hz to 18 kHz. argh lol. I bought better speakers with response up to 20 kHz and sure enough, i could hear it. This just makes me wonder. Do we really "age" out of this frequency or do we just "waste it away" because we don't use it any more. I still practise hearing 18 kHz (with good speakers/earphones) every now and then. And also have a save file on my phone to test out earbuds; before i can buy them and it making me lose my hearing range. P.S: i couldn't hear 20 kHz sine wave. I don't know if it's my limitation or the speaker's. Until i can get a volunteer who can blind test, i'll still be searching. (i'm not sure earbuds/speakers produce enough power anyway at the 20 kHz frequency, to use resonance on other objects.)
It has to do with the hairs in the cochlea of the ear. The ones responsible for the highest frequencies are in the smallest part of the cochlea (they have to vibrate the fastest). As we age, the cochlea becomes more rigid and inflexible to those high frequencies, and that's why we lose the high range.
Excellent, excellent video! It does an excellent job of cutting through the woo-woo and uninformed opinions out there. You even corrected a misconception I had of aliasing, namely that the aliased frequency “wrapped around” to the low end of the spectrum vs being “folded back”! That is, I thought that a 25 KHz signal samples at 48 KHz would appear as a 1 KHz one, not 23 KHz. (I was going to correct you, but I couldn’t explain the Audition display, so went and looked it up. Duh…) Thanks for correcting a misconception I’ve held ever since my Signals and Systems class in undergrad :-) One note and a question for you and/or the audience though: I had understood that one of the problems with CD-grade audio wasn’t the potential for aliasing so much as it was the “brick wall” low pass filter you had to use to allow 20 KHz to get through but cut out anything beyond 22.1 entirely. AFAIK, filters with such abrupt frequency cutoffs mess with signal phase well down into the audible range. Is this the case? My knowledge of such things dates back a good 35 years, so it’s possible that modern technology has found a way around the problem. (This would of course be a further argument in favor of a 60kHz sampling frequency, you could use a less-abrupt filter that wouldn’t impact phase relationships in the pass and.) ==> So my question: With current technology, can you get a flat passband, sharp cutoff and linear phase all at the same time (in the analog domain)? Thanks again for the fantastic video!
Yeah I got that misconception as well... But there's another issue that shows up as low frequency "beating" when you start to get close to the Nyquest limit but still under. I don't know what it's called but it's worth looking into. The brick wall limiter was definitely the issue, I think it's particularly prominent in a mastering scenario... After you add up all the little tiny decrease in high frequency in all the audio stages plus all the filter chains... it becomes substantial enough to notice.
In the old days that was an issue. These days ADCs are sampling much faster internally and downsampling on the output, so the brick wall filter requirements are much less steep.
Comparing a 10khz sine wave to a 10khz square wave, only proves you have 10 more khz left to hear the aliasing artefacts of the higher overtones that you cannot hear. You are not hearing the 30khz overtone. You are hearing the aliasing artefacts at 20khz. That's what is making the slightly crisper sound. Harmonic distortion. One of the reasons why people prefer analogue sound over digital is because it sounds warmer and more organic. But that's not how the real music sounded. The digital version is a perfect replica of the original sound. The analogue version suffers harmonic distortion which is pleasing to the ear. The Minimoog analogue synth had such a great sound because Bob Moog miscalculated the resister values in the filter, causing the sound to distort in a pleasing way.
This was great! Thank you. It seems that ultimately, this matters for producing, but not at all for listening. By time I get it to listen to, those high frequencies should have been long filtered out. However, I wonder how many PC sound systems (windows, alsa, openal, etc) bother to apply a low pass to signals they downsample in order to avoid aliasing?
As someone who has repeatedly defended digital audio, including debunking false claims, I've been posting that Monty video for years. Great stuff that. Dan Lavry's White Paper has also been quite informative. I own a Lavry AD11 as well as a DA10 and record at 24/96 kHz most of the time for my songs, a handful of which you can find on Soundcloud. You can almost make out the AD11 under the desk behind my guitars in this video: th-cam.com/video/HwNNdziJbbw/w-d-xo.html.
I'm so happy I watched this video because a while back I watched a video which contained a sweep up to 20kHz and noticed that the sound cut off abruptly at 16kHz. I was unsure whether the culprit was TH-cam, some other link in my audio chain or if the limit of human hearing is experienced as a hard limit (intuitively, this didn't seem right). I really need to have my hearing properly tested. I'm 38 now and I was "still in the game" comfortably up to 16kHz and I can definitely hear below 20Hz (I think it was somewhere around 16-18Hz when I stopped experiencing it as sound when I tested a while back). My mother told me that when I had my hearing tested as a kid by my school, they said my hearing was above average and that I could hear tones most couldn't. The funny thing is, the reason my hearing was being tested was because they thought I was deaf. My brother used to throw tantrums at home and I learned to "tune out" sounds I found annoying. Turns out I found the teachers annoying too.
It's the other way round. The Gibbs phenomenon shows up, when there is NO aliasing. It is the result of running through the antialiasing-filter in a DAC. The antialiasing-filter rolls off all frequencies above 20kHz and the result are the squiggles around the edges. A perfect square wave has an infinite number of overtones, and when you cut those off with the antialiasing filter, the result is a band limited square wave, which exhibits Gibbs phenomenon.
When the film guy gives better explanation about sound stuff then actual sound guys... Also, sound was recorded with distortion in the analog era, and now we crave that distortion, so why should we go against THIS form of aliasing distortion? Don't bother. Life is Short. Record in 44.1k. Great video :) Cheers!
Well, there are music styles that use aliasing as a stylistic element. But one thing dynamic distortion has going for it is that the frequency content it adds is harmonically related to the input and even with intermodulation (where there's a risk of losing this feature) you have some intervals where it remains so - hence the popularity of power chords, where the added frequency is an octave below the root note. So I doubt it will become something with popular appeal and your reasoning reminds me of the old "we went from triads to tertrachords and now romanticism is regularly using quintachords and sextachords, so obviously the next big thing will be 12-tone music".
The 16k cut off is probably the encoding setting TH-cam picked for the codec, not some hard filter they applied. Most perpetual encoders (AAC, MP3) will throw away high frequency content. I mean, it probably wasn’t a nefarious decision by TH-cam.
Of course it wasn't nefarious... But it was one annoying obstacle in trying to demonstrate this concept. And then it's only on SOME of the streams, not all...
the issue is more to do with the fundamental frequency of the instruments we use and the way we construct music. we only have drums in high frequency or transients of vocal sibilance.
Absolutely fantastic work as usual. I'd love to see how this whole thing compares to analog sound though. I've only ever worked digitally before, but I've always been fascinated by the physical manifestation of sound and it's analog recordings.
I've heard that too -- but unless they have special scientific microphones designed to capture frequencies above human hearing, I'm not sure it matters.
I don't think it's about frequency width. It's about stretching the entire recording. When you stretch it makes everything thinner. Like pulling rubber band. Signal would get less resolution. Less data points. Through entire range of frequencies.
Problem is the rubber band analogy doesn't work because Nyquist does not work that way. Using 24khz, the audio would be EXACTLY the same as 48khz in every respect BUT only up to 12khz. So it's not that you have more data points - that doesn't matter when the audio is sent back to analog in the speakers. I suspect the reason 96hz would be used for slow downed effects is the same reason I discussed in the video: headroom. With 96kz there's about an octave and change you can manuever around in without running into a Nyquist limit that dips into the perceivable range.
You skipped an important point: the steeper a filter, the greater the induced phase shift introduced to the signal. You can get around this using different types of filters, but those introduce other temporal artifacts (such as pre-ringing with linear phase filters). And crucially, just as an anti-aliasing filter is needed at the analog input to a digital system, a reconstruction filter is needed at the analog output from a digital system. Therefore, the primary advantage of higher sampling rates in audio is that one can use less steep anti-aliasing and reconstruction filters starting at higher frequencies well outside the audible range, but well below the Nyquist frequency-all while generating fewer artifacts within the audible range.
>You are watching this at 44.1 kHz, because that's what TH-cam converts every video to. That's not entirely true. TH-cam has the audio available in two formats, AAC and Opus. For AAC it uses 44.1 kHz, but the Opus audio is in 48 kHz, as Opus does not support 44.1 by default. Which one is used depends on the platform; in most web browsers (except Safari) it usually uses Opus. And of course, it might be converted to 48 kHz while being played back, and I'm not sure what they do to it internally, if they convert it to 44.1 and back to 48 for Opus. And the low pass filter you mentioned near the end might be different depending on the audio format too. Which format did you download it in? Edit: Apparently most of this has been commented already.
15:20 John (from the future), audio above 16 kHz is possible on TH-cam these days, but depending on the download method you use, it may get re-encoded using an mp3 codec and cut everything above 16kHz. I recently brought up this topic in the comments of a VSauce3 video called "Could You Survive A QUIET PLACE?" because that video includes a sample of an 18kHz tone. Like you, I thought youtube had a hard limit at 16 kHz, but someone advised me in a since deleted comment how to download it in such a way that I wouldn't lose the upper frequencies and I was able to verify that the 18 kHz frequency was in fact there. The audio codec on that video is Opus.
When I download the video from TH-cam's own official means (in creator studio) I see the 16khz hard limit. If I cannot verify that TH-cam isn't cutting off the high end, there's no point in the test when I'm asking people to see if they can hear the high end.
@@FilmmakerIQ I'm sure it depends on the audio codec you used in the video file you uploaded too. TH-cam probably transcodes some but not other formats. Anyway, just thought I'd mentioned that is in fact possible somehow. I verified it using the spectrogram view in Adobe Audition, btw.
Maybe, I only tried uploading h.264 and when that didn't work I tried uploading a ProRes file with uncompressed audio. Both came back with hard limit on the audio.
@@FilmmakerIQ TH-cam's official means only give you a very inferior 720p h.264 + hard-limited AAC media file. There are less-than-official means that actually allows you to download TH-cam videos at the qualities they're streamed at. That particular program even had their github taken down at some point and it caused a big fuss, but it's back now.
A small error: The Nyquist frequency is the first one you can't reproduce not the last one you can. Imagine a sine wave that happens to cross through zero right at each sampling point and you will see why.
5.2khz square WILL sound like only 2 waves since next wave is at 25kHz, and it does sound like CRT whine... becasue CRT whine is very close at 15,7kHz (PAL is a bit higher than NTSC) and is a result of line frequency and caused by magnets shoving the electron beam. Speaking of CRTs. I happen to still hear up to 18kHz and it annoyed me SO MUCH that for so long people just used CRTs everywhere and not cared about the noise it creates... from supermarkets to some movies and tv shows 15,7kHz was everywhere.
Hi John, you're probably just explaining "aliasing" as a phenomenon, but the video title "baited me clicks" and now I'm all confused about if I disagree or not, how dare you! :D Aliasing requires some knowledge on how to handle/avoid it, but the example with the generated square wave and its mirrored-back harmonics, to my understanding at least, only showed that that particular piece of software that was used had no way of properly handling the calculated harmonics for the chosen sample rate. It's not exactly a real-life use case. I personally can't see any use for sample rates higher than maybe 48k for simple audio recording/playback (provided that your audio capture device does a good job at band-limiting). I do see a potential use for higher sample rates (as in "the resolution in which your DAW project operates") during post-processing if you're using tools where you know that they produce aliasing themselves when the project runs at 44.1k or 48k (i.e. plugins that produce extra harmonic distortion and aren't internally oversampled). Or if you record audio that you want to pitch down an octave or more. Using such high sampling rates surely is much more demanding on any computer system compared to simple band-limiting, which should be done at the analog-to-digital stage anyway, whatever your chosen sample rate. So yes, we all want to avoid aliasing, absolutely. But using high sample rates isn't necessarily the best way to do that. Does the "audio capture device" even sound good when recording at a higher rate? Many might support 192k, not all of them work well when doing that, some few cheaper ones might even sound worse. Or is it the opposite and your device *only* sounds good at higher sample rates? Perhaps the high frequency rolloff of your specific audio capture device starts so "early" that when using 48k you can actually hear it dampen the high frequency content? My point, it's better to know your tools and make a decision based on that. Also, love your channel!
"Showed that that particular piece of software that was used had no way of properly handling the calculated harmonics for the chosen sample rate." - I think that's the BENEFIT of that software. It was simple test tone generator not a musical application. The test tone reveals something interesting about aliasing - a topic I've heard about but didn't truly understand until I saw the folding reflected aliases. If the square wave had been constructed AND then band limited, then I would have never had the true revelation of what aliasing actually is - it would always be this mysterious filter thing that engineers talk about.
Fun fact... People that lived with older TVs with noisy line-output transformers may have developed notches in their hearing at 15734Hz (NTSC) or 15625Hz PAL) although if they are that old they may not now hear much above 12kHz or so anyway (that's me at 63). I remembered this when you picked 5.2 and 15.6kHz for the demonstration. I also wondered how hard that 16kHz wall is that TH-cam apply, and would probably have gone with 5 and 15kHz or even 4 and 12kHz. If interested, it's also instructive to construct square waves visually using a graphing calculator to help with understanding how each odd harmonic improves the squareness of the waveform, although I guess Audition can do that as well. Great video, too, by the way.
Hello, apologies for being slightly picky. Electrical engineer here, I work with high speed data converters, >1Gbit type work. You nailed pretty much everything, but I’d like to make 1 tiny (but SUPER important) note. BTW, this mistake is even in some EE text books: nyquist theorem doesn’t say you should sample at twice the maximum frequency in your signal, but rather twice the maximum bandwidth of your signal. This ensures your entire signal falls within the first nyquist zone, allowing your anti-aliasing filter to cut out all unwanted signals. The reason square waves never work well, their bandwidth is technically infinite. Again, don’t want to take anything away from your video, great work! Edit: I wrote that first paragraph not trying not get to technical, but I feel it leaves a bit to be desired. I don't do anything audio, and I just realized the maximum frequency in an audio signal usually describes the maximum bandwidth (please correct me if this assumption is wrong), making the first paragraph a distinction without a difference to audio folks. However, is it import to make that distinction between maximum frequency and bandwidth because it allows for the use of underdamping to still faithfully reproduce your signal. In my world, I often times want to digitize signals in the GHz range, but if i know my signals bandwidth, i can sample a much much MUCH lower frequency, and filter my output to any higher nyquist zone. In this case I am using the aliased signals to faithfully represent the initial signal. This technique requires pretty fancy filtering, and knowing the center frequency and bandwidth of your incoming signal. Often time we dont have that info.
I remember some article in an audiophile magazine about a study in the early days of CDs. A recording company recorded a classic orchestra on both reel-to-reel tape and a PCM processor. When played back to an audience, there was no clear line between the media. Depending on the piece played, the majority preferred one or the other. The conclusion at this point was that each recording added some specific artifacts to the music, which might benefit one piece, but not the other. After this, they went to analogue and digital mastered vinyl records and high end tape cassettes on one hand, CD on the other. All of the same performance. Oddly enough, here the lines were defined more clearly. The digital camp voted for the CD, the analogue camp for vinyl and cassette. Then one of the technicians had an idea: They went back to the master recordings, but added noise from a blank vinyl record or a blank tape. The result was that everyone voted for their favoured medium. Vinyl enthusiasts picked up on the clicking noise from the blank record, the tape guys picked up the tape noise. So either consciously or subconsciously, they confirmed their bias. I wish I could find that study online, maybe someone reading this can help? Different sample rates, compression methods and bitrates affect music recordings. The artifacts become part of the music and some will prefer the sound of one type over another. A lot of it also depends on how much care has been taken during production, from recording to mastering to compression of the publishing file. The audible difference between low and high sample rate might be minuscule, but because more care has been taken to produce the high end recording, the result sounds better. Now throw in confirmation bias, and everyone will say they are right because ...
I'm probably missing something big, but wouldn't high sample rate be useful for slowed down audio? Kind of like shooting in highres for legroom when it comes to editing and zooming in in post?
I think that would be entirely based on how the audio is slowed down. The thing to remember with Nyquist is that higher sampling rates do not actually give us more information in the frequency covered under the Nyquist limit, they give us a higher Nyquist limit.
Curiously, it's possible to hear frequencies up to at least 50KHz, it has been demonstrated with bone conduction experiments all the way back in the 50s or so; however it was also demonstrated that they basically don't really matter - it's impossible to tell apart frequencies above approximately 16.5 KHz, they all sound the same, and there is some hard anatomical reason for that, i forget. So you may perhaps actually want to capture a little ultrasonic energy, but you can fold it back into the band above 17-ish KHz. Band limited synthesis of the square wave is a solved issue. I think the simplest way is additive synthesis from sines, which you cover right in this video. Since Adobe has ignored this well known insight, one can consider their square wave synthesizer buggy by design, maybe they made it this way to look good to amateurs, band limited square wave always looks like it's heavily ringing, even though it's not. Unfortunately a lot of algorithms and plug-ins have some aliasing or other sampling rate related issues such as "EQ cramping", either due to limited computational budget or by oversight. So high sample rate intermediates are sometimes good, though should be ever more rarely actually needed as far as DAWs, their builtin effects and generators, and higher end plugins are concerned. Audition probably doesn't have quite that professional an ambition for a silly effect. Something to keep in mind that most recording devices don't truly have a configurable sampling rate at the lowest hardware level. The reason is that the analogue filter that would reject aliasing needs to be tuned to the sampling frequency, and you don't want to include the same hardware several times, and yet more hardware to switch between those variants, not only for cost, but also for noise and other degradation that ensues. So the internal sampling rate can be for example 384KHz, and often analogue anti aliasing filter will have a corner frequency of somewhere just north of 20KHz. So you have over 3 octaves of filter room, so at 36db/oct filter, there's like 110db of suppression for all the junk. Then the ADC will have an internal downsampling to something more palatable, like 48/96/192 KHz, and these are easily aliasing-free. This isn't entirely how modern ADCs work (keyword delta-sigma modulation), but it's not too unfair a simplified representation. If 44.1/88.2 KHz are desired, resampling happens elsewhere downstream, in a DSP or software, and of course then it's the issue of how much you trust that particular implementation to be low aliasing. Just 12 years ago, it was not uncommon to find fairly low quality sample rate conversion in a major DAW! It's not entirely trivial and fairly computationally taxing to get right. Things got a lot better since. But for a given audio interface, you shouldn't expect 48KHz mode to introduce any aliasing that you can avoid by recording at 96/192. Besides aliasing, the other potential resampler behaviour trait is phase shift, which nominally isn't audible, but under circumstances can be.
@@VioletGiraffe Harmonics are always above the fundamental, not below. But indeed it has shown that there are no auditory hair cells that correspond to higher frequencies than about 16.5 KHz. And yet there is apparently a mechanism to excite them with a higher frequency signal.
yeah subharmonic resonances would make that possible, it's the same sort of thing where humans can (pretty easily) distinguish phenomena above 60hz - despite that when *staring dead-on at a screen*, your eyes can't tell the difference. but your cochlea has actual pitch-specific resonators; the hairs float in the resonator bit, and the resonator bit definitely does not have a 50khz band. so, yeah, it makes sense that you could identify the presence of sound in your environment that was generated by a 50khz emitter, but there is actually no possible way your brain could receive it as 50khz sound, it would be like seeing gamma rays or being touched above the top of your head - you can hallucinate the experience due to a real phenomenon in your environment, but it's not really representing reality correctly.
The difference heard between sample rates and bit depth are only down to the various filters applied to the signal when the audio device change mode. The best practise is to calibrate one and only use it, and work on your room acoustics and modes. If ou change bit depth or sampling rate you have to re-do your calibrations. It is actually better to play various sample rate materials all down to the same calibrated output than to change the output to the same as the material input. But inside the computer keep all your files as they came. When playing only use the output that is well known and calibrated. My outpud DAC is outputting 20bits 48 khz because I need the 48khz for the bluetooth compatibility and I use 20 bits because I am snob (it's for a type of stream). If I had not other constraints I would use 44.1 and 16 bits because that's enough, even when I want to smash the ears with 115db. As long as it is balanced xlr out for analog, we are happy. There is no excuse for time domain (and yes whe hear 7mn time bounces) n excuse for depth, no excuse for noise, no excuse for ultrasound.
Great video and demonstration. I can clearly hear the difference between A and B when side by side but suspect I wouldn’t be able to discern the two if separated by 15 seconds of silence OR if comparing a more real world scenario with the organic complexities that our perception sorta smooths out when not listening to some sterile demonstration of sine vs square. Maybe worth noting I’m by no means young (39) yet still heard the difference though am likely an outlier as my earring presently scores within the range of teenagers and I remember frequently being bothered by high frequency sounds that nobody else seemed to hear back when I actually was a teen. 20+ years of working at shows and touring with bands took care of that curse😂
Thank you for this video. I want to listen to it probably many times to understand more what is happening. I also loved the Monty Montgomery video! It was so neatly presented, even I with no audio degree could understand it. So to sum it up, if I understood it correctly: ✅ Audio engineers use high sample rate (96 kHz+) for recording to avoid aliasing (and therefore avoid any unwanted weird sounds)? ✅ For consumer audio playback (music, games, movies etc.) nothing more than 44.1 kHz is even needed for any human being? As it is a waste of system resources for no benefit.
I’m a musician who has played loud music live for over fifty years and I hear the a and b examples 24/7. Thanks for identifying the frequency of my tinnitus!
Lmao. My tinitua is a little higher.
Never been able to figure out where mine is as it doesn't seem to be a single tone.
7.3K in my right ear 😢
Engineer here. This is a good explanation and easily the best visualisation of aliasing I've seen. nice!
Wave scientist, here (not audio). I agree this is an excellent demonstration of aliasing. However, I think this video seems like an argument primarily for *mastering* above 44.1 kHz, particularly if you're generating a lot of synthetic sounds, rather than recording or playing back audio above 44.1 kHz.
I wouldn't expect human voices or musical instruments to produce a lot of power in frequencies above the human range of hearing, so you're probably not going to get a lot of audible aliasing if you record audio at 44.1 kHz. And then if that aliasing hasn't been baked into your digital audio file to begin with, then you won't be hearing it.
An exception I could imagine would be if you're recording in a noisy environment where the "noise" isn't Gaussian--in which case, perhaps you could get some beat-like pattern of "noise" in your audible range.
Edit: the other caveat would be that if you have high-fidelity audio and you're playing at back at a lower sampling rate, it's anyone's guess how that downsampling/resampling algorithm is working. It might introduce it's own wonkiness. But then if you're trying to drive speakers at a frequency beyond which they've been tested, you might get non-linear weirdness, too.
I'm really focusing on the capture side of things. My interest is really in the "making" side of things.
Some cymbals can shimmer up in the high range of human hearing - I've heard tell of some loss of the "brightness" of cymbals because recording at 44.1kHz and compounding of a bunch of low pass filters causes that upper range to just lose power... but that's theoretical to me.
But I really never answer the question in the title "What is the Optimal Sampling Rate"... haha. I think my purpose was really to understand what the argument is, not necessarily advocate for it. That's my impression when I was trying to work through Lavry's paper.
@@FilmmakerIQ this is an interesting discussion and has me sitting here on a Saturday afternoon tinkering with MATLAB and Audacity.... :-)
So what I just tried was I created a 7 kHz square wave sampled at 44.1 kHz. It looks like a typical square wave. Sounds like your video
Then I generated a 7 kHz square wave in a 192 kHz track, and applied a bunch of aggressive 22.05 kHz lowpass filters so it has no frequency content above 22.05 kHz.
Then I made another 7 kHz square wave in a 192 kHz track, and just told Audacity to resample it to 44.1 kHz
The two 7 kHz square waves generated in 192 kHz tracks, one lowpass filtered, and one resampled, sound like square waves, and sound almost.the same.
The one 7 kHz square wave generated in the 44.1 kHz space has dozens of overtones and sounds totally different.
I don't have an easy to way generate a square sweep, but this would be an interesting experiment for you to try on a square wave sweep and see what happens.
So here's the deal with 7 kHz square wave and any flavor of 48kHz (96kHz, 192kHz) - they will all sound the same!
Remember in my video where I did the frequency analysis? There was the fundamental at 7kHz - that first harmonic at 21kHz and then aliases separated by 2kHz starting at 1kHz and going up. That works out because cycle of reflections caused by the 24khz Nyquist limit cycle around and around on odd numbers.
Doubling the sample rate to 96 or quadrupling it to 192 doesn't change the cycle, it just changes where the cycle ends and picks up again! And since a square wave is suppose to be an infinite series, it doesn't matter where the cycle picks up....
Change the fundamental frequency of that square wave to 7001hz - and it won't cycle around like that and you'll hear the difference between 48 and 192khz.
@@FilmmakerIQ "aliases separated by 2kHz starting at 1kHz and going up. That works out because cycle of reflections caused by the 24khz Nyquist limit cycle around and around on odd numbers" - sorry, don't understand this: where does the 2kHz separation come from ? How do you get from 7x5, 7x7, 7x9, ... to all those frequencies ?
I have no problem letting the "Hz"/"KHz" misstep slide, as it's a common-enough slip-up. But sorry, I gotta call you out on repeatedly referring to a low-pass filter as a "limiter". Otherwise, this explanation - like all of your tutorials - strikes a comfortable balance between technical accuracy and accessibility to a broad audience. Well done!
Btw, be forewarned: you just know somebody's going to link to this video in a Gearslutz post.
I used to work in sonar engineering in which we used digital signal processing. To avoid aliasing the first 'processing' step was an analog filter which would cut off frequencies that could cause trouble because of this aliasing.
Hello 👋 👋 👋
I need a course of Signal Processing
Can you help me
Thanks in advance
@@yaakoubberrgio5271 I've put up the lectures from my ECE3084: Signals and Systems course at Georgia Tech: th-cam.com/video/VtSlmdshqrI/w-d-xo.html
A well designed delta modulation ADC does part of filtering for you. A part like the AD7768 uses a modesty high order integrator so you get more than one pole of anti-alias for free.
@@kensmith5694 I've never been able to fully wrap my head around delta-sigma converters. Like... I can sort of follow the math line by line, but I can't really develop and intuition for the "heart" of how they work.
@@kensmith5694 When I mentioned working in sonar engineering I was talking about 1979 and 1980. We had to construct our own processing unit by using a 5 MHz 32 bit 'high speed' multiplier as the heart of our system.
What you explained is just that, either have a good recorder (A/D converter) which can do a good job at filtering out > Nyquist frequency signal, or, record at higher sampling rate (for example, 96kHz is much more than enough), and then, down sample to 44.1kHz/48kHz when you do your post. In digital domain, you can do (very close to) exact calculation, and at the end, save a few bytes on the final product (without jeopardizing quality). However, to those crazy guys who insist to get so called high res files for PLAY BACK, they are just crazy, forget them!
I mean, even cheap equipment like 100$ DVD Players in 2007 had already 192kHz DACs avoiding any problems like this at all.
But for the final media, more than 44,1kHz doesnt make much sense since anyways most released music is still in 44,1/16Bit. Even most(or all!) vinyl records are made from 44,1kHz samples. Tidal even dare to upsample/"remaster" 44,1kHz/16Bit originals to expand their "HiRes" collection...
Since anyways every HiFi gear have HPF for anything above 20kHz, in combination with internal 96kHz+ processing, more like 384kHz nowadays, no. 44,1 is just fine... more is acceptable in cases of digitalized vinyl records or yeah - why not. 44,1/48 vs 96+ is like comparing 4K vs 8K... it doesnt make practical sense, probably a bit with the perfect circumstances... but hey its possible. Thats why my AVR has 9x(or 11x idk) 384kHz/32Bit (32Bit!!! wtf?!) DACs, by numbers even better than my Hi End Stereo Gear with "only" 192kHz/24Bit Wolfson DACs.
Only in recording and mastering more than 44kHz are needed, and these are anyways at 96kHz+ since its possible.
I dont get people when they complain about "only CD quality"/44,1kHz... damn! Thats at least completely uncompressed, not like the lossy! MQA garbage for example. In fact (and already proven...) CD quality is better and more accurate than MQA (which is another compression format like mp3 - but worse, and with high license fees haha). Some of my friends are completely addicted to HiRes and/or Tidal/MQA, only because they see any blue light or 96/192kHz on their receivers screen... despite having absolutely the same sound as a 44,1kHz CD with the same mastering. Damn, they use soundbars, garbage "HiFi" gear, BT headphones and they dare to complain about 44,1 kHz only!
I also prefer HiRes source material, but mostly because of different masterings, less loudness, more audiophile/dynamic, easily for the "demanding" people mastered.
@@harrison00xXx I believe you are incorrect about vinyl masters. Mastering for vinyl is a separate master than the master for CD. Professional mastering engineers want to work with the highest quality mix which means NOT 44.1kHz/16bit. And most likely the vinyl press wants to make their master from the highest quality version available. At least for the major label artists. Independent artists, well ya know they get what they pay for and can’t be reasonably used to make statements about what’s used to make vinyl records.
@@arsenicjones9125 Ofc its differently mastered for vinyl, but still, the samples used to make the "negative" are for probably 99,9% of the (non quadrophonic) records 44,1kHz/16Bit, as CD quality, that was my point
As if CD Quality is "bad"....cmon, thats the most accurate and "lossless" quality standard we ever got. Ofc there is now "HiRes" but thats more of a voodoo/too much...
@@harrison00xXx no I’m afraid you’re again incorrect. In major studio albums they regularly record at high sample rates then down sample to 48khz 24bit to edit & mix. Some major studios do all their editing and mixing work in 96khz/32bit floating. Then it will be down sampled again after mastering. Again we can dismiss what independents do because they don’t do anything in any standardized format.
CD quality is not the most accurate, lossless standard available. 🤦♂️🤣 An original recording made in a 96khz/32bit wav file is more a more accurate representation of the analog signal. If there are more samples w greater bit depth it MUST be more accurate than a lower sample rate and bit depth. Just because you cannot discern a difference in every piece of music you hear doesn’t mean there is no difference or that there is no difference which affects the experience. Just to be clear I don’t think CD quality is bad just that it’s not without flaws either. Upsampling won’t increase fidelity in anyway but a higher sampled recording is higher fidelity
@@arsenicjones9125 So you have proof that the source material for making vinyls ins more than 44,1kHz?
Sure they edit and master at higher bitrates, but the end result is mostly 44,1/16Bit sampled. This change probably with HiRes for the customers slowly, but its a known fact that 44,1kHz were used for vinyls FOR DECADES at least.
The problem with the 1st group mentioned (44.1 vs 48 etc) reminded me of "Complex Problems have simple, easy to understand, wrong answers." The same is true for Flat Earthers, young earth creationists etc. They have a very simple solution that seems to work because the [majority of the] people they are talking to don't understand the complexities.
The problem Group 3, the Audio Engineers, have is the majority don't understand the solution as presented mathematically and say "that is just your opinion!" and no more important than their opinion.... You see a lot of this these days.
It is Great to have videos like this one that go far enough to explain simply the problem for the majority without going of in to deep (group 3) Audio Engineer Geek speak of MSc maths.
That is really an insightful way to look at it.
@@FilmmakerIQ Hi John, You call me "insightful" again and I will sue! :-)
Need to put a low pass filter on that comment.
4:50
a tiny bit of correction on this part. If you actually activate the "stats for nerds" option, you would see that TH-cam actually uses a much newer audio compression format called Opus, developed by the same xiph foundation that Monty himself works for. And what's interesting about this audio codec is that the developers have decided to restrict the sampling frequency to 48 kHz (44.1 kHz sources get upsampled upon conversion, hi-res sources get downsampled and 48 kHz sources are essentially no-op and passes through). The reason for this is exactly the same reason you mentioned a few seconds ago, the math is just easier that way. You will only get 44.1 kHz if for whatever reason, your device requests TH-cam to fall back on to the old AAC or Vorbis codecs for compatibility reasons which will almost never happen especially if you're watching from a web browser or using an Android phone. But considering that Opus is still a lossy format, it's still gonna cut off any frequency above 20 kHz anyways.
There's a lot that gets said about "TH-cam compression" and how it affects audio. Generally, the degree to which it affects the sound of any given audio demo is nearly moot. These days, few of us are hearing _anything_ that hasn't already passed through a perceptual audio encoder of some sort (MP3, AAC, Bluetooth audio codecs, Netflix / Hulu / YT, and so on...) and nearly all of those codecs are going to brick-wall filter the highest of the high frequencies to avoid wasting data bandwidth on stuff only our pets will hear anyway.
The exception to this rule is the rare fabricated audio example like in this video, which uses a signal that is rarely something you'll encounter in a typical audio presentation of any sort. Yep. Those are affected by compression. Sure enough. But most of the time, when somebody is comparing a direct feed of a source audio file with one picked up through a lavalier microphone from sound being played through a 3" cube smart speaker, and then says "you won't get the full impact of this because of TH-cam audio compression", I just roll my eyes. haha I _think_ that 128kbps Ogg stream can adequately capture the sonic differences you were trying to convey, don't you worry about that.
don't underestimate the degree to which lossy compression might actually be doing a better job of preserving the signal than you think - eg, check out dan worrall "wtf is dither"; it's a long video and I don't remember exactly where in the video he does it, but somewhere in the middle he compares mp3 to 16bit wav in a situation where the mp3 *unequivocally destroys wav* in terms of which one represented the data better. wav was more lossy than mp3. That's because quantizing to 16bit integer naively actually introduces more noise than mp3 compression, if your signal is simple enough. it's all about what bitrate mp3 or ogg needs in order to near-losslessly compress a given section; and ogg vorbis is based on wavelets, not discrete cosine, which was why ogg vorbis can handle certain kinds of phasing sounds much better than mp3. so - yeah, as long as you're in a high enough quality mode that the bitrate compression is in the -100db range, you'll probably be able to hear whatever -70db effect they're trying to show. it's only when you turn down to 240p and your mp3 noise is -10db that we have a serious problem from audio compression.
now, video on the other hand... :D
In my experience watching a movie/TV show on Netflix and watching it on Bluray is usually night/day difference. Its not so much that you obviously lose highs, you seem to lose dynamic range, it sounds flat and dull.
Of course its not always enough to spoil the experience, but sometimes it definitely is. Same with the picture quality.
disagree comoletely.
YT audio is highly compressed and I can tell the difference between songs in YT vs Apple Music.
No contest
@@jhoughjr1 Music videos strangely are often the worst offenders, whereas some youtubers use music and it sounds fine. I'm very sensitive to lossy codecs too. Hated Bluetooth audio until LDAC and Samsung Scalable came along.
@@jhoughjr1 i dont know super exactly what youre taking about. but ive heard youtube uses aac codec. Imho for certain bassheavy generes, youtube is miserable. bass just doesent translate well on it. guitars are ok, but i still preffere mp3s of mine.. Apple music also uses aac i heard. But i found it bit better, dont know if its a specialized aac version they use.
Other than that i seen a test video that compared waveforms to lok for normal compression (audioplugin compression) , and nothing was found.
Excellent and interesting video .. I would to like one term here, that is 'oversampling' .. When digitising an analog waveform it is quite normal to have a relatively tame analog filter, but run the sampling at a much higher frequency than the output requires, 8x or 16x oversampling is common. The next step is to have digital filter operating on this high frequency sampled signal and then downsample to the required frequency eg. 44.1kHz.
The 10kHz squarewave has audible undertones because it was simply generated mathematically - there is no oversampling or anti-aliasing going on at all - if the signal was filtered properly before being recorded the 10kHz squarewave and 10kHz sinewave would, of course, sound exactly the same (since the next harmonic is not captured).
An exceptional video, sir, especially for going the extra mile and looking into TH-cam's own codec shenanigans with your own examples. I regret to say, I didn't hear much difference in the 7kHz files, but considering I'm getting older and adults lose top end in their hearing range over time, I'm not surprised anyways. (I can barely hear CRT yoke noise anymore, which I definitely could as a kid)
Aside from pure monophonic sound, I think higher sampling rates have a dedicated purpose when doing any kind of stereo/surround real-time recording, or doing any audio processing involving pitch/duration manipulation.
In the first case, human hearing becoming more sensitive to phase differences between ears as frequency increases, such differences in phase and arrival time contribute to our sense of a physical space the audio is occurring in. (Worth noting here that the Nyquist-Shannon sampling theorem assumes a linear and time-invariant process, where it doesn't matter how much or how little the signal is delayed from any arbitrary start point-human hearing, however, is definitely NOT a time-invariant process) When dealing with sampled audio, at higher frequencies, the number of discrete phases a wave can take drops off considerably: assuming a wave at exactly half the sampling frequency, you can have it however loud you want (within the limits of bit depth), but you can only have two phases of the signal (0° and 180°). One octave down, you only have 4 available phases (0, 90, 180, 270), and so on. This might contribute to the sense of "sterility" and "coldness" associated with older digital recordings that didn't take this into account. So if you're mixing audio that relies heavily on original recordings of live, reverberant spaces (drum kit distant-miked in a big room, on-set XY pair, etc.), it's an advantage to get the highest sample rate you can afford when recording/mixing, then downsample your audio for mastering/publishing, if needed. This way, you can preserve as much detail as possible, and give your audio the best shot at being considered realistic.
In the second case, having extra audio samples helps when you want to pitch audio up/down or time compress/stretch. Since some of the algorithms for doing these techniques involves deletion of arbitrary samples or otherwise bring normally inaudible frequencies into human hearing range, having that extra information can also be a benefit for cleaner processing, depending on your artistic intent.
Yes, I haven't factored in pitch alterations.
That's not entirely true, actually. The Xiph video mentioned in the content here covers the waveform phase topic as well. The reconstruction filter post-DAC is basically turning discrete samples into a Bezier curve. Just sliding the sampled points around on the X/Y axis (if X is sample, and Y is word value -- i.e., the amplitude of an individual sample) will alter the resulting wave's phase.
Another way to think of this is to imagine using a strobe light to capture an object moving in a circle. If the speed of the object rotating about the circumference was perfectly aligned with the flashing frequency such that there are exactly two flashes per revolution, it would look like the object is appearing in one spot, then another spot 180 degrees from the first, and repeating indefinitely. This is basically the Nyquist frequency. From that, you could construct a perfect circle because you have the diameter.
So now, imagine altering the "phase" of that object so that the strobed captures place those objects at different places around that circumference. You can still construct a perfect circle.
Same with audio samples. It doesn't matter if the phase changes. As the Xiph video says (I'm paraphrasing because it has been a while since I watched it), there is one and only one solution to the waveform created by a series of samples, _provided that the input waveform and output waveform have both been band-limited to below the Nyquist frequency._
@@nickwallette6201 Well, yes, for any arbitrary signal, you can still reconstruct it with sampling, but I was mostly thinking psychoacoustically, where delay and phase variations between ears plays such a big deal in stereo sound. And one of the side effects of sampling is that you get phase constraints, like I described above. For example, with a signal at half the Nyquist frequency, how do you distinguish between a full-amplitude sine wave and a cosine of -3dB intensity, when they both share the exact same sample representation (alternating between .707 and -.707)? Since that phase information can spell the difference between a centered (in-phase) or diffused (out-of-phase) stereo sound space, preserving phase and delay information is super important, and with finite sample intervals, there's only so many phase states you can have at high frequencies.
I also acknowledge, however, that bandlimiting filters induce their own phase delays as well, which can have a significant effect on the perceived audio-hence one of the other advantages of higher sample rate is to relax the requirements of bandlimiting and reconstruction filters to minimize their coloration of the audio.
Delay is not an issue with the sample rate. Sample rate does not affect the precision of timing of the wave in any respect
@@eddievhfan1984 With two samples per cycle, you can reconstruct a waveform with any phase you want. You could indeed have phase and anti-phase waveforms at 20kHz with a 44kHz sample rate. Try it. Use an audio editor to create 20kHz sine, then invert the phase. Zoom in to the sample level and look at the waveform it draws. This is a representation of what the reconstruction filter does.
I think it would be an academic exercise though, as 1) who's going to be able to determine relative phase between channels at the theoretical threshold of human hearing?, and 2) that's going to be in the knee of the low-pass filter curve, where any passive components on the output are going to affect the signal. It would not be unlikely to have a mismatch between L and R channels. High-end stuff might try to match capacitors to 1% or so, but there's plenty of gear out there (even respectable gear) that uses electrolytics rated at +/-20%. There's a lot of concern over perfection that is not at all practically relevant.
So here’s the thing, TH-cam does support 48 kHz audio, and it does support higher frequencies than 16 kHz... sometimes. Every time you upload a video to TH-cam, the encoder creates about 6 different versions of the audio with different codecs, sample rates, bitrates, etc. On playback, it will automatically choose the audio based on your network, decoding capabilities, etc. Just because the video was ruined after you checked the download, that doesn’t mean it would have been ruined for all listeners. Really it’s TH-cam’s technical inconsistency you have to worry about (I think that might also be true for your video about cutting the video 1 frame early)
TLDR; Your description of TH-cam’s capabilities wasn’t strictly true, but you were still right to cater to the worst case scenario.
Very interesting video!
19:07 "I just want to cover some interesting notes" Clever ...
John, Thanks for sending me down the rabbit hole. It took me 5 days to finish your video. Your instruction is always good, because of the practical examples you provide. Your videos inspire conversations outside of TH-cam and outside of film making. Thanks for that too.
edit: sorry wrong time stamp, could not find original ...
You switched it up between A and B lmao. Interestingly, the frequency of the harmonic you used is really close to NTSC horizontal refresh rate (15734Hz), which a CRT’s flyback makes audible as it deflects the electron gun left to right and back. I’m 41 and so far I’ve always been able to hear 15kHz flyback
Yep
So that's why you can hear this high pitch noise from CRT TVs?
39 and oh gods do I NOT miss working on TVs and that wretched noise. I can only imagine how horrific that nose must be to cats and dogs. We practically used to torture our pets with those damnable things.
yep. as a kid I could hear if a TV was on even if the screen was dark.
I remember as a kid freaking want to smash all school tv’s what a trash they let us watch in the first place and then the fucking beep, will hear even now I think, I did run out the classroom sometimes and told the teacher to blast herself with this earpiercing beep! She was like: what beep!? Bitch.. the older the crt, the more chance you may use it to deflect vermin out of your garden..
As soon as you mentioned Monty, I knew that you got it right.
I think it is interesting how many people rag on CD quality, CD sound pretty good and I think most people have a colored memory of it. It is the same thing that Techmoan talks about in his video about cassettes most people where not listening on quality equipment and I know for my generation we most used CDs that we burned which had mp3s which are lower quality then CD audio. Spotify only recently got "CD quality" audio but people don't complain about there quality.
CD's rule baby!
My earliest memories from the early 90's regarding CDs is that, a) they sounded really, really good, and b) my mom will get REALLY mad if we play with her discs (they were expensive)! My dad had a Panasonic component stereo setup, nothing high-end or audiophile grade but it was half-decent at least. He had some Type-II cassettes too which sounded really good on that player.
By the mid to late 90's CDs were starting to replace cassettes as the on-the-go medium for portable players, boomboxes, and car audio, which tended to sound bad to start with, but no matter how good your system is all of these are frankly crappy listening environments. Whereas vinyl was never a portable medium so even now if you had a vinyl player you'd probably have it in a dedicated listening room at the very least.
1st Harmonic with 3 times fundamental frequency? Where is harmonic with 2 times frequency?
@@peteblazar5515 the components of a square wave are the sum of infinite _odd_ harmonics. So the first harmonic is 3x the fundamental frequency, the next is 5x, and then 7x, etc.
I wouldn't rag on mp3s either. Unless the bitrate is really low or it's encoded with an old encoder I just can't tell the difference.
I'm a professor of Electrical and Computer Engineering at Georgia Tech, and have taught courses in signal processing for 20 years. Besides an excellent tutorial by Dan Worrall, this is the only video on the topic I've seen on TH-cam that doesn't make me cringe. In fact, your video is superb. :)
Love the deliberate error 🥳 also thought my hearing was failing with the sine sweep until you pointed out TH-cam hard cuts at 16khz. I'm one of those weirdos in their 40s who can still hear when shopping malls have a mosquito device... Or could during the before times at least .. haven't been to a mall in 2 years
@@MyRackley Hmm, sadly i know mine doesn't at 65, but then i've played in too many bands with overloud guitarists, and in one case, a drummer who overhit his cymbals all the time, where we rehearsed in a small room. Still have a low level of tinnitus in my right ear, but luckily it's not really noticeable unless things are really quiet, and i guess i've become quite good (or at least my brain has!) at filtering it out of consciousness!
My electrical communications systems prof literally just covered the sampling theorem in class today, and by chance I saw this on my recommended. This video is an EXCELLENT demonstration of aliasing. Thanks so much for making this.
BTW: I can totally hear the difference between A and B on YT, but I can't tell the difference on the 7kHz one. But that could be my Bluetooth headphones. I'll edit this comment when I get home and try my corded headphones/speakers.
So, after you showed the example at 4:40, my first thought was, "well, what if you instead choose a frequency that exactly divides the sampling rate?". So I opened up audacity, made sure both my audio device and the project were set to 48KHz, and tried generating a 12KHz tone - in that case, a square wave sounds just like a sine, but slightly louder.
It's easy to make sense of it if you think about it in terms of generated samples - you just get two high ones followed by two low ones, and that pattern repeats *exactly* at a rate of 12KHz. If you choose a frequency that doesn't cleanly divide your sampling rate, you have to resort to an approximation - some runs of high/low samples will be longer, some shorter, so that over a longer period, they average out to the frequency that you're trying to achieve. But in that case, you're essentially creating a longer pattern of samples that takes more time before it repeats, which creates a bunch of other spurious (aliased) frequencies in your signal.
I think the real takeaway here is that mathematically ideal square waves are awkward and don't work out that great in reality. Sines are way nicer.
You choose a special case which is square wave with the frequency of the sample rate divided by four!
There's two ways to think about that. Either the mathematical sum as you described or as a visual graph. Only one sinusoidal wave can fit the given samples... Instead of the sample defining the top of the square wave, it defines each side of the crest and trough of a sine wave with greater amplitude!
Another great video. One thing about your sine/square test, you can simulate what would happen in a real-world situation by generating your waves at a sample rate like 3,072KHz (64x48K) and convert to 48KHz to listen to it. That's because all modern ADCs sample at at least 64fs, often 128 or 256fs, filter out everything above 20KHz, then down-sample to your capture rate.
Another experiment I ran a few years ago was record a series of sweep tones to my blackface ADAT, which allows the sample rate to be continuously varied from about 40KHz to 53KHz. At 53KHz, aliasing is *almost* eliminated where it's quite audible at 40KHz. Yes, those converters are out of date, but it's still a valuable learning tool.
That said, I'm a huge proponent of 96KHz in digital mixers, where the ADCs are working low-latency mode. At 48Khz, an unacceptable amount of aliasing is allowed to keep latency through the mixer below, say 1ms (not a problem in analogue mixers). At 96Khz, the converters can run in low-latency mode and have no audible aliasing. When I'm working in the box on material that was captured by dedicated recording devices (latency is not an issue), 48KHz is fine.
As a mixing engineer for over a decade I'm glad to see you got this right. I'm also glad that at over 50 years old I can still hear the difference between waves A and B. And for the vast majority of people listening to audio on crappy playback systems it doesn't matter one bit.
Double blind tests of Redbook 16 bit 44.1kHz digital audio vs. high res 24bit, 96kHz digital audio, played for average listeners, audiophiles, and high res audio 'experts'....all couldn't accurately pick out the the high res files.
The average listeners had a 50/50 probability, while the rest of the audiophile/experts scored even lower!
As an EE, and music lover, I've always stressed the importance of the master recording being the great deciding factor on the quality. Quality in, quality out. No amount of oversampling, upscaling, or bit rate will improve a crappy initial master source.
This is about extra noice introduced during processing of the audio.
Not about the output format really.
great content, thank you 💚
So in depth. Thank you so much!
The Fourier Transform tells you how loud each sine wave in your signal is - a spectrogram, if you plot it. It also can tell you the phase, so all 3 parameters - frequency, amplitude, and phase - of a sine wave are covered. The Inverse Fourier Transform puts all those sine waves back together. In computers we use Discrete Fourier Transforms, and usually a "fast" implementation known as an FFT for "Fast Fourier Transform." (Which BTW is one of the top 3-5 hacks in all of computer science.)
Yes but the how gets way more complicated
Project and storage samplerate at 48k with each processing stage using oversampling has been proven to be optimal. You have to increase project sample rate to 384kHz to get the same. The trick is in the oversampling, allowing for wider bandwidth while processing to reduce artifacts and then filtering the unnecessary frequencies out keeps it cleaner. 48k is not enough for some signal processing, while it is plenty for other. A gain change can be done in 48k but compressing, anything that modifies the phase or time domain in anyway has to be oversampled to decrease overall antialiasing. The strangest thing is that despite having additional filtering stage at each processing block (for ex, each plugin in a project) and converting back and forth, it is less CPU intensive. Higher samplerates by far most of the time run "empty" signal, the entire bandwidth is processed at each stage while oversampling is not needed for linear operations.
This is not very known thing, which is a bit odd in my opinion. You can test this at any point, device antialiasing stress tests and compare 192k project rate to same processing done in 48k base and oversampling. The latter has less artifacts.
Excellent video. You dealt with complex issues in an easy to understand and fun way - nice job, man.
When you first brought up harmonics and square waves, I thought about posting a correction cause it sounded like you were about to make a big mistake by ignoring band limiting filtering, but I watched the rest of the video…and you handled it all. Well done, including your edit post TH-cam processing.
Yes, I did hear a tiny difference between your 5.2khz sine wave and the 5.2/15.6khz additive construction square wave synthesis.
I do have exceptionally good high frequency hearing for a 55yr old, however, it’s also important to note that music is never a pure sine wave, nor a square wave, so you would never hear even the tiny (barely noticeable even to excellent hearing and only because it was a pure note of extended duration) differences I heard in an actual piece of music.
The important part, as other have pointed out as that your waveform must have and appropriate low pass filter applied. That could be a 20khz analog filter, with sampling at 48khz or higher, or 20-24khz filter before 57.6khz, or 20-25khz before 60’hz, or a 20-35khz analog filter and sampling at 88.2 or higher sampling. And it’s always good to lower the noise floor by recording at 20 or 24 bit depth. Do all your editing and mixing at something above 48khz and above 20bot depth, then master for 44.1/48 16/18/20 bit sure, you can master for 24bit depth, but no one will actually be able to tell the difference.
This was FABULOUS as always John! Amazing description!
Always the best videos!
Omg! So glad you mentioned the hard cut-off YT does at 16. I thought I was loosing my hearing during those sine wave sweeps.
John, this is the worst explanation of the connection between aperture, circle of confusion, and infinite focusing I've ever seen!
I agree.
These concepts are connected however.
LOL!
Hey John, I’ve been doing digital signal processing since 1980, 41 years, including spatial digital signals. Nyquist can be grasped with knowing one concept: that sampling at the Nyquist frequency there is no phase information. Phase information is restored as the sample rate is increased above Nyquist. To differentiate a square wave from a sine wave, both still have to be faithfully reproduced, including the phase information. At 10 KHz, a 44.1kHz sample rate only produces 4 samples her sine wave, partially preserving the phase of the signal. Since a square wave is made up of more than one frequency, the phase information becomes important, as it affects the sound not just the amplitude of the sound. 44.1 kHz works because most of what we listen to is under 8kHz. If you want to preserve phase up to 15kHz, really should sample above 60kHz.
Now, if you are listening to stereo, you really want to preserve more phase information, so makes even more sense to go 60kHz or higher. Even though to me 44.1 kHz seems fine enough for me.
I always wanted to make a spatial audio standard that recorded phase information as well as sampling information, a transformation rather than sledgehammer sampling. This has been done commercially outside the audio industry for over 35 years.
You are totally ignoring the sound reproduction equipment's role in this. Sure at 10 KHz, a 44.1 kHz sample only produces 4 samples. So? The signal recreated by the DAC sent to the vibrating membrane or paper cone of your headphones or speakers while reconstructing this 4 sample pulse of 1 second, it's plenty. 60 kHz may be useful during mastering of the original, but at the consumer level, we don't benefit from it with proper noise cancelling and anti-aliasing applied.
Another thing to consider is that at exactly the Nyquist limit, the signal contains no information whatsoever on the phase of the signal, so if you had a 90 degree phase shift between the left and right channel (or multiple channels in a multi track recording), that information would not register correctly in the audio samples. This may not be so important when listening to the audio as our hearing is not so sensitive to the phase of such short wavelengths, but if you start to do addition of the channels or other signal processing where the different channels interact, the same signals oversampled vs sampled at the Nyquist limit can produce a different sounding result, even after the result has been downsampled back to the Nyquist limit.
Nyquist will accurately reproduce the sound, if you THEN add extra modifications on that then that in no way implies anything about nyquist not being 100% correct.
@@ABaumstumpf Nyquist is correct about the absolute minimum sampling rate, but there are benefits in oversampling.
@@TurboBaldur Yes, of course, but that in no way has any effect on what us humans can actually hear, and there the 44kHz 16 bit is enough. if the mastering of the audio is done poorly that is not the fault of the medium not does it make Nyquist any less correct.
@@ABaumstumpf exactly, if the sampling is being done for playback to a human only then 44.1k is fine. But if you plan to edit the audio it makes sense to get more samples, even if the final export is to 44.1k
This is a great point, and I believe it may be why many digital recordings made in the early 90s sound "flat" compared to late-generation analog recordings. Too many engineers just relied blindly on the digital technology without thinking of consequences like this. Nowadays of course studios work with much higher bitrates and bit-depths for processing and mastering before producing the 44.1kHz or 48kHz files for release.
Hey John, this is great seeing you do some new technical and concise teaching videos. Your work is so helpful for anyone digging in a bit in the subjects you tackle, so thank you for that!
Aliasing is pretty much a non issue when going though a modern codec. The generated square wave example was not filtered, as it would be on any DAC. If you recorded that wave and then displayed it, it would sound the same but not look square anymore, but look like 2 sines mixed together.
Codecs sample at a much higher rate (>1Mhz) with fewer bits of resolution then down sample using a CIC filter and multiple half band filters. Through the magic of poly phase filtering, an 18th order elliptical halfband filter is only 4 multiplies to drop the rate by 2 with a very steep cutoff. You chain multiple half bands together, maybe a 3 or 5 phase if needed, to drop down to 44.1 or 48K rate. Its pretty easy to knock out any audible aliasing with a chain of tuned 18th order filters.
This video isn't about codecs
@@FilmmakerIQ Then congrats at demonstrating why an anti-aliasing filter is important and what happens without one.
Nice video and thanks for the link to Monty Montgomery's explanation
Thanks man....really useful info, but the main reason I wanted to leave a comment is that I really dig your set! Looks cool!
As usual a very thorough and clear exposition.
A truly great video about this complex subject with an appropriate amount of humor concerning the state of the commenting on TH-cam in these times. Thank you for your efforts, they are well appreciated.
Great video, thanks! FWIW (and that’s not much) at 5:00 you say that TH-cam samples everything to 44.1. But actually, TH-cam uses the opus codec for the audio channels of videos, and that format is locked to 48. I think a few older vids might also have ogg or m4a which may be in 44.1, but “most” are sent in 48. It’s certainly not substantive for the point you’re making, more just trivia. Thanks!
AAC is used for Apple devices which is locked to 44.1. It also happens to be what they use for the download file option in YT's creator studio.
Aha. Interesting. Using youtube-dl, here are the streams available for your video (limited to audio):
249 webm audio only tiny 52k , webm_dash container, opus @ 52k (48000Hz), 7.13MiB
250 webm audio only tiny 61k , webm_dash container, opus @ 61k (48000Hz), 8.37MiB
251 webm audio only tiny 108k , webm_dash container, opus @108k (48000Hz), 14.81MiB
140 m4a audio only tiny 129k , m4a_dash container, mp4a.40.2@129k (44100Hz), 17.71MiB
I'm on a mac here (but not an iOS device); in Firefox, the youtube web app uses stream #251 (as visible in the "stats for nerds" right-click; in Safari it uses #140, so you are indeed correct!
Again, thanks for the excellent video.
from Argentina I say THANKS! TU CONTENIDO ES BRILLANTE!
This is so cool. As a former TV audio mixer, this just rocs. And, by the way, the square wave sweep reminded me of some unknown 60s era Saul Bass movie credit animation.
Is anyone else not able to hear the 10kHz sine wave at all, and the 7kHz sine wave only barely? I really hope it's something in my hardware configuration, rathen than me having lost that much hearing. 😢 (FWIW, I'm on a Framework laptop on Ubuntu GNU/Linux... could probably go into more details on what audio system, but don't know off-hand.)
Edit: P.S. In the sweep, the audio cuts out for me at about 7:46, so whatever frequency that is.
Interesting! I never thought about *not* having a low-pass filter (to cut out higher frequencies) in front of an AD converter - because it would sound really, really ugly! (There are some tricks to get away with weak analog filters, but they involve oversampling and digital filtering, aka signal processing.) As an engineer it was always clear that you would need this high-cut filter. And on your 5.2 kHz demonstration - I can only hear the switching itself. There's a discontinuity in many switching events, but when the switching was continuous (on crossing the zero-line I'd guess) I couldn't hear it at all. Yes, my hearing is already that bad (but nearing 60 this is quite normal).
Where it does make sense to use higher sampling rates (and 24 bit) is in audio processing, because higher "resolution" (in amplitude and time) makes it easier to manipulate signals. Same as in image processing: It makes perfect sense to use 16 bit per channel (or even 32 bit float) images in high resolution when doing advanced image editing, but the end result could be distributed in much lower resolution with just 8 bpc (this is common practice); yes, there's still a chance that you run into issues with color management, but there are ways to deal with that on the "output" side.
I'm 47, suffer from tinnitus and use $25 wireless Logitech headphones but even I could hear the difference between the two 5.2kHz samples. The aliased one sounds 'dirty' to me. Not sure what this proves though.
Maybe it proves your wireless Logitech headphones arent very good? Try it on a speakers...
Yup, for audio processing it always makes sense to use float. You get
* A higher dynamic range (145 dB vs 96 dB), which gives you more headroom before clipping
* Simpler (and possibly faster on anything newer than a Pentium II) code when working with normalized range -1 to 1
For image editing, it depends on your purpose, but VFX requires the higher dynamic range of 16 or 32 bits per channel. Editing for a website or printer may work with less headroom.
This was amazing, fantastic explanation!
I've been curious about this for a long time..
Crazy technical and interesting. I learned more bout audio encoding than I ever knew. And I learned how little I know.
It is a similar story with image resolution where people claim that a 4K TV is way better than their old 1080p TV - but the difference was not really due to resolution but size.
You need a rather large screen at a close distance for any visual difference between 1080p and 4K, and now with 8K.... you need like 60" monitor at 1m distance for there to be any visual difference.
44 kHz 16 bit is enough for humans - for us that can be called "perfect". There has not been a single human that has ever shown to be able to accurately hear anything above 21kHz. For the bit-depth - kinda debatable as without noise-shaping, dithering or anything like that this is "only" ~96 dB SNR - so from the faintest sound perceivable (you'd need to be literally dead to not have the sound of blood flowing through your veins) up to soundlevels that cause permanent hearing-damage with just half an hour of exposure per day.
You could literally have an audio-track with the drop of a needle and being on a busy road - and both things would be fully captured.
Doing ANYTHING but listening to the audio is a different beast.
Just image taking a photo with a resolution just high enough that it looks perfect to you (doesn't even matter what actual size/resolution) - ok. Now take the same image and stretch it to say 5 times the size - oh, it suddenly is no longer perfect.
When you want to manipulate any data, be it image, sound, or anything else - you end up introducing distortions and losing some precision, so you'd better make sure that the initial data you got is way more than you actually want to deliver at the end, and do all your manipulations with as much USEFUL data as possible. With audio that often means capturing >20bits of depth at 96 kHz - which allows you to squeeze and stretch the sound a lot before any unwanted distortions become audible. Useful as in like this video is showing the problem of aliasing.You do NOT want that in your data so you better just use >96kHz during manipulation and then filter all the high-frequency stuff out before it ends up getting folded into the audible range. Cause once it is there you are not getting ridd of that anymore.
I do know that MP3 compression cuts out at 16khz because of the way the standard was designed. Also, I think some devices start to roll off frequencies in the last octave or so, so even if you have speakers and ears that can reproduce and perceive those frequencies, your hardware may be reducing their amplitude.
Not all MP3 encoders cut at 16KHz though . LAME Encoder does not beyond a certain bitrate. And anyway, TH-cam does not use MP3 compression. It uses either AAC or Opus.
48kHz is enough for playback, usually the steepness of the LP filter to bandwith is 45 to 55% of the sample output, it gives almost non-existent phase errors and ripple within maximum audible range. 96kHz can provide benefits when it comes to pitch shifting of high frequency information and lower latency. Softer filters can also be used with 96kHz and possibly less ringing and phaseshifts, but it is rare for different applied filter to be used between different sampling frequencies, in addition, higher sampling frequencies often results in an extra component-instability.
Basically all DACs and ADCs use delta sigma modulation with multiple bits (often 2-6bits). This involves a sampling frequency of
several MHz, but they utilize another more effective type of modulation for the purpose, this modulation arise from a sawtooth that follow the analogue tone frequency which provides a pulse density/width that is digitized with 1-bit for a bitstream, partly and continuously analog for a certain period and compared to the analog input signal with differential circuits which results in different high frequency pulses designed to add or remove the energy in certain frequency bands, the distorsion energy is in this way increased in higher frequency bands and decreased in lower frequency bands, which is continued until the noise is satisfactorily reduced within the desired frequency band, this is done in several steps by several circuits, divided by amplitude for more effective noise shaping while maintaining stability, after this the process of demodulation and decimation takes part from several 1bit PDM bitstreams divided by amplitude to one 24bit PCM, with applied digital filters and downsampling.
Nice video. There's an interesting tidbit that I have noticed with the whole 44.1 v 48.8 - you need to be consistent even though it doesn't matter. If you playback a 44.1 file inside a 48.8 project (or vice versa) you get pitch drift phenomena. This is why consistency is key even though sampling rate doesn't matter.
The real key is "mastering for your platform" as it were. Understanding the playback limitations of youtube and making sure you sort your audio for playback. Tis redundant to do all your audio at 88/-8dB if youtube is going to down sample to 44.1./ -15dB.
What I do and it works flawlessly is use 48 k 16 bit with a limiter ceiling of -0.1 dB, with a compression of 2-3dB and I do not get any compression on my tracks and its actually below the loudness maximum. It seems like it actually counteracts the effects from yt
I’ve been recording and mixing for years, and the only time sample rate matters is on the recording. 24-bit audio @ 192KHz is indistinguishable from analog tape, and if you can record your audio at that sample rate, that will give you the option to master it for any format you want, with the least amount of degradation to the sound. For folks that understand how film and video work, it’s similar to folks that are shooting video in 4k if they plan on making 1080p content or in 8k if they’re planning to release something in 4k, because even though they never plan to release anything at that higher resolution, it gives them more options for cropping the footage and doing other stuff that you wouldn’t be able to if you shot video at the intended output resolution of the finished product. Applying high, low or bandpass filtering to audio is essentially the same as cropping an image, and the more detail you have to crop, the better it’s going to look or sound. Just think about an image, if it’s the size of the file you want the final output to be, and you decide to trim off the edges to reframe the photo, and then if you increase the image size so it matches the output resolution you started with, then you’re gonna be looking at something larger and far less detailed and blurry than you would have if the image had started out at a much higher resolution.
I will be the first to admit my recordings are all at 44.1KHz or 48Khz, but that’s because I couldn’t afford the hardware (or it didn’t exist when I made the recordings) so the end results that I got with those mixes never sounded as clear or crisp as the stuff you hear that’s been stamped with the official “Mastered for iTunes” label.
Another interesting topic I think that builds on this lesson would be to discuss the process of dithering when mastering audio. Some folks might be surprised to find out that the best sounding digital masters deliberately introduce white noise into the file as part of the mastering process, especially when downsampling from something like 192KHz audio to 44.1KHz.
Okay I've gotta do a video on this because that analogy is completely wrong.
Also analog tape has less specs than 16bit 44.1.
@@FilmmakerIQ analog tape has way more dynamic range and headroom than 16-bit audio at 44.1KHz.
That's why everyone was still recording to analog tape, well after the CD, DAT and other forms of digital audio were invented. Believe me, they didn't do it because it was easier or saved money. Maintaining an analog recording studio with massive tape reels was an expensive and fiddly endeavor, so anyone running a studio back in the day would've jumped on the latest technology if it would've simplified that process. It wasn't until everyone eventually converted to digital recordings in the 2000's, when sample rates and quality of studio gear were high enough to record 24-bit audio, at sample rates well above 44.1KHz.
You don't have to like my analogy, because it's not exactly perfect, but people know more about editing photos and videos these days than they do about audio--and they just need something they can wrap their heads around, to know why people choose to record at higher sample rates than what we hear as the finished product.
However, my explanation and analogy are not wrong--let alone completely wrong. I not only studied digital audio in college, I also worked in radio, and even helped teach a class on digital audio production. The professor wasn't the most skilled at recording and editing, because he came up in the analog era, and just used the computer like a tape deck and did everything old school. So, I helped him teach students one-on-one, how to actually use a DAW in one of the studios, so that they could record their assignments.
I still record, mix and produce music for myself and others in my spare time, so I might not be a TH-camr but I know what I'm talking about, and I'm not sure you know what I'm talking about, because if you did, you wouldn't call me "wrong" and use that as the catalyst for making a video to correct me. I have no idea what your credentials are or experience in this field is, but I got the impression that you're someone who has some technical understanding, and just learned all of this shit in the process of making your video, and you really don't have more than a decade of actual knowledge.
It's funny, because this video was actually lacking some pretty basic information about the topic. You didn't even explain why someone would want to record anything at 44.1KHz, when there are much higher sample rates. You brought up using 48KHz as the sample rate, but didn't explain where that comes from. I think your viewers are even more ignorant than you on the subject, and might not know that CDs happen to use 16-bit @ 44.1KHz, and that DVD audio uses 48KHz.
For anyone else reading this that actually cares to learn something, CD's compromised on the sound quality, because they couldn't make players that played back compressed audio without making them super expensive, and that was the highest quality sound they could use and still fit an entire symphony onto a single disc. (Audiophiles are historically fans of classical music, and when you're launching a new music format that's only going to be be affordable to the wealthy and/or those with "discerning taste", you kinda want to make sure you can cater to them a bit. It was a huge selling point for anyone sick of flipping albums to hear the second half of the performance, and I'm sure that without the support of those snooty weridos, CDs might never have taken off.)
DVD's used 48KHz because it was the base sample rate used by DAT, which was one of the original digital recording formats, and because it was what people were using in studios, it got adopted by Mpeg2, DVD and digital broadcast formats. It only sounds slightly better, and it's almost imperceptible if someone uses proper dithering when creating the final audio file. It was simply a matter of compatibility with existing pro-audio equipment, which also supported higher sample rates like 96KHz. Good studios would record at the higher sample rate, and then downsample their work for the finished product. DVD-A used 24-bit audio @ 48KHz, because they were purely an audio experience, so they could use up more of the space on the disc for higher quality sound.
Newer formats like BD (and the new dead HD) DVD used 96KHz, again, because of the larger amount of space available. Which is still really good sounding, but it's still only half the sample rate of the highest quality digital recordings, which is 24-bit @ 192KHz. There may eventually come a time when there's equipment that can capture audio at a higher sample rate, but even the obnoxious audiophile community that would typically support anything that's higher quality, just for the sake of it being measurably better (even if it wasn't perceptibly better) hasn't been pushing for anything higher. Turns out, even they can't tell the difference between 24-bit audio @ 192KHz, when compared to a super clean analog recording, from a well maintained deck with Dolby noise reduction. If you don't overdrive the tape, or have it distort in the upper frequencies, and you play it back on equipment that doesn't have any ground hum, it sounds fucking amazing--and so does 24-bit audio @ 192KHz, which I guarantee you've never heard in your life. Unless you're in a legit recording studio with high end gear to hear the difference, you can't tell.
You can absolutely hear the difference between analog tape and the much lower quality audio used by CDs, because the dynamic range is reduced to 96 dB (which is a non-trivial 48 dB less than 24-bit audio) and more importantly, it's less than the 110 dB range of analog tape when recorded using a Dolby SR noise reduction system. 32-bit audio hasn't really taken off, because 24-bit audio already overkill with a wide dynamic range of 144 dB, which is already higher than the theoretical dynamic range of human hearing, which taps out at 140 dB--so 192 dB is just needlessly wasting storage space. That said, 16-bit audio with proper noise shaped dithering can have a perceived dynamic range of 120 dB, but again pure analog tape also has an effectively infinite sample rate, so that combined with the actually greater dynamic makes it sounds better than CD audio.
Honestly, I'm not even sure what the point of your video even was, because TH-cam isn't the platform capable of even showing the subtle differences between audio using sample rates of 44.1KHz and 48KHz, especially when TH-cam already filters out everything over 15KHz. You may not be able to hear sounds over 15KHz, but I still can, and at this point if your hearing is already damaged enough to the point you can't even hear a sine wave between 15-20KHz, then you're clearly not the guy who should even care, because those sounds are for you, and I would agree that you shouldn't invest in anything better than CD audio, because it's completely lost on you.
For those of us that actually understand digital audio, and have fully functional ears that can hear everything from 20Hz to 20KHz, there's plenty of reasons to record or listen to music that's using a higher sample rate and bit depth than CD audio. Of course, that's just a simplified explanation of some of the vast amounts of information your video was lacking, because I didn't even discuss the bit rate of digital audio (mostly because we were discussing uncompressed digital audio, and it's only when compressing audio files that bit rate becomes an issue, because that's where the sound quality gets drastically reduced.)
But hey, you're just a guy who doesn't really have a background in this stuff, so I don't expect you to talk shop on the fine points of all this. Those of us who work with this stuff for real actually need to know about how our recording medium actually works, and we have to know how audio works, so that when we're mixing it for your consumption, that it sounds right--so we don't expect laypeople to know how the Fletcher-Munson curve affects our hearing during the process of recording and mixing, or on playback over a sound system of any kind.
So, while they title of your video isn't wrong--the work you showed to get to the right answer is, because nobody in the history of the music and recording industry, or tangentially film and television, ever said 44.1KHz was optimal. The reason it's not optimal, is because the low pass filter is still attenuating frequencies within the audible range. So when Harry Nyquist figured all this shit out, he was merely pointing out the bare minimum that audio had to be sampled at to reproduce the full range of human hearing. He wasn't wrong, it's just that there's no perfect low pass filter that exists, capable of attenuating frequencies outside the range of human hearing, without attenuating audible signals. So, even with the best possible filter, you're still going to cut things off well above what we can hear, just to make sure nothing gets cut.
In the real world, I typically don't allow my mixes to contain very much above 15KHz, because as you've noted, it's not supported by TH-cam, and most people won't hear that stuff anyway. However, I do allow reverb to contain as much high end content or "air" as we call it in the business, because those are the subtle things your ears will detect and miss if it's unnaturally chopped. It's like bad lighting in a poorly edited photo, or CGI--you have to be an expert to know what you're looking for to see it, but we instinctively know when those subtleties are lost and it will seem wrong or fake.
Anyway, good luck with your channel. Hopefully you spend some time learning and doing some research before you go off and make something that's going to confuse or misinform your viewers.
I'm not reading this novel especially when you start with a completely false statement that tape has more dynamic range... There's no point when you're so off base from the start.
@@FilmmakerIQ Maybe if you read what I wrote, you'd actually learn something smart guy. Feel free to look it up. Analog tape recorded with Dolby SR noise reduction, which was the standard in professional studios, had a dynamic range of 110dB, while 16-bit digital audio has a dynamic range of 96dB. I'm not talking about cassette tapes here bud, I'm talking about 1/2 tape used in professional studios to make multi-track recordings.
So, please just STOP with your nonsense, because you don't know what the hell you're even talking about. You looked some things up on Wikipedia, and think that you're a professional because you make TH-cam videos.
How many professional studios have you been in that actually had 1/2 tape machines? I guarantee you've never even seen a 1/2 inch tape in your life, let alone heard one played back over the studio monitors in a real studio.
Clearly, you seem to fancy yourself a "Filmmaker" and not a recording engineer, or producer--so why don't you go make your silly little videos about lenses, or light meters, because you don't know shit about digital audio or recording.
4:51 TH-cam actually converts to 48kHz.
The reason is that the developers of Opus audio codec decide to support 48kHz but not 44.1kHz. (They have an FAQ for this.)
But if you watch TH-cam on an Apple device, TH-cam will deliver an MP4 format with AAC audio codec, that will be either 44.1kHz or 48kHz.
Well when I download the video from my own TH-cam Studio - it's 44.1 - so I think most everything is delivered at that sample rate and it conforms with everything I've read so far.
@@FilmmakerIQ That's probably because you are downloading it as MP4 format (H.264+AAC). For Chrome / Firefox / Edge streaming, TH-cam defaults to use WebM format (VP9+Opus), which uses 48kHz sample rate.
Ah
Thank you for this excellent explanation. I am an audio engineer for a living, for many years I used a digital mixing console (a Panasonic Ramsa WR-DA7) which can operate at both 44.1k and 48k. I was always able to hear the difference between the two even when only recording voiceover, which I've done a lot of. I also have read Lavry's work in the past, when he previously insisted that there was no difference whatsoever between the two sampling rates and no need to ever use above 44.1K, and knew something had to be wrong. I also have used high sample rates, particularly 96k, and agree that they require a LOT of processing power, which translates into a lower track count and fewer native plugins that can be used, which makes those high rates inconvenient at best, at least for now.
Coincidentally, it always seemed to me that the best compromise between computing power and the audio problems I was hearing would a sample rate of 64kHz (since in computing we like to use powers of 2 as factors, mostly because it's easy to clock-divide by 2 or 4, etc.). It's interesting that Lavry's proposed sample rate of 60K is very close to my own thoughts, and personally I'm glad to see that he has come around from his prior position that 44.1k was just fine.
I also knew that when using wave generation software just like you illustrated in Adobe Audition, when generating a 16K sine wave at 48k sampling rate, the result is a wave with only three data points per cycle: one at zero, one near the peak, and one near the trough - which is of course a 16K TRIANGLE wave, not a sine wave, albeit a someone oblique one. Yes, those overtones are outside the range of hearing, and yet you could hear that something was wrong - it definitely was not a sine wave that was playing back. Aliasing is exactly the problem - there was no anti-aliasing applied to the data generated by Audition or any other similar program, or any anti-aliasing generated by the WR-DA7 that was outputting it and that the computer was digitally connected to - and there still isn't today on most high-end professional equipment. So there's just no question that the VAST majority of digital playback equipment out there simply applies no anti-aliasing filtering at all and never did. To my trained ear, this has been quite annoying indeed.
I also remember the very early days of CDs, and the first CD player I bought, a Sony. I didn't like it, because the top end sounded "brittle", which was a common complaint in those days. And in fact it wasn't until CD players introduced "oversampling" that the problem went away - basically moving the aliasing frequencies so they are all hypersonic, by extrapolating and outputting one or three "samples between the samples" caused later generation CD players to sound significantly better.
The bottom line is that Nyquist really doesn't handle the concept of aliasing very well, as you aptly point out. And what is needed, particularly for audio production, is a sampling rate that allows all of the alias frequencies to be moved above the 20kHz threshold of hearing. Computing power is a temporary problem, so I have a feeling that in the not too distant future all professional audio production will be done at 96k, even though we don't really need it to be quite that high. Thank you for what I believe settles this issue hopefully for good.
Sorry but three sample points do not produce a sawtooth wave, it's produces a sine wave. You don't connect the dots with straight lines, you draw a sine wave through the dots.
A saw tooth wave has integer harmonics, it would need to be constructed with many sine waves which works probably be above Nyquist if the wave is only 3 samples wide.
Lastly, I don't think you understand why Lavry suggests 60. He stated in the paper that 44.1 is if not perfect, close to perfect.
@@FilmmakerIQ I think you misunderstood what I said - "triangle", not "sawtooth". And I wasn't referring to an actual triangle wave, I was only referring to the shape created by the three points if you connect them, which isn't exactly what's going to happen in the DAC anyway, because DACs don't transition from one point to the next in any smooth way, they simply jump to the next value. The bottom line is that for a 16kHz sine wave, only three data points are created, and only three data points are going to be output by a DAC. The DAC itself is not going to "draw a sine wave through the dots". It's just going to output stairsteps at three data points and that's it (unless of course we're talking about oversampling, which would instead use spline interpolation or some similar approach to approximate where the additional samples would be. But to my knowledge no production hardware - such as Pro Tools or UAD Apollo etc. - utilizes oversampling on output).
For example, if you create a 16kHz 24-bit sine wave at -3.0db, each cycle will have exactly three points - one at zero, one at -4.2 db above zero (sample value 5,143,049) and one at -4.2 db below zero (sample value -5,143,049). The DAC isn't going to transition smoothly between those points, it's simply going to output a zero for 20.83 milliseconds, followed by a sample value of 5,143,049 for 20.83 ms, and then a sample value of -5,143,049 for 20.83 ms. If DACs did indeed "draw a sine wave through the dots", then aliasing wouldn't be a problem, because the DAC itself would be reacting perfectly to the INTENTION of the data - just as analog tape used to do. But the problem is of course, as with many things computer-related, DACs simply don't do that. They just output a voltage corresponding to a number for a specified number of milliseconds as dictated by the sampling rate. It is of course this behavior that causes the alias frequencies to result, as you have very correctly and articulately described.
As for Lavry's 60, correct me if I'm wrong, but my understanding is that the advantage here is twofold: 1) it pushes the vast majority of alias frequencies into the supersonic range, making them a non-problem, and 2) it provides more headroom for creating anti-aliasing filters, should a playback hardware developer choose to do so, which sadly, very few ever seem to. My point was merely to essentially agree with Lavry, but I'm suggesting that when taking into account the fact that digital hardware designers prefer to do things in powers of 2, that a better choice for "optimal sampling rate" should be 64kHz specifically. Personally, I wish hardware developers provided that option in addition to 48k and 96k because that's what I would use for production instead of 48k or 96k. It would be quite a good compromise.
That's completely incorrect.
Yes the DAC does draw a sine wave because it's coverting it back to analog. The speakers cone is a physical object and it moves through space with inertia, it can just jump to each sample point and hold for the next one.
So If you produced three samples you will not get a triangle, you will get a sine wave. Watch Monty's video in my description. Samples are not stair steps, they define the points of a sinusoidal wave. This is the key to Fourier transform and Nyquest theorem.
Aliasing has nothing to do with stair steps (because there aren't any stairsteps). Aliasing is the result of frequencies that are higher than the sampling frequency.
Your understanding of Lavry's 60 is incorrect as well. It doesn't push alias frequencies in the ultrasonic... You don't push alias frequencies... it provides enough headroom for anti aliasing filters to work without affecting the audible range.
Lastly clock speed has zip to do with binary. 64khz is meaningless because time is irrelevant construct. Look at the history of computing you will not see any clock speed correlating with any binary numbers... Because it's been simply not how that works...
Also 64khz isn't a binary number. The closest is 2^16 which is 65.636khz
I'm glad someone GETS IT, regarding aliasing. I've had this argument with so many tone-deaf wanna be engineers that do not understand why percussion sampled at 44 Khz sounds like so much white noise but sampled at 192 Khz sounds like percussion instruments.
>> percussion sampled at 44 Khz sounds like so much white noise but sampled at 192 Khz sounds like percussion instruments...
Very informative and clearly a ton of research went into this!
I once studied electrical engineering with a bit of signal processing but then went into energy (the big kilo volts stuff) and finally computer science. And in computer graphics I was right back in the fourier transform again, because yep.. its exactly the same thing in computer graphics. And while all the theory has been ages ago so I really need a refresher myself, I find this discussion everywhere:
This debate of higher sampling rate, completely ignoring aliasing is going on in graphics just as well. Just look at all the "graphics mods" for games that upload huge textures for absolutely everything and then change the engine settings so bigger textures are being sampled for small objects, then wonder why performance goes down the toilet while aliasing artifacts appear and make things look worse instead of better. Its almost as if game and engine developers know about these engineering principles and optimize for them. Like... as if they know what they're doing :D
Same goes for mesh level of detail too btw. Rendering a triangulated mesh is nothing but sampling. The sampling rate is your screen resolution. If you make an insanely detailed mesh that will show up small on your screen, you'll get mesh aliasing which will also look crap. People always thing smaller textures, mipmaps, and LODs are only used for performance, and if my PC is kick-ass, I should always load everything at the biggest size (bigger/more is better), completely ignoring signal processing principles and aliasing.
Fascinating analogy
@@FilmmakerIQ at the end of the day it's all about sampling at a limited frequency. Doesn't matter what the data is
Glad to see you dug in a little more to check out the difference between the theoretical "ideal", and what actually works in practice. There are still, of course, many other variables, but the answer to "which sample rate?" is always "it depends". Jumping back to the last video, my comment was only that I found it interesting that the original concept sample rate being 60K was almost a happy accident of ending up with that ideal range suggested by folks like Dan Lavry. It would likely have radically changed the course of digital audio development as we all know it.
Great video. LOVE the Monty video. It is awesome in it's clarity.
13:10 "If we had an infinite sampling rate ..." Isn't an infinite sampling called 'Analog'? :oP Kinda defeats the purpose of digital which can be MUCH smaller in storage size.
16:50 I could pick out that the frequency was transitioning from sine wave to square wave, but the tone was indistinguishable to my ears. (yes I listened to your linked video)
Thank you for the time and effort to produce this video. It is appreciated!
Same, I can't tell the difference between those last 2 waves but I can hear the transition. Almost sounds like there's a short crossfade
There is a short cross fade. I couldn't get the waves to cut exactly at the same amplitude so it was either a crossfade or a pop on the switch and I chose the cross fade.
Analog is not "infinite sample", your tape has a frequency range based on how fast you run it, normal copper wires will struggle with RF frequency, even the air has a frequency range because it's made of individual molecules. Natural is more "digital" than "analog" in the sense that energy comes in discrete packets because of quantum mechanics.
One time you said the right words "four ninety-three" while the numbers on screen said "439". I was not expecting to hear the difference between 440 and 439!
Yeah, I dyslexia
Wow this video is so interesting!~ I was (thinking) sure it's not just twice the frequency because if i downsampled some audio file to just 22.1 kHz (after checking that the treble was well below 10kHz) to save space on my CDs, it just didn't sound right, almost like sandpaper trebles. Well, now i know, thanks to your helpful explanations. Harmonics do affect the timbre of the sound, even though we can't hear them directly.
Another engineer here. In my laziness I was edging into camp 2. Thank you for showing me the error of my ways, and reminding me of what I knew 40 years ago.
Nyquist's sampling theorem is correct, and it assumes a perfectly band-limited signal. You band-limit a wider bandwidth signal using a low-pass (anti-aliasing) filter. Precision analogue filters can be expensive and difficult to create. Further, if you have a sharp transition in the filter, you introduce artefacts which are visible on transients in the signal, and might be audible, although I really don't know. To allow an easy-to-implement gentle roll-off filter without attenuating your wanted signal in the passband, you need a lot of headroom.
BTW, to me this is all theoretical. As somebody of retirement ago, with loud tinnitus, 20 KHz sampling rate would be just fine.
Thank you for also talking about and checking the audio uploaded to YT. Years ago, some science documentary on NatGeo or Discovery was being broadcast on TV, explaining how adults can't hear above 16kHz and to test it out with your friendly adult (or parent) nearby. To my shock, i myself couldn't hear the 16kHz wave they were "playing".
Not wanting to age so quickly (and good thing i had a computer as well), i generated a 16 kHz sine wave, and i was _so relived_ to know that i could hear it lol. And sadly the TV didn't have a comment section like here to complain.
Rant:
Then, wanting to check "how old i was", i tried with higher frequencies, and found out that i couldn't hear more than 18 kHz. Still not wanting to age so quickly, i was sure something was amiss. Then i found out. My speaker system itself had a frequency response range from 18 Hz to 18 kHz. argh lol. I bought better speakers with response up to 20 kHz and sure enough, i could hear it.
This just makes me wonder. Do we really "age" out of this frequency or do we just "waste it away" because we don't use it any more.
I still practise hearing 18 kHz (with good speakers/earphones) every now and then. And also have a save file on my phone to test out earbuds; before i can buy them and it making me lose my hearing range.
P.S: i couldn't hear 20 kHz sine wave. I don't know if it's my limitation or the speaker's. Until i can get a volunteer who can blind test, i'll still be searching. (i'm not sure earbuds/speakers produce enough power anyway at the 20 kHz frequency, to use resonance on other objects.)
It has to do with the hairs in the cochlea of the ear. The ones responsible for the highest frequencies are in the smallest part of the cochlea (they have to vibrate the fastest). As we age, the cochlea becomes more rigid and inflexible to those high frequencies, and that's why we lose the high range.
@@FilmmakerIQ oh my…
Great job. Thank you for this interesting info.
It’s a great video! Many thanks.
Excellent, excellent video! It does an excellent job of cutting through the woo-woo and uninformed opinions out there. You even corrected a misconception I had of aliasing, namely that the aliased frequency “wrapped around” to the low end of the spectrum vs being “folded back”! That is, I thought that a 25 KHz signal samples at 48 KHz would appear as a 1 KHz one, not 23 KHz. (I was going to correct you, but I couldn’t explain the Audition display, so went and looked it up. Duh…) Thanks for correcting a misconception I’ve held ever since my Signals and Systems class in undergrad :-)
One note and a question for you and/or the audience though: I had understood that one of the problems with CD-grade audio wasn’t the potential for aliasing so much as it was the “brick wall” low pass filter you had to use to allow 20 KHz to get through but cut out anything beyond 22.1 entirely. AFAIK, filters with such abrupt frequency cutoffs mess with signal phase well down into the audible range. Is this the case? My knowledge of such things dates back a good 35 years, so it’s possible that modern technology has found a way around the problem. (This would of course be a further argument in favor of a 60kHz sampling frequency, you could use a less-abrupt filter that wouldn’t impact phase relationships in the pass and.)
==> So my question: With current technology, can you get a flat passband, sharp cutoff and linear phase all at the same time (in the analog domain)?
Thanks again for the fantastic video!
Yeah I got that misconception as well... But there's another issue that shows up as low frequency "beating" when you start to get close to the Nyquest limit but still under. I don't know what it's called but it's worth looking into.
The brick wall limiter was definitely the issue, I think it's particularly prominent in a mastering scenario... After you add up all the little tiny decrease in high frequency in all the audio stages plus all the filter chains... it becomes substantial enough to notice.
In the old days that was an issue. These days ADCs are sampling much faster internally and downsampling on the output, so the brick wall filter requirements are much less steep.
Comparing a 10khz sine wave to a 10khz square wave, only proves you have 10 more khz left to hear the aliasing artefacts of the higher overtones that you cannot hear. You are not hearing the 30khz overtone. You are hearing the aliasing artefacts at 20khz. That's what is making the slightly crisper sound. Harmonic distortion. One of the reasons why people prefer analogue sound over digital is because it sounds warmer and more organic. But that's not how the real music sounded. The digital version is a perfect replica of the original sound. The analogue version suffers harmonic distortion which is pleasing to the ear. The Minimoog analogue synth had such a great sound because Bob Moog miscalculated the resister values in the filter, causing the sound to distort in a pleasing way.
Good explanations of complex information, great demo ideas
This was great! Thank you. It seems that ultimately, this matters for producing, but not at all for listening. By time I get it to listen to, those high frequencies should have been long filtered out.
However, I wonder how many PC sound systems (windows, alsa, openal, etc) bother to apply a low pass to signals they downsample in order to avoid aliasing?
As someone who has repeatedly defended digital audio, including debunking false claims, I've been posting that Monty video for years. Great stuff that. Dan Lavry's White Paper has also been quite informative. I own a Lavry AD11 as well as a DA10 and record at 24/96 kHz most of the time for my songs, a handful of which you can find on Soundcloud. You can almost make out the AD11 under the desk behind my guitars in this video: th-cam.com/video/HwNNdziJbbw/w-d-xo.html.
I'm so happy I watched this video because a while back I watched a video which contained a sweep up to 20kHz and noticed that the sound cut off abruptly at 16kHz. I was unsure whether the culprit was TH-cam, some other link in my audio chain or if the limit of human hearing is experienced as a hard limit (intuitively, this didn't seem right).
I really need to have my hearing properly tested. I'm 38 now and I was "still in the game" comfortably up to 16kHz and I can definitely hear below 20Hz (I think it was somewhere around 16-18Hz when I stopped experiencing it as sound when I tested a while back). My mother told me that when I had my hearing tested as a kid by my school, they said my hearing was above average and that I could hear tones most couldn't. The funny thing is, the reason my hearing was being tested was because they thought I was deaf. My brother used to throw tantrums at home and I learned to "tune out" sounds I found annoying. Turns out I found the teachers annoying too.
Signal Engineer here. You also just made a brilliant explanation of Gibbs phenomenon in less than 1 minutes too!
Well done, brilliantly explained
Great video. Equally entertaining and informative as always!
GREAT video and explanation !
I won't mention the 439 vs. 493 😃
BTW, I used to work with Dan. Very smart guy. Nice guy too :)
Very informative, thanks John. Reminds me of my electrical engineering classes back in school :)
It's the other way round. The Gibbs phenomenon shows up, when there is NO aliasing. It is the result of running through the antialiasing-filter in a DAC. The antialiasing-filter rolls off all frequencies above 20kHz and the result are the squiggles around the edges. A perfect square wave has an infinite number of overtones, and when you cut those off with the antialiasing filter, the result is a band limited square wave, which exhibits Gibbs phenomenon.
Yeah I over stated the Gibbs part
When the film guy gives better explanation about sound stuff then actual sound guys...
Also, sound was recorded with distortion in the analog era, and now we crave that distortion, so why should we go against THIS form of aliasing distortion?
Don't bother. Life is Short. Record in 44.1k.
Great video :) Cheers!
Well, there are music styles that use aliasing as a stylistic element. But one thing dynamic distortion has going for it is that the frequency content it adds is harmonically related to the input and even with intermodulation (where there's a risk of losing this feature) you have some intervals where it remains so - hence the popularity of power chords, where the added frequency is an octave below the root note. So I doubt it will become something with popular appeal and your reasoning reminds me of the old "we went from triads to tertrachords and now romanticism is regularly using quintachords and sextachords, so obviously the next big thing will be 12-tone music".
The 16k cut off is probably the encoding setting TH-cam picked for the codec, not some hard filter they applied. Most perpetual encoders (AAC, MP3) will throw away high frequency content. I mean, it probably wasn’t a nefarious decision by TH-cam.
Of course it wasn't nefarious... But it was one annoying obstacle in trying to demonstrate this concept.
And then it's only on SOME of the streams, not all...
the issue is more to do with the fundamental frequency of the instruments we use and the way we construct music.
we only have drums in high frequency or transients of vocal sibilance.
we don't need to hear anything about 10khz ( except harmonics)
Absolutely fantastic work as usual. I'd love to see how this whole thing compares to analog sound though. I've only ever worked digitally before, but I've always been fascinated by the physical manifestation of sound and it's analog recordings.
Sound people that slow down sounds for sound effects etc, say they need more room like 192k. It's kinda like slowing down 120fps to 25fps in video.
I've heard that too -- but unless they have special scientific microphones designed to capture frequencies above human hearing, I'm not sure it matters.
I don't think it's about frequency width. It's about stretching the entire recording. When you stretch it makes everything thinner. Like pulling rubber band. Signal would get less resolution. Less data points. Through entire range of frequencies.
This guy talks about pitch and time stretching uses 96k recording. Sound effects for movies. th-cam.com/video/3o-Se8qQdDk/w-d-xo.html
Reasons for 96k recording. Time and pitch stretching. th-cam.com/video/3o-Se8qQdDk/w-d-xo.html
Problem is the rubber band analogy doesn't work because Nyquist does not work that way. Using 24khz, the audio would be EXACTLY the same as 48khz in every respect BUT only up to 12khz. So it's not that you have more data points - that doesn't matter when the audio is sent back to analog in the speakers.
I suspect the reason 96hz would be used for slow downed effects is the same reason I discussed in the video: headroom. With 96kz there's about an octave and change you can manuever around in without running into a Nyquist limit that dips into the perceivable range.
You skipped an important point: the steeper a filter, the greater the induced phase shift introduced to the signal. You can get around this using different types of filters, but those introduce other temporal artifacts (such as pre-ringing with linear phase filters). And crucially, just as an anti-aliasing filter is needed at the analog input to a digital system, a reconstruction filter is needed at the analog output from a digital system. Therefore, the primary advantage of higher sampling rates in audio is that one can use less steep anti-aliasing and reconstruction filters starting at higher frequencies well outside the audible range, but well below the Nyquist frequency-all while generating fewer artifacts within the audible range.
>You are watching this at 44.1 kHz, because that's what TH-cam converts every video to.
That's not entirely true. TH-cam has the audio available in two formats, AAC and Opus. For AAC it uses 44.1 kHz, but the Opus audio is in 48 kHz, as Opus does not support 44.1 by default. Which one is used depends on the platform; in most web browsers (except Safari) it usually uses Opus.
And of course, it might be converted to 48 kHz while being played back, and I'm not sure what they do to it internally, if they convert it to 44.1 and back to 48 for Opus.
And the low pass filter you mentioned near the end might be different depending on the audio format too. Which format did you download it in?
Edit: Apparently most of this has been commented already.
The only thing I can download using official means (and somewhat unofficial means) are the AAC 44.1 versions
@@FilmmakerIQ Otherwise, you can use youtube-dl, which let's you select format
15:20 John (from the future), audio above 16 kHz is possible on TH-cam these days, but depending on the download method you use, it may get re-encoded using an mp3 codec and cut everything above 16kHz. I recently brought up this topic in the comments of a VSauce3 video called "Could You Survive A QUIET PLACE?" because that video includes a sample of an 18kHz tone. Like you, I thought youtube had a hard limit at 16 kHz, but someone advised me in a since deleted comment how to download it in such a way that I wouldn't lose the upper frequencies and I was able to verify that the 18 kHz frequency was in fact there. The audio codec on that video is Opus.
When I download the video from TH-cam's own official means (in creator studio) I see the 16khz hard limit. If I cannot verify that TH-cam isn't cutting off the high end, there's no point in the test when I'm asking people to see if they can hear the high end.
@@FilmmakerIQ I'm sure it depends on the audio codec you used in the video file you uploaded too. TH-cam probably transcodes some but not other formats. Anyway, just thought I'd mentioned that is in fact possible somehow. I verified it using the spectrogram view in Adobe Audition, btw.
Maybe, I only tried uploading h.264 and when that didn't work I tried uploading a ProRes file with uncompressed audio. Both came back with hard limit on the audio.
@@FilmmakerIQ
TH-cam's official means only give you a very inferior 720p h.264 + hard-limited AAC media file. There are less-than-official means that actually allows you to download TH-cam videos at the qualities they're streamed at. That particular program even had their github taken down at some point and it caused a big fuss, but it's back now.
A small error:
The Nyquist frequency is the first one you can't reproduce not the last one you can. Imagine a sine wave that happens to cross through zero right at each sampling point and you will see why.
Yes that is correct
17:23
I can hear the CRT whine type noise
But tbh it just sounds like the 5.2khz sine wave + CRT whine, definitely not a square
That's all it can be...
5.2khz square WILL sound like only 2 waves since next wave is at 25kHz, and it does sound like CRT whine... becasue CRT whine is very close at 15,7kHz (PAL is a bit higher than NTSC) and is a result of line frequency and caused by magnets shoving the electron beam.
Speaking of CRTs. I happen to still hear up to 18kHz and it annoyed me SO MUCH that for so long people just used CRTs everywhere and not cared about the noise it creates... from supermarkets to some movies and tv shows 15,7kHz was everywhere.
Hi John, you're probably just explaining "aliasing" as a phenomenon, but the video title "baited me clicks" and now I'm all confused about if I disagree or not, how dare you! :D
Aliasing requires some knowledge on how to handle/avoid it, but the example with the generated square wave and its mirrored-back harmonics, to my understanding at least, only showed that that particular piece of software that was used had no way of properly handling the calculated harmonics for the chosen sample rate. It's not exactly a real-life use case.
I personally can't see any use for sample rates higher than maybe 48k for simple audio recording/playback (provided that your audio capture device does a good job at band-limiting).
I do see a potential use for higher sample rates (as in "the resolution in which your DAW project operates") during post-processing if you're using tools where you know that they produce aliasing themselves when the project runs at 44.1k or 48k (i.e. plugins that produce extra harmonic distortion and aren't internally oversampled).
Or if you record audio that you want to pitch down an octave or more.
Using such high sampling rates surely is much more demanding on any computer system compared to simple band-limiting, which should be done at the analog-to-digital stage anyway, whatever your chosen sample rate. So yes, we all want to avoid aliasing, absolutely. But using high sample rates isn't necessarily the best way to do that.
Does the "audio capture device" even sound good when recording at a higher rate? Many might support 192k, not all of them work well when doing that, some few cheaper ones might even sound worse. Or is it the opposite and your device *only* sounds good at higher sample rates? Perhaps the high frequency rolloff of your specific audio capture device starts so "early" that when using 48k you can actually hear it dampen the high frequency content?
My point, it's better to know your tools and make a decision based on that.
Also, love your channel!
"Showed that that particular piece of software that was used had no way of properly handling the calculated harmonics for the chosen sample rate." - I think that's the BENEFIT of that software.
It was simple test tone generator not a musical application. The test tone reveals something interesting about aliasing - a topic I've heard about but didn't truly understand until I saw the folding reflected aliases.
If the square wave had been constructed AND then band limited, then I would have never had the true revelation of what aliasing actually is - it would always be this mysterious filter thing that engineers talk about.
@@FilmmakerIQ Oh I agree, for revealing aliasing it was perfect!
Fun fact... People that lived with older TVs with noisy line-output transformers may have developed notches in their hearing at 15734Hz (NTSC) or 15625Hz PAL) although if they are that old they may not now hear much above 12kHz or so anyway (that's me at 63). I remembered this when you picked 5.2 and 15.6kHz for the demonstration. I also wondered how hard that 16kHz wall is that TH-cam apply, and would probably have gone with 5 and 15kHz or even 4 and 12kHz.
If interested, it's also instructive to construct square waves visually using a graphing calculator to help with understanding how each odd harmonic improves the squareness of the waveform, although I guess Audition can do that as well.
Great video, too, by the way.
Hello, apologies for being slightly picky. Electrical engineer here, I work with high speed data converters, >1Gbit type work. You nailed pretty much everything, but I’d like to make 1 tiny (but SUPER important) note. BTW, this mistake is even in some EE text books: nyquist theorem doesn’t say you should sample at twice the maximum frequency in your signal, but rather twice the maximum bandwidth of your signal. This ensures your entire signal falls within the first nyquist zone, allowing your anti-aliasing filter to cut out all unwanted signals. The reason square waves never work well, their bandwidth is technically infinite. Again, don’t want to take anything away from your video, great work!
Edit: I wrote that first paragraph not trying not get to technical, but I feel it leaves a bit to be desired. I don't do anything audio, and I just realized the maximum frequency in an audio signal usually describes the maximum bandwidth (please correct me if this assumption is wrong), making the first paragraph a distinction without a difference to audio folks. However, is it import to make that distinction between maximum frequency and bandwidth because it allows for the use of underdamping to still faithfully reproduce your signal. In my world, I often times want to digitize signals in the GHz range, but if i know my signals bandwidth, i can sample a much much MUCH lower frequency, and filter my output to any higher nyquist zone. In this case I am using the aliased signals to faithfully represent the initial signal. This technique requires pretty fancy filtering, and knowing the center frequency and bandwidth of your incoming signal. Often time we dont have that info.
I remember some article in an audiophile magazine about a study in the early days of CDs.
A recording company recorded a classic orchestra on both reel-to-reel tape and a PCM processor.
When played back to an audience, there was no clear line between the media. Depending on the piece played, the majority preferred one or the other.
The conclusion at this point was that each recording added some specific artifacts to the music, which might benefit one piece, but not the other.
After this, they went to analogue and digital mastered vinyl records and high end tape cassettes on one hand, CD on the other. All of the same performance.
Oddly enough, here the lines were defined more clearly. The digital camp voted for the CD, the analogue camp for vinyl and cassette.
Then one of the technicians had an idea: They went back to the master recordings, but added noise from a blank vinyl record or a blank tape.
The result was that everyone voted for their favoured medium. Vinyl enthusiasts picked up on the clicking noise from the blank record, the tape guys picked up the tape noise. So either consciously or subconsciously, they confirmed their bias.
I wish I could find that study online, maybe someone reading this can help?
Different sample rates, compression methods and bitrates affect music recordings. The artifacts become part of the music and some will prefer the sound of one type over another.
A lot of it also depends on how much care has been taken during production, from recording to mastering to compression of the publishing file.
The audible difference between low and high sample rate might be minuscule, but because more care has been taken to produce the high end recording, the result sounds better.
Now throw in confirmation bias, and everyone will say they are right because ...
I'm probably missing something big, but wouldn't high sample rate be useful for slowed down audio? Kind of like shooting in highres for legroom when it comes to editing and zooming in in post?
I think that would be entirely based on how the audio is slowed down. The thing to remember with Nyquist is that higher sampling rates do not actually give us more information in the frequency covered under the Nyquist limit, they give us a higher Nyquist limit.
@@FilmmakerIQ Thanks for the response. I need to go learn more 😅.
Perhaps, if you had not very good resampling algorithms. But nowadays even a laptop or table can run high quality resampling algorithms.
Thank you. I learned a lot.
Curiously, it's possible to hear frequencies up to at least 50KHz, it has been demonstrated with bone conduction experiments all the way back in the 50s or so; however it was also demonstrated that they basically don't really matter - it's impossible to tell apart frequencies above approximately 16.5 KHz, they all sound the same, and there is some hard anatomical reason for that, i forget. So you may perhaps actually want to capture a little ultrasonic energy, but you can fold it back into the band above 17-ish KHz.
Band limited synthesis of the square wave is a solved issue. I think the simplest way is additive synthesis from sines, which you cover right in this video. Since Adobe has ignored this well known insight, one can consider their square wave synthesizer buggy by design, maybe they made it this way to look good to amateurs, band limited square wave always looks like it's heavily ringing, even though it's not. Unfortunately a lot of algorithms and plug-ins have some aliasing or other sampling rate related issues such as "EQ cramping", either due to limited computational budget or by oversight. So high sample rate intermediates are sometimes good, though should be ever more rarely actually needed as far as DAWs, their builtin effects and generators, and higher end plugins are concerned. Audition probably doesn't have quite that professional an ambition for a silly effect.
Something to keep in mind that most recording devices don't truly have a configurable sampling rate at the lowest hardware level. The reason is that the analogue filter that would reject aliasing needs to be tuned to the sampling frequency, and you don't want to include the same hardware several times, and yet more hardware to switch between those variants, not only for cost, but also for noise and other degradation that ensues. So the internal sampling rate can be for example 384KHz, and often analogue anti aliasing filter will have a corner frequency of somewhere just north of 20KHz. So you have over 3 octaves of filter room, so at 36db/oct filter, there's like 110db of suppression for all the junk. Then the ADC will have an internal downsampling to something more palatable, like 48/96/192 KHz, and these are easily aliasing-free. This isn't entirely how modern ADCs work (keyword delta-sigma modulation), but it's not too unfair a simplified representation. If 44.1/88.2 KHz are desired, resampling happens elsewhere downstream, in a DSP or software, and of course then it's the issue of how much you trust that particular implementation to be low aliasing. Just 12 years ago, it was not uncommon to find fairly low quality sample rate conversion in a major DAW! It's not entirely trivial and fairly computationally taxing to get right. Things got a lot better since. But for a given audio interface, you shouldn't expect 48KHz mode to introduce any aliasing that you can avoid by recording at 96/192.
Besides aliasing, the other potential resampler behaviour trait is phase shift, which nominally isn't audible, but under circumstances can be.
I bet it's not 50 kHz sound you can hear but lower harmonics from that vibration exciting stuff in your body.
@@VioletGiraffe Harmonics are always above the fundamental, not below.
But indeed it has shown that there are no auditory hair cells that correspond to higher frequencies than about 16.5 KHz. And yet there is apparently a mechanism to excite them with a higher frequency signal.
yeah subharmonic resonances would make that possible, it's the same sort of thing where humans can (pretty easily) distinguish phenomena above 60hz - despite that when *staring dead-on at a screen*, your eyes can't tell the difference. but your cochlea has actual pitch-specific resonators; the hairs float in the resonator bit, and the resonator bit definitely does not have a 50khz band. so, yeah, it makes sense that you could identify the presence of sound in your environment that was generated by a 50khz emitter, but there is actually no possible way your brain could receive it as 50khz sound, it would be like seeing gamma rays or being touched above the top of your head - you can hallucinate the experience due to a real phenomenon in your environment, but it's not really representing reality correctly.
Any video that links to the Monty's video is good video 👍
The difference heard between sample rates and bit depth are only down to the various filters applied to the signal when the audio device change mode. The best practise is to calibrate one and only use it, and work on your room acoustics and modes. If ou change bit depth or sampling rate you have to re-do your calibrations. It is actually better to play various sample rate materials all down to the same calibrated output than to change the output to the same as the material input. But inside the computer keep all your files as they came. When playing only use the output that is well known and calibrated. My outpud DAC is outputting 20bits 48 khz because I need the 48khz for the bluetooth compatibility and I use 20 bits because I am snob (it's for a type of stream). If I had not other constraints I would use 44.1 and 16 bits because that's enough, even when I want to smash the ears with 115db. As long as it is balanced xlr out for analog, we are happy. There is no excuse for time domain (and yes whe hear 7mn time bounces) n excuse for depth, no excuse for noise, no excuse for ultrasound.
Great video and demonstration. I can clearly hear the difference between A and B when side by side but suspect I wouldn’t be able to discern the two if separated by 15 seconds of silence OR if comparing a more real world scenario with the organic complexities that our perception sorta smooths out when not listening to some sterile demonstration of sine vs square.
Maybe worth noting I’m by no means young (39) yet still heard the difference though am likely an outlier as my earring presently scores within the range of teenagers and I remember frequently being bothered by high frequency sounds that nobody else seemed to hear back when I actually was a teen. 20+ years of working at shows and touring with bands took care of that curse😂
You can hear the difference but did you hear the fake out I pulled ;)
Oh, that makes perfect sense, it's kind of frustrating that there's still so much confusion when the explanation is pretty graspable.
Great explanation
Thank you for this video. I want to listen to it probably many times to understand more what is happening. I also loved the Monty Montgomery video! It was so neatly presented, even I with no audio degree could understand it.
So to sum it up, if I understood it correctly:
✅ Audio engineers use high sample rate (96 kHz+) for recording to avoid aliasing (and therefore avoid any unwanted weird sounds)?
✅ For consumer audio playback (music, games, movies etc.) nothing more than 44.1 kHz is even needed for any human being? As it is a waste of system resources for no benefit.