Obligatory correction: The reason that 44.1 was chosen is not because they measured the threshold of human hearing at precisely 22.05, it's because it was "about 20kHz" and they needed to add a bit more on the high end as a transition band, which you might think of as...room for error in how they design the electronics. The reason it's 44,100 and not 44000 or 45000 (or 48,000) is related to how it was stored on old video recording systems. Today, if we didn't care about keeping to standards developed ages ago, it really just needs to be "40,000Hz + a bit more than that" The "Humans can hear 20kHz" thing is just a general guideline, not a hard and fast rule. Biology is not precise enough to say "22.05kHz is exactly the right amount." People vary too much for that. edit: Also, sampling a signal at twice the frequency is called the "Nyquist rate" and will give you alias-free sampling, which is why we use that. It's not just arbitrarily "Yeah, that seems like enough", it's a mathematical rule. That's not really super important to know, but it's a good term to Google if you want to know more.
+Joe Mills most people won't even hear a 17khz tone (which isn't a problem, there's nothing interesting for us above that), and if you think you can, double-check your system. And this whole video, while not saying huge BS, is technically rather vague.
@donald trump why should a speaker not be able to output 20kHz+? besides that it isn't particularly flat in that band because it's not designed for that. DAC makes sense, do you know why in detail?
and at 7:20 the square wave myth!!! This has been disproven countless times. The dots on a waveform, although they look like square waves on the screen, are actually just average values of where the real waveform is. When it goes through your DAC a perfect sine wave will ALWAYS be reproduced, even for a 2-bit 22 kHz tone. Furthermore, a square wave is physically impossible in nature (check out Fourier expansion and adding sine waves to represent square waves).
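For anyone who wants to check the Fourier point numerically, here is a rough Python sketch. It adds up the odd sine harmonics of the textbook square-wave series; the function name `square_wave_partial_sum` and the harmonic counts are my own choices for illustration, not anything from the video.

```python
import math

def square_wave_partial_sum(t, freq_hz, n_harmonics):
    """Sum the first n_harmonics odd sine harmonics of a unit square wave."""
    total = 0.0
    for k in range(n_harmonics):
        odd = 2 * k + 1
        total += math.sin(2 * math.pi * odd * freq_hz * t) / odd
    return 4 / math.pi * total

# Evaluate at the middle of the positive half-cycle of a 1 kHz wave,
# where an ideal square wave would sit at exactly 1.
quarter_period = 1 / (4 * 1000.0)
for n in (1, 5, 50, 1000):
    print(n, square_wave_partial_sum(quarter_period, 1000.0, n))
```

The value only creeps toward 1 as more harmonics are added, so a band-limited system, which carries a finite number of harmonics, can never hold a true square wave.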
[not a reply to your comment, seems I can't post a new comment but only "reply"] Brainy guys: 1) What about the information theory (or whatever it's called) where it takes like 3 samples to reconstruct a sine wave?? So two samples of a 20 kHz wave at 44 kHz doesn't sound like enough. I dunno. 2) With regards to depth: why not crowd up the samples at lower volumes, in a logarithmic manner, the way we perceive sound? That way high frequency, low volume sounds would have better reproduction. Again I dunno.
+schitlipz for the most part, many people are unable to hear up to 20k, and tbh, its not a huge loss. my own ears cut off around 18.5k. Unless you are gonna design software around audio, you wont have to worry if 44.1 is enough. For the most part, it is. just do your own hearing tests. If you are super curious though, look up the nyquist theory. and about bit depth, that one is a lot more complicated. the guy in the video wasnt entirely right when he explained it. i cant explain it to you in simplified terms. just look it up on youtube
The very first statement was wrong. The air doesn't move across a room until it hits your eardrums, that would imply that the air in the room moves at the speed of sound across a room. The air on average stays in the same place, but as it is an elastic medium the information it carries moves across the room at the speed of sound, so not so much like billiard balls passing on energy, but rather billiard balls attached by springs that return to where they were initially.
Working in IT and being a huge music lover, this gives me a huge level of appreciation for artists and their studio engineers on how much work goes into putting together an album with lots of tiny microdetails never really ever heard. Definitely makes me want to go out and buy some higher end audio rather than all the streaming MP3's we do nowadays.
Correction (at 7:10): The explanation of the noise ("grain") of very low volume signals is wrong. The computer doesn't "connect the samples with lines". Instead, whenever the computer measures the signal, it has to be rounded to the next associated integer value. This causes the so called quantization error - basically the rounding error. So the signal you hear upon playback is the original one plus the quantization error, which causes distortion (unless dithering is used, but that's another story). No square waves here.
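To make the rounding concrete, here is a minimal Python sketch of the quantization error described above. It assumes samples normalised to the range -1.0 to 1.0 and plain rounding with no dither; `quantize` and `quantization_error` are hypothetical helper names.

```python
import math

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0] to the nearest signed integer level."""
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16 bits
    return round(x * max_level) / max_level

def quantization_error(freq_hz, sample_rate, bits, amplitude, n_samples):
    """Return (quantized - original) for each sample of a sine wave."""
    errors = []
    for n in range(n_samples):
        x = amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        errors.append(quantize(x, bits) - x)
    return errors

# A very quiet 440 Hz tone at 16 bits: the error never exceeds half a step.
errs = quantization_error(440.0, 44100, 16, 0.001, 1000)
half_step = 0.5 / (2 ** 15 - 1)
print(max(abs(e) for e in errs) <= half_step + 1e-12)
```

The played-back signal is the original plus this error sequence. The quieter the signal, the larger the error is relative to it, which is the "grain" being discussed.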
Exactly, if we look at images and say we have only 2 bits to store intensity value of pixels so effectively 4 levels between highest and lowest intensity value, the lower values will be mapped to zero and the medium to higher all will be mapped to 1 and hence will create very bad looking photos as the detailing is gone
44.1 kHz was chosen as it captured all desired frequencies and it also just happened to be the perfect rate to store the samples digitally on PAL and NTSC videotape. This was used as the storage method for transporting the audio between locations. Later, when we didn't need to care about storing it on videotape, we switched to 48 kHz as it helps with making filtering much simpler. Also, you don't get square waves; that's impossible with a band-limited signal. You can only get the original signal, smooth and non-square. 16 bits determines where the noise floor is, and for playback 16 bits is more than enough to capture the dying cymbal. You use more bits, like 32-bit float, when editing before finally exporting to 16 bits. Editing with 24 bits or 32-bit floats helps because it gives you so much headroom to apply filters etc. without adding any more noise that will be noticeable.
The hearing range is up to 16-20 kHz. The reason it's 44.1 kHz is that it allows frequencies up to 22.05 kHz to be captured. The gap between 20 kHz and 22.05 kHz is there because before the A/D converter there is a filter that cuts off anything above 22.05 kHz to avoid aliasing. That filter starts cutting off at 20 kHz and reaches -60 dB at 22.05 kHz so that nothing audible is lost. If the filter cut off at exactly 20 kHz (or much closer to it) it would introduce a lot of distortion in frequencies and phases.
+Paweł Palczyński Well, the actual reason why 44.1 KHz was chosen is because, originally, digital audio for CD production was recorded in Sony U-Matic tapes (yes, the format for analogue video), and the technical specifications of the tape made 44.1KHz the most obvious choice. Having a roll-off filter at the top of the spectrum is useful to avoid distortion, but the exact frequency you choose is not really very important; it's not like a lot of people can hear anything above 18 KHz.
+Joe Mills If you use bones (skull, jaw) to input sound instead of eardrum you can go even much higher. There is a upper cutoff frequency for air transmitted sound however, because at some point sound would need to be painfully loud for you to hear. A frequency at which it is not yet painful but you can still hear it as loud as 1kHz at 10^(-12) W m^(-2) intensity would be a highest you can hear then.
+Paweł Palczyński Actually 44.1 has nothing to do with any type of filter. You can always calculate and implement a slightly different one. There is one simple reason 44.1 or 48 kHz was chosen: there were no HARD DISKS big enough at that time to hold an all-digital master. Remember, it's the 1970s. The only possible method to make a master from which you could make a matrix to press CDs was to record the digital signal on video tape. This was pre-BETA time, so the only viable option was to use U-MATIC (the standard video recorders in TV production of that period). I can't remember the exact resolution in today's digital terms, but it worked out that you could record (in monochrome) 16bit/48kHz or 16bit/44.1kHz on tape running at 30 frames per second (NTSC frame rate). 44.1 was chosen for CD probably only because you could fit an extra 8% of running time on the disc that way. As for the argument that it is inferior to 48, that doesn't matter. The first CD players from Philips and Sony were equipped with 14-bit DACs, and to this day people value these players for their pleasant sound.
Only partially correct about the bit-depth. It really only determines the noise floor (noise made from quantisation). Very important for recording. For consumer-playback - not so much. The human hearing threshold is generally regarded as 20kHz, but sensitivity drops way before that, and both the sensitivity and threshold lowers with age.
Hi Computerphile, great video. I'll just add a minor correction for you. Humans can hear up to 20kHz , not 22kHz. In fact, by the time people reach adulthood, the top end of hearing is closer to 16kHz. The reason a sampling frequency of 44.1kHz was chosen as a standard was not because it is twice that of 22050Hz. It's to do with a problem called aliasing. Any frequency content contained within a signal, which is above half of the sampling frequency will introduce low frequency alias signals (See Nyquist Theorem). This is the exact same reason helicopter blades appear to spin backwards or slowly in video. For audio we would like to capture frequency information up to 20kHz, thus determining the sample frequency to be 40kHz. The only problem is, any high frequency information above 20kHz will ruin the audio due to aliasing. So in addition we add a low pass filter, called an anti-aliasing filter. Filters can't have a really steep cut off without causing all sorts of distortion, so we need to leave a little bit of space in the frequency spectrum to fit one in. Hence we oversample at 44.1kHz to allow for that.
+Sam Smith I would add that 16KHz is IF you have looked after your hearing, e.g. wearing ear plugs at concerts and not blowing your ear drums out with awful dance music. I've done a few sample audio tests and found I can hear to around 17KHz and I am 44 years old. Certain sounds at specific frequencies cause me a significant amount of physical pain, however this had no effect on my mum who could not hear anything. So I wonder if the rate of decline in hearing is steady or suddenly drops off the cliff at a certain age?
Now I know why higher sample rates improve treble clarity, in a waveform high frequencies spike up and down a lot very quickly, a slow sample rate misses those spikes. Great video!
+Tyler Watthanaphand Yup, that's part of the Sampling Theorem. There's a rule that says "if you wanna capture perfect audio up to X KHz, then you need to sample *at a minimum" of 2X KHz." Search for Nyquist Frequency for more info... :-)
+Orestes Zoupanos Better results are achieved when you double the sampling frequency (according to the maximum frequency of the signal) and add a little more sampling cycles. Hence the 44.1 KHz, which is 22 KHz * 2 + 100 Hz. The extra 100 Hz prevent the signal from containing aliasing artifacts.
+Hugo Neves You're right, but he did basically say that. That's what he meant by _sample *at a minimum" of 2X KHz."_ . Granted, that doesn't really emphasise the benefit of the fudge factor. I guess it depends on how you think of it, it's either _Nyquist >= 2X_ or _2X+100 = Nyquist_.
+Hugo Neves they picked 44.1 because it was compatible with both PAL and NTSC video equipment. Early digital audio was stored on video cassettes. They needed a minimum of 40 kHz, and then extra room for the antialiasing filter, and it had to be divisible by both PAL and NTSC standards.
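The videotape arithmetic usually quoted for this is three samples stored per usable video line. The line counts below are the commonly cited ones for PCM-on-video recording and are an assumption worth checking against a proper reference; `samples_per_second` is just an illustrative name.

```python
def samples_per_second(lines_per_field, fields_per_second, samples_per_line=3):
    """Audio samples stored per second when packed into video lines."""
    return lines_per_field * fields_per_second * samples_per_line

# Assumed figures: 245 usable lines per field at 60 fields/s (monochrome NTSC
# timing) and 294 usable lines per field at 50 fields/s (PAL).
ntsc_rate = samples_per_second(245, 60)
pal_rate = samples_per_second(294, 50)
print(ntsc_rate, pal_rate)
```

Under those assumptions both systems land on the same rate, which is why one number could serve both video standards.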
+Tyler Watthanaphand Sample rates above 44.1KHz do not improve audio quality at all. It doesn't matter if you go from 20Hz to 20KHz in an instant, at 44.1 it will all be perfect.
You also get "squares" at the top end when you compress too hard, as the peaks of the waves can be cut off. Any sharp edges are heard as a form of distortion.
Great video. Just thought I'd add to the chorus, and comment on an irony in digital audio. Back when CDs were introduced, everyone was trumpeting how amazing their 90+db of dynamic range was. Now, we're lucky to see discs released from major record labels that use more than 10db :-) And, in fact, most albums I've bought recently go one step worse, and use heaps of heavy digital clipping on all of the drum hits. A bit sad, I suppose....
Clipping on the drum tracks has a long history. Back in the day Motown got a great drum sound that way. But analog and digital clipping are different beasts. What sounds great on tape sounds really shitty in the digital domain.
If sampling of a signal with frequency higher than half of the sampling frequency occurs then the signal will not be 'cut off', it will transform into a signal with another frequency.
+Sparker oh, pardon, I "overlooked" that (shame that you can't use the word "overheard" in the english language like that) Yeah, I guess that can only ever be taken as an abstract concept where you "cut off" at an information threshold :/
I saw on another channel that the 44.1 kHz sampling rate was because we kept the human limit of hearing as 20 kHz and added a 2.05 kHz extra limit because low pass filters weren't accurate enough to cut off exactly above 20 kHz, so we added a bit of leeway. Double that and you end up with 44.1 kHz
Nice illustrations, but this was just the tip of the tip of the iceberg. It would be nice for further episodes about the topic of digital audio to mention Shannon, the logarithmic nature of the decibel scale, real-life annoyances like the noise floor and other concepts, e.g. bit rate, data compression (in FLAC, for example) vs data reduction (i.e. data loss, and that in more than one respect, like in MP3). It's a very interesting field of topics.
This is actually an important part of the Sound Engineering degree that I took. Understanding digital audio means that you need to understand the Nyquist Theorem to make the best decisions on how you're going to record a particular source for an end medium and really influences the quality of the final product. Digital audio processing has become an every day part of your average Sound Engineers job these days.
+Sandra Nicole Interestingly, this is sort of what happens with audio compression techniques. Many audio codecs chop up the continuous sound signal into short sample frames (a few milliseconds long each) and convert the audio signal into discrete frequencies (with a Fourier Transform of some kind). Then a psychoacoustic model is applied to the frequency spectrum to eliminate details our ears care less about, and apply emphasis to the remaining ones that they do care about. This is sometimes done with a mel-frequency analysis, though every codec has its own method. Lastly, the remaining frequencies are packed into a compressed data stream, using Huffman coding, or LZW, or whatever the codec calls for. Decoding and reconstituting the original sound perfectly is of course impossible at this point, so they call these kinds of codecs "lossy", since they throw out a lot of information to fit the most important parts of the sound into the least amount of data possible.
+Sandra Nicole as written, it does exist, but for 8bit audio. For 16bit it's simply not needed. And in fact, even 14bit is already enough for our perception, unless you like to crank up the volume a lot on quiet parts of a song.
How come you didn't mention the Nyquist-Shannon sampling theorem? I know you are trying to make it less 'technical' but sometimes it's nice to mention some of the more technical things.
Yeah, I will have to call BS about the part about the sound card outputting a square wave if the bit depth is too low. All the bit depth affects is the noise floor.
+Gordon Freeman he wasn't wrong, he was talking about an audio editing situation. If you add two 24bit tracks together without halving the volume of each beforehand you will get clipping, because you will likely have parts of the audio that are at, let's say, 40,000/65536 in each (16bit-sized numbers, to keep the example small). Add those together and you get 80,000, which is out of range of that 65536, so anything above 65536 will be cut off. If this is bad enough it would become a square wave. The reason you wouldn't halve it beforehand, even though that might seem simpler, is that you will throw away some of the data in doing that. Better to add them together in a higher bit depth and then convert that back down to 24bit afterwards or, in the case of the final output, 16bit. In terms of the final output file you are right, but that's not what he was explaining.
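A small Python sketch of that summing problem, assuming signed integer samples (so 16-bit runs from -32768 to 32767 rather than 0 to 65536) and hard clipping at the limits; `mix_and_clip` is a made-up helper name.

```python
def mix_and_clip(track_a, track_b, bits=16):
    """Add two tracks sample by sample and hard-clip to the signed range."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return [max(lowest, min(highest, a + b)) for a, b in zip(track_a, track_b)]

loud_a = [20000, -20000, 100]
loud_b = [20000, -20000, 50]
print(mix_and_clip(loud_a, loud_b))           # peaks flattened at the 16-bit limits
print(mix_and_clip(loud_a, loud_b, bits=24))  # the same sum fits with room to spare
```

Every sample that would have gone past the limit comes out pinned to it, which is where the flat tops come from. Summing at a higher bit depth first avoids this.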
Gordon Freeman this isnt about your average joe though, this applies to the audio engineers producing the audio in the first place before it gets compressed down to 16bit and put on a CD DVD or game
Gordon Freeman Yes its for the average joe, he was just explaining that anything higher than 16bit is only needed for audio that is going to be edited, not really rocket science.
megaspeed2v2 Well, he said that the computer draws lines to connect the samples, that is not what happens, any audio engineer would know that's not how you would explain it to the average joe. He also could have simplified it by saying that the bit depth affects the quietest sounds which don't end up sounding like square waves. He got quite a few things wrong, he could have done better.
I decided to learn something about digital audio, and ended up here (among other places). Great explanation, you should be a teacher the way you communicate.
1:46 with this information, i was able to deduce the solution to my staticky PC audio; I had the sample rate for audio output set to max setting of "24 bit, 192,000 Hz (Studio quality)," and once I lowered the sample rate to "24 bits, 96000 Hz (Studio quality)", all the static noise magically vanished! YAY, thank you, this had been harming me for years.
+johannes914 dynamic compression comes into the process after recording. It is technically a post production tool but can be used in other places like live performance. You would choose to use it when a sound source is very dynamic, in that the volume levels change a lot and the rate that they change is not predictable. Think of it basically as an AI fader that can analyse the input and quickly make a decision of how much volume breaking or boosting it requires.
+johannes914 In simple terms, compression makes the loudest parts of the signal quieter, which then means you can turn the whole thing back up without distortion.
I'd love to see a video about different audio formats, not in the .wav or .mp3 file type sense, but rather the encoding methods, like PCM, ADPCM, DPCM, PWM, etc.
+Benny Kolesnikov That usually doesn't happen because the cpu speed is WAY higher than the sampling frequency and between 2 samples the cpu has more than enough time to do whatever it needs. However i've seen it happen, it sounded like slow-mo with stuttering, it was really weird. And just before my pc crashed lol.
There is no need to go into floating point math and sampling theorem on the first video! This explanation is simple to follow and covers the basics very well. Maybe another video can go into more details on different aspects. Great video. Thanks :)
Very well said! There's a lot of analogies you can draw to photography and Photoshop. Clipping is like over-exposing an image. Sampling frequency is like pixel resolution. Bit-depth is like color-depth. In photography, ideally you want you final image to span the entire 24bit colorspace, with no obvious pixelation, and no obvious posterization. In audio production, you want your final track to span the entire 16bit _wavespace_ with no obvious digital distortions.
+JamesMulvale Well, they mostly use AAC (Advanced Audio Coding). HD quality (720p+) is usually accompanied by 256 kbps audio and non-HD (480p and lower) by 128 kbps audio. Now, I heard that due to AAC being a better compression codec than MP3 and the quality of YouTube's encoders, you can get the same result with 96 kbps AAC as with 128 kbps MP3. That 96 kbps bitrate is used in the MP4 360p video+audio stream on YouTube. Also, a 320 kbps AAC uploaded audio track is allegedly of the same fidelity when they encode it at YouTube AAC 256 kbps. At this point you get to *psychometric* measuring of *subjective quality* of sound. That's a whole other discussion. You people might want to look up "Fraunhofer Institute for Integrated Circuits IIS" for more information.
the point is you will never get 192khz/24bit audio on youtube its a waste of data. HD maxes out around 192kbps (which is NOT 192khz - it's samplerate and bitdepth combined). i can't be bothered arguing any more. bye
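For uncompressed PCM the bitrate really is just sample rate times bit depth times channel count, which a few lines of Python can show. `pcm_bitrate_kbps` is an illustrative name, and this says nothing about lossy codecs, where the bitrate is a setting of the encoder rather than a product of these numbers.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

print(pcm_bitrate_kbps(44100, 16, 2))   # CD-style stereo
print(pcm_bitrate_kbps(192000, 24, 2))  # 192 kHz / 24-bit stereo
```

CD-style stereo works out to 1411.2 kbps, several times the lossy stream bitrates mentioned in this thread.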
I agree that setting soundcard to 192kHz is useless, and I know the difference between bitrate (bps) and sample rate (Hz) or depth (bit). I was trying to add to your information. I didn't speak about sample rate because I don't understand what sample means in widely used lossy compression codecs vs. in WAVE (PCM) codec. Sorry to bother you. But original poster seemed to be ignorant so I replied to you. I'm, however, *not* a sound engineer, but this much I know. (No need to reply.) Bye
I know he was simplifying but to be sure, if you sample frequencies higher than half the sample rate, you're not simply "cutting them off." It's actually much worse: you're introducing a kind of "phantom frequencies" that weren't in the signal but turn up in your digital signal. Those are mirrored at the highest frequency you can represent, so the higher you go into "you didn't filter properly" territory, the lower the frequency gets (until it can't anymore, then it gets higher again), thus being more off and mostly more noticeable, too.
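The mirroring can be sketched as a small Python function. It assumes a single pure tone sampled with no anti-aliasing filter at all; `alias_frequency` is a name picked for this example.

```python
def alias_frequency(f_hz, sample_rate):
    """Frequency at which a pure tone of f_hz shows up after sampling."""
    nyquist = sample_rate / 2
    f = f_hz % sample_rate  # the sampled spectrum repeats every sample_rate
    return sample_rate - f if f > nyquist else f

for tone in (10000, 25000, 30000, 45100):
    print(tone, "->", alias_frequency(tone, 44100))
```

At 44.1 kHz a 25 kHz tone lands at 19.1 kHz and a 30 kHz tone at 14.1 kHz. The apparent frequency falls as the input rises, and once the input passes the sample rate itself it starts rising again.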
7:15 The computer doesn't have to draw lines between the samples. They can just remain individual samples. When you output the signal it's smoothed out anyway (I assume the momentum of the moving bits in your speakers help with that).
Awesome to know how the technical part of my audio work works. Working with orchestral pieces and a lot of low and a lot of high and those 'lingering' cymbal notes I did work at the higher settings like 24 bit. Good to know why exactly I have to do so on a technical level.
If the wave does not fit in the bit depth, why don't we just increase the bit depth to something like 32 bit or 48 bit? Another question: why is the bit depth 24 bit rather than 16 bit or 32 bit?
+Alex Lee Most humans can't detect the difference between an analog audio signal and a digital audio signal with 16 bits/sample. Also, when you use higher and higher bit depths, you need better digitization equipment. The least significant bits tend to be mostly noise if you don't have good equipment.
+Muztaba Hasanat Every wave *could* fit in 16-bit. It's just that for mixing and recording they require headroom. This is because every time you manipulate a wave a little bit, the volume of every sample needs to be described using a whole number between -32,768 and +32,767. Say you change the volume for a sample and the maths ends up saying 'give this sample a volume of 5,256.6'. You can't store fractional values (no decimal places), so it rounds up to 5,257. You can't hear this change in volume, but over many thousands of changes to a sample this begins to add up to a fairly significant change: a combination of lots of differences of up to .5 from what the wave should be. Done to many samples in a song, this creates a quiet noise. Larger bit depths make this noise much quieter. So that is why 24-bit (or 32-bit) is generally used in mixing. The final master doesn't need this because no more changes will be made and the noise doesn't have a chance to get loud. 16-bit is more than enough for even hi-fi listening. 16-bit puts the quietest sounds at roughly -96 dB relative to full scale, which is quieter than anything you would notice at normal listening levels.
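The usual rule of thumb is about 6 dB of range per bit. A quick Python check, using the simple ratio between full scale and one quantization step (`dynamic_range_db` is an illustrative name; dither and noise shaping change the perceived figure and are ignored here):

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 14, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
```

That gives roughly 96 dB for 16 bits and roughly 144 dB for 24 bits, which is the extra working room people want while mixing.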
I hope someone can answer me on this: From 2:10 he says that we only need a low sample rate (or sample frequency as he calls it) to represent the low frequency sine wave. But why doesn't that produce a sort of triangle shape? As he says later on in the video, the computer thinks it should go straight from point to point, and if it did that with the low sample rate, it would not produce a sine wave sound. I think I watched video explaining this once, but I can't find it now... I believe it has something to do with dithering? or something idk
I get that part, and that wasn't really my question. I can try and explain it differently: If you know a bit about sound and its digital representation, you'd know how a sine wave sounds and how a triangle wave sounds. To me it seems like a triangle wave needs much fewer "points" to represent that wave, it actually only needs the maximum value and the minimum value. But the slope of the sine wave is constantly changing, and so to me it seems like it would need an infinite amount of points to represent that sound wave.
+paulcmnt This is entirely incorrect. Photos are not equivalent to audio. Just like pixels, audio files contain a bunch of 'point samples'; however, unlike pixels, they are not represented as 'squares' or as a flat section of the audio waveform. Whenever you listen to a digital file, it goes through the DAC, which looks at all of the point samples and then finds the only possible combination of sine waves that will fit, and outputs that as a perfectly smooth voltage change (analogue signal).
+Alpha Kay His explanation is essentially wrong: there is no "point to point" in the reconstruction. (And there are no "stair steps" either.) Any competent DAC will produce a lovely sinusoidal output, regardless of the bit depth or sample rate. Please watch the video I linked to earlier; it does a superb job explaining and demonstrating how all this works. :)
+Alpha Kay The digital -> analog converter understands this and "fills in" the "missing" sine wave information in a smooth, intelligent, accurate manner.
+Dirty Robot I wouldn't do that considering the sad economic state the audio engineering industry is right now. Lots of really good sound engineers go months upon end without gigs these days.
+morgogs Depends if you want to go study or you want to dive in. When I made my choice there were no courses you could do but I had a fair bit of experience so I contacted all the recording studios in my area and took a low paid position then worked my way up.
I just synthesised a wave in Sound Forge, then I boosted it and it 100% squared the wave off, I also reduced the bit rate and it made the wave much more blocky. His explanation seems very good to me. They even show the squared off wave when he shouts in the video. I'm sure there are a lot more details to why and how it does this but you cannot deny that the waves are somewhat cut off and look square. Download audacity and try it yourself.
Working with audio is super interesting, I love it. Analogue and digital sounds are great fun to play with, i would recommend FL Studios demo for anyone interested in becoming a sound engineer :P
+Kath Alave Go watch D/A and A/D | Digital Show and Tell (Monty Montgomery @ xiph.org) here in youtube, it gives you a better explanation than this video.
I remember my first Sound Blaster card; it could not only sample at 44.1kHz, but also 22.05kHz, and I think even something lower than that (16kHz?). It also had an option to record at a bit depth down to 8 bits. So basically, it sounded like talking through an old telephone. But then, this was an amount of data the PCs of back in the day could handle much better. It was an old 386, and the soundcard could not use a bit depth any higher than 16 bits.
My DAW has settings up to 192,000 Hz, are there any benefits or downsides of using a sample rate this high? Considering the "industry standard" is significantly lower, what applications make use of this sample rate ?
+OnixRose almost none. 192 will give you a huge file size and slow your sessions down. On top of that, there is scientific evidence that ultra high sample rates actually sound worse (intermodulation distortion). Plugins and DAWs like higher sample rates because it gives them more information to process. as a result, most plugins up sample to 88.2 or 96k when your session is running at 44.1/48k. I wouldn't bother using higher than 96k. 96 would be a good sample rate to run at if you think you're going to be doing a lot of time stretching or similar processing. Other than that, you cant really go wrong with 48k.
+Philip Stuckey The ideal scale would depend on what kinds of sounds you are sampling. More often than not equipment we have access to is far simpler and falls into linear scale.
Amazing video! It would be great if you pick up from here to talk a little about digital compression and the infamous Loudness Wars, which would make a great title! The War for your Ears! lol Great work. Cheers to you guys.
Drawing lines between points? Sorry, but that's just not true. That would produce a lot of aliasing and distortion (like square waves as he said) and that's certainly not what happens. The D/A converter draws a smooth curve with a reconstruction filter. The only thing you lose by lowering the bit depth is the data lost to quantizing, which makes the noise floor louder. I think you should correct that statement.
+David Domminney Fowler Please don't do that, and if you feel it is necessary to oversimplify to such an extent, please specify in the video "this is something of an oversimplification". Because this video is downright incorrect in some cases.
+David Domminney Fowler, I agree with +Tommy59375 completely. I watch the *phile videos precisely because of the way that masters of their craft are able to explain deeply complicated concepts without distortion or oversimplification. These aren't buzzfeed videos.
I guess floating point audio formats can help with two of the problems. The headroom before digital clipping occurs and the fidelity at very low volumes.
Rather inaccurate video imho. 1. It's not 44.1KHz because the limit of human hearing is 22.05KHz. It has to do with the upper limit of human hearing and the choice of anti-aliasing filter (transition band width). The assumed human hearing range is roughly 20Hz to 20KHz. The Nyquist sampling theorem tells us that the sampling rate should be at least twice the max frequency of the signal (so 40KHz). The problem now is that our original signal contains frequencies above 20KHz; if we try to sample at 40KHz, aliasing will occur (frequencies above 20KHz fold into the hearing range). The signal must be low-pass filtered (anti-aliasing filter). We can't perfectly cut frequencies right at 20KHz; in practice a transition band is necessary. For practical and economic reasons, a 2.05KHz transition band was chosen. Now our signal contains frequencies from 20Hz to 20+2.05 = 22.05KHz. Back to Nyquist, we need to sample at 44.1KHz. 2. You DON'T end up with a square wave. You could use 4 bits per sample and you still would not get a square wave. You'll get a ton of quantization error (rounding error) and the signal will be drowned in noise (low SNR). Moreover, the computer doesn't draw lines between points. It finds a continuous signal from the sequence of samples; there is a unique solution.
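That "unique solution" is the band-limited signal given by sinc interpolation (the Whittaker-Shannon formula). Below is a rough Python sketch of it. Real DACs use practical reconstruction filters rather than this direct sum, and with a finite block of samples the result is only approximate; `sinc`, `reconstruct` and the 8 kHz test setup are my own choices.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, sample_rate, t):
    """Estimate the band-limited signal at time t (seconds) from its samples."""
    return sum(s * sinc(t * sample_rate - n) for n, s in enumerate(samples))

SAMPLE_RATE = 8000
TONE_HZ = 1000.0
samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(2000)]

# Pick an instant halfway between two samples, in the middle of the block.
t = 1000.5 / SAMPLE_RATE
print(reconstruct(samples, SAMPLE_RATE, t))
print(math.sin(2 * math.pi * TONE_HZ * t))
```

The two printed values agree closely even though no sample was taken at that instant, and there are no straight lines or stair steps involved.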
If getting the small details for quiet notes is an issue, why not use a logarithmic scale, where values are further apart at higher intervals? Then you could keep detail at small volumes when you need it and lower resolution at high values where you don’t. Also, why not use a relative scale, where you mention how much the wave changes each sample? That way you'd have no upper or lower end.
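A logarithmic scale of this kind does exist: telephony uses mu-law companding for 8-bit audio. Here is a minimal Python sketch of the continuous mu-law curve, assuming samples normalised to -1.0 to 1.0 and the conventional mu = 255. It leaves out the final rounding to 8 bits, and the function names are illustrative.

```python
import math

MU = 255  # conventional value for 8-bit mu-law

def mu_law_encode(x, mu=MU):
    """Compress a sample in [-1, 1] so quiet values get more of the scale."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_decode(y, mu=MU):
    """Invert mu_law_encode."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

for x in (0.001, 0.01, 0.1, 1.0):
    print(x, round(mu_law_encode(x), 3))
```

A sample at 1% of full scale ends up more than a fifth of the way up the encoded scale, so quiet passages get far more of the available levels than they would on a linear scale. The "relative scale" idea in the comment corresponds to differential encodings such as DPCM.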
5:25 Why use negative volumes? If you didn't, you'd get double the bit depth, and you wouldn't have to deal with phase cancellation. I'm sure there's a good reason why, so tell me
NeonsStyle Nope. Summing audio is fine, you just have to make sure your audio is not out of phase. Keep in mind that for this to happen, both waves would have to be on the same channel, so you wouldn't hear anything if that happened. What may happen, though, is you may have a voiceover that you recorded in stereo incorrectly, and you have it with the right phase on the left channel, but the right channel has it inverted. Guess what happens when someone watches your video on a phone speaker or anything that mixes down to mono? Your voice disappears! Well... almost. Since YouTube encodes its audio into AAC/Opus, your viewer will only hear artifacts that slightly resemble your voice.
NeonsStyle Audacity for recording? All right! For processing? Yeah, no. You should use your video editor instead, with a nice free VST compressor. Don't overdo the ratio and attack. If you're looking for a deep radio voice, you'll want a multiband compressor. Have a tight compressor (ratio of 3?) on a band that goes from 0 Hz to around 90/100 Hz with a generous makeup gain. You might want to get a small compressor on the 200-500 Hz band, but with NO MAKEUP GAIN, just for controlling the mud. For brightness, mildly compress the ~6 kHz band (have a Q that goes from 4.5 kHz to 8 kHz if you want) with a little makeup gain. Just don't overdo it, any ratio above 4 is too much! Remember to add a highpass around 20 Hz before any processing (except noise removal; if you're going to use that, I recommend iZotope's RX3 and a mild setting). If you're getting too much noise, lay down a bit of compression and get a little non-aggressive gate going (just don't make it too obvious). By the way, I'm not the guy in the video, just an audio freak! :)
One thing I wonder about is sounds we cannot hear that still have an effect on us. For example, there was a case in a laboratory where people working at night started seeing visual artifacts, which were traced to (I guess) infrasound from an air conditioner or something like that. How do you treat that? Would you use it in a scary film, for instance, or is it cut out entirely? Also, there was a buzzing/fan/whatever noise in the background all the way through that was kinda... distracting.
If you would like to know more about the algorithms for converting from one sampling rate to another ( _downsampling_ , e.g. recorded @48 kHz then written to a CD @44.1 kHz), look up _sample rate conversion_ (resampling). The related term _dithering_ applies when reducing bit depth, e.g. from 24-bit down to 16-bit.
Obligatory correction: The reason that 44.1 was chosen is not because they measured the threshold of human hearing at precisely 22.05, it's because it was "about 20kHz" and they needed to add a bit more on the high end as a transition band, which you might think of as...room for error in how they design the electronics. The reason it's 44,100 and not 44000 or 45000 (or 48,000) is related to how it was stored on old video recording systems. Today, if we didn't care about keeping to standards developed ages ago, it really just needs to be "40,000Hz + a bit more than that"
The "Humans can hear 20kHz" thing is just a general guideline, not a hard and fast rule. Biology is not precise enough to say "22.05kHz is exactly the right amount." People vary too much for that.
edit: Also, sampling a signal at twice the frequency is called the "Nyquist rate" and will give you alias-free sampling, which is why we use that. It's not just arbitrarily "Yeah, that seems like enough", it's a mathematical rule. That's not really super important to know, but it's a good term to Google if you want to know more.
+OneBigBug Exactly, there needs to be room for the antialiasing filter
+OneBigBug +1 for Nyquist (aka Shannon-Nyquist sampling theorem), i was a little disappointed because this great fact was missing from the video
+Joe Mills It sounds more like distortion from the speaker trying to play that loudly than a 24Khz tone though.
+OneBigBug 48kHz was used on DAT tape to prevent straight digital copying from CD which are 44.1kHz. Yes, 48kHz was used to combat piracy.
+Joe Mills most people won't even hear a 17khz tone (which isn't a problem, there's nothing interesting for us above that), and if you think you can, double-check your system.
And this whole video, while not saying huge BS, is technically rather vague.
That's why I always record at 88kHz, so my dog can appreciate the fine notes of the super piccolo.
hahaahha
HE IS NOT SPEAKING FOR DOGS LOL HAHAHAH
@donald trump why should a speaker not be able to output 20kHz+? besides that it isn't particularly flat in that band because it's not designed for that.
DAC makes sense, do you know why in detail?
and at 7:20 the Square wave myth!!! This has been disproven countless times. The dots on a waveform although they look like square waves on the screen are actually just average values of where the real waveform is. When it goes through your DAC a perfect sine-wave will ALWAYS be reproduced even at 2-bit 22Khz tone. Furthermore a Square wave is physically impossible in nature (check out Fourier expansion and adding sine-waves to represent square waves)
That was the first "gotcha" I spotted in this video as well.
Glad to see that someone pointed this out. It's incredible how almost everybody got this wrong, even pros.
I'd love to see more videos on digital audio, sound recording and editing.
[not a reply to your comment, seems I can't post a new comment but only "reply"]
Brainy guys:
1) What about the information theory (or whatever it's called) where it takes like 3 samples to reconstruct a sine wave?? So two samples of a 20 kHz wave at 44 kHz doesn't sound like enough. I dunno.
2) With regards to depth: why not crowd up the samples at lower volumes, in a logarithmic manner, the way we perceive sound? That way high-frequency, low-volume sounds would have better reproduction. Again, I dunno.
+schitlipz for the most part, many people are unable to hear up to 20k, and tbh, its not a huge loss. my own ears cut off around 18.5k. Unless you are gonna design software around audio, you wont have to worry if 44.1 is enough. For the most part, it is. just do your own hearing tests. If you are super curious though, look up the nyquist theory.
and about bit depth, that one is a lot more complicated. the guy in the video wasnt entirely right when he explained it. i cant explain it to you in simplified terms. just look it up on youtube
The very first statement was wrong. The air doesn't move across a room until it hits your eardrums, that would imply that the air in the room moves at the speed of sound across a room. The air on average stays in the same place, but as it is an elastic medium the information it carries moves across the room at the speed of sound, so not so much like billiard balls passing on energy, but rather billiard balls attached by springs that return to where they were initially.
Working in IT and being a huge music lover, this gives me a huge level of appreciation for artists and their studio engineers on how much work goes into putting together an album with lots of tiny microdetails never really ever heard. Definitely makes me want to go out and buy some higher end audio rather than all the streaming MP3's we do nowadays.
Brian Pacheco yeah my sound system is pixel perfect and my sound system has microdetails that no other sound system has
When I had the HD800s they were great for micro-detail but I realized that it just gets fatiguing after a while and doesn't sound natural.
my tweeters and subwoofer has 21 watts but my mid range woofer has 190 watts why does my mid range woofer need so many watts ?
Correction (at 7:10): The explanation of the noise ("grain") of very low volume signals is wrong. The computer doesn't "connect the samples with lines". Instead, whenever the computer measures the signal, it has to be rounded to the next associated integer value. This causes the so called quantization error - basically the rounding error. So the signal you hear upon playback is the original one plus the quantization error, which causes distortion (unless dithering is used, but that's another story). No square waves here.
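A small Python sketch of the rounding step described above. The signed-integer grid, the 0.9 amplitude and the function names are illustrative assumptions; it uses no dither and is not modelled on any particular converter.

```python
import math


def quantize(x, bits):
    """Round x (in the range -1.0..1.0) to the nearest level of a signed grid with the given bit depth."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale


def rms_quantization_error(bits, n=1000):
    """RMS rounding error over one cycle of a sine wave with amplitude 0.9."""
    total = 0.0
    for i in range(n):
        x = 0.9 * math.sin(2 * math.pi * i / n)
        total += (quantize(x, bits) - x) ** 2
    return math.sqrt(total / n)


error_4_bit = rms_quantization_error(4)
error_16_bit = rms_quantization_error(16)
```

The played-back signal is the original plus this error, so fewer bits means a louder error signal, not a squarer wave.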
Exactly. It's like images: say we have only 2 bits to store the intensity of each pixel, so effectively 4 levels between the lowest and highest intensity. Every value gets mapped to the nearest of those 4 levels, which creates very bad-looking photos because the fine detail is gone.
Dither is a topic that could get its own video too
44.1 kHz was chosen as it captured all desired frequencies and it also just happened to be the perfect rate to store the samples digitally on PAL and NTSC videotape. This was used as the storage method for transporting the audio between locations.
Later, when we didnt need to care about storing it on videotape we switched to 48 kHz as it helps with making filtering much simpler.
Also you dont get square waves, thats impossible with a band limited signal. You can only get the original signal, smooth and non-square. 16 bits determines where the noise floor is and for playback 16 bits is more than enough to capture the dying cymbal. You use more bits, like 32 bit float, when editing before finally exporting to 16 bits. Editing with 24 bits or 32 bit floats helps because it gives you so much headroom to apply filters etc without adding any more noise that will be noticeable.
The hearing range is up to 16-20 kHz. The reason it's 44.1 kHz is that it allows frequencies up to 22.05 kHz to be captured. The gap between 20 kHz and 22.05 kHz is there because before the A/D converter there is a filter that cuts off anything above 22.05 kHz to avoid aliasing. That filter starts cutting off at 20 kHz and reaches -60 dB at 22.05 kHz so that nothing audible is lost. If the filter cut off at exactly 20 kHz (or much closer to it), it would introduce a lot of distortion in frequencies and phases.
+Paweł Palczyński That is some cool extra info, thanks for sharing!
+Paweł Palczyński Something like a badgap?
+Paweł Palczyński Well, the actual reason why 44.1 KHz was chosen is because, originally, digital audio for CD production was recorded in Sony U-Matic tapes (yes, the format for analogue video), and the technical specifications of the tape made 44.1KHz the most obvious choice. Having a roll-off filter at the top of the spectrum is useful to avoid distortion, but the exact frequency you choose is not really very important; it's not like a lot of people can hear anything above 18 KHz.
+Joe Mills If you use bones (skull, jaw) to input sound instead of eardrum you can go even much higher. There is a upper cutoff frequency for air transmitted sound however, because at some point sound would need to be painfully loud for you to hear. A frequency at which it is not yet painful but you can still hear it as loud as 1kHz at 10^(-12) W m^(-2) intensity would be a highest you can hear then.
+Paweł Palczyński Actually 44.1 has nothing to do with any type of filter. You can always calculate and implement a slightly different one. There is one simple reason 44.1 or 48 kHz was chosen. There were no HARD DISKS big enough at that time to hold an all-digital master. Remember, it's the 1970s. The only possible method to make a master from which you could make a matrix to press CDs was to record the digital signal on video tapes. This was pre-Beta time, so the only viable option was to use U-Matic (the standard video recorders in TV production of that period). I can't remember the exact resolution in today's digital terms, but it worked out that you could record (in monochrome) 16-bit/48 kHz or 16-bit/44.1 kHz on tape running at 30 frames per second (the NTSC frame rate). 44.1 was chosen for CD probably only because you could fit an extra 8% of running time on the disc that way. As for the argument that it is inferior to 48, that doesn't matter. The first CD players from Philips and Sony were equipped with 14-bit DACs, and to this day people value these players for their pleasant sound.
Only partially correct about the bit-depth. It really only determines the noise floor (noise made from quantisation). Very important for recording. For consumer-playback - not so much.
The human hearing threshold is generally regarded as 20kHz, but sensitivity drops way before that, and both the sensitivity and threshold lowers with age.
Yes. There's a lot of confusion about bit depth and resolution. For consumer playback there is no benefit as you say.
I can hear up to about 15k at 38. And a 15k tone really isnt very interesting!
Hi Computerphile, great video. I'll just add a minor correction for you. Humans can hear up to 20kHz , not 22kHz. In fact, by the time people reach adulthood, the top end of hearing is closer to 16kHz. The reason a sampling frequency of 44.1kHz was chosen as a standard was not because it is twice that of 22050Hz. It's to do with a problem called aliasing. Any frequency content contained within a signal, which is above half of the sampling frequency will introduce low frequency alias signals (See Nyquist Theorem). This is the exact same reason helicopter blades appear to spin backwards or slowly in video. For audio we would like to capture frequency information up to 20kHz, thus determining the sample frequency to be 40kHz. The only problem is, any high frequency information above 20kHz will ruin the audio due to aliasing. So in addition we add a low pass filter, called an anti-aliasing filter. Filters can't have a really steep cut off without causing all sorts of distortion, so we need to leave a little bit of space in the frequency spectrum to fit one in. Hence we oversample at 44.1kHz to allow for that.
+Sam Smith I would add that 16KHz is IF you have looked after your hearing, e.g. wearing ear plugs at concerts and not blowing your ear drums out with awful dance music. I've done a few sample audio tests and found I can hear to around 17KHz and I am 44 years old. Certain sounds at specific frequencies cause me a significant amount of physical pain, however this had no effect on my mum who could not hear anything. So I wonder if the rate of decline in hearing is steady or suddenly drops off the cliff at a certain age?
Eight years later, and I see this wonderful comment... Even tho I still can't quite make sense of aliasing.
Now I know why higher sample rates improve treble clarity, in a waveform high frequencies spike up and down a lot very quickly, a slow sample rate misses those spikes. Great video!
+Tyler Watthanaphand Yup, that's part of the Sampling Theorem. There's a rule that says "if you wanna capture perfect audio up to X KHz, then you need to sample *at a minimum* of 2X KHz." Search for Nyquist Frequency for more info... :-)
+Orestes Zoupanos Better results are achieved when you double the sampling frequency (according to the maximum frequency of the signal) and add a little more sampling cycles. Hence the 44.1 KHz, which is 22 KHz * 2 + 100 Hz. The extra 100 Hz prevent the signal from containing aliasing artifacts.
+Hugo Neves You're right, but he did basically say that. That's what he meant by _"sample *at a minimum* of 2X KHz."_ Granted, that doesn't really emphasise the benefit of the fudge factor.
I guess it depends on how you think of it, it's either _Nyquist >= 2X_ or _2X+100 = Nyquist_.
+Hugo Neves they picked 44.1 because it was compatible with both PAL and NTSC video equipment. Early digital audio was stored on video cassettes. They needed a minimum of 40 kHz, then extra room for the antialiasing filter, and the rate had to fit evenly into both the PAL and NTSC standards.
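The video-tape arithmetic can be checked quickly. The figures below (three samples per video line, 245 usable lines per field at 60 fields per second for monochrome NTSC, 294 lines at 50 fields per second for PAL) are the commonly cited ones for PCM adaptors; they are not stated in this thread, so treat them as an assumption.

```python
SAMPLES_PER_LINE = 3  # audio samples stored on each usable video line (assumed figure)

# Monochrome NTSC: 60 fields per second, 245 usable lines per field.
ntsc_rate = 60 * 245 * SAMPLES_PER_LINE

# PAL: 50 fields per second, 294 usable lines per field.
pal_rate = 50 * 294 * SAMPLES_PER_LINE
```

Both products come to 44100, which is the smallest convenient rate above the roughly 40 kHz minimum that works on either system.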
+Tyler Watthanaphand
Sample rates above 44.1KHz do not improve audio quality at all.
It doesn't matter if you go from 20Hz to 20KHz in an instant, at 44.1 it will all be perfect.
a more in-depth explanation can be found by searching: Digital Show and Tell Monty Montgomery on youtube
+omgimgfut
Oh yes, that video explains it better.
+omgimgfut Was about to comment the exact same thing but you beat me to it :)
For some reason, I can only thumbs up your comment once. I was trying to thumbs up it a billion times.
It's not just more in depth. It's also the correct explanation.
Ha, I just posted the same thing.
I'm loving the audio stuff, between this and Sixty Symbols, I've learned a lot.
You also get "squares" at the top end when you compress too hard, as the peaks of the waves can be cut off. Any sharp edges are heard as a form of distortion.
Great video. Just thought I'd add to the chorus, and comment on an irony in digital audio. Back when CDs were introduced, everyone was trumpeting how amazing their 90+db of dynamic range was. Now, we're lucky to see discs released from major record labels that use more than 10db :-) And, in fact, most albums I've bought recently go one step worse, and use heaps of heavy digital clipping on all of the drum hits. A bit sad, I suppose....
+HandyAndy Tech Tips that's a consequence of the Loudness War...
Clipping on the drum tracks has a long history. Back in the day Motown got a great drum sound that way. But analog and digital clipping are different beasts. What sounds great on tape sounds really shitty in the digital domain.
excellent video... I hope there's more coming from this guy; Amazing, thanks for the upload.
I'd love to hear more from this guy!
I actually followed this whole thing. great explanation
Ha! I remember asking for a video on this topic ages ago! Its awesome that we finally got it!
If sampling of a signal with frequency higher than half of the sampling frequency occurs then the signal will not be 'cut off', it will transform into a signal with another frequency.
+Sparker yes, nobody said otherwise... they were only talking about cutting off when adding waves past the bit depth, weren't they?
Benjamin Philipp No, I'm talking about the 'cutting off' at 2:24.
+Sparker oh, pardon, I "overlooked" that (shame that you can't use the word "overheard" in the english language like that)
Yeah, I guess that can only ever be taken as an abstract concept where you "cut off" at an information threshold :/
Really nice explanation, thank you so much, been looking for a decent explanation for ages
I saw on another channel that the 44.1 kHz sampling rate was because we kept the human limit of hearing as 20 kHz and added a 2.05 kHz extra limit because low pass filters weren't accurate enough to cut off exactly above 20 kHz, so we added a bit of leeway. Double that and you end up with 44.1 kHz
and it's true !
It was also the perfect rate for the data storage methods used at the time.
Awesome video. More like this, please.
Amazing work ! I had wanted to understand audio processing for a while now. Thank you for the lovely explanation and delivery.
Nice illustrations, but this was just the tip of the tip of the iceberg.
It would be nice for further episodes about the topic of digital audio to mention Shannon, the logarithmic nature of the decibel scale, real-life annoyances like the noise floor and other concepts, e.g. bit rate, data compression (in FLAC, for example) vs data reduction (i.e. data loss, and that in more than one respect, like in MP3). It's a very interesting field of topics.
Wow! This is just what I was waiting for! Thank you!
3bit are more than enough for the average loudness war song.
MovingThePicture true, but 4-bit is much easier to process
This is actually an important part of the Sound Engineering degree that I took. Understanding digital audio means that you need to understand the Nyquist Theorem to make the best decisions on how you're going to record a particular source for an end medium and really influences the quality of the final product. Digital audio processing has become an every day part of your average Sound Engineers job these days.
I love the square wave! Stop hating on the square wave! :D
+Mister Softy Pulse waves with different duty cycles... they're good for everything!
+Mister Softy
LOL, I'll have one of everything thanks Mr Fourier.
+Mister Softy Square waves can be dangerous depending on the size of your ship.
After "Bitshift Variations in C minor" I have a special place in my heart for sawtooth waves.
8 bit video games have some bomb music.
Could you do sound with a logarithmic scale? such that the shorter waves can get more detail than the larger ones?
Sandra Nicole Yeah, more like a float rather than integer :)
+Sandra Nicole Interestingly, this is sort of what happens with audio compression techniques. Many audio codecs chop up the continuous sound signal into short sample frames (a few milliseconds long each) and convert the audio signal into discrete frequencies (with a Fourier transform of some kind). Then a psychoacoustic model is applied to the frequency spectrum to eliminate details our ears care less about and emphasise the remaining ones they do care about. This is sometimes done with a mel-frequency analysis, though every codec has its own method. Lastly, the remaining frequencies are packed into a compressed data stream using Huffman coding, LZW, or whatever the codec calls for. Decoding and reconstituting the original sound perfectly is of course impossible at this point, so these kinds of codecs are called "lossy", since they throw out a lot of information to fit the most important parts of the sound into the least amount of data possible.
+Sandra Nicole as written, it does exist, but for 8bit audio. For 16bit it's simply not needed. And in fact, even 14bit is already enough for our perception, unless you like to crank up the volume a lot on quiet parts of a song.
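One well-known logarithmic scheme for 8-bit audio is mu-law companding, used in telephony (G.711). The comment above doesn't name a specific scheme, so picking mu-law here is an assumption; the sketch shows only the continuous compress/expand curves, not the 8-bit packing.

```python
import math

MU = 255  # companding constant used for 8-bit mu-law


def mulaw_compress(x):
    """Map a sample in -1.0..1.0 onto a logarithmic scale in -1.0..1.0."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


quiet = 0.01                        # a sample at 1% of full scale
compressed = mulaw_compress(quiet)  # lands well above 1% of the output range
restored = mulaw_expand(compressed)
```

A sample at 1% of full scale ends up at more than 20% of the compressed range, so quiet passages get many more of the available code values than they would on a linear scale.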
Love it! More! For instance a video about the geeky side of compression. It would be nice to understand what I'm doing when I twiddle them knobs...
How come you didn't mention the Nyquist-Shannon sampling theorem?
I know you are trying to make it less 'technical' but sometimes it's nice to mention some of the more technical things.
+Teh Arbitur I was wondering the same thing
Excellent succinct explanation! Ironic that the audio had a hum throughout.
Yes, it is some kind of fan. But if the video editor heard it he would have removed it. That is why i suppose he did not hear it.
Dave's PC was by his knee, the fan was running at different speeds throughout the video so not easy to remove.
So it was Murphy's law :)
+Yan Wo It was on purpose, obviously
+Yan Wo They had a recording session of meditating monks in doing a new album. ;)
Instruction Clear. Successfully picked up acoustic waves through my red cat. Thank you.
This is exactly what I have been wondering about lately. I just got a Zoom H5 and was confused about all the recording settings! This explained it.
Yeah, I will have to call BS about the part about the sound card outputting a square wave if the bit depth is too low.
All the bit depth affects is the noise floor.
+Gordon Freeman he wasn't wrong, he was talking about an audio editing situation. If you add two 16-bit tracks together without halving the volume of each beforehand, you will get clipping, because you will likely have parts of the audio that sit at, let's say, 40,000 out of 65,536 in each. Add those together and you get 80,000, which is out of range, so anything above the maximum will be cut off. If this is bad enough, it becomes a square wave. The reason you wouldn't halve it beforehand, even though that might seem simpler, is that you throw away some of the data in doing that. It's better to add them together at a higher bit depth and then convert back down afterwards, to 16-bit in the case of the final output. In terms of the final output file you are right, but that's not what he was explaining.
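A rough Python sketch of that mixing problem, assuming signed 16-bit samples (range -32768 to 32767) and sample values picked purely for illustration. Python integers are unbounded, which stands in for "adding at a higher bit depth".

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_clipped(track_a, track_b):
    """Sum two tracks and clamp the result to the 16-bit range (hard clipping)."""
    return [max(INT16_MIN, min(INT16_MAX, a + b)) for a, b in zip(track_a, track_b)]


def mix_scaled(track_a, track_b):
    """Sum at full precision first, then halve and round to the nearest integer."""
    return [round((a + b) / 2) for a, b in zip(track_a, track_b)]


loud_a = [20000, 30000, -30000, 1000]
loud_b = [20000, 25000, -25000, -500]

clipped = mix_clipped(loud_a, loud_b)  # peaks are flattened at the limits
scaled = mix_scaled(loud_a, loud_b)    # peaks keep their shape at half the level
```

In `clipped` the three loud samples are all pinned to the limits, which is the flat-topped shape that starts to look like a square wave; `scaled` preserves their relative heights.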
megaspeed2v2
Fair enough, but your average joe will never get into contact with 24bit audio.
Gordon Freeman this isnt about your average joe though, this applies to the audio engineers producing the audio in the first place before it gets compressed down to 16bit and put on a CD DVD or game
Gordon Freeman Yes its for the average joe, he was just explaining that anything higher than 16bit is only needed for audio that is going to be edited, not really rocket science.
megaspeed2v2
Well, he said that the computer draws lines to connect the samples, that is not what happens, any audio engineer would know that's not how you would explain it to the average joe.
He also could have simplified it by saying that the bit depth affects the quietest sounds which don't end up sounding like square waves.
He got quite a few things wrong, he could have done better.
Best explanation of how Soundwaves are converted to digital. Thank you much!
Whilst I agree this is a simplification, this is pretty much what A Level music technology teaches and it's explained well.
Very nice video guys!
I decided to learn something about digital audio, and ended up here (among other places). Great explanation, you should be a teacher the way you communicate.
quiet signals don't sound grainy because they are square waves, it's because of quantization noise.
1:46 with this information, i was able to deduce the solution to my staticky PC audio; I had the sample rate for audio output set to max setting of "24 bit, 192,000 Hz (Studio quality)," and once I lowered the sample rate to "24 bits, 96000 Hz (Studio quality)", all the static noise magically vanished! YAY, thank you, this had been harming me for years.
Great show!
Hey you got the same speakers as I. Gotta love the alesis!
Please explain when dynamic compression comes in the process ...
+johannes914 It comes into the process, way overused, once a record company is involved.
rhoyt15 Yeah, I'm stereotyping. But 9 times outta 10, the record companies don't use it correctly.
+johannes914
dynamic compression comes into the process after recording.
It is technically a post production tool but can be used in other places like live performance.
You would choose to use it when a sound source is very dynamic, in that the volume levels change a lot and the rate that they change is not predictable.
Think of it basically as an AI fader that can analyse the input and quickly decide how much to cut or boost the volume.
+johannes914 In simple terms, compression makes the loudest parts of the signal quieter, which then means you can turn the whole thing back up without distortion.
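As a minimal sketch of that idea, here is the static gain curve of a downward compressor working on levels in dB. The threshold, ratio and makeup values are arbitrary example settings, and real compressors also have attack and release times, which this leaves out.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Return the output level for a given input level, both in dB."""
    if level_db > threshold_db:
        # Above the threshold, every `ratio` dB of input gives 1 dB of output.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


loud_before, quiet_before = 0.0, -30.0

# Step 1: the loud part is pulled down, the quiet part is untouched.
loud_squashed = compress_db(loud_before)
quiet_squashed = compress_db(quiet_before)

# Step 2: makeup gain brings the loud part back to where it was.
loud_after = compress_db(loud_before, makeup_db=15.0)
quiet_after = compress_db(quiet_before, makeup_db=15.0)
```

With these settings the loud part ends up back at 0 dB while the quiet part has risen from -30 dB to -15 dB, so the gap between them shrinks from 30 dB to 15 dB.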
*****
If someone gives too much mic, enough to damage the recording or performance then you just lost your job.
Fascinating and well explained.
I'd love to see a video about different audio formats, not in the .wav or .mp3 file type sense, but rather the encoding methods, like PCM, ADPCM, DPCM, PWM, etc.
So sample frequency is the "fps" of sounds ?
+BOBOUDA More like Vsync on a monitor. At least when converting from analog to digital =)
+Patrick The Buried Then what happens when your cpu stalls for a bit?
+BOBOUDA Pretty much, yes (analog audio/ live audio has infinite sample frequency, just like RL). Bit depth is like Contrast of a monitor.
+Benny Kolesnikov That usually doesn't happen because the cpu speed is WAY higher than the sampling frequency and between 2 samples the cpu has more than enough time to do whatever it needs. However i've seen it happen, it sounded like slow-mo with stuttering, it was really weird. And just before my pc crashed lol.
There is no need to go into floating point math and sampling theorem on the first video! This explanation is simple to follow and covers the basics very well. Maybe another video can go into more details on different aspects. Great video. Thanks :)
+Yvonne Van Der Laak Yea, but... ;) He's saying some things that involve sampling theory that happen to be completely wrong.
Very well said!
There are a lot of analogies you can draw to photography and Photoshop. Clipping is like over-exposing an image. Sampling frequency is like pixel resolution. Bit depth is like color depth. In photography, ideally you want your final image to span the entire 24-bit colorspace, with no obvious pixelation and no obvious posterization. In audio production, you want your final track to span the entire 16-bit _wavespace_ with no obvious digital distortions.
(My speaker properties after watching this video)
24-BIT
192,000hz
+FrankJavCee youtube changes it back to 16bit and 126kbps mp3 so have fun
+JamesMulvale Well, they mostly use AAC (Advanced audio codec). HD quality (720p+) is usually accompanied by 256kbps audio and Non-HD (480p and lower) by 128 kbps audio.
Now, I heard that due to AAC being a better compression codec than MP3 and the quality of YouTube's encoders, you can get the same result with 96 kbps AAC as with 128 kbps MP3. That 96 kbps bitrate is used in the MP4 360p video+audio stream on YouTube.
Also, an uploaded 320 kbps AAC audio track is allegedly of the same fidelity once YouTube re-encodes it as 256 kbps AAC.
At this point you get to *psychometric* measuring of *subjective quality* of sound. That's a whole other discussion. You people might want to look up "Fraunhofer Institute for Integrated Circuits IIS" for more information.
+JamesMulvale So, *in short*, if you don't turn on HD, you get about 128kbps MP3 equivalent quality.
the point is you will never get 192khz/24bit audio on youtube its a waste of data. HD maxes out around 192kbps (which is NOT 192khz - it's samplerate and bitdepth combined). i can't be bothered arguing any more. bye
I agree that setting soundcard to 192kHz is useless, and I know the difference between bitrate (bps) and sample rate (Hz) or depth (bit). I was trying to add to your information.
I didn't speak about sample rate because I don't understand what sample means in widely used lossy compression codecs vs. in WAVE (PCM) codec.
Sorry to bother you. But original poster seemed to be ignorant so I replied to you. I'm, however, *not* a sound engineer, but this much I know.
(No need to reply.) Bye
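For reference, the bitrate of uncompressed PCM follows directly from sample rate, bit depth and channel count, which is a different quantity from the bitrate of a lossy stream. A small sketch (the function name is made up, and stereo is assumed):

```python
def pcm_bitrate_bps(sample_rate_hz, bit_depth, channels=2):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


cd_bitrate = pcm_bitrate_bps(44_100, 16)      # CD audio
studio_bitrate = pcm_bitrate_bps(192_000, 24)  # 24-bit / 192 kHz
```

CD audio works out to 1,411,200 bits per second (about 1411 kbps) and 24-bit/192 kHz to 9,216,000, so a 128 to 256 kbps stream is heavily compressed either way.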
Yea man, would love more topics on digital audio
I know he was simplifying, but to be sure: if you sample frequencies higher than half the sample rate, you're not simply "cutting them off." It's actually much worse: you're introducing "phantom frequencies" that weren't in the signal but turn up in your digital signal. They are mirrored around the highest frequency you can represent, so the further you go into "you didn't filter properly" territory, the lower the phantom frequency gets (until it can't go any lower, then it rises again), which makes it further off and usually more noticeable, too.
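A quick way to see the "phantom frequency" is to sample a tone above half the sample rate and compare it with its mirror image. The 30 kHz tone below is an arbitrary example; cosines are used so the two sets of samples match without a sign flip.

```python
import math

SAMPLE_RATE = 44_100


def sample_cosine(freq_hz, count):
    """Return `count` samples of a cosine at freq_hz, taken at SAMPLE_RATE."""
    return [math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(count)]


too_high = sample_cosine(30_000, 64)               # above the 22.05 kHz limit
phantom = sample_cosine(SAMPLE_RATE - 30_000, 64)  # its mirror image at 14.1 kHz

largest_difference = max(abs(a - b) for a, b in zip(too_high, phantom))
```

`largest_difference` is zero up to floating-point rounding: once sampled, the 30 kHz tone is indistinguishable from a 14.1 kHz one.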
I like the fact that 44100 = 2^2 * 3^2 * 5^2 * 7^2
(First four primes squared.)
I can't imagine that's a coincidence.
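The factorisation is easy to confirm with a few lines of trial division (the helper name is just for illustration):

```python
def prime_factors(n):
    """Return a dict mapping each prime factor of n to its exponent."""
    factors = {}
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors[divisor] = factors.get(divisor, 0) + 1
            n //= divisor
        divisor += 1
    if n > 1:
        # Whatever is left is itself prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


factors_of_44100 = prime_factors(44_100)
```

Having so many small prime factors means 44100 divides evenly by a lot of line and field counts, which fits the video-tape explanation given elsewhere in the comments.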
yur being a math nerd(and i like it)
Every number is some product of primes
Illuminati confirmed?
Nice job .
7:15 The computer doesn't have to draw lines between the samples. They can just remain individual samples. When you output the signal it's smoothed out anyway (I assume the momentum of the moving bits in your speakers helps with that).
+RC-1290 It's not smoothed out so much as a sine wave is fitted to the samples, and the DAC just outputs a perfect combination of sine waves.
maybe one on synths? Different waveforms, envelops, filters, overdrive and such?
This is great info for a producer trying to approach it from an engineering side.
Having more experience with computer graphics, it is interesting to see how the same concepts apply to audio processing.
Great explanation!
Awesome to know how the technical part of my audio work works. Working with orchestral pieces and a lot of low and a lot of high and those 'lingering' cymbal notes I did work at the higher settings like 24 bit. Good to know why exactly I have to do so on a technical level.
Awesome video! Thanks a lot!
If the wave does not fit in the bit depth, why don't we just increase the bit depth to 32 bit or 48 bit? Another question: why is the bit depth 24 bit rather than 16 bit or 32 bit?
+Muztaba Hasanat the reason you can't keep increasing the bit depth is that the file size gets bigger; remember, one CD is 700 MB and that's only 16-bit
Thanks to all :)
+Alex Lee Most humans can't detect the difference between an analog audio signal and a digital audio signal with 16 bits/sample. Also, when you use higher and higher bit depths, you need better digitization equipment. The least significant bits tend to be mostly noise if you don't have good equipment.
+Muztaba Hasanat Every wave *could* fit in 16-bit. It's just that mixing and recording require overhead. This is because every time you manipulate a wave a little bit, the volume of every sample needs to be described using a whole number between -32,768 and +32,767. Say you change the volume of a sample and the maths ends up saying "give this sample a volume of 5,256.6". You can't store a fractional value (no decimal places), so it gets rounded to 5,257. You can't hear this change in volume, but over many thousands of changes to a sample it begins to add up to a fairly significant difference: a combination of lots of errors of up to 0.5 from what the wave should be. Done to many samples in a song, this creates a quiet noise.
Larger bit depths make this noise much quieter. So that is why 24-bit (or 32bit) is generally used in mixing. The final master doesn't need this because no more changes will be made and the noise doesn't have a chance to get loud. 16-bit is more than enough for even hi-fi listening.
16-bit gives enough control over volume that the quietest sounds are around -96 dB relative to full scale, which is well below what is audible at normal listening levels.
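To put rough numbers on the rounding argument above, here is a minimal sketch. It assumes a made-up editing session: a 440 Hz test tone whose level is nudged up and down 200 times, with every result rounded back to a whole number. The function name, the gain pattern and the pass count are all illustrative, not taken from any real DAW.

```python
import math

def error_after_edits_db(bits, passes=200, n=1000):
    """RMS error (dB relative to full scale) after many rounded gain changes."""
    full_scale = 2 ** (bits - 1) - 1
    # Hypothetical test signal: 440 Hz sine at half of full scale, 44.1 kHz.
    ideal = [0.5 * math.sin(2 * math.pi * 440 * i / 44100) for i in range(n)]
    samples = [round(x * full_scale) for x in ideal]
    level = 1.0
    for k in range(1, passes + 1):
        # Arbitrary level that wanders between about 0.82 and 1.0.
        new_level = math.exp(0.1 * (math.cos(k) - 1.0))
        gain = new_level / level
        # Each edit stores whole numbers again, so each edit rounds.
        samples = [round(s * gain) for s in samples]
        level = new_level
    # Compare against the same edits done with no rounding at all.
    err = [s / full_scale - x * level for s, x in zip(samples, ideal)]
    rms = math.sqrt(sum(e * e for e in err) / n)
    return 20 * math.log10(rms)

err16 = error_after_edits_db(16)
err24 = error_after_edits_db(24)
print(f"16-bit working depth: error at {err16:.1f} dB")
print(f"24-bit working depth: error at {err24:.1f} dB")
```

The accumulated error at 24-bit should land far below the 16-bit figure, which is the practical reason for mixing at a higher depth and only reducing to 16-bit once, at the end.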
I hope someone can answer me on this:
From 2:10 he says that we only need a low sample rate (or sample frequency as he calls it) to represent the low frequency sine wave. But why doesn't that produce a sort of triangle shape? As he says later on in the video, the computer thinks it should go straight from point to point, and if it did that with the low sample rate, it would not produce a sine wave sound. I think I watched a video explaining this once, but I can't find it now... I believe it has something to do with dithering? Or something, I don't know.
We didn't really go into that in this video, it was very basic. Maybe it's a subject for another time.
I get that part, and that wasn't really my question. I can try and explain it differently:
If you know a bit about sound and its digital representation, you'd know how a sine wave sounds and how a triangle wave sounds. To me it seems like a triangle wave needs far fewer "points" to represent that wave; it actually only needs the maximum value and the minimum value. But the slope of the sine wave is constantly changing, and so to me it seems like it would need an infinite number of points to represent that sound wave.
+paulcmnt This is entirely incorrect. Photos are not equivalent to audio. Just like images, audio files contain a bunch of 'point samples'. However, unlike pixels, they are not represented as 'squares' or as a flat section of the audio waveform. Whenever you listen to a digital file, it goes through the DAC, which looks at all of the point samples and then finds the only possible combination of sine waves that will fit, and outputs that as a perfectly smooth voltage change (analogue signal).
+Alpha Kay His explanation is essentially wrong - there is no "point to point" in the reconstruction. (And there are no "stair steps" either.) Any competent DAC will produce a lovely sinusoidal output, regardless of the bit depth or sample rate. Please watch the video I linked to earlier; it does a superb job explaining and demonstrating how all this works. :)
+Alpha Kay The digital -> analog converter understands this and "fills in" the "missing" sine wave information in a smooth, intelligent, accurate manner.
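The "fills in the missing information" idea can be checked numerically. The sketch below uses the textbook ideal reconstruction (a sum of sinc functions, one per sample) and compares it with straight point-to-point lines for a 5 kHz tone at 44.1 kHz. It is idealized maths only: real DACs approximate this with oversampling and analogue filters, and the tone, sample count and function names here are assumptions chosen for illustration.

```python
import math

FS = 44100.0   # sample rate in Hz
F = 5000.0     # test tone in Hz (assumed; anything well below FS / 2 works)
N = 2001       # number of stored samples

samples = [math.sin(2 * math.pi * F * n / FS) for n in range(N)]

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct_sinc(t):
    """Ideal band-limited value at time t, measured in sample periods."""
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

def reconstruct_linear(t):
    """Point-to-point straight line between the two neighbouring samples."""
    n = int(math.floor(t))
    frac = t - n
    return samples[n] * (1 - frac) + samples[n + 1] * frac

def true_value(t):
    return math.sin(2 * math.pi * F * t / FS)

# Look exactly halfway between samples, near the middle of the recording.
points = [1000.5 + k for k in range(20)]
sinc_err = max(abs(reconstruct_sinc(t) - true_value(t)) for t in points)
linear_err = max(abs(reconstruct_linear(t) - true_value(t)) for t in points)
print(f"worst error, sinc reconstruction: {sinc_err:.5f}")
print(f"worst error, straight lines:      {linear_err:.5f}")
```

The small error left in the sinc version comes from using a finite number of samples. The straight-line version misses by several percent of full scale, which is the distortion a point-to-point DAC would add.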
Very helpful.
Very well explained; thank you very much!
makes me want to be a sound engineer
+Victor P.
Do it, I did.
+Dirty Robot What does that involve?
+Dirty Robot I wouldn't do that considering the sad economic state the audio engineering industry is in right now. Lots of really good sound engineers go months on end without gigs these days.
+morgogs Math.
+morgogs Depends if you want to go study or you want to dive in.
When I made my choice there were no courses you could do, but I had a fair bit of experience, so I contacted all the recording studios in my area, took a low-paid position, then worked my way up.
Interesting, it would be nice to have a video on how the data is stored/compressed in files.
I just synthesised a wave in Sound Forge, then I boosted it and it 100% squared the wave off. I also reduced the bit rate and it made the wave much more blocky. His explanation seems very good to me. They even show the squared off wave when he shouts in the video. I'm sure there are a lot more details to why and how it does this but you cannot deny that the waves are somewhat cut off and look square. Download Audacity and try it yourself.
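The squared-off tops seen after boosting are what digital clipping looks like, which is a separate effect from how a DAC joins up samples. A minimal sketch, assuming samples are stored on a scale of -1.0 to 1.0 and an arbitrary 4x boost:

```python
import math

def clip(x, limit=1.0):
    """Hard limit: anything beyond full scale is pinned to full scale."""
    return max(-limit, min(limit, x))

# One cycle of a full-scale sine, 100 samples long.
sine = [math.sin(2 * math.pi * i / 100) for i in range(100)]

# Boost by 4x, then store it back in the same -1.0..1.0 range.
boosted = [clip(4.0 * s) for s in sine]

flat_fraction = sum(1 for s in boosted if abs(s) == 1.0) / len(boosted)
print(f"{flat_fraction:.0%} of the boosted samples sit flat at full scale")
```

The flat tops here are in the stored numbers themselves, because the boosted values no longer fit the available range.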
Working with audio is super interesting, I love it. Analogue and digital sounds are great fun to play with. I would recommend the FL Studio demo for anyone interested in becoming a sound engineer :P
It helps me understand this because I have a subject on digital audio. I am an MT (multimedia technology) student.
+Kath Alave Go watch "D/A and A/D | Digital Show and Tell" (Monty Montgomery @ xiph.org) here on YouTube. It gives you a better explanation than this video.
Nice bit of info for when, if ever, I use my microphone.
Good intro and a subject dear to my heart. Hope you do more video about digital audio.
Great work, God bless you all
Hmm, I want more of such videos. Music is so interesting when it connects with electronics and informatics.
I remember my first Sound Blaster card. It could not only sample at 44.1kHz, but also 22.05kHz, and I think even something lower than that (16kHz?). It also had an option to record at a bit depth as low as 8 bits. So basically, it sounded like talking through an old telephone. But then, this was an amount of data the PCs of back in the day could handle much better. It was an old 386, and the soundcard could not use a bit depth any higher than 16 bits.
Great video! 😎👍
My DAW has settings up to 192,000 Hz. Are there any benefits or downsides of using a sample rate this high? Considering the "industry standard" is significantly lower, what applications make use of this sample rate?
+OnixRose Almost none. 192k will give you a huge file size and slow your sessions down. On top of that, there is scientific evidence that ultra-high sample rates can actually sound worse (intermodulation distortion). Plugins and DAWs like higher sample rates because it gives them more information to process. As a result, most plugins upsample to 88.2 or 96k when your session is running at 44.1/48k. I wouldn't bother using higher than 96k. 96k would be a good sample rate to run at if you think you're going to be doing a lot of time stretching or similar processing. Other than that, you can't really go wrong with 48k.
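For a sense of scale on the file size point, here is the arithmetic for uncompressed stereo PCM. The helper name is made up for this sketch, and real session files add some header overhead on top.

```python
def bytes_per_minute(sample_rate_hz, bit_depth, channels=2):
    """Raw PCM size of one minute of audio, ignoring file headers."""
    return sample_rate_hz * (bit_depth // 8) * channels * 60

cd = bytes_per_minute(44100, 16)      # CD quality
hires = bytes_per_minute(192000, 24)  # 192 kHz / 24-bit
ratio = hires / cd

print(f"44.1 kHz / 16-bit: {cd / 1e6:.1f} MB per minute")
print(f"192 kHz / 24-bit:  {hires / 1e6:.1f} MB per minute")
print(f"That is about {ratio:.1f} times the data per track")
```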
Can you do more videos on digital audio? Specifically how audio software applications / plugins work.
What kind of scale would one use for mapping input to bits? Would a logarithmic scale work better for keeping both the quiet and the loud sounds?
+Philip Stuckey Well, actually, dB is already a logarithmic unit...
Yes, that would work, but I don't know of any log DACs.
+TheWeepingCorpse And it would be a PITA for audio processing if we used logarithmic sample scales.
+Philip Stuckey The ideal scale would depend on what kinds of sounds you are sampling. More often than not, the equipment we have access to is far simpler and uses a linear scale.
+Philip Stuckey More advanced methods for storing audio take this into account and change the scale over time to avoid this problem.
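A logarithmic sample scale does exist in practice: telephony uses mu-law companding with mu = 255. The sketch below uses the continuous textbook mu-law formula rather than the exact segmented tables phones use, and the 8-bit quantizer and test values are assumptions for illustration.

```python
import math

MU = 255.0  # the value used for mu-law in telephony

def mu_compress(x):
    """Map a linear sample in -1..1 onto a logarithmic scale in -1..1."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign((math.pow(1.0 + MU, abs(y)) - 1.0) / MU, y)

def quantize(v, bits=8):
    """Round a value in -1..1 to the nearest of the available steps."""
    steps = 2 ** (bits - 1) - 1
    return round(v * steps) / steps

def linear_roundtrip(x):
    return quantize(x)

def mulaw_roundtrip(x):
    return mu_expand(quantize(mu_compress(x)))

quiet, loud = 0.001, 0.9
quiet_linear_err = abs(linear_roundtrip(quiet) - quiet)
quiet_mulaw_err = abs(mulaw_roundtrip(quiet) - quiet)
loud_linear_err = abs(linear_roundtrip(loud) - loud)
loud_mulaw_err = abs(mulaw_roundtrip(loud) - loud)

print(f"quiet sample: linear error {quiet_linear_err:.6f}, mu-law error {quiet_mulaw_err:.6f}")
print(f"loud sample:  linear error {loud_linear_err:.6f}, mu-law error {loud_mulaw_err:.6f}")
```

With only 8 bits, the quiet sample vanishes entirely on the linear scale but survives on the log scale, at the cost of coarser steps for loud samples. That trade-off suits speech on a phone line, and it also shows why processing is awkward: mu-law values cannot simply be added together to mix two signals.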
Will this be a series of videos? Will you go into audio compression and things like that like you did with pictures and JPEG compression?
loved it...informative
Amazing video! It would be great if you pick up from here to talk a little about digital compression and the infamous Loudness Wars, which would make a great title! The War for your Ears! lol Great work. Cheers to you guys.
The sampling rate is (at least) 2x the highest frequency you plan to sample... because, aliasing, Nyquist, something, something.
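The "aliasing, Nyquist, something" part can be made concrete. In this sketch a 30 kHz tone (above half of 44.1 kHz) is sampled, and the resulting numbers turn out to be identical to those of a 14.1 kHz tone. The helper name and the choice of tone are assumptions for illustration.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Frequency that a sampled tone at f_hz is indistinguishable from."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)

FS = 44100
tone = 30000                       # above FS / 2, so it cannot be represented
alias = alias_frequency(tone, FS)

# Sample both tones and compare the stored numbers.
a = [math.cos(2 * math.pi * tone * n / FS) for n in range(50)]
b = [math.cos(2 * math.pi * alias * n / FS) for n in range(50)]
max_diff = max(abs(x - y) for x, y in zip(a, b))

print(f"A {tone} Hz tone sampled at {FS} Hz looks like {alias} Hz")
print(f"Largest difference between the two sample sets: {max_diff:.2e}")
```

Once sampled, nothing can tell the two tones apart, which is why everything above half the sample rate has to be filtered out before the converter.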
Fascinating
aaaaamazing video thank you thank you thank you!!!
Drawing lines between points? Sorry but that's just not true. That would produce a lot of aliasing and distortion (like square waves as he said) and that's certainly not what happens. The D/A converter draws a smooth curve with a reconstruction filter. The only thing you lose by lowering the bit depth is the data lost to quantizing, which makes the noise floor louder.
I think you should correct that statement.
Apologies, but for the sake of a short introduction video I simplified the whole subject somewhat.
+David Domminney Fowler
Please don't do that, and if you feel it is necessary to oversimplify to such an extent, please specify in the video "this is something of an oversimplification". Because this video is downright incorrect in some cases.
+David Domminney Fowler, I agree with +Tommy59375 completely. I watch the *phile videos precisely because of the way that masters of their craft are able to explain deeply complicated concepts without distortion or oversimplification. These aren't buzzfeed videos.
That guy is amazing!!!!!
I guess floating point audio formats can help with two of the problems: the headroom before digital clipping occurs and the fidelity at very low volumes.
Rather inaccurate video imho.
1. It's not 44.1kHz because the limit of human hearing is 22.05kHz. It has to do with the upper limit of human hearing and the choice of anti-aliasing filter (transition band width). The assumed human hearing range is roughly 20Hz to 20kHz. The Nyquist sampling theorem tells us that the sampling rate should be at least twice the max frequency of the signal (so 40kHz).
The problem now is that our original signal contains frequencies above 20kHz, so if we try to sample at 40kHz, aliasing will occur (frequencies above 20kHz fold into the hearing range). The signal must be low-pass filtered (anti-aliasing filter). We can't perfectly cut frequencies right at 20kHz; in practice a transition band is necessary. For practical and economic reasons, a 2.05kHz transition band was chosen. Now our signal contains frequencies from 20Hz to 20 + 2.05 = 22.05kHz. Back to Nyquist, we need to sample at 44.1kHz.
2. You DON'T end up with a square wave. You could use 4 bits per sample and you still would not get a square wave. You'll get a ton of quantization error (rounding error) and the signal will be drowned in noise (low SNR). Moreover, the computer doesn't draw lines between points. It finds a continuous signal from the sequence of samples, and there is a unique solution.
What is "bit depth"? Is it the volume of the sound?
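The "low SNR at 4 bits" point is easy to measure. This sketch rounds a full-scale 997 Hz sine to a given bit depth and compares signal power with rounding-error power. The test tone, sample count and simple rounding quantizer are assumptions for illustration, and no dither is applied.

```python
import math

def quantized_sine_snr_db(bits, n=20000):
    """Signal-to-noise ratio of a full-scale sine rounded to `bits` bits."""
    steps = 2 ** (bits - 1) - 1
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 997 * i / 44100)
        q = round(x * steps) / steps     # nearest available level
        signal_power += x * x
        noise_power += (q - x) ** 2      # what the rounding threw away
    return 10 * math.log10(signal_power / noise_power)

snr4 = quantized_sine_snr_db(4)
snr16 = quantized_sine_snr_db(16)
print(f" 4-bit: SNR about {snr4:.1f} dB")
print(f"16-bit: SNR about {snr16:.1f} dB")
```

The results should sit close to the usual rule of thumb of roughly 6 dB per bit. At 4 bits the tone is still a tone, just with a loud layer of error on top of it.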
Thank you for the information brother. 👍🏼😆😃🤜🤛
If getting the small details for quiet notes is an issue, why not use a logarithmic scale, where values are further apart at higher intervals? Then you could keep detail at small volumes when you need it and lower resolution at high values where you don’t.
Also, why not use a relative scale, where you mention how much the wave changes each sample? That way you'd have no upper or lower end.
5:25
Why use negative volumes? If you didn't, you'd get double the bit depth, and you wouldn't have to deal with phase cancellation.
I'm sure there's a good reason why, so tell me
Are you saying that if I layer audio on my video that when it's rendered, those audio track levels will be summed? ie added together?
+NeonsStyle Yes, that's right. That is the reason why if you have two similar waves and one of them is phase inverted, it becomes mute!
So doing this would be a bad idea as the sounds could be out of phase and thus destructive. Thanks heaps :)
NeonsStyle Nope. Summing audio is fine, you just have to make sure your audio is not out of phase. Keep in mind that for this to happen, both waves would have to be on the same channel, so you wouldn't hear anything if that happened.
What may happen, though, is you may have a voiceover that you recorded in stereo incorrectly, and you have it with the right phase on the left channel, but the right channel has it inverted. Guess what happens when someone watches your video on a phone speaker or anything that mixes down to mono? Your voice disappears! Well... almost. Since YouTube encodes its stuff into AAC/Opus, your viewer will only hear artifacts that slightly resemble your voice.
NeonsStyle Audacity for recording? All right! For processing? Yeah, no. You should use your video editor instead, with a nice free VST compressor. Don't overdo the ratio and attack. If you're looking for a deep radio voice, you'll want a multiband compressor. Have a tight compressor (ratio of 3?) on a band that goes from 0 Hz to around 90/100 Hz with a generous makeup gain. You might want to get a small compressor on the 200 Hz-500 Hz band, but with NO MAKEUP GAIN, just for controlling the mud. For brightness, mildly compress the ~6 kHz band (have a Q that goes from 4.5 kHz to 8 kHz if you want) with a little makeup gain.
Just don't overdo it, any ratio above 4 is too much! Remember to add a highpass around 20 Hz before any processing (except noise removal; if you're going to use that, I recommend iZotope's RX3 and a mild setting). If you're getting too much noise, lay down a bit of compression and get a little non-aggressive gate going (just don't make it too obvious).
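Both cases in this exchange (layered tracks being summed, and the inverted channel vanishing in mono) come down to adding sample values. A minimal sketch with a made-up 220 Hz "voice" and hypothetical helper names:

```python
import math

def mix(*tracks):
    """Sum tracks sample by sample, as happens when layers are rendered."""
    return [sum(frame) for frame in zip(*tracks)]

def mono_downmix(left, right):
    """Average the two channels, as a single phone speaker effectively does."""
    return [(l + r) / 2 for l, r in zip(left, right)]

voice = [math.sin(2 * math.pi * 220 * i / 44100) for i in range(1000)]
inverted = [-s for s in voice]

doubled = mix(voice, voice)        # same phase: twice as loud
silent = mix(voice, inverted)      # opposite phase: cancels completely
phone = mono_downmix(voice, inverted)

print("peak, same phase:      ", max(abs(s) for s in doubled))
print("peak, inverted copy:   ", max(abs(s) for s in silent))
print("peak, stereo to mono:  ", max(abs(s) for s in phone))
```

This is the uncompressed case, so the cancellation is exact. Unrelated tracks such as music under a voiceover simply add up without cancelling.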
By the way, I'm not the guy in the video, just an audio freak! :)
Ahh I see... hehe... thanks heaps for all the info.. I really appreciate it. :) I'll see what I can do with that. Thanks :)
I suggest a followup about lossy audio compression.
Thank you for the information
One thing I wonder about is sounds we cannot hear that still have an effect on us. For example, there was a case in a laboratory where people started seeing visual artifacts at night, and these were traced to (I guess) infrasound made by an air conditioner or something like that. How do you treat that? Would you use it in a scary film, for instance, or is it cut out entirely?
Also there was a buzzing/fan/whatever noise in the background all the way through that was kinda... distracting.
If you would like to know more about the algorithms for converting from a higher sampling rate to a lower one ( _downsampling_ , e.g. recorded at 48kHz then written to a CD at 44.1kHz), look up _sample rate conversion_ . For reducing the bit depth (e.g. 24-bit down to 16-bit), look up the term _dithering_ in relation to digital audio.