How Shazam Works (Probably!) - Computerphile

  • Published on 14 Mar 2021
  • Looking at the audio mechanics and algorithms behind music identifier apps. David Domminney Fowler built a demo you can try yourself.
    EXTRA BITS: Shazoom - ...
    Play with Dave's demonstrator here: bit.ly/3qRo9t9
    More about David Domminney Fowler: bit.ly/38IhX0p
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 394

  • @pathologicalliar8728
    @pathologicalliar8728 3 years ago +1271

    I work at Shazam and I can tell you that we just hire a bunch of teenagers to guess which song it is. That's why it's not very good at rock from the 70's

    • @ArihantChawla
      @ArihantChawla 3 years ago +220

      username checks out

    • @noomade
      @noomade 3 years ago +4

      you are lying!

    • @pathologicalliar8728
      @pathologicalliar8728 3 years ago +42

      @@noomade I never lie. lying is my least favorite thing to do.

    • @noomade
      @noomade 3 years ago +15

      @@pathologicalliar8728 sounds pathological

    • @magma2680
      @magma2680 3 years ago +1

      doesn't work on weeb music too well so i doubt it... it's probably just you

  • @DrJRMCFC
    @DrJRMCFC 3 years ago +771

    there’s a paper published by the founder of shazam that describes the algorithm in some detail.

    • @danielalorbi
      @danielalorbi 3 years ago +250

      This is a very relevant comment, with surprisingly little attention.
      Here's the title: "An Industrial-Strength Audio Search Algorithm - ALC Wang"
      (The founder in question is Avery Li-Chun Wang)

    • @bigpickles
      @bigpickles 3 years ago +44

      I read this comment, liked it, and left without watching. Thank you.

    • @HesderOleh
      @HesderOleh 3 years ago +1

      doesn't it use something more similar to DTW rather than FFT?

    • @Squossifrage
      @Squossifrage 3 years ago +70

      And they address that in the beginning of the video. We know how it worked back then, we don't know how it works today.

    • @ChaitanyaBhagwatChai
      @ChaitanyaBhagwatChai 3 years ago +26

      @@bigpickles your loss tbh. It's a great informative video

  • @realeques
    @realeques 3 years ago +65

    Even as a programmer myself, Shazam was always a mind-blowing app to me. The speed and accuracy are absolutely amazing. I've used it before to get a song from someone listening to it at the bus station... or playing at the supermarket

  • @assepa
    @assepa 3 years ago +314

    Kudos for the 30-year-old printer paper this guy apparently still has a stack of, lol.

    • @domminney
      @domminney 3 years ago +44

      From Sean’s last visit, I never throw anything away

    • @berzerkskwid
      @berzerkskwid 3 years ago +34

      my working theory is that the university bought out the stock of a paper company going out of business in the 90s and has been giving reams of it out as faculty party favors ever since. Every Computerphile guest uses it.

    • @OffTheBeatenPath_
      @OffTheBeatenPath_ 3 years ago +19

      @@berzerkskwid universities always carried a huge stock of that paper and then overnight the printers became obsolete. It was always great for sketching. I had boxes of it for free.

    • @kwabenaagyeiboitey6423
      @kwabenaagyeiboitey6423 3 years ago +1

      The most decent but funny comment ever 😂😂

    • @RedwoodRhiadra
      @RedwoodRhiadra 3 years ago +8

      They still make it, you can buy cases of two or three thousand sheets for sixty bucks or so on Amazon. (Search for "continuous feed green bar printer paper")

  • @lustechsource5197
    @lustechsource5197 3 years ago +147

    Shazam is one of those programs that make your jaw drop when you first see it in action.

    • @joel6672
      @joel6672 3 years ago

      yep

    • @bearcb
      @bearcb 3 years ago +6

      The app that made me buy a smartphone back in the day

  • @TECHN01200
    @TECHN01200 3 years ago +234

    3blue1brown's videos about Fourier transforms are great further viewing for the math part of this.

    • @domminney
      @domminney 3 years ago +11

      Absolutely. The subject is too deep to fully cover here

    • @lorenzobernardi5261
      @lorenzobernardi5261 3 years ago +1

      I understood Fourier transform use with that video. That's awesome

  • @broderfoder9348
    @broderfoder9348 3 years ago +304

    Shazam simulator, shouldn't it be Shasim?

    • @battman505
      @battman505 3 years ago +9

      underrated comment

    • @Schindlabua
      @Schindlabua 3 years ago +6

      Simzam

    • @lohphat
      @lohphat 3 years ago +1

      It’s similar to a Shatner simulator. A Shatsim.

    • @vladimirpotrosky7855
      @vladimirpotrosky7855 3 years ago

      Sha-sham

    • @thePronto
      @thePronto 3 years ago

      That word would violate the YT terms of service.

  • @rich1051414
    @rich1051414 3 years ago +117

    From what I understand, shazam generates a bunch of potential hashes per song, and it phones home to see if any of those hashes is a song id. It's a huge lookup table and a clever key generation based on relative frequency changes over time.

    • @shadebug
      @shadebug 3 years ago +50

      That makes sense because anybody who uses it a lot knows you sometimes get some matches that are nothing alike, which is weird if it’s just comparing sounds but makes perfect sense if your hashes are clashing

    • @ArihantChawla
      @ArihantChawla 3 years ago +3

      @@shadebug exactly
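
@rich1051414's description lines up with the combinatorial hashing in the Wang paper mentioned earlier in the comments: pairs of spectrogram peaks are hashed, and a match is confirmed when many hashes agree on the same time offset. The sketch below is a minimal illustration, assuming peaks have already been extracted as (frame, frequency bin) pairs; FAN_OUT, MAX_DT and all function names are hypothetical, and none of it is Shazam's actual code.

```python
from collections import Counter, defaultdict

# Hypothetical tuning values, chosen only for illustration.
FAN_OUT = 3   # how many later peaks each anchor peak is paired with
MAX_DT = 50   # largest allowed time gap (in frames) inside one pair


def peak_pair_hashes(peaks):
    """Turn (frame, freq_bin) peaks into ((f1, f2, dt), anchor_frame) pairs.

    The hash uses only the two frequencies and the gap between them, so it
    does not depend on where in the song the recording started.
    """
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # same frame: carries no timing information
            if dt > MAX_DT or paired >= FAN_OUT:
                break
            out.append(((f1, f2, dt), t1))
            paired += 1
    return out


def build_index(songs):
    """songs maps song_id -> peaks; returns hash -> [(song_id, frame), ...]."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in peak_pair_hashes(peaks):
            index[h].append((song_id, t))
    return index


def identify(index, clip_peaks):
    """Vote on (song_id, time offset). A real match piles many votes onto a
    single offset; colliding hashes from other songs scatter their votes."""
    votes = Counter()
    for h, t_clip in peak_pair_hashes(clip_peaks):
        for song_id, t_song in index.get(h, ()):
            votes[(song_id, t_song - t_clip)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count
```

Individual hashes can collide across unrelated songs, which fits @shadebug's observation; the offset vote is what separates a true match from scattered collisions.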

  • @CalvinsWorldNews
    @CalvinsWorldNews 3 years ago +22

    I always found it awesome that it wouldn't just tell you that some music was, e.g., Beethoven's 8th Symphony; it would tell you specifically that it was a Berlin Philharmonic recording from 2004 or whatever, when even a conductor couldn't be that specific

  • @HarmonicaMustang
    @HarmonicaMustang 3 years ago +37

    As a trained audio engineer turned IT admin, it is so refreshing to hear audio terminology again; it really brings me back to my uni days: studying the Fourier transform, the physics of room acoustics (like calculating the RT-60, or reverberation time, of a room), and recording and mixing tracks. I miss working in a professional recording studio, but I found a second love in IT and even married the two fields together in projects. This was an interesting and entertaining video Computerphile. It felt like watching a bilingual movie where you can speak both languages.

  • @finnqni8563
    @finnqni8563 3 years ago +64

    I just want to say that I appreciate you making all of these awesome videos

  • @3339LuXz
    @3339LuXz 1 year ago +3

    This man described Fourier Transformation in a beautiful way! Respect!!

  • @alltheeasynamesweregone
    @alltheeasynamesweregone 3 years ago +40

    “I had to beat them to death with their own shoes!” ..... “nasty business really”... “but we got the M&M’s and Ozzy went on and had a great show” - Del Preston

    • @mulletronuk
      @mulletronuk 3 years ago +1

      Jeff Beck pops his head round the door and mentions there's a little sweetshop on the edge of town

    • @alltheeasynamesweregone
      @alltheeasynamesweregone 3 years ago +2

      @@mulletronuk So...we go, and... it’s closed. So there’s me and Keith Moon and David Crosby breaking into this little sweet shop right!

  • @Gamah1991
    @Gamah1991 3 years ago +34

    David Domminney Fowler: If James May went all rock and roll/computer nerd...

    • @domminney
      @domminney 2 years ago +1

      Ha ha 🤣

  • @bigbri64
    @bigbri64 1 year ago +5

    This is absolutely amazing and great work deciphering and explaining this algorithm!!

  • @n0handles
    @n0handles 3 years ago +3

    Love these audio related videos! Keep em coming

  • @GlitchingWithAlandi
    @GlitchingWithAlandi 3 years ago +2

    Brilliant, David explains concepts so well.

  • @Baxtexx
    @Baxtexx 3 years ago +47

    Real Engineering did one of these a whiöe back, great to see more!

    • @Norsilca
      @Norsilca 3 years ago +1

      Wondering what language keyboard that is to produce that typo

    • @ake_lindblom
      @ake_lindblom 3 years ago +10

      @@Norsilca swedish!

    • @deidara_8598
      @deidara_8598 3 years ago +1

      @@Norsilca I hope Swedish, but it could be Finnish, Icelandic, or Estonian too. Other nordic languages like Norwegian and Danish use a similar letter "ø", which is pronounced the same. ÆÄØÖÅ

  • @lustechsource5197
    @lustechsource5197 3 years ago +6

    Shazam seems like magic to me, but the way it was explained in this video took away a little of that magic. Wish schools would teach math concepts explaining how the tech we use at the time works.

    • @Michallote
      @Michallote 6 months ago

      The thing is that he is an expert in the field. It would be a tall order for a high school (or earlier) teacher to understand and explain this as well as he does. He even built a proof of concept. That is no trivial task

  • @fireclub493
    @fireclub493 3 years ago

    Awesome! Would love to see more attempts at implementing other interesting algorithms too

  • @jixster1566
    @jixster1566 3 years ago +1

    This is great timing. I was just wondering last night

  • @LeDabe
    @LeDabe 3 years ago +6

    Don't mix up the Fast Fourier Transform (FFT) and the Fourier transform (FT) concept. The FFT is an algorithm; there are different FFT algorithms that work in different ways. The Fourier transform (more generally, the transformation "concept") is what's interesting here, as it allows you to "change" your point of view on the data.

    • @RaunienTheFirst
      @RaunienTheFirst 3 years ago

      My first fo(u)ray into Fourier transforms was during my uni days studying chemistry. It's used to process the radio frequency data from an NMR machine into a nice graph which is essentially how "strong" the signal is at a particular frequency and how quickly it tails off (although the end result doesn't actually have any units) that allows you to identify chemical groups, how many there are, and with a bit more analysis where they are in relation to each other. Fourier transforms are black magic as far as I'm concerned.

    • @LeDabe
      @LeDabe 3 years ago

      @@RaunienTheFirst You can see the Fourier transform as the scalar product of the signal with another signal of a given frequency and phase.
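
@LeDabe's distinction can be made concrete: the Fourier transform is defined as a set of scalar products between the signal and complex sinusoids (the view in the reply above), and an FFT is just a fast way to compute the same numbers. A minimal sketch comparing the definition with a textbook recursive radix-2 Cooley-Tukey FFT; it assumes the input length is a power of two.

```python
import cmath


def dft(x):
    """Fourier transform by definition: bin k is the scalar product of x with
    a complex sinusoid of frequency k (conjugated). O(N^2) operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: same result, O(N log N) operations.
    Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out
```

Both functions return the same list of complex values (up to rounding); the FFT simply reorganises the arithmetic so it needs O(N log N) operations instead of O(N²).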

  • @aliwelchoo
    @aliwelchoo 3 years ago +34

    Interesting. I think the way Shazam searches through so many songs despite noise etc and does it so fast is the impressive part. The matching frequencies part is simple in comparison

    • @pluto8404
      @pluto8404 3 years ago +32

      They actually just employ thousands of indian children to listen to your audio and guess the song.

    • @MattB90
      @MattB90 3 years ago +6

      Search is relatively straightforward though; the FFT, slicing intervals and trying to get the amount of identifying info from each song as small as possible are way more interesting

    • @Henrix1998
      @Henrix1998 3 years ago

      @@MattB90 The amount of songs in the database is huge tho

    • @mesut1261
      @mesut1261 3 years ago

      @@pluto8404 that's why, major companies CEO's are Indian.

    • @iWhacko
      @iWhacko 3 years ago

      No, the searching is the easy part. It's not searching through all songs to find a match, like comparing every song in the database with the sample you played. Its searches are way "more complicated", yet databases are optimised for that, so it's really fast. Google how a lookup table works, or binary search, to understand the simple ones.
      If you sort all songs based on their first note, for instance, you can discard all the songs that start with a different note, and then search for the next note, just as you would alphabetise a list of names. In practice it's more advanced, but not really more difficult than the profiling of the song itself.
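
@iWhacko's "sort by the first note, then by the next" analogy is a prefix search on a sorted list. A toy sketch, assuming songs are reduced to tuples of MIDI note numbers (a simplification; real fingerprints are hashes of spectral peaks, not melodies). The catalogue and songs_starting_with are hypothetical.

```python
import bisect

# Hypothetical toy catalogue: each "song" is a tuple of MIDI note numbers.
catalogue = sorted([
    ((60, 62, 64, 65), "song A"),
    ((60, 62, 67, 65), "song B"),
    ((62, 64, 65, 67), "song C"),
    ((60, 64, 67, 72), "song D"),
])
keys = [notes for notes, _ in catalogue]


def songs_starting_with(prefix):
    """Return every song whose note sequence begins with `prefix`.

    In a sorted list all matches form one contiguous run, so two binary
    searches find its ends in O(log n) comparisons instead of scanning all n.
    """
    prefix = tuple(prefix)
    lo = bisect.bisect_left(keys, prefix)
    # (prefix..., inf) sorts after every tuple that starts with prefix.
    hi = bisect.bisect_left(keys, prefix + (float("inf"),))
    return [name for _, name in catalogue[lo:hi]]
```

Each extra note in the prefix shrinks the run further, which is the "discard all the songs that start with a different note" step.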

  • @artit91
    @artit91 3 years ago +2

    They use watermarking/hashing/fingerprinting, you name it. They measure beats per minute, chords, key etc. and have them stored for every song. I think there are around 100 different properties they can calculate, and they use a composite index to get the song out, but most of the time 5-6 properties are enough and you can get them from a few seconds.
    It's a pretty easy job if half of the world listens to the same 100 songs. Those hash IDs can be pulled onto the device and the app can work offline even with 1,000 song hashes downloaded.
    Apparently Tinder and Shazam work very similarly under the hood.

  • @jonathanargumedo7809
    @jonathanargumedo7809 3 years ago +3

    As soon as you upload a video, no matter what I'm doing, I click and watch.

  • @monono8156
    @monono8156 3 years ago +1

    Woo! A DSP video on computerphile! This is great :)

  • @GordonjSmith1
    @GordonjSmith1 2 years ago

    Really insightful from a digital problem that I had not considered before. Nice piece of 'blogging'!

  • @antonin7703
    @antonin7703 3 years ago +8

    He looks like a dark wizard and he talks about black magic.
    Great video, thanks!

  • @phyphor
    @phyphor 3 years ago +1

    Ruddy 'ell. I've been a fan of Dave's music since I saw his band Diversion about 25 years ago...
    Small world!

  • @gothxx
    @gothxx 3 years ago +2

    I guess it is also easy for them to optimize the parameters with ML since they have song data and user samples so they have known input for known answers that they can try different algorithms/parameters on in a large scale.

  • @jmonsted
    @jmonsted 3 years ago +9

    Now i'm tempted to run his code with my music collection and look for duplicates...

  • @nietschecrossout550
    @nietschecrossout550 3 years ago +2

    My guess would be that they generate an identity vector of the frequency space of a bunch of samples of the song, to fill up a db. Then they generate the identity vector the same way when recording, eliminating background noise etc. Now you can just compare the difference between vectors, which is what you can train an ML model for.

  • @TerjeMathisen
    @TerjeMathisen 3 years ago +3

    I am pretty sure you would use an algorithm much more closely related to the SIFT algorithm, which is used by all panorama stitching programs, including the super-resolution and night mode features of modern cell phone cameras. I.e. this will allow you to recognize songs that are played back at a slightly different speed, so that all notes and intervals will be shifted.

    • @grankoczsk
      @grankoczsk 3 years ago

      In order to detect speed altered audio all you need is a feature in your hash that is timeskew invariant. I read a paper on it some time ago where the author also gave an example of such invariants.
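
@grankoczsk's time-skew invariant can be illustrated with three spectrogram peaks: a uniform speed change by a factor s divides every time gap by s and multiplies every frequency by s, so ratios of gaps and ratios of frequencies stay fixed (they are also unaffected by a pure tempo change or a pure pitch shift). A minimal sketch; the paper they mention is not identified, and a practical hash would also need to quantise these ratios, which can flip values sitting near a bucket boundary.

```python
import math


def triplet_invariants(p1, p2, p3):
    """Features of three spectrogram peaks, each (time_s, freq_hz), that do
    not change when the whole recording is played faster or slower."""
    (t1, f1), (t2, f2), (t3, f3) = sorted([p1, p2, p3])  # order by time
    time_ratio = (t2 - t1) / (t3 - t1)  # ratio of time gaps
    return time_ratio, f2 / f1, f3 / f1  # plus two frequency ratios


def change_speed(peak, s):
    """Where a peak lands after playback at s times the original speed."""
    t, f = peak
    return t / s, f * s
```

For example, the peaks (1.0 s, 440 Hz), (1.5 s, 550 Hz) and (2.25 s, 660 Hz) give a time ratio of 0.4 both at normal speed and after `change_speed(p, 1.05)`.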

  • @stevepoling
    @stevepoling 3 years ago +17

    The business with clocks is probably the most tortured non-explanation of the Fourier Transform I've ever heard.

    • @infinite1der
      @infinite1der 3 years ago +1

      Agreed... ugh.

    • @stevepoling
      @stevepoling 3 years ago +1

      @@infinite1der How's this: An FFT is a specific algorithm to calculate a Fourier Transform. What's an FT? Every periodic function can be approximated by a sum of sine or cosine waves. The Fourier Transform shows how much of each to add to approximate the time-waveform. The FT is thus a frequency-domain representation of its time-waveform. You can do a Fourier Transform by many means (ever heard of a holograph?), but computationally, an efficient algorithm is the FFT. There are hints of the FFT going back as far as Gauss, but it only became commonplace once we got the hardware capable of doing digital signal processing.
      I haven't delved into the "how" of FFT so much as the "what" of FT, which is much more useful to someone who is likely to just pick up an FFT implementation from a library, or a specialized DSP chip. The FFT is worthy of its own detailed exposition aside from its application to the Shazam search algorithm.
      Non-stationary signals (i.e. all the interesting ones) can also be modeled (often more accurately) with other basis functions... ferinstance wavelets.

    • @SkylerAk
      @SkylerAk 3 years ago +2

      @@stevepoling The Fourier transform is a mathematical function that can be used to find the base frequencies that a wave is made of.

    • @stevepoling
      @stevepoling 3 years ago +1

      @@SkylerAk you can think of an arbitrary waveform, e.g. your favorite tune, as a function. And every function can be expressed as the sum of other functions. Then it helps if the other functions are related in some way. Use your intuitions of how vectors work in terms of basis vectors and vector sums. Just as a vector may be represented as a sum of any orthogonal set of basis vectors, any function can be represented as a sum of one or another set of basis functions. I gave the examples of sinusoids and wavelets. Sine waves are characterized by phase, amplitude, and frequency. The FT gives you the parameterization of the sinusoids approximating your input wave function. I've seen folks look at phase but most people just look at frequency & amplitude.
      I'm not disagreeing with you, Skyler, but pointing out there's a lot more interesting mathematics going on (despite the fact that FT is mostly used as you describe). And there are families of orthogonal functions that describe some signals more efficiently than sinusoids, e.g. wavelets.
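
@stevepoling's point that the transform "shows how much of each to add" can be checked numerically: build a waveform from sinusoids with known amplitudes and read the amplitudes back from a naive DFT. This sketch assumes every frequency sits exactly on a DFT bin; off-bin frequencies leak into neighbouring bins and the read-out becomes approximate.

```python
import cmath
import math

N = 64  # number of samples; bin k means k cycles per N samples


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


# Build a waveform as a sum of three cosines with known amplitudes.
components = {3: 1.0, 7: 0.5, 12: 0.25}  # frequency bin -> amplitude
signal = [sum(a * math.cos(2 * math.pi * k * t / N) for k, a in components.items())
          for t in range(N)]

spectrum = dft(signal)
# For a real cosine on bin k (0 < k < N/2), |X[k]| = amplitude * N / 2,
# so scaling by 2/N recovers the amplitude of each component.
recovered = {k: 2 * abs(spectrum[k]) / N for k in range(1, N // 2)}
```

`recovered[3]`, `recovered[7]` and `recovered[12]` come back as 1.0, 0.5 and 0.25, and every other bin is zero up to rounding.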

  • @BestHKisDLM
    @BestHKisDLM 3 years ago

    Excellent content. Thank you!

  • @DavidHamster88
    @DavidHamster88 2 years ago

    Interesting video. Nice job!

  • @Donder1337
    @Donder1337 3 years ago

    This was really good!! Thx a lot!

  • @rootsquare
    @rootsquare 3 years ago

    Kantar/Nielsen TV boxes utilise a similar technique to shazam to identify programmes being watched. IIRC. This enables them to also pick up on time-shifted content being watched after airing...

  • @lunasophia9002
    @lunasophia9002 3 years ago +5

    The photo of David in the thumbnail makes him look like a Sith lord.

  • @ryanlorenti469
    @ryanlorenti469 3 years ago

    this is a really cool use of trig

  • @ahmedelphi
    @ahmedelphi 3 years ago +2

    Another brilliant video. But how would it sound if these frequency-domain arrays are transformed back to time-domain? It would be interesting to hear how a clip “sounds” to the application. I wish that was in the video, or it’s extra.

    • @harrysvensson2610
      @harrysvensson2610 3 years ago

      @@moduu24 FFT is reversible, but that's not what Ahmed asked about. In the video it's said that the loudest frequencies within some bucket size are extracted. That means some frequencies are set to 0 amplitude, and that is not reversible. It would be interesting to listen to that to see how it compares to the original song.
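
@harrysvensson2610's point can be demonstrated on a single frame: the DFT itself inverts exactly, but throwing away all but the loudest bins does not. A minimal sketch; the frame length, test tone, noise level and keep_loudest are illustrative assumptions, and turning a whole spectrogram back into listenable audio would also need windowing and overlap-add.

```python
import cmath
import math
import random


def dft(x, sign=-1):
    """Naive DFT; sign=+1 gives the (unnormalised) inverse direction."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [v / n for v in dft(spectrum, sign=+1)]


def keep_loudest(spectrum, k):
    """Hypothetical peak picking: zero every bin except the k loudest."""
    ranked = sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0j for i, v in enumerate(spectrum)]


def rms_error(a, b):
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)) / len(a))


random.seed(0)
N = 32
# One frame: a tone at bin 5 plus some noise.
frame = [math.sin(2 * math.pi * 5 * t / N) + 0.3 * random.uniform(-1, 1) for t in range(N)]
spectrum = dft(frame)

full = idft(spectrum)                     # every bin kept: exact reconstruction
sparse = idft(keep_loudest(spectrum, 2))  # only the tone's bins 5 and N-5 kept
```

`rms_error(frame, full)` is essentially zero, while `rms_error(frame, sparse)` is not: the discarded bins (here, most of the noise) cannot be recovered. Playing `sparse` back would approximate what the matcher "hears".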

  • @metroidandroid
    @metroidandroid 3 years ago +25

    come on, explaining fft with phasors is not the most basic way

    • @HesderOleh
      @HesderOleh 3 years ago

      Also I think Dynamic time warping is used rather than FFT

    • @iWhacko
      @iWhacko 3 years ago

      @@HesderOleh Or Auto Correlation. I played around with note recognition when the iPhone3 came out, and FFT was just barely out of reach for computing power on the iPhone. Auto Correlation is much less intense for the cpu.

    • @spectralpiano3881
      @spectralpiano3881 3 years ago +1

      @@iWhacko Ironically, autocorrelation is calculated using the FFT: AC(x) = IFFT( | FFT(x) |^2 ) . For short signals it could be that it's faster to simply calculate SUM_n ( x(n)*x(n+tau) ) but I'm not too sure about that

    • @andinomm
      @andinomm 3 years ago +5

      The FFT explanation was horrible..... He could've just said that every periodic signal is made from sines and use some graphs to display that as you see like everywhere you search for it. Would've been more useful for beginners/newcomers.
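
@spectralpiano3881's identity can be checked numerically. One detail matters in practice: IFFT(|FFT(x)|^2) on its own yields the circular autocorrelation, so the signal is zero-padded to at least 2N-1 samples before transforming to make it match the direct sum. A minimal sketch with a textbook radix-2 FFT; the function names are illustrative.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def autocorr_direct(x):
    """AC[tau] = sum_n x[n] * x[n + tau], for tau = 0 .. len(x) - 1."""
    n = len(x)
    return [sum(x[i] * x[i + tau] for i in range(n - tau)) for tau in range(n)]


def autocorr_fft(x):
    """Same values via IFFT(|FFT(x)|^2), with zero-padding so the circular
    wrap-around does not mix the end of the signal into the start."""
    n = len(x)
    size = 1
    while size < 2 * n - 1:
        size *= 2
    spec = fft(list(x) + [0.0] * (size - n))
    power = [abs(v) ** 2 for v in spec]
    ac = fft(power, inverse=True)
    return [ac[tau].real / size for tau in range(n)]
```

The direct sum costs O(N²) and the FFT route O(N log N); which is faster for short frames depends on the implementation and hardware, as the reply suspects.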

  • @davidgillies620
    @davidgillies620 3 years ago +4

    The FFT gives you frequencies _and_ phases, as mentioned. This is important. It really helps you understand what's going on when you realise that a real-valued signal going in (like the voltage level from a microphone) gets transformed into a complex-valued output.
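
@davidgillies620's point about real input and complex output can be seen directly: each bin's magnitude gives an amplitude and its angle gives a phase, and for real input the upper half of the spectrum mirrors the lower half as complex conjugates. A minimal sketch with a naive DFT and an on-bin test cosine (an assumption that keeps the numbers exact).

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


N = 16
phase = 0.7
# Real-valued input, like microphone samples: a cosine on bin 3 with a phase offset.
x = [math.cos(2 * math.pi * 3 * t / N + phase) for t in range(N)]
X = dft(x)

magnitude = abs(X[3])             # N/2 = 8 for a unit-amplitude cosine on a bin
recovered_phase = cmath.phase(X[3])
# Hermitian symmetry: for real input X[N-k] is the conjugate of X[k], so
# only bins 0..N/2 carry independent information.
symmetric = all(abs(X[N - k] - X[k].conjugate()) < 1e-9 for k in range(1, N))
```

`recovered_phase` comes back as 0.7, the offset built into the test signal; discarding the phase (as peak-picking fingerprints typically do) keeps only `magnitude`.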

  • @Nemitorsis
    @Nemitorsis 3 years ago +1

    I am curious whether it is better (or worse) to use MFCC + MFCC delta compared to a direct FFT or frequency distribution in these cases.

  • @willemvdk4886
    @willemvdk4886 3 years ago +7

    Love the short Pearl Jam riff ;)

  • @RGS61
    @RGS61 3 years ago

    So a great explanation of how Shazam (probably) works to identify a tune .. from a 'user sample in' perspective .. But what about the other side of the comparison match? .. How does Shazam manage to sample so many source files to build up its database? .. Through agreements with the music rights holders? .. or music streaming services? .. or ..?? .. or am I missing something?!

  • @comedyhunter
    @comedyhunter 3 years ago

    Very interesting, always wondered how it works. I have a couple of questions for you. *(1)* Once it's identified a song, you can click on "Lyrics" and it shows the words to the song in real time, in the correct place, and follows along. How on Earth does it manage to do that? It works out which chorus it's on; I'm amazed it can do that. *(2)* Also, I've used Shazam in very noisy bars where you can only just about hear the music, and Shazam still manages to identify it! How does it cope with all the noise and pick out FFTs in such a noisy environment?
    Thanks for the explanation on FFT, nice and easy to understand how that works now.

    • @axelnils
      @axelnils 3 years ago +1

      Lyrics and chords are probably just pulled from some database.

    • @comedyhunter
      @comedyhunter 3 years ago

      @@axelnils yes but how does it manage to line up the words precisely and even know which chorus it’s on

    • @eloskowy4954
      @eloskowy4954 2 years ago +1

      @@comedyhunter I found document called "An Industrial-Strength Audio Search Algorithm".
      Music has been found, right? So it just needs to process the notes the device is hearing and sync them with lyrics stored in a database. Those lyrics contain info about when each line is sung. You can try something like QuickLyrics and you'll see how it works, because it takes the data about which song is being played from notifications.

    • @comedyhunter
      @comedyhunter 2 years ago

      @@eloskowy4954 Thanks for the extra info.

  • @UxRandom
    @UxRandom 3 years ago +2

    I remember using soundhound back in the day.

  • @markusklyver6277
    @markusklyver6277 3 years ago +4

    This was extremely hand wavey

  • @CaptainJeoy
    @CaptainJeoy 3 years ago +21

    _Meanwhile how Shazam actually works is that it secretly uploads your recording to YouTube then gets copyright struck and then voila, the song ID :)_

    • @ZipplyZane
      @ZipplyZane 3 years ago +2

      I hope not, as YouTube seems to constantly get it wrong.

  • @mountp1391
    @mountp1391 2 years ago

    How awesome shazam is! Thank you.

  • @Car_Ram_Rod
    @Car_Ram_Rod 3 years ago +2

    This is interesting!

  • @laxr5rs
    @laxr5rs 3 years ago +4

    Nice BASS! GREAT quality on the video (screen grab maybe?)

    • @domminney
      @domminney 3 years ago

      yes, I recorded it in OBS!

  • @megavide0
    @megavide0 3 years ago

    JavaScript & Web Audio ..? amazing!

  • @hobo9968
    @hobo9968 3 years ago +2

    Woow. Basically you look at the music in the frequency domain and do an optimized search based on the prevalent frequency slices

    • @TheAudioCGMan
      @TheAudioCGMan 3 years ago

      This sounds like the abstract of a paper
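
@hobo9968's summary can be sketched in a few lines: cut the audio into overlapping frames, transform each frame, and keep only the loudest bin inside a few frequency bands. This is a minimal sketch under assumed parameters (naive DFT, periodic Hann window, frame length 64, hop 32, three arbitrary bands); the demo in the video may slice the spectrum differently.

```python
import cmath
import math


def frame_magnitudes(frame):
    """Hann-windowed DFT magnitudes for the non-negative-frequency bins."""
    n = len(frame)
    # Periodic Hann window: reduces leakage between neighbouring bins.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
    return [abs(sum(frame[t] * w[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]


def band_peaks(samples, frame_len=64, hop=32, bands=((0, 8), (8, 16), (16, 32))):
    """For each frame, return the loudest bin inside each frequency band.

    frame_len, hop and the band edges are hypothetical choices; the output
    is a sparse list of (frame_index, bin) pairs, a 'constellation' of peaks.
    """
    peaks = []
    for i, start in enumerate(range(0, len(samples) - frame_len + 1, hop)):
        mags = frame_magnitudes(samples[start:start + frame_len])
        for lo, hi in bands:
            best = max(range(lo, hi), key=mags.__getitem__)
            peaks.append((i, best))
    return peaks
```

Pairs of these (frame, bin) peaks are what a hashing step would then combine into lookup keys.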

  • @bmwolfe2786
    @bmwolfe2786 3 years ago +1

    Was that a little Pearl Jam "Alive" lick you threw in there, around 14:36?

  • @Krcha13
    @Krcha13 3 years ago

    On the Shazam side, they have converted all songs from audio signal > digital stream > binary signature.
    On the user side, your recording is converted as well, into just part of that signature.
    You send that to Shazam, they match the partial signature against the full signatures, and that's it

    • @grankoczsk
      @grankoczsk 3 years ago +1

      Very oversimplified

  • @vinzzz666
    @vinzzz666 3 years ago

    Thank you for confirming :)

  • @Scranny
    @Scranny 3 years ago

    Nowadays I assume this can be accomplished using an LSTM neural network? or incorporating a CNN? But perhaps NNs are a bit overkill for this task and slower engineering wise (I mean, you can't have an output layer the size of all the songs in your DB, but you can produce a searchable vector which you would need lots of memory and fast servers to support).

  • @RealismHD1
    @RealismHD1 3 years ago

    can you make a video on how ipv6 works? or is there already one existing on this channel?

  • @neilAneerGAmAI
    @neilAneerGAmAI 3 years ago

    Great video

  • @BlankBrain
    @BlankBrain 3 years ago +2

    I wrote something a little like this back in the '80s to take raw digital (payroll) data off unlabeled mag-tape and figure out the record size and columns. There weren't any nice FFT libraries to assist. There wasn't even a lot of memory to hold sample data (without paging). The program had to control the tape drive to make multiple passes in some cases. It worked well enough to automate reading customer payroll data from a myriad of custom payroll systems. Once the data was in records with columns, the analysts were able to use other tools to update pension contributions.

  • @vladimirpotrosky7855
    @vladimirpotrosky7855 3 years ago +4

    16:20 "avoid any imperial entanglements"
    Amazing Star Wars ref

  • @xxsummer666xx4
    @xxsummer666xx4 3 years ago +1

    something i find really impressive is the fact that shazam can detect harsh noise tracks

    • @carlosmspk
      @carlosmspk 3 years ago

      He asked that at the end. A noisy low quality A#, is still an A# so that's the frequency that stands out the most
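
@carlosmspk's answer can be illustrated with a synthetic example: broadband noise spreads its energy over every bin, while a sustained note concentrates in one, so the loudest bin often survives even when the waveform itself looks ruined. A minimal sketch assuming an on-bin tone and uniform noise; real rooms add reverberation and competing sounds, which make the problem harder.

```python
import cmath
import math
import random


def dft_magnitudes(x):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(N^2))."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]


def loudest_bin(x):
    mags = dft_magnitudes(x)
    return max(range(len(mags)), key=mags.__getitem__)


random.seed(42)
N = 256
TONE_BIN = 20  # hypothetical note, placed exactly on a DFT bin for simplicity

clean = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]
# Broadband noise whose peak level equals the tone's.
noisy = [s + random.uniform(-1.0, 1.0) for s in clean]
```

`loudest_bin(clean)` and `loudest_bin(noisy)` both return bin 20: the tone's energy adds up coherently in one bin, while the noise's energy is shared across all 128.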

  • @goodkavin
    @goodkavin 3 years ago

    What would need to be done additionally to make it work for humming as well, like in Google Assistant/SoundHound?

  • @Benoit-Pierre
    @Benoit-Pierre 3 years ago +1

    You spend less than 20 s on the only point I am interested in: how to convert part of an FFT into a hash that can reach the song in one shot in a very large database.
    Do you produce several hashes per song? How do you store them? Do you need parallel servers to search tags?
    What do you hash on the client side exactly?
    Comparing 5 songs is trivial. The Shazam db has millions or billions.
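
On @Benoit-Pierre's storage questions: each song does produce many hashes, and the Wang paper mentioned earlier in the comments describes packing each peak-pair hash into a 32-bit integer that keys a lookup table. The 10/10/12-bit field split and the helper names below are assumptions for illustration, not Shazam's format.

```python
F_BITS = 10   # hypothetical: frequency bins 0..1023
DT_BITS = 12  # hypothetical: time gaps 0..4095 frames (10 + 10 + 12 = 32 bits)


def pack_hash(f1, f2, dt):
    """Pack an (f1, f2, dt) peak-pair hash into one 32-bit integer key."""
    if not (0 <= f1 < (1 << F_BITS) and 0 <= f2 < (1 << F_BITS)
            and 0 <= dt < (1 << DT_BITS)):
        raise ValueError("field out of range")
    return (f1 << (F_BITS + DT_BITS)) | (f2 << DT_BITS) | dt


def unpack_hash(key):
    dt = key & ((1 << DT_BITS) - 1)
    f2 = (key >> DT_BITS) & ((1 << F_BITS) - 1)
    f1 = key >> (F_BITS + DT_BITS)
    return f1, f2, dt


# Hypothetical inverted index: key -> list of (song_id, anchor_time).
index = {}


def add_to_index(song_id, anchor_time, f1, f2, dt):
    index.setdefault(pack_hash(f1, f2, dt), []).append((song_id, anchor_time))
```

A dictionary lookup costs roughly the same however many keys are stored, so checking every key from one clip stays fast as the catalogue grows; how Shazam spreads such a table across servers is not described in the video.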

  • @8015908
    @8015908 3 years ago +17

    In the thumbnail I thought his hair was a robe.

    • @domminney
      @domminney 3 years ago

      🤣

    • @kourii
      @kourii 3 years ago +1

      Lol yeah at a glance I thought it was a hood too

    • @MattExzy
      @MattExzy 3 years ago +4

      "Do what must be done Lord Vader..."

  • @zwz.zdenek
    @zwz.zdenek 3 years ago +1

    There is this competing app that can sometimes tell the song just by you singing it. No bassline, poor timing, probably in the wrong key, but it still works. I wonder if they hired a bunch of former Indian captcha solvers for this...

    • @laurinneff4304
      @laurinneff4304 3 years ago

      Google assistant can do this

  • @otiagomarques
    @otiagomarques 3 years ago +1

    you gotta make a video on how you got that 2:00 camera move!

  • @bdot02
    @bdot02 3 years ago

    How do you search a database of all these songs efficiently though. Seems like there's a lot of processing just to try and match a clip to a single song. Sure you could multithread it but there's no way they're doing that... There's got to be further compression of the samples before search...

  • @joerivde
    @joerivde 3 years ago

    Good stuff

  • @CosmicPrawny
    @CosmicPrawny 3 years ago +1

    What's the frequency, Kenneth?

  • @sciencefirefly837
    @sciencefirefly837 3 years ago +1

    Does anyone know about the git repository for this project? It would be fun if it is open source and we can improve it.

  • @fixfaxerify
    @fixfaxerify 1 year ago

    As always, computerphile standard issue matrix printer paper. +1 for consistency!

  • @tyjuji
    @tyjuji 3 years ago +1

    I might have a use for this in a future project. Cheers!

  • @venil82
    @venil82 3 years ago +4

    Was surprised to see he has written it in JavaScript

    • @domminney
      @domminney 3 years ago +1

      Following on from my html5/js video I felt it should be

    • @max_kl
      @max_kl 3 years ago

      To be fair, the browser's audio processing and graphics code is written in C++, and it does the heavy lifting

  • @DJDiskmachine
    @DJDiskmachine 3 years ago

    What's FFT?
    Starts explaining the Shannon Nyquist sampling theorem and aliasing. =D

  • @jean-marcherard9216
    @jean-marcherard9216 3 years ago +1

    Nice, sounds like spectrometry. Always wondered if Shazam actually works with live versions or cover versions

    • @adminadmin8992
      @adminadmin8992 3 years ago

      I tried it with a live version. It worked.

    • @jean-marcherard9216
      @jean-marcherard9216 3 years ago

      @@adminadmin8992 nice ...
      What about cover versions (songs sung by different artists), what about instrumentals?

    • @grankoczsk
      @grankoczsk 3 years ago

      @@adminadmin8992 Whoever was performing live must have been doing it very accurately to the original

    • @grankoczsk
      @grankoczsk 3 years ago

      @@jean-marcherard9216 That will probably only work if the instrumental that's being used in the cover is the same as the one from the original.

  • @brilliantbrunch
    @brilliantbrunch 3 years ago +4

    14:36 The guitar riff sounds like it's from Alive by Pearl Jam

  • @RGCTech
    @RGCTech 3 years ago

    Cool video

  • @shinobu5359
    @shinobu5359 3 years ago

    14:35 Pearl Jam - Alive riff, you won't have to use any of Shazuumy for that now.

  • @RedeemerZG
    @RedeemerZG 3 years ago

    please provide a mathematical definition of that bijection from the set of songs to the set of song hashes

  • @NK-fx1qs
    @NK-fx1qs 3 years ago +1

    Do one of these with some space transmissions and draw bob ross style. That's a nice signal right there, yes it is.

    • @ibrahim47x
      @ibrahim47x 3 years ago +1

      It smells like Mary Jane in this comment

  • @Sattkopf
    @Sattkopf 3 years ago

    is the same method used to turn music into midi notes?

  • @drezzylol
    @drezzylol 3 years ago

    Does it mean that if two songs have the same harmony and tempo they might be considered the same (or similar to an extent)?

    • @TheSteveSteele
      @TheSteveSteele 3 years ago

      It depends on what you mean by “similar”. An ensemble of vocalists singing vowels will have a “fingerprint” that’s going to be quite different from a percussion ensemble. Both can produce the harmony and tempo but the difference in timbre could cause a program that’s using an FFT for “hits” and “misses” to give an uncharacteristic result. That in itself might be telling that there’s a variation of the arrangement. Regardless though, there’s more to identify than harmony and tempo.

  • @marco_gallone
    @marco_gallone 3 years ago +1

    But is the search algorithm as simple as checking all sample to library cross correlations? For the multi-hundred-million songs in Shazam’s library that is still a timely search process, don’t you agree?

    • @GeneralBlackNorway
      @GeneralBlackNorway 3 years ago

      If they take every slice of every song and sort them in ascending order, they can take a value from the phone recording and, starting from the middle of the list, check whether it is greater or smaller. Then they go to the middle of the upper or lower half of the list and make the same check, until they eventually reach a matching slice. This search method halves the search space with each comparison, so it can be done very quickly. Each song slice would be linked to a full song in the database, which gives you the ID of your song. They would probably repeat this search with several slices from the phone recording to eliminate false positives.
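The sorted-list idea above can be sketched with Python's standard `bisect` module. This is a minimal illustration that assumes each slice reduces to an exact integer value and that the index is a hypothetical in-memory list of `(slice_value, song_id)` pairs; real recordings rarely reproduce values exactly, and production systems more likely use hash tables than a single sorted list.

```python
import bisect
from collections import Counter

# Hypothetical index of (slice_value, song_id) pairs, sorted by slice_value.
INDEX = sorted([
    (1032, "song_a"), (2211, "song_b"), (3307, "song_a"),
    (4450, "song_c"), (5120, "song_b"), (6678, "song_c"),
])
KEYS = [value for value, _ in INDEX]

def lookup(slice_value):
    """Binary search (O(log n)) for an exact slice value; returns a song ID or None."""
    i = bisect.bisect_left(KEYS, slice_value)
    if i < len(KEYS) and KEYS[i] == slice_value:
        return INDEX[i][1]
    return None

def identify(recording_slices):
    """Look up several slices and take a majority vote to suppress false positives."""
    votes = Counter(song for song in map(lookup, recording_slices) if song is not None)
    return votes.most_common(1)[0][0] if votes else None

print(lookup(4450))                  # song_c
print(identify([3307, 1032, 9999]))  # song_a
```

Each lookup halves the remaining range, so even a billion slices need only about 30 comparisons.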

  • @ujjvalw2684
    @ujjvalw2684 3 years ago

    I always thought about this

  • @wgs3leed
    @wgs3leed 3 years ago

    Brilliant

  • @1966human
    @1966human 1 year ago

    Could there be a sound code in the background of the songs?

  • @slovakthrowback3738
    @slovakthrowback3738 3 years ago +10

    After watching this, I really wonder how things like Google Assistant's "What's this song?" work, since they can handle humming, instrumentals only, etc.

    • @metroidandroid
      @metroidandroid 3 years ago +3

      black magic 101

    • @slovakthrowback3738
      @slovakthrowback3738 3 years ago +1

      @@metroidandroid fair enough

    • @HesderOleh
      @HesderOleh 3 years ago +1

      Probably something similar to Dynamic Time Warping.

    • @Mr.Leeroy
      @Mr.Leeroy 3 years ago

      Probably by interpreting the notes, which may be feasible since the samples are less complex and a matching dictionary could be put together.

    • @slovakthrowback3738
      @slovakthrowback3738 3 years ago

      @@Mr.Leeroy Maybe, but it's definitely more complicated, since it can find songs hummed off-key, songs based on lyrics, and sometimes even two songs overlapped 👌
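For the Dynamic Time Warping suggestion in this thread, here is a textbook DTW distance in plain Python. It assumes the hum has already been turned into a sequence of pitch numbers (hypothetical MIDI values here); that conversion step, and whether Google's feature uses DTW at all, are assumptions rather than anything confirmed in the thread.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Allow a step in a, a step in b, or a step in both (a match).
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical pitch sequences (MIDI note numbers).
melody = [60, 62, 64, 65, 67]
hummed_unevenly = [60, 60, 62, 62, 64, 64, 65, 67, 67]  # same tune, uneven tempo
other_tune = [67, 65, 64, 62, 60]

print(dtw_distance(melody, hummed_unevenly))  # 0.0
print(dtw_distance(melody, other_tune) > 0)   # True
```

DTW absorbs tempo differences, but a hum in a different key would still need pitch normalisation first (for example, subtracting each sequence's mean); this sketch does not do that.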

  •  3 years ago

    I just wonder: wouldn't a wavelet transform be better for this task?

  • @RagHelen
    @RagHelen 3 years ago

    Song recognition was already an integrated service on mobile phones years before the smartphone. You had to call a number, though.

    • @SproutyPottedPlant
      @SproutyPottedPlant 3 years ago +1

      And the service was called Shazam!

    • @RagHelen
      @RagHelen 3 years ago

      @@SproutyPottedPlant No. Certainly not.

  • @ezOqekuRitusohI
    @ezOqekuRitusohI 5 months ago +1

    I work at Shazam and I can confirm this is how Shazam works.
    I'm also lying.

  • @Nadroj72
    @Nadroj72 3 years ago +7

    You should figure out how Google's Now Playing feature works. It even works without a data connection.

  • @zoggoth
    @zoggoth 3 years ago

    You mentioned that you had constant-size slices (23.4 Hz).
    At the bottom end of the human hearing range, 100 Hz -> 123.4 Hz is a jump of almost 4 semitones.
    At the top end, 5000 Hz -> 5023.4 Hz is about 1/12 of a semitone.
    From the point of view of us foolish humans, you're undersampling at the bottom and oversampling at the top.
    With 1024 frequencies, you could sample every quarter semitone (hemidemisemitone?) between 9 mHz and 24 kHz, or every eighth of a semitone across the human hearing range (20 Hz to 20 kHz).
    Does taking human perception into account help identify human music?
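The arithmetic in the comment above can be checked with a few lines of Python, assuming equal temperament (12 semitones per doubling of frequency) and the 23.4 Hz bin width quoted there:

```python
import math

def semitones(f_low, f_high):
    """Interval between two frequencies in equal-tempered semitones."""
    return 12 * math.log2(f_high / f_low)

BIN_WIDTH = 23.4  # Hz, the linear slice width quoted in the comment

print(round(semitones(100, 100 + BIN_WIDTH), 2))    # 3.64  ("almost 4 semitones")
print(round(semitones(5000, 5000 + BIN_WIDTH), 3))  # 0.081 (roughly 1/12 semitone)

# 1024 log-spaced bins at a quarter semitone each (48 per octave) span 1024 / 48 octaves.
lowest_hz = 24000 / 2 ** (1024 / 48)
print(round(lowest_hz * 1000, 1))                   # 9.1 (mHz)

# Spreading the same 1024 bins over 20 Hz - 20 kHz instead:
octaves = math.log2(20000 / 20)
print(round(1024 / (12 * octaves), 2))              # 8.56 bins per semitone
```

So the "9 mHz" lower limit and the "eighth of a semitone" figure both hold up, with a little room to spare in the second case.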

  • @n7565j
    @n7565j 3 years ago +1

    "My A chord rarely sounds great"... I feel your pain ;-)
    My G, C, B, and all the rest rarely sound great... That's why we work with computers instead of filling stadiums :-)

  • @TheSuperSaiyanGoku1
    @TheSuperSaiyanGoku1 1 year ago

    Shazam Programmers: Write that down! Write that down!

  • @Bruh-el9js
    @Bruh-el9js 3 years ago +13

    1:42 Fast & Furious Transformers

  • @zestyammar1973
    @zestyammar1973 3 years ago

    the besties simply snapped

  • @DingoAteMeBaby
    @DingoAteMeBaby 3 years ago

    Or did they just hash it and do hash-distance comparisons?
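One hash-based approach, along the lines of the landmark hashing described in Avery Wang's paper "An Industrial-Strength Audio Search Algorithm", hashes pairs of spectrogram peaks and then votes on time offsets, so a lookup is a dictionary access rather than a cross-correlation against every song. The sketch below assumes peak picking has already produced `(time, freq_bin)` pairs; the data, the `fan_out` value, and all function names are hypothetical, and it uses exact hash matches rather than hash distances.

```python
from collections import Counter, defaultdict

def pair_hashes(peaks, fan_out=3):
    """Pair each (time, freq_bin) peak with the next few peaks.

    Each hash is (f1, f2, dt) and is returned together with the anchor time t1.
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes

def build_index(songs):
    """Map each hash to every (song_id, anchor_time) where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in pair_hashes(peaks):
            index[h].append((song_id, t))
    return index

def match(index, query_peaks):
    """Vote on (song_id, time offset); a true match piles its votes on one offset."""
    votes = Counter()
    for h, t_query in pair_hashes(query_peaks):
        for song_id, t_song in index.get(h, []):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count

# Hypothetical peak lists: (time_frame, freq_bin) pairs.
songs = {
    "song_a": [(0, 10), (1, 25), (2, 17), (3, 40), (5, 12)],
    "song_b": [(0, 30), (2, 31), (3, 8), (4, 22)],
}
index = build_index(songs)
# A clip of song_a starting one frame in, re-timed to start at 0.
clip = [(0, 25), (1, 17), (2, 40), (4, 12)]
print(match(index, clip))  # ('song_a', 1, 6)
```

The offset vote is what filters out coincidental hash hits: random matches scatter across many offsets, while the real song lines up on a single one.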