How Shazam Works (Probably!) - Computerphile

  • Published on 26 Sep 2024
  • Looking at the audio mechanics and algorithms behind music identifier apps. David Domminney Fowler built a demo you can try yourself.
    EXTRA BITS: Shazoom - ...
    Play with Dave's demonstrator here: bit.ly/3qRo9t9
    More about David Domminney Fowler: bit.ly/38IhX0p
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottsco...
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 388

  • @DrJRMCFC
    @DrJRMCFC 3 years ago +773

    there’s a paper published by the founder of shazam that describes the algorithm in some detail.

    • @danielalorbi
      @danielalorbi 3 years ago +250

      This is a very relevant comment, with surprisingly little attention.
      Here's the title: "An Industrial-Strength Audio Search Algorithm - ALC Wang"
      (The founder in question is Avery Li-Chun Wang)

    • @bigpickles
      @bigpickles 3 years ago +44

      I read this comment, liked it, and left without watching. Thank you.

    • @HesderOleh
      @HesderOleh 3 years ago +1

      doesn't it use something more similar to DTW rather than FFT?

    • @Squossifrage
      @Squossifrage 3 years ago +70

      And they address that in the beginning of the video. We know how it worked back then, we don't know how it works today.

    • @ChaitanyaBhagwatChai
      @ChaitanyaBhagwatChai 3 years ago +27

      @bigpickles your loss tbh. It's a great, informative video

  • @pathologicalliar8728
    @pathologicalliar8728 3 years ago +1277

    I work at Shazam and I can tell you that we just hire a bunch of teenagers to guess which song it is. That's why it's not very good at rock from the '70s

    • @ArihantChawla
      @ArihantChawla 3 years ago +220

      username checks out

    • @pathologicalliar8728
      @pathologicalliar8728 3 years ago +42

      @noomade I never lie. Lying is my least favorite thing to do.

    • @magma2680
      @magma2680 3 years ago +1

      doesn't work on weeb music too well so i doubt it... it's probably just you

    • @ksportz66
      @ksportz66 3 years ago

      Why does shazam not have a theme tune?

    • @pathologicalliar8728
      @pathologicalliar8728 3 years ago +3

      @ksportz66 because it's already patented by Nintendo and they refuse to license it out.

  • @realeques
    @realeques 3 years ago +66

    Even as a programmer myself, Shazam was always a mind-blowing app to me. The speed and accuracy are absolutely amazing. I've used it before to get a song from someone listening to it on his at the bus station.... or playing at the supermarket

  • @lustechsource5197
    @lustechsource5197 3 years ago +148

    Shazam is one of those programs that your jaw drops when you first see it in action.

    • @joel6672
      @joel6672 3 years ago

      yep

    • @bearcb
      @bearcb 3 years ago +6

      The app that made me buy a smartphone back in the day

  • @HarmonicaMustang
    @HarmonicaMustang 3 years ago +37

    As a trained audio engineer turned IT admin, it is so refreshing to hear audio terminology again. It really brings me back to my uni days: studying the Fourier transform, the physics of room acoustics (like calculating the RT-60 or reverberation of a room), recording and mixing tracks. I miss working in a professional recording studio, but I found a second love in IT and even married the two fields together in projects. This was an interesting and entertaining video, Computerphile. It felt like watching a bilingual movie where you can speak both languages.

  • @assepa
    @assepa 3 years ago +314

    Kudos for the 30-year-old printer paper this guy apparently still has a stack of, lol.

    • @domminney
      @domminney 3 years ago +44

      From Sean’s last visit, I never throw anything away

    • @berzerkskwid
      @berzerkskwid 3 years ago +34

      my working theory is that the university bought out the stock of a paper company going out of business in the 90s and has been giving reams of it out as faculty party favors ever since. Every Computerphile guest uses it.

    • @OffTheBeatenPath_
      @OffTheBeatenPath_ 3 years ago +19

      @berzerkskwid universities always carried a huge stock of that paper and then overnight the printers became obsolete. It was always great for sketching. I had boxes of it for free.

    • @kwabenaagyeiboitey6423
      @kwabenaagyeiboitey6423 3 years ago +1

      The most decent but funny comment ever 😂😂

    • @RedwoodRhiadra
      @RedwoodRhiadra 3 years ago +8

      They still make it, you can buy cases of two or three thousand sheets for sixty bucks or so on Amazon. (Search for "continuous feed green bar printer paper")

  • @finnqni8563
    @finnqni8563 3 years ago +64

    I just want to say that I appreciate you making all of these awesome videos

  • @broderfoder9348
    @broderfoder9348 3 years ago +305

    Shazam simulator, shouldn't it be Shasim?

    • @battman505
      @battman505 3 years ago +9

      underrated comment

    • @Schindlabua
      @Schindlabua 3 years ago +6

      Simzam

    • @lohphat
      @lohphat 3 years ago +1

      It’s similar to a Shatner simulator. A Shatsim.

    • @vladimirpotrosky7855
      @vladimirpotrosky7855 3 years ago

      Sha-sham

    • @thePronto
      @thePronto 3 years ago

      That word would violate the YT terms of service.

  • @CalvinsWorldNews
    @CalvinsWorldNews 3 years ago +22

    I always found it awesome that it wouldn't just tell you that some music was, e.g., Beethoven's 8th Symphony; it would tell you specifically that it was a Berlin Philharmonic recording from 2004 or whatever, when even a conductor couldn't be that specific

  • @rich1051414
    @rich1051414 3 years ago +118

    From what I understand, shazam generates a bunch of potential hashes per song, and it phones home to see if any of those hashes is a song id. It's a huge lookup table and a clever key generation based on relative frequency changes over time.

    • @shadebug
      @shadebug 3 years ago +51

      That makes sense because anybody who uses it a lot knows you sometimes get some matches that are nothing alike, which is weird if it’s just comparing sounds but makes perfect sense if your hashes are clashing

    • @ArihantChawla
      @ArihantChawla 3 years ago +3

      @shadebug exactly
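
The "huge lookup table with clever keys" idea in this thread can be sketched in a few lines. This is a minimal sketch, not Shazam's actual code: it assumes each song has already been reduced to a list of (time, frequency-bin) peaks, and the names `fingerprints`, `build_index`, `identify`, and the toy peak data are all hypothetical. Keys are built from pairs of nearby peaks (two frequency bins plus the time gap between them), so a key does not depend on where in the song the sample starts.

```python
from collections import Counter, defaultdict

def fingerprints(peaks, fan_out=3):
    """Yield (key, anchor_time) pairs from a list of (time, freq_bin) peaks.

    Each key combines two peak frequencies and the time gap between them,
    so it stays the same wherever the snippet starts within the song.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1

def build_index(songs):
    """Map every key to the (song_id, time) positions where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for key, t in fingerprints(peaks):
            index[key].append((song_id, t))
    return index

def identify(index, sample_peaks):
    """Vote for (song, time offset) pairs; a true match piles up on one offset."""
    votes = Counter()
    for key, t_sample in fingerprints(sample_peaks):
        for song_id, t_song in index.get(key, ()):
            votes[(song_id, t_song - t_sample)] += 1
    if not votes:
        return None
    (song_id, offset), _ = votes.most_common(1)[0]
    return song_id, offset

# Toy data: peaks are (time slice, frequency bin); values are made up.
songs = {
    "song_a": [(0, 40), (1, 55), (2, 40), (3, 62), (4, 48), (5, 55), (6, 70)],
    "song_b": [(0, 30), (1, 33), (2, 45), (3, 30), (4, 51), (5, 39), (6, 33)],
}
index = build_index(songs)

# A snippet of song_a starting at slice 2, with its own clock starting at 0.
sample = [(0, 40), (1, 62), (2, 48), (3, 55)]
result = identify(index, sample)
```

Requiring many keys to agree on the same time offset is what keeps an occasional clashing key from producing a wrong match; with only a handful of keys in a noisy sample, a clash can still win the vote.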

  • @3339LuXz
    @3339LuXz 1 year ago +3

    This man described Fourier Transformation in a beautiful way! Respect!!

  • @bigbri64
    @bigbri64 1 year ago +5

    This is absolutely amazing and great work deciphering and explaining this algorithm!!

  • @Gamah1991
    @Gamah1991 3 years ago +34

    David Domminney Fowler: If James May went all rock and roll/computer nerd...

    • @domminney
      @domminney 2 years ago +1

      Ha ha 🤣

  • @Baxtexx
    @Baxtexx 3 years ago +47

    Real Engineering did one of these a whiöe back, great to see more!

    • @Norsilca
      @Norsilca 3 years ago +1

      Wondering what language keyboard that is to produce that typo

    • @ake_lindblom
      @ake_lindblom 3 years ago +10

      @Norsilca Swedish!

    • @deidara_8598
      @deidara_8598 3 years ago +1

      @Norsilca I hope Swedish, but it could be Finnish, Icelandic, or Estonian too. Other Nordic languages like Norwegian and Danish use a similar letter "ø", which is pronounced the same. ÆÄØÖÅ

  • @LeDabe
    @LeDabe 3 years ago +6

    Don't mix up the Fast Fourier transform (FFT) and the Fourier transform (FT) concept. FFT is an algorithm; there are different FFT algorithms that do different things in different ways. The Fourier transform (more generally the transformation "concept") is what's interesting here, as it lets you "change" your point of view of the data.

    • @RaunienTheFirst
      @RaunienTheFirst 3 years ago

      My first fo(u)ray into Fourier transforms was during my uni days studying chemistry. It's used to process the radio frequency data from an NMR machine into a nice graph which is essentially how "strong" the signal is at a particular frequency and how quickly it tails off (although the end result doesn't actually have any units) that allows you to identify chemical groups, how many there are, and with a bit more analysis where they are in relation to each other. Fourier transforms are black magic as far as I'm concerned.

    • @LeDabe
      @LeDabe 3 years ago

      @RaunienTheFirst You can see the Fourier transform as the scalar product of the signal with another signal of a given frequency and phase.
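
The "scalar product" view above can be checked numerically. A minimal sketch, assuming a sampled signal whose components complete a whole number of cycles over the window; the function name `fourier_coefficient` and the test signal are made up for illustration. A complex sinusoid is used so that one inner product captures every phase at once.

```python
import cmath
import math

def fourier_coefficient(signal, cycles):
    """Normalised inner product of the signal with a complex sinusoid
    that completes `cycles` full cycles over the length of the window."""
    n = len(signal)
    return sum(
        x * cmath.exp(-2j * math.pi * cycles * k / n)
        for k, x in enumerate(signal)
    ) / n

n = 64
# Test signal: amplitude 1.0 at 3 cycles per window, amplitude 0.5 at 5 cycles.
signal = [
    1.0 * math.sin(2 * math.pi * 3 * k / n) + 0.5 * math.cos(2 * math.pi * 5 * k / n)
    for k in range(n)
]

# For a real signal, the magnitude of the coefficient is half the amplitude.
amp3 = 2 * abs(fourier_coefficient(signal, 3))
amp5 = 2 * abs(fourier_coefficient(signal, 5))
amp4 = 2 * abs(fourier_coefficient(signal, 4))
```

Here `amp3` and `amp5` recover the two amplitudes, while `amp4` is essentially zero because nothing in the signal lines up with a 4-cycle sinusoid.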

  • @lustechsource5197
    @lustechsource5197 3 years ago +6

    Shazam seems like magic to me, but the way it was explained in this video took away a little of that magic. Wish schools would teach math concepts explaining how the tech we use at the time works.

    • @Michallote
      @Michallote 9 months ago

      The thing is that he is an expert in the field. It would be a tall order for a high school or earlier teacher to be able to explain and understand this as well as he does. He even built a proof of concept. That is no trivial task

  • @aliwelchoo
    @aliwelchoo 3 years ago +34

    Interesting. I think the way Shazam searches through so many songs despite noise etc and does it so fast is the impressive part. The matching frequencies part is simple in comparison

    • @pluto8404
      @pluto8404 3 years ago +32

      They actually just employ thousands of indian children to listen to your audio and guess the song.

    • @MattB90
      @MattB90 3 years ago +6

      Search is relatively straightforward though. The FFT, the slicing intervals, and trying to get the amount of identifying info for each song as small as possible are way more interesting

    • @Henrix1998
      @Henrix1998 3 years ago

      @MattB90 The number of songs in the database is huge tho

    • @mesut1261
      @mesut1261 3 years ago

      @pluto8404 that's why major companies' CEOs are Indian.

    • @iWhacko
      @iWhacko 3 years ago

      No, the searching is the easy part. It's not searching through all songs to find a match, as in comparing every song in the database with the sample you played. Its searches are way "more complicated", yet databases are optimised for that, so it's really fast. Google how a lookup table or binary search works to understand the simple ones.
      If you sort all songs based on their first note, for instance, you can discard all the songs that start with a different note, and then search for the next note, just as you would alphabetise a list of names. In practice it's more advanced, but not really more difficult than the profiling of the song itself.

  • @stevepoling
    @stevepoling 3 years ago +17

    The business with clocks is probably the most tortured non-explanation of the Fourier Transform I've ever heard.

    • @infinite1der
      @infinite1der 3 years ago +1

      Agreed... ugh.

    • @stevepoling
      @stevepoling 3 years ago +1

      @infinite1der How's this: An FFT is a specific algorithm to calculate a Fourier Transform. What's an FT? Every periodic function can be approximated by a sum of sine or cosine waves. The Fourier Transform shows how much of each to add to approximate the time-waveform. The FT is thus a frequency-domain representation of its time-waveform. You can do a Fourier Transform by many means (ever hear of a holograph?), but computationally, an efficient algorithm is the FFT. There are hints of the FFT going back as far as Gauss, but it only became commonplace once we got the hardware capable of doing digital signal processing.
      I haven't delved into the "how" of FFT so much as the "what" of FT, which is much more useful to someone who is likely to just pick up an FFT implementation from a library, or a specialized DSP chip. The FFT is worthy of its own detailed exposition aside from its application to the Shazam search algorithm.
      Non-stationary signals (i.e. all the interesting ones) can also be modeled (often more accurately) with other basis functions... ferinstance wavelets.

    • @SkylerAk
      @SkylerAk 3 years ago +2

      @stevepoling The Fourier transform is a mathematical function that can be used to find the base frequencies that a wave is made of.

    • @stevepoling
      @stevepoling 3 years ago +1

      @SkylerAk you can think of an arbitrary waveform, e.g. your favorite tune, as a function. And every function can be the resultant of the sum of other functions. It helps if the other functions are related in some way. Use your intuitions of how vectors work in terms of basis vectors and vector sums. Just as a vector may be represented as a sum of any orthogonal set of basis vectors, any function can be represented as a sum of one or another set of basis functions. I gave the examples of sinusoids and wavelets. Sine waves are characterized by phase, amplitude, and frequency. The FT gives you the parameterization of the sinusoids approximating your input wave function. I've seen folks look at phase, but most people just look at frequency & amplitude.
      I'm not disagreeing with you, Skyler, but pointing out there's a lot more interesting mathematics going on (despite the fact that FT is mostly used as you describe). And there are families of orthogonal functions that describe some signals more efficiently than sinusoids, e.g. wavelets.
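
For reference, the "sum of sinusoids" description in this thread corresponds to the standard Fourier series of a function with period T. The symbols below (x, T, K, a_k, b_k, A_k, φ_k) are the usual textbook notation, not something taken from the video.

```latex
x(t) \approx \frac{a_0}{2} + \sum_{k=1}^{K}
  \left[ a_k \cos\!\left(\frac{2\pi k t}{T}\right)
       + b_k \sin\!\left(\frac{2\pi k t}{T}\right) \right],
\qquad
a_k = \frac{2}{T}\int_0^T x(t)\cos\!\left(\frac{2\pi k t}{T}\right)dt,
\quad
b_k = \frac{2}{T}\int_0^T x(t)\sin\!\left(\frac{2\pi k t}{T}\right)dt

% Equivalent amplitude/phase form of each term:
a_k \cos\theta + b_k \sin\theta = A_k \cos(\theta - \varphi_k),
\qquad
A_k = \sqrt{a_k^2 + b_k^2},
\quad
\varphi_k = \operatorname{atan2}(b_k, a_k)
```

The coefficients a_k and b_k are inner products of x with the cosine and sine basis functions, which is the basis-vector picture described above; A_k and φ_k are the amplitude and phase of the k-th sinusoid.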

  • @artit91
    @artit91 3 years ago +2

    They use watermarking/hashing/fingerprinting, you name it. They measure beats per minute, chords, key, etc. and have them stored for every song. I think there are around 100 different properties they can calculate, and they use a composite index over them to get the song out, but most of the time 5-6 properties are enough and you can get them from a few seconds.
    It's a pretty easy job if half of the world listens to the same 100 songs. Those hash IDs can be pulled onto the device, and the app can work offline even with 1000 song hashes downloaded.
    Apparently Tinder and Shazam work very similarly under the hood.

  • @n0handles
    @n0handles 3 years ago +3

    Love these audio related videos! Keep em coming

  • @GlitchingWithAlandi
    @GlitchingWithAlandi 3 years ago +2

    Brilliant, David explains concepts so well.

  • @metroidandroid
    @metroidandroid 3 years ago +25

    come on, explaining fft with phasors is not the most basic way

    • @HesderOleh
      @HesderOleh 3 years ago

      Also I think Dynamic time warping is used rather than FFT

    • @iWhacko
      @iWhacko 3 years ago

      @HesderOleh Or autocorrelation. I played around with note recognition when the iPhone3 came out, and FFT was just barely out of reach for computing power on the iPhone. Autocorrelation is much less intense for the CPU.

    • @spectralpiano3881
      @spectralpiano3881 3 years ago +1

      @iWhacko Ironically, autocorrelation is calculated using the FFT: AC(x) = IFFT( | FFT(x) |^2 ). For short signals it could be that it's faster to simply calculate SUM_n ( x(n)*x(n+tau) ), but I'm not too sure about that

    • @andinomm
      @andinomm 3 years ago +5

      The FFT explanation was horrible..... He could've just said that every periodic signal is made from sines and use some graphs to display that as you see like everywhere you search for it. Would've been more useful for beginners/newcomers.
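
The identity quoted in this thread, AC(x) = IFFT(|FFT(x)|^2), can be verified on a small signal. A minimal sketch: it uses a plain O(n²) DFT instead of a real FFT library to stay dependency-free, so it shows the equivalence rather than the speed-up, and the function names and test values are made up. Note that the spectral route gives the circular autocorrelation (indices wrap around); zero-padding would be needed to get the non-wrapping sum.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [v / n for v in out] if inverse else out

def autocorr_direct(x):
    """Circular autocorrelation straight from the definition."""
    n = len(x)
    return [sum(x[i] * x[(i + tau) % n] for i in range(n)) for tau in range(n)]

def autocorr_spectral(x):
    """Same quantity via the power spectrum: inverse DFT of |DFT(x)|^2."""
    power = [abs(v) ** 2 for v in dft(x)]
    return [v.real for v in dft(power, inverse=True)]

x = [0.0, 1.0, 0.5, -0.3, 0.8, -1.0, 0.2, 0.4]
direct = autocorr_direct(x)
spectral = autocorr_spectral(x)
```

Both lists agree to floating-point precision, and the value at lag 0 is the signal's total energy.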

  • @CaptainJeoy
    @CaptainJeoy 3 years ago +21

    _Meanwhile how Shazam actually works is that it secretly uploads your recording to YouTube then gets copyright struck and then voila, the song ID :)_

    • @ZipplyZane
      @ZipplyZane 3 years ago +2

      I hope not, as YouTube seems to constantly get it wrong.

  • @Cyber99221
    @Cyber99221 3 years ago +3

    As soon as you upload a video, no matter what I'm doing, I click and watch.

  • @TerjeMathisen
    @TerjeMathisen 3 years ago +3

    I am pretty sure you would use an algorithm which is much more closely related to the SIFT algorithm, which is used by all panorama stitching programs, including the super-resolution and night mode features of modern cell phone cameras. I.e. this will allow you to recognize songs that are played back at a slightly different speed, so that all notes and intervals will be shifted.

    • @grankoczsk
      @grankoczsk 3 years ago

      In order to detect speed-altered audio, all you need is a feature in your hash that is time-skew invariant. I read a paper on it some time ago where the author also gave an example of such invariants.
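
One possible kind of time-skew invariant, offered only as an illustration and not as the method from the paper mentioned above: when a recording is played back faster by some factor, every frequency is multiplied by that factor and every time gap is divided by it, so ratios of frequencies and ratios of time gaps do not change. The function names and peak values below are hypothetical.

```python
def skew_invariant_key(p1, p2, p3, digits=3):
    """Build a key from three (time_s, freq_hz) peaks using only ratios,
    which are unchanged by a uniform playback-speed change."""
    (t1, f1), (t2, f2), (t3, f3) = p1, p2, p3
    return (
        round(f2 / f1, digits),                # frequency ratio
        round(f3 / f1, digits),                # frequency ratio
        round((t3 - t2) / (t2 - t1), digits),  # ratio of time gaps
    )

def change_speed(peaks, factor):
    """Simulate playing the audio `factor` times faster (simple resampling)."""
    return [(t / factor, f * factor) for t, f in peaks]

peaks = [(1.0, 440.0), (1.5, 660.0), (2.5, 550.0)]
original_key = skew_invariant_key(*peaks)
sped_up_key = skew_invariant_key(*change_speed(peaks, 1.05))
```

Both keys come out identical, whereas a key built from the raw frequencies and the raw time gap would differ between the two versions. This only covers speed changes that shift pitch and tempo together.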

  • @jmonsted
    @jmonsted 3 years ago +9

    Now I'm tempted to run his code with my music collection and look for duplicates...

  • @lunasophia9002
    @lunasophia9002 3 years ago +5

    The photo of David in the thumbnail makes him look like a Sith lord.

  • @gothxx
    @gothxx 3 years ago +2

    I guess it is also easy for them to optimize the parameters with ML since they have song data and user samples so they have known input for known answers that they can try different algorithms/parameters on in a large scale.

  • @monono8156
    @monono8156 3 years ago +1

    Woo! A DSP video on computerphile! This is great :)

  • @UxRandom
    @UxRandom 3 years ago +2

    I remember using SoundHound back in the day.

  • @phyphor
    @phyphor 3 years ago +1

    Ruddy 'ell. I've been a fan of Dave's music since I saw his band Diversion about 25 years ago...
    Small world!

  • @vladimirpotrosky7855
    @vladimirpotrosky7855 3 years ago +4

    16:20 "avoid any imperial entanglements"
    Amazing Star Wars ref

  • @willemvdk4886
    @willemvdk4886 3 years ago +7

    Love the short Pearl Jam riff ;)

  • @Benoit-Pierre
    @Benoit-Pierre 3 years ago +1

    You spend less than 20s on the only point I am interested in: how to convert part of an FFT into a hash that can reach the song in a very large database in one shot.
    Do you produce several hashes per song? How do you store this? Do you need parallel servers to search tags?
    What exactly do you hash on the client side?
    Comparing 5 songs is trivial. The Shazam DB has millions or billions.

  • @nietschecrossout550
    @nietschecrossout550 3 years ago +2

    My guess would be that they generate an identity vector of the frequency space for a bunch of samples of the song, to fill up a DB. Then they generate the identity vector the same way when recording, eliminating background noise etc. Now you can just compare the difference between vectors, which is what you can train an ML model for.

  • @markusklyver6277
    @markusklyver6277 3 years ago +4

    This was extremely hand-wavy

  • @ahmedelphi
    @ahmedelphi 3 years ago +2

    Another brilliant video. But how would it sound if these frequency-domain arrays are transformed back to time-domain? It would be interesting to hear how a clip “sounds” to the application. I wish that was in the video, or it’s extra.

    • @harrysvensson2610
      @harrysvensson2610 3 years ago

      @moduu24 FFT is reversible, but that's not what Ahmed asked about. In the video it's said that the loudest frequencies within each bucket are extracted. That means that some frequencies are set to 0 amplitude, and that is not reversible. It would be interesting to listen to that to see how it compares to the original song.
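
The difference between "the transform is reversible" and "keeping only the loudest frequencies is not" can be shown on a toy signal. A minimal sketch with a plain O(n²) DFT and made-up names and values; it is not the demo's code, and it keeps the loudest bins overall rather than per bucket to stay short.

```python
import cmath
import math

def dft(x, inverse=False):
    """Plain O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [v / n for v in out] if inverse else out

def keep_loudest(spectrum, keep):
    """Zero every bin except the `keep` loudest positive-frequency bins
    (and their mirror images, so the result is still a real signal)."""
    n = len(spectrum)
    loudest = sorted(range(1, n // 2), key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    kept = [0j] * n
    for k in loudest:
        kept[k] = spectrum[k]
        kept[n - k] = spectrum[n - k]
    return kept

n = 64
signal = [
    math.sin(2 * math.pi * 4 * k / n)
    + 0.6 * math.sin(2 * math.pi * 9 * k / n)
    + 0.1 * math.sin(2 * math.pi * 15 * k / n)
    for k in range(n)
]
spectrum = dft(signal)

# Full round trip: nothing is lost.
round_trip = [v.real for v in dft(spectrum, inverse=True)]
round_trip_error = max(abs(a - b) for a, b in zip(signal, round_trip))

# Keep only the two loudest components: the quiet third one is gone for good.
approx = [v.real for v in dft(keep_loudest(spectrum, 2), inverse=True)]
lossy_error = max(abs(a - b) for a, b in zip(signal, approx))
```

The round-trip error is at floating-point level, while the lossy error equals the amplitude of the discarded component (0.1 here). Played back, `approx` would sound like the original with its quietest partial removed.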

  • @hobo9968
    @hobo9968 3 years ago +2

    Woow. Basically you look at the music in the frequency domain and do an optimized search based on the prevalent frequency slices

    • @TheAudioCGMan
      @TheAudioCGMan 3 years ago

      This sounds like the abstract of a paper

  • @ToniT800
    @ToniT800 3 years ago +8

    I honestly think that the explanation of the FFT was way too overcomplicated, especially for this kind of video.

    • @chicoktc
      @chicoktc 3 years ago +2

      Yeah, they could have simplified to "it takes the top frequencies from that tiny chunk of song" and that's it. Maybe in the extras?

    • @colin_hart
      @colin_hart 3 years ago

      You can jump from 1:54 to 8:04

  • @BlankBrain
    @BlankBrain 3 years ago +2

    I wrote something a little like this back in the '80s to take raw digital (payroll) data off unlabeled mag-tape and figure out the record size and columns. There weren't any nice FFT libraries to assist. There wasn't even a lot of memory to hold sample data (without paging). The program had to control the tape drive to make multiple passes in some cases. It worked well enough to automate reading customer payroll data from a myriad of custom payroll systems. Once the data was in records with columns, the analysts were able to use other tools to update pension contributions.

  • @GordonjSmith1
    @GordonjSmith1 2 years ago

    Really insightful from a digital problem that I had not considered before. Nice piece of 'blogging'!

  • @8015908
    @8015908 3 years ago +17

    In the thumbnail I thought his hair was a robe.

    • @domminney
      @domminney 3 years ago

      🤣

    • @kourii
      @kourii 3 years ago +1

      Lol yeah at a glance I thought it was a hood too

    • @MattExzy
      @MattExzy 3 years ago +4

      "Do what must be done Lord Vader..."

  • @zwz.zdenek
    @zwz.zdenek 3 years ago +1

    There is this competing app that can sometimes tell the song just by you singing it. No bassline, poor timing, probably in the wrong key, but it still works. I wonder if they hired a bunch of former Indian captcha solvers for this...

    • @laurinneff4304
      @laurinneff4304 3 years ago

      Google assistant can do this

  • @bmwolfe2786
    @bmwolfe2786 3 years ago +1

    Was that a little Pearl Jam "Alive" lick you threw in there, around 14:36?

  • @rootsquare
    @rootsquare 3 years ago

    Kantar/Nielsen TV boxes utilise a similar technique to shazam to identify programmes being watched. IIRC. This enables them to also pick up on time-shifted content being watched after airing...

  • @TheSuperSaiyanGoku1
    @TheSuperSaiyanGoku1 1 year ago

    Shazam Programmers: Write that down! Write that down!

  • @ezOqekuRitusohI
    @ezOqekuRitusohI 8 months ago +1

    I work at Shazam and I can confirm this is how Shazam works.
    I'm also lying.

  • @zoggoth
    @zoggoth 3 years ago

    You mentioned that you had constant-size slices (23.4 Hz).
    At the bottom end of the human hearing range, 100 Hz -> 123.4 Hz is a jump of almost 4 semitones.
    At the top end, 5000 Hz -> 5023.4 Hz is 1/12 of a semitone.
    From the point of view of us foolish humans, you're undersampling at the bottom & oversampling at the top.
    With 1024 frequencies, you could sample every quarter semitone (hemidemisemitone?) between 24 kHz & 9 mHz, or every eighth of a semitone in the human hearing range (20 Hz to 20 kHz).
    Does taking human perception into account help identify human music?
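
The numbers in this comment can be reproduced with the equal-temperament relation that one semitone is a frequency ratio of 2^(1/12). A minimal sketch; the function name is made up, and the 23.4 Hz slice width is the figure quoted in the comment.

```python
import math

def semitones_between(f_low, f_high):
    """Musical distance between two frequencies, in equal-tempered semitones."""
    return 12 * math.log2(f_high / f_low)

bin_width = 23.4  # Hz, the constant slice width quoted above

low_gap = semitones_between(100, 100 + bin_width)     # one slice near 100 Hz
high_gap = semitones_between(5000, 5000 + bin_width)  # one slice near 5 kHz

# 1024 quarter-semitone steps down from 24 kHz: how low does that reach?
lowest_hz = 24000 / 2 ** (1024 / (12 * 4))

# Eighth-semitone steps needed to cover 20 Hz to 20 kHz.
bins_needed = math.ceil(8 * semitones_between(20, 20000))
```

This gives roughly 3.6 semitones for a slice at 100 Hz, about 0.08 of a semitone (close to 1/12) at 5 kHz, a lower limit of about 9 mHz, and 957 eighth-semitone bins for the 20 Hz to 20 kHz range, which fits within 1024.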

  • @slovakthrowback3738
    @slovakthrowback3738 3 years ago +10

    After watching this, I really wonder how things like Google Assistant's "What's this Song" work, given that it handles humming, instrumentals only, etc.

    • @metroidandroid
      @metroidandroid 3 years ago +3

      black magic 101

    • @slovakthrowback3738
      @slovakthrowback3738 3 years ago +1

      @metroidandroid fair enough

    • @HesderOleh
      @HesderOleh 3 years ago +1

      probably something similar to Dynamic Time Warping

    • @Mr.Leeroy
      @Mr.Leeroy 3 years ago

      probably interpreting the notes, which may be feasible as the samples are less complex and matching dictionary could be put together.

    • @slovakthrowback3738
      @slovakthrowback3738 3 years ago

      @Mr.Leeroy Maybe, but it's definitely more complicated, since it can find off-note songs, songs based on lyrics, sometimes even two songs overlapped 👌

  • @ultimaterocker4
    @ultimaterocker4 3 years ago +45

    "Let's say the period is 100hz" really annoyed me

    • @metroidandroid
      @metroidandroid 3 years ago +7

      it hurts

    • @Peds013
      @Peds013 3 years ago +1

      I thought it was just me!

    • @imamalox
      @imamalox 3 years ago +1

      What also annoyed me is him drawing the markings for the period at the top of the sine wave instead of convexity shift part

    • @ilyboc
      @ilyboc 3 years ago +3

      lol whatever we all know he meant the period related to 100hz of freq.

    • @Ceelvain
      @Ceelvain 3 years ago +2

      That and using "FFT" when he means "Fourier Transform".

  • @ryanlorenti469
    @ryanlorenti469 3 years ago

    this is a really cool use of trig

  • @xxsummer666xx4
    @xxsummer666xx4 3 years ago +1

    something i find really impressive is the fact that shazam can detect harsh noise tracks

    • @carlosmspk
      @carlosmspk 3 years ago

      He asked that at the end. A noisy low quality A#, is still an A# so that's the frequency that stands out the most

  • @venil82
    @venil82 3 years ago +4

    Was surprised to see he has written it in JavaScript

    • @domminney
      @domminney 3 years ago +1

      Following on from my html5/js video I felt it should be

    • @max_kl
      @max_kl 3 years ago

      To be fair, the browser's audio processing and graphics code is written in C++, and it does the heavy lifting

  • @brilliantbrunch
    @brilliantbrunch 3 years ago +4

    14:36 The guitar riff sounds like it's from Alive by Pearl Jam

  • @laxr5rs
    @laxr5rs 3 years ago +4

    Nice BASS! GREAT quality on the video (screen grab maybe?)

    • @domminney
      @domminney 3 years ago

      yes, I recorded it in OBS!

  • @Nemitorsis
    @Nemitorsis 3 years ago +1

    I am curious as to whether it is better (or worse) to use MFCC + MFCC delta, compared to direct FFT or frequency distribution in these cases.

  • @RGS61
    @RGS61 3 years ago

    So a great explanation of how Shazam (probably) works to identify a tune .. from a 'user sample in' perspective .. But what about the other side of the comparison match? .. How does Shazam manage to sample so many source files to build up its database? .. Through agreements with the music rights holders? .. or music streaming services? .. or ..?? .. or am I missing something?!

  • @Bruh-el9js
    @Bruh-el9js 3 years ago +13

    1:42 Fast & Furious Transformers

  • @dralfonzo24
    @dralfonzo24 3 years ago +2

    Man, I've seen a lot of "simplified" explanations for the Fourier Transform over the years, but this one's gotta be the messiest. One couldn't possibly understand what he's talking about without a previous course on the subject.

  • @fireclub493
    @fireclub493 3 years ago

    Awesome! Would love to see more attempts of implementing other interesting algorithms too

  • @davideareias7876
    @davideareias7876 3 years ago +1

    I have actually created a Shazam myself that uses entropy and other stuff and prints out the probability of the song being what I input. Not a hard project, but it's kinda slow

  • @mountp1391
    @mountp1391 2 years ago

    How awesome shazam is! Thank you.

  • @cypher9000
    @cypher9000 3 years ago +1

    There's also SoundHound by the way. Also "var"? "VAR"!? What is this, 2000? 😁

    • @KX36
      @KX36 3 years ago

      I'd *let* him off

  • @sciencefirefly837
    @sciencefirefly837 3 years ago +1

    Does anyone know about the git repository for this project? It would be fun if it is open source and we can improve it.

  • @octavylon9008
    @octavylon9008 3 years ago +1

    Guys, why are you using Zoom? It's proprietary spyware.
    You should use Jitsi instead

  • @IWubYooz
    @IWubYooz 3 years ago

    I only clicked on this because of the presumption in the title that it does work... I've tried many, many times and shazam has yet to tell me a song title.

  • @surters
    @surters 3 years ago +4

    So did you get copyright strike despite owning the music?

    • @max_kl
      @max_kl 3 years ago

      That only works for music listed in YouTube's Content ID database, and you'd see the songs below the video description. So I don't think YouTube picked it up

  • @comedyhunter
    @comedyhunter 3 years ago

    Very interesting, always wondered how it works. I have a couple of questions for you. *(1)* Once it's identified a song, you can click on "Lyrics" and it shows the words to the song it's just identified in real time, in the correct place, and follows along. How on Earth does it manage to do that? It manages to work out which chorus it's on; amazed it can do that. *(2)* Also, I've used Shazam in very noisy bars where you can only just about hear music, and Shazam actually manages to identify it even though you can only just about make out music at all! How does it cope with all the noise and pick out FFTs when it's such a noisy environment?
    Thanks for the explanation of FFT; nice and easy to understand how that works now.

    • @axelnils
      @axelnils 3 years ago +1

      Lyrics and chords are probably just pulled from some database.

    • @comedyhunter
      @comedyhunter 3 years ago

      @axelnils yes, but how does it manage to line up the words precisely and even know which chorus it’s on

    • @eloskowy4954
      @eloskowy4954 3 years ago +1

      @comedyhunter I found a document called "An Industrial-Strength Audio Search Algorithm".
      Music has been found, right? So it just needs to process the notes that the device is hearing and sync them with lyrics stored in a database. Those lyrics contain info about when each piece of text is sung. You can try something like QuickLyrics and you'll see how it works, because it takes data about what song is being played from notifications.

    • @comedyhunter
      @comedyhunter 3 years ago

      @eloskowy4954 Thanks for the extra info.
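
The lyric-sync idea discussed in this thread comes down to a lookup once the match position is known. A minimal sketch, assuming the matcher reports how far into the song the sample began and that the lyrics come with start times; `current_lyric`, its parameters, and the toy lyric data are all hypothetical.

```python
from bisect import bisect_right

def current_lyric(timed_lyrics, match_offset_s, elapsed_s):
    """Return the lyric line being sung right now.

    timed_lyrics   -- list of (start_time_s, text), sorted by start time
    match_offset_s -- position in the song where the recorded sample began
    elapsed_s      -- seconds that have passed since the sample began
    """
    position = match_offset_s + elapsed_s
    start_times = [start for start, _ in timed_lyrics]
    i = bisect_right(start_times, position) - 1  # last line that has started
    return timed_lyrics[i][1] if i >= 0 else None

timed_lyrics = [
    (0.0, "intro"),
    (12.5, "first verse"),
    (41.0, "chorus"),
    (70.0, "second verse"),
    (98.5, "chorus again"),
]

# The sample matched 95 s into the song and 5 s have passed since then.
line = current_lyric(timed_lyrics, match_offset_s=95.0, elapsed_s=5.0)
```

Because the match gives an absolute position in the recording, the second chorus is distinguished from the first simply by its timestamp, even though the two sound alike.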

  • @marco_gallone
    @marco_gallone 3 years ago +1

    But is the search algorithm as simple as checking all sample-to-library cross-correlations? For the multi-hundred-million songs in Shazam’s library, that is still a time-consuming search process, don’t you agree?

    • @GeneralBlackNorway
      @GeneralBlackNorway 3 years ago

      If they take each slice of all the songs and list them in ascending order they can then take the values from the phone recording and starting from the middle, see if the value is greater or smaller. Then they go to the middle of the upper or lower half of the list and make the same check until they eventually reach a matching slice. This search method halves the search space with each comparison and can thus be done very quickly. Each song slice would be connected to a full song in the database and thus you have the ID of your song. They would probably repeat this search with several slices from the phone recording to eliminate false positives.
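
The sorted-list search described in this reply can be sketched with Python's `bisect` module. This is only an illustration of the commenter's idea, not a claim about Shazam's storage: it assumes each slice has already been reduced to a single integer key, and the names and key values are made up.

```python
from bisect import bisect_left
from collections import Counter

def build_sorted_index(songs):
    """Flatten {song_id: [slice_key, ...]} into one list sorted by key."""
    return sorted((key, song_id) for song_id, keys in songs.items() for key in keys)

def find(entries, key):
    """Binary-search for a slice key; return every song containing it."""
    i = bisect_left(entries, (key,))  # halves the search range at each step
    matches = []
    while i < len(entries) and entries[i][0] == key:
        matches.append(entries[i][1])
        i += 1
    return matches

def identify(entries, sample_keys):
    """Look up several slices and let the songs vote, to weed out false positives."""
    votes = Counter(song for key in sample_keys for song in find(entries, key))
    return votes.most_common(1)[0][0] if votes else None

songs = {
    "song_a": [412, 87, 903, 655],
    "song_b": [120, 903, 77, 301],
}
entries = build_sorted_index(songs)
best = identify(entries, [903, 655])
```

Key 903 occurs in both songs, so a single slice is ambiguous; the second slice breaks the tie. Each lookup takes about log2(N) comparisons, so even a billion stored slices need only around 30 steps per key.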

  • @Nadroj72
    @Nadroj72 3 years ago +7

    You should figure out how Google's Now Playing feature works. It even works without a data connection.

  • @Krcha13
    @Krcha13 3 years ago

    On the Shazam side they have converted all songs from audio signal > digital stream > signature that is binary.
    On the user side, part of your recording is converted as well into just part of that signature.
    You send that to Shazam, they match the partial signature against the full signatures, and that's it

    • @grankoczsk
      @grankoczsk 3 years ago +1

      Very oversimplified

  • @megavide0
    @megavide0 3 years ago

    JavaScript & Web Audio ..? amazing!

  • @BestHKisDLM
    @BestHKisDLM 3 years ago

    Excellent content. Thank you!

  • @Ceelvain
    @Ceelvain 3 years ago

    I guess by "FFT" he meant "Fourier Transform". FFT is a (class of) algorithms that compute the Discrete Fourier Transform with a better complexity than the naive approach. Effectively, an FFT does *NOT* perform the full calculation for every frequency independently, that's why it's called "Fast".
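
The distinction drawn in this comment can be made concrete by putting the naive approach next to a textbook radix-2 Cooley-Tukey FFT. A minimal sketch for power-of-two lengths only; it is one member of the FFT family, not necessarily what any particular library or the demo uses.

```python
import cmath
import math

def naive_dft(x):
    """Every output bin computed independently: O(n^2) multiplications."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def fft(x):
    """Recursive radix-2 FFT: O(n log n). len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # each half-size result is reused
        out[k + n // 2] = even[k] - twiddle   # for two output bins
    return out

signal = [math.sin(2 * math.pi * 3 * k / 16) + 0.25 * k for k in range(16)]
slow = naive_dft(signal)
fast = fft(signal)
max_difference = max(abs(a - b) for a, b in zip(slow, fast))
```

Both functions return the same discrete Fourier transform up to rounding; the speed-up comes from sharing the half-size transforms between pairs of output bins rather than recomputing each frequency from scratch.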

  • @fixfaxerify
    @fixfaxerify 1 year ago

    As always, computerphile standard issue matrix printer paper. +1 for consistency!

  • @otiagomarques
    @otiagomarques 3 years ago +1

    You gotta make a video on how you got that 2:00 camera move!

  • @1966human
    @1966human 1 year ago

    Could there be a sound code in the background of the songs?

  • @wgs3leed
    @wgs3leed 3 years ago

    Brilliant

  • @Scranny
    @Scranny 3 years ago

    Nowadays I assume this can be accomplished using an LSTM neural network? or incorporating a CNN? But perhaps NNs are a bit overkill for this task and slower engineering wise (I mean, you can't have an output layer the size of all the songs in your DB, but you can produce a searchable vector which you would need lots of memory and fast servers to support).

  • @Car_Ram_Rod
    @Car_Ram_Rod 3 years ago +2

    This is interesting!

  • @n7565j
    @n7565j 3 years ago +1

    "My A chord rarely sounds great"... I feel your pain ;-)
    My G, C, B, and all the rest rarely sound great... That's why we work with computers instead of filling stadiums :-)

  • @egumit
    @egumit 3 years ago

    I was most impressed by the fact that the drawing was done on dot matrix paper, which was used for printers before laser printers. Very good presentation.

  • @goodkavin
    @goodkavin 3 years ago

    What would need to be done additionally to make it work for humming as well, like in Google Assistant/SoundHound?

  • @lavericklavericklave
    @lavericklavericklave 2 years ago

    How did you manage to make FFT seem so complicated?

  • @NK-fx1qs
    @NK-fx1qs 3 years ago +1

    Do one of these with some space transmissions and draw bob ross style. That's a nice signal right there, yes it is.

    • @ibrahim47x
      @ibrahim47x 3 years ago +1

      It smells like Mary Jane in this comment

  • @muneebkh4n
    @muneebkh4n 3 years ago

    This guy plays black metal in his leisure time

  • @jean-marcherard9216
    @jean-marcherard9216 3 years ago +1

    Nice, sounds like spectrometry. I always wonder whether Shazam actually works with live versions or cover versions.

    • @adminadmin8992
      @adminadmin8992 3 years ago

      I tried it with a live version. It worked.

    • @jean-marcherard9216
      @jean-marcherard9216 3 years ago

      @@adminadmin8992 nice ...
      What about cover versions (songs sung by different artists)? What about instrumentals?

    • @grankoczsk
      @grankoczsk 3 years ago

      @@adminadmin8992 Whoever was performing live must have been doing it very accurately to the original

    • @grankoczsk
      @grankoczsk 3 years ago

      @@jean-marcherard9216 That will probably only work if the instrumental that's being used in the cover is the same as the one from the original.

  • @MrNacknime
    @MrNacknime 3 years ago

    How did he manage to spend 20 minutes on the FFT part without actually explaining the maths behind it, and then only spend a very short bit on the actual comparison algorithm? Surely the matching is actually the interesting bit, as FFT is just an "off-the-shelf" standard algorithm

  • @DJDiskmachine
    @DJDiskmachine 3 years ago

    What's FFT?
    Starts explaining the Shannon Nyquist sampling theorem and aliasing. =D

  • @cantcommute
    @cantcommute 3 years ago

    This guy's great

    • @cantcommute
      @cantcommute 3 years ago

      Though the comments here are even worse than numberphile's somehow...

  • @bisepost
    @bisepost 3 years ago

    Is that Alive by Pearl Jam @14:36?

  • @TheGreatAtario
    @TheGreatAtario 3 years ago +1

    What I'm really curious about is when they can get it from me _humming_ the song

    • @markjwilcox
      @markjwilcox 3 years ago

      You must be much better at humming than I am then. 😉

  • @oddnuts5764
    @oddnuts5764 3 years ago

    I didn't think dot matrix printer paper like that existed any more.

  • @neilAneerGAmAI
    @neilAneerGAmAI 3 years ago

    Great video

  • @matthewwhite546
    @matthewwhite546 1 year ago +1

    This guy sounds like Simon from Bodger and Badger.

    • @domminney
      @domminney 8 months ago

      That’s the best comment I’ve ever read about myself 😂

  • @doougle
    @doougle 1 year ago

    5:17 A little behind the scenes view, eh?

  • @blk_lies
    @blk_lies 3 years ago

    Can you make a video on how IPv6 works? Or is there already one on this channel?

  • @RagHelen
    @RagHelen 3 years ago

    Song recognition was already an integrated service in mobile phones years before the smart phone. You had to call a number, though.

    • @SproutyPottedPlant
      @SproutyPottedPlant 3 years ago +1

      And the service was called Shazam!

    • @RagHelen
      @RagHelen 3 years ago

      @@SproutyPottedPlant No. Certainly not.

  • @MrSigmaSharp
    @MrSigmaSharp 3 years ago

    I guess this is one of the rare actually useful (but admittedly a bit complex) Computerphile videos of all time

  • @joinedupjon
    @joinedupjon 3 years ago +3

    I like this guy, but it seems like he took a long time explaining that an FFT turns audio into a spectrogram before showing us a spectrogram of some audio.

    • @JohnSmith-hn6kv
      @JohnSmith-hn6kv 3 years ago

      Yeah I skipped that bit.

    • @joinedupjon
      @joinedupjon 3 years ago

      @Ollie Saer yeah and tbh covid is probably disrupting video making and we're lucky to be getting any computerphile to cheer us up at the moment - hope everyone realises I'm not trying to be mean.