Subtitles section Play video Print subtitles This episode of Real Engineering is brought to you by Brilliant, a problem solving website that teaches you to think like an engineer. Introduction: Opening, scene in a pub listening to a song and opening the shazam app. Maybe be tricky to film. What you just witnessed was the Shazam app recognising a song in a noisy environment, and proceeding to find a match for it among the millions of songs in its servers database. For most this probably seems like a trivial task. Our brains can identify songs incredibly quickly from a young age, but the pathways in your brain that allow you to identify a song quickly are incredibly complex. Often times you simply need to hear just a few chords to know exactly what song is about to play, that jolt of excitement when you can hear a DJ fading in the baseline of your favourite song. A simple combination of tones in a specific order allow you to identify a song from the thousands of other songs you have heard in your life in an instant, but coding a computer do the same thing is an incredible challenge. A computer does not have an intuitive understanding of music. A computer can only compare songs to other songs in its database, looking for a match by comparison. It is a problem akin to finding a needle in a haystack, where you can only find the needle by looking at a picture of a needle and comparing it to each individual straw, comparing it's length and colour until you finally find the needle. To create a software capable of doing this task quickly poses a very interesting coding challenge and the solution the engineers at Shazam came up with gives us some interesting insight into how our own brains work. A study by the Manchester Museum of Science and Industry tested 12,000 people's ability to recognise a song. They created an interactive game to search for the most recognisable songs, where they would play the hook of 1000 best selling songs and recorded the time required to identify them.[1] Can you identify this song with just 2.3 seconds of the hook? That was the Spice Girls song Wannabe, which ranks highest with a recognition time averaging just 2.3 seconds, and that's including the reaction time required to hit the button. Our brains are hardwired for this kind of pattern recognition. In a world where recognising the sound of an approaching threat meant life or death, we have evolved incredibly efficient ways of categorizing and accessing historical data like this. Our brain does not take the sound and compare it to every sound we have ever heard like a computer, the specific combination of chords in progression simply activates specific neurons that unlock that historical data. What if the chords were played by a different instrument? Would we recognise the song as quickly? Those same 2.3 seconds played on a guitar sounds like this. The notes are exactly the same, but they don't sound the exactly the same. We even know intuitively what instrument is playing. Why is that? This is called the timbre of a note and different instruments have different timbres. Pianos and guitars are examples of harmonic instruments and when they produce a note, they aren't just producing a pure note of a single frequency. Each note is a combination of multiple frequencies all related to the base note, the fundamental frequency. These are called overtones, and they are simply multiples of the base frequency. Each instrument has a unique combination and evolution of these overtones that give it that unique sound. Again, it's quite easy for our brains to distinguish between a piano and a guitar, but we need a way to quantify these characteristics for a computer to recognise, and this is where the spectrogram comes in. A spectrogram is a visual representation of sound. It's a 3D graph with time on the x-axis, frequency on the y-axis, and the amplitude of the frequency, or in other words the loudness, on the z-axis, which is often represented by a colour. This 3D graph is something a computer can absolutely recognise and store as data, but there is huge amount of data within a spectrogram like this, and the more data there is the more computation time is required to find a match. So the first step in reducing computation time is reducing the data required to classify a song. Shazam uses something they call a fingerprint, where they transform these spectrograms into something that looks like star map. [2] Here each star represents the strongest frequencies at particular times. Doing this, we have not only reduced our graph from 3 dimensions down to 2, but have drastically reduced the amount of data points on the graph. This is the first vital part of Shazam's technology. Every single song in Shazams database is stored in a fingerprint like this. When you open your phone and hit that Shazam button, the app accesses your microphone and begins to create its own fingerprint of the sound waves it receives. This ingenious method also helps the shazam app to filter out noise because it only creates data points for stand-out frequencies. Once the app has created a fingerprint of your audio, it then sends it to the shazam servers where the recognition part of the process begins. This is where things get difficult. Let's look at a simplified song fingerprint, and a recorded fingerprint to see why. The recorded fingerprint is only a short recording of the song, in our example we have just 3 possible frequencies, and each recorded fingerprint will have just 3 time points. If we want to check the first 3 time points in the song for a match we first check the 3 frequencies, then we move onto the next time point and check the 3 possible frequencies again, and do the same for the final time point. If we find a match, that is 9 operations required to find a match, but obviously that isn't likely. We then need to do those nine operations for every time point in the song, or perhaps every time point in Shazams massive music archive, this obviously is going to take a lot of computation time. This is not how Shazam looks for a match. First Shazam categorises fingerprints in a clever way. We don't search to see if a note exists in a song, we search to see if several notes exist separated by a particular time, just as brain does. This becomes our searchable address for a hash table. Hashes and hash functions are an incredibly useful technique that appear everywhere in computer science. Hash functions can be found in search algorithms used by Google, to make sure files are downloaded correctly, and are the backbone of crypto currencies like bitcoin. [3] A hash function takes a varying length of input and produces a fixed length output, called a hash. In practice, the input can be anything from a short piece of text, like a password, to a long data stream like an entire movie. Consider a library of books. We want to store each book on a shelf so we can find it later, and we know we'll have the title of the book when we're searching for it. We can use a hash function to decide which shelf to put a book on, using the title of the book as the input and producing a shelf number as an output. The first goal of a hash function is to produce outputs that are uniformly distributed...In our library, we want the books to be spread evenly across the shelves, so no shelf in particular will end up full of books, leaving others almost empty. The second goal of a hash function is that it should reduce collisions. A collision is when two different inputs produce the same output hash. In our case, a collision results in two or more books on the same shelf. If our library only has two shelves, collisions will be really common, no matter what hash function you use. If our library had a billion shelves, a good hash function will mean collisions will be rare. Another goal of a hash function is that it should be quick to calculate. If our library has millions of books, we don't want to take too long figuring out which shelf each one needs to be on. A simple hash function might be to take the title of a book, and group them on shelves alphabetically. This would be really quick to calculate, but it would result in a lot of collisions, with many books on the same shelf, and wouldn't be very well distributed. Think about how many book titles begin with the word “THE”, compared to how many book titles start with the letter “Z”. An alternative might be to take the position of each letter in the alphabet and sum up the letters in the book title. We could then divide that number by the number of shelves we have and take the remainder as the shelf number to store the book on. This would still be fairly fast to calculate, and would prevent all the books with titles starting with the word “THE” being stored on the same shelf. Now imagine, instead of book titles, our hash function takes data from our two frequencies separated by particular time as an input, t, and produces a number between 1 and...say 1 billion. First, we go through our database of songs and calculate the hash number for each anchor point. Songs will contain multiple anchor points, which will allow us to categorise short snippets of songs by the frequency of the anchor point, the frequency the following point and the time between them. And just like the library, we store each anchor point in order by the hash. These addresses' are also categorised with song IDs and time stamps within the song in a secondary hash table, allowing us to search for matching songs. This makes it much faster to locate our matches, and to find our song we will require multiple matching anchor points. This ingenious method of song recognition allowed Shazam to be sold for 400 million dollars to Apple, and help you figure out just what that catchy song is. This is a very simplified view of how the programming of Shazam works, but I have linked my research materials below if you would like to read more into the process. Or if you would like to begin learning more about programming and build a solid foundation of understanding, then you could take this course on Computer Science Fundamentals on Brilliant which will start with an intro to algorithms and gradually build you up to more complex ideas like data types and structures, by the end of this course you will have discovered algorithms that can be used to extract answers from data, and when you are finished you can take the follow up course in computer science algorithms. This is just one of many courses on Brilliant, with more courses due to released soon on things like automotive engineering and Python Coding. If I have inspired you and you want to educate yourself, then go to brilliant.org/RealEngineering and sign up for free.And the first 73 people that go to that link will get 20% off the annual Premium subscription. As always thanks for watching and thank you to all my Patreon supporters. If you would like to see more from me the links to my instagram, twitter, subreddit and discord server are below.
B1 US hash shazam fingerprint shelf data frequency How Shazam Works 9 1 joey joey posted on 2021/06/11 More Share Save Report Video vocabulary