
How does Shazam work? (2022)

mmaunder
49 replies
19h26m

This was the smart approach when Shazam launched in 2008. I would have done exactly the same thing - gone straight to developing a method to turn every song into a hash as computationally efficiently as possible. If you launched this today the default R&D approach would be to train a model which may turn out to be far less efficient and more expensive to host. It feels like the kind of thing a model might be good at, but given that there are a finite number of songs, taking a hash-based approach is probably way more performant.

crazygringo
16 replies
17h50m

> to turn every song into a hash

Just to be clear, it's not turning each song into a hash.

It's turning each song into many hundreds (thousands?) of hashes.

And then you're looking for the greatest number of mostly-consecutive matches of tens (or low hundreds) of hashes from your shorter sample.

Also, I don't think this would be done with training a model today, because you're adding many, many new songs each day, that would necessitate constant retraining. Hashes are still going to be the superior approach, not just for efficiency but for robustness generally.
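
A minimal sketch of that matching step, assuming the fingerprints have already been reduced to (hash, time) pairs; the helper names are hypothetical, not Shazam's actual code:

```python
from collections import defaultdict

# Hypothetical fingerprints: lists of (hash_value, time_in_seconds) pairs
# produced by some spectrogram peak-pair hashing step (not shown here).

def build_index(songs):
    """songs: dict of song_id -> list of (hash, t) pairs."""
    index = defaultdict(list)
    for song_id, pairs in songs.items():
        for h, t in pairs:
            index[h].append((song_id, t))
    return index

def best_match(index, query_pairs):
    """Vote on (song_id, time_offset); a true match shows up as many query
    hashes agreeing on the same offset into the same song."""
    votes = defaultdict(int)
    for h, t_query in query_pairs:
        for song_id, t_song in index.get(h, []):
            offset = round(t_song - t_query, 1)  # quantize so near-equal offsets bin together
            votes[(song_id, offset)] += 1
    if not votes:
        return None
    (song_id, offset), count = max(votes.items(), key=lambda kv: kv[1])
    return song_id, offset, count
```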

casualscience
6 replies
13h18m

I'm an MLE, I would probably chop the songs into short segments, add noise (particularly trying to layer in people talking, room noise, and apply frequency-based filtering), and create a dataset like that. Then I would create contrastive embeddings with a hinge loss with a convnet on the spectrogram.

Ultimately this looks the same, but the "hashes" come from a convnet now. But you still are doing some nearest neighbor thing to actually choose the best match.

I imagine this is what 90% of MLEs would do; not sure if it would work better or worse than what Shazam did. Prior to knowing Shazam works, I might think this is a pretty hard problem; knowing Shazam works, I am very confident the approach above would be competitive.
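
A toy sketch of that recipe (PyTorch-style, with made-up sizes; an illustration of the idea, not any production system): a small convnet embeds log-mel spectrogram patches, and a hinge/triplet loss pulls a clean segment and its noisy augmentation together while pushing a different segment away.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramEncoder(nn.Module):
    """Tiny convnet mapping a (1, 128, 128) log-mel patch to a unit embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, dim)

    def forward(self, x):
        z = self.conv(x).flatten(1)
        return F.normalize(self.fc(z), dim=1)  # unit-length embeddings

def hinge_triplet_loss(anchor, positive, negative, margin=0.2):
    """Anchor = clean segment, positive = noisy version of the same segment,
    negative = a different segment. Hinge on the cosine-distance gap."""
    d_pos = 1 - (anchor * positive).sum(dim=1)
    d_neg = 1 - (anchor * negative).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

# Usage sketch:
#   enc = SpectrogramEncoder()
#   loss = hinge_triplet_loss(enc(clean), enc(noisy), enc(other))
#   loss.backward(); optimizer.step()
```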

osrec
5 replies
9h19m

Why add noise to the training set, rather than attempt to denoise the input?

wiz21c
0 replies
8h46m

To start from an original song and move it towards something that resembles a real-life recording? IOW: make the NN learn to distinguish between the song's sound and its environment?

sdenton4
0 replies
7h40m

So you want a location-sensitive hash, or embedding, and you want it to be noise resistant.

The ml approach is to define a family of data augmentations A, and a network N, such that for some augmentation f, we have N(f(x)) ~= N(x). Then we learn the weights of N, and on real data have N(x')~=N(x).

The denoising approach is to define a set of denoising algorithms D and hash function H, so that H(D(x'))~=H(x). This largely relies on D(x')~=x, which may have real problems.

So the neural network learns the function we actually need, with the properties we want, whereas the denoiser is designed for a proxy problem.

But that's not all...

Eventually our noise model needs extending (eg, reverb is a problem): the ML approach adds a new set of augmentations to A. This is fine: it's easy to add new augmentations.

But the denoiser might need some real algorithm work, and hope that there's no bad interaction with other parts of the pipeline, or too much additional compute overhead. (And de-reverb is notoriously hard.)
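
To make the augmentation-family idea concrete, here is what a couple of members of A might look like as a numpy/scipy sketch; the parameters are arbitrary and `bar_chatter` / `sr` are placeholders:

```python
import numpy as np
from scipy.signal import butter, lfilter

def add_noise(x, noise, snr_db=10.0):
    """Mix a noise clip into a clean waveform at a target SNR (dB)."""
    noise = noise[:len(x)]
    p_signal = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return x + scale * noise

def lowpass(x, cutoff_hz, sr, order=4):
    """Crude frequency-based filtering, e.g. simulating a muffled phone mic."""
    b, a = butter(order, cutoff_hz / (sr / 2), btype="low")
    return lfilter(b, a, x)

# An augmentation f drawn from A might be:
#   f = lambda x: lowpass(add_noise(x, bar_chatter, snr_db=5), 4000, sr=16000)
# and training enforces N(f(x)) ~= N(x).
```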

omnster
0 replies
8h41m

I'd assume adding noise is done once per song and is thus a bit computationally cheaper than trying to denoise each input.

mk67
0 replies
2h50m

Adding noise is generally helpful for regularization in ML. Most modern deep learning approaches do this in one way or another, most commonly dropout. It improves the generalization capabilities of the model.

bottled_poe
0 replies
8h12m

Because then you’re training it on data that is more similar to the operating environment for the application. It’s a better fit for purpose. If the target environment was a clean audio signal, you’d optimise for that instead.

teeray
3 replies
12h17m

Depends what the model targets. If I gave this problem to a bunch of musicians, they’d be pulling out features like the key, tempo, meter, chord progressions, any distinctive riffs or basslines, etc. Those are the things hashes could be built from and would be more information-dense than samples of the particular recording.

Using a model to deconstruct a song like that might enable the ability to recognize someone playing the opening bars of Mr. Brightside on a piano in a loud bar as well as its drunkest patrons.

namtab00
1 replies
8h36m

you couldn't recognize anything ambient like, let's say, Loscil

sdenton4
0 replies
7h53m

Hard times for Indian classical music, as well...

gowld
0 replies
2h3m

"Features" was Pandora's original design (Music Genome Project). IIRC you can still see it in the UI in how they describe songs.

brainfog
2 replies
16h20m

> that would necessitate constant retraining

You wouldn't necessarily need to retrain that frequently. If your model outputs hashes / vectors that can be used for searching, you just need to run inference on your new data as it comes in.
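
A sketch of that workflow, assuming a frozen embedding model; brute-force cosine search stands in for whatever approximate-nearest-neighbour index you would actually use, and the names are illustrative:

```python
import numpy as np

class EmbeddingIndex:
    """Append-only store of unit-length song-segment embeddings."""
    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.labels = []

    def add(self, embedding, label):
        # New releases only need a forward pass of the frozen model plus this
        # insert; no retraining of the model itself.
        v = embedding / np.linalg.norm(embedding)
        self.vectors = np.vstack([self.vectors, v.astype(np.float32)])
        self.labels.append(label)

    def query(self, embedding, k=5):
        q = embedding / np.linalg.norm(embedding)
        sims = self.vectors @ q              # cosine similarity
        top = np.argsort(-sims)[:k]
        return [(self.labels[i], float(sims[i])) for i in top]
```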

Legend2440
1 replies
15h56m

Definitely this, embeddings would be the modern approach.

raylad
0 replies
10h52m

The "modern" approach sounds like it would be a lot worse than this approach both in terms of input and runtime performance.

Trendy ("modern") is not necessarily better.

lysecret
0 replies
10h57m

You would just rename hashes to embeddings.

iamflimflam1
0 replies
12h50m

You would probably take the approach that is used for face recognition. You train a model that can tell if two faces are the “same”. Then you match an unknown face against your database of faces. You can get clever and pull out the “encoding” of a face from the trained model.

simonw
8 replies
18h1m

If you trained a model for this, how would you avoid having to run the entire training process again every time you needed to add another song?

I wonder if there's a way to build an embeddings model for this kind of thing, such that you can calculate an embedding vector for each new song without needing to fully retrain.

koolba
6 replies
16h43m

If the model is successful, it will be able to predict the artist and song title for music it’s never heard.

zer00eyz
3 replies
16h5m

No,

People who are highly skilled at this can be easily stumped. Sure, it might work for artists who are more focused (Taylor Swift), and it might pick out some interesting guest appearances (Eddie Van Halen on Beat It), but when you get multi-talented performers who change everything about what they do, they don't fit a "model". The most current example would be Andre3000's latest release.

anon84873628
1 replies
10h1m

Um, yeah, you won't be able to model artists who don't follow a model (especially when they deliberately don't). As you say, that is true of humans and computers alike. But it's not the problem anyone cares about, and not what the parent comment intended.

Certainly a well-trained model will be able to achieve incredible accuracy with vocals alone. It will be able to identify Lady Gaga regardless of whether she is singing a new art-pop track or an old standard with Tony Bennett.

zer00eyz
0 replies
9h28m

Top 40, pop != music

We could have a debate about the consistency of Gaga or Taylor Swift and the profit motive (and we could go all the way back to composers of the classical period with this).

What about all the people who back pop artists? I don't think picking out the Wrecking Crew is gonna be possible (it might be, but it's harder) https://en.wikipedia.org/wiki/The_Wrecking_Crew_(music)

I could also point you to Diplo, who as a "producer" is responsible for diverse sounds with his name directly on them, and then side projects like Major Lazer or M.I.A.'s Paper Planes that have his hallmarks but aren't "musically" linked. How about the collected work of Richard D. James? I'm not so sure that all the parts fit together outside the whole of them.

Stewart Copeland was the drummer for The Police, a very distinct and POP sound. Are we going to be able to use ML to take those works and correlate them to his film scores? How about his opera? Dave Grohl, Phil Collins, Sheila E. - more drummers who became singers; what is the context for ML finding those connections (or people)?

John Cage's 4'33" is gonna be an interesting dilemma.

Do you think the player-piano Black Hole Sun and C.R.E.A.M. covers from Westworld would be picked up as stylized choices by Ramin Djawadi, and would it link those to the soundtrack of Game of Thrones?

Even with all the details it's sometimes hard to believe what talented people can do and how diverse their output can be!

ravetcofx
0 replies
12h11m

Given our current trajectory this will probably be possible in 10 to 15 years

bohadi
0 replies
15h31m

this is a rather funny joke

but if it is not, that would be extremely impressive! Determinism/free will reduces to Shazam!?

what's the training data to predict new song titles? heh

check out this reply from Claude 2:

predict the next 3 new song titles from artist Taylor Swift

1. Last Dance with You - A reflective ballad about finding closure after a breakup.

2. Never Getting Back Together - A pop tune emphasizing that the same mistakes won't be made twice in a relationship.

3. 22 Was My Prime - A lighthearted look back on her early 20s as carefree years that can't be replicated.

...

TerrifiedMouse
0 replies
15h44m

Shazam is used to identify the artist and name of songs. I don’t want it guessing when precise information is available.

jscob
0 replies
17h51m

You'd just have the network generate fingerprints for any given song, similar to how facial recognition is done.

Siamese networks are what you want: two identical sets of layers (one cached in this case) which act as the fingerprints; then the final layers do the similarity matching.
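
Roughly what that architecture looks like, sketched in PyTorch; the branch, sizes, and head are placeholders, not a description of any real deployment:

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """The shared branch: identical weights are used for both inputs."""
    def __init__(self, n_mels=64, frames=96, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_mels * frames, 256), nn.ReLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x):          # x: (batch, n_mels, frames) spectrogram patch
        return self.net(x)         # fingerprint embedding

class SiameseMatcher(nn.Module):
    """Two identical branches produce fingerprints; a small head scores
    whether two fingerprints come from the same recording."""
    def __init__(self, dim=128):
        super().__init__()
        self.branch = Branch(dim=dim)
        self.head = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, query, reference_fingerprint):
        # In production the reference fingerprints are precomputed ("cached");
        # only the query branch runs at lookup time.
        q = self.branch(query)
        return torch.sigmoid(self.head(torch.cat([q, reference_fingerprint], dim=1)))
```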

mattl
5 replies
13h16m

2008? I remember using it in the UK in 2000/2001?

wruza
2 replies
9h54m

Was it iPhone or Android?

tomduncalf
0 replies
9h49m

It was a phone number you called, which then SMSed you the result

bloqs
0 replies
9h18m

It was originally an SMS service

aidenn0
1 replies
7h56m

Pretty sure it launched in 2002 for 60p per song.

mattl
0 replies
5h47m

I don’t remember it being something you paid for.

FirmwareBurner
5 replies
17h56m

>This was the smart approach when Shazam launched in 2008.

A noteworthy mention: Sony's TrackID most likely did the same thing on their feature phones a few years before Shazam.

bartonsurfer1
4 replies
16h39m

Hi - This is Chris Barton (founder of Shazam). Sony's TrackID was built by licensing (and later buying) a technology invented by Philips. That tech was invented after Shazam. Shazam was the first to create an algorithm that identifies recorded music with background noise in a highly scaled fashion. My co-founder, Avery Wang, invented the algorithm in 2000. Chris (www.chrisjbarton.com)

loxias
3 replies
13h24m

Cough Cough, Geoff Schmidt, Matthew Belmonte, Tuneprint?

Edit: tho for sure, the Philips algorithm was better than either of ours.

aidenn0
2 replies
8h1m

Tuneprint was published in 2004, no? Shazam filed their patents in 2000 and launched in 2002.

loxias
1 replies
7h3m

No no no, Tuneprint was well before that. By 2004 we were LONG gone. Shazam didn't show up until I think years later.

And I might be confusing them with another group but I thought, at the time, they were doing some goofy hash of the highest energy Fourier components -- a source of entertainment in our office. ;-)

I think Geoff had the vision and algorithm from the 90s as part of an ISEF project (!?). We had funding in 2001, when we got the real-world go-to-your-car-and-get-a-CD-and-then-we-identify-it demo working... using the audio signal alone.

With a corpus of hundreds of thousands of songs. Positive match in less than 2 seconds.

Sadly, in 2001 there was no market for such whizbang amazing tech.

swores
0 replies
6h22m

> Sadly, in 2001 there was no market for such whizbang amazing tech.

Shazam only launched one year after that; maybe the problem was in the marketing, not the market itself?

guyomes
2 replies
8h56m

The smart approach in 1975 was to use Parsons code, which was also turning songs into hashes, computable in your head. You could then find your song back as simply as looking up a word in a dictionary. Hopefully this idea won't die any time soon.

[1]: https://en.wikipedia.org/wiki/Parsons_code
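
The encoding is simple enough to write down in a few lines (a toy version, with pitches given as MIDI note numbers):

```python
def parsons_code(pitches):
    """Encode a melody as Parsons code: '*' for the start, then U/D/R for
    up/down/repeat relative to the previous note."""
    code = "*"
    for prev, cur in zip(pitches, pitches[1:]):
        code += "U" if cur > prev else "D" if cur < prev else "R"
    return code

# Opening of "Twinkle, Twinkle, Little Star" (C C G G A A G):
print(parsons_code([60, 60, 67, 67, 69, 69, 67]))  # -> *RURURD
```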

huhtenberg
0 replies
6h22m

I'd guess that this won't work well for EDM (electronic dance music) tracks.

aidenn0
0 replies
8h8m

That requires identifying the melody, which is certainly not something all humans can do, and was probably not generally doable by a machine in 1975. It also throws away a huge amount of information, and requires starting from the beginning of the melody.

dylan604
2 replies
17h34m

I could totally see the RIAA shutting it down or ASCAP issuing a licensing invoice to stop the train in its tracks, just because that's how they think.

a2dam
1 replies
12h37m

What makes you say this? They existed and there were a ton of similar products as well. None of them got any grief from those organizations.

dylan604
0 replies
12h20m

None of the companies making the previous hashing apps had the money that AI companies do. Around here, the phrase "fish in a barrel" might be used to describe the situation.

aidenn0
1 replies
7h57m

> This was the smart approach when Shazam launched in 2008.

Nitpick, but Shazam launched in 2002 as a dial-in service that replied with a text-message of the result. The first phone app was for BREW in 2006.

The 2008 date is just when Apple launched the app store; it was not possible for a third party to make an iPhone app before 2008.

justusthane
0 replies
4h27m

That’s more than a nitpick, that’s incredible! It still feels a bit magic today; I can’t imagine how magic that seemed in 2002!

ixbt77
0 replies
2h35m

This is true; they even tried to treat it as a speech recognition (ASR) problem in 2012: https://research.google/pubs/pub37754/

DeathArrow
0 replies
7h36m

Well, under the hood a neural net kind of builds hashes too, just less accurate ones.

AntonioCao
0 replies
17h6m

tbh for tools like Shazam there's no fundamental difference between a database + hashing algorithm and a self-supervised model; both are great indexing & compression solutions, just for different scales of data.

madrox
25 replies
19h14m

Shazam is one of those rare products that hasn't stopped feeling magical in two decades. It's really the kind of thing technologists ought to aspire to.

wruza
8 replies
9h47m

At the same time, it turned from "tap to listen, here you go" into sluggish-af, ad-infested bloatware. I remember when I stopped using it and deleted the app because my prev-gen iPhone couldn't load it in time anyway.

wodenokoto
2 replies
7h58m

What are you on about? There’s no ads in Shazam. You just open the app and it starts searching. No need for even tapping a button.

wruza
0 replies
7h24m

On about a decade ago.

Shrezzing
0 replies
6h27m

> There’s no ads in Shazam

I'd argue that Shazam doesn't have ads, rather it is an ad. You search for the song, then see links to buy it in Apple Music. You'll also see "subscribe to Apple Music" type widgets on just about every screen on the app.

ahartmetz
1 replies
6h35m

I actually went back to Shazam from SoundHound (on Android). Yeeears back, SoundHound had some advantage, I forget which. Now, Shazam starts in a second (just tried it, never used since last reboot) and SoundHound in like 15 (from memory). Inexcusable in general and for that kind of app in particular.

roelschroeven
0 replies
4h34m

I seem to remember Soundhound claimed (and maybe still claims) to be able to work not only on recorded music, but when you whistle/hum/sing a song. I never had much luck with that, but that could be because of my lack of musical talent.

What Soundhound does these days that Shazam doesn't (I think; I haven't actually tried Shazam in a long time) is that it displays lyrics for many songs, and is often able to synchronize those lyrics with where you are in the song.

jedrek
0 replies
8h34m

Right now it's integrated with Siri, so you just hold down your power button and say, "what is this song?" and it tells you.

hbs18
0 replies
3h53m

On iPhones you can add a Shazam-powered music recognition quick toggle which acts just as you described.

gniv
0 replies
9h9m

These days you don't need the app. You can put it in the control center and just push the button to listen.

bobbylarrybobby
7 replies
18h23m

If anything it's gotten even more magical. I was blown away when I tried to find the song someone was singing on America’s Got Talent and the result it returned was the singer on AGT (they index tv shows!?).

pavlov
3 replies
18h3m

> “…they index tv shows!?”

It makes more sense if you think of a production like AGT less as the reality show it pretends to be, and more as a promotional reel for labels.

Of course the content they choose to promote is indexed.

dbtc
1 replies
16h36m

Amazing how much the name we use can change the thing we see.

quickthrower2
0 replies
8h22m

You could also call it a music factory, or indeed, spec work.

vrosas
0 replies
5h24m

Your TV is actually actively "shazaming" everything you watch. That's one way ads are attributed to you. TV manufacturers sell that data; that's one of the reasons TVs are so cheap now - the ad view data is the real money.

al_borland
1 replies
18h14m

I haven’t seen a commercial in quite a long time, but for a while many years ago, Shazam was being used like an audio QR code. Commercials would tell you to use Shazam on their ad to get a deal or something. My guess is after Apple bought Shazam they stopped needing to do stuff like that to monetize.

cool_scatter
0 replies
17h9m

Oh wow, I forgot all about that. That was a strange trend.

papichulo4
0 replies
2h37m

Ah, I thought they did this to know what people are watching. Yes... see here:

Alphonso's software uses the same technology that Shazam and similar services employ to automatically detect the song you're listening to. It samples small bits of audio, creating a digital "fingerprint" of it, and comparing it against a database on their server to identify the show or movie. In fact, Alphonso's CEO says they have a deal with Shazam, and use their specific technology to do this. But this embedded software can be listening even when your phone's screen is turned off and it's ostensibly idle.

TillE
3 replies
17h53m

For the technically knowledgeable, music fingerprinting is a concrete problem which is understandable but pretty difficult if you get into the details without looking at how other people have already solved it.

It fits into that space of an unusual but comprehensible problem, unlike superficially similar features like recognizing animals or objects in images, which is mostly weird ML magic.

loxias
1 replies
13h33m

It was really hard to do the first time. :) I'm honored to have been part of the first team to do any viable acoustic music recognition, in 2001 (much earlier than Shazam, a point of pride of course[0]).

You're dead on that it's pretty difficult if you don't benefit from others; we did a ton of work that in retrospect wasn't necessary. I liked the advanced psychoacoustic model, faithfully implemented in highly performant C straight from Zwicker (Psychoacoustics). To a first approximation: about 10 frames/s -> psychoacoustic model -> PCA -> top 16 dims -> VQ, and the resulting bytes contain more than 50% of the entropy (!!). Shove all of those in a home-grown what-you'd-now-call-a vector DB, do dozens of range queries, and search for any song common to multiple results. Boom, music recognition. Understandable in retrospect, but things like that aren't Everest; they're like... multiple unclimbed mountains.

0. And far too early to have any applications. Company existed 2000-2001 \o/
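
Not the actual Tuneprint code, but the shape of the pipeline described above reads roughly like this, with sklearn stand-ins for the hand-rolled psychoacoustic model, PCA, and VQ:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# frames: (n_frames, n_features) perceptual features, roughly 10 per second of audio.
def fit_codebook(frames, dims=16, codewords=256):
    pca = PCA(n_components=dims).fit(frames)
    vq = KMeans(n_clusters=codewords, n_init=10).fit(pca.transform(frames))
    return pca, vq

def fingerprint(frames, pca, vq):
    """One byte-sized code per frame; a song becomes a short code sequence."""
    return vq.predict(pca.transform(frames)).astype(np.uint8)

# Lookup: run the query's codes as range queries against an index of all songs'
# code sequences and report any song common to several of the hits.
```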

garrettgrimsley
0 replies
13h12m

Looks like others, including Shazam, beat you to the punch in 2000:

https://patents.google.com/patent/US7853664B1/

https://patents.google.com/patent/US6941275

Very interesting hearing about all of the differing approaches people have taken to solving this problem! Do you have further writings on this topic?

hunter2_
0 replies
15h49m

I would say animals in images is more akin to matching two different musical performances of the same song (where one of the two can even be the user humming), which Shazam doesn't offer but some systems like Google Assistant do!

Rather, matching two recordings of the exact same performance (one ingested by Shazam at training time and one ingested by Shazam at run time) is more akin to identifying individuals (facial recognition) than identifying species.

Moldoteck
2 replies
7h7m

Google raised it to another level with the Now Playing feature: it constantly detects songs and registers them in a history log, and you can also search for songs in Google Assistant by just humming (not reliable, but sometimes it nails it).

aeyes
1 replies
6h59m

Shazam has the same feature. I went to parties almost 10 years ago with my phone in my pocket and Shazam in background detection mode. The result was a 90% complete playlist.

Moldoteck
0 replies
6h43m

nice, didn't know about this

gumballindie
0 replies
15h25m

Oh, technologists do. But what will product managers do if not constantly break the product to get a bonus and an extra holiday?

hideo
14 replies
19h17m

Why are there so few Shazam alternatives? Does it have something to do with licensing perhaps? The algorithm itself is fascinating but I don't get why this space seems to have just one player - i.e. Shazam

jdadj
4 replies
19h15m

SoundHound is a credible alternative.

klipt
3 replies
19h3m

SoundHound is actually much more impressive because it can recognize hummed or whistled songs too!

esafak
2 replies
18h54m

Google search in Android can do that too. Albeit not very well.

2c2c2c
1 replies
18h35m

The Pixel series since the 6 has an option to passively listen and document songs by default. Probably the only feature I miss after moving to iPhone.

jiberius
0 replies
17h50m

It dates all the way back to the Pixel 2 from 2017, still feels like a magical feature though.

I think the historical log of songs that the phone heard launched a little later, on the Pixel 3.

tialaramex
2 replies
19h8m

Where's the value? My Android phone just does this locally. Obviously Shazam has more storage, so they're going to handle more obscure stuff that way, but for example I just set my "Power of Love" playlist running, and the Pixel's built-in "Now Playing" knows both the Frankie Goes To Hollywood track and the Huey Lewis number from Back to the Future.

When a "phone" was a dumb device just barely capable of implementing GSM and displaying a clock, this might have been worth something as a business. But given where the $0 baseline is, I don't see enough margin to justify competition; I'm surprised even Shazam still makes commercial sense.

TerrifiedMouse
1 replies
15h38m

> I'm surprised even Shazam still makes commercial sense.

Isn’t Shazam owned by Apple now? It doesn’t need to make “financial sense” if it’s a service Apple runs.

Racing0461
0 replies
12h18m

I disagree. Apple isn't like Google/FB, where they take a loss on a service and make it back on the backend with ads. Apple seems to evaluate each service as if it were a physical product, i.e. it costs X and each service needs to stand on its own and make X + Y back.

willseth
0 replies
16h15m

They have a patent, which IIRC should be expiring fairly soon (the original patent anyway)

pxeger1
0 replies
19h11m

It definitely has more than one player. Google Assistant has had this ability for a while, for example. But Shazam has the advantage of being built in to iOS, which might be why you think it’s the only player

jldugger
0 replies
17h1m

My understanding is that there is a patent holder https://en.wikipedia.org/wiki/Shazam_(application)#Patent_in... in the US.

forgotpwd16
0 replies
7h20m

Besides potential license issues (which may not exist), legally creating the hash database is a big effort, as access to a near-all-encompassing song library is required.

al_borland
0 replies
18h11m

I was a SoundHound user for a long time. It came out around the same time as Shazam. Shazam had all the brand recognition, but I typically went for the underdog.

Recently, in an effort to simplify things, I moved over to Shazam. It's owned by Apple now, so it's already built into the iPhone, even without the app. The app allows for saving things a bit more easily, and I find it to be a lot cleaner than the SoundHound app.

Thaxll
0 replies
16h13m

I remember a while back (10 years?) on HN someone did a PoC (with code) of a Shazam-like service and he was sued or asked to remove everything.

jimmySixDOF
7 replies
19h9m

For what it's worth, there is a phenomenal site that applies algorithmic matching not to songs but to genre classification and the branching sub-genres that new song signatures introduce. An amazing resource run as a solo side hustle, and it looks like it is at risk of getting clipped due to hosting issues or something. There was the Music Genome Project from Pandora and something similar on Last.fm a long time ago, but this site is like the visual connectome of all human music produced through to 2023 and would be a loss for the World Wide Web if it stops.....

Every Noise At Once https://everynoise.com

bobsmooth
2 replies
19h4m

Oh damn, looks like the creator was part of Spotify's recent layoffs. He was a genre researcher while he was there.

simonklitj
0 replies
18h2m

What a loss for Spotify.

km3r
0 replies
16h56m

The guy who helped create Wrapped too... sad.

collegeburner
1 replies
17h55m

related, Maroofy: https://maroofy.com/

shows you similar songs and imo does a pretty good job of it

omneity
0 replies
10h34m

I use Maroofy quite a lot and it was funny to discover many ripoffs and plagiarism between songs by complete chance.

irrational
0 replies
18h54m

Okay, that is super impressive. Especially what it does when you search for an artist.

dang
0 replies
17h55m

Related:

Every Noise at Once - https://news.ycombinator.com/item?id=26668426 - April 2021 (94 comments)

Every Noise at Once - https://news.ycombinator.com/item?id=20585447 - Aug 2019 (82 comments)

Every Noise at Once – an algorithmically-generated scatter-plot of musical genre - https://news.ycombinator.com/item?id=10269685 - Sept 2015 (23 comments)

An algorithmically-generated scatter-plot of musical genres - with samples - https://news.ycombinator.com/item?id=9315499 - April 2015 (3 comments)

snoopsnopp
4 replies
18h2m

I don’t want to be combative, but this has simply never worked for me. No matter what I do Shazam has produced incorrect results. I wonder if I’m the only one.

yamazakiwi
0 replies
17h39m

Not necessarily the whole of your issue, I'm sure, but genre plays a large part in discovery. I find it has a hard time with less popular electronic and SoundCloud-type beats, for instance.

xcv123
0 replies
16h35m

Could be poor sound quality from your phone microphone

underlipton
0 replies
14h28m

It's not just you. It's super hit-or-miss for me. Do you have an Android phone? I've read complaints that Shazam's accuracy dropped for them after Apple bought it.

crazygringo
0 replies
17h48m

That is very strange. Either you've got some kind of network bug or problem with your microphone, or you're looking up music so obscure it's not on Shazam. Or the music is just way too quiet, especially if there's a lot of other noise.

ruuda
4 replies
19h34m

There is also Chromaprint [1], which works slightly differently. It’s based on pitch change patterns instead of maxima in the spectrum. Chromaprint is used by AcoustID, which is a large open database that links audio fingerprints to MusicBrainz recordings. I find it astonishing how much music is in there despite having not nearly as much commercial backing as Shazam.

[1]: https://oxygene.sk/2011/01/how-does-chromaprint-work/

willseth
3 replies
16h16m

Doesn't Chromaprint have to compare the whole song? This is great for detecting duplicates, but Shazam's fingerprint design allows it to match a short snippet to the complete song.

ks2048
1 replies
8h48m

Chromaprint computes “features” roughly 8 times per second. You can do a brute-force search checking different times in a song or potentially do some more fancy indexing once you have the features. (I did some experiments with Chromaprint - described here, https://kenschutte.com/phingerprint/)
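
The brute-force variant is easy to sketch: Chromaprint sub-fingerprints are 32-bit integers, so a candidate alignment can be scored by bit-error rate. Illustrative code only, not the AcoustID matcher; inputs are assumed to be np.uint32 arrays:

```python
import numpy as np

def bit_errors(a, b):
    """Total differing bits between two equal-length uint32 arrays."""
    x = np.bitwise_xor(a, b)
    return int(np.unpackbits(x.view(np.uint8)).sum())

def best_offset(song_fp, query_fp):
    """Slide the query over the song; return (offset, bit-error rate)."""
    n, m = len(song_fp), len(query_fp)
    best = (None, 1.0)
    for off in range(n - m + 1):
        ber = bit_errors(song_fp[off:off + m], query_fp) / (32.0 * m)
        if ber < best[1]:
            best = (off, ber)
    return best  # a low bit-error rate at some offset suggests a match
```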

willseth
0 replies
39m

Cool experiment. I suspect your version would be far more permissive in matching than Shazam, which makes sense for your test case. Shazam's fingerprints are a lot more specific, e.g. they would differentiate different mixes of the same recording, potentially even different masters.

ruuda
0 replies
10h18m

I think it's possible to match on a subset in principle, but I don't know if/how this is implemented.

pbj1968
2 replies
15h51m

One of my greatest free joys in life is subjecting Shazam to the unpopular music I like and watching it be unable to identify it.

(Screaming electro industrial for those that are curious)

underlipton
1 replies
14h33m

My hit rate for those 15-to-30-second snippets they play at the top of the hour on NPR is something like 10%. It's been getting worse and worse over the years.

pbj1968
0 replies
14h0m

I used to do the same back when I commuted and when it DOES identify it, it tends to be some weird, weird stuff.

vkaku
1 replies
20h0m

Thank you for sharing.

This is a great post that captures what a spectrogram does, and a must-read for people who want to understand how audio fingerprinting works.

There are similar approximate algorithms available for other media as well, so anyone who wishes to understand real-world hashing should take the time to study this article.

innagadadavida
0 replies
19h57m

The normal spectrogram technique was already invented by Philips prior to Shazam. What Shazam did was hash things combinatorially to reduce false positives.

totetsu
0 replies
17h23m

I remember hearing about this technique being developed by a guy from my NZ university years before there was ever a commercial product called Shazam. Maybe I even heard an interview with him on the radio. But I've never heard of him again...

svilen_dobrev
0 replies
7h42m

Does anyone have an idea whether Shazam copes with the time axis not being linear/constant?

Think tapes, wow-and-flutter, speed-up-then-down-all-the-time..

AFAIK fingerprinting is highly time-sensitive (unless cut into ~50ms pieces.. and still not quite).

Last time I looked, the general technique for that - Dynamic Time Warping? - was prohibitively compute-expensive...
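
For reference, the textbook DTW recurrence is only a few lines; the expense comes from the O(n·m) cost per comparison (toy version on 1-D feature sequences):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Comparing a query against every song at every offset this way is what makes
# the approach so expensive without heavy pruning or indexing.
```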

seydor
0 replies
19h1m

Shazam isn't great; there are so many tracks it can't identify, or it comes up with random electro-trance tracks. Online services like a-ha are better.

rvba
0 replies
7h1m

What are the legal aspects of building a service like this?

New music is created every day. Can Shazam just buy MP3s at "normal prices" (non-commercial) and use them commercially?

Also, if they buy music to encode its signature, wouldn't that be a very big part of their running costs?

What if someone makes a track and asks 100k USD for it? Will Shazam recognize it? I doubt they want to pay some ridiculous money just to recognize something.

It's like those AI companies that seem to use books for training. Did they pay for those books?

reqo
0 replies
17h59m

Shazam is perhaps the most profitable music informatics technology of all time, yet it knows nothing about music at all! It’s basically just a fast hashing algorithm!

nyc111
0 replies
5h33m

I never used Shazam. What is its success rate? Does it guess or give options?

maxehmookau
0 replies
8h0m

There used to be an open-source version of this called Echoprint which was shuttered after the company was bought by Spotify in 2012.

Much the same technologies though and an interesting research project: https://www.echoprint.me/

lstodd
0 replies
16h26m

Eh, the memories.

Once upon a time, back in Russia, we had a service that captured most FM radio stations and detected what was playing in real time as you listened, from Moscow to some obscure station 2000 km away. Sadly it was all trampled by copyright idiots by about 2013.

But the technology, the capture boxes which were SDRs before SDRs were a thing, hashes, station bosses calling in the night because DJs got too drunk and went rogue live.. oh the memories..

logbiscuitswave
0 replies
9h5m

This was a fascinating read, not only for understanding how Shazam works, which is something I've long been curious about, but also as a great primer on digital signal processing.

kinj28
0 replies
2h23m

I always knew this had something to do with going from the time domain to the frequency domain. And I used this question to interview engineers and see how creatively they would think. Glad to see this post.

joshuahaglund
0 replies
13h35m

I'm curious to hear a constellation map played back. I imagine it'd be a bizarre robotic midi sorta representation of the original

hoherd
0 replies
19h28m

If this interests you, then take a look at sCrAmBlEd?HaCkZ!, a music software project from 2006 that uses similar classifying techniques.

https://youtu.be/eRlhKaxcKpA

forgotpwd16
0 replies
7h30m

Was at a live event a few days ago and was wondering whether an attempt at recognizing live/noisy songs has been made. Shazam has failed me every time I tried it for this.

foobiekr
0 replies
14h2m

The algorithm is interesting - at one time I was going to use https://theory.stanford.edu/~aiken/publications/papers/sigmo... to do a startup.

This is different but very similar and contemporaneous.

financypants
0 replies
14h55m

I'm really interested in an implementation of this that adjusts for something like pitch. So it could tell when someone is mimicking a speaker, and could even grade them on how well they did. Or even a celebrity voice matcher that tells you which celebrity you sound most like. Does this exist?

drewmol
0 replies
15h39m

I'd sure like a Shazam for ads with mute capability - something I've been thinking about building for some time now. I just don't see any road to profitability that avoids getting adblock-blocked.

dang
0 replies
19h22m

Related. Others?

How Shazam Works (2003 Paper) - https://news.ycombinator.com/item?id=33299853 - Oct 2022 (1 comment)

Creating Shazam in Java (2010) - https://news.ycombinator.com/item?id=32530056 - Aug 2022 (36 comments)

Shazam turns 20 - https://news.ycombinator.com/item?id=32520593 - Aug 2022 (227 comments)

How Shazam Works (2015) - https://news.ycombinator.com/item?id=23806142 - July 2020 (7 comments)

Designing an audio adblocker - https://news.ycombinator.com/item?id=18855029 - Jan 2019 (186 comments)

Show HN: A radio/podcast adblocker featuring ML and Shazam-like fingerprinting - https://news.ycombinator.com/item?id=18459058 - Nov 2018 (2 comments)

Show HN: Shazam-like acoustic fingerprinting of continuous audio streams - https://news.ycombinator.com/item?id=15809291 - Nov 2017 (76 comments)

How Shazam Works (2015) - https://news.ycombinator.com/item?id=15350729 - Sept 2017 (13 comments)

Tell HN: Shazam picks up song from my kitchen light - https://news.ycombinator.com/item?id=11593305 - April 2016 (2 comments)

How Shazam works - https://news.ycombinator.com/item?id=9870408 - July 2015 (48 comments)

Patent infringement claim re: “Creating Shazam in Java” blogpost (2010) - https://news.ycombinator.com/item?id=9594480 - May 2015 (18 comments)

The Shazam Effect (2014) - https://news.ycombinator.com/item?id=9593429 - May 2015 (37 comments)

The Shazam Effect - https://news.ycombinator.com/item?id=8634357 - Nov 2014 (34 comments)

Ask HN: Is there an audio search technology that finds exact and similar audio? - https://news.ycombinator.com/item?id=8420141 - Oct 2014 (3 comments)

Source code example of the Shazam algorithm - https://news.ycombinator.com/item?id=5724442 - May 2013 (16 comments)

Creating Shazam in Java - https://news.ycombinator.com/item?id=5723863 - May 2013 (43 comments)

An Industrial-Strength Audio Search Algorithm (Shazam) - https://news.ycombinator.com/item?id=2621103 - June 2011 (4 comments)

Shazam's Search for Songs Creates New Music Jobs - https://news.ycombinator.com/item?id=2215295 - Feb 2011 (1 comment)

How does the music-identifying app Shazam work its magic? - https://news.ycombinator.com/item?id=2214992 - Feb 2011 (2 comments)

Implementing Shazam with Java in a weekend - https://news.ycombinator.com/item?id=1702975 - Sept 2010 (23 comments)

Shazam: not magic after all - https://news.ycombinator.com/item?id=909263 - Oct 2009 (28 comments)

How does the music-identifying app Shazam work its magic? - https://news.ycombinator.com/item?id=893353 - Oct 2009 (16 comments)

crazygringo
0 replies
17h41m

I just want to say, it's remarkable how intuitive this is, and just how well it matches our own recognition process.

It's more-or-less identifying melody fragments*, and then just trying to match those up in a sequence. The same way we'll recognize something after 5 or 7 or 10 notes.

I'm pretty sure I've read about other methods for song fingerprinting that rely on things like loudness peaks, where it might work equally well, but that doesn't match how our own brains do it at all. It's pretty cool that this isn't relying on "artifacts" but basically works the same way we do.

* Technically not always melody, but probably is most of the time

bartonsurfer1
0 replies
16h36m

Here is a beautifully produced video by Wall Street Journal that explains Shazam:

https://www.wsj.com/video/series/in-depth-features/how-shaza...

Chris (Shazam co-founder)

alexpotato
0 replies
46m

Many years ago I was trying to build a "shot counter" to measure balls fired per second for a college paintball league.

I ended up using sox + some Perl to be able to:

- filter out only the frequency of the gun cycling

- identify the peak amplitude to know when a shot started

- use a sliding window of about 30ms to figure out when the shot ended

- once the above was done, I could count how many shots per second and output it

I even posted the code which you can see here: https://www.pbnation.com/showthread.php?t=3216349&highlight=...
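
A rough numpy/scipy version of the same idea (the original used sox + Perl; the band, threshold, and window values here are made-up placeholders):

```python
import numpy as np
from scipy.signal import butter, lfilter

def count_shots(x, sr, band=(800, 1200), threshold=0.3, window_ms=30):
    """Band-pass around the gun's cycling frequency, then count threshold
    crossings, treating anything within one window as the same shot."""
    b, a = butter(4, [band[0] / (sr / 2), band[1] / (sr / 2)], btype="band")
    y = np.abs(lfilter(b, a, x))          # filtered signal's envelope (crude)
    window = int(sr * window_ms / 1000)
    shots, i = 0, 0
    while i < len(y):
        if y[i] > threshold:
            shots += 1
            i += window                   # skip ahead: this shot lasts ~one window
        else:
            i += 1
    return shots, shots / (len(x) / sr)   # total shots and shots per second
```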

aftbit
0 replies
17h57m

This is a platform feature of Google Assistant, I believe. I have a "search a song" option on my Pixel, but it doesn't work in airplane mode, so it's probably being done at the mothership.

aftbit
0 replies
18h1m

Thank you for releasing this as open source under the MIT license. I wonder how well it works on human speech. I have a few thousand hours of recorded spoken-word content that I need to deduplicate. If I get around to it, I'll report back.

adhi01
0 replies
14h59m

Talk by the author in EuroPython 2016 on the same topic: https://youtu.be/LZ7THTB88AE?si=BQPgp-rxg32bTPMK

JohnMakin
0 replies
15h54m

Sinbad comes out of a lamp and grants you 3 wishes

Jean-Papoulos
0 replies
9h19m

Well well well, if it isn't our old friend the Fourier transform.

Exoristos
0 replies
10h36m

Not very well.

6stringmerc
0 replies
17h43m

It's the reverse approach to the kind of engineering the pop industry attempts in order to make genre-based hits.