This seems really clever -- kudos!
I'm curious how it actually works?
At first I assumed you were comparing the vinyl track to the reference digital track from some streaming service and either analyzing the frequencies with FFT or the timing of peaks.
But I watched your demo video and you don't need to tell it anything about the song.
Which makes me think you're instead doing an FFT and simply checking how well the frequencies align to the equal-tempered scale based on A = 440 Hz.
Which then leads me to three questions:
1) Obviously this wouldn't work if you were playing a standup comedy album or something? Or the drum solo part of a song?
2) Are essentially all albums perfectly in tune with A 440 Hz? For example, with classical music, I understand that 442 is also used, baroque is sometimes at 415, and 432 has been used as well? I don't know about pop, but I can't help but wonder if some artists have intentionally chosen something other than 440 over the decades.
3) I assume this won't work if the turntable speed is off by more than 2.8%? Since the distance from A4 (440 Hz) down to A-flat-4 (415.30 Hz) is a decrease of 5.6%, if the turntable is off by more than half of that you'd be trying to align to the wrong note? (Quick arithmetic check below.)
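For the record, here's the arithmetic behind that 2.8% figure, as a quick Python check:

```python
# One equal-tempered semitone is a factor of 2**(1/12) ~ 1.0595.
semitone = 2 ** (1 / 12)
ab4 = 440 / semitone                # A4 down a semitone -> Ab4
drop = 1 - 1 / semitone             # relative decrease per semitone
print(f"Ab4 = {ab4:.2f} Hz")        # 415.30 Hz
print(f"drop = {drop:.1%}, half = {drop / 2:.1%}")   # 5.6%, 2.8%
```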
You can extract pitch information and compare it to a set of quantized reference pitches, the way an instrument tuning app does, sure.
But you can also extract BPM info and compare that to the set of quantized reference BPMs anyone would plausibly use.
And also, since you’re getting multiple pitches from multiple instruments in a time series, if you can isolate particular instruments, you can calculate the key the melody and/or harmony of the song is in.
Then, you could either come up with a heuristic, or just train a Bayesian filter, using datasets of “real” and “erroneous” (key, BPM) pairs.
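For the pitch half of that, a minimal sketch (assuming you already have fundamental-frequency estimates from some pitch tracker; the example frequencies are made up):

```python
import numpy as np

A4 = 440.0

def cents_off_grid(freqs_hz):
    """Signed distance, in cents, from the nearest equal-tempered pitch."""
    semis = 12 * np.log2(freqs_hz / A4)
    return 100 * (semis - np.round(semis))

# A constant speed error shifts every pitch off the 440-based grid by the
# same number of cents, so the median offset estimates the error.
# (1 cent corresponds to a speed change of about 0.058%.)
detected = np.array([443.1, 331.9, 526.9])   # made-up pitches, ~0.7% sharp
offset = np.median(cents_off_grid(detected))
speed_error = 2 ** (offset / 1200) - 1
print(f"estimated speed error: {speed_error:+.2%}")   # about +0.70%
```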
Are there really "standard" BPMs? As far as I know, most music production is just "whatever tempo the band feels like playing." I haven't been in a recording studio, though, so maybe use of a metronome is more common there?
In some countries there are legal and illegal BPMs, e.g. according to [0] only the BPM range from 80 to 116 is legal in Chechnya.
[0] https://www.npr.org/2024/04/09/1243632570/chechnya-music-ban...
Adam Neely did a good video[1] showing just how silly it is.
[1] https://www.youtube.com/watch?v=Q811R6YsM0s
That moment when you read the first half of a sentence and know it's clearly sarcasm, and then you finish the sentence and realize it's not... :(
I'm not in the music industry in any capacity myself, so take everything I say here with a grain of salt.
And more specifically, to address your question, I don't know enough about the recording part of music production to say whether most performers get a click-track played into their monitors or not. (I know that I've seen at least some performers using click-tracks for studio recording; but I'm not sure if that's only provided if they ask for it, or if it's the studio pushing it on the performer to make their jobs easier. I imagine click-tracks would always be used for multi-performer async studio recording — as otherwise you'd have performances that have conflicting BPM. But maybe not for single-performer pop/EDM songs, where it's just a vocalist laying down a track, and then everything else is done in software?)
The evidence I can cite is from the outcome of the production process: I have a strong recollection from back in the late 2000s, of using some music library auto-tagging software that would, among other things, analyze the BPM of a song to populate the BPM ID3 field of MP3s. For at least the music I loaded into it, BPM did appear to be (mostly) quantized to nice, round numbers. (And it wasn't just the software being coarse-grained in its estimates; it did give weird numbers for unusually-produced songs.)
I do know that for any modern "produced" music (i.e. music not recorded as a live ensemble session), even if the performers did lay down their tracks at a weird BPM, the audio engineer is still going to be throwing their performances as separate tracks into a DAW. And one of the things you tell a DAW, when creating a project, is the BPM — which creates a grid of bars that tracks/samples want to snap onto. Even if you're not trimming or speed-adjusting performances so as to snap the ends of each track/sample to the DAW's grid, you're still likely snapping the beginning of each track/sample to the grid — you'd have to fight the DAW to not. Which means that, by default, if the song is "produced" enough — tracks cut up and slid around, reused, etc — then the output of this moving-and-snapping process will be a song that reads as the project's BPM.
Separately, I know that people learning to play an instrument for classical/orchestra music often practice their instrument with a metronome (at least at the early levels). Which perhaps gives at least some of those people the speed equivalent of perfect pitch — the ability to "lock into" certain BPMs, and to know if other performers are running fast/slow vs the "expected" BPM of the song.
I would assume that even for rock performers with no classical training, the early-level "textbook" practice for drummers and rhythm guitarists also involves playing with a metronome. It might be a lot more tenable to play a rock song at e.g. 117 BPM, if everyone's just playing on the down-beat of the drums; but these performers still don't want to have a sudden, unintended shift in their BPM due to not being able to do a drum fill / complex chord change quickly enough; nor to start rushing as the song amps up, forcing everyone else into increasingly-stressful playing. In doing drills with a metronome to keep their BPM constant, these performers are likely implicitly learning to lock into certain specific BPMs as well.
My impression is that the one case where this probably isn't true is in live jazz performance, and more specifically in live jazz improv "jam sessions." In those, the percussion (if any) isn't driving the performance, but rather is just another component of the harmony, following the (often wavering!) BPM of the lead performer.
(Is this why the classic meme exists of classically-trained performers not getting along with jazz artists? Because the jazz artists won't stay on beat, and this irritates some perfect-pitch-like "out of tune" feeling in the classical musician?)
I think you’re getting tempo, pitch and time signature mixed up. Jazz musicians are excellent at staying on tempo. Staying on tempo is so important in jazz that when I was young, we had to earn the right to practice without a metronome. At the highest levels, jazz recordings can even be beat-mixed with other jazz recordings because the tempo is solid enough.
Jazz, though, uses some time signatures that are hard to visualize. Once you’ve played them for long enough they become second nature, but they’re likely the hardest part of learning to play jazz. Those time signatures are also common in classical music. Getting time signatures mixed up can sound absolutely awful, and that’s a hard thing to fix on the fly unless you know the musicians extremely well (though then the odds of playing the wrong time signature are quite low). Learning to improvise is an entirely different can of poisonous snakes, but in an ideal setting you wouldn’t start improvising until you can play.
Pitch can be a problem, but it’s easy to hear and correct on the fly.
As for classical and jazz musicians getting along or not, when I grew up the usual transition was classical to jazz. When jazz was first being invented, there were some racial issues between (generally Black) jazzbos and classical heads. But when I was growing up, we were all band geeks and had a lot of mutual respect for each other.
One thing I’ll note though is that music communities (much like software communities) can seem very toxic to outsiders. If you want to get good, you learn to take and give very direct feedback. It’s a problem when you have a very gifted musician playing with beginners, but amongst musicians of the same calibre, it’s a really warm, loving feeling.
Wrong. The drums (or the bass) are most usually driving the rhythm, and jazz musicians are extremely adept at staying on tempo.
Try the exact opposite. https://www.youtube.com/watch?v=rEbUNDW9bDA
That's just outright false. This is a basic action in all commonly used DAWs, e.g. https://promixacademy.com/blog/how-to-nudge-tracks-in-logic-...
There aren't really reference BPMs.
Not everything is recorded with a click track, though a lot is.
But even so, while it will usually be an integer BPM, it's not always rounded to the nearest 5. 120 might be too fast while 115 is too slow, so you pick 118.
For an app that's trying to correct your turntable speed by e.g. 3%, it couldn't even begin to guess the "correct" BPM.
But would a large-enough majority of songs be recorded with a multiple-of-5 BPM, that this app could presume the first record you play has a multiple-of-5 BPM, and be right 80+% of the time?
If so, then that could still be how it was done here. It wouldn't always work for the first record, but you'd be highly likely to get a "good reading" by at least the third one you try.
So I found some data and the answer is -- definitely not:
https://assets-global.website-files.com/6188e55dd468b56ab674...
from:
https://blog.musiio.com/posts/which-musical-tempos-are-peopl...
There are plenty of songs at BPMs ending in every digit.
I would expect:
1. Detect wow: https://www.youtube.com/watch?v=kCwRdrFtJuE
2. Time wow.
3. Round to nearest record speed and report delta.
I'd also expect that, armed with a normal FFT algorithm, Ye Olde Bash On Algorithme Until Functionale would be fairly likely to work with reasonable effort, without having to get too "signal processor-y" on the FFT output.
Wow is wavering speed. A constant turntable speed error (that this helps you calibrate out) doesn't produce wow.
All turntables have wow. The wow rate is determined by the rotational speed. So if you measure 33.2326 wow oscillations per minute, you know the turntable is a touch too slow. Wow would not be used to measure the error; it would be used to measure the absolute speed. Getting the error from that point is trivial.
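Sketch of that last step, using the example number above (nothing more than a division):

```python
nominal_rpm = 100 / 3                    # 33 1/3 RPM
measured_wow_rate = 33.2326              # wow cycles per minute, as above
error = measured_wow_rate / nominal_rpm - 1
print(f"speed error: {error:+.2%}")      # -0.30%, i.e. slightly slow
```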
I don't think you could use FFT or similar to reliably measure wow to anywhere near the accuracy necessary to produce a reliable correction from just a few seconds of an arbitrary LP.
If you were using an LP with a known pure 440 Hz sine wave that you could lock onto exactly, then sure. That's kind of how frequency modulation -- FM radio -- works.
But I really don't see how you could do this with songs that are full of frequencies coming in and out and changing all over the place. If you analyzed the full side of the LP, you could probably get enough signal out of that.
But trying to measure the precise wavelength of wow when you're only getting a few wavelengths' worth, from a complex signal? I don't see how.
I am 100% abundantly positive that signal processing code could do this, and in fact by signal processing standards it's not even particularly hard. You may need more time than the app takes, though I am inclined to think that the app doesn't exactly need a number to twelve significant digits, and you could get a lock to single-digit-percentage precision pretty quickly. Even at 33 1/3 RPM you only need a few seconds for three or four revolutions.
I am much less confident about my claims that it is something you could bash together in "normal code" from an FFT, without advanced math, but it still seems likely to me. You have huge stonking correlations between the frequencies you can exploit. Imagine a normal FFT chart like you've seen any number of times. Now, take that same thing and wave it up and down quite visibly on a sine wave. Nice and big in your imagination so you can see it. You think that's not something that could be picked up? Now, scaling it down to where you can't hear it anymore may make it harder to believe, but the same code will pick it up. To a computer it would still be clear as day. This is one of those things microphones pick up much, much better than human ears, just like microphones trivially pick up a quiet 1001 Hz tone next to a loud 1000 Hz tone even though we can't hear it at all.
Compared to, say, recognizing a voice and extracting words from it, this is pretty trivial stuff.
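For the skeptical, here's roughly the shape of what I'm describing, as a sketch (big assumption: the recording has one strong, reasonably steady partial to track, which plenty of music doesn't):

```python
import numpy as np

def wow_rate(x, sr, frame=4096, hop=1024):
    """Track the dominant FFT peak across short frames, then look for
    the periodicity of its wobble. A sketch, not production code."""
    window = np.hanning(frame)
    freqs = []
    for start in range(0, len(x) - frame, hop):
        spec = np.abs(np.fft.rfft(x[start:start + frame] * window))
        k = int(np.argmax(spec))
        if 0 < k < len(spec) - 1:
            # parabolic interpolation for sub-bin frequency accuracy
            a, b, c = spec[k - 1], spec[k], spec[k + 1]
            k += 0.5 * (a - c) / (a - 2 * b + c)
        freqs.append(k * sr / frame)
    dev = np.asarray(freqs) - np.mean(freqs)
    # FFT of the frequency track: the strongest low-frequency component
    # should sit at the wow rate (~0.555 Hz at 33 1/3 RPM). Precision
    # improves with recording duration.
    track_sr = sr / hop
    spec = np.abs(np.fft.rfft(dev * np.hanning(len(dev))))
    return np.argmax(spec) * track_sr / len(dev)
```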
That's where you're wrong. FFT frequency bins are surprisingly wide. You can make them narrower, but at the cost of losing temporal resolution. And relative to the frequency you're measuring, it gets worse the lower the frequency gets.
There is absolutely no way you're going to detect a near-0.555 Hz effect from a few seconds of audio and determine whether it's off frequency by 0.1% or even 1%.
Like I said, sure if you're dealing with a pure sine wave. But not a complex signal using FFT.
Or to put it another way -- a 1,000 Hz signal? Absolutely. But a 0.5 Hz signal? Absolutely not.
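The arithmetic, for anyone following along:

```python
# FFT frequency resolution is 1/T for a T-second window.
for seconds in (2, 5, 10, 60):
    print(f"{seconds:>3} s window -> {1 / seconds:.3f} Hz per bin")
# Even 10 s gives 0.1 Hz bins -- about 18% of a 0.555 Hz wow signal,
# nowhere near the sub-1% precision being discussed.
```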
I agree that FFT is not the easiest tool to use for the job. If I were trying to solve this problem, I'd use autocorrelation.
But it still sounds very challenging: there are multiple sources of periodic frequency change, both in recordings and in playback mechanisms.
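Roughly what I have in mind, as a sketch (assuming something about the pressing, e.g. off-center wow or surface noise, repeats once per revolution; numpy only):

```python
import numpy as np

def rpm_by_autocorrelation(x, sr, nominal_rpm=100 / 3):
    """Autocorrelate the amplitude envelope and look for a peak near the
    nominal revolution period (1.8 s at 33 1/3 RPM). Just a sketch."""
    hop = 256
    env = np.abs(x[:len(x) // hop * hop]).reshape(-1, hop).mean(axis=1)
    env -= env.mean()
    env_sr = sr / hop
    nominal_lag = env_sr * 60 / nominal_rpm
    # search lags within +/-10% of the nominal period
    lags = np.arange(int(nominal_lag * 0.9), int(nominal_lag * 1.1))
    scores = [np.dot(env[:-lag], env[lag:]) for lag in lags]
    best_lag = lags[int(np.argmax(scores))]
    return env_sr * 60 / best_lag   # estimated RPM
```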
Maybe the distortion in the physical disk has some type of signature, and it can be determined whether that distortion is fast or slow.
For example, I would bet that 2x intended speed isn't simply 2x every frequency in the FFT, when played through vinyl on a physical turntable.
Maybe some collection of distortion signals based on disk manufacturing and turntable manufacturing.
I understand what you're saying better, but I don't think you're going to be able to autocorrelate the frequency variation precisely at all from a few revolutions, especially since there are other non-periodic sources of frequency variability.
It does if the hole in the record is not exactly in the center.
My first thought was that it is listening to the scratches rather than the music, but I guess they speed up as the record gets played and the arm moves inwards. So, my second guess is that it's listening to the pitch of the notes being played. Of course, old albums (especially punk albums) are probably tuned to whatever worked on the day—maybe a tuning fork, possibly a piano, maybe just to one of the instruments.
Umm, no, scratches meet the needle once per revolution, regardless of the needle's position. So that would definitely be a possible method. But not all records have scratches, or they could be at an angle, which would give the wrong estimate. Also, actual sounds on the record could have scratch-like qualities.
I need to go away and think about that :D
The needle doesn't pass at a constant "ground speed" everywhere on the record. On the outside the needle "travels faster" and in theory you have more detail. Near the middle the needle travels over the record surface slower.
That's one of the reasons you'd put your best song on side A, song 1 and your 2nd best song on side B, song 1 - you got the most detail on the outside track. I remember seeing an interview with Led Zeppelin quite a few years back where they claimed they knew they'd have a smash hit with Stairway to Heaven. Which was funny, because I remembered seeing an interview several years before that where they said it was a track they never thought was going to go anywhere.
Which is it? Track position tells you - it was the 4th track on the A side, i.e. the last track - how important they thought it was. Which is to say, it was a "throwaway" track that made it big. It happens.
The other odd thing about that album is "When the Levee Breaks" is considered to be the 2nd-most popular song off of the album and it too was the last track on the B side.
What this tells me is the album's producer, one Jimmy Page, was out of touch with what Led Zeppelin fans liked most. Based off the sound of the next album, Houses of the Holy, I'd say Mr. Page got the message loud and clear.
I’m not sure about this. Thriller (by Michael Jackson) is one of the biggest pop songs of all time. It was also the fourth track on the A side. Bohemian Rhapsody (another one of the best selling singles of all time) was the fourth track on the B side.
Those are two examples off the top of my head. If you give me a bit of time to put my kid to bed, I’m sure that I’ll find many more.
What you've just learned is that pop producers are not in tune with what's good. It's a hard lesson to learn.
Not really. I found data that shows that released singles aren’t necessarily placed first on a side.
As a turntable rotates at a constant RPM, a scratch changes in pitch (if there is such a thing) but it will always "pop" once per rotation.
My guess would be some filtering (to remove the actual music) and some kind of autocorrelation algorithm to detect periodic patterns with a period matching the expected rotation speed of a record within a few percent (33 1/3 RPM; the app doesn't mention 45/78).
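A sketch of that guess (scipy; the thresholds are arbitrary, and this is emphatically not a claim about how the app actually works):

```python
import numpy as np
from scipy import signal

def rpm_from_pops(x, sr, nominal_rpm=100 / 3):
    # high-pass to suppress the music and emphasize clicks/pops
    sos = signal.butter(4, 8000, btype="highpass", fs=sr, output="sos")
    clicks = np.abs(signal.sosfilt(sos, x))
    # look for prominent spikes roughly one revolution (~1.8 s) apart
    min_gap = int(sr * 0.9 * 60 / nominal_rpm)
    peaks, _ = signal.find_peaks(clicks, distance=min_gap,
                                 height=10 * np.median(clicks))
    if len(peaks) < 3:
        return None                          # not enough pops to measure
    period = np.median(np.diff(peaks)) / sr  # seconds per revolution
    return 60 / period                       # estimated RPM
```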
Thank you so much! I spoke about why I'm not flat-out saying how it works in another comment, but let me answer your questions:
1) It actually might! Worth a shot, I guess, but I don't have any comedy albums to try it on. I was able to get drum solos working fine.
2) Was answered better by somebody else
3) I was able to get it to detect speed issues up to 9% off; beyond that it just stops working completely. Though that was in a controlled environment, so YMMV. If you see the sample vid I posted above, my player is roughly 4% off, which is a lot, but I genuinely believe a whole lot of people wouldn't notice that.
This is pretty cool, thanks for sharing it.
Is this the aforementioned comment about _why_ you aren't being specific about the implementation?
If it isn't, then I couldn't find it. I was very curious about that too (both how it works, and the incentive for secrecy -- totally your prerogative, but again, curious given the veiled-to-me explanation) but didn't seem to find such a comment. Protecting novelty seems like the implied reason.
Here's the comment: https://news.ycombinator.com/item?id=40502383
Essentially I don't want to have somebody swoop in, replicate the same thing, be better at marketing, and charge money for it. I'm both protecting users, because they shouldn't be charged for something that is free, and protecting my ego, because I spent time and effort and, as far as I can tell, I'm the first one to build an app that works this way, and it sucks when somebody takes a community thing and paywalls it.
Once the Android version is out and everything blows over I might consider making the apps open-source so that anyone can see and learn how it works, then potentially make derivative works.
Usually when an app comes out on iPhone but not Android, the excuse is that "Android users don't pay for apps so it wasn't a priority", despite it being way more difficult and a bit more expensive to develop for iPhone than for Android.
But in your case you don't plan on monetizing, so why iPhone first?
Well I have been developing native apps for both for over a decade, and I don't think either is particularly more difficult than the other.
What it came down to is that I use an iPhone as my daily driver, and when I pulled my Android test device out of a drawer, the battery had swollen to twice its size, so I immediately brought it to an electronics recycling center.
Which means that in order to complete the Android version, I need to shell out $400 for a new phone I'm only gonna use once for a non-commercial project. So my idea was, let's release iOS first, see if people care, and then I'll spend the remaining time and money to finish up the Android build!
I think if I did try to finish Android at the same time, I would have given up on both and released nothing.
Makes sense to develop for what you use yourself.
I've developed for both as well and would say getting _started_ with iOS development is about 10x more time consuming and complicated than Android -- or at least it was about 8 years ago.
I know you know this, but you don't need to own an Android phone to develop for Android (and you don't have to spend anywhere near $400 on one if you do want one).
Looks like it'll be a pretty great app and hope you do manage to get the Android version up and running.
Most likely, they have an iPhone and this is for a need they had, so they developed it to solve their own need first.
I appreciate the fact that they're developing an Android version at all, because this sounds simple and useful for me!
That makes sense and seems like a justified concern. Sorry for what you experienced with Boop. And thanks for the reply and direction to the comment, I overlooked it.
ShazamKit to identify the song, fetch its BPM from that, run BPM detection on the mic input, and compare?
It works offline, though.
I just tested and a song that works fine normally failed as soon as I turned Airplane Mode on.
Does it? I don't see that claim being made anywhere.
They do say that the audio is processed locally, but that does not preclude them from making an API call to find a signature match.
"Grooved does not use any third party library or API, just the built-in components provided by Apple."
ShazamKit is not third party; it is provided by Apple. Their website states "The audio stream is processed locally on your device and never recorded.", though that could just mean the audio fingerprint is computed locally; I assume ShazamKit still contacts an Apple API to get the match.
Even pop albums aren't all at 440 Hz. Def Leppard's Pyromania is about a 1/4 step off standard tuning, and there are many other similar examples.
People sometimes can't even make backing tracks for improvisation that are in tune.
No, they are not all tuned to 440 Hz. This is really evident if you play an instrument and want to play along with an album.
Every. Rolling. Stones. Album.
If an orchestra plays with an organ, the organ sets the reference pitch. 440 Hz is a relatively young standard; many organs were built before it existed.
It probably doesn't need to do an FFT; it just needs to be able to count beats and have a database of BPM values for popular songs. You could use something like AcoustID/Chromaprint to identify the song you're playing.
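A sketch of that pipeline (the lookup table is hypothetical, librosa's beat tracker stands in for "count beats", and the AcoustID identification step is left out):

```python
import numpy as np
import librosa

# hypothetical reference database of known tempos
KNOWN_BPM = {"Billie Jean": 117.0}

def speed_error(path, title):
    y, sr = librosa.load(path)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    tempo = float(np.atleast_1d(tempo)[0])   # librosa may return an array
    return tempo / KNOWN_BPM[title] - 1      # e.g. +0.03 means 3% fast
```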
A very simple idea, as all the best ideas are.