I got a master's degree in ML at a good school. I will say there's pretty much nothing they taught me that I couldn't have learned myself. That said, school focused my attention in ways I wouldn't have alone, and provided pressure to keep going.
The single thing I learned the most from was implementing a paper. Lectures and textbooks to me are just words: I understand them in the abstract, but learning by doing gets you far deeper knowledge.
Others might suggest a more varied curriculum, but to me nothing beats a one-hour chunk of uninterrupted problem solving.
Here are a few suggested projects.
Train a baby neural network to learn a simple function like ax^2 + bx + c (see the sketch after this list).
MNIST digits classifier. Basically the “hello world” of ML at this point.
Fine-tune GPT-2 on a specialized corpus like Shakespeare.
Train a Siamese neural network with triplet loss to measure visual similarity and find out which celeb you look most like.
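To make the first project concrete, here's a minimal sketch (PyTorch here, but any framework works; the a, b, c values and network size are arbitrary choices of mine):

    import torch
    import torch.nn as nn

    # Target function: y = 2x^2 - 3x + 1 (arbitrary a, b, c)
    def f(x):
        return 2 * x**2 - 3 * x + 1

    # Tiny MLP: 1 input -> 32 hidden units -> 1 output
    model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    x = torch.linspace(-3, 3, 256).unsqueeze(1)  # training inputs
    y = f(x)                                     # targets

    for step in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

    print(loss.item())  # should end up close to 0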
My $0.02: don't waste your time writing your own neural net and backprop. It's a biased opinion, but this would be like implementing your own HashMap. No company will ask you to do this. Instead, learn how to use profiling and debugging tools like TensorBoard and the TF Profiler.
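For example, here's a minimal TensorBoard logging sketch. It uses PyTorch's SummaryWriter with a made-up run directory and a toy metric, just to show the workflow; the TF-side API is analogous:

    import math
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/demo")  # hypothetical run directory

    # Log a toy decaying "loss" so there's something to look at
    for step in range(100):
        writer.add_scalar("loss/train", math.exp(-step / 20), step)

    writer.close()
    # Then inspect with: tensorboard --logdir runs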
ML is so much more than just neural networks.
I would start by taking a free university-level course in statistics. Then I would continue with the basics: SVM, linear regression, naive Bayes, gradient boosting, neural nets, etc. I would not only train and fine-tune them but also build simple ones myself instead of just using libraries. Then I would move on to what you said: participate in Kaggle competitions and try to solve real-world problems.
I think that understanding the field from the bottom up is priceless. Many people fine-tune and train models, but they don't understand how the model works, nor do they know if the model they've chosen is the best fit for the problem they're trying to solve.
It's a rather long path if you really want to get good at it. Like in music: you can learn to play a tune by ear, or you can develop a deep and thorough understanding of the theory.
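To make "build simple ones myself" concrete, here's a minimal sketch of linear regression trained with plain gradient descent in NumPy; the data is synthetic and the true weights (3 and 2) are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: y = 3x + 2 plus noise
    X = rng.uniform(-1, 1, size=200)
    y = 3 * X + 2 + rng.normal(0, 0.1, size=200)

    w, b = 0.0, 0.0
    lr = 0.1

    for _ in range(500):
        err = w * X + b - y
        # Gradients of mean squared error w.r.t. w and b
        w -= lr * 2 * np.mean(err * X)
        b -= lr * 2 * np.mean(err)

    print(w, b)  # should approach 3 and 2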
This type of bottom-up approach is a terrible idea for a fast-moving area like ML. Ultimately, to get a job and make money in the area, you need to solve customer problems, so starting with libraries and fine-tuning is what needs to happen first. You should still try to learn the fundamentals as you go and when you get stuck. Otherwise it will take a very long time if you have a full-time job or are a student studying something else, and it won't help you get a job.
Getting a job to make money and solve customer problems as fast as possible was not a goal stated by the OP.
Besides, you're also wrong: having good fundamentals in the math will help you pick up new methods much faster as they pop up. And especially if you want to come up with new methods (as in research), there are no shortcuts.
They made a reasonable assumption.
Abstraction helps you be productive. It's certainly good to understand everything all the way down to the actual physics behind computing, but it's not necessary, especially at the start. I don't have to know exactly how logic gates work to program in JS and make a difference. I assume the same applies to ML. Motivation is what prevents most people from learning hard-to-learn skills. Finding ways to produce value early on can help keep you motivated.
That's a stupid analogy for the above discussion. A better analogy would be trying to program in JS without understanding for loops and basic programming concepts.
That's typically what I observe with younger folks jumping into neural nets directly: they have a very shallow understanding of anything and survive on YouTube tutorials.
There’s a reason people start with YouTube tutorials (which, let’s face it, are responsible for many of us passing undergrad CS classes). They give a broad, approachable explanation of the topic.
Depending on what you mean by "understand," I would guess most software engineers don't "understand" for loops either. For loops are an abstraction over the CPU's instruction set, provided by each programming language. We use them with the knowledge of how they behave, which is the correct level of abstraction nearly 100% of the time.
And in CS undergrad programs, we don't throw people into a course on assembly first. First they learn something like Java or Python, and only later dig deeper into more fundamental concepts. That's not an accident.
Most of what was listed above aren't the fundamentals that would help you properly understand what's happening with neural networks, but rather completely different branches of machine learning that have little in common with neural networks. If they learn SVM, naive Bayes, and gradient boosting, their knowledge will definitely be broader but just as shallow; using your analogy, it's not like trying to program in JS without understanding for loops, but rather like trying to program in JS without understanding C, COBOL, and Haskell.
I'm all for learning the fundamentals properly - but those fundamentals are going to be completely different things: core principles of statistics (limitations of correlation, confounders, bias/variance, etc.), the parts of calculus and linear algebra that matter for understanding optimization, best practices for managing data, experiments, and measurement so you don't cheat yourself, and so on - not a checklist of many different, parallel machine learning methods like decision trees or reinforcement learning, which are both useful and interesting but neither related nor required to properly apply, e.g., transformer-based large language models to your task.
I don't regard time spent at university learning how logic gates work, and many other useful things, as a loss of time. And as a web developer/architect (after many other industries I've worked in), I typically make more money than peers without that background. Knowing how things work has helped me immensely in my career. Not everything is solvable by looking on Stack Overflow.
It depends on what professional level you are content with.
For me, learning wasn't only motivated by money - I've been genuinely curious about computers and software since I was a kid.
It's unquestionably a loss of time; the only question is whether it's an optimal use of time. This depends on your goals. For most software engineers looking to use ML, starting with existing frameworks and knowledge and drilling down as necessary is the most prudent method.
We all have to make sacrifices in what we learn. Even for yourself, the topics you chose to learn about implicitly left out other parallel topics you didn’t learn about. And you also didn’t learn everything from the least abstracted, most fundamental level. We need to choose the appropriate level of abstraction for the problem at hand, which will depend on each person’s goals.
Of course not. When you learn this way, you realize there are a myriad of problems that can be solved with simpler algorithms. Trying to make every problem fit neural networks is pure cargo culting.
That's true, but the goal is to learn to use neural networks, not to solve problems efficiently, isn't it?
OP did not state that getting an ML job was a goal.
I for one would be interested in learning more foundational stuff; I have no interest (though perhaps that process would change this!) in a particularly ML job, and certainly not in learning how to point and click and run other people's work from a YouTube video with a scream-face thumbnail.
For those who feel similarly, I asked about it recently; maybe there's something there you like the look of:
https://news.ycombinator.com/item?id=38320244
I also came across Understanding Deep Learning (Simon Prince), which doesn't seem to have been mentioned there and looks like it might be good:
https://mitpress.mit.edu/9780262048644/understanding-deep-le...
The better your skill set, the more sought after you are and the harder you are to replace. And that translates into more money, if money is your sole motivation.
Strong agree
When I studied ML in 2012, the very first course started with naive Bayes and went on from there. Coming back after a decade away, I see a lot of people around me starting with neural nets to train models that naive Bayes would be plenty for, without ever having heard of naive Bayes. Is that only my experience?
Any data that is small enough to quickly iterate on for learning is small enough to use a simpler approach. The point is learning the techniques.
As with any super-hyped technology, people want to use the latest and greatest to solve it, even when traditional methods are cheaper, easier, more reliable and more accurate. See also: crypto.
It's useful to learn on small toy problems for ease of debugging and speed of training, so if you want to learn how to apply a powerful technique, you're pretty much inevitably going to start learning it on something for which it's absurd overkill. E.g. a common starting task for neural nets is XOR, which can be solved with literally a single machine instruction instead of training an ML model.
But also, there are many tasks for which naive Bayes works, yet an NN solution can be much more accurate if you're okay with it also being much more compute-intensive. E.g. sentiment analysis and simple spam filters are often used as demonstrations of naive Bayes, but you can do much better with more powerful models.
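For reference, the classic naive Bayes demo is only a few lines with scikit-learn. The corpus below is made up, so it's purely illustrative:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny toy corpus, just to show the shape of the approach
    texts = ["free money now", "win a prize today", "meeting at noon", "lunch tomorrow?"]
    labels = ["spam", "spam", "ham", "ham"]

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(texts, labels)

    print(clf.predict(["free prize today"]))  # -> ['spam']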
The risk with this is that you spend 10 hours learning statistics, then get demotivated and never do the other 190 hours to get to the good stuff. Then you quickly forget the 10 hours of stats you learned, too, because you never use it.
For me, playing with things and doing cool & fun stuff is always the way to get deeper into something.
Stats is one of a very small number of college courses that I took where I came away thinking "this should be mandatory for all voting adults". I use that stats course way more often than I use even algebra, just to be a functional adult in a world where bad statistics are used day in and day out to manipulate and deceive people into buying things or voting for someone.
So, I have to disagree: not only is a basic foundation in stats essential for understanding ML, it's something everyone really should have under their belt anyway to live in the modern world without turning into someone else's pawn.
Love the suggestions, especially the one about implementing papers.
Do you have any pointers on how to select papers to implement when starting out?
Also - any great papers you recommend beginners expose themselves to?
I don't know if anyone does it still, but a few years ago there were a lot of papers suggesting more or less clever alternatives to ReLU as activation function. There was also a whole zoo of optimizers as alternatives to SGD.
Those papers were within reach for me. Even if the math (or the colossal search effort) needed to find them was out of reach, implementing them wasn't.
There were some things besides optimizers and activation functions too. In particular I remember Dmitry Ulyanov's "Deep Image Prior" paper. He did publish code, but the thing he explored - using the implicit structure of a model architecture without training (or, training on just your input data!) - is actually dead simple to try yourself.
I'm sure if you just drink from the firehose of the arXiv AI/ML feeds, you'll find something that tickles your interest and that you can actually implement. Or at least play with published code.
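To give a sense of how small such an implementation can be: Swish, from the "Searching for Activation Functions" paper, is essentially one line, and you can drop it into any model to compare against ReLU. A PyTorch sketch:

    import torch
    import torch.nn as nn

    class Swish(nn.Module):
        # swish(x) = x * sigmoid(x), proposed as a ReLU alternative
        def forward(self, x):
            return x * torch.sigmoid(x)

    # Drop-in replacement for nn.ReLU() in any architecture
    model = nn.Sequential(nn.Linear(1, 32), Swish(), nn.Linear(32, 1))
    print(model(torch.randn(4, 1)).shape)  # torch.Size([4, 1])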
The best site imo is Papers With Code: state-of-the-art benchmarks, the papers that achieved them (along with previous papers), and GitHub repos with actual implementations.
I wouldn’t recommend papers to absolute beginners though. For them, it’s best to go to HuggingFace, find a model that seems interesting and play with it in a Jupyter notebook. You’ll get a lot more bang for your buck.
Try the "historical papers" on this repo: https://github.com/aimerou/awesome-ai-papers And you can also find papers with their code implementations here: http://paperswithcode.com
This seems like great advice.
You say don't write your own neural net and backprop implementation. That makes sense to me. What do you suggest using instead, for your suggested projects? I'm guessing tensorflow, based on your suggestions on profiling and debugging tools? Do the papers / projects you suggest map straightforwardly onto a tensorflow implementation, rather than a custom one?
Implementations in TensorFlow are widely considered technical debt. Internally, Google has mostly switched to JAX. PyTorch now has torch.compile and exports to ONNX, so there's little reason to use TensorFlow these days except in niche cases.
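For what it's worth, both of those are one-liners in PyTorch 2.x. A minimal sketch with a placeholder model:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    example = torch.randn(1, 8)

    compiled = torch.compile(model)  # JIT-compile the model for speed
    print(compiled(example).shape)

    torch.onnx.export(model, example, "model.onnx")  # export to ONNX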
This kind of information is why I was asking :)
I have a master's degree in computer science and took a fair share of ML graduate courses. That pretty much sums up what I was thinking: they basically forced me to sit and learn something I wouldn't have on my own.
Now-- I'm not saying you need to go to grad school. You could buy some ML textbooks and force yourself through them and go from there... but how many people have that grit? I wouldn't have been one of them :)
I hate to say it, but the diploma also matters. Having an MS next to your name means employers will give you the time of day that others won’t get. I really don’t like that this is the way things work, but it is.
Does it? I have an MS, but never put it next to my name. I'm doing well :)
Are people actually going into master's degrees to learn? I thought the whole point of paying for a master's was just credentialism.
Technically you could do it on your own. Practically speaking? I quit my job and went to school full time. I spent 2 years studying the stuff practically 10 hours a day. The only way this is socially acceptable is if you get a piece of paper at the end which says you did it.
That’s true. Or if you’re really rich I guess.
I only have old hardware at home. How viable is it to practice this stuff on my own projects? (And I'd like to use JS as much as possible, despite everyone being on Python.)
You don’t need your own hardware. You can use Google Colab for free.
Most of the action happens in Python. That being said, there's a library called TensorFlow.js. It has some pre-trained models you can use off the shelf and run in your browser, for things like face detection and sentiment analysis.
I don't think you should combine writing a neural network with writing backprop, since I don't know anyone doing serious ML who isn't using some sort of automatic differentiation library to handle the backprop part for them. I'm not entirely sure people even know what they're saying when they talk about backprop these days; I suspect they're confusing it with gradient optimization.
But anyone seriously interested in ML absolutely should be building their own models from scratch and training them with gradient descent, ideally starting by building their own optimization routine rather than using a prepackaged one.
This is hugely important, since the optimization part is really the heart of modern machine learning. If you really want to understand ML, you should have a strong intuition about various methods of optimizing a given model. Additionally, there are lots of details and tricks behind these models that you'll miss if you're only calling an API around them.
There's a world of difference between implementing an LSTM and calling one. You learn significantly more about what's actually happening by doing the former.
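To make "build your own optimization routine" concrete, here's a minimal sketch: autograd handles the backprop, but the gradient descent update is written by hand (the model and toy target are my own arbitrary choices):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
    x = torch.linspace(-1, 1, 64).unsqueeze(1)
    y = x.pow(3)  # toy target function

    lr = 0.1
    for step in range(1000):
        loss = nn.functional.mse_loss(model(x), y)
        model.zero_grad()
        loss.backward()        # autograd computes the gradients...
        with torch.no_grad():  # ...but the update rule is yours
            for p in model.parameters():
                p -= lr * p.grad

    print(loss.item())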
It's an important component, but I wouldn't say it's the main factor. ML is ultimately about your data, so understanding it is critical. Feature selection and engineering, sampling, subspace optimization (e.g. ESMMs), and interpreting the results correctly are really the main places you can squeeze the most juice out. Optimizing the function is the very last step.
Basically, you can go ahead and optimize all the way down to the global minimum, but a model with better features and better feature interactions is going to win.
Further, there are a ton of different optimizers available: SGD, Adam, Adagrad, RMSProp, FTRL, etc. With just one hour a day, you could spend six months simply writing and understanding the most popular ones.
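The update rules themselves are short. E.g. a bare-bones Adam step, following the defaults from the original paper, looks roughly like this:

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # m, v are running moment estimates; t is the 1-based step count
        m = b1 * m + (1 - b1) * grad     # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * grad**2  # second moment (uncentered variance)
        m_hat = m / (1 - b1**t)          # bias correction for the warm-up phase
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v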
I think that getting a feel for gradients and how it all works is a good reason for implementing your own - once.
Don't worry about what companies will ask you to do unless you absolutely have to.
As Andrej Karpathy says, understanding it deeply is required to take your skills to the next level and avoid mistakes. As practice, one should implement their own neural net and manual backprop to start building that understanding.
I'd argue backprop is still handy just to learn the basics
It doesn't have to be production-ready, of course, but spending 3-4 hours writing it out in code, debugging a few steps, etc. is useful in my opinion.
Or at least watch the Karpathy video and try to follow along.
I think it's worth doing a very simple implementation at least once to make sure you have the fundamentals down. It's not actually that complicated to implement a simple one. Maybe a day- or two-day project.
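A minimal sketch of what that looks like for a single sigmoid neuron - forward pass, then the chain rule by hand (all the numbers are arbitrary):

    import math

    # One neuron: y = sigmoid(w*x + b), loss = (y - target)^2
    w, b = 0.5, 0.0
    x, target = 1.0, 0.8
    lr = 0.5

    for _ in range(100):
        # Forward pass
        z = w * x + b
        y = 1 / (1 + math.exp(-z))
        loss = (y - target) ** 2

        # Backward pass: chain rule, written out by hand
        dloss_dy = 2 * (y - target)
        dy_dz = y * (1 - y)  # derivative of sigmoid
        w -= lr * dloss_dy * dy_dz * x    # dz/dw = x
        b -= lr * dloss_dy * dy_dz * 1.0  # dz/db = 1

    print(loss)  # should be near 0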
Also a compsci master's w/ only ML courses here. I actually enjoyed it and learned tons of stuff I'd never have learned on my own. Who learns Boltzmann machines, self-organizing maps (and such), or Fourier transforms/wavelets by themselves? I've never seen any of those in most ML books or courses, and I really enjoyed learning all of them (and these are only the things I can think of right now; it's been a while).