I got a master's degree in ML at a good school. I will say there's pretty much nothing they taught me that I couldn't have learned myself. That said, school focused my attention in ways I wouldn't have alone, and provided pressure to keep going.
The single thing I learned the most from was implementing a paper. Lectures and textbooks to me are just words: I understand them in the abstract, but learning by doing gets you far deeper knowledge.
Others might suggest a more varied curriculum, but to me nothing beats a one-hour chunk of uninterrupted problem solving.
Here are a few suggested projects.
Train a baby neural network to learn a simple function like ax^2 + bx + c (see the sketch after this list).
MNIST digits classifier. Basically the “hello world” of ML at this point.
Fine-tune GPT-2 on a specialized corpus like Shakespeare.
Train a Siamese neural network with triplet loss to measure visual similarity and find out which celeb you look most like.
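To make the first project concrete, here's a minimal sketch (PyTorch here, but any framework works; the a, b, c values and network size are arbitrary choices of mine):

    import torch
    import torch.nn as nn

    # Target function: y = 2x^2 - 3x + 1 (arbitrary a, b, c)
    def f(x):
        return 2 * x**2 - 3 * x + 1

    # Tiny MLP: 1 input -> 32 hidden units -> 1 output
    model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    x = torch.linspace(-3, 3, 256).unsqueeze(1)  # training inputs
    y = f(x)                                     # targets

    for step in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

    print(loss.item())  # should end up close to 0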
My $0.02: don't waste your time writing your own neural net and backprop. It's a biased opinion, but this would be like implementing your own HashMap. No company will ask you to do this. Instead, learn how to use profiling and debugging tools like TensorBoard and the TF Profiler.
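For example, here's a minimal TensorBoard logging sketch. It uses PyTorch's SummaryWriter with a made-up run directory and a toy metric, just to show the workflow; the TF-side API is analogous:

    import math
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/demo")  # hypothetical run directory

    # Log a toy decaying "loss" so there's something to look at
    for step in range(100):
        writer.add_scalar("loss/train", math.exp(-step / 20), step)

    writer.close()
    # Then inspect with: tensorboard --logdir runs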
ML is so much more than just neural networks.
I would start by taking a free university-level course in statistics. Then I would continue with the basics: SVM, linear regression, naive Bayes, gradient boosting, neural nets, etc. I would not only train and fine-tune them but also build simple ones myself instead of just using libraries. Then I would move on to what you said: participate in Kaggle competitions and try to solve real-world problems.
I think that understanding the field from the bottom up is priceless. Many people fine-tune and train models, but they don't understand how the model works, nor do they know if the model they've chosen is the best fit for the problem they're trying to solve.
It's a rather long path if you really want to get good at it. Like in music: you can learn to play a tune by ear, or you can develop a deep and thorough understanding of the theory.
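To make "build simple ones myself" concrete, here's a minimal sketch of linear regression trained with plain gradient descent in NumPy; the data is synthetic and the true weights (3 and 2) are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: y = 3x + 2 plus noise
    X = rng.uniform(-1, 1, size=200)
    y = 3 * X + 2 + rng.normal(0, 0.1, size=200)

    w, b = 0.0, 0.0
    lr = 0.1

    for _ in range(500):
        err = w * X + b - y
        # Gradients of mean squared error w.r.t. w and b
        w -= lr * 2 * np.mean(err * X)
        b -= lr * 2 * np.mean(err)

    print(w, b)  # should approach 3 and 2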
This type of bottom-up approach is a terrible idea for a fast-moving area like ML. Ultimately, to get a job and make money in the area, you need to solve customer problems, so starting with libraries and fine-tuning is what needs to happen first. You should still try to learn the fundamentals as you go and when you get stuck. Otherwise it will take a very long time if you have a full-time job or are a student studying something else, and it won't help you get a job.
Getting a job to make money and solve customer problems as fast as possible was not a goal stated by the OP.
Besides, you're also wrong: having good fundamentals in the math will help you pick up new methods much faster as they pop up. And especially if you want to come up with new methods (as in research), there are no shortcuts.
They made a reasonable assumption.
Abstraction helps you be productive. It's certainly good to understand everything all the way down to the actual physics behind computing, but it's not necessary, especially at the start. I don't have to know exactly how logic gates work to program in JS and make a difference. I assume the same applies to ML. Motivation is what prevents most people from learning hard-to-learn skills. Finding ways to produce value early on can help keep you motivated.
That's a stupid analogy for the above discussion. A better analogy would be trying to program in JS without understanding for loops and basic programming concepts.
That's typically what I observe with younger folks jumping into neural nets directly: they have a very shallow understanding of anything and survive on YouTube tutorials.
There’s a reason people start with YouTube tutorials (which, let’s face it, are responsible for many of us passing undergrad CS classes). They give a broad, approachable explanation of the topic.
Depending on what you mean by "understand," I would guess most software engineers don't "understand" for loops either. For loops are an abstraction over the CPU's instruction set, provided by each programming language. We use them with the knowledge of how they behave, which is the correct level of abstraction nearly 100% of the time.
And in CS undergrad programs, we don't throw people into a course on assembly first. First they learn something like Java or Python, and only later dig deeper into more fundamental concepts. That's not an accident.
Most of what was listed above aren't the fundamentals that would help you properly understand what's happening with neural networks, but rather completely different branches of machine learning that have little in common with neural networks. If they learn SVM, naive Bayes, and gradient boosting, their knowledge will definitely be broader but just as shallow; using your analogy, it's not like trying to program in JS without understanding for loops, but rather like trying to program in JS without understanding C, COBOL, and Haskell.
I'm all for learning the fundamentals properly - but those fundamentals are going to be completely different things: core principles of statistics (limitations of correlation, confounders, bias/variance, etc.), the parts of calculus and linear algebra that matter for understanding optimization, best practices for managing data, experiments, and measurement so you don't cheat yourself, and so on - not a checklist of many different, parallel machine learning methods like decision trees or reinforcement learning, which are both useful and interesting but neither related nor required to properly apply, e.g., transformer-based large language models to your task.
I don't regard time spent at university learning how logic gates work, and many other useful things, as a loss of time. And as a web developer/architect (after many other industries I've worked in), I typically make more money than peers without that background. Knowing how things work has helped me immensely in my career. Not everything is solvable by looking on Stack Overflow.
It depends on what professional level you are content with.
For me, learning wasn't only motivated by money - I've been genuinely curious about computers and software since I was a kid.
It's unquestionably a loss of time; the only question is whether it's an optimal use of time. This depends on your goals. For most software engineers looking to use ML, starting with existing frameworks and knowledge and drilling down as necessary is the most prudent method.
We all have to make sacrifices in what we learn. Even for yourself, the topics you chose to learn about implicitly left out other parallel topics you didn’t learn about. And you also didn’t learn everything from the least abstracted, most fundamental level. We need to choose the appropriate level of abstraction for the problem at hand, which will depend on each person’s goals.
Of course not. When you learn this way, you realize there are a myriad of problems that can be solved with simpler algorithms. Trying to make every problem fit neural networks is pure cargo culting.
That's true, but the goal is to learn to use neural networks, not to solve problems efficiently, isn't it?
OP did not state that getting an ML job was a goal.
I for one would be interested in learning more foundational stuff; I have no interest (though perhaps that process would change this!) in a particularly ML job, and certainly not in learning how to point and click and run other people's work from a YouTube video with a scream-face thumbnail.
For those who feel similarly, I asked about it recently; maybe there's something there you like the look of:
https://news.ycombinator.com/item?id=38320244
I also came across Understanding Deep Learning (Simon Prince), which doesn't seem to have been mentioned there and looks like it might be good:
https://mitpress.mit.edu/9780262048644/understanding-deep-le...
The better your skill set, the more sought after you are and the harder you are to replace. And that translates into more money, if money is your sole motivation.
Strong agree
When I studied ML in 2012, the very first course started with naive Bayes and went on from there. Coming back after a decade away, I see a lot of people around me starting with neural nets to train models that naive Bayes would be plenty for, without ever having heard of naive Bayes. Is that only my experience?
Any data that is small enough to quickly iterate on for learning is small enough to use a simpler approach. The point is learning the techniques.
As with any super-hyped technology, people want to use the latest and greatest to solve it, even when traditional methods are cheaper, easier, more reliable and more accurate. See also: crypto.
It's useful to learn on small toy problems for ease of debugging and speed of training, so if you want to learn how to apply a powerful technique, you're pretty much inevitably going to start learning it on something for which it's absurd overkill. E.g. a common starting task for neural nets is XOR, which can be solved with literally a single machine instruction instead of training an ML model.
But also, there are many tasks for which naive Bayes works, yet an NN solution can be much more accurate if you're okay with it also being much more compute-intensive. E.g. sentiment analysis and simple spam filters are often used as demonstrations of naive Bayes, but you can do much better with more powerful models.
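For reference, the classic naive Bayes demo is only a few lines with scikit-learn. The corpus below is made up, so it's purely illustrative:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny toy corpus, just to show the shape of the approach
    texts = ["free money now", "win a prize today", "meeting at noon", "lunch tomorrow?"]
    labels = ["spam", "spam", "ham", "ham"]

    clf = make_pipeline(CountVectorizer(), MultinomialNB())
    clf.fit(texts, labels)

    print(clf.predict(["free prize today"]))  # -> ['spam']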
The risk with this is that you spend 10 hours learning statistics, then get demotivated and never do the other 190 hours to get to the good stuff. Then you quickly forget the 10 hours of stats you learned, too, because you never use it.
For me, playing with things and doing cool & fun stuff is always the way to get deeper into something.
Stats is one of a very small number of college courses that I took where I came away thinking "this should be mandatory for all voting adults". I use that stats course way more often than I use even algebra, just to be a functional adult in a world where bad statistics are used day in and day out to manipulate and deceive people into buying things or voting for someone.
So, I have to disagree: not only is a basic foundation in stats essential for understanding ML, it's something everyone really should have under their belt anyway to live in the modern world without turning into someone else's pawn.
Love the suggestions, especially the one about implementing papers.
Do you have any pointers on how to select papers to implement when starting out?
Also - any great papers you recommend beginners expose themselves to?
I don't know if anyone does it still, but a few years ago there were a lot of papers suggesting more or less clever alternatives to ReLU as activation function. There was also a whole zoo of optimizers as alternatives to SGD.
Those papers were within reach for me. Even if the math (or the colossal search effort) needed to find them was out of reach, implementing them wasn't.
There were some things besides optimizers and activation functions too. In particular I remember Dmitry Ulyanov's "Deep Image Prior" paper. He did publish code, but the thing he explored - using the implicit structure of a model architecture without training (or, training on just your input data!) - is actually dead simple to try yourself.
I'm sure if you just drink from the firehose of the arXiv AI/ML feeds, you'll find something that tickles your interest and that you can actually implement. Or at least play with published code.
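To give a sense of how small such an implementation can be: Swish, from the "Searching for Activation Functions" paper, is essentially one line, and you can drop it into any model to compare against ReLU. A PyTorch sketch:

    import torch
    import torch.nn as nn

    class Swish(nn.Module):
        # swish(x) = x * sigmoid(x), proposed as a ReLU alternative
        def forward(self, x):
            return x * torch.sigmoid(x)

    # Drop-in replacement for nn.ReLU() in any architecture
    model = nn.Sequential(nn.Linear(1, 32), Swish(), nn.Linear(32, 1))
    print(model(torch.randn(4, 1)).shape)  # torch.Size([4, 1])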
The best site imo is Papers With Code: state-of-the-art benchmarks, the papers that achieved them (along with previous papers), and GitHub repos with actual implementations.
I wouldn’t recommend papers to absolute beginners though. For them, it’s best to go to HuggingFace, find a model that seems interesting and play with it in a Jupyter notebook. You’ll get a lot more bang for your buck.
Try the "historical papers" on this repo: https://github.com/aimerou/awesome-ai-papers And you can also find papers with their code implementations here: http://paperswithcode.com
This seems like great advice.
You say don't write your own neural net and backprop implementation. That makes sense to me. What do you suggest using instead, for your suggested projects? I'm guessing tensorflow, based on your suggestions on profiling and debugging tools? Do the papers / projects you suggest map straightforwardly onto a tensorflow implementation, rather than a custom one?
Implementations in TensorFlow are widely considered technical debt. Internally, Google has mostly switched to JAX. PyTorch now has torch.compile and exports to ONNX, so there's little reason to use TensorFlow these days except in niche cases.
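For what it's worth, both of those are one-liners in PyTorch 2.x. A minimal sketch with a placeholder model:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    example = torch.randn(1, 8)

    compiled = torch.compile(model)  # JIT-compile the model for speed
    print(compiled(example).shape)

    torch.onnx.export(model, example, "model.onnx")  # export to ONNX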
This kind of information is why I was asking :)
I have a master's degree in computer science and took a fair share of ML graduate courses. That pretty much sums up what I was thinking: they basically forced me to sit and learn something I wouldn't have on my own.
Now-- I'm not saying you need to go to grad school. You could buy some ML textbooks and force yourself through them and go from there... but how many people have that grit? I wouldn't have been one of them :)
I hate to say it, but the diploma also matters. Having an MS next to your name means employers will give you the time of day that others won’t get. I really don’t like that this is the way things work, but it is.
Does it? I have an MS, but never put it next to my name. I'm doing well :)
Are people actually going into master's degrees to learn? I thought the whole point of paying for a master's was just credentialism.
Technically you could do it on your own. Practically speaking? I quit my job and went to school full time. I spent 2 years studying the stuff practically 10 hours a day. The only way this is socially acceptable is if you get a piece of paper at the end which says you did it.
That’s true. Or if you’re really rich I guess.
I only have old hardware at home. How viable is it to practice this stuff on my own projects? (And I'd like to use JS as much as possible, despite everyone being on Python.)
You don’t need your own hardware. You can use Google Colab for free.
Most of the action happens in Python. That being said, there's a library called TensorFlow.js. It has some pre-trained models you can use off the shelf and run in your browser, for things like face detection and sentiment analysis.
I don't think you should combine writing a neural network with writing backprop, since I don't know anyone doing serious ML who isn't using some sort of automatic differentiation library to handle the backprop part for them. I'm not entirely sure people even know what they're saying when they talk about backprop these days; I suspect they're confusing it with gradient optimization.
But anyone seriously interested in ML absolutely should be building their own models from scratch and training them with gradient descent, ideally starting by building their own optimization routine rather than using a prepackaged one.
This is hugely important, since the optimization part is really the heart of modern machine learning. If you really want to understand ML, you should have a strong intuition about various methods of optimizing a given model. Additionally, there are lots of details and tricks behind these models that you'll miss if you're only calling an API around them.
There's a world of difference between implementing an LSTM and calling one. You learn significantly more about what's actually happening by doing the former.
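To make "build your own optimization routine" concrete, here's a minimal sketch: autograd handles the backprop, but the gradient descent update is written by hand (the model and toy target are my own arbitrary choices):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
    x = torch.linspace(-1, 1, 64).unsqueeze(1)
    y = x.pow(3)  # toy target function

    lr = 0.1
    for step in range(1000):
        loss = nn.functional.mse_loss(model(x), y)
        model.zero_grad()
        loss.backward()        # autograd computes the gradients...
        with torch.no_grad():  # ...but the update rule is yours
            for p in model.parameters():
                p -= lr * p.grad

    print(loss.item())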
It's an important component, but I wouldn't say it's the main factor. ML is ultimately about your data, so understanding it is critical. Feature selection and engineering, sampling, subspace optimization (e.g. ESMMs), and interpreting the results correctly are really the main places you can squeeze the most juice out. Optimizing the function is the very last step.
Basically, you can go ahead and optimize all the way down to the global minimum, but a model with better features and better feature interactions is going to win.
Further, there are a ton of different optimizers available: SGD, Adam, Adagrad, RMSProp, FTRL, etc. With just one hour a day, you could spend six months simply writing and understanding the most popular ones.
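The update rules themselves are short. E.g. a bare-bones Adam step, following the defaults from the original paper, looks roughly like this:

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # m, v are running moment estimates; t is the 1-based step count
        m = b1 * m + (1 - b1) * grad     # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * grad**2  # second moment (uncentered variance)
        m_hat = m / (1 - b1**t)          # bias correction for the warm-up phase
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v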
I think that getting a feel for gradients and how it all works is a good reason for implementing your own - once.
Don't worry about what companies will ask you to do unless you absolutely have to.
As Andrej Karpathy says, understanding it deeply is required to take your skills to the next level and avoid mistakes. As practice, one should implement their own neural net and manual backprop to start building that understanding.
I'd argue backprop is still handy just to learn the basics
It doesn't have to be production-ready, of course, but spending 3-4 hours writing it out in code, debugging a few steps, etc. is useful in my opinion.
Or at least watch the Karpathy video and try to follow along.
I think it's worth doing a very simple implementation at least once to make sure you have the fundamentals down. It's not actually that complicated to implement a simple one. Maybe a day- or two-day project.
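A minimal sketch of what that looks like for a single sigmoid neuron - forward pass, then the chain rule by hand (all the numbers are arbitrary):

    import math

    # One neuron: y = sigmoid(w*x + b), loss = (y - target)^2
    w, b = 0.5, 0.0
    x, target = 1.0, 0.8
    lr = 0.5

    for _ in range(100):
        # Forward pass
        z = w * x + b
        y = 1 / (1 + math.exp(-z))
        loss = (y - target) ** 2

        # Backward pass: chain rule, written out by hand
        dloss_dy = 2 * (y - target)
        dy_dz = y * (1 - y)  # derivative of sigmoid
        w -= lr * dloss_dy * dy_dz * x    # dz/dw = x
        b -= lr * dloss_dy * dy_dz * 1.0  # dz/db = 1

    print(loss)  # should be near 0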
Also a compsci master's w/ only ML courses here. I actually enjoyed it and learned tons of stuff I'd never have learned on my own. Who learns Boltzmann machines, self-organizing maps (and such), or Fourier transforms/wavelets by themselves? I've never seen any of those in most ML books or courses, and I really enjoyed learning all of them (and these are only the things I can think of right now; it's been a while).