Understanding Deep Learning

nsxwolf
24 replies
19h11m

As someone who missed the boat on this, is learning about this just for historical purposes now, or is there still relevance to future employment? I just imagine OpenAI eats everyone's lunch on anything AI related. Am I way off base?

deepsquirrelnet
6 replies
17h52m

This is about deep learning, of which LLMs are a subset. If you are interested in machine learning, then you should learn deep learning. It is incredibly useful for a lot of reasons.

Unlike other areas of ML, the nature of deep learning is such that its parts are interoperable. You could use a transformer with a CNN if you wish. Also, deep learning lets you do machine learning on any type of data: text, images, video, audio. Finally, it scales naturally with compute.
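
As a minimal sketch of that interoperability (assuming PyTorch; the architecture and sizes are made up purely for illustration): a small CNN turns an image into patch features that feed straight into a transformer encoder.

    import torch
    import torch.nn as nn

    class CnnTransformer(nn.Module):
        def __init__(self, d_model=64, num_classes=10):
            super().__init__()
            # CNN stem: 3-channel image -> d_model feature maps
            self.stem = nn.Conv2d(3, d_model, kernel_size=4, stride=4)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, num_classes)

        def forward(self, x):                      # x: (batch, 3, 32, 32)
            f = torch.relu(self.stem(x))           # (batch, d_model, 8, 8)
            tokens = f.flatten(2).transpose(1, 2)  # 64 patch tokens per image
            return self.head(self.encoder(tokens).mean(dim=1))

    logits = CnnTransformer()(torch.randn(2, 3, 32, 32))  # shape (2, 10)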

As someone pretty involved in the field, I lament that LLMs are turning people away from ML and deep learning, and fostering the misconception that there's no reason to do it anymore. Large models are expensive to run, have slow throughput, and still generally perform worse than purpose-built models. They're not even that easy to use for a lot of tasks compared to encoder networks.

I’m biased, but I think it’s one of the most fun things to learn in computing. And if you have a good idea, you can still build state-of-the-art things with a regular GPU at your house. You just have to find a niche that isn’t getting the attention that LLMs are ;)

dafaqueue
5 replies
15h8m

I started off being really excited to learn, but as time went on I actually lost interest in the field.

The whole thing is essentially curve fitting. The field is more art than science: it's all about tricks and intuitions for different ways of getting that best-fit curve.

From this angle, the whole field became far less interesting to me. It has nothing deeper or more insightful to offer beyond the concept of curve fitting.

Slix
2 replies
9h29m

Are deep learning and neural networks just curve fitting? I thought those were significantly different.

galangalalgol
0 replies
8h43m

You could argue all the building blocks are forms of curve fits, but that isn't a terribly useful statement even if true. If you can fit a curve to the desired behavior of any function, or composition of functions (which is itself a function), then you can solve any problem whose desired behavior you can express, including expressing the desired behavior for some other class of problems. Saying it is just curve fitting is like saying something is just math. The entirety of reality is just math.
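
To make that concrete, a toy sketch assuming PyTorch (the target function is arbitrary): express the desired behavior, here sin(x), as data, and fit a small network to it.

    import torch
    import torch.nn as nn

    x = torch.linspace(-3, 3, 200).unsqueeze(1)
    y = torch.sin(x)  # the "desired behavior", expressed as data

    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)

    for _ in range(2000):
        loss = ((net(x) - y) ** 2).mean()  # squared error to the target curve
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(loss.item())  # small residual: the behavior has been "fit"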

deepsquirrelnet
0 replies
2h55m

By that logic, anything that is predictive is curve fitting, including entire academic fields like physics and climatology. You could say that all automation is curve fitting. I don’t think there’s much to be gained by being that reductive.

From a technical standpoint, it's not a correct analogy either, because it assumes you have a curve to fit. What curve is language? What curve are images? No answer, because there isn't one. Deep learning is about modeling complex behaviors, not curve fitting. Images and language, for instance, are grounded in social and cultural patterns, not intrinsic curves to be fit.

At best, it’s an imprecise statement. But I’d disagree entirely.

wiz21c
0 replies
7h42m

Although it fundamentally is curve fitting, I'd venture to say that at some point, having to handle millions of parameters makes the curve fitting problem unrecognizable... A change in quantity is a change in nature, if you will.

IOW: to me, fitting a generalized linear model is very different than fitting a convolutional network.

dbmikus
0 replies
9h38m

I've found this fun way to think of it: the goal is to invent a faster form of evolution for pattern recognition, learning, and autonomous task completion. I think one needs to approach it more like biology, an empirical science, than like pure logic and math. We can discover things that work, and then study them afterwards to learn why they work, just as the brain works even though we don't fully understand it yet.

I think there are some really cool problems, such as:

    1. Is synthetic data viable for training?
    2. How do you make deep learning agents that can do task planning and introspection in complex environments?
    3. How do we efficiently build memory and data lookup into AI agents? And is this better/worse than making longer context windows?

ksherlock
5 replies
19h0m

Maybe last week's drama should have been a left-pad moment. For many things you can train your own NN and be just as good without being dependent on internet access, third parties, etc. Knowing how things work should give you insight into using them better.

mnky9800n
3 replies
18h46m

Which drama of last week are you referring to? The one about the openai guy saying it's all just the data set? Or something else?

quchen
1 replies
18h13m

Their CEO was fired, hired by Microsoft, took a bunch of people with him, and is now back at the company.

giardini
0 replies
13h31m

Did the "bunch of people" also return (from Microsoft to OpenAI)?

whatever_yeah
0 replies
2h29m

I must have missed the "dataset" news you're referring to, could you elucidate?

lamroger
0 replies
18h56m

I wonder if using APIs was more of a first-to-market move.

GeneralMayhem
3 replies
15h57m

The most important thing to learn for most practical purposes is what the thing can actually do. There's a lot of fuzzy thinking around ML - "throw AI at it and it'll magically get better!" Sources like Karpathy's recent video on what LLMs actually do are good anti-hype for the lay audience, but getting good practical working knowledge that's a level deeper is tough without working through it. You don't have to memorize all the math, but it's good to get a feel for the "interface" of the components. What is it that each model technique actually does - especially at inference time, where it needs to be well-integrated with the rest of the stack?

In terms of continued relevance - "deep learning", meaning, dense neural nets trained to optimize a particular function, haven't fundamentally changed in practice in ~15 years (and much longer than that in theory), and are still way more important and broadly used than the OpenAI stuff for most purposes. Anything that involves numerical estimation (e.g., ad optimization, financial modeling) is not going to use LLMs, it's going to use a purpose-built model as part of a larger system. The interface of "put numbers in, get number[s] out" is more explainable, easier to integrate with the rest of your software stack, and more measurable. It has error bars that are understandable and occasionally even consistent. It has a controllable interface that won't suddenly decide to blurt corporate secrets or forget how to serialize JSON. And it has much, much lower latency and cost - any time you're trying to render a web page in under 100ms or run an optimization over millions of options, generative AI just isn't a practical option (and is unlikely to become one, IMO).

I don't have a significant math or theoretical ML background, but I've spent most of the last 10 years working side by side with ML experts on infra, data pipelines, and monitoring. I'm not sure I could integrate the sigmoid off the top of my head, but that's not what's important - I've done it once, enough to have some idea how the function behaves, and I know how to reason about it as a black box component.
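
That black-box feel is cheap to get. For example, a few lines of plain Python (just a sketch) show the behavior that matters in practice: roughly linear near zero, saturating at the extremes.

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    for z in (-10, -2, 0, 2, 10):
        s = sigmoid(z)
        # the local gradient s * (1 - s) peaks at 0.25 and vanishes as |z| grows
        print(f"z={z:>3}  sigmoid={s:.4f}  gradient={s * (1 - s):.4f}")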

wilkystyle
2 replies
14h25m

> Sources like Karpathy's recent video on what LLMs actually do are good anti-hype for the lay audience

Which video is this?

bjacobt
1 replies
14h18m

I believe OP means "Intro to Large Language Models":

https://youtu.be/zjkBMFhNj_g?si=XQQ3p92ajuQYOyqN

GeneralMayhem
0 replies
7h4m

Yep, that's the one I meant - sorry, should have linked.

His series on making a GPT from scratch is also great for building intuition specifically about text-based generative AI, and it's aimed at software developers.

ww520
0 replies
15h10m

From an application perspective, it's more important to understand how the overall ML process works, the key concepts, and how things fit together. Deep learning is a part of that. Lots of it is already wrapped in libraries and APIs, so it's a matter of preparing the correct data, calling the right APIs, and utilizing the result.
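
A minimal sketch of that loop, assuming scikit-learn and its bundled digits dataset (the model choice here is arbitrary):

    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)                   # prepare the data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    model.fit(X_tr, y_tr)                                 # call the right API
    print(accuracy_score(y_te, model.predict(X_te)))      # utilize the result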

two_in_one
0 replies
9h38m

It's like calculus: nothing new in recent years, but is it still important? The answer is still "yes".

At a glance, it looks like too much for one book. It was probably compressed on the assumption that the reader already knows quite a lot. In other words, it's not easy reading.

quickthrower2
0 replies
13h15m

This would be like learning how your CPU / Memory works, even though JS is eating everyone's (web front-end) lunch.

So yes, if you are prompt engineering and wondering why X sometimes works and sometimes doesn't, and why any of this works at all, it is good to study a bit.

niemandhier
0 replies
8h47m

Someone will dominate the AI-as-a-service market, but there are so many applications for tiny edge AI that no single player can dominate all of them.

OpenAI is, for example, not interested in developing small embedded neural networks that run on a sensor chip and detect specific molecules in the air in real time.

hedgehog
0 replies
10h27m

Highly relevant if you want to work on ML systems. Despite how much OpenAI dominates the press, there are actually many, many teams building useful and interesting things.

Slix
0 replies
9h41m

I came here with the same question. After reading and learning these materials, will I have new job skills or AI knowledge that I can do something with?

ldjkfkdsjnv
23 replies
19h21m

I spent a decade working on various machine learning platforms at well-known tech companies. Everything I ever worked on became obsolete pretty fast. From the ML algorithm to the compute platform, all of it was very transitory. That, coupled with the fact that a few elite companies are responsible for all ML innovation, makes it feel oxymoronic to me to even learn a lot of this material.

nabla9
10 replies
18h56m

> machine learning platforms

Machine learning platforms become obsolete.

Machine learning algorithms and ideas don't. If learning SVMs or naive Bayes did not teach you things that are useful today, you didn't learn anything.

ldjkfkdsjnv
8 replies
18h31m

Nobody is building real technology with either of those algorithms. Sure, they are theoretically helpful, but they aren't valuable anymore. Spending your precious life learning them is a waste.

nabla9
4 replies
18h9m

> Spending your precious life learning them is a waste

So you really did not learn them.

There is nothing wrong with being a user. You don't have to know how compilers work to use a compiler. But then you should not say you understand compilers.

In the same way, you probably would benefit from a book "Using deep learning", not "Understanding deep learning".

ldjkfkdsjnv
3 replies
18h7m

I know them, and I'm a founder of a VC-funded AI startup. Nobody is deploying naive Bayes algorithms.

yeukhon
0 replies
8h39m

Yeah. But you don't build a plane without knowing physics, right?

Nobody deploys a textbook algorithm as-is, because everyone knows textbook algorithms and they confer no advantage. So, no, there is real value in learning the fundamentals, dear founder.

nabla9
0 replies
18h4m

> Nobody is deploying naive Bayes algorithms

Exactly my point. You are so deep in the user perspective that you think you are arguing against me.

Philpax
0 replies
17h36m

Yes, they’re not deploying them. That doesn’t mean it doesn’t still help to know the fundamentals of the field, especially when you’re trying to innovate.

xcv123
0 replies
18h28m

So what? The same fundamental machine learning concepts are still relevant to deep learning.

It's almost like arguing that everything you learned as a Java developer is completely useless when a new programming language replaces it.

spencerchubb
0 replies
15h5m

This brings up an important question: Is a topic useful to learn if you will never use it in your life?

To attempt to answer this question, we can look at LLMs as an analogy. If you include code in the training set for an LLM, it also makes the LLM better at non-coding tasks, suggesting that sometimes learning something also makes you better at other things. I'm not saying the same necessarily applies to learning these "old school" AI techniques, but it's a decent analogy at least.

coef2
0 replies
17h16m

I started my journey in machine learning fifteen years ago. Ironically, at that time, my professor told me that neural networks were outdated and trying them wouldn't result in publishable research. SVMs were popular and emphasized in my coursework. I concur that SVMs don't hold as much practical significance today. But the progress in AI and ML is generally unpredictable, and no one knows what theory leads to the next leap in the field.

xcv123
0 replies
18h34m

Agreed. Look at the table of contents of this book. Whatever fundamental machine learning concepts you learned with SVMs or other obsolete algorithms are still useful and applicable today.

HighFreqAsuka
4 replies
18h56m

Quite a lot of techniques in deep learning have stood the test of time at this point. Also, new techniques are developed either by building on old techniques or by trying to solve their deficiencies. For example, Transformers were developed to address vanishing gradients in LSTMs over long sequences and to improve GPU utilization, since LSTMs are inherently sequential in the time dimension.
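
A quick sketch of that sequential-vs-parallel difference, assuming PyTorch (shapes and sizes are illustrative):

    import torch
    import torch.nn as nn

    x = torch.randn(8, 128, 32)  # (batch, time, features)

    # An LSTM must consume step t before step t+1; here made explicit:
    lstm = nn.LSTM(32, 32, batch_first=True)
    state, outs = None, []
    for t in range(x.size(1)):
        out, state = lstm(x[:, t:t + 1, :], state)
        outs.append(out)

    # Self-attention relates all 128 timesteps in one batched matrix product:
    attn = nn.MultiheadAttention(32, num_heads=4, batch_first=True)
    y, _ = attn(x, x, x)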

ldjkfkdsjnv
3 replies
18h30m

Sure, but if you were an expert in LSTMs, that's nice: you know the lineage of algorithms. But it probably isn't valuable, companies don't care, and you can't directly use that knowledge. You would never just randomly study LSTMs now.

opportune
1 replies
15h46m

There are plenty of transferrable skills you get from being an expert in something that gets made obsolete by a similar-but-different iterative improvement. Maybe you're really good at implementing ideas from papers, you have a great intuitive understanding of how to structure a model to utilize some tech within a particular domain, you understand very well how to implement and use models that require state, you know how to clean and structure data to leverage a particular feature, etc.

Also, being an "expert in LSTM" is like being an "expert in HTTP/1.1" or "knowing a lot about Java 8". It's not knowledge or a skill that stands on its own. An expert in HTTP/1.1 is probably also very knowledgeable about web serving, networking, or backend development. HTTP/2 being invented doesn't obsolete that knowledge at all. And that knowledge of HTTP/1.1 would certainly come in handy if you were trying to research or design something like a new protocol, just as knowledge of LSTMs could provide a lot of value for those looking for the next breakthrough in stateful models.

xcv123
0 replies
15h7m

FYI, LSTMs are not obsolete. They are still the best option in many cases and are being deployed today.

HighFreqAsuka
0 replies
18h20m

Transformers have disadvantages too, and so LSTMs are still used in industry. But also it's not that hard to learn a couple new things every year.

drBonkers
3 replies
19h16m

What would you recommend someone read instead?

ldjkfkdsjnv
2 replies
19h15m

Better to understand the bounds of what's currently possible, and then recognize when that changes. Much more economically valuable.

trace_id
0 replies
17h10m

Even better: change the bounds of what's possible ;)

probablynish
0 replies
19h11m

Do you think there's a better way to do this than spending some time playing around with the latest releases of different tools?

wiz21c
0 replies
7h38m

So, what fundamental stuff should I learn? I understand ML has some general principles that keep on being valid throughout the years. No?

tysam_and
0 replies
14h15m

Highly, highly disagree.

If it became obsolete, then y'all were doing the new shiny.

The fundamentals don't really change. There are several different streams in the field, and there are many, many algorithms with good staying power in use. Of course, you can upgrade some if you like, but chase the white rabbit forever, and all you'll get is a handful of fluff.

reqo
0 replies
19h10m

Very few things stay the same in technology. You should think of technology as another type of evolution! It is driven by the same kind of forces as evolution, IMO. I think even Linus Torvalds once stated that Linux evolved through natural selection.

msie
8 replies
19h38m

This book looks impressive. There's a chapter on the unreasonable effectiveness of deep learning, which I love. Any other books I should be on the lookout for?

nextos
3 replies
17h39m

This presentation from Deep Mind outlines some foundational ML books: https://drive.google.com/file/d/1lPePNMGMEKoaDvxiftc8hcy-rFp...

For the impatient, look into slide #123. Essentially, the recommendations are Murphy, Gelman, Barber, and Deisenroth.

Note these slides have a Bayesian bias. In spite of that, Murphy is a great DL book. Besides, going through GLMs is a great way to get into DL.
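
On the GLM point, a toy sketch assuming PyTorch (the data is synthetic): logistic regression is literally a one-layer network with a sigmoid link, trained by the same gradient descent as any deep model.

    import torch
    import torch.nn as nn

    X = torch.randn(100, 5)
    y = (X[:, 0] > 0).float().unsqueeze(1)  # toy labels

    glm = nn.Sequential(nn.Linear(5, 1), nn.Sigmoid())  # logistic regression
    opt = torch.optim.SGD(glm.parameters(), lr=0.1)
    loss_fn = nn.BCELoss()

    for _ in range(200):
        loss = loss_fn(glm(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # stacking more Linear + nonlinearity layers is the step from GLM to DL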

mitthrowaway2
1 replies
13h54m

What is a "Bayesian bias"?

nextos
0 replies
12h38m

I meant the presenter is discussing ML from a Bayesian point of view, which is interesting, but not something you need if your aim is just to understand deep learning.

fastneutron
0 replies
4h31m

Reality has a well-known Bayesian bias…

Joking aside, these slides are excellent! Is there an associated video or course that they were a part of?

nootopian
2 replies
19h19m

https://twitter.com/suhail/status/1728676402864812466

theogravity
1 replies
17h50m

I wish it wasn't an X post. Can't see responses at all without an account.

nextos
0 replies
17h38m

Use nitter to get around X authwalls: https://nitter.net/suhail/status/1728676402864812466

teleforce
0 replies
12h42m

Yes, it looks very impressive indeed, and it has the potential to become the seminal textbook on the subject.

Fun fact: the famous Attention paper is closing in on 10K citations and should reach that milestone by the end of this year. It's probably the fastest paper ever to reach this significant milestone. Any deep learning book written before the Attention paper should be considered out of date and in need of updating. The situation is not unlike an outdated physics textbook that covers Newton's laws but omits Einstein's equation of mass-energy equivalence.

martingoodson
6 replies
9h16m

Most comments here are in one of two camps: 1) you don't need to know any of this stuff, you can make AI systems without this knowledge, or 2) you need this foundational knowledge to really understand what's going on.

Both perspectives are correct. The field is bifurcating into two different skill sets: ML engineer and ML scientist (or researcher).

It's great to have both types on a team. The scientists will be too slow; the engineers will bound ahead trying out various APIs and open-source models. But when they hit a roadblock or need to adapt an algorithm, many engineers will stumble. They need an R&D mindset that is quite alien to many of them.

This is when AI scientists become essential.

mi_lk
1 replies
7h53m

I guess this message is delivered by an AI scientist, sure.

It's almost self-explanatory that when you hit a roadblock in practice you go back to foundations, and good people should aim to do both. In that case, I don't see where the ML engineer/scientist bifurcation comes from, except as a way for some to feel good about themselves.

martingoodson
0 replies
4h0m

Not at all. It's something I've seen in practice over many years. Neither skill set is 'better' than the other, just different.

There is a need for people who are able to build using available tools, but who don't have an interest in the theory or foundations of the field. It's a valuable mindset and nothing in my original comment suggested otherwise.

It's also pretty clear that many comments on this post divide into the two mindsets I've described.

gardenhedge
1 replies
8h2m

Would you see these as analogous?

The people who create the models and the people that use them.

The people who create the programming languages and the people that use them.

martingoodson
0 replies
3h51m

I think because it's a relatively 'younger' field, there is a bit more need to know about the foundations in AI than in programming. You hit the boundaries a bit more often and need to do a bit of research to modify or create a model.

Whereas in most programming jobs, it's unlikely you would need to do any research into programming language design.

3abiton
1 replies
8h2m

This sounds like a sales pitch for an AI scientist.

rising-sky
0 replies
2h40m

This sounds like a don't-buy pitch for an AI engineer...

The point the commenter is making is that both schools of thought in the comments are valuable. Unless you perform both roles, i.e. you're an engineer who is also familiar with the scientific foundations, the two are symbiotic and not in contention.

Slix
5 replies
9h17m

If I start now and start reading up on AI, will I become anything close to an expert?

I'm worried that I'm starting a journey that requires a Master's or PhD.

ocharles
1 replies
8h34m

Very hard to answer without knowing what your goal is. Do you want to be a practitioner of DL, or do you want to be a researcher?

mavelikara
0 replies
8h10m

Not the OP, but I'd like to hear your answer and reasoning for the "practitioner of DL" case.

strikelaserclaw
0 replies
2h16m

The only guidepost to use in this world of ever-increasing information is to ask yourself "do I find learning this stuff enjoyable?" Questions like "can I become an expert?" are vague and not good guideposts.

crimsoneer
0 replies
7h56m

You probably won't become an expert, but I'm not clear why you'd want to!

Chirono
0 replies
8h4m

From reading this book you’d have a very good grasp of the underlying theory, much more than many ML engineers. But you’d be missing out on the practical lessons, all the little tips and intuitions you need to be able to get systems working in practice. I think this just takes time and it’s as much an art as it is a science.

water-your-self
3 replies
19h20m

No chapter on RNNs, but one on transformers; interesting, having last read Deep Learning by Ian Goodfellow in 2016.

PeterisP
1 replies
18h54m

RNNs have "lost the hardware lottery" by being structurally not that efficient to train on the cost-effective hardware that's available. So they're not really used for much right now, though IMHO they are conceptually interesting enough to cover in such a course.

nextos
0 replies
16h34m

That is not completely true. There are RNNs with transformer/LLM-like performance. See e.g. https://github.com/BlinkDL/RWKV-LM.

They are less popular and less explored, but an interesting route ahead.

nothrowaways
0 replies
19h2m

Yeah, content looks interesting.

contrarian1234
3 replies
12h27m

It's very hard to judge a book like this... (based on a table of contents?)

Who is the author?

Have they published anything else highly rated?

Are there good reviews from people who know what they're talking about?

Are there good reviews from students who don't know anything?

xcv123
0 replies
9h55m

> based on a table of contents?

The entire PDF is available as a free download on that page. First link at the top.

https://github.com/udlbook/udlbook/releases/download/v1.16/U...

komatsu
0 replies
1h29m

I can highly recommend the author. His previous book, "Computer Vision: Models, Learning, and Inference", is very readable, approaches the material from unorthodox viewpoints, and includes a lot of excellent figures supporting the text. I'm buying this on paper!

arman_hkh
0 replies
11h51m

Marcus Hutter on his [Marcus' AI Recommendation Page]: "Prince (2023) is the (only excellent) textbook on deep learning."

dchuk
1 replies
18h37m

Hopefully not a dumb question: how do I buy a physical copy?

rossant
0 replies
18h27m
adamnemecek
1 replies
18h55m

All machine learning is Hopf convolution, analogous to renormalization. This should come as no surprise: renormalization can be modeled via the Ising model, which itself is closely related to Hopfield networks, which are recurrent networks.

dbmikus
0 replies
9h16m

Don't know any of these terms, but you gave me some interesting topics to google about. Thanks!

oakejp12
0 replies
17h50m

The PDF figures for 'Why does deep learning work' seem to point to 'Deep learning and ethics' and vice versa.

ksvarma
0 replies
14h23m

Simply great work, and making it freely available is outstanding!

WeMoveOn
0 replies
19h52m

lit

TrackerFF
0 replies
15h53m

Reading through it, and it def looks accessible.