
The Engineer’s Guide to Deep Learning: Understanding the Transformer Model

MAXPOOL
6 replies
1d9h

There are many other resources that are better.

1/ The Annotated Transformer (Attention Is All You Need): http://nlp.seas.harvard.edu/annotated-transformer/

2/ Transformers from Scratch https://e2eml.school/transformers.html

3/ Andrej Karpathy has a really good series of intros: https://karpathy.ai/zero-to-hero.html
Let's build GPT: from scratch, in code, spelled out. https://www.youtube.com/watch?v=kCc8FmEb1nY
GPT with Andrej Karpathy: Part 1 https://medium.com/@kdwa2404/gpt-with-andrej-karpathy-part-1...

4/ 3Blue1Brown:
But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning https://www.youtube.com/watch?v=wjZofJX0v4M
Attention in transformers, visually explained | Chapter 6, Deep Learning https://www.youtube.com/watch?v=eMlx5fFNoYc
Full 3Blue1Brown Neural Networks playlist https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_6700...

mr_puzzled
2 replies
1d7h

Slightly off topic: I'm interested in taking part in the Vesuvius challenge[0], but I don't have a background in ML; I'm just a regular web developer. Does anyone have suggestions on how to get started? I planned to get some background in practical ML by working through Karpathy's Zero to Hero series along with the Understanding Deep Learning book. Would that be enough, or is there anything else I should learn? I plan to understand the existing solutions to last year's prize and then pick a smaller sub-challenge.

[0] https://scrollprize.org/

trybackprop
1 replies
1d5h

I made a list of all the free resources I used to study ML and deep learning to become an ML engineer at FAANG, so I think it'll be helpful to follow them: https://www.trybackprop.com/blog/top_ml_learning_resources (links in the blog post)

Fundamentals

Linear Algebra – 3Blue1Brown's Essence of Linear Algebra series; I binged all these videos on a one-hour train ride visiting my parents.

Multivariable Calculus – Khan Academy's Multivariable Calculus lessons were a great refresher of what I had learned in college. Looking back, I just needed to have reviewed Unit 1 – intro and Unit 2 – derivatives.

Calculus for ML – this amazing animated video explains calculus and backpropagation

Information Theory – an easy-to-understand book called Information Theory: A Tutorial Introduction.

Statistics and Probability – the StatQuest YouTube channel

Machine Learning

Stanford Intro to Machine Learning by Andrew Ng – Stanford's CS229, the intro to machine learning course, published its lectures on YouTube for free. I watched lectures 1, 2, 3, 4, 8, 9, 11, 12, and 13, and skipped the rest since I was eager to move on to deep learning. The course also offers a free set of course notes, which are very well written.

Caltech Machine Learning – Caltech's machine learning lectures on YouTube, less mathematical and more intuition based

Deep Learning

Andrej Karpathy's Zero to Hero Series – Andrej Karpathy, an AI researcher who earned a Stanford PhD and led Tesla AI for several years, released an amazing series of hands-on lectures on YouTube. Highly, highly recommend.

Neural networks – Stanford's CS231n course notes and lecture videos were my gateway drug, so to speak, into the world of deep learning.

Transformers and LLMs

Transformers – watched these two lectures: a lecture from the University of Waterloo and a lecture from the University of Michigan. I have also heard good things about Jay Alammar's The Illustrated Transformer guide.

ChatGPT Explainer – Wolfram's YouTube explainer video on ChatGPT

Interactive LLM Visualization – This LLM visualization that you can play with in your browser is hands down the best interactive experience with an LLM.

Financial Times' Transformer Explainer – The Financial Times released a lovely interactive article that explains the transformer very well.

Residual Learning – 2023 Future Science Prize Laureates Lecture on residual learning.

Efficient ML and GPUs

How are Microchips Made? – This YouTube video by Branch Education is one of the best free educational videos on the internet, regardless of subject, and it's also the best video for understanding microchips.

CUDA – My FAANG coworkers acquired their CUDA knowledge from this series of lectures.

TinyML and Efficient Deep Learning Computing – 2023 lectures on efficient ML techniques online.

Chip War – Chip War is a bestselling book published in 2022 about microchip technology; its beginning chapters on the invention of the microchip actually explain CPUs very well.

mr_puzzled
0 replies
1d4h

Wow, thanks for the links to all the resources. Lot of interesting stuff for me to learn!

rvnx
0 replies
1d8h

In addition, these websites are totally free.

The website listed here says:

I consider requests for full commercial use of all content on this site (and the github repository). For a complete buyout of all content rights, the cost is €10,000,000.

I'd like to ask you what problems you have by that I keep on having the copyright of my document.

Plus no commercial use without paying a 20% royalty.

So fairly expensive for a Keras tutorial.

SebFender
0 replies
1d7h

Oh! Recommendation 2/ is an absolute masterpiece of simplicity and effectiveness - cheers for that!

tuyguntn
4 replies
1d9h

A question to the ML/AI experts of HN: could you please share the beginner resources you think would be worthwhile for someone who wants to switch from CRUD/backend APIs to ML/AI? There seem to be many branches of this domain, and I'm not sure where to start.

Is my understanding correct?

    * ML engineer -> engineer who builds ML models with PyTorch (or similar frameworks)
    * AI engineer -> engineer who builds applications on top of AI solutions (prompt engineering, OpenAI and Claude APIs, ...)
    * ML ops -> people who help with deploying and serving models

treme
0 replies
1d9h

Kaggle is a good start

qsort
0 replies
1d9h

None of these terms have a formal definition. The only association rule you need is:

* Fancy Title -> Whatever the company wants it to be.

All of the above could realistically span from "does bleeding-edge work" to "has once opened a CSV".

blowski
0 replies
1d8h

I'd call the "AI Engineer" an Application Engineer, albeit one that specialises in integrating ML into software.

belter
0 replies
1d8h

85% of your ML project time will be spent on Data Quality and a little bit of Domain Feature Engineering.

If you want to make an impact, become excellent at those; you will be able to use these skills in domains like Systems Integration and Business Analytics. Let the people who do Research bring you the Algorithms and, nowadays, even the trained Models.

_giorgio_
4 replies
1d10h

It uses keras, which is obsolete. Nobody uses that thing anymore.

Stay away from this.

sva_
1 replies
1d8h

I prefer PyTorch myself, but to call Keras obsolete is quite the stretch. Just because academia has largely moved on from it, doesn't mean nobody uses it.

Also, the API isn't all that different from other libraries. The principles are the same.

whiplash451
0 replies
1d7h

The industry has also moved on from Keras/TensorFlow (except where it remains for legacy reasons).

Google itself has moved on to JAX.

albertzeyer
0 replies
1d8h

I wonder about Keras 3. It's now backend-independent again, like in the early days, and supports JAX, TensorFlow, or PyTorch. It seems nice to define your model once and then easily switch between frameworks, right? Or does no one care about that, and everyone just uses PyTorch?
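
For what it's worth, the switch really is just an environment variable in Keras 3. A minimal sketch (toy layer sizes, my own example, not from the guide):

    import os
    os.environ["KERAS_BACKEND"] = "jax"   # or "tensorflow" / "torch"; must be set before importing keras

    import keras
    import numpy as np

    # The same model definition runs unchanged on whichever backend is selected.
    model = keras.Sequential([
        keras.Input(shape=(16,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    x = np.random.rand(8, 16).astype("float32")
    y = np.random.rand(8, 1).astype("float32")
    model.fit(x, y, epochs=1, verbose=0)

Whether anyone actually exploits that portability in practice is another question.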

0xd1r
0 replies
1d8h

I suppose they are using keras because of its simplicity. But I agree, things have really moved on from keras.

gregw2
2 replies
1d10h

No content besides a few paragraphs of intro. The actual content pages return 404 Not Found errors.

smokel
0 replies
1d10h

The links in the running text are broken. The links in the hamburger menu work fine.

gpnt
0 replies
1d10h

The menu on the left on desktop is working. It seems only the links on the first page are broken.

alister
2 replies
1d8h

When you send me an email, please provide at least two SNS [social networking service] addresses (e.g. LinkedIn, Twitter) for verification purposes. ... I no longer accept contact from anonymous individuals.

It's pretty sad to see that social networking is being adopted as an identification and trust mechanism even by technical people. It was bad enough when some governments began demanding social networking usernames for visa/immigrant screening, but now we can't even send an email to other technical people without social proof?

kltzayh
0 replies
1d5h

With the ellipsis expanded: "Due to the XZ backdoor incident, I no longer accept contact from anonymous individuals."

The XZ cracker could have logged in via GitHub to numerous services. I bet that the OP downloads from PyPI, which was potentially compromised for longer than a year due to an overlooked token leak.

I further bet that the OP, being in the machine learning space, downloads unauditable, huge Python frameworks from GitHub, conda or PyPI.

People in that space also download and experiment with untrusted models.

But hey, plain text email which you can read in a command line mail client with MIME and other extensions disabled is the problem!

belter
0 replies
1d8h

I no longer accept contact from anonymous individuals.

This reminds me of that joke where a guy shows up at the Air Force HQ recruitment center. They ask, "Pilot license? Experience? Qualifications?" He replies, "Nope, just here to say: don't count on me!"

yobbo
0 replies
1d6h

This is a very compressed work-through from perceptron to transformer.

When he is working through the gradients of an LSTM, for example, it is to help understanding, not to help you implement it in your favourite framework.

When he is showing solutions in various frameworks, the purpose is to help create connections between what the math looks like and what code can look like.
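
For example, the classic perceptron update, w <- w + lr * (y - y_hat) * x, maps almost line for line onto a few lines of numpy. A toy sketch of my own (not taken from the guide):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)    # a linearly separable toy problem

    w, b, lr = np.zeros(2), 0.0, 0.1
    for _ in range(10):                          # a few passes over the data
        for x_i, y_i in zip(X, y):
            y_hat = float(w @ x_i + b > 0)       # prediction: step(w.x + b)
            w += lr * (y_i - y_hat) * x_i        # w <- w + lr * (y - y_hat) * x
            b += lr * (y_i - y_hat)              # same rule for the bias

Once that mapping clicks, the framework versions read much the same way.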

uoaei
0 replies
1d2h

One of the most frustrating things about all the documentation on Transformers is the sole emphasis on NLP.

In particular, one of the most interesting parts of the Transformer architecture to me is the attention mechanism, which is permutation invariant (were it not for the positional embeddings people use to counteract this inherent quality of attention layers). Also, the ability to arbitrarily mask this or that node in the graph -- or even individual edges -- gives the whole thing so much flexibility for encoding domain knowledge into your architecture.

Positional embeddings may still be required in many cases but you can be clever about them beyond the overly restrictive perspective of attention layer inputs purely as one-dimensional sequences.
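
To make that concrete, here is a tiny single-head attention sketch in PyTorch (toy sizes, no learned module, my own example). Without positional embeddings, permuting the inputs just permutes the outputs the same way (strictly speaking that's permutation equivariance; pooling the outputs gives you invariance), and individual edges can be knocked out of the score matrix:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    n, d = 5, 8                              # 5 "nodes", 8-dim features, no positional embeddings
    x = torch.randn(n, d)
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))

    def attend(x, mask=None):
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = (q @ k.T) / d ** 0.5
        if mask is not None:                 # boolean mask over edges: False = forbid attention
            scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    perm = torch.randperm(n)
    out, out_perm = attend(x), attend(x[perm])
    assert torch.allclose(out[perm], out_perm, atol=1e-5)   # outputs permute along with the inputs

    # Edge-level masking: e.g. stop node 0 from attending to node 3.
    mask = torch.ones(n, n, dtype=torch.bool)
    mask[0, 3] = False
    masked_out = attend(x, mask)

This is exactly why the same machinery transfers so naturally to sets, graphs, and other non-sequence data.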

shubham13596
0 replies
1d9h

A very good resource for building up from the basics in a concise manner.

revskill
0 replies
1d7h

Transformer tutorial is like the new "Monad tutorial".

benterix
0 replies
1d6h

In contrast, the AI technology of the current golden age, which began in the mid-2010s, has consistently exceeded our expectations.

Well, until recently, that is. It looks like we've hit a wall as to what LLMs can do - some might call it a plateau of productivity. Namely, as far as coding is concerned, LLMs can successfully create chunks of code of limited length and tiny programs; they can also review small pieces of code and suggest improvements that are not tied to the context of the whole program (unless it can fit in the context window). In spite of the huge effort put into creating systems where LLM agents work together to create software, such as AutoGPT, no non-trivial program has been created in this way so far.