
Jeff Dean: Trends in Machine Learning [video]

nomilk
42 replies
22h51m

I see from comments I'm far from the only one using AI to summarise videos before deciding whether to watch them.

Reminds me of the meme "why spend 10 minutes doing something when you can spend a week automating it", i.e. "why spend an hour watching a talk when you can spend 5 hours summarising it with AI and debating the summary's accuracy".

This sounds silly, but the potential gains from learning AI summarisation tooling/flows are large, hence it warrants discussion. Learning how to summarise effectively might save hours per week and improve decisions about which sources deserve our limited time/attention.

bmitc
21 replies
22h10m

I feel like I'm missing some boat, but I'm not sure what boat it is. These "AI" systems seem very superficial to me, and they give me the same feeling as VR does. When I see VR being some terrible approximation of reality, it just makes me feel like I'm wasting my time in it, when I could go experience the real thing. Same with AI "augmentation" tooling. Why don't I just read a book instead of getting some unpredictable (or predictably unpredictable) synopsis? It's not like there's too much specific information there. These tools are just exploding the amount of unspecific information. Who has ever said: "hey, I have too much information for building this system or learning this topic"? Basically no one.

It's just going to move everything to the middle of the Bell curve, leaving the wings to die in obscurity.

nomilk
9 replies
22h1m

If you know a book’s worth reading, going ahead and reading it works well. But for a lot of books/talks there’s competition for time - e.g. my bookshelf has 20 half-read books (this is after triaging out the ones that aren’t worthy of my time) - any tooling that can help better determine where to invest tens or hundreds of hours of my time is a win.

Regarding accuracy, I think we’re at a tipping point where ease of use and accuracy are starting to make it worth the effort. For example, Bard seems to know about YouTube videos (just a couple of months ago you’d have to download the video -> audio to text -> feed into an LLM). So the combination of greater accuracy and much greater ease of use makes it worth considering.

ceruleanseas
3 replies
21h59m

LLM accuracy is so bad, especially in summarization, that I now have to fact check google search results because they’ve been repeatedly wrong about things like the hours restaurants are open.

tracerbulletx
1 replies
21h38m

There's a huge difference between summarizing a stable document that was part of the training data or the prompt, and knowing ephemeral facts like restaurant hours.

nvader
0 replies
20h18m

A technically true statement. But if you're offering it to imply that the GP bears responsibility for knowing which documents were in the training data and which weren't, I have to quibble with you.

Knowing its shortcomings should be the responsibility of the search app that is currently designed to give screen real estate to the wrong summary of the ephemeral fact. Otherwise, users will start to lose trust.

seanbethard
0 replies
18h3m

It's because they don't understand language. You may have been misled by their ability to generate language.

bmitc
3 replies
20h49m

If you know a book’s worth reading, going ahead and reading it works well. But for a lot of books/talks there’s competition for time - e.g. my bookshelf has 20 half-read books (this is after triaging out the ones that aren’t worthy of my time) - any tooling that can help better determine where to invest tens or hundreds of hours of my time is a win.

Is it that hard to determine that a book is worth reading where worth is measured from your perspective? It's usually pretty easy, at least for technical books. Fiction books are another story, but that's life. Having some unknown stochastic system giving me a decision based upon some unknown statistical data is not something I'm particularly interested in. I'm interested in my stochastic system and decision making. Trying to automate life away is a fool's errand.

nomilk
2 replies
20h26m

Is it that hard to determine that a book is worth reading

I'm a huge believer in doing plenty of research about what to read. The simple rationale: it takes a tiny amount of time to learn about a book relative to the time it takes to read it. Even when I get a sense a book is bad, I still tend to spend at least a couple of hours before making the tough call not to bother reading further (I handled one literally 5 minutes ago that wasted a good few hours of my life). I'm not saying AI summaries solve this problem entirely, but they're just one additional avenue for consultation that might only take a minute or two and potentially save hours. It might improve my hit rate from - I dunno - 70% to 80%. Same idea for videos/articles/other media.

majormajor
0 replies
19h43m

I think the more you outsource "what is worth my time" the less you're actually getting an answer about what's worth YOUR time. The more you rule out the possibility of surprise up front, the less well-informed your assumption about worth can possibly be.

There are FAR too many dimensions like word choice, sentence style, allusion, etc, that resist effective summarization.

Ludleth19
0 replies
19h59m

I get where you're coming from and definitely vet books in similar ways depending on the subject, but I also feel like this process is pretty limited in some ways and appeals to some sort of objective third party that just doesn't exist. If you really want to know or have an opinion on a work/theory/book, at the end of the day you have to engage with it yourself on some level.

In graduate school for example, it was pretty painfully obvious that most people didn't actually read a book and come to their own conclusions, but rather read summaries from people they already agreed with and worked backwards from there, especially on more theoretical matters.

I feel like in the long term this just leads to a person superficially knowing a lot about a wide variety of topics, but never truly going deep and gaining real understanding of any of them - it's less "knowing" and more the feeling of knowing.

Again, I'm not saying this in an accusatory way because I totally engage in this behavior too - I think everyone does to some degree - but I just feel the older I get, the less valuable this sort of information is. It's great for broad context and certain situations, I suppose, but in a lot of areas where I consider myself an expert, I would probably strongly disagree with summaries given on subjects, and they also tend to miss finer details or qualifying points that are addressed with proper context.

alex_smart
0 replies
9h50m

IMHO, the good old method of skimming through the table of contents, reading the preface and perhaps the first couple of chapters is going to be a much higher fidelity indicator of whether a book is worth your time than reading an AI generated summary.

hn_throwaway_99
3 replies
19h3m

Why don't I just read a book instead of getting some unpredictable (or predictably unpredictable) synopsis? It's not like there's too much specific information there.

I'm trying to understand this comment, because I couldn't disagree more. It is the absolute explosion of available data sources that has me wanting to be much more judicious with where I spend my time reading/watching in the first place.

Your comment was interesting to me because I feel like I agree with one of its main sentiments: that AI-generated content all kinda "sounds the same" and gives a superficial-feeling analysis. But that is why I think AI is a fantastic tool for summarizing existing information sources, so I can see whether I want to spend any more time digging in to begin with.

makerdiety
2 replies
18h49m

Relying on glorified matrices (that's what machine learning is) for world data curation is just begging to handicap yourself into a cyborg dependent on a mainframe computer's implementation. An implementation and design that is rarely scrutinized for safety and alignment features.

Why not just make your brain smarter, instead of trying to cram foreign silicon components into your skull?

kendalf89
1 replies
17h55m

Why not both?

makerdiety
0 replies
17h22m

Because maximizing both biological vectors of self-improvement and computing based avenues of skill acquisition is limited by the fact that it's a multi-objective optimization problem when you combine them together during maximization. Optimizing one de-optimizes the other. They, biology and computers, conflict with each other in fact. So, at best, you have to reach for a Pareto frontier.

And, it turns out, technology can't be trusted, as there is always some sort of black box associated with its employment. Formally, there is always a comprehension involved when it comes to the development and integration of technology into human life. You can't really trust this stubborn built-in feature of technological and economic success if you don't pierce through its secrets (knowledge is the power to counteract cryptographic objects). After all, it could be a malicious trojan horse that "basic common sense" insists on us all using for "bettering" our daily lives.

A very unfriendly artificial intelligence is trying to sneak through civilization for its own desires. And you're letting it just pass on by, as a result of your compliance with the dominant narrative and philosophy of capitalist economics.

saalweachter
2 replies
21h4m

I was thinking the other day: Star Trek computers make a lot of sense if they are working with our current level of AI.

You can talk to it, it can give you back answers that are mostly correct to many questions, but you don't really trust it. You have real people pilot the ship, aim and fire weapons, and anything else important.

cpeterso
1 replies
20h24m

And nobody in Star Trek thinks the ship computer is sentient. On the other hand, the holodeck sometimes malfunctions and some holodeck character (like Moriarty) becomes sentient running on just a subset of the ship computer. That suggests sentience (in the Star Trek universe) is a property of the software architecture, not hardware.

vinay_ys
0 replies
18h38m

Firstly, they had unlimited energy and replicators - which means they could make whatever hardware they wanted.

And they also had bio-neural circuits. And photonic chips.

So, hardware was already way ahead of software.

All this goes to show that in the real world, the actual science (and fiction) around materials science was already quite advanced compared to software.

JKCalhoun
1 replies
19h33m

I had a conversation with a friend where he suggested that he had had a broad range of experiences just from gaming. I think the context was a conversation about how experiences in life can expand you — something like that.

The whole premise bothered me though.

I can remember a bike ride where I was experiencing the onset of heat stroke and had to make quick decisions to perhaps save my life.

I remembered being lost decades ago in Michigan's Upper Peninsula with my wife, on apparently some logging road, the truck getting into deeper and deeper snow as we proceeded, until I made the decision to turn around and go back the way we came lest we become stranded in the middle of nowhere.

I remember having to use my wits, make difficult decisions while hitchhiking from Anchorage, Alaska to the lower 48 when I was in my early twenties....

The actual world, the chance of actual death, strangers, serendipity ... no amount of VR or AI really compares.

smoldesu
0 replies
18h58m

You're not wrong, but I also think the problem predates video games. Films, novels and even religious texts are all scrutinized for changing people's perspective on life. Fiction has a longstanding hold on society, but it inherently coexists with the "harsh reality" of survival and resource competition. Introducing video games into the equation is like re-hashing the centuries-old Alice in Wonderland debate.

Playing video games all day isn't an enriching or well-rounded use of time, but neither is throwing yourself into danger and risk all the time. The real world is a game of carefully-considered strategy, where our response to hypothetical situations informs our performance during real ones. Careful reflection on fiction can be a philosophically powerful tool, for good or bad.

tmaly
0 replies
18h46m

100 percent on things moving to the center of the curve.

For now, that’s not a bad thing if you need to know what the average information is.

As time goes by it might not be a good thing.

mrbonner
0 replies
20h49m

I just read the “Robust Python” book. My overall reaction is that the book could have been written at half the length and still be valuable for me. I can't stop thinking that if I could ask an LLM to summarize each chapter for me, I could still "read" the whole book in the manner the author outlines but save a ton of time.

tomrod
5 replies
21h33m

What is your workflow for this, if you don't mind me asking?

jerpint
2 replies
20h42m

If you’re interested, I did a YouTube video and a short blog post about it:

https://www.jerpint.io/blog/yougptube/

https://www.youtube.com/watch?v=WtMrp2hp94E
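
Roughly, the pipeline boils down to something like this. This is only a minimal sketch of that kind of flow (it assumes yt-dlp and the current OpenAI Python client, with placeholder model and file names), not the exact code from the post:

    import yt_dlp
    from openai import OpenAI

    URL = "https://www.youtube.com/watch?v=oSCRZkSQ1CE"  # the talk, as an example

    # 1. Download just the audio track (written to talk.m4a)
    with yt_dlp.YoutubeDL({"format": "bestaudio[ext=m4a]", "outtmpl": "talk.m4a"}) as ydl:
        ydl.download([URL])

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # 2. Audio -> text with Whisper (long talks may need to be split into chunks;
    #    the transcription endpoint has a file-size limit)
    with open("talk.m4a", "rb") as audio:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

    # 3. Feed the transcript to an LLM for the summary (chunk it first if it
    #    exceeds the model's context window)
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": "Summarise this talk in ~10 bullet points:\n\n" + transcript.text}],
    )
    print(completion.choices[0].message.content)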

kendalf89
1 replies
18h0m

This is pretty cool. Would it be possible to just stream the audio directly into Whisper, maybe using something like VLC at 2x play speed, to get the summary faster?

jerpint
0 replies
7h2m

Probably. The OpenAI API has gotten a lot better since I made that post, though if you stream audio at 2x speed you should expect a drop in quality, since on average most clips Whisper is trained on are not at 2x.

sammyatman
0 replies
20h51m

Try out www.askyoutube.ai!

nomilk
0 replies
21h17m

My approach wasn't fancy, I just asked Bard (aka Gemini). I was drawn to Bard/Gemini for this since the source video is on YouTube, so I figured Google would better support its related service (although that was an arbitrary hunch).

https://imgur.com/a/psb64IP

sammyatman
2 replies
20h52m

This is exactly why I built https://www.askyoutube.ai. It helps you figure out if a video has the answer you want before you spend time watching it. It does this by aggregating information from multiple videos in one go.

I don't think it completely replaces watching videos in some cases but it definitely helps you skip the fluff and even guides you to the right point in the video.

esafak
1 replies
17h28m

Do you transcribe the videos or use the captions? GPT-4 can already do the latter.

sammyatman
0 replies
15h19m

It can be either, depending on the mode. I don't think GPT-4 can already do the latter, though.

name_nick_sex_m
2 replies
21h15m

What tool do you use to summarize video?

nomilk
0 replies
21h13m

(since it's a youtube video) I used bard/gemini: https://imgur.com/a/psb64IP

I have no idea if it's the best (or even a good) tool. Other commenters suggest some other tools (for both text summaries and condensed video summaries - a sort of 'highlights reel'):

https://news.ycombinator.com/item?id=39435930

https://news.ycombinator.com/item?id=39435964

kirill5pol
0 replies
14h48m

(Little self-plug) I made a tool that’s pretty relevant

https://www.platoedu.org/videos/oSCRZkSQ1CE/watch

It's not really giving summaries, but it gives topic/section timestamps and highlights what was discussed.

(Main focus is actually making mini-courses off of YouTube videos but I found the section summaries really useful for figuring out which parts to watch)

63
2 replies
22h47m

and debating the summary's accuracy

Perhaps consider simply reading the description for an accurate summary.

From the description:

Abstract: In this talk I’ll highlight several exciting trends in the field of AI and machine learning. Through a combination of improved algorithms and major efficiency improvements in ML-specialized hardware, we are now able to build much more capable, general purpose machine learning systems than ever before. As one example of this, I’ll give an overview of the Gemini family of multimodal models and their capabilities. These new models and approaches have dramatic implications for applying ML to many problems in the world, and I’ll highlight some of these applications in science, engineering, and health. This talk will present work done by many people at Google.

swyx
0 replies
21h46m

Sure, it's an accurate summary, but is it at the granularity or specificity that you want? LLM summaries let you move around the latent space of summaries, and you probably don't agree with the one chosen for YouTube descriptions.

nomilk
0 replies
22h41m

In this case the video description contains a useful Abstract. AI summaries can offer additional value though, going into more/less detail (as desired), and allowing you to ask follow-up questions to drill into anything potentially of interest.

theGnuMe
0 replies
13h59m

What AI tool do you use to summarize?

ren_engineer
0 replies
21h58m

I've A/B tested this with webinars, and the tools I've tried tend to miss some really valuable/interesting stuff even when I give them the full transcript. The same goes for when I try to use ChatGPT or other tools for full interactive analysis: even when I basically hand it what I'm looking for, as if I hadn't watched the video, it will leave out the critical information.

munificent
0 replies
18h2m

1. Author spends a week producing a video when writing an article would have taken a day.

2. Viewer spends hours summarizing the video to an article so they don't have to watch it.

P R O G R E S S

kirill5pol
0 replies
16h11m

I made a tool that might be interesting for people here!

https://www.platoedu.org/videos/oSCRZkSQ1CE/watch

It's not really giving summaries, but it gives topic/section timestamps and highlights what was discussed.

(for example: The Transformer Model (21:06 - 24:48) - Introduction of the Transformer model as a more efficient alternative to recurrent models for language processing)

The main focus is actually creating Anki-like spaced repetition questions/flashcards for videos and lectures you watch to retain knowledge, but I found the section information quite helpful for finding which parts of the video contain the info relating to topics/concepts

castles
0 replies
19h0m

Learning how to summarise effectively might save hours per week and improve decisions about which sources deserve our limited time/attention.

If you like summaries, you'll probably love de-summaries (WIP): https://socontextual.com/

paxys
17 replies
21h58m

Parts of it were good, but it mostly felt like he was reading through a slideshow created by the Google marketing team.

babl-yc
11 replies
21h32m

Agreed -- given the talk is titled "Exciting trends in Machine Learning" it feels pretty incomplete to gloss over ChatGPT blowing up.

OpenAI bet big on (1) emergent behaviors as you scale language models and (2) RLHF / fine-tuning to follow instructions.

Both those topics were lightly covered but very much from the "Google did this" perspective (word2vec, step-by-step reasoning).

lern_too_spel
6 replies
20h40m

The original paper on emergent behaviors in LLMs is from Google: https://arxiv.org/abs/2206.07682

The difference between OpenAI and Google isn't due to different research directions between the two firms. It is a difference in ability to execute.

pama
3 replies
18h31m

This paper was interesting at the time, but the main sentence in it was wrong: “Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models.” Such behaviors can totally be predicted if one looks at how the likelihoods of these behaviors evolve with scale and extrapolates from the small models.
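
A toy illustration of that point (entirely made-up numbers, assuming the per-token log-likelihood follows a smooth power law in scale): the continuous metric extrapolates cleanly from the small models, while a thresholded exact-match metric computed from the very same curve looks like a sudden jump.

    import numpy as np

    # Assumed toy scaling law: per-token log-likelihood of the correct answer
    # improves smoothly (and predictably) with model scale.
    scales = np.logspace(6, 11, 6)             # 1e6 .. 1e11 "parameters"
    log_lik = -2.0 * (scales / 1e6) ** -0.3    # smooth power-law improvement

    # A 10-token answer is scored correct only if every token is right,
    # so exact-match probability is exp(10 * per-token log-likelihood).
    p_exact = np.exp(10 * log_lik)

    for n, ll, p in zip(scales, log_lik, p_exact):
        print(f"{n:9.1e} params   log-lik/token {ll:6.2f}   exact-match {p:.3f}")

The exact-match column goes from roughly 0 to roughly 0.5 only at the largest scales, even though nothing discontinuous happened in the underlying likelihood.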

canjobear
1 replies
16h59m

Maybe you can predict emergent abilities post hoc, but I recall no one predicting beforehand that, for example, a pure language model could do translation simply by giving it a prompt that said “translate to French”.

jcuenod
0 replies
4h34m

Then they're not "emergent"

benlivengood
0 replies
17h46m

I don't think many-shot prompting or chain-of-thought were predictable from model size. They just showed up at a particular model+data size.

seanbethard
0 replies
17h45m

From the GPT-2 paper: ``Also thanks to all the Googlers who helped us with training infrastructure.''

babl-yc
0 replies
20h34m

Google certainly had influential papers on language models having emergent behaviors, but this isn't the one that inspired scaling up GPT. It was published in August 2022, and GPTs at OpenAI were being scaled up long before this.

elevatedastalt
3 replies
20h38m

The research breakthroughs all happened at Google, so it's not surprising.

janalsncm
2 replies
19h33m

RLHF was invented by OpenAI.

DPO was invented at Stanford.

seanbethard
1 replies
17h44m

Exactly. Google.

candiodari
0 replies
15h45m

And all the people that actually did the inventing promptly left Google.

(yes, I get there are 2 exceptions)

godelski
2 replies
16h34m

What I hate about the ML community is that papers have become ad pieces now. I'm all for them releasing technical papers, but do we have to fuck up the review system for it? We don't need to railroad our research community, and doing so is pretty risky. I don't buy the scale-is-all-you-need side. But if you do, it doesn't matter if I'm wrong, we'll just get there when Sam gets his $7T. But if I'm right, we need new ideas if we're going to keep progressing without missing a step.

seanbethard
1 replies
16h5m

Domain adaptation across verticals is the only driver of innovation. Check out the NeSy computation engine. Its ``semantic parsing'' is domain parsing and its symbols are numeric. Scale if you want. It works for images. That's all that matters, right?

Mapping language onto the latent space of images gets you crude semantic attributes sometimes. If you have to push for multimodal out of the gate maybe start with articulatory perception. These LVMs aren't going to cut it.

ML research isn't meant to further your understanding of anything. You can't separate it from corporate interest and land grabbing. It's the same thing re-hashed every year by the same people. NLP is pretty much a subfield of IR at this point.

I love how fast my comment was blacklisted to the bottom of this thread. lol

godelski
0 replies
1h54m

I think you need to understand your audience better. Remember that most people have a very shallow understanding of DL systems. You get blog posts that talk about attention from a mathematical perspective but don't even hint at softmax tempering and just mention that there's a dot product. Or ML researchers who don't know things like why doubling the batch size doesn't cut training time in half, or why using fp16 doesn't cut memory in half. So I think toning down the language, being clearer and adding links will help you be more successful in communicating.

throwaway4good
0 replies
13h47m

I watched some of this after it was promoted to me on YouTube.

It is very much: look at this other cool black box that Google has made - I think it has a helpful personality.

And sure, applied AI/ML is a part of computer science, but I had hoped it would be more of a walk-through of what has happened in terms of advances in theory, algorithms, architecture, and training methodology, and perhaps some sort of explanation of why it works to give this general purpose model a bunch of medical data. (Is it merely an elaborate fuzzy search, how does it extrapolate, is there actually reasoning going on and how does it emerge from a neural net, etc...)

knbknb
0 replies
11h53m

Maybe this was a review talk, designed for an audience of undergraduate students from various majors.

adamnemecek
16 replies
22h55m

All machine learning is convolution.

Legend2440
10 replies
22h42m

Only certain types of neural networks use convolutions. It's not universal.

CamperBob2
5 replies
22h26m

What are some examples of types that don't?

dontwearitout
2 replies
21h57m

The transformer module is currently dominating ML, and is widely used in text, vision, audio, and video models. It was introduced in 2017 and shows no real signs of being displaced. It has no convolutions.

https://en.wikipedia.org/wiki/Transformer_(deep_learning_arc... http://jalammar.github.io/illustrated-transformer/

adamnemecek
0 replies
21h33m

CamperBob2
0 replies
19h22m

If they use dot products on at least one layer with fully-connected inputs, which they do, along with everything else derived from the basic MLP model, then they're technically performing convolution.

Of course, the convolution concept breaks down when nonlinear activation functions are introduced, so I'm not sure the equivalence is really all that profound.

hackerlight
0 replies
22h0m

I think anything that doesn't have an explicit convolution layer? Transformer, MLP, RNN don't automatically have a convolution layer, although for many tasks you can add it in if you want.

Legend2440
0 replies
21h32m

Transformers, MLPs.

adamnemecek
3 replies
21h41m

No, ALL machine learning architectures are convolutions https://grlearning.github.io/papers/11.pdf

Legend2440
2 replies
20h30m

Reading the paper, they're saying convolutions are powerful enough to express any possible architecture. But that's just computational universality - this does not mean they are convolutions.

atq2119
1 replies
16h54m

Every matrix multiply can be viewed as a 1x1 convolution, just like every convolution can be viewed as an (implicit, for larger than 1x1 kernels) matrix multiply.

I'm not sure this is particularly enlightening, but it's probably one small step of understanding that is required to truly "get" the underlying math.

Edit: Should have said that every matrix multiply against weights is a convolution. The matrix multiply in the standard attention block (query times keys) really doesn't make sense to be seen as a convolution.
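
A quick way to check the first half of that statement (a throwaway PyTorch sketch with toy shapes, not anything from the paper):

    import torch
    import torch.nn.functional as F

    x = torch.randn(8, 64)    # batch of 8 vectors with 64 features
    w = torch.randn(128, 64)  # weight matrix mapping 64 -> 128 features

    matmul_out = x @ w.t()    # plain matrix multiply against weights, shape (8, 128)

    # The same computation expressed as a 1x1 convolution over a one-pixel "image":
    conv_out = F.conv2d(
        x.view(8, 64, 1, 1),    # each vector becomes a 64-channel 1x1 image
        w.view(128, 64, 1, 1),  # each output channel is one row of w
    ).view(8, 128)

    print(torch.allclose(matmul_out, conv_out, atol=1e-5))  # True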

adamnemecek
0 replies
3h2m

Integral transform is the right abstraction level. Matmul is an integral transform.

dartos
3 replies
22h41m

All computers do is add.

tambourine_man
2 replies
21h21m

And conditional jumps

dartos
1 replies
6h50m

What is a jump but an addition to the instruction pointer?

tambourine_man
0 replies
1h10m

True

c0pium
0 replies
19h51m

Random forest would like a word.

rvnx
7 replies
22h53m

At 1:00:18 it's claimed, "more than just publications" and that products are live for the public to use.

And they mention something that sounds cool. It's called DermAssist; it's for finding similar images related to skin diseases.

However, when checking the website, you are supposed to join a waitlist, but then it 404s.

Has the tool already been shut down, or was it never released?

nsoldiac
2 replies
19h9m

Hi, my partner worked in technology strategy for a large healthcare system, where she saw many derm AI-based applications evaluated over the last decade. (Dis)incentives aside, they all followed a similar story arc: overpromising in their research findings and underdelivering in actual care delivery. They've been around for longer than you'd imagine. I hope they reach their potential, but I suggest approaching them with healthy skepticism.

JohnCClarke
1 replies
11h3m

For an example of medical AI that delivers take a look at https://cydarmedical.com/

Real-time augmented reality for keyhole aortic surgery using machine vision to understand patients' individual anatomies.

Full disclosure: I was the original CTO.

rvnx
0 replies
10h26m

Looks awesome :)

realprimoh
1 replies
22h14m

It's not shut down, I'm pretty sure. Looks like a bug. Hopefully it will be fixed soon.

rvnx
0 replies
22h11m

Let's hope so, because it seems really useful.

In the meantime, I found an open-source equivalent https://modelderm.com/en.html but if we could see the certified + tested tool it'd be nice :)

dr_kiszonka
1 replies
21h20m

If you are in the US, "DermAssist is not available in the United States"(https://health.google/consumers/dermassist/)

htrp
0 replies
9h49m

That's a new one... so where is it available?

karmasimida
4 replies
17h28m

Not to throw shade at anyone, but even Jeff Dean isn't good at predicting the future.

The Pathways model/architecture isn't exactly where things are moving towards; LLMs are.

https://blog.google/technology/ai/introducing-pathways-next-...

knowriju
3 replies
17h4m

Care to share some topics that things are moving towards? I understand diffusion, GANs and Mamba are in vogue these days, but those are different logical architectures. I am unsure where the next level of ML physical architecture research is heading.

karmasimida
2 replies
15h21m

I think at this rate everything is moving towards Transformer-based models (text/audio/image/video). As Sora has shown, there isn't really anything the Transformer can't do; it can generate both real-life-quality photos and video. Its ability to fit ANY given distribution is beyond compare - it is the most powerful neural network we have ever designed, and nothing else is even close.

GANs, on the contrary, are not hot anymore in industry; diffusion models have achieved high fidelity in image generation, and it's hard to see how GANs can make a comeback. They are faster, but image generation in terms of quality is done, and the wow factor is no more.

This might be a hot take, but I think architectural change is going to die down in industry; the Transformer is the new MOS transistor. With billions of dollars pumping into making it run faster AND cheaper, alternative architectures are going to have a hard time competing.

espadrine
1 replies
11h32m

There is no question in my mind that the transformer architecture will not stop evolving. Already now, we are stretching the definition by calling current models transformers; the 2017 transformer had an encoder block, which is nearly always absent nowadays, and the positional encoder and multi-head attention have been substantially modified.

VRAM costs and latency constraints will drive architectural changes, which Mamba hints at: we will have less quadratic scaling in the architectures that transformers evolve into, and likely the attention mechanism will look more and more akin to database retrieval (with far more evolved database-querying mechanisms than are often seen in relational databases). One day, the notion of a maximum context size will be archaeological. Breaking the sequentiality of predicting only the next token would also improve throughput, which could require changes. I expect experts to also evolve into new forms of sparsity. More easily quantizable architectures may also emerge.

karmasimida
0 replies
11h1m

The original transformer is an encoder-decoder model, where the decoder is what led to the first GPT model. Except that you need to feed the encoder states to the decoder attention module in the original proposal, it is basically the same decoder-only model. I would argue the decoder-only model is even simpler in that regard.

When it comes to the core attention mechanism, it is surprisingly stable compared to other techniques in neural networks. There is the QKV projection, then dot-product attention, then two layers of FFN. Arguably the most influential change regarding attention itself is multi-query/grouped attention, but that is still, imo, a reasonably small change.

If you look back at convolutional NNs, their shapes and operators just changed every six months back in the day.

At the same time, the original transformer today is still a useful architecture, even in production; some BERT models must still be hanging around.

Not that I am saying it didn't change at all, but the core stays very much stable across countless revisions. If you read the original transformer paper, you already understand 80% of what the LLaMA model does; the same thing can't be said for other models, is what I meant.
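
For reference, that core block, with multi-head splitting, masking, residuals and layer norm stripped away, fits in a few lines (a toy sketch with made-up dimensions, not any particular model's code):

    import torch
    import torch.nn.functional as F

    def core_block(x, wq, wk, wv, wo, w1, w2):
        # 1. QKV projection
        q, k, v = x @ wq, x @ wk, x @ wv
        # 2. Scaled dot-product attention (single head, no mask)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        h = F.softmax(scores, dim=-1) @ v @ wo
        # 3. Two-layer feed-forward network
        return F.relu(h @ w1) @ w2

    d, f = 8, 32                          # toy model width and FFN width
    x = torch.randn(4, d)                 # a "sequence" of 4 token embeddings
    params = [torch.randn(d, d) for _ in range(4)] + [torch.randn(d, f), torch.randn(f, d)]
    print(core_block(x, *params).shape)   # torch.Size([4, 8])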

zyklonix
2 replies
19h58m

nliang86
1 replies
19h45m

Thanks for the post - I put together the tool above. I tried to strike a balance between being concise and capturing all the important details. For that reason, the tool is hit or miss on longer (> 45 min) videos - the summary of this video is good, but I've seen it omit important details on other long videos. The tool also captures relevant screenshots for each section.

Hopefully it's helpful. You can summarize additional videos by submitting a youtube URL in the nav bar or the home page. Also, feedback welcome!

crudalex
0 replies
19h36m

Are you using an LLM to summarize? If yes, can you share the prompts used?

seanbethard
1 replies
18h16m

There's a typo in the title of the talk but don't worry I fixed it: Trends in Computer Vision

``In recent years, ML has completely changed our view of what is possible with computers''

In recent years ML has completely changed our view of what's possible with computer vision.

``Increasing scale delivers better results''

This is true for computer vision.

``The kinds of computations we want to run and the hardware on which we run them is changing dramatically''

Optimizations on operations for computer vision aren't exactly dramatic change. Who is "we"?

Trends in Machine Learning, 2010: semantic search for advertising. Trends in Machine Learning, 2024: semantic search for advertising, short-form video content.

https://blog.seanbethard.net/five-epistemes/

seanbethard
0 replies
15h35m

lol. Three computer vision researchers dislike this comment. Do any of you want to respond to it?

opisthenar84
1 replies
18h20m

I'm surprised he didn't mention much about vector search

H8crilA
0 replies
12h34m

It is a finished thing, no? So many systems/products have had it built in for quite a few years now.

npalli
1 replies
21h25m

Seeing a lot of people talk about auto-summary tools which reminds me of a joke.

“I took a speed-reading course and read War and Peace in twenty minutes. It involves Russia.” ― Woody Allen

Mistletoe
0 replies
18h28m

This is such a great comment about so many things right now. Thank you.

wslh
0 replies
11h3m

I think this is the time when the idiom "it's good fishing in troubled waters" makes complete sense.

This is a moment where it is wise to wait and see. Investing in building AI/AGI startups now is like being a fish more than a fisherman. An outlier could win market share, but it will be only one among zillions. Google is catching up to OpenAI, but their competition is fruitful for all. Typical oligopoly. A single startup showing good traction will be immediately acquired.

From the business perspective, it is time to focus on "go to market" execution and spend less or nothing on research. The research results will come along on their own, unless you are one of the top scientific teams in the world or someone like Ramanujan.

osigurdson
0 replies
8h44m

One thing I think should be fairly clear is that Google will actually try to compete in this space (vs. quitting if it isn't an immediate success, as they usually do). Traditional web search will be fairly uncommon in the long run, I think.

idkdotcom
0 replies
20h28m

It's a nice high-level overview of the state of the art in machine learning.

If you have watched his other talks, Jeff Dean generally does a very good job explaining things from a high level point of view.

Here is a controversial view: I think that the current neural-network-driven approach, coupled with massively distributed computing, has plateaued.

For the machine learning field to move forward, it will need a different, less data/compute hungry paradigm.

denfromufa
0 replies
18h44m

The highlight of this event was running with Jeff at Rice University before his talk:

https://x.com/JeffDean/status/1756319820482592838?s=20