HN comments for: Ask HN: What things are happening in ML that we can't hear over the din of LLMs?

lelag

2024-03-28 10:13:31 UTC

Some exciting projects from the last months:

- 3D scene reconstruction from a few images: https://dust3r.europe.naverlabs.com/

- Gaussian avatars: https://shenhanqian.github.io/gaussian-avatars

- relightable Gaussian codec avatars: https://shunsukesaito.github.io/rgca/

- track anything: https://co-tracker.github.io/ https://omnimotion.github.io/

- segment anything: https://github.com/facebookresearch/segment-anything

- good human pose estimation models (YOLOv8, Google's MediaPipe models)

- realistic TTS: https://huggingface.co/coqui/XTTS-v2, Bark TTS (hit or miss)

- great open STT (mostly Whisper-based)

- machine translation (e.g. SeamlessM4T from Meta)

It's crazy to see how much is coming out of Meta's R&D alone.

nicce

2024-03-28 10:46:18 UTC

It's crazy to see how much is coming out of Meta's R&D alone.

They have the money...

logtempo

2024-03-28 10:54:41 UTC

and data

teaearlgraycold

2024-03-28 12:08:42 UTC

Hundreds of thousands of H100s…

FLT8

2024-03-28 12:17:24 UTC

And a dystopian vision for the future that can make profitable use of the above ...

AYBABTME

2024-03-28 12:24:25 UTC

On the plus side, people make up the organization and when they eventually grow fed up with the dystopia, they leave with their acquired knowledge and make their own thing. So dystopias aren't stable in the long term.

dmd

2024-03-28 12:32:23 UTC

The Ones Who Walk Away From O-Meta-s

randrus

2024-03-28 12:51:56 UTC

A very apt reference to the story

The Ones Who Walk Away from Omelas

Dunno how pasting a link works but here it is:

https://shsdavisapes.pbworks.com/f/Omelas.pdf

refulgentis

2024-03-28 13:26:11 UTC

I feel vaguely annoyed, I think it's because it took a lot of time to read through that, and it amounts to "bad to put child in solitary confinement to keep whole society happy."

What does a simplistic moral set piece about the abhorrence of sacrificing the good of one for the good of many have to do with (checks notes) Facebook? Even as vague, hand-wavy criticism, wouldn't Facebook be the inverse?

teaearlgraycold

2024-03-28 12:49:59 UTC

For some people this is a stable dystopia.

karaterobot

2024-03-28 13:33:35 UTC

So dystopias aren't stable in the long term.

Unless they think to hire new people.

falcor84

2024-03-28 12:54:08 UTC

That seems to rely on the assumption that human input is required to keep the dystopia going. Maybe I've watched too much sci-fi, but the more pessimistic view is that the AI dystopia will be self-sustaining and couldn't be overcome without the concerted use of force by humans. But we humans aren't even that good at agreeing on common goals, let alone exerting continuous effort to achieve them. And most likely, by the time we even start thinking about organizing, the AI dystopia will be conducting effective psychological warfare (using social media bots, etc.) to pit us against each other even more.

HPsquared

2024-03-28 13:29:48 UTC

So the dystopia spreads out... Metastasis

joshspankit

2024-03-28 14:31:32 UTC

and (rumours say) engineers who will bail if Meta doesn’t let them open source

turnsout

2024-03-28 13:45:35 UTC

Whoa, Bark got a major update recently. Thanks for the link as a reminder to check in on that project!

lelag

2024-03-28 15:24:09 UTC

Can you share which update you are referring to?

I played with Bark quite extensively a few months ago and I'm on the fence about that model: when it works, it's the best, but I found it pretty useless for most of the use cases I want TTS for because of the high rate of bad or weird output.

I'm pretty happy with XTTS-v2, though. It's reliable and the output quality is still pretty good.

jusgu

2024-03-28 17:56:11 UTC

Not sure how relevant this is, but note that Coqui (the company behind the realistic TTS above) has already shut down

https://coqui.ai

JL-Akrasia

2024-03-28 17:37:21 UTC

- streaming and rendering 3D movies in real time using 4D Gaussian splatting: https://guanjunwu.github.io/4dgs/

mike_hearn

2024-03-28 09:15:01 UTC

NeRFs. It's a rethink of 3D graphics from the ground up, oriented around positioning glowing translucent orbs instead of textured polygons. The positions and colors of the orbs are learned by a neural network from accurate multi-angle camera shots and their poses, and then you can render them on GPUs by ray tracing. The resulting scenes are entirely photorealistic, since they were generated from photos, but they can also be explored.

In theory you can also animate such scenes, but how to actually do that is still a research problem.

Whether this will end up being better than really well optimized polygon-based systems like Nanite+photogrammetry is also an open question. The existing polygon pipelines are pretty damn good already.

baxuz

2024-03-28 09:26:23 UTC

I think what you're talking about is Gaussian splats. NeRFs are purely radiance fields, without any sort of explicit 3D representation.

lelag

2024-03-28 09:58:12 UTC

Yes, I think Gaussian splats are where all the rage is.

My limited understanding is that Nerfs are compute-heavy because each cloud point is essentially a small neural network that can compute its value from a specific camera angle. Gaussian splats are interesting since they achieve almost the same effect using a much simpler mechanism, storing Gaussian values at each point of the cloud, which can be computed efficiently in real time on a GPU.

While a NeRF could be used to render a novel view of a scene, it could not do so in real time, whereas Gaussian splats can, which opens up lots of use cases.
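Much of the real-time advantage comes from how splats are drawn: once the 3D Gaussians are projected to the screen, each pixel is shaded by sorting the overlapping Gaussians by depth and alpha-blending them front to back. The sketch below shows only that compositing step for one pixel. The `Splat` fields and the isotropic 2D falloff are simplifying assumptions, not the 3D Gaussian Splatting reference implementation, which uses anisotropic covariances and tile-based GPU rasterization.

```python
import math
from dataclasses import dataclass

@dataclass
class Splat:
    # Hypothetical, simplified splat that has already been projected to screen space.
    cx: float       # screen-space center x
    cy: float       # screen-space center y
    sigma: float    # isotropic screen-space std dev (real 3DGS uses a 2x2 covariance)
    depth: float    # distance from the camera, used for sorting
    opacity: float  # peak opacity in [0, 1]
    color: tuple    # (r, g, b) in [0, 1]

def shade_pixel(splats, px, py):
    """Front-to-back alpha compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for s in sorted(splats, key=lambda s: s.depth):
        d2 = (px - s.cx) ** 2 + (py - s.cy) ** 2
        alpha = s.opacity * math.exp(-0.5 * d2 / s.sigma ** 2)
        for k in range(3):
            color[k] += s.color[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    return tuple(color)

near_red = Splat(0.0, 0.0, 1.0, depth=1.0, opacity=0.5, color=(1.0, 0.0, 0.0))
far_blue = Splat(0.0, 0.0, 1.0, depth=5.0, opacity=1.0, color=(0.0, 0.0, 1.0))
print(shade_pixel([far_blue, near_red], 0.0, 0.0))  # (0.5, 0.0, 0.5)
```

Because the per-pixel work is a sort plus a short loop over explicit primitives, it maps well onto GPU rasterization, whereas a NeRF has to evaluate a network many times per ray.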

Mandelmus

2024-03-28 12:08:54 UTC

My limited understanding is that Nerfs are compute-heavy because each cloud point is essentially a small neural network

There's no point cloud in NeRFs. A NeRF scene is a continuous representation in a neural network, i.e. the scene is represented by neural network weights, but (unlike with 3D Gaussian Splatting) there's no explicit representation of any points. Nobody can tell you what any of the network weights represent, and there's no part of it that explicitly tells you "we have a point at location (x, y, z)". That's why 3D Gaussian Splatting is much easier to work with and create editing tools for.
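To make the contrast concrete: with a NeRF, the only thing you can do with the scene is query it. A network maps a 3D position (and, in the original paper, a viewing direction) to a density and a color. A pixel is rendered by sampling that function along the camera ray and accumulating the samples with the volume-rendering quadrature. In the sketch below, a hand-written analytic function stands in for the trained MLP and ignores viewing direction. Both are assumptions made so the example runs without weights. The point is that the scene contains no list of points you could edit.

```python
import math

def field(x, y, z):
    """Stand-in for a trained NeRF MLP: returns (density, (r, g, b)).
    Hypothetical scene: a uniformly dense red ball of radius 1 centered at z = 3."""
    r = math.sqrt(x * x + y * y + (z - 3.0) ** 2)
    density = 10.0 if r < 1.0 else 0.0
    return density, (1.0, 0.0, 0.0)

def render_ray(origin, direction, near=0.0, far=6.0, n_samples=128):
    """NeRF-style numerical volume rendering along one ray:
    C = sum_i T_i * (1 - exp(-sigma_i * delta)) * c_i,  T_i = prod_{j<i} exp(-sigma_j * delta)."""
    delta = (far - near) / n_samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of each interval
        p = [o + t * d for o, d in zip(origin, direction)]
        sigma, rgb = field(*p)       # the scene can only be *queried* at points
        alpha = 1.0 - math.exp(-sigma * delta)
        for k in range(3):
            color[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(color), transmittance

print(render_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # hits the ball: nearly pure red
print(render_ray((5.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # misses: black, fully transparent
```

Each pixel costs `n_samples` network evaluations, which is a large part of why classic NeRFs struggle to render in real time.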

lelag

2024-03-28 14:30:52 UTC

Interesting. Thanks for the clarification.

sigmoid10

2024-03-28 09:59:55 UTC

Whether this will end up being better than really well optimized polygon based systems like Nanite+photogrammetry is also an open question

I think this is pretty much settled unless we encounter fundamental new theoretical roadblocks on the path of scaling ML compute. Polygon-based systems like Nanite took 40+ years to develop. With Moore's law finally out of the way and Huang's law replacing it for ML, hardware development is no longer the issue. Neural visual computing today is where polygons were in the 80s. I have no doubt that it will revolutionize the industry, if only because it is in principle so much easier for artists and designers to work with. As a near-term intermediate step we will probably see a lot of polygon renderers with neurally generated stuff in between, like DLSS or just artificially generated models/textures. But the stuff we have today is like the Wright brothers' first flight compared to the moon landing. I think in 40 years we'll have comprehensive real-time neural rendering engines. Possibly even rendering output directly to your visual cortex, if medical science can keep up.

WithinReason

2024-03-28 11:37:21 UTC

It's easier to just turn NeRFs/splats into polygons for faster rendering.

sigmoid10

2024-03-28 17:19:54 UTC

That's only true today. And it's quite difficult for artists by comparison. I don't think people will bother with the complexities of polygon based graphics once they no longer have to.

unwind

2024-03-28 09:23:10 UTC

Very cool, thanks! NeRFs = Neural Radiance Fields, here [1] is the first hit I got that provides some example images.

[1]: https://www.matthewtancik.com/nerf

phireal

2024-03-28 11:31:39 UTC

There are a couple of Computerphile videos on this:

NeRFs: https://youtu.be/wKsoGiENBHU

Gaussian splatting: https://youtu.be/VkIJbpdTujE

dmarchand90

2024-03-28 10:03:41 UTC

To plug my own field a bit: in materials science and chemistry there is a lot of excitement about using machine learning to get better simulations of atomic behavior. This can open up exciting areas in drug and alloy design, maybe finding new CO2-capturing materials or better cladding for fusion reactors, to name just a few.

The idea is that to solve these problems you need to solve the Schrödinger equation (1). But the Schrödinger equation scales really badly with the number of electrons and can't be computed directly for more than a few sample cases. Even density functional theory (DFT), the most popular approximation that is still reasonably accurate, scales as N^3 in the number of electrons, with a pretty big prefactor. A reasonable rule of thumb would be 12 hours on 12 nodes (each node being 160 CPU cores) for 256 atoms. You can play with settings and increase your budget to maybe reach 2,000 atoms (and only for a few timesteps), but good luck beyond that.
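A back-of-the-envelope check shows why a few thousand atoms is the practical ceiling. Assume cost grows as N^3 and the number of electrons is proportional to the number of atoms (true for a fixed composition). Under those assumptions, the 12-hours-on-12-nodes rule of thumb for 256 atoms extrapolates as shown below. This models only the cubic term. Real codes have other terms and lose parallel efficiency, so treat the numbers as rough.

```python
def dft_wall_hours(n_atoms, ref_atoms=256, ref_hours=12.0, exponent=3):
    """Extrapolate wall-clock hours from the rule of thumb above,
    assuming cost ~ N^exponent and electrons proportional to atoms."""
    return ref_hours * (n_atoms / ref_atoms) ** exponent

for n in (256, 512, 2000, 10000):
    print(f"{n:>6} atoms: ~{dft_wall_hours(n):,.0f} h on the same 12 nodes")
```

Doubling the system costs 8x, and 2,000 atoms lands around 5,700 hours on the same hardware. That lines up with "only for a few timesteps" unless the budget grows a lot.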

Machine learning seems to be really useful here. In my own work on aluminium alloys I was able to get the same simulations that would have needed hours on the supercomputer to run in seconds on a laptop, or to run simulations with tens of thousands of atoms for long periods of time on the supercomputer. The most famous application is probably AlphaFold from DeepMind.

There are a lot of interesting questions people are still working on:

What are the best input features? We don't have any nice equivalent to CNNs that is universally applicable, though some have tried 3D convnets. One of the best methods right now involves taking spherical-harmonic-based approximations of the local environment in some complex way I've never fully understood, but that is closer to the underlying physics.

Can we put physics into these models? Almost all of these models sometimes fail in dumb ways. For example, if I begin to squish two atoms together they should eventually repel each other, and that repulsive force should grow really fast (OK, maybe they fuse into a black hole or something, but we're not dealing with that kind of esoteric physics here). But all machine learning potentials will by default fail to learn this and will only learn the repulsion down to the closest distance between any two atoms in their training set. Beyond that, they guess wildly. Some people are able to put this physics into the model directly, but I don't think we have it totally solved yet.
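One common way to bake in the short-range physics is to not ask the model to learn it at all. Instead, add a fixed, physically motivated repulsive pair term, and let the network learn only a correction that is smoothly switched off at distances shorter than anything in the training set. The sketch below illustrates that idea. `learned_correction`, the switching radii, and the `a / r^12` repulsion are all hypothetical stand-ins. Production potentials often use a screened-Coulomb form such as Ziegler-Biersack-Littmark (ZBL) instead.

```python
import math

R_MIN_TRAIN = 2.2  # hypothetical: shortest pair distance seen in training (angstrom)
R_SWITCH = 2.6     # hypothetical: below this, the ML correction is blended out

def repulsive_baseline(r, a=1.0e4):
    # Steep physical repulsion that keeps growing as r -> 0, unlike a fitted model.
    return a / r ** 12

def learned_correction(r):
    # Stand-in for a trained model: fine inside its training range, unreliable outside it.
    return -2.0 * math.exp(-(r - 2.9) ** 2)

def switch(r):
    # Smooth step: 0 below R_MIN_TRAIN, 1 above R_SWITCH, cosine blend in between.
    if r <= R_MIN_TRAIN:
        return 0.0
    if r >= R_SWITCH:
        return 1.0
    x = (r - R_MIN_TRAIN) / (R_SWITCH - R_MIN_TRAIN)
    return 0.5 * (1.0 - math.cos(math.pi * x))

def pair_energy(r):
    # Physics handles the short range; the ML term only acts where it was trained.
    return repulsive_baseline(r) + switch(r) * learned_correction(r)

for r in (0.5, 1.0, 1.5, 2.4, 3.0):
    print(r, pair_energy(r))
```

Below `R_MIN_TRAIN`, the energy comes entirely from the baseline, so squeezing atoms together always costs more energy, whatever the network would have guessed.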

How do we know which atomic environments to simulate? These models can really only interpolate; they can't extrapolate. But while I can get an intuition for interpolation in low dimensions, once your training set consists of many features over many atoms in 3D space, this becomes a high-dimensional problem. In my own experience, I can get really good energies for the shearing behavior of strengthening precipitates in aluminum without directly putting those structures in. But was this extrapolated or interpolated from the other structures? Not always clear.
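A cheap and admittedly crude guard against silent extrapolation is to check, feature by feature, whether a new structure's descriptors fall inside the range covered by the training set. In high dimensions, being inside the per-feature bounding box does not guarantee interpolation, which is exactly the difficulty described above. Being outside it, however, is a reliable red flag. The feature vectors here are assumed to be whatever fixed-length descriptors the model consumes.

```python
def fit_bounds(train_features):
    """Per-feature (min, max) over the training descriptors."""
    columns = list(zip(*train_features))
    return [(min(c), max(c)) for c in columns]

def out_of_range(features, bounds, tol=0.0):
    """Indices of features outside the training range (definite extrapolation).
    An empty result does NOT prove interpolation in high dimensions."""
    return [i for i, (x, (lo, hi)) in enumerate(zip(features, bounds))
            if x < lo - tol or x > hi + tol]

train = [[1.0, 0.2], [2.0, 0.5], [1.5, 0.9]]  # hypothetical 2-feature descriptors
bounds = fit_bounds(train)
print(bounds, out_of_range([1.2, 0.4], bounds), out_of_range([2.5, 0.4], bounds))
```

Structures that get flagged are natural candidates for the next round of expensive reference calculations.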

(1) Sometimes also the relativistic Dirac equation, e.g. because inner electrons in some of the heavier elements move at relativistic speeds.

mynameismon

2024-03-28 11:10:05 UTC

In my own work on aluminium alloys I was able to get the same simulations that would have needed hours on the supercomputer to run in seconds on a laptop.

Could you elaborate on this further? How exactly were the simulations sped up? From what I could understand, were the ML models able to effectively approximate the Schrödinger equation for larger systems?

dmarchand90

2024-03-28 11:45:53 UTC

What you do is compute a lot of simulations with the expensive method. Then you train neural networks on them (well, any regression method you like).

Then you can use the trained method on new arbitrary structures. If you've done everything right you get good, or good enough results, but much much faster.

At a high level it's the same pipeline as in all ML. But some aspects are different; e.g., unlike image recognition, you can generate training data on the fly by running more DFT simulations.
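The workflow just described can be sketched end to end: expensive reference calculations, fitting a regressor, then querying the cheap model. Everything here is a stand-in. `expensive_energy` is a Lennard-Jones dimer playing the role of DFT, and the "model" is a linear least-squares fit on two hand-picked features, chosen only so that the example needs no libraries. A real machine-learned potential would use many-body descriptors and a neural network or kernel model.

```python
def expensive_energy(r, eps=1.0, sigma=1.0):
    """Stand-in for a DFT calculation: Lennard-Jones dimer energy."""
    return 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

def features(r):
    # Hypothetical descriptor: two inverse powers of the pair distance.
    return (r ** -12, r ** -6)

def fit_linear(xs, ys):
    """Ordinary least squares for y = w0*f0 + w1*f1 via the 2x2 normal equations."""
    a00 = sum(f[0] * f[0] for f in xs)
    a01 = sum(f[0] * f[1] for f in xs)
    a11 = sum(f[1] * f[1] for f in xs)
    b0 = sum(f[0] * y for f, y in zip(xs, ys))
    b1 = sum(f[1] * y for f, y in zip(xs, ys))
    det = a00 * a11 - a01 * a01
    return ((a11 * b0 - a01 * b1) / det, (a00 * b1 - a01 * b0) / det)

# 1) Run the expensive method on a set of training configurations.
train_r = [0.95 + 0.05 * i for i in range(30)]
train_y = [expensive_energy(r) for r in train_r]

# 2) Train the cheap surrogate.
w = fit_linear([features(r) for r in train_r], train_y)

def surrogate_energy(r):
    f = features(r)
    return w[0] * f[0] + w[1] * f[1]

# 3) Use the surrogate on new configurations, spot-checking against the expensive method.
for r in (1.12, 1.5, 2.0):
    print(r, surrogate_energy(r), expensive_energy(r))
```

Because the features here happen to match the true functional form, the fit recovers it almost exactly. In real work, the spot-check step is where you decide which extra configurations to send back through DFT.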

fennecfoxy

2024-03-28 11:52:56 UTC

That's pretty cool! It seems like most of ML is just creating a higher dimensional representation of the problem space during training and then exploring that during inference.

I suppose your process would be using ML to get pointed in the "right direction" and then confirming the model's predictions using the expensive method?

aflip

2024-03-28 11:07:09 UTC

tbh I didn't understand most of that, but it sounds exciting.

dmarchand90

2024-03-28 14:24:17 UTC

We want to do computer experiments instead of real-life experiments to discover or improve chemicals and materials. The current way of doing computer experiments is really, really slow and takes a lot of computers. We now have much faster ways of doing the same computer experiments: first we do them the slow way a bunch of times to train a machine learning model. Then, with the trained model, we can do the same simulations but way, way faster. Along the way there are tons of technical challenges that don't show up in LLMs or visual machine learning.

If there is anything unclear that you're interested in, just let me know. In my heart I feel I'm still just a McDonald's fry cook, and none of this is as scary as it might seem :)

rsfern

2024-03-28 12:24:34 UTC

More physical ML force fields is a super interesting topic that I feel like blurs the line between ML and actually just doing physics. My favorite topic lately is parametrizing tight binding models with neural nets, which hopefully would lead to more transferable potentials, but also let you predict electronic properties directly since you’re explicitly modeling the valence electrons

Context for the non-mat-sci crowd: numerically solving Schrödinger essentially means constructing a large matrix that describes all the electron interactions and computing its eigenvalues (iterated to convergence, because the electron interactions depend on the solutions). Density functional theory (for solids) uses a Fourier expansion for each electron (these are the one-electron wave functions), so the complexity of each eigensolve is cubic in the number of valence electrons times the number of Fourier components.
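Here is a toy version of that loop: build a Hamiltonian that depends on the current electron density, diagonalize it, rebuild the density from the occupied eigenvector, and repeat until nothing changes. The system is hypothetical and tiny: two sites, two opposite-spin electrons sharing the lowest orbital, and a mean-field on-site repulsion `U`, loosely in the spirit of a restricted Hartree-Fock mean field. The parameter values are arbitrary. With a 2x2 matrix, the eigensolve has a closed form, which is why no linear-algebra library is needed. A real DFT code runs the same loop structure on a far larger matrix.

```python
import math

def lowest_eigenpair(a, b, d):
    """Closed-form lowest eigenvalue/eigenvector of the symmetric matrix [[a, b], [b, d]], b != 0."""
    lam = 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    v0, v1 = b, lam - a  # satisfies (H - lam*I) v = 0
    norm = math.hypot(v0, v1)
    return lam, (v0 / norm, v1 / norm)

def scf(eps=(0.0, 0.5), t=1.0, U=2.0, mix=0.5, tol=1e-10, max_iter=200):
    """Self-consistent field loop for the hypothetical two-site model.
    n[i] is the per-spin occupation of site i; each electron sees U * (other spin's n[i])."""
    n = [0.5, 0.5]  # initial density guess
    for it in range(max_iter):
        # 1) Build the density-dependent Hamiltonian.
        a = eps[0] + U * n[0]
        d = eps[1] + U * n[1]
        # 2) Diagonalize it and occupy the lowest orbital.
        lam, (c0, c1) = lowest_eigenpair(a, -t, d)
        new_n = [c0 * c0, c1 * c1]
        # 3) Stop when the output density reproduces the input density.
        if max(abs(x - y) for x, y in zip(new_n, n)) < tol:
            return lam, new_n, it
        # Linear mixing of old and new densities keeps the iteration from oscillating.
        n = [(1 - mix) * old + mix * new for old, new in zip(n, new_n)]
    raise RuntimeError("SCF did not converge")

energy, density, iterations = scf()
print(energy, density, iterations)
```

The mixing step matters: for these parameters, the raw density update nearly flips sign on each pass. Blending old and new densities is the simplest form of the mixing schemes that real SCF codes rely on.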

The tight binding approximation is cool because it uses a small spherical harmonic basis set to represent the wavefunctions in real space - you still have the cubic complexity of the eigensolve, and you can model detailed electronic behavior, but the interaction matrix you’re building is much smaller.

Back to the ML variant: it's a hard problem because ultimately you're trying to predict a matrix that has the same eigenvalues as your training data, but there are tons of degeneracies that lead to loads of unphysical local minima (in my experience anyway; this is where I got stuck with it). The papers I've seen deal with it by basically only modeling deviations from an existing tight-binding model, which in my opinion only kind of moves the problem upstream.
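The degeneracy is easy to see in miniature. Any orthogonal change of basis `R H R^T` produces a different matrix with exactly the same eigenvalues, so a loss that compares only eigenvalues cannot tell those matrices apart. The 2x2 sketch below, with hypothetical numbers, simply checks this numerically. In a real tight-binding fit, symmetry and locality constraints remove some of this freedom, but not all of it.

```python
import math

def eigvals_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], ascending."""
    (a, b), (_, d) = m
    mid, half = 0.5 * (a + d), math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return (mid - half, mid + half)

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def rotate(h, theta):
    """Return R H R^T for the 2D rotation R by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    r = [[c, -s], [s, c]]
    rt = [[c, s], [-s, c]]
    return matmul(matmul(r, h), rt)

h = [[-1.0, 0.3], [0.3, 0.5]]  # hypothetical "true" Hamiltonian block
for theta in (0.0, 0.4, 1.1):
    h_alt = rotate(h, theta)
    # Different matrix entries, identical eigenvalues (up to rounding).
    print([[round(v, 3) for v in row] for row in h_alt], eigvals_2x2(h_alt))
```

Every angle gives a different matrix that is equally good under an eigenvalue-only loss. This is why fits tend to wander into unphysical solutions unless extra structure pins the basis down.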

occamschainsaw

2024-03-28 12:09:52 UTC

I am currently working on physics-informed ML models for accelerating DFT calculations and am broadly interested in ML PDE solvers. Overall, I think physics-informed ML (not just PINNs) will be very impactful for computationally heavy science and engineering simulations. Nvidia and Ansys already have "AI" acceleration for their sims.

https://developer.nvidia.com/modulus

https://www.ansys.com/ai

anshumankmr

2024-03-28 09:15:49 UTC

+1 to this, but one might be hard-pressed to find anything nowadays that doesn't involve a transformer model somehow.

sdenton4

2024-03-28 11:42:45 UTC

In the area I'm working in (bioacoustics), embeddings from supervised learning are still consistently beating self-supervised transformer embeddings. The transformers win on held-out data from the training domain (in-domain) but greatly underperform on novel data (generalization).

I suspect that this is because we've actually got a much more complex supervised training task than average (10k classes, multilabel), leading to much better supervised embeddings, and rather more intense needs for generalization (new species, new microphones, new geographic areas) than 'yet more humans on the internet.'

tkulim

2024-03-28 12:10:39 UTC

Hey, that is a field that I am interested in (mostly inspired by a recent museum exhibition). Do you have recent papers on this topic, or labs/researchers to follow?

sdenton4

2024-03-28 12:35:06 UTC

It's a really fun area to work in, but beware that it's very easy to underestimate the complexity. It's also very easy to do things that look helpful but actually are not (e.g., improving classification on Xeno-canto while degrading performance on real soundscapes).

Here's some recent-ish work: https://www.nature.com/articles/s41598-023-49989-z

We also run a yearly Kaggle competition on birdsong recognition, called BirdCLEF. We should be launching this year's edition this week, in fact!

Here's this year's competition, which will be a dead link for now: https://www.kaggle.com/competitions/birdclef-2024

And last year's: https://www.kaggle.com/competitions/birdclef-2023

PaulHoule

2024-03-28 14:08:22 UTC

In text analysis people usually get better results in many-shot scenarios (supervised training on data) vs zero-shot (give a prompt) and the various one-shot and few-shot approaches.

TheDudeMan

2024-03-28 09:23:39 UTC

Same sentiment here. Love the question, but transformers are still so new and so effective that they will probably dominate for a while.

We (humans) are following the last thing that worked (imagine if we could do true gradient descent on the algorithm space).

Good question, and I'm interested to hear the other responses.

antegamisou

2024-03-28 11:22:30 UTC

but transformers are still so new and so effective that they will probably dominate for a while.

They're mostly easy grant money, and entire research groups worldwide are gaming them so they look effective in published papers. State of academia...

ok_dad

2024-03-28 10:31:53 UTC

Anyone know anything I can use to take video of a road from my car (a phone) and create a 3D scene from it? More focused on the scenery around the road as I can put a road surface in there myself later. I’m talking about several miles or perhaps more, but I don’t mind if it takes a lot of processing time or I need multiple angles, I can drive it several times from several directions. I’m trying to create a local road or two for driving on in racing simulators.

Jedd

2024-03-28 11:45:32 UTC

Photogrammetry is the keyword you're looking to search on.

There are quite a few advanced solutions already (predating LLMs/ML).

0_____0

2024-03-28 12:58:43 UTC

SLAM from monoscopic video. I imagine without an IMU or other high quality pose estimator you'll need to do a fair bit of manual cleanup.

sp332

2024-03-28 12:31:13 UTC

Microsoft's PhotoSynth did this years ago, but they cancelled it.

chpatrick

2024-03-28 13:19:38 UTC

You can do this for free now with RealityCapture, not ML though.

WhatIsDukkha

2024-03-28 12:53:57 UTC

Gaussian splatting, there is quite a bit of youtube about it and there are commercial packages that are trying to make a polished experience.

https://www.youtube.com/@OlliHuttunen78

Edit: I just realized you want a mesh :), which Gaussian splatting can't deliver yet! BUT there are multiple papers exploring attaching Gaussians to a progressively refined mesh, and I think it's inevitable given what editing and use cases just like yours require.

You could start exploring and compiling footage and testing and maybe it will work out but ...

Here is a news site focused on the field -

https://radiancefields.com/

wara23arish

2024-03-28 12:54:49 UTC

I was just going to ask a similar question recently. I've been working on a side project involving XGBoost and was wondering if ML is still worth learning in 2024.

My intuition says yes but what do I know.

danieldk

2024-03-28 13:01:44 UTC

I recently attended an interesting talk at a local conference. It was from someone that works at a company that makes heating systems. They want to optimize heating given the conditions of the day (building properties, outside temperature, amount of sunshine, humidity, past patterns, etc.). They have certain hard constraints wrt. model size, training/update compute, etc.

Turns out that for their use case a small (weights fit in tens of KiB IIRC) multilayer perceptron works the best.
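For a sense of scale, an MLP really can fit in tens of KiB. The layer sizes below are hypothetical because the talk's actual architecture isn't given here. The arithmetic counts weights plus biases and assumes 32-bit floats.

```python
def mlp_params(layer_sizes):
    """Weights + biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

sizes = [16, 64, 64, 1]  # hypothetical: 16 input features, two hidden layers of 64, 1 output
n = mlp_params(sizes)
print(n, "parameters ->", n * 4 / 1024, "KiB as float32")
```

That network comes to 5,313 parameters, about 21 KiB, which is small enough for a microcontroller and cheap to retrain on-device.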

There is a lot of machine learning out in the world like that, but it doesn't grab the headlines.

rkwz

2024-03-28 14:32:17 UTC

Sounds interesting, can you share a link to video if available?

krapht

2024-03-28 13:44:39 UTC

I suspect a simple adaptive model-based building controller would do better, and it would be interpretable. I wonder why you'd go with a perceptron... those are so limited.

mjhay

2024-03-28 13:37:29 UTC

xgboost will still work better for most problems people encounter in industry (which usually involve tabular data).

FrustratedMonky

2024-03-28 09:47:51 UTC

There always seems to be pushback on LLMs that they don't learn to do proofs and reasoning.

DeepMind just performed close to gold-medalist level on International Mathematical Olympiad geometry problems. Here it does have to present its reasoning.

https://arstechnica.com/ai/2024/01/deepmind-ai-rivals-the-wo...

And it's a couple of years old, but AlphaFold was pretty impressive.

EDIT: Sorry, I said LLM. But meant AI/ML/NN generally, people say a computer can't reason, but DeepMind is doing it.

imtringued

2024-03-28 10:25:20 UTC

To overcome this difficulty, DeepMind paired a language model with a more traditional symbolic deduction engine that performs algebraic and geometric reasoning.

I couldn't think of a better way to demonstrate that LLMs are poor at reasoning than using this crutch.
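For readers unfamiliar with the term, a symbolic deduction engine in its simplest form does forward chaining: it applies rules to known facts until no new facts appear. The toy sketch below shows only that generic idea, using made-up geometry-flavoured facts. It is not DeepMind's engine. In AlphaGeometry, the language model's job is to propose auxiliary constructions when this kind of exhaustive deduction gets stuck.

```python
def forward_chain(facts, rules):
    """Repeatedly apply rules (premises -> conclusion) until a fixed point is reached."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known

# Hypothetical facts and rules, written as plain strings for readability.
facts = {"AB || CD", "CD || EF"}
rules = [
    (("AB || CD", "CD || EF"), "AB || EF"),          # transitivity of parallelism
    (("AB || EF",), "angle(AB, l) = angle(EF, l)"),  # parallel lines make equal angles with l
]
print(sorted(forward_chain(facts, rules)))
```

Every derived fact is traceable to explicit rules, which is the property that makes this half of the system trustworthy for proofs.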

fennecfoxy

2024-03-28 11:55:15 UTC

I suppose it's because LLM training data uses text that can contain reasoning within it, but without any specific context to specifically learn reasoning. I feel like the little reasoning an LLM can do is a byproduct of the training data.

Does seem more realistic to train something not on text but on actual reasoning/logic concepts and use that along with other models for something more general purpose. LLMs should really only be used to turn "thoughts" into text and to receive instructions, not to do the actual reasoning.

FrustratedMonky

2024-03-28 14:15:16 UTC

I wouldn't say 'crutch' but component.

Eventually LLMs will be plugged into vision systems, symbolic systems, motion systems, etc.

The LLM won't be the main 'thing', just the text interface.

Even the human brain is a bit segmented, with different faculties being 'processed' in different areas with different architectures.

publius_0xf3

2024-03-28 10:29:06 UTC

Is there anything cool going on in animation? Seems like an industry that relies on a lot of rote, repetitive work and is a prime candidate for using AI to interpolate movement.

soulofmischief

2024-03-28 11:06:35 UTC

3D animation is seeing tools like https://me.meshcapade.com/ crop up

xrd

2024-03-28 13:10:04 UTC

That is a really creepy demo. It's cool for sure, but creepy.

kookamamie

2024-03-28 09:54:44 UTC

The SAM-family of computer-vision models have made many of the human annotation services and tools somewhat redundant, as it's possible to achieve relatively high-quality auto-labeling of vision data.

joshvm

2024-03-28 10:21:51 UTC

This is probably true for simple objects, but there is almost certainly a market for hiring people who use SAM-based tools (or similar) to label with some level of QA. I've tried a few implementations and they struggle with complex objects and can be quite slow (due to GPU overhead). Some platforms have had some variant of "click guided" labelling for a while (eg V7) but they're not cheap to use.

Prompt-guided labelling is also pretty cool, but still in its infancy (e.g., you can tell the model "label all the shadows"); SegGPT is one example. But now we're right back to LLMs...

On labelling, there is still a dearth of high quality niche datasets ($$$). Everyone tests on MS-COCO and the same 5-6 segmentation datasets. Very few papers provide solid instructions for fine tuning on bespoke data.
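One way to run the QA step mentioned above: humans correct a sample of auto-generated masks, agreement is measured with intersection-over-union (IoU), and items where the model's mask falls below a threshold get flagged. Here, masks are represented as sets of `(row, col)` pixels, and the 0.8 threshold is an arbitrary assumption. In practice, the masks would be boolean arrays coming out of the SAM-based tool.

```python
def mask_iou(a, b):
    """Intersection-over-union of two masks given as sets of (row, col) pixels."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return len(a & b) / len(a | b)

def flag_for_review(auto_masks, reviewed_masks, threshold=0.8):
    """IDs whose auto-label disagrees with the human-reviewed label."""
    return [k for k, auto in auto_masks.items()
            if k in reviewed_masks and mask_iou(auto, reviewed_masks[k]) < threshold]

square = {(r, c) for r in range(10) for c in range(10)}  # 100-pixel reference mask
shifted = {(r, c + 1) for r, c in square}                # off by one column: IoU 90/110
half = {(r, c) for r in range(5) for c in range(10)}     # misses half the object: IoU 0.5
print(flag_for_review({"img1": shifted, "img2": half}, {"img1": square, "img2": square}))
```

The flag rate on the reviewed sample also estimates how much human correction the rest of the auto-labeled dataset is likely to need.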

kookamamie

2024-03-28 13:12:19 UTC

That's basically what we are able to do now: showing models an image (or images, from video) and prompting for labels, such as with "person, soccer player".

angusturner

2024-03-28 09:37:54 UTC

One area that I would dive into (if I had more time) is "geometric deep learning", i.e., how to design models in a principled way that respects known symmetries in the data. ConvNets are the famous/obvious example because of their translation equivariance, but there are many recent examples that extend the same logic to other symmetry groups. Then there is also the question of whether certain symmetries can be discovered or identified automatically.
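The ConvNet case can be checked in a few lines. For a circular 1D convolution, shifting the input and then convolving gives exactly the same result as convolving and then shifting. The signal and kernel are arbitrary. Circular padding is assumed so that the identity holds exactly at the borders; with zero padding, it only holds away from the edges.

```python
def circular_conv(x, k):
    """1D circular cross-correlation (what deep-learning 'convolution' layers compute)."""
    n = len(x)
    return [sum(k[j] * x[(i + j) % n] for j in range(len(k))) for i in range(n)]

def shift(x, s):
    """Circularly shift a sequence right by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)

x = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, 0.5, 0.0]
k = [0.25, 0.5, 0.25]
for s in range(len(x)):
    # Equivariance: conv(shift(x)) == shift(conv(x)) for every translation s.
    assert circular_conv(shift(x, s), k) == shift(circular_conv(x, k), s)
print("shift-then-convolve == convolve-then-shift for every shift")
```

Geometric deep learning generalizes this check: pick a symmetry group, then constrain the layers so that transforming the input transforms the output in a matching way.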

mrdmnd

2024-03-28 12:51:29 UTC

I've been doing some reading on LLMs for protein/RNA structure prediction, and I think there's a decent amount of work on SO(3)-invariant transformer architectures now.
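The simplest SO(3)-invariant features are interatomic distances: if you rotate the whole structure, they don't change, and that is the property these architectures build in. More expressive models use equivariant features such as spherical harmonics rather than distances alone. Below is a quick numerical check using arbitrary coordinates.

```python
import math

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotation_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def apply(m, p):
    return [sum(m[i][j] * p[j] for j in range(3)) for i in range(3)]

def pairwise_distances(points):
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

coords = [[0.0, 0.0, 0.0], [1.2, 0.1, -0.3], [0.4, 1.5, 0.9]]  # arbitrary "atoms"
rotated = [apply(rotation_x(0.7), apply(rotation_z(1.3), p)) for p in coords]
before, after = pairwise_distances(coords), pairwise_distances(rotated)
# Coordinates change under rotation; distances (the invariant features) do not.
print(all(abs(a - b) < 1e-12 for a, b in zip(before, after)))
```

A model that sees only such features cannot produce different energies for rotated copies of the same molecule, so it never has to learn that from data.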

mjhay

2024-03-28 13:36:40 UTC

There's also been some work on more general Lie-group equivariant transformer models.

http://proceedings.mlr.press/v139/hutchinson21a/hutchinson21...

postatic

2024-03-28 12:03:17 UTC

I launched https://app.scholars.io to get latest research from arxiv on specific topics I’m interested in so I can filter out ones that I’m not interested. Hopefully it will help someone find research activities other than LLM.

4b11b4

2024-03-28 16:05:27 UTC

just signed up for computer vision and image processing related topics as this is what I'm specializing in for my Master's

The interface to sign up was very painless and straightforward

I signed up for a 2-week periodic digest

The first digest arrives instantly, and scanning through the titles alone was inspiring; I'm sure it will give me more than a few great papers to read over the upcoming years

hiddencost

2024-03-28 10:55:33 UTC

Keep in mind that LLMs are basically just sequence to sequence models that can scan 1 million tokens and do inference affordably. The underlying advances (attention, transformers, masking, scale) that made this possible are fungible to other settings. We have a recipe for learning similar models on a huge variety of other tasks and data types.

HarHarVeryFunny

2024-03-28 14:10:14 UTC

Transformers are really more general than seq-to-seq, maybe more like set-to-set or graph-to-graph.

The key insight (Jakob Uszkoreit) behind using self-attention for language was that language is really more hierarchical than sequential, as linguists' tree diagrams of sentence structure show. The leaves of one branch of a tree (or sub-tree) are independent of those in another sub-tree, so they can be processed in parallel rather than in sequence. The idea of a multi-layer transformer is therefore to process this language hierarchy one level at a time, working from the leaves upwards through the layers of the transformer (combining smaller neighborhoods into increasingly larger ones).
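The "set-to-set" point can be made concrete. Plain self-attention without positional encodings is permutation-equivariant: shuffling the input tokens just shuffles the outputs in the same way, and word order only starts to matter once positional information is added. Below is single-head scaled dot-product self-attention in pure Python. As a simplifying assumption, the query/key/value projections are the identity.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with Q = K = V = tokens
    (identity projections; a real layer learns W_q, W_k, W_v). No positional encoding."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
perm = [2, 0, 1]
permuted_out = self_attention([tokens[i] for i in perm])
original_out = self_attention(tokens)
# Row r of the permuted output should equal row perm[r] of the original output.
print(all(math.isclose(a, b) for i, row in zip(perm, permuted_out)
          for a, b in zip(row, original_out[i])))
```

Every token attends to every other token in one step, regardless of distance, which fits the tree/graph view described above better than a strictly sequential reading.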

babel_

2024-03-28 13:59:52 UTC

So, from the perspective I have within the subfield I work in, explainable AI (XAI), we're seeing a bunch of fascinating developments.

First, as you mentioned, Rudin continues to show that the reason for using AI/ML is that we don't understand the problem well enough; otherwise we wouldn't even think to use it! So, by focusing on understanding the problem better and then leveraging ML concepts and techniques (including "classical AI" and statistical learning), we're able to build something that not only outperforms some state of the art on most metrics, but is often much less resource-intensive to create and deploy (in compute, data, energy, and human labour), with the added benefits of direct interpretability and post-hoc explanations. One example has been the continued primacy of tree ensembles on tabular datasets [0], even for larger datasets. They truly shine on the small-to-medium datasets that actually show up in practice, which, from Tigani's observations [1], would include most of those who think they have big data.

Second, we're seeing practical examples of exactly this outside Rudin's work! In particular, people are increasingly using ML for live parameter fine-tuning that would otherwise need more exhaustive searches, human labour that is hard to fit into real-time feedback, or copious human ingenuity to resolve in closed form. Opus 1.5 is introducing some experimental work here, as are a few approaches in video and image encoding. These are domains where, as in the first case, we understand the problem, and understand it well enough to know there are search spaces we simply don't know enough about to reduce dramatically. Approaches like this have been bubbling out of other sciences (physics, complexity theory, bioinformatics, etc.) and have led to some interesting work on distilling and extracting new models from ML. They have also produced "physically aware" operators that dramatically improve neural nets, such as Fourier Neural Operators (FNO) [2]. FNOs embed FFTs rather than forcing the network to relearn them (as has often been found to happen), which gives remarkable speed-ups on PDEs such as those in fluid dynamics, and they have already shown promise in climate modelling [3] and materials science [4]. There are also many more operators, which all work completely differently, yet bring human insight back to the problem, and sometimes lead to extracting a new model that we can use without the ML! Understanding begets understanding, so the "shifting goalposts" of techniques considered "AI" is a good thing!
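The core of an FNO layer is a "spectral convolution". It transforms the input to Fourier space, multiplies the lowest few modes by learned complex weights, drops the remaining modes, and transforms back. The sketch below shows one such operation on a 1D real signal. It uses a naive O(n^2) DFT to stay dependency-free, where real implementations use an FFT. The weights are hypothetical placeholders for learned parameters. A full FNO layer also adds a pointwise linear term and a nonlinearity.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(xk):
    n = len(xk)
    return [sum(xk[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]

def spectral_conv_1d(x, weights):
    """Keep the first len(weights) Fourier modes (requires len(weights) < len(x) / 2),
    scale them by the (learned) weights, and zero all higher modes. Negative frequencies
    get the complex conjugate so that a real input gives a real output."""
    n, modes = len(x), len(weights)
    xk = dft(x)
    out_k = [0j] * n
    for k in range(modes):
        out_k[k] = weights[k] * xk[k]
        if k:  # mirror onto the matching negative frequency
            out_k[n - k] = out_k[k].conjugate()
    return [v.real for v in idft(out_k)]

n = 16
x = [math.sin(2 * math.pi * t / n) + 0.3 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
y = spectral_conv_1d(x, [1.0, 2.0, 1.0])  # hypothetical weights: amplify mode 1, drop mode 5
print([round(v, 3) for v in y])
```

Because the weights live on a fixed set of Fourier modes rather than on grid points, the same layer can be applied to a finer or coarser discretization of the same function, which is a large part of the appeal for PDEs.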

Third, on explainability specifically, the Neural Tangent Kernel (NTK) [5] has rapidly gone from strength to strength since its introduction. Although it is rooted in core explainability, in the sense of making neural nets more mathematically tractable to analysis, it has inspired other approaches [6] and a better behavioural understanding of neural nets [7, 8]. It has also inspired novel ML itself [9], with ways to transfer the benefits of neural networks to far less resource-intensive techniques: [9]'s RFM kernel machine proves competitive with the best tree ensembles from [0], even has an advantage on numerical data, and outperforms prior NTK-based kernel machines. An added benefit is that the approach underpinning [9] itself leads to new interpretation and explanation techniques, similar to integrated gradients [10, 11] but perhaps more reminiscent of the idea in [6].
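For reference, the empirical NTK of a network f with parameters θ is the inner product of parameter gradients, K(x, x') = ∇θ f(x) · ∇θ f(x'). The sketch below computes it for a tiny one-hidden-layer tanh network with hand-derived gradients. The width, initialization, and scalar input are arbitrary assumptions. The theory's most interesting results concern the infinite-width limit, which this sketch does not compute.

```python
import math
import random

def init_params(width, seed=0):
    """One (w, b, v) triple per hidden unit, drawn from a standard normal."""
    rng = random.Random(seed)
    return [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(width)]

def f(params, x):
    """Network output f(x) = sum_i v_i * tanh(w_i * x + b_i) / sqrt(m)."""
    return sum(v * math.tanh(w * x + b) for w, b, v in params) / math.sqrt(len(params))

def grad_f(params, x):
    """Gradient of f(x) w.r.t. all parameters, ordered [df/dw_i, df/db_i, df/dv_i] per unit."""
    scale = 1.0 / math.sqrt(len(params))
    g = []
    for w, b, v in params:
        h = math.tanh(w * x + b)
        dh = 1.0 - h * h  # derivative of tanh
        g.extend([v * dh * x * scale, v * dh * scale, h * scale])
    return g

def empirical_ntk(params, xs):
    """Gram matrix of parameter gradients: K[i][j] = grad f(x_i) . grad f(x_j)."""
    grads = [grad_f(params, x) for x in xs]
    return [[sum(a * b for a, b in zip(gi, gj)) for gj in grads] for gi in grads]

params = init_params(width=256)
K = empirical_ntk(params, [-1.0, 0.0, 0.5, 2.0])
for row in K:
    print([round(v, 3) for v in row])
```

Training the network with small gradient steps behaves approximately like kernel regression with this K. That connection is what lets NTK-based analysis, and kernel machines inspired by it, borrow insight from neural networks.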

Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff! XAI in particular, yes, but also the myriad interpretable models à la Rudin, or the significant improvements found in hybrid approaches and reinforcement learning. Cicero [12], for example, does have an LLM component, but uses it in a radically different way from most people's current conception of LLMs (though, again, ironically closer to the "classic" LLMs for semantic markup), much like the AlphaGo series changed how the deep learning component was used by embedding it in, and hybridising it with, tree search [13] (its successors did away with even the traditional supervised approach through self-play [14], and went beyond Go). This is all without even mentioning the neurosymbolic and other approaches that embed "classical AI" in deep learning (such as RETRO [15]). Despite these successes, adoption of these approaches still lags far behind, especially compared to the zeitgeist of ChatGPT-style LLMs (and the general hype around transformers), and it is arguably much worse for XAI due to the barrier between adoption and deeper usage [16].
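A minimal sketch of how the networks are embedded in the search rather than used end to end: the PUCT selection rule in the style of AlphaGo Zero [14], where the policy network's prior P(s, a) steers exploration and backed-up value estimates give Q(s, a). The constant `c_puct = 1.5`, the `+ 1` inside the square root (so priors matter on the very first visit), the zero Q for unvisited edges and all names are assumptions for illustration, not DeepMind's code; the original AlphaGo [13] used a related but different bonus term.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

@dataclass
class EdgeStats:
    prior: float            # P(s, a), supplied by the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of values backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited edges default to 0 (implementations differ)
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(edges: dict[str, EdgeStats], c_puct: float = 1.5) -> str:
    """Pick the action maximising Q(s, a) + U(s, a)."""
    total = sum(e.visits for e in edges.values())

    def score(e: EdgeStats) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves
        u = c_puct * e.prior * math.sqrt(total + 1) / (1 + e.visits)
        return e.q + u

    return max(edges, key=lambda action: score(edges[action]))
```

Early on the network's prior dominates the choice; as visit counts grow, the bonus shrinks and the search's own value estimates take over, which is the hybridisation in a nutshell.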

This is still early days, however, and, to harken back to Rudin again, we don't understand the problem anywhere near well enough, and that extends to XAI and ML as problem domains themselves. Things we can actually understand seem a far better approach to me, but without getting too Monkey's Paw about it, I'd posit that we should really consider whether some GPT-N or whatever is actually what we want, even if it did achieve what we thought we wanted. Constructing ML with a useful and efficient inductive bias is a much harder challenge than we ever anticipated, hence the eternal "20 years away" problem, so I think it would be a better use of our time to build things like this, where we know what is actually going on in practice, not just in theory. It'll have a part, no doubt; Cicero showed that there's clear potential, but people seem to be realising that "... is all you need" and "scaling laws" were just a myth (or worse, marketing). Plus, all those delays to the 20 years weren't for nothing: there are a lot of really capable, understandable techniques just waiting to be used, with more being developed and refined every year. After all, look at the other comments! There are so many different areas, particularly within deep learning (such as NeRFs or NAS [17]), which really show we have so much left to learn. Exciting!

  [0]: Léo Grinsztajn et al. "Why do tree-based models still outperform deep learning on tabular data?" https://arxiv.org/abs/2207.08815
  [1]: Jordan Tigani "Big Data is Dead" https://motherduck.com/blog/big-data-is-dead/
  [2]: Zongyi Li et al. "Fourier Neural Operator for Parametric Partial Differential Equations" https://arxiv.org/abs/2010.08895
  [3]: Jaideep Pathak et al. "FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators" https://arxiv.org/abs/2202.11214
  [4]: Huaiqian You et al. "Learning Deep Implicit Fourier Neural Operators with Applications to Heterogeneous Material Modeling" https://arxiv.org/abs/2203.08205
  [5]: Arthur Jacot et al. "Neural Tangent Kernel: Convergence and Generalization in Neural Networks" https://arxiv.org/abs/1806.07572
  [6]: Pedro Domingos "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine" https://arxiv.org/abs/2012.00152
  [7]: Alexander Atanasov et al. "Neural Networks as Kernel Learners: The Silent Alignment Effect" https://arxiv.org/abs/2111.00034
  [8]: Yilan Chen et al. "On the Equivalence between Neural Network and Support Vector Machine" https://arxiv.org/abs/2111.06063
  [9]: Adityanarayanan Radhakrishnan et al. "Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features" https://arxiv.org/abs/2212.13881
  [10]: Mukund Sundararajan et al. "Axiomatic Attribution for Deep Networks" https://arxiv.org/abs/1703.01365
  [11]: Pramod Kaushik Mudrakarta et al. "Did the Model Understand the Question?" https://arxiv.org/abs/1805.05492
  [12]: Meta Fundamental AI Research Diplomacy Team (FAIR) et al. "Human-level play in the game of Diplomacy by combining language models with strategic reasoning" https://www.science.org/doi/10.1126/science.ade9097
  [13]: David Silver et al. "Mastering the game of Go with deep neural networks and tree search" https://www.nature.com/articles/nature16961
  [14]: David Silver et al. "Mastering the game of Go without human knowledge" https://www.nature.com/articles/nature24270
  [15]: Sebastian Borgeaud et al. "Improving language models by retrieving from trillions of tokens" https://arxiv.org/abs/2112.04426
  [16]: Umang Bhatt et al. "Explainable Machine Learning in Deployment" https://dl.acm.org/doi/10.1145/3351095.3375624
  [17]: M. F. Kasim et al. "Building high accuracy emulators for scientific simulations with deep neural architecture search" https://arxiv.org/abs/2001.08055

strangecasts

0 replies

1h39m

2024-03-28 16:46:54 UTC

Thank you for providing an exhaustive list of references :)

Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff!

I am very curious to see which practical interpretability/explainability requirements enter into regulations. On one hand it's hard to imagine a one-size-fits-all approach, especially for applications incorporating LLMs, but Bordt et al. [1] demonstrate that you can provoke arbitrary feature attributions for a prediction if you can choose post-hoc explanations and their parameters freely, making a case that it can't _just_ be left to the model developers either.

[1] "Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts", Bordt et al. 2022, https://dl.acm.org/doi/10.1145/3531146.3533153
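To illustrate the kind of freedom Bordt et al. point to (this is not their experiment), here is a minimal sketch of integrated gradients (Sundararajan et al., "Axiomatic Attribution for Deep Networks") using a midpoint Riemann sum. The toy model f(x) = x0 · x1, its hand-written gradient and the two baselines are assumptions chosen so the numbers are easy to check by hand. Changing only the baseline moves the attribution for x = (2, 3) from (3, 3) to (0, 6), even though both satisfy completeness (the attributions sum to f(x) - f(baseline) = 6).

```python
from typing import Callable, List, Sequence

Vector = List[float]

def integrated_gradients(
    grad_f: Callable[[Vector], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 64,
) -> Vector:
    """IG_i = (x_i - b_i) * integral over alpha in [0, 1] of df/dx_i(b + alpha * (x - b))."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]

def toy_model(v: Sequence[float]) -> float:
    return v[0] * v[1]

def toy_grad(v: Sequence[float]) -> Vector:
    return [v[1], v[0]]  # analytic gradient of v0 * v1

if __name__ == "__main__":
    x = [2.0, 3.0]
    for baseline in ([0.0, 0.0], [2.0, 0.0]):
        attr = integrated_gradients(toy_grad, x, baseline)
        print(baseline, [round(a, 6) for a in attr], "sum:", round(sum(attr), 6))
```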

svdr

0 replies

6h9m

2024-03-28 12:17:00 UTC

This is a nice daily newsletter with AI news: https://tldr.tech/ai

ron0c

0 replies

2h26m

2024-03-28 15:59:28 UTC

UW-Madison's ML+X community is hosting a Machine Learning Marathon that will be featured as a competition on Kaggle (https://www.kaggle.com/c/about/host)

"What is the 2024 Machine Learning Marathon (MLM24)?

This approximately 12-week summer event (exact dates TBA) is an opportunity for machine learning (ML) practitioners to learn and apply ML tools together and come up with innovative solutions to real-world datasets. There will be different challenges to select from — some suited for beginners and some suited for advanced practitioners. All participants, project advisors, and event organizers will gather on a weekly or biweekly basis to share tips with one another and present short demos/discussions (e.g., how to load and finetune a pretrained model, getting started with GitHub, how to select a model, etc.). Beyond the intrinsic rewards of skill enhancement and community building, the stakes are heightened by the prospect of a cash prize for the winning team."

More information here: https://datascience.wisc.edu/2024/03/19/crowdsourced-ml-for-...

king_magic

0 replies

6h45m

2024-03-28 11:40:38 UTC

FeatUp

dartos

0 replies

6h43m

2024-03-28 11:42:57 UTC

AlphaFold seems like a major medical breakthrough.

chronosift

0 replies

6h40m

2024-03-28 11:45:23 UTC

A novel SNN framework I'm working on. The newest post has been taking me a while. metalmind.substack.com

beklein

0 replies

8h33m

2024-03-28 09:52:19 UTC

More of a cousin of LLMs: Vision-Language-Action (VLA) models like RT-2 [1]. In addition to text and vision data, they also include robot actions as "another language", encoding them as tokens so the model can output movement actions for robots.
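A minimal sketch of the "actions as another language" idea, assuming each continuous action dimension is uniformly discretised into 256 bins (the scheme the RT-1/RT-2 papers describe). The action ranges, helper names and bin-centre decoding are assumptions for illustration, not Google's code. The resulting integers are what a VLA model would emit as ordinary output tokens.

```python
from typing import List, Sequence

def action_to_tokens(
    action: Sequence[float], low: Sequence[float], high: Sequence[float], n_bins: int = 256
) -> List[int]:
    """Map each continuous action dimension to an integer bin index in [0, n_bins)."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        clipped = min(max(a, lo), hi)  # out-of-range commands are clipped to the valid range
        idx = int((clipped - lo) / (hi - lo) * n_bins)
        tokens.append(min(idx, n_bins - 1))  # a == hi would otherwise land in bin n_bins
    return tokens

def tokens_to_action(
    tokens: Sequence[int], low: Sequence[float], high: Sequence[float], n_bins: int = 256
) -> List[float]:
    """Decode bin indices back to continuous values at the bin centres."""
    return [lo + (t + 0.5) * (hi - lo) / n_bins for t, lo, hi in zip(tokens, low, high)]
```

Decoding to bin centres bounds the round-trip error by half a bin width, (high - low) / (2 · n_bins) per dimension, which is the precision traded away for being able to reuse a language model's token vocabulary.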

[1]: https://robotics-transformer2.github.io

antegamisou

0 replies

7h1m

2024-03-28 11:24:21 UTC

I wager the better question is

    What things are happening in fields of, or other than, CS that we don't hear over the din of ML/AI

PaulHoule

0 replies

4h16m

2024-03-28 14:09:53 UTC

I'm just a touch disappointed that this thread is still dominated by neural-network methods, often ones that apply architectures similar to LLMs' to other domains, such as vision transformers.

I'd like to see something about other ML methods such as SVMs, XGBoost, etc.