Stepping back, the high-order bit here is that an ML method is beating physically based methods at accurately predicting the world.
What happens when the best methods for computational fluid dynamics, molecular dynamics, nuclear physics are all uninterpretable ML models? Does this decouple progress from our current understanding of the scientific process - moving to better and better models of the world without human-interpretable theories and mathematical models / explanations? Is that even iteratively sustainable in the way that scientific progress has proven to be?
Interesting times ahead.
If you're a scientist who works in protein folding (or one of those other areas) and strongly believe that science's goal is to produce falsifiable hypotheses, these new approaches will be extremely depressing, especially if you aren't proficient enough with ML to reproduce this work in your own hands.
If you're a scientist who accepts that probabilistic models beat interpretable ones (articulated well here: https://norvig.com/chomsky.html), then you'll be quite happy, because this is yet another validation of the value of statistical approaches in moving our ability to predict the universe forward.
If you're the sort of person who believes that human brains are capable of understanding the "why" of how things work in all its true detail, you'll find this an interesting challenge: can we actually interpret these models, or are human brains too feeble to understand complex systems without sophisticated models?
If you're the sort of person who likes simple models with as few parameters as possible, you're probably excited because developing more comprehensible or interpretable models that have equivalent predictive ability is a very attractive research subject.
(FWIW, I'm in the camp of "we should simultaneously seek simpler, more interpretable models, while also seeking to improve native human intelligence using computational augmentation")
The goal of science has always been to discover underlying principles and not merely to predict the outcome of experiments. I don't see any way to classify an opaque ML model as a scientific artifact since by definition it can't reveal the underlying principles. Maybe one could claim the ML model itself is the scientist and everyone else is just feeding it data. I doubt human scientists would be comfortable with that, but if they aren't trying to explain anything, what are they even doing?
What if the underlying principles of the universe are too complex for human understanding but we can train a model that very closely follows them?
Then we should dedicate large fractions of human engineering towards finding ethical ways to improve human intelligence so that we can appreciate the underlying principles better.
That sounds like useful engineering, but not useful science.
The ML model can also be an emulator of parts of the system that you don't want to personally understand, to help you get on with focusing on what you do want to figure out. Alternatively, the ML model can pretend to be the real world while you do experiments with it to figure out aspects of nature in minutes rather than hours-days of biological turnaround.
That's the aspirational goal. And I would say that it's a bit of an inflexible one. For example, if we had an ML model that could generate molecules that cure diseases and pass FDA approval, I wouldn't really care if scientists couldn't explain the underlying principles. But I'm an ex-scientist who is now an engineer, because I care more about tools that produce useful predictions than about understanding underlying principles. I used to think that in principle we could identify all the laws of the universe, simulate them with enough accuracy, inspect the results, and gain enlightenment, but over time I've concluded that's mostly a way to waste lots of time, money, and resources.
What if our understanding of the laws of the natural sciences are subtly flawed and AI just corrects perfectly for our flawed understanding without telling us what the error in our theory was?
Forget trying to understand dark matter. Just use this model to correct for how the universe works. What is actually wrong with our current model and if dark matter exists or not or something else is causing things doesn't matter. "Shut up and calculate" becomes "Shut up and do inference."
ML is comfortable with the idea that all models are bad, and there are ways to test how good or bad they are. It's all approximations and imperfect representations, but they can be good enough for some applications.
If you think about it carefully, humans operate in the same regime. Our concepts are all like that - imperfect, approximate, glossing over some details. Our fundamental grounding and test is survival, an unforgiving filter, yet lax enough to allow for anti-vaxxer movements during the pandemic - the survival test doesn't test for truth directly, only against ideas that fail to support life.
All models are wrong, but some models are useful.
High accuracy could result from pretty incorrect models. When and where that would then go completely off the rails is difficult to say.
I'm in the following camp: it is wrong to think about the world or the models as "complex systems" that may or may not be understood by human intelligence. There is no meaning beyond that which is created by humans. There is no 'truth' that we can grasp in parts but not entirely. Being unable to understand these complex systems means that we have framed them in a way (e.g., millions of matrix operations) that does not suit our symbol-based, causal reasoning mode. That is on us, not on our capabilities or the universe.
All our theories are built on observation, so these empirical models yielding such useful results is a great thing - it satisfies the need for observing and acting. Missing explainability of the models merely means we have less ability to act more precisely - but it does not devalue our ability to act coarsely.
But the human brain has limited working memory and experience. Even in software development we are often teetering at the edge of the mental power to grasp and relate ideas. We have tried so much to manage complexity, but real world complexity doesn't care about human capabilities. So there might be high dimensional problems where we simply can't use our brains directly.
A human mind is perfectly capable of following the same instructions as the computer did. Computers are stupidly simple and completely deterministic.
The concern is about "holding it all in your head", and depending on your preferred level of abstraction, "all" can perfectly reasonably be held in your head. For example: "This program generates the most likely outputs" makes perfect sense to me, even if I don't understand some of the code. I understand the system. Programmers went through this decades ago. Physicists had to do it too. Now, chemists I suppose.
If anyone actually thought this way -- no one does -- they definitely wouldn't build models like this.
I don't quite understand this point — could you elaborate?
My understanding is that the ML model produces a hypothesis, which can then be tested via normal scientific method (perform experiment, observe results).
If we have a magic oracle that says "try this, it will work", and then we try it, and it works, we still got something falsifiable out of it.
Or is your point that we won't necessarily have a coherent/elegant explanation for why it works?
There is an issue scientifically. I think this point was expressed by Feynman: the goal of scientific theories isn’t just to make better predictions, it’s to inform us about how and why the world works. Many ancient civilizations could accurately predict the position of celestial bodies with calendars derived from observations of their period, but it wasn’t until Copernicus proposed the heliocentric model and Galileo provided supporting observations that we understood the why and how, and that really matters for future progress and understanding.
People will be depressed because they spent decades getting into professorship positions and publishing papers with ostensibly comprehensible interpretations of the generative processes that produced their observations, only to be "beat" at the game by a system that processed a lot of observations and can make predictions in a way that no individual human could comprehend. And those professors will have a harder time publishing, and therefore getting promoted, in the future.
Whether ML models produce hypotheses is something of an epistemological argument that I think muddies the waters without bringing any light. I would only say "ML models generate predictions." In a sense, the model itself is the hypothesis, not any individual prediction.
There have been times in the past when usable technology surpassed our scientific understanding, and instead of being depressing it provided a map for scientific exploration. For example, the steam engine was developed by engineers in the 1600s/1700s (Savery, Newcomen, and others) but thermodynamics wasn’t developed by scientists until the 1800s (Carnot, Rankine, and others).
I think the various contributors to the invention of the steam engine had a good idea of what they were trying to do and how their idea would physically work. Wikipedia lists the prerequisites as the concepts of a vacuum and pressure, methods for creating a vacuum and generating steam, and the piston and cylinder.
> If you're the sort of person who believes that human brains are capable of understanding the "why" of how things work in all its true detail
This seems to me an empirical question about the world. It’s clear our minds are limited, and we understand complex phenomena through abstraction. So either we discover we can continue converting advanced models to simpler abstractions we can understand, or that’s impossible. Either way, it’s something we’ll find out and will have to live with in the coming decades. If it turns out further abstractions aren’t possible, well, enlightenment thought had lasted long enough. It’s exciting to live at a time in humanity’s history when we enter a totally uncharted new paradigm.
It means we now have an accurate surrogate model or "digital twin" that can be experimented on almost instantaneously. So we can massively accelerate the traditional process of developing mechanistic understanding through experiment, while also immediately being able to benefit from accurate predictions, even without understanding.
In reality, science has already pretty much gone this way long ago, even if people don't like to admit it. Simple, reductionist explanations for complex phenomena in living systems don't really exist. Virtually all of medicine nowadays is empirical: try something, and if you can prove it's safe and effective, you keep doing it. We almost never have a meaningful explanation for how it really works, and when we think we do, it gets proven wrong repeatedly, while the treatment keeps working as always.
instead of "in mice", we'll be able to say "in the cloud"
In vivo in humans in the cloud
One of the companies I worked for, "insitro", is specifically named to mean the combination of "in vivo, in vitro, in silico".
"In nimbo" (though what people actually say is "in silico").
Medicine can be explained fairly simply, and why it works the way it does can be explained like this:
Imagine a very large room that has every surface covered by on-off switches.
We cannot see inside of this room. We cannot see the switches. We cannot fit inside of this room, but a toddler fits through the tiny opening leading into the room. The toddler cannot reach the switches, so we equip the toddler with a pole that can flip the switches. We train the toddler, as much as possible, to flip a switch using the pole.
Then, we send the toddler into the room and ask the toddler to flip the switch or switches we desire to be flipped, and then do tests on the wires coming out of the room to see if the switches were flipped correctly. We also devise some tests for other wires to see if that naughty toddler flipped other switches on or off.
We cannot see inside the room. We cannot monitor the toddler. We can't know what _exactly_ the toddler did inside the room.
That room is the human body. The toddler with a pole is a medication.
We can't see or know enough to determine what was activated or deactivated. We can invent tests to narrow the scope of what was done, but the tests can never be 100% accurate because we can't test for every effect possible.
We introduce chemicals, then we hope and pray that the chemicals only turned on or off the things we wanted turned on or off. We craft some qualification tests as proof, and do a 'long-term' study to determine whether other things were turned on or off, a short circuit occurred, or we broke something.
I sincerely hope that even without human understanding, our AI models can determine what switches are present, which ones are on and off, and how best to go about selecting for the correct result.
Right now, modern medicine is almost a complete crap-shoot. Hopefully modern AI utilities can remedy the gambling aspect of medicine discovery and use.
It depends whether the value of science is human understanding or pure prediction. In some realms (for drug discovery, and other situations where we just need an answer and know what works and what doesn’t), pure prediction is all we really need. But if we could build an uninterpretable machine learning model that beats any hand-built traditional ‘physics’ model, would it really be physics?
Maybe there’ll be an intermediate era for a while where ML models outperform traditional analytical science, but then eventually we’ll still be able to find the (hopefully limited in number) principles from which it can all be derived. I don’t think we’ll ever find that Occam’s razor is no use to us.
The success of these ML models has me wondering if this is what Quantum Mechanics is. QM is notoriously difficult to interpret yet makes amazing predictions. Maybe wave functions are just really good at predicting system behavior but don't reflect the underlying way things work.
OTOH, Newtonian mechanics is great at predicting things under certain circumstances yet, in the same way, doesn't necessarily reflect the underlying mechanism of the system.
So maybe philosophers will eventually tell us the distinction we are trying to draw, although intuitive, isn't real
That’s what thermodynamics is - we initially only had laws about energy/heat flow, and only later we figured out how statistical particle movements cause these effects.
At that point I wonder if it would be possible to feed that uninterpretable model back into another model that makes sense of it all and outputs sets of equations that humans could understand.
Pure prediction is only all we need if the total end-to-end process is predicted correctly - otherwise there could be pretty nasty traps (e.g., drug works perfectly for the target disease but does something unexpected elsewhere etc.).
It makes me think about how Einstein was famous for making falsifiable real-world predictions to accompany his theoretical work. And sometimes it took years for proper experiments to be run (such as measuring a solar eclipse during the outbreak of a world war).
Perhaps the opportunity here is to provide a quicker feedback loop for theory about predictions in the real world. Almost like unit tests.
Agreed. At the very least, models of this nature let us iterate/filter our theories a little bit more quickly.
The model isn't reality. A theory that disagrees with the model but agrees with reality shouldn't be filtered, but in this process it will be.
Or jumping the gap entirely to move towards more self-driven reinforcement learning.
Could one structure the training setup to be able to design its own experiments, make predictions, collect data, compare results, and adjust weights...? If that loop could be closed, then it feels like that would be a very powerful jump indeed.
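As a toy sketch of what closing that loop might look like (the "experiment", the bootstrap-ensemble surrogate, and the pick-the-most-uncertain-point query rule are all invented for illustration, nothing from the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def run_experiment(x):
        # Stand-in for a real-world experiment: hidden ground truth plus noise.
        return np.sin(3 * x) + 0.1 * rng.normal(size=np.shape(x))

    def fit_ensemble(X, y, n_models=10, degree=5):
        # Bootstrap ensemble of polynomial fits; the spread across members
        # is a crude stand-in for model uncertainty.
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), len(X))
            models.append(np.polyfit(X[idx], y[idx], degree))
        return models

    def predict(models, x):
        preds = np.stack([np.polyval(m, x) for m in models])
        return preds.mean(axis=0), preds.std(axis=0)

    # Closed loop: propose the experiment where the model is least certain,
    # run it, retrain on the enlarged dataset, repeat.
    X = rng.uniform(-1, 1, 12)            # small seed dataset
    y = run_experiment(X)
    candidates = np.linspace(-1, 1, 200)  # experiments we *could* run

    for round_ in range(5):
        models = fit_ensemble(X, y)
        mean, std = predict(models, candidates)
        x_next = candidates[np.argmax(std)]   # design the next experiment
        y_next = run_experiment(x_next)       # "collect data"
        X, y = np.append(X, x_next), np.append(y, y_next)
        print(f"round {round_}: queried x={x_next:+.2f}, max std={std.max():.3f}")

The hard part in the real setting is of course the middle step: "run_experiment" is a wet lab with costs, delays, and noise, not a one-line function.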
In the area of LLMs, the SPAG paper from last week was very interesting on this topic, and I'm very interested in seeing how this can be expanded to other areas:
https://github.com/Linear95/SPAG
Thank God! As a person who uses my brain, I think I can say, pretty definitively, that people are bad at understanding things.
If this actually pans out, it means we will have harnessed knowledge/truth as a fundamental force, like fire or electricity. The "black box" as a building block.
This type of thing is called an "oracle".
We've had stuff like this for a long time.
Notable examples:
- Temple priestesses
- Tea-leaf reading
- Water scrying
- Palmistry
- Clairvoyance
- Feng shui
- Astrology
The only difference is, the ML model is really quite good at it.
That's the crux of it: we've had theories of physics and chemistry since before writing was invented.
None of that mattered until we came upon the ones that actually work.
The most moneyed and well-coordinated organizations have honed a large hammer, and they are going to use it for everything. So for future big findings in the areas you mention, probabilistically inclined models coming from ML will almost certainly be the new gold standard.
And yet the only thing that can save us from ML will be ML itself, because ML has the best chance of extracting patterns from these black-box models and turning them into human-interpretable ones. I hope we do dedicate explicit effort to this endeavor, so that human knowledge keeps advancing and expanding in tandem with human ingenuity, with computers assisting us.
Spoiler: "Interpretable ML" will optimize for output that either looks plausible to humans, reinforces our preconceptions, or appeals to our aesthetic instincts. It will not converge with reality.
That is not considered interpretable then, and I think most people working in the field are aware of this gotcha.
IIRC, when the EU required banks to have interpretable rules for loans, a plain explanation was not considered enough. What was required was a clear process used from the beginning - i.e., you can use AI to develop an algorithm that makes the decision, but you can't use AI to make the decision and explain the reasons afterwards.
My argument is: weather.
I think it is fine, and better for society, to have applications and models for things we don't fully understand... We can model lots of small aspects of weather, and we have a lot of factors nailed down, but not necessarily all the interactions... and not all of the factors. (Another example for the same reason: gravity.)
Used responsibly, of course. I wouldn't think an AI model designing an airplane whose workings no engineer understands is a good idea :-)
And presumably all of this is followed by people trying to understand the results (expanding potential research areas)
It would be cool to see an airplane made using generative design.
How about spaceship parts? https://www.nasa.gov/technology/goddard-tech/nasa-turns-to-a...
I wonder if ML can someday be employed in deciphering such black box problems; a second model that can look under the hood at all the number crunching performed by the predictive model, identify the pattern that resulted in a prediction, and present it in a way we can understand.
That said, I don’t even know if ML is good at finding patterns in data.
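For what it's worth, one concrete form of "a second model looking under the hood" that people use today is a simple probe trained on the black box's internal activations. A toy sketch (the "black box" and the probed property are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(1)

    # A stand-in "black box": a random two-layer network we pretend we can't read.
    W1, W2 = rng.normal(size=(16, 4)), rng.normal(size=(1, 16))
    def black_box(x):
        h = np.tanh(W1 @ x)        # hidden activations
        return W2 @ h, h

    # Record hidden activations plus a human-meaningful property of the input
    # (here: its norm) that we suspect the model internally encodes.
    X = rng.normal(size=(500, 4))
    H = np.stack([black_box(x)[1] for x in X])   # (500, 16) activations
    prop = np.linalg.norm(X, axis=1)             # property we probe for

    # Linear probe: ordinary least squares from activations to the property.
    coef, *_ = np.linalg.lstsq(H, prop, rcond=None)
    pred = H @ coef
    r2 = 1 - np.sum((prop - pred) ** 2) / np.sum((prop - prop.mean()) ** 2)
    print(f"probe R^2: {r2:.2f}   # high R^2 => property is linearly readable")

That only tells you a concept is recoverable from the activations, not how the model uses it, but it's a start.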
That's the only thing ML does.
I mean, it's just faster, no? I don't think anyone is claiming it's a more _accurate_ model of the universe.
Collision libraries and fluid libraries have had baked-in memorized look-up tables that were generated with ML methods nearly a decade ago.
World is still here, although the Matrix/metaverse is becoming more attractive daily.
We should be thankful that we live in a universe that obeys math simple enough for us to comprehend, so that we were able to reach this level at all.
Imagine if optics were complex enough that it required an ML model to predict anything.
We'd be in permanent stone age without a way out.
What would a universe look like that lacked simple things, and somehow only complex things existed?
It makes me think of rings like Z[√-5], which have irreducible elements that aren't prime, so some things cannot be uniquely expressed as a combination of smaller things.
As a steelman, wouldn't the abundance of infinitely generate-able situations make it _easier_ for us to develop strong theories and models? The bottleneck has always been data. You have to do expensive work in the real world and accurately measure it before you can start fitting lines to it. If we were to birth an e.g. atomically accurate ML model of quantum physics, I bet it wouldn't take long until we have mathematical theories that explain why it works. Our current problem is that this stuff is super hard to manipulate and measure.
Maybe; AI chess engines have improved human understanding of the game very rapidly, even though humans cannot beat engines.
I would assume that given enough hints from AI and if it is deemed important enough humans will come in to figure out the “first principles” required to arrive at the conclusion.
I believe this is the case too. With a sufficiently well-performing AI/ML/probabilistic model, where you can change the input parameters and get a highly accurate prediction basically instantly, we can test theories approximately and extremely fast, rather than running completely new experiments, which always come with their own set of errors and problems.
I asked a friend of mine who is chemistry professor at a large research university something along these lines a while ago. He said that so far these models don't work well in regions where either theory or data is scarce, which is where most progress happens. So he felt that until they can start making progress in those areas it won't change things much.
Major breakthroughs happen when clear connections can be made and engineered between the many bits of solved but obscured solutions.
For me the big question is how we confidently validate the output of these models.
It's the right question to ask, and the answer is that we will still have to confirm them by experimental structure determination.
Many of our existing physical models can be decomposed into a "high-confidence, well-tested bit" plus a "hand-wavy, empirically fitted bit". I'd like to see progress via ML replacing the empirical part - the real scientific advancement then becomes steadily shrinking that contribution by improving the robust physical model incrementally. Computational performance is another big influence, though. Replacing the whole of a simulation with an ML model might still make sense if the model training is transferable and we can take advantage of the GPU speed-ups, which might not be so easy to apply to the foundational physical-model solution. Whether your model needs to be verified against real physical models depends on the seriousness of your use case; for nuclear weapons, aerospace, and weather forecasting I imagine it will remain essential, while for a lot of consumer-facing things the ML will be good enough.
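As a toy sketch of that decomposition (everything here is invented for illustration: the "trusted physics" is gravity-only free fall, the "hand-wavy bit" is an unknown drag term, and a small polynomial fit stands in for the ML correction):

    import numpy as np

    rng = np.random.default_rng(2)

    # "True" system: falling object with drag (drag coefficient unknown to us).
    def true_accel(v):
        g, k = -9.81, 0.4
        return g - k * v * np.abs(v)

    # High-confidence physical model: gravity only.
    def physics_accel(v):
        return -9.81

    # Noisy observations of acceleration at various velocities.
    v_obs = rng.uniform(-10, 10, 200)
    a_obs = true_accel(v_obs) + 0.05 * rng.normal(size=200)

    # Learn only the residual (observation minus trusted physics).
    residual = a_obs - physics_accel(v_obs)
    corr = np.polyfit(v_obs, residual, deg=3)

    def hybrid_accel(v):
        return physics_accel(v) + np.polyval(corr, v)

    for v in np.linspace(-10, 10, 5):
        print(f"v={v:+5.1f}  physics={physics_accel(v):+6.2f}  "
              f"hybrid={hybrid_accel(v):+6.2f}  true={true_accel(v):+6.2f}")

The appeal of this shape is exactly what you describe: the learned part is small, scoped, and replaceable as the physical model improves.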
Physics-informed machine learning is a whole (nascent) subfield that is very much in line with this thinking. Steve Brunton has some good stuff about this on YouTube.
This is a topic in the epistemology of science, discussed in books such as "New Directions in the Philosophy of Mathematics" [1], and it came up before with problems such as the four color theorem [2], where AI was not involved.
Going back to the uninterpretable ML models in the context of AlphaFold 3, I think one method for trying to explain the findings is similar to the experimental methods of physics with respect to reality: you perform experiments on the reality (in this case, AlphaFold 3) to come up with sound conclusions. AI/ML is an interesting black-box system.
There are other open discussions on this topic. For example, can our human brains absorb that knowledge, or are they somehow limited by the scientific language that we have now?
[1] https://www.google.com.ar/books/edition/New_Directions_in_th...
[2] https://en.wikipedia.org/wiki/Four_color_theorem
No, science doesn't work that way. You can't just calculate your way to scientific discoveries; you have to test them in the real world. Learning, both in humans and AI, is based on the signals provided by the environment. There are plenty of things not written anywhere, so models can't simply train on human text to discover new things. They have to learn directly from the environment to do that, like AlphaZero did when it beat humans at Go.
You are conflating the whole scientific endeavor with one very specific problem, for which this specific approach is effective at producing results that fit the observable world. This has nothing to do with science as a whole.
In case it's not clear, this does not "beat" experimental structure determination. The matches to experiment are pretty close, but they will be closer in some cases than others and may or may not be close enough to answer a given question about the biochemistry. It certainly doesn't give much information about the dynamics or chemical perturbations that might be relevant in biological context. That's not to pooh-pooh alphafold's utility, just that it's a long way from making experimental structure determination unnecessary, and much much further away from replacing a carefully chosen scientific question and careful experimental design.
A few things:
1. Research can then focus on where things go wrong
2. ML models, despite being "black boxes," can still be assessed by brute force: sweep the input parameter space and compare behavior in regions covered and not covered by the training data (see the sketch after this list)
3. We tend to assume parsimony (i.e., Occam's razor) and give preference to simpler models when all else is equal. When more complex black-box models win on prediction, that tells us the actual causal pathway may be more complex than simple models allow. This is okay too. We'll get it figured out. Not everything is closed-form, especially considering that quantum effects may produce statistical/expected outcomes instead of deterministic ones.
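A toy sketch of point 2 (the black-box model, the training distribution, and the coverage threshold are all invented for illustration):

    import numpy as np

    rng = np.random.default_rng(3)

    # Stand-in black-box model: we can only call predict(), not inspect it.
    W = rng.normal(size=(8, 2))
    def predict(x):
        return float(np.tanh(W @ x).sum())

    # Training inputs the model actually saw (here: clustered in one corner).
    X_train = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(300, 2))

    # Brute-force sweep of a 2-D parameter grid: record the prediction and
    # how far each grid point is from any training example (coverage proxy).
    grid = np.stack(np.meshgrid(np.linspace(-2, 2, 41),
                                np.linspace(-2, 2, 41)), axis=-1).reshape(-1, 2)
    preds = np.array([predict(x) for x in grid])
    dist_to_train = np.min(
        np.linalg.norm(grid[:, None, :] - X_train[None, :, :], axis=-1), axis=1)

    covered = dist_to_train < 0.5
    print(f"grid points near training data: {covered.mean():.0%}")
    print(f"prediction range (covered):   [{preds[covered].min():+.2f}, {preds[covered].max():+.2f}]")
    print(f"prediction range (uncovered): [{preds[~covered].min():+.2f}, {preds[~covered].max():+.2f}]")

The point is just that you can map where the model is interpolating versus extrapolating, even without opening the box; in high dimensions you'd sample rather than grid, but the idea is the same.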
I suspect that ML will be state-of-the-art at generating human-interpretable theories as well. Just a matter of time.
There will be an iterative process built around curated training datasets: continually improved, top-tier models; teams reverse-engineering the models' understanding and reasoning; and applying that to improve the datasets and training.
The ML models will help us understand that :)
Reminds me of the novel Blindsight - in it there are special individuals who work as synthesists, whose job it is to observe and understand, and then somehow translate the seemingly undecipherable actions/decisions of advanced computers and augmented humans back to the "lay person".
Our metaphors and intuitions were already crumbling and stagnating. See quantum physics: sometimes a particle, sometimes a wave, and what constitutes a measurement anyway?
I’ll take prediction over understanding if that’s the best our brains can do. We’ve evolved to deal with a few orders of magnitude around a meter and a second. Maybe dealing with light-years and femtometer/seconds is too much to ask.
A new-ish field of "mechanistic interpretability" is trying to poke at weights and activations and find human-interpretable ideas w/in them. Making lots of progress lately, and there are some folks trying to apply ideas from the field to Alphafold 2. There are hopes of learning the ideas about biology/molecular interactions that the model has "discovered".
Perhaps we're in an early stage of Ted Chiang's story "The Evolution of Human Science", where AIs have largely taken over scientific research and a field of "meta-science" developed where humans translate AI research into more human-interpretable artifacts.
It's interesting to compare this situation to earlier eras in science. Newton, for example, gave us equations that were very accurate but left us with no understanding at all of why they were accurate.
It seems like we're repeating that here, albeit with wildly different methods. We're getting better models but by giving up on the possibility of actually understanding things from first principles.
Is AlphaFold doing model generation, or is it just reducing a massive state space?
The current computational and systems biochemistry approaches struggle to model large biomolecules and their interactions due to the large degrees of freedom of the models.
I think it is reasonable to rely on statistical methods to lead researchers down paths that have a high likelihood of being correct versus brute forcing the chemical kinetics.
After all, chemistry is inherently stochastic…
Science has always given us better, but error-prone, tooling to see further and make better guesses. There is still a scientific test: in a clinical trial, is this new drug safe and effective?
Some machine learning models might be more interpretable than others. I think the recent "KAN" (Kolmogorov-Arnold Network) model might be a step forward.
I believe it simply tells us that our understanding of mechanical systems, especially chaotic ones, is not as well defined as we thought.
https://journals.aps.org/prresearch/abstract/10.1103/PhysRev...
That is not a real concern, just a confusion about how statistics works :(
We will get better at understanding black boxes. If a model can be compressed into a simple math formula, then it's both easier to understand and cheaper to compute.
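A toy sketch of that compression idea (everything invented for illustration; a plain polynomial fit stands in for a real symbolic-regression search): query the opaque model densely on the region you care about, distill it into a short closed-form surrogate, and report how much fidelity you lose.

    import numpy as np

    # Pretend this is an opaque learned model we can only query.
    def black_box(x):
        return 0.5 * x**3 - x + np.sin(0.2 * x)   # unknown to us

    # Query it densely on the region we care about...
    x = np.linspace(-3, 3, 400)
    y = black_box(x)

    # ...and compress it into a short closed-form surrogate.
    coeffs = np.polyfit(x, y, deg=3)
    surrogate = np.poly1d(coeffs)

    max_err = np.max(np.abs(surrogate(x) - y))
    print("surrogate formula:", surrogate)
    print(f"max error on [-3, 3]: {max_err:.3f}")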
I think it likely that instead of replacing existing methods, we will see a fusion. Or rather, many different kinds of fusions - depending on the exact needs of the problems at hand (or in science, the current boundary of knowledge). If nothing else then to provide appropriate/desirable level of explainability, correctness etc. Hypothetically the combination will also have better predictive performance and be more data efficient - but it remains to be seen how well this plays out in practice. The field of "physics informed machine learning" is all about this.
These processes are both beyond human comprehension, because they contain vast layers of tiny interactions, and impractical to simulate directly. This tech will allow exploration via accurate simulations to better understand new ideas when needed.
Every time the two systems disagree, it's an opportunity to learn something. Both kinds of models can be improved with new information obtained through real-world experiments.
I think at some point, we will be able to produce models that are able to pass data into a target model and observe its activations and outputs and put together some interpretable pattern or loose set of rules that govern the input-output relationship in the target model. Using this on a model like AlphaFold might enable us to translate inferred chemical laws into natural language.
I can only hope the models will be sophisticated enough and willing to explain their reasoning to us.
Whatever it is, if we needed to, we could follow each instruction through the black box. It's never going to be as opaque as something organic.
"better and better models of the world" does not always mean "more accurate" and never has.
We already know how to model the vast majority of things, just not at a speed and cost which makes it worthwhile. There are dimensions of value - one is accuracy, another speed, another cost, and in different domains additional dimensions. There are all kinds of models used in different disciplines which are empirical and not completely understood. Reducing things to the lowest level of physics and building up models from there has never been the only approach. Biology, geology, weather, materials all have models which have hacks in them, known simplifications, statistical approximations, so the result can be calculated. It's just about choosing the best hacks to get the best trade off of time/money/accuracy.
This is a neat observation. Slightly terrifying, but still interesting. Seems like there will also be cases where we discover new theories through the uninterpretable models—much easier and faster to experiment endlessly with a computer.
We need to advance mechanistic interpretability (the field of reverse-engineering neural networks): https://www.youtube.com/watch?v=P7sjVMtb5Sg https://www.youtube.com/watch?v=7t9umZ1tFso https://www.youtube.com/watch?v=2Rdp9GvcYOE
I can only assume that existing methods would still be used for verification. At least we understand the logic behind those methods. The ML models might become more accurate on average, but they could still occasionally throw out results that are way off, so their error rate would have to at least match that of the existing methods.
This is exactly how the physicists felt at the dawn of quantum physics - the loss of meaningful human inquiry to blindly effective statistics. Sobering stuff…
Personally, I’m convinced that human reason is less pure than we think it to be, and that the move to large mathematical models might just be formalizing a lack-of-control that was always there. But that’s less of a philosophy of science discussion and more of a cognitive science one
A better analogy is "weather forecasting".
In physics, we already deal with the fact that many of the core equations cannot be analytically solved for more than the most basic scenarios. We've had to adapt to using approximation methods and numerical methods. This will have to be another place where we adapt to a practical way of getting results.
The top HN response to this should be: what happens is that an opportunity has entered the chat.
There is a wave coming - I won't try to predict if it's the next one - where the hot thing in AI/ML is going to be profoundly powerful tools for analyzing other such tools and rendering them intelligible to us,
which I imagine will mean providing something like a zoomable explainer. At every level there are footnotes; if you want to understand why the simplified model is a simplification, you look at the fine print. Which has fine print. Which has...
Which doesn't mean there isn't a stable level at which some formal notion of "accurate" can be said to exist - the minimum viable level of simplification.
Etc.
This sort of thing will of course be the input to many other things.
We could be entering a new age of epicycles - high accuracy but very flawed understanding.
Perhaps for understanding the structure itself, but having the structure available allows us to focus on a coarser level. We also don't want to use quantum mechanics to understand the everyday world, and that's why we have classic mechanics etc.
The frontier in model space is kind of fluid. It's all about solving differential equations.
In theoretical physics, you know the equations and you solve them analytically, but you can only do that when the model is simple.
In numerical physics, you know the equations, you discretize the problem on a grid, and you satisfy the constraints defined by the equations with various numerical integration schemes like RK4 - but you can only do that when the model is small, and you find a single solution.
Then you want the result faster, so you use mesh-free methods and adaptive grids. That works on bigger models, but you still have to know the equations, and you still find a single solution to the differential equations.
Then you compress this adaptive grid with a neural network, while still knowing the governing equations, and you have things like Physics-Informed Neural Networks (https://arxiv.org/pdf/1711.10561 and following papers), where you can bound the approximation error. This method lets you solve for all solutions of the differential equations simultaneously, sharing the computation.
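As a rough, self-contained illustration of the PINN idea (an arbitrary toy ODE and architecture, nothing from the AlphaFold paper): the residual of the governing equation itself becomes part of the training loss, so no solution data is needed.

    import torch

    torch.manual_seed(0)

    # Tiny physics-informed network for the ODE u'(x) = -u(x), u(0) = 1
    # (exact solution: exp(-x)).
    net = torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    x = torch.linspace(0, 2, 64).reshape(-1, 1).requires_grad_(True)
    x0 = torch.zeros(1, 1)

    for step in range(3000):
        u = net(x)
        du_dx = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
        residual = du_dx + u                 # u' + u should be 0 everywhere
        loss = (residual ** 2).mean() + (net(x0) - 1.0).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():
        x_test = torch.tensor([[0.0], [1.0], [2.0]])
        print(net(x_test).squeeze())          # network's solution
        print(torch.exp(-x_test).squeeze())   # exact exp(-x) for comparison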
Then, when knowing your governing equations explicitly is too complex, you assume there are some implicit governing stochastic equations and learn the end result of the dynamics with a diffusion model - that's what AlphaFold is doing.
ML is kind of a memoization technique, analogous to Hashlife in the Game of Life, that lets you reuse your past computational effort. You are free to choose where on this ladder, and which memory/compute trade-off, you want to use to model the world.
Might be easier to come up with new models with analytic solutions if you have a probabilistic model at hand. A lot easier to evaluate against data and iterate. Also, I wouldn't be surprised if we develop better tools for introspecting these models over time.
Interesting times indeed. I think the early history of medicines takes away from your observation though. In the 19th and early 20th century people didn't know why medicines worked, they just did. The whole "try a bunch of things on mice, pick the best ones and try them on pigs, and then the best of those and try a few on people" kind of thing. In many ways the mice were a stand in for these models, at the time scientists didn't understand nearly as much about how mice worked (early mice models were pretty crude by today's standards) but they knew they were a close enough analog to the "real thing" that the information provided by mouse studies was usefully translated into things that might help/harm humans.
So when your tools can produce outputs that you find useful, you can then use those tools to develop your understanding and insights. As a tool, this is quite good.
"Best methods" is doing a lot of heavy lifting here. "Best" is a very multidimensional thing, with different priorities leading to different "bests." Someone will inevitably prioritize reliability/accuracy/fidelity/interpretability, and that's probably going to be a significant segment of the sciences. Maybe it's like how engineers just need an approximation that's predictive enough to build with, but scientists still want to understand the underlying phenomena. There will be an analogy to how some people just want an opaque model that works on a restricted domain for their purposes, but others will be interested in clearer models or unrestricted/less restricted domain models.
It could lead to a very interesting ecosystem of roles.
Even if you just limit the discussion to using the best model of X to design a better Y, limited to the model's domain of validity, that might translate the usage problem to finding argmax_X of valueFunction of modelPrediction of design of X. In some sense a good predictive model is enough to solve this with brute force, but this still leaves room for tons of fascinating foundational work. Maybe you start to find that the (wow so small) errors in modelPrediction are correlated with valueFunction, so the most accurate predictions don't make it the best for argmax (aka optimization might exploit model errors rather than optimizing the real thing). Or maybe brute force just isn't computationally feasible, so you need to understand something deeper about the problem to simplify the optimization to make it cheap.
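A toy sketch of that last failure mode (all functions and numbers invented for illustration): brute-force argmax against a surrogate whose small errors are structured, so the optimizer drifts toward points where the error flatters the prediction rather than toward the true optimum.

    import numpy as np

    # True (unknown) property of a design x, and an imperfect learned model
    # of it: small errors, but structured in a way the optimizer can exploit.
    def true_property(x):
        return -(x - 0.3) ** 2

    def model_prediction(x):
        return true_property(x) + 0.05 * np.sin(25 * x)   # "wow so small" errors

    def value_function(p):
        return p                                          # just maximize the property

    # Brute-force argmax over candidate designs using only the model.
    candidates = np.linspace(0, 1, 10_001)
    scores = value_function(model_prediction(candidates))
    x_best_model = candidates[np.argmax(scores)]

    x_best_true = candidates[np.argmax(value_function(true_property(candidates)))]
    print(f"model's pick: x={x_best_model:.3f}, true value={true_property(x_best_model):+.4f}")
    print(f"true optimum: x={x_best_true:.3f}, true value={true_property(x_best_true):+.4f}")

The gap between the two printed values is the optimizer exploiting the model rather than reality, which is exactly why the "accurate enough to brute-force" framing still leaves real work to do.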