I consider this a glimpse into how neural networks and "AI"-like tech will be implemented in the future: lots of engineering, lots of clever manipulations of known techniques, woven together with a powerful, well-trained model at the center.
Right now I think stuff like ChatGPT is only at the first step: building a foundational model that can generalize and process data. There isn't much work going into transforming the inputs into something the model can best understand (not at the tokenizer level, but even before that). We have a nascent field for this, i.e. prompt engineering, but nothing as sophisticated as AlphaFold exists for natural language or images yet.
People are stacking LLMs together and adding system prompts to assist with this input processing. Maybe once we have more complex systems in place, we'll see something resembling a real AGI.
Some[1] think that things are trending in the opposite direction: away from clever manipulations and hard coded domain knowledge, and towards large scale general models.
[1]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
This made me think of the differences between FPGAs and microprocessors, with "more layers" being the equivalent of "more gates".
Yeah, I was surprised to see the architecture diagram is so complex. It's been a while since I saw a design that wasn't just "stack more transformer layers".