I consider this a glimpse into how neural networks and "AI"-like tech will be implemented in the future: lots of engineering, lots of clever manipulations of known techniques, woven together with a powerful, well-trained model at the center.
Right now I think stuff like ChatGPT is only at the first step: building a foundational model that can generalize and process data. There isn't much work going into transforming the inputs into something the model can best understand (not at the tokenizer level, but even before that). We have a nascent field for this, i.e. prompt engineering, but nothing as sophisticated as AlphaFold exists for natural language or images yet.
People are stacking LLMs together and adding system prompts to assist with this input processing. Maybe once we have more complex systems in place, we'll see something resembling a real AGI.
Some[1] think that things are trending in the opposite direction: away from clever manipulations and hard coded domain knowledge, and towards large scale general models.
[1]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
This made me think of the differences between FPGAs and microprocessors, with "more layers" being the equivalent of "more gates".
Yeah, I was surprised to see the architecture diagram is so complex. It's been a while since I saw a design that wasn't just "stack more transformer layers".