So many cool ideas are being developed by the community. Here are some of my favorites.
Batch inference optimization: FlexGen, llama.cpp
Faster decoder with techniques such as Medusa, LookaheadDecoding
Model merging: mergekit
Constrained sampling: outlines, guidance, SGLang
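For readers who haven't run into constrained sampling before, here is a rough toy sketch of the general idea: at each step, tokens that would break the constraint are masked out before sampling. This is not how outlines, guidance, or SGLang are implemented. The vocabulary, the `fake_logits` stand-in for a model, and the "output must be one of a few strings" constraint are all assumptions made up for illustration.

```python
import math
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["y", "e", "s", "n", "o", "m", "a", "b", "<eos>"]
# Hypothetical constraint: the finished output must be exactly one of these.
ALLOWED_OUTPUTS = ["yes", "no", "maybe"]


def fake_logits(prefix, rng):
    """Stand-in for a language model: one arbitrary score per vocabulary token."""
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]


def allowed_tokens(prefix):
    """Return the tokens that keep `prefix` on a path to an allowed output."""
    ok = set()
    for target in ALLOWED_OUTPUTS:
        if target == prefix:
            ok.add("<eos>")  # the output is complete, so only stopping is valid
        elif target.startswith(prefix):
            ok.add(target[len(prefix)])  # next character of a still-reachable target
    return ok


def constrained_sample(rng):
    """Sample token by token, giving zero weight to constraint-breaking tokens."""
    prefix = ""
    while True:
        logits = fake_logits(prefix, rng)
        ok = allowed_tokens(prefix)
        # Softmax numerators for allowed tokens, zero for everything else.
        weights = [math.exp(l) if tok in ok else 0.0 for l, tok in zip(logits, VOCAB)]
        token = rng.choices(VOCAB, weights=weights, k=1)[0]
        if token == "<eos>":
            return prefix
        prefix += token


if __name__ == "__main__":
    rng = random.Random(0)
    print([constrained_sample(rng) for _ in range(5)])
```

Whatever scores the "model" produces, every sample ends up being one of the allowed strings, because the mask removes every other continuation. Real libraries typically apply the same masking idea with regexes, grammars, or JSON schemas instead of a fixed list of strings.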
So essentially a handful of people are doing God's work. They have deep knowledge of modeling and optimization, and they build amazing libraries for millions of mortals. On the other hand, it'll be hard for an engineer to work on training frameworks, on building models with new knowledge or new capabilities (except some small-scale finetuning), or on optimization in general -- the hardware cost of doing such work is prohibitive for such engineers.
Isn’t that the same with most engineering disciplines though? A nuclear engineer cannot build a nuclear power station at home, a chemical engineer or process chemist doesn’t have access to industrial-grade infrastructure outside of their job, and a computer hardware architect cannot hope to design hardware at home and fabricate it at 3nm or better at TSMC. I guess that software engineering was more of an exception to this rule for a while because home computers were amazing enough to help build useful software. Even so, throughout all these years many people worked on parallel code that runs on large clusters, or on infrastructure that was not appropriate for operating at home, and now with deep learning a subset of that skill set is very desirable. I agree that additional public contributions to the training process on large clusters would be fantastic; eventually these people will be trained in all the right systems courses and will figure out their way to the jobs where they can apply their skills and grow.
But that's not necessarily true for software engineers. The so-called three romances of CS can all be attempted by anyone: compilers, operating systems, and computer graphics. And in recent years, databases, distributed systems, and machine learning algorithms can also be attempted by anyone at home. Only the large models and the associated optimizations are really beyond most people's reach.
I'm a CS grad and software engineer and have never in my time heard anyone describe the "three romances". Where did you hear that term and can you give me a TL;DR on why you actually think those fields are the 3?
More like the holy trinity that 99.998% of CS majors couldn't code even if their careers depended on completing it.
https://youtu.be/TRZAJY23xio?feature=shared&t=2346
I don't agree with a lot of what Steve stood for, but he was right about this dynamic range observation.
Cheers =)
I don't think that's true at all. I think most "CS majors" would need to study how to do those things, but none of them are conceptually difficult, and they all build on the same algorithms and data structures knowledge you learn for other tasks.
I never built a compiler until I did. I never built an operating system until I did. (I don't remember when I started doing computer graphics, though, because that was a long time ago.)
Sure, one could spend a semester learning to build a rudimentary compiler, or watch your prof build a better solution in under 37 lines of Prolog.
Wish I was joking here... =)
Do you have a link to their project? Rich Hickey also always reiterates that Simple isn't Easy. Knowing exactly how something works means you can express it in as concise a way as possible.
I don't think they are denying that you can do any of those things; it just often won't be in the same league. You aren't going to be producing Linux, LLVM, or Unreal, but instead very specific projects which may be remarkable or just a toy: TempleOS, TinyCC, 64k demos. Not ChatGPT, but maybe autocomplete or an inferior LLM using the same methods. For the hardware example they gave, while 3nm might be inaccessible, you can fabricate at 180nm, 130nm and 90nm process nodes [0] [1]. Even chemistry and nuclear science aren't beyond the home tinkerer's grasp, but as complexity rises, the ability to acquire, control and synthesize diminishes rapidly.
[0] https://developers.google.com/silicon
[1] https://en.wikipedia.org/wiki/Google_Silicon_Initiative
I didn't know about the "three romances of CS" terminology. #TIL
I think anyone who would or could get into low-level computer graphics would be able to do the same in LLM optimization land.
The only barrier to both areas is just the amount of math and GPU knowledge.
I'm easily able to contribute to a lot of open source projects either out of the box or with little onboarding time.
When I read those optimization blog articles, it feels to me like I'd need to take at least half a year or a year as a sabbatical to understand all of it.