slightly OT:
I really struggle with the dozens and dozens of terms being used in the field of machine learning and especially AI. I'm not a beginner at all, but I wonder if there is a comprehensive guide to all those terms that doesn't necessarily explain the technology behind each of them in detail, but shows their position and relation to each other, like some kind of landscape.
"everyone" seems to know Mamba. I never heard of Mamba. There are constantly new kind of llm popping up, talking about stuff that seems to be obvious.
So, is there some kind of resource like that, aimed not at beginners but at experienced users coming from other fields of IT?
In fast-evolving fields it's always about sociology, not canon or pedagogy. Meaning in new fields is created in community (constructionism).
You need to plug into the community and overhear what people are talking about (HN is such a community). You’ll also get a sense of the linguistic subculture (acronyms, lingo etc) much like you learn to talk hip hop if you’re into the hip hop subculture. Much of it will be noise but overall you’ll get a sense of what the community cares about, which helps you narrow what you need to focus on. The subreddit r/localllama is the watering hole for hobbyists right now.
If you need a primer, this is a good guide.
https://flyte.org/blog/getting-started-with-large-language-m...
In this particular case, I find it helpful to do syntopical reading (per Mortimer Adler) around LLMs, not AI in general. Mamba is interesting to me because I have a background in optimal control and state space models are my bread and butter, and it's fascinating to see them applied in this way.
Aside: I'm in my 40s and this isn't my first rodeo. There will always be new fields and trends emerging — I've been through several waves of this (cloud, big data, ML, data science, etc.) where posts like yours are commonplace. But there is no need to be frustrated. Overhearing conversations is one way to make sense of them, instead of feeling lost and waiting for someone to summarize and explain everything to you.
The same applies to academic fields.
PS: also consider that you might not need to be on the cutting edge. If you're not trying to build leading-edge stuff, it's good to wait for the dust to settle — you'll waste less time following dead ends while the community is figuring out what's good.
Perhaps the community at r/localllama could train an LLM that knows about the latest developments and explains jargon and papers, updated weekly. Free idea for karma.
Not a bad idea.
I actually read papers with the help of ChatGPT-4 and Claude. It helps me quickly understand papers that I don’t have a background in.
For instance when I see something I don’t understand I ask it “can you break that down for me?” Or “is this similar to (concept I know)?”
It’s the new way of doing syntopical reading — but faster and more efficient.
(For the uninitiated, it's a technique from Mortimer Adler's How to Read a Book.)
How do you feed a recent arxiv paper directly to ChatGPT?
You can visit the page and use the Edge browser's Copilot feature. It uses GPT-4 and doesn't cost anything ;)
A few options are:
1. Select the abstract (or all the text), then copy/paste. (A scripted version of this is sketched below.)
2. Save the PDF and upload with ChatGPT’s document feature.
3. Just ask for it: "What's that well-known LLM paper about context and getting lost in the middle?" It will search the web as needed.
You can also do more than summarize. Ask about equations, ask it to make analogies, challenge the key findings as devil’s advocate to learn from different angles. Propose your own ideas.
Use voice to digest topics during your commute and ask tons of questions until you understand.
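If you'd rather script option 1 than copy/paste by hand, something like this rough sketch works (assuming the openai and requests Python packages, an OPENAI_API_KEY in your environment, and that naive XML splitting is good enough for a one-off; swap in whatever model you have access to):

    # Rough sketch: pull an arXiv abstract and ask a model to explain it.
    import requests
    from openai import OpenAI

    ARXIV_ID = "2312.00752"  # the Mamba paper

    # arXiv's export API returns Atom XML; the abstract lives in the <summary> tag.
    xml = requests.get(
        "http://export.arxiv.org/api/query", params={"id_list": ARXIV_ID}
    ).text
    abstract = xml.split("<summary>")[1].split("</summary>")[0].strip()

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    reply = client.chat.completions.create(
        model="gpt-4",  # substitute any chat model you have access to
        messages=[
            {
                "role": "user",
                "content": "Explain this abstract to an experienced software "
                           "engineer who is new to LLM research:\n\n" + abstract,
            },
        ],
    )
    print(reply.choices[0].message.content)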
If you have the Plus subscription you can upload PDFs directly and ask it to ingest them.
This is a great way to consume papers. If there’s one thing LLMs know, it’s machine learning literature!
Good point, thanks for the link! (One of the links there leads to this wonderful post, highly recommended: http://jalammar.github.io/illustrated-transformer/)
Only the "everybody who knows what mamba is" are the ones upvoting and commenting. Think of all the people who ignore it. For me, Mamba is the faster version of Conda [1], and that's why I clicked on the article.
https://github.com/mamba-org/mamba
Ah yes, Conda, definitely something else I've heard of.
Conda is the latest LLM cli frontend that's a MOE of Mistral 7B, LLama 17B, Falcon 32C, and the Yamaha YZ50 quad bike.
Well played.
Mamba is a PoC of the latest SSM architecture for LLMs, named S6, and is a dense counterpart to Transformers, trained on 300B tokens of the Pile at sizes up to 2.7B parameters. Mamba shows that S6 LLMs train faster, run faster, use less VRAM, and achieve lower perplexity and better benchmark scores with the exact same training data.
That is actually accurate but probably sounds just as outlandish.
The approachable version: Mamba is a proof-of-concept language model showcasing a new LLM architecture called S6, a competitor to the Transformer architecture (the 'T' in ChatGPT), and it is better in every measurable way.
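For the curious, here is a toy sketch in plain numpy of the vanilla (non-selective) state-space recurrence that SSM architectures like S6 build on. This is not the actual Mamba code: the real thing makes the parameters input-dependent ("selective") and uses a hardware-aware parallel scan rather than a Python loop.

    # Toy illustration only: the plain linear state-space recurrence
    #   h_t = A h_{t-1} + B x_t,   y_t = C h_t
    # Mamba/S6 layers additionally make the parameters input-dependent.
    import numpy as np

    def ssm_scan(x, A, B, C):
        """Run a scalar input sequence x through a linear state-space model."""
        h = np.zeros(A.shape[0])      # hidden state
        ys = []
        for x_t in x:                 # recurrent form, O(sequence length)
            h = A @ h + B * x_t       # update the hidden state from the input
            ys.append(C @ h)          # read out a scalar output
        return np.array(ys)

    rng = np.random.default_rng(0)
    A = 0.9 * np.eye(4)               # simple stable dynamics, 4-dim state
    B = rng.normal(size=4)
    C = rng.normal(size=4)
    y = ssm_scan(rng.normal(size=16), A, B, C)
    print(y.shape)                    # (16,)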
It's extremely common to manage Python environments with conda (although it can do much more). If you are unaware of conda, it is unlikely you work with Python, and therefore unlikely that you're doing much with ML (and LLMs) anyway; it's even part of the "getting started" documentation for PyTorch.
Conda has been around for a decade and it used to be the primary package manager for everything related to numpy/scipy. Most ML and data science people have heard of it even if they haven't used it.
That is not "a new LLVM architecture"... It's talking about a different Mamba.
Just came out a few days ago. It's new for everyone.
Mamba is also the name of a package management system, similar to Conda.
Just to make it a little extra confusing :)
https://github.com/mamba-org/mamba
Should have picked a different snake, like... I dunno, Asp? Wait, no, not that one...
Python!
It is a very fad-driven field. Everyone brands everything. It isn't enough to give things boring titles like "stacked open linear dynamical system with selective observations and learned timestep".
That's half of it; the other half is pure social linguistics.
Try talking about a "stacked open linear dynamical system" more than three times and you're bound to figure out a token that conveys the same thing but is quicker to produce.
It's turtles all the way down with LLMs, and with your comment: people are just trying to squeeze the most meaning into the fewest tokens in their conversations.
I mean, Mamba is much easier to remember than what you said. It’s good to have short names for techniques.
The field just moves fast. I have curated a list of non-hypey writers and YouTubers who explain these things for a typical SWE audience, if you are interested. https://github.com/swyxio/ai-notes/blob/main/Resources/Good%...
Will check it, thank you!
Sorry for getting semantical here, but isn't ML a subfield of AI? In other words, I would have expected "... in the field of machine learning and AI in general"
"AI" has recently often been used to mean specifically generative AI, which is a subfield of machine learning, which in turn is a subfield of AI in the broader sense.
It's a new LLM type: instead of Transformers it uses state-space models, which are orders of magnitude faster. It's currently very new and less coherent than GPT-2.
? It's better than GPT-2 for sure...
The people that are constantly up to date on this stuff tend to be AI/ML researchers and engineers. In academia, industry research groups, or startups.
They literally get paid to read papers, and implement models on a day-to-day basis.
I wouldn't worry too much about not being up to date or things sounding a bit foreign. The names themselves are just that, names; the models tend to be incremental versions of some previous model.
Most of the startups I've chatted with seem to prioritize finding people who build products. The complaint/regret I've heard from 3-5 organizations was that they hired researchers.
Researchers are more for highly funded organizations. Startups can get by with off-the-shelf models.
Heavily agree. I've been following this space quite closely, like most people, only for the past year. But it seems to still be in its experimental phase, which in turn attracts academics and researchers who tend toward this type of language.
I didn't know Mamba but the bottom of the page lists comprehensive references.
If you mean the "branding" that is common in ML, which is often criticized, I much prefer it over the jargon used in other fields, e.g. Mathematics. It is nice to have distinguished words to talk about different concepts.
You are now in the loop! Your colleagues will think the same thing: "How does this person keep up with all the LLM stuff?"
Don’t feel bad, Mamba is very new technology. I only just heard about it for the first time last week!
Not everybody knows Mamba. You can't stay on top of everything in ML, so stop trying. Since you asked, Mamba is a neural architecture based on structured state space models (SSMs) that aims to replace Transformers. For me, right now, just knowing that counts as staying on top of things. If I need to know more than that, I can have the computer summarize it for me.
I knew about Mamba from r/singularity and following AI researchers on Twitter.
I don't work in AI at all (and don't plan to), but it's fun to know about things a little before they become mainstream.
I'm not aware of such a glossary.
But I did notice the "References" section at the bottom of the README, which does explain what Mamba is by linking to the original paper: "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" https://arxiv.org/abs/2312.00752