I've been gleefully exploring the intersection of LLMs and CLI utilities for a few months now - they are such a great fit for each other! The Unix philosophy of piping things together maps naturally onto how LLMs consume and produce text.
I've mostly been exploring this with my https://llm.datasette.io/ CLI tool, but I have a few other one-off tools as well: https://github.com/simonw/blip-caption and https://github.com/simonw/ospeak
I'm puzzled that more people aren't loudly exploring this space (LLM+CLI) - it's really fun.
I've been seeing less and less enthusiasm for CLI-driven workflows. I think VS Code is the main driver of this, and anecdotally the developers I serve want point & click over the terminal & CLI.
I think it's due to a lack of familiarity, as the CLI should be more efficient
Any CLI is 1 dimensional.
Point and click is 2 dimensional.
The CLI should be more efficient because you can reduce the complexity: you may need extra flags to get the behavior you want, but you can then serialize that into a file (a shell script) to guarantee the outcome is reproducible.
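For example (a toy sketch - the URL and filename are placeholders, only the curl flags are real):

```sh
#!/bin/sh
# fetch-report.sh - once you've worked out the flags that produce the behavior
# you want, freezing them in a script makes the outcome reproducible for everyone.
curl --silent --fail --location \
     --header 'Accept: application/json' \
     'https://example.com/api/report' > report.json
```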
GUIs are harder to automate, even without adding more dimensions like time (double-clicks, scripts like AHK or AutoIt...)
If you don't have comparative exposure (automating workflows in Windows vs doing the same in Linux), or if you don't have enough experience to achieve what you want, you might jump to the wrong conclusions - but this is a case of being limited by knowledge, not by the tools.
yup, we do this where we can, but let's consider a recent example...
They are standardizing around k8slens instead of kubectl. Why? Because there are things you can do in K8s Lens (like metrics) that you'll never get a good experience around in a terminal. Another big problem with terminals is that you have to deal with all sorts of divergences between OSes & shells. A web-based interface is consistent. In the end, they decided on their preference as a team, and that's what gets supported. They also standardized around VS Code, so that's what the docs refer to. I'm pretty much the only one still in vim; I'm not giving up my efficiencies in text manipulation.
I don't disagree with you, but based on my experience I do see a trend in preferences away from the CLI, our justifications be damned.
It looks like a limitation of the tool, not of the method, because metrics could come as CSVs, JSON or any other format in a terminal
I love vim too :)
Trends in preferences are called fashions: they can change familiarity and the level of experience through exposure, but they are cyclical and without an objective basis.
The core problem is the combinatorial complexity of the problem space, and 1D with ASCII will beat 2D with bitmaps.
I'm all for adding graphics to outputs (ex: sixels), but I think depending on graphics as inputs (whether scripting a GUI or processing it with our eyeballs) is riskier and more complex, so I believe our common preference for CLIs will prevail in the long run.
You're missing the point, it's about graphs humans can look at and gain understanding. A bunch of floating point numbers in a table are never going to give that capability.
This is just one example where a UI outshines a CLI, it's not the only one. There are limitations to what you can do in a terminal, especially if you consider ease of development
If humans are processing the output, I agree with you.
But for an AI (or a script running commands), a bunch of floating point numbers in a table will get you more reliability and better results.
This thread had dropped the AI context up to this point and instead focussed on why CLIs have lost popularity and preference with humans.
I think it has more to do with how close to the brink you are. It takes at least a decade for a technology to mature to the point where there's a polished point-and-click GUI for doing it. It sounds like Borg just hit that inflection point thanks to k8slens, which I'm sure is very popular with developers working at enterprises.
That makes a lot of sense, and it would generalize: things that have existed for longer have received more attention and more polish than fresh new things
I'd expect running a binary to be more mature than running a script, and the script to be more mature than a GUI, and complex assemblies with many moving parts (ex: a web browser in a GUI) to be the most fragile
That's another way to see there's an extremely good case for using Cosmopolitan: have fewer requirements, and concentrate on the core layers of the OS, the ones that have been improved and refined through the years.
100%
I also think people are on tooling burnout, there have been soooo many new tools (and SaaS for that matter), I personally and anecdotally want fewer apps and tools to get my job done. Having to wire them all together creates a lot of complexity, headaches, and time sinks
Same, because if you learn the CLI and scripting once, then in most cases you don't have to worry about other workflows: all you need is the ability to project your problem into a 1D serialization (ex: w3m or lynx html dump, curl, wget...) where you can use fewer tools.
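Something like this, for example (a sketch - the URL and grep pattern are placeholders, and `lynx -dump` works as a drop-in for `w3m -dump`):

```sh
# Project a web page into 1D plain text, then use ordinary line tools on it.
curl -s 'https://example.com/docs/page.html' \
  | w3m -dump -T text/html \
  | grep -i 'failover'
```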
I would have said point and click is 3-dimensional.
Otherwise, how can you read the text through the edges of buttons before clicking?
70% of the front page of Hacker News and Twitter for the past 9 months is about everybody and their mother's new LLM CLI. It's the loudest exploration I've ever witnessed in my tech life so far. We need to be hearing far less about LLM CLIs, not more.
I've been reading Hacker News pretty closely and I haven't seen that.
Plenty of posts about LLM tools - Ollama, llama.cpp etc - but very few that were specifically about using LLMs with Unix-style CLI piping etc.
What did I miss?
Has anyone written a shell script before that uses a local llm as a composable tool? I know there's plenty of stuff like https://github.com/ggerganov/llama.cpp/blob/master/examples/... where the shell script is being used to supply all the llama.cpp arguments you need to get a chatbot ui. But I haven't seen anything yet that treats the LLM as though it were a traditional UNIX utility like sed, awk, cat, etc. I wouldn't be surprised if no one's done it, because I had to invent the --silent-prompt flag that let me do it. I also had to remove all the code from llava-cli that logged stuff to stdout. Anyway, here's the script I wrote: https://gist.github.com/jart/bd2f603aefe6ac8004e6b709223881c...
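For the pipe-friendly shape I mean, a minimal sketch looks something like this (not the gist above - `-m`, `-p`, `--temp`, and `-n` are standard llama.cpp flags, the model path and prompt are placeholders, and you'd still need to silence the prompt echo and logging on stdout as described):

```sh
#!/bin/sh
# summarize.sh - treat a local LLM as a filter: cat notes.txt | ./summarize.sh
INPUT="$(cat)"   # read whatever was piped in
./main -m ./model.gguf --temp 0 -n 256 \
       -p "Summarize the following text in one paragraph: ${INPUT}"
```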
It's really hard to put an LLM in the middle of unix pipes because the output is unreliable
I wrote one a while back, mainly so I could use it with vim (without a plugin) and pipe my content in (also at the CLI), but haven't maintained it
https://github.com/verdverm/chatgpt
Justine may have addressed unreliable output by using `--temp 0` [0]. I'd agree that while it may be deterministic, there are other definitions or axes of reliability that may still make it poorly suited for pipes.
[0] > Notice how I'm using the --temp 0 flag again? That's so output is deterministic and reproducible. If you don't use that flag, then llamafile will use a randomness level of 0.8 so you're certain to receive unique answers each time. I personally don't like that, since I'd rather have clean reproducible insights into training knowledge.
`--temp 0` makes it deterministic. What can make output reliable is `--grammar` which the blog post discusses in detail. It's really cool. For example, the BNF expression `root ::= "yes" | "no"` forces the LLM to only give you a yes/no answer.
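Concretely, something like this (a sketch against llama.cpp's main binary; the model path and question are placeholders):

```sh
# Constrain the model to a literal yes/no answer, deterministically.
./main -m ./model.gguf --temp 0 \
       --grammar 'root ::= "yes" | "no"' \
       -p 'Question: does "disk quota exceeded" describe an error? Answer:'
```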
That only works up to a point. If you are trying to transform text-based CLI output into a JSON object, even with a grammar, you can get variation in the output. A simple example is field or list ordering. Omission is the really problematic one.
That only works if you have the same input. It also nerfs the model considerably
https://ai.stackexchange.com/questions/32477/what-is-the-tem...
I was including these since they're LLM-related CLI tools.
Err... LLM in general? Sure. Specifically CLI LLM stuff? Certainly not 70%...
Granted half of those are submissions about Simon's new projects.
Those are great links. I’ve been using:
https://github.com/npiv/chatblade https://github.com/tbckr/sgpt
I totally agree that LLM+CLI is a perfect fit.
One pattern I used recently was httrack + w3m dump + sgpt with GPT vision for images, to generate a 278K-token specialized knowledge base, with a custom Perl hack for RAG that preserved the outline of the knowledge.
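The text-gathering half of that, roughly (a sketch - the site URL and paths are placeholders, and the vision and Perl/RAG steps are omitted):

```sh
# Mirror a site, then flatten every page into plain text for the knowledge base.
httrack 'https://docs.example.com/' -O ./mirror
find ./mirror -name '*.html' \
  -exec w3m -dump -T text/html {} \; > knowledge-base.txt
```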
Which brings me to my question for you - have you seen anything unix philosophy aligned for processing inputs and doing RAG locally?
EDIT. Turns out OP has done quite a bit toward what I’m asking. Written up here:
https://simonwillison.net/2023/Oct/23/embeddings/
Something I’m currently a bit hung up on is finding a toolchain for chunking content on which to create embeddings. Ideally it would detect location context, like section “2.1 Failover” or “Chapter 8: The dream” or whatever from the original text, handle unwrapping of 80-character-wide source, do smart splitting so paragraphs are kept together, etc.
That's the same problem I haven't figured out yet: the best strategies for chunking. I'm hoping good, well proven patterns emerge soon so I can integrate them into my various tools.
I’ve been looking at this
https://freeling-user-manual.readthedocs.io/en/v4.2/modules/...
and at the FreeLing library in general, also spaCy and NLTK. The chunking algorithms used in the likes of LangChain are surprisingly bad.
There is also
https://github.com/Unstructured-IO/unstructured
But I don’t like it, can’t explain why yet.
My intuition is that the first step is clean sentences, paragraphs, and titles/labels/headers. Then an LLM can probably handle outlining and table-of-contents generation using a stripped-down list of objects in the text.
BRIO/BERT summarization could also have a role of some type.
Those are my ideas so far.
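For the "keep paragraphs together and remember the nearest heading" part, even a dumb awk pass gets surprisingly far (a sketch - the heading regexes are just examples, and it assumes blank-line-separated paragraphs):

```sh
#!/bin/sh
# chunk.sh <file> - join wrapped lines into paragraph chunks and prefix each
# chunk with the most recently seen heading (e.g. "2.1 Failover", "Chapter 8: ...").
awk '
  /^([0-9]+(\.[0-9]+)*[[:space:]]+.+|Chapter[[:space:]]+[0-9]+.*)$/ { heading = $0; next }
  /^[[:space:]]*$/ { if (buf != "") { print "[" heading "] " buf; buf = "" }; next }
  { buf = (buf == "" ? $0 : buf " " $0) }   # unwrap 80-column source lines
  END { if (buf != "") print "[" heading "] " buf }
' "$1"
```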
A quick glance at "embedding" reminds me a lot of some work I was doing on quantum computing.
I wonder if there is some crossover potential there, in terms of calculations across vector arrays
I'm heavily using https://github.com/go-go-golems/geppetto for my work, which has a CLI mode and TUI chat mode. It exposes prompt templates as command line verbs, which it can load from multiple "repositories".
I maintain a set of prompts for each repository I am working in (alongside custom "prompto" https://github.com/go-go-golems/prompto scripts that generate dynamic prompting context; I made quite a few for third-party libraries, for example: https://github.com/go-go-golems/promptos ).
Here are some of the public prompts I use: https://github.com/go-go-golems/geppetto/tree/main/cmd/pinoc...
I am currently working on a declarative agent framework.