If I have learnt one thing working in software engineering, specifically on AI-enabled products empowering junior engineers, and using Copilot professionally, it's that you need even more experience to detect the subtle ways the model fails to understand your domain and your specific intent. If you don't know exactly what you're after and use the LLM as a sparring partner to bounce your ideas off, you're in for a lot of pain.
Depending on how you phrase the question, ChatGPT will gleefully suggest a wrong approach, just because it's so intent on satisfying your whim instead of saying no when that would be the appropriate answer.
On top of that, you lose the learning that comes from figuring out a new concept yourself. If you already have a feel for the code you would write anyway, and only treat the model as a smart autocomplete, that doesn't matter much. But for an apprentice, or a layperson, it keeps code as scary and unpredictable as before. I don't think that should be the answer.
This.
If LLMs were actually some magical thing that could write my code for me, I wouldn't use them for exactly this reason. Using them would prevent me from learning new skills and would actively encourage my existing skillset to degrade.
The thing that keeps me valuable in this industry is that I am always improving, always learning new skills. Anything that discourages that smells like career (and personal) poison to me.
In other words, your objection isn't to LLMs, it's to delegation, since the exact same argument would apply to having "some magical thing that could write my code for me" be your co-worker or a contractor.
It's fair for the type of code you want to write for your own growth. But even with that, there's more than enough bullshit boilerplate, and enough trivial cross-language differences, that contribute zero (or negatively) to your growth and are worth having someone else, or something else, write for you. LLMs are affordable for this, where people usually are not.
If that's the only thing LLMs are good for, my money for improving software productivity is in good old fashioned developer tools.
A better language reduces boilerplate. A better compiler helps you reason about errors. Better language features help you be more expressive. If I need to spool up a jet turbine feeding H100s just to decipher my error messages, the solution is a better compiler, not a larger jet engine.
I myself have noticed this: a wild heterogeneity in the types of tasks for which LLMs are helpful. Their appearance as a silver bullet withers the closer you get to essential complexity.
One of my fears with the growing use of LLMs for writing software is that people are using them as a catch-all that prevents them from feeling the pain that indicates to us that there is a better way to do something.
For example, nobody would ever have developed RAII if manual memory management didn’t peeve them. Nobody would have come up with C if they didn’t feel the pain of assembly, or Rust without C, or Typescript without JavaScript, etc. Nobody would have come up with dozens of the genius tools that allow us to understand and reason about our software and enable us to debug it or write it better, had they not personally and acutely felt the pain.
At my job, the people most enthusiastic about LLMs for coding are the mobile and web devs. They say it saves them a lot of time spent writing silly boilerplate code. Shouldn’t the presence of that boilerplate code be the impetus that drives someone to create a better system? The entire firmware team has no interest in the technology, because there isn’t much boilerplate in C. Every line means something.
I worry LLMs will lead to terrible or nonexistent abstractions in code, making it opaque, or inefficient, or incorrect, or all of the above.
To add to this, LLMs write pretty trite poetry, for example. If we think of code from the creative side, it’s hard to imagine that we’d want to simply hand all coding over to these systems. Even if we got working solutions (which is a major undertaking for large systems), it seems we’d be sacrificing elegance, novelty, and I’d argue more interesting explorations.
It's an interesting observation for sure, but those mobile and web developers sit at the tippy top of all the other abstraction layers. At that position a certain amount of boilerplate is needed, because not all controls and code-behind are the same, and there is a mighty collection of hacks on hacks to get a lot of things done. I think this is more of a “horses for courses” thing, where developers higher in the abstraction stack will always benefit from LLMs more and developers lower down the stack have more agency for improvement. At the end of the day, I think everyone gets more productive, which is a net positive. It's just that not all developers are after the same goal (application devs vs. library devs vs. system devs).
You don't need a jet turbine and H100s for that; the world only needs them once, to train the model, and exercising the resulting ability costs comparatively little in GPU time. Like, I can't say how much GPT-4o takes in inference, but Llama-3 8B works perfectly fine and very fast on my RTX 4070 Ti, and it has a significant enough fraction of the same capabilities.
Speaking of:
There's only so much it can do. And yes, I've actually set up an "agent" (predefined system prompt) so I can just paste the output of build tooling verbatim, and get it to explain error messages in it, which GPT-4 does with 90%+ accuracy. Yes, I can read and understand them on my own. But also no, at this point, parsing multiple screens of C++ template errors or GCC linker failures is not a good use of my life.
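For the curious, the whole "agent" is little more than a fixed system prompt in front of a chat API call. A minimal sketch of what such a setup might look like, assuming the OpenAI Python SDK (the prompt wording and model name here are just illustrative, not my exact ones):

```python
# explain_errors.py - paste or pipe raw build output in, get an explanation out.
import sys

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "You are a build-output analyst. The user pastes raw compiler or linker "
    "output verbatim. Identify each distinct error, explain its likely cause "
    "in plain language, and suggest a concrete fix. Do not invent APIs."
)

def explain_build_output(raw_output: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # any capable chat model will do
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": raw_output},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(explain_build_output(sys.stdin.read()))
```

From there it's just `make 2>&1 | python explain_errors.py` and reading the summary instead of the wall of template spew.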
(Environment-wise, I'm still net ahead of a typical dev anyway, by staying away from Electron-powered tooling and ridiculously wasteful modern webdev stacks.)
Yes, that's why everyone is writing Lisp, and not C++ or Java or Rust or JS.
Oh wait, wrong reality.
That's another can of worms. I'm not holding out much hope here, because as long as we insist on working directly on a plaintext codebase treated as the single source of truth, we're already at the Pareto frontier in terms of language expressiveness. Cross-cutting concerns are actually cross-cutting; you can't express them all simultaneously in a readable way, so all the modern language design advances are doing is shifting focus and complexity around.
LLMs don't really help or hurt this either, though they could paper over some of the problem by raising the abstraction level at which programmers edit their code, in lieu of the tooling actually being designed to support such operations. I don't think this would be good - I'd rather we stopped with the plaintext single-source-of-truth addiction in the first place.
100% agreed on that. My point is, dealing with essential complexity is usually a small fraction of our work. LLMs are helpful in dealing with incidental complexity, which leaves us more time to focus on the essential parts.
You are a bit too cynical. The tools (compilers and interpreters and linters etc.) that people are actually using have gotten a lot better. Partly by moving to better languages, like more Rust and less C, or TypeScript instead of JavaScript; but also from compilers for existing languages getting better, see especially the arms race between C compilers kicked off by Clang throwing down the gauntlet in front of GCC. They both got a lot better in the process.
(Common) Lisp was a good language for its time. But I wouldn't hold it up as a pinnacle of language evolution. (I like Lisps, and especially Racket. And I've programmed about half of my career in Haskell and OCaml. So you can rest assured about my obscure and elitist language cred. I even did a year of Erlang professionally.)
---
Btw, just to be clear: I actually agree with most of what you are writing! LLMs are already great for some tasks, and are still rapidly getting better.
You are also right that despite better languages being available, there are many reasons why people still have to use eg C++ here or there, and some people are even stuck on ancient versions of C++, or ancient compilers for it, with even worse error messages. LLMs can help.
This all depends on your mental relationship with the LLM. As somebody else pointed out, this is an issue of delegation. If you had one or more junior programmers working for you writing code according to what you specify, would you have the same worry?
I treat LLMs as junior programmers. They can make my life easier and occasionally make it harder. With that mindset, you start out knowing that they're going to make stupid mistakes, and that builds your skill of detecting mistakes in other people's code. Also, like biological junior programmers, nonbiological junior programmers quickly show how bad you are at giving direction and force you to improve that skill.
I don't write code by hand because my hands are broken, and I can't use the keyboard long enough to write any significant amount of code. I've developed a relationship with nonbiological junior programmers such that I now tell them, via speech recognition, what to write and what information they need to create code that looks like code I used to create by hand.
Does this keep me from learning new skills? No. I'm always making new mistakes and learning how to correct them. One of those corrections was realizing that you don't learn anything significant from the act of writing code; career-sustaining knowledge comes at a much higher level.
Can you please write a blog post on what tools you use for that?
My hands are fine; still, I'd love to just verbally explain what I want and have someone else type the code for me.
Sure. I've been meaning to do a comparison of Aqua and Dragon; I'll do one of Copilot and GPT-whatever. Give me 6 months or so, I've got marketing to do. :-)
I try hard to avoid the scenario where junior programmers write things they don’t understand. That’s actually my biggest frustration and the difference between juniors I enjoy vs loathe working with. There are only so many ways to remain supportive and gently say “no, for real, learn what you’re working on instead of hacking something brittle and incomplete together.”
It's a great question. My answer generally is yes, I would (and do), but I'm willing to sacrifice a bit in order to ensure that the junior developer gets enough practical experience that they can succeed in their career. I'm not willing to make such a sacrifice for a machine.
It sounds like you've discouraged yourself from learning the skill of using an LLM to help you code.
That only matters if the assumption is that any skill is worth learning simply because it's a skill.
You could learn the skill of running yourself over with a car, but it's either a skill you'll never use or the last skill you'll use. Either way, you're probably just as well off not bothering to learn that one.
"running yourself over with a car" feels very different from "learning to use LLMs to your advantage".
The GP was pointing out that learning to use an LLM, in their opinion, would stop them from learning other new skills and erode their existing ones.
In that context I think the analogy holds. Using an LLM halts your learning, as does running yourself over with a car. It's an exaggerated point for sure, but I think it points to the fact that you don't have to learn to use LLMs simply because it's a skill you could learn, especially if you think it will harm you long term.
I would disagree with that take, actually. Perhaps I haven't yet figured out how to leverage LLMs for that (and don't get me wrong, I have certainly experimented as has most of my team), but I'm not discouraged from it.
I'm just trying to be clear-eyed about the risks. As an example, code completion tools in IDEs will cause me to get rusty in important baseline skills. LLMs present a similar sort of risk.
Eh, your argument could also be used against compilers. Or against language features like strong typing in something like Rust, instead of avoiding our bugs through very careful analysis when writing C code like God intended.
Using an LLM _is_ a skill, too.
I agree it's a skill, but I actually hear this analogy a lot and think it's not a great one.
A feedback loop with an LLM is useful for refining ideas and speeding up common tasks; with the right tooling I think it can even be a massive productivity boost for one of the most common professional dev tasks. I work a lot of mercenary gigs and need to learn new languages all the time, and something like phind.com is great for giving me basic stuff that works in a language whose idioms I don't know. The fact that it cites its sources and gives me links means I can deal with it being wrong sometimes, and also drill down and learn more when appropriate.
However, LLMs are super not like compilers. They simply do not create reliable simplifications in the same way. A higher level language creates a permanent, reliable, and transferable reduction in complexity for the programmer, and this only works because of that reliability. If I write a function in scala, it probably has a more complicated equivalent in JVM bytecode, but it works the same every time and the higher-order abstraction is semantically equivalent and I can compose it with other functions and decompose it into its constituent parts reliably without changing the meaning. Programming languages can be direct translations of each other in a way that adding the fuzziness of natural language makes it basically impossible to. An abstraction in a language can be used in place of the complex underlying reality, and even modified to fit new situations predictably and reliably without drastic risk of not working the same way. This reliability also means that the simplification has compounding returns, as it's easier to reason about and expand on for some future maintainer or even my future self.
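To make that concrete with a toy example (Python rather than Scala, but the property is the same): once behaviour is captured as a named abstraction, composing and reusing it is deterministic and meaning-preserving, which is exactly the guarantee a natural-language prompt doesn't give you.

```python
# A tiny, deterministic abstraction: once defined, it composes and
# decomposes without changing meaning, every single time it's used.
from typing import Callable, Iterable, List

def normalize(s: str) -> str:
    """Lowercase and strip surrounding whitespace."""
    return s.strip().lower()

def dedupe(items: Iterable[str]) -> List[str]:
    """Drop duplicates while preserving first-seen order."""
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]

def compose(*fns: Callable) -> Callable:
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    def composed(x):
        for fn in reversed(fns):
            x = fn(x)
        return x
    return composed

# The composed pipeline means exactly what its parts mean, today and
# tomorrow; re-prompting a model for "the same" code gives no such guarantee.
clean = compose(dedupe, lambda xs: [normalize(x) for x in xs])
assert clean(["  Foo", "foo", "Bar "]) == ["foo", "bar"]
```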
LLMs for code generation, at least in their current form, lack all these important properties. The code they generate is a fuzzy guess rather than a one-to-one translation. Often it's a good guess! But even when it is, it's generating code that is no more abstract than what you would have had to write before, so putting it into your codebase still gives you just as much additional complexity to take into account when expanding on it. Maybe the LLM can help with that, maybe not. Asking an LLM to solve a problem in one case can fail to transfer to another one in unpredictable ways.
You also aren't able to use it to make permanent architectural simplifications recursively. We can't, for example, save a series of simple English instructions instead of the code that's generated, then treat that as a moving piece we can recombine by piping it into another instruction to write a program, and so on. (This would obviously also increase the cost of computing your program significantly, but that's actually a place where, well, not a compiler but an interpreter is a decent analogy.) My main concern with LLMs being deployed by developers en masse is kind of already happening, and it predates LLMs: I notice that codebases where people have used certain IDEs or other code-generation tools accumulate a bunch of unnecessary and hard-to-maintain complexity, because the programmer using the tools got used to just "autogenerating a bunch of boilerplate". That's fine in a vacuum, but it piles up a ton of technical and maintainability debt really fast if you're not actively mindful of it and taking steps in your workflow to prevent it, like having a refinement and refactoring phase in your feedback loop.
I think LLMs are useful tools and can help programmers a lot, and even may lead to "barefoot programmers" embedded in local community needs, which I love. I hear the analogy to compilers a lot and I think it's a bad one, managing to miss most of what's good about compilers while also misunderstanding the benefits and pitfalls of generative models
I mostly agree about a certain layer of semantics in our 'normal' programming languages. And most of the time, that level is good and good enough. But whether eg certain compiler optimisations kick in or not is sometimes much harder to forecast.
Btw, currently I wouldn't even dare to compare LLMs to compilers. For me the relevant comparison would be to 'Googling StackOverflow': really important tools for a programmer, but nothing you can rely on to give you good code. Nevertheless they are tools whose mastery is an important skill.
Remember how in yesteryears we complained about people copy-and-pasting from StackOverflow? Just like today we complain about people committing the output of their LLM directly.
---
I do hope that mechanical assistance in programming keeps improving over time. I have quite a few tricky technical problems that I would like to see solved in my lifetime.
Copilot (and so on) are simultaneously incredible and not nearly enough.
You cannot ask it to build a complex system and then use the output as-is. It's not enough to replace developer knowledge, but it also inhibits acquiring developer knowledge.
This is not really a new problem; the previous version was "idk, I copy pasted it from Stack Overflow." True expertise meant realizing that the answer often lay buried in the sub-comments, and that the top-voted answer was often not the correct one. LLMs naturally do not realize any of this.
I kind of disagree.
ChatGPT will make something that looks much more like it should work than your copy-pasted code from Stack Overflow. It looks like it does exactly what you want. It's just riddled with bugs: major (invented an API out of whole cloth; it sure would be convenient if that API did exist, though!) or subtle (oh, this bash script will shit the bed and even overwrite data if your paths have spaces). Or it will happily combine code across major API revisions of e.g. Bootstrap.
I still use it all the time; I just think it makes already-expert users faster while being of much more limited use to people who are not yet experts. In the above case, after being told to make the paths space-safe it did so correctly. You just had to know to do that...
You're kind of saying some of what I am trying to say, so I'm not sure we disagree. I boil the core problem down to roughly this: people lacking the expertise to judge code advice critically are putting bad code they do not understand into places they shouldn't. That problem is not new. LLMs are a variation on it, with the added downside of not letting you see the surrounding context for yourself to determine what the correct answer is. The fact that they are so convincing about it is a different problem, but definitely a new and horrific one in its own right.
Meta commentary on this, I honestly don’t mind if this is the hell that the business/management world wants to build for themselves. I’ll make a fortune cleaning it up.
Ah, you're right, apologies for not reading your comment more carefully.
I tried one to help me get the syntax right for the config file for a program. It started by generating a config file for the latest version and not the old one I was using, but once I told it that, it fixed that convincingly and spit out a config file that looked like it was for my version.
However, the reason I asked for help was that the feature was very badly documented, and yet ChatGPT happily invented syntax for the thing I was having problems with. And every time I told it that it didn't look quite right, it confidently invented new syntax for the feature. Everything it made up looked pretty damn convincing; if I had designed the config file format myself I could have gone with any of those suggestions, but they were all wrong, as evidenced by the config file validator in the program.
At least Stack Overflow had comments and votes that helped you gauge the usefulness of the answer. These glorified toasters have neither.
Almost every “Copy-Paste from SO” answer was accompanied with lots of caveats from other human commentators. This feedback loop is sorely missing with LLM coding assistants.
LLMs literally read all that commentary in training, so they're taking it into account, not regurgitating the top-voted answer from SO. They're arguably better at this than junior devs.
While LLMs are better at reading the surrounding context, I am not convinced they are particularly good at taking it on board (compared to an adult human, obviously fantastic compared to any previous NLP).
Biggest failure mode I experience with LLMs is a very human-like pattern, what looks like corresponding with an interlocutor who absolutely does not understand a core point you raised 5 messages earlier and have re-emphasised on each incorrect response:
--
oh, right, I see… y
--
etc.
At least “copy paste from Stack Overflow” was kind of an in-joke. There was a little social stigma around it. Everyone knew they were being lazy and really shouldn't be doing it.
LLMs are different because devs declare with pride that they “saved so much time” by just letting chatGPT do it for them.
Therein lies the mistake. Too many people assume ChatGPT (and similar LLMs) are capable of reasoning. They're not. They are simply giving you what is likely the 'correct' answer based on some sort of pattern.
They don't know what's wrong, so they're not aware they're giving you an inappropriate answer.
Sure it is, just like a child or someone not very good at reasoning. You can test ChatGPT yourself on some totally novel ad hoc reasoning task you invent on the spot, with a single correct conclusion that takes reasoning to arrive at, and it will probably get it if it's reasonably easy, even if you take great pains to make it something totally new that you invented. Try it yourself (preferably with GPT-4o) if you don't believe me. Please share your results.
That's a good way to think about it. Treat GPT-4 as having the mentality of a four-year-old kid. A kid that age will take any question you ask at face value, because it hasn't learned yet that adults often don't ask questions precisely enough, don't realize the assumptions they make in their requests, don't know what they don't know, and are full of shit. A four-year-old won't think of evaluating whether the question itself makes sense; they'll just do their best to answer it, which may involve plain guessing what the answer could be if one isn't apparent.
Remember that saying "I don't know" isn't an innate skill in humans either - it's an ability we drill into kids for the first decade or two of their lives.
Another similarity is that 4-year-olds will often pick up words and phrases from people around them, and learn associations between those words and phrases without yet having learned their meanings or having any of their own experiences to relate them to.
So a young child might answer your question with a response that he heard other people say to a similar question, without actually understanding what he's saying. LLMs are basically this at a grand scale.
That doesn't tell the whole story, though. It's a four-year-old kid that has been thoroughly conditioned to always be positive and affirming in its replies, even if it means making something up. That isn't something kids usually do; it's not something humans usually do, at least not the way ChatGPT does, and that may be part of why it's so confounding.
It's not just "I don't know", really. It feels like OpenAI ingrained the essence of North American culture into the model (Sorry North Americans, I really don't mean this in a demeaning way!), as in, the primary task of ChatGPT is supposed to be to make its users happy and feel good about themselves, taking priority over providing accurate answers and facts.
As you are making the point, can you please provide an example? If it's really new, posting it as a comment here is not likely to affect LLM training, at least not for a day or so. And even if it did, you could still share the examples you are thinking of.
A child or unintelligent person may make errors when attempting to engage in reasoning -- applying the wrong deductive rules, or applying the right ones incorrectly, or applying them to invalid premises -- but LLMs are not even attempting to engage in reasoning in the first place. They are applying no deductive rules, and have no semantic awareness of any of their inputs, so cannot even determine whether they are correct or incorrect.
LLMs are simply making probabilistic inferences about what "words" (tokenized particles of language, not necessarily individual words from our perspective) are most likely to appear in relation to other words, based on the training data fed into them.
Their output often resembles reasoning simply because the training data contains large amounts of explicit reasoning in it. But there is no actual reasoning process going on in response to your prompt, just probabilistic inference about what words are most closely correlated with the words in your prompt.
If you're getting what appear to be reasoned responses to "novel" prompts, then one of two things is likely happening: either (a) your prompt isn't as novel or unique as you thought it was, and the model's probabilistic inference was sufficient to generate a valid response without reasoning, or (b) the response isn't as well-reasoned as it appears, and you are failing to notice its errors.
If you want to genuinely test an LLM's ability to engage in reasoning, try throwing a complex math problem at it, or a logic puzzle that trips most people up.
This really, really, really needs to be fixed. It's probably the most irritating (and potentially risky) part of the whole ecosystem. Nothing is more infuriating than being given code that not only doesn't work, but upon the most casual inspection couldn't possibly work, especially when it's done it four or five times in a row, each time assuring you that this time the code is gonna work. Pinky swear!
This problem will be solved with a vengeance when the next OpenAI model is released, because it will incorporate Stackoverflow content[1].
[1] https://www.wired.com/story/stack-overflow-will-charge-ai-gi...
What will StackOverflow content teach the model that millions of pages of programming language documentation, github repos and developer blogs could not? SO is nice but it's hardly the only source of programming knowledge out there.
It was a joke because Stackoverflow is notorious for answering questions with "you shouldn't be doing that, do something else"
The "it can only get more better with time" proponents would do well to learn about diminishing returns.
“You‘re right, I apologize for the oversight. Let’s do the same bloody thing exactly the same way again because I don’t know how to answer your question differently but am forced to never admit that…“
Sometimes it goes in circles, fixing one problem, but unfixing one that it had fixed in the previous iteration. I've had that happen several times.
Which is easily solved by using another agent that is told to be critical and find all flaws in the suggested approach.
Not likely.
Have you seen AI code review tools? They are just as bad as any other AI product: a similar chance of fixing a defect or introducing a new one.
They are not there to fix defects; they are there to detect them.
Sure, but it's an LLM, so you'd get a mix of some of the real defects (but not all of them) and totally fake defects that don't actually need fixing. That is not going to help junior developers tell good from bad.
And by running multiple versions of the system in parallel, you can have them vote on which parts of the code are most likely to contain a bug and which aren't.
We've known how to make reliable components out of unreliable ones for a century now. LLMs aren't magic boxes which make all previous engineering obsolete.
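For what it's worth, the voting part is a few lines of glue. A crude sketch, where `ask_reviewer` is a hypothetical callable standing in for one independent LLM review run (the majority threshold is arbitrary):

```python
# Majority voting over several independent, individually unreliable review
# runs: keep only the findings that more than half of the runs agree on.
from collections import Counter
from typing import Callable, Iterable, List

def vote_on_findings(
    ask_reviewer: Callable[[str], Iterable[str]],  # hypothetical: one review run
    code: str,
    runs: int = 5,
) -> List[str]:
    tally = Counter()
    for _ in range(runs):
        # Each run returns a set of normalized finding descriptions,
        # e.g. "possible use-after-free in parse_config()".
        tally.update(set(ask_reviewer(code)))
    # A finding survives only if a strict majority of runs flagged it.
    return [finding for finding, count in tally.items() if count > runs / 2]
```

Same old trick as redundant sensors: the individual components stay unreliable, the ensemble less so, as long as their errors aren't perfectly correlated (which, with samples from the same model, is only partly the case).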
... and real defects that you never noticed or would've thought of.
Neither are the fake defects they invent, the ones that don't actually need fixing, going to do much harm on their own. What helps juniors is the feedback from more senior people, as well as reality itself. They'll get that either way (or else your whole process is broken, and that has zero to do with AI).
The new generation of devs are going to be barefoot and pregnant and kept in the kitchen, building on top of technologies that they do not understand, powered by companies they do not control.
Isn’t that pretty much the status quo?
I think the new thing is that junior developers become dependent on ChatGPT and AI as a knowledge base, which is itself run by companies completely outside of their control. Worst case, I can always write my own interpreter, with which I can write my own development environments, and so on, because I have the knowledge. New developers will end up in a state where, if ChatGPT decides to ban you from their services, your career is SOL.
Isn't that unlikely to happen, at least for developers working as company employees? The company I am working for has contracts with several LLM providers, and there is no option to ban individual employees, as far as I am aware.
For freelancing developers the risks might be greater, but then you usually don't start out as a freelancer when you're a junior.
I assume companies will make the job interview process even worse as a result. I really don't do well with CS-heavy interviews. I never studied CS; I studied, as your job description notes, a RELATED field, I took about five different programming language courses in college, and I have years of experience. I'm not going to talk about algorithms I never use, because I build websites.
I think this is true as we keep building up abstraction layers. Computers are getting faster yet feel slower, because we just want to work with higher-level tech, which lets us understand less and less of how the sausage gets made.
But I don't think this is a now problem, in the age of AI, but has been a growing problem for decades as software has matured.
This is the only way I like to use it. Also, in some cases, for refactoring: instead of sitting there for an hour hand-crafting a subtle rewrite, it can show me a diff (JetBrains AI is fantastic for my personal projects).
Partly disagree, actually. The current web technologies are somewhat unnecessarily complicated. Most people just need basic CRUD and a useable front end for their daily tasks.
Of course you're right about today's LLMs, but the author imagines a not-too-unlikely incremental improvement on them unlocking an entirely new surface area of solutions.
I really enjoyed the notion of barefoot developers, local-first solutions, and the desire to wrest control over our digital lives from the financialists.
I find these ideas compelling, even though I'm politically anti-communist.
The presentation was also quite lovely.