This feels like an excellent demonstration of the limitation of zero-shot LLMs. It feels like the wrong way to approach this.
I'm no expert in the matter, but for "holistic" things (where there are a lot of cross-connections and inter-dependencies) it feels like a diffusion-based generative structure would be better-suited than next-token-prediction. I've felt this way about poetry-generation, and I feel like it might apply in these sorts of cases as well.
Additionally, this is a highly-specialized field. From the conclusion of the article:
Overall we have some promising directions. Using LLMs for circuit board design looks a lot like using them for other complex tasks. They work well for pulling concrete data out of human-shaped data sources, they can do slightly more difficult tasks if they can solve that task by writing code, but eventually their capabilities break down in domains too far out of the training distribution.
We only tested the frontier models in this work, but I predict similar results from the open-source Llama or Mistral models. Some fine tuning on netlist creation would likely make the generation capabilities more useful.
I agree with the authors here.
While it's nice to imagine that AGI would be able to generalize skills to work competently in domain-specific tasks, I think this shows very clearly that we're not there yet, and if one wants to use LLMs in such an area, one would need to fine-tune for it. Would like to see round 2 of this made using a fine-tuning approach.
There is one posted on HN every week. How many more do we need before we accept that this tech is not what it is sold as, and that we are bored waiting for it to get good? I am not saying "get better", because it keeps getting better, but somehow it doesn't get good.
That’s a matter of perception, and the problem isn’t the AI, it’s human nature: 1. every time AI is able to do a thing, we move the goalposts and say, yeah, but it can’t do that other thing over there; 2. we are impatient, so our ability to get bored tends to outpace the rate of change.
What goals were achieved that I missed? Even for creative writing and image creation it still requires significant human guidance and correction.
This is a great example of goalposts shifting. Even having a model that can engage in coherent conversation and synthesize new information on the fly is revolutionary compared to just a few years ago. Now the bar has moved up to creativity without human intervention.
But isn't this goalpost shifting actually reasonable?
We discovered this nearly-magical technology. But now the novelty is wearing off, and the question is no longer "how awesome is this?". It's "what can I do with it for today?".
And frustratingly, the apparent list of uses is shrinking, mostly because many serious applications come with a footnote of "yeah, it can do that, but unreliably and with failure modes that are hard for most users to spot and correct".
So yes, adding "...but without making up dangerous nonsense" is moving the goalposts, but is it wrong?
IMO it’s not wrong to want the next improvement (“…but without making up dangerous nonsense”), but it is disingenuous to pretend as if there hasn’t already been a huge leap in capabilities. It’s like being unimpressed with the Wright brothers’ flight because nobody has figured out commercial air travel yet.
The leap has indeed been huge, but it's still not useful for anything. The Wright brothers did not start a passenger airline after the first try.
There are a lot of things where being reliable isn’t as important (or it’s easier to be reliable).
For example, we are using it to do meeting summaries and it is remarkably good at it. In fact, in comparison to humans we did A/B testing with - usually better.
Another thing is new employee ramp. It is able to answer questions and guide new employees much faster than we’ve ever seen before.
Another thing I’ve only started toying with, but have gotten incredible results from so far, is email prioritization: basically letting me know which emails I should read most urgently.
Again, these were all things where the state of the art was basically useless 3 years ago.
No it's not. You can not shift goalposts that do not exist in the first place.
The other side of this coin is everyone overhyping what AI can do, and when the inevitable criticism comes, they respond by claiming the goalposts are being moved. Perhaps, but you also told me it could do XYZ, when it can only do X and some Y, but not much Z, and it’s still not general intelligence in the broad sense.
I appreciate this comment because I think it really demonstrates the core problem with what I'll call the "get off my lawn >:|" argument, because it's avowedly about personal emotions.
It's not "general intelligence", so it's over hyped, and They get so whiny about the inevitable criticism, and They are ignoring that it's so mindnumbingly boring to have people making the excuse that "designed a circuit board from scratch" wasn't something anyone thinks or claims an LLM should do.
Who told you LLMs can design circuit boards?
Who told you LLMs are [artificial] general intelligence?
I get sick of it constantly being everywhere, but I don't feel the need to intellectualize it in a way that blames the nefarious ???
*waves*
Everyone means a different thing by each letter of AGI, and sometimes also by the combination.
I know my opinion is an unpopular one, but given how much more general-purpose they are than most other AI, I count LLMs as "general" AI; and I'm old enough to remember when AI didn't automatically mean "expert level or better", when it was a surprise that Kasparov was beaten (let alone Lee Sedol).
LLMs are (currently) the ultimate form of "Jack of all trades, master of none".
I'm not surprised that it failed with these tests, even though it clearly knows more about electronics than me. (I once tried to buy a 220 kΩ resistor, didn't have the skill to notice the shop had given me a 220 Ω resistor, and the resistor caught fire.)
I'd still like to call these things "AGI"… except for the fact that people don't agree on what the word means and keep objecting to my usage of the initials as is, so it wouldn't really communicate anything for me to do so.
ML scientists will tell you it can do X and some Y but not much Z. But the public doesn’t listen to ML scientists. Most of what the public hears about AI comes from businessmen trying to market a vision to investors — a vision, specifically, of what their business will be capable of five years from now given predicted advancements in AI capabilities in the meantime; which has roughly nothing to do with what current models can do.
I don’t think the problem is moving the goalposts, but rather that there are no actual goalposts. Advocates for this technology imply it can do anything, either because they believe it will be true in the near future or because they just want others to believe it for a wide range of reasons, including to get rich off it. Therefore the general public has no real idea what the ideal use cases are for this technology in its current state, so they keep asking it to do stuff it can’t do well. It is really no different from the blockchain in that regard.
One of the main issues I see amongst advocates of AI is that they cannot quantify the benefits and ignore provable failings of AI.
So are you happy that a 1940s tic-tac-toe computer "is AI"? And that's going to be your bar for AI forever?
"Moving the goalposts is a metaphor, derived from goal-based sports such as football and hockey, that means to change the rule or criterion of a process or competition while it is still in progress, in such a way that the new goal offers one side an advantage or disadvantage." - and the important part about AI is that it be easy for developers to claim they have created AI, and if we move the goalposts then that's bad because ... it puts them at an unfair disadvantage? What is even wrong with "moving the goalposts" in this situation, claiming something is/isn't AI is not a goal-based sport. The metaphor is nonsensical whining.
No I'd say it's that people are very bad at knowing what they want, and worse at knowing how to get it.
While it might be "moving the goal posts" the issue is that the goal posts were arbitrary to start with. In the context of the metaphor we put them on the field so there could be a game, despite the outcome literally not mattering anywhere else.
This isn't limited to AI: anyone dealing with customers knows that the worst thing you can do is take what the customer says their problem is at face value, replete with the proposed solution. What the customer knows is they have a problem, but it's very unlikely they want the solution they think they do.
I'm in awe of the progress in AI images, music, and video. This is probably where AI shines the most.
Soon everything you see and hear will be built up through a myriad of AI models and pipelines.
It is so bizarre that some people view this as a positive outcome.
These are tools. Humans driving the tools have heart and soul and create things of value through their lens.
Your argument rhymes with:
- "Let's keep using horses. They're good enough."
- "Photography lacks the artistic merit of portrait art."
- "Electronic music isn't music."
- "Vinyl is the only way to listen to music."
- "Digital photography ruins photography."
- "Digital illustration isn't real illustration and tablets are cheating."
- "Video games aren't art."
- "Javascript developers aren't real programmers."
Though I'm paraphrasing, these are all things that have been said.
I bet you my right kidney that people will use AI to produce incredible art that will one day (soon) garner widespread praise and accolade.
It's just a tool.
The specific phrase used was "everything you see and hear" (emphasis mine). You weren't arguing this would be an optional tool that could be used in the creation of art. You were arguing that this will replace all other art. That isn't an argument that photography is an art equal to painting, it is an argument for it to replace painting.
The population of people who want to create art is higher than the people who have the classical skills. By sheer volume, the former will dominate the latter. And eventually most artists will begin to use AI tools when they realize that's what they are -- tools.
Now combine that with the photography and painting analogy that you made in the previous post. Photography was invented some 2 centuries ago. Do you think the world would be better if every painter of that era, including the likes of van Gogh and Picasso, picked up a camera instead of a paintbrush?
Surely there's some point where it ceases being a tool though. We can't both be making AIs out to be comparable to humans while simultaneously calling them tools. Otherwise people who commission art would be considered artists using a tool.
Many, many successful artists from the Renaissance until today are not actually artists but just rich people with a workshop full of actual artists they commission works from. The rich person curates.
Many times this also happens with artists themselves. After a point, you are getting way more commissions than you can produce yourself, so you employ a small army of understudies that learn your techniques and make your pieces for you. So what you describe has existed for hundreds of years.
A short list could include old ones like Rembrandt or Rubens and a new ones like Jeff Koons or Damien Hirst.
Just to play devil's advocate - I'm surprised you (and many other people apparently) are unable to tell the operative difference between something like:
1. (real illustration vs digital illustration)
2. (composing on sheet music vs composing in a DAW)
and
3. illustration vs Stable Diffusion
4. composing vs generative music models such as Suno
What's different is the wide disparity between input and output. Generally, art has traditionally had a closer connection between the "creator" and the "creation". Generative models have married two conventionally highly disparate mediums together, e.g. text to image / text to audio.
If you have zero artistic ability, you'd have about as much success using Photoshop as you would with traditional pencil and paper.
Whereas any doofus can type in the description of something along with words like "3D", "trending on artstation", "hyper-realistic", and "4K" and then proceed to generate thousands of images in automatic1111, which they can flood DeviantArt with in a single day.
The same applies to music composition whether you are laboriously notating with sheet music or dropping notes using a horizontal tracker in a DAW like Logic. If you're not a musician, the fanciest DAW in the world won't make you one.
I don't think you realize the sheer scale of people that are working their asses off to leverage AI in their work in creative ways, often times bending over backwards to get it to work.
I spent 48 hours two weeks back (with only a few hours of sleep) making an AI film. I used motion capture, rotoscoping, and a whole host of other tools to accomplish this.
I know people who have spent months making AI music videos. People who painstakingly mask and pose skeletons. People who design and comp shots between multiple workflows.
These are tools.
What I find bizarre is people gatekeeping the process that helps get things from imagination onto canvas.
Artists and "creative" people have long held a monopoly on this ability and are now finally paying the price now that we've automated them away and made their "valuable" skill a commodity.
I've seen a lot of schadenfreude towards artists recently, as if they're somehow gatekeeping art and stopping the rest of us from practicing it.
I really struggle to understand it; the barrier of entry to art is basically just buying a paper and pencil and making time to practice. For most people the practice time could be spent on many things which would have better economic outcomes.
Doesn't this term imply an absence of competition? There seems to be a lot of competition. Anyone can be an artist, and anyone can attempt to make a living doing art. There is no certification, no educational requirements. I'm sure proximity to wealth is helpful but this is true of approximately every career or hobby.
Tangentially, there seem to be positive social benefits to everyone having different skills and depending on other people to get things done. It makes me feel good when people call me up asking for help with something I'm good at. I'm sure it feels the same for the neighborhood handyman when they fix someone's sink, the artist when they make profile pics for their friends, etc. I could be wrong but I don't think it'll be entirely good for people when they can just have an AI or a robot do everything for them.
I sincerely hope not. Talk about a dystopian future. That’s even worse than what social media has become.
Why would that be describing a dystopian future? A more generous framing might be to say that incredibly creative feats will be available to more people, and those who are particularly talented will create things that are now beyond our imagination using these tools. Who knows if that is how it will actually play out, but it also does not seem unreasonable to think that it might.
They already are, when using the meaning of "AI" that I grew up with.
The Facebook feed is AI; Google PageRank is AI; anti-spam filters are AI; A/B testing is AI; recommendation systems are AI.
It's been a long time since computers took over from humans with designing transistor layouts in CPUs — I was hearing about the software needing to account for quantum mechanics nearly a decade ago already.
There's this odd strain of thought that there's some general thing that will pop for hucksters and the unwashed masses, who are sheep led along by huckster wolves who won't admit LLMs ain't ???, because they're profiting off it.
It's frustrating because it's infantilizing, it derails the potential of an interesting technical discussion (e.g., here, diffusion), and it misses the mark substantially.
At the end of the day, it's useful in a thousand ways day to day, and the vast majority of people feel this way. The only people I see vehemently arguing the opposite seem to assume only things with 0 error rate are useful or are upset about money in some form.
But is that really it? I'm all ears. I'm on a 5 hour flight. I'm genuinely unclear on what's going on that leads people to take this absolutist position that they're waiting for ??? to admit ??? about LLMs.
Yes, the prose machine didn't nail circuit design; that doesn't mean whatever They you're imagining needs to give up and accept ???
So what should we make of the presence of actual hucksters and actual senior execs who are acting like credulous sheep? I see this every day in my world.
At the same time I do appreciate the actual performance and potential future promise of this tech. I have to remind myself that the wolf and sheep show is a side attraction, but for some people it’s clearly the main attraction.
The wolves/sheep thing was to indicate how moralizing and infantilizing serves as a substitute for actually explaining what the problem is, because surely, it's not that the prose machine isn't doing circuit design.
I'm sure you see it, I'd just love for someone to pause their internal passion play long enough to explain what they're seeing. Because I refuse to infantilize, I refuse to believe it's just grumbling because it's not 100% accurate 100% of the time and doesn't do 100% of everything.
I am literally right now explaining to a senior exec why some PR hype numbers about developer productivity from genAI are not comparable to internal numbers, because he is hoping to say to his bosses that we’re doing better than others. This is a smart, accomplished person, but he can read the tea leaves.
The problem with hype is that it can become a pathological form of social proof.
I see, I'm sorry that's happening :/ I was lucky enough to transition from college dropout waiter to tech startup on the back of the iPad, 6 years in, sold it and ended up at still-good 2016 Google. Left in 2023 because of some absolutely mind-numbingly banal-ly evil middle management. I'm honestly worried about myself because I cannot. stand. that. crap. Google was relatively okay, and I doubt I could ever work for someone else again. It was s t u n n i n g to see how easily people slip into confirmation bias when it involves pay / looking good.
fwiw, if someone's really into Google minutiae: I'm not so sure it is relatively okay anymore. It's kinda freaky how many posts there are on Blind along the lines of "wow I left X for here, assumed I'd at least be okay, but I am deeply unhappy. It's much worse than the average white-collar job I left."

Are there any write-ups of the newly evil Google experience I can read about? When did things shift for you in the 2016 - 2023 timeframe?
No, my way of dealing with it is to whine on HN/twitter occasionally and otherwise don't say anything publicly. Feel free to reach out at jpohhhh@gmail, excuse the overly familiar invitation, paying it forward because I would have found talking about that sort of thing f a s c i n a t i n g.
In general I'd recommend Ian Hickson's blog post on leaving. I can't remember the exact quote that hit hard; something like decisions moved from being X to Y to Z to being for people's own benefit.
I'd also add there was some odd corrupting effects from CS turning into something an aimless Ivy Leaguer would do if they didn't feel like finance.
I’ll play along. The thing that’s annoying me lately is that session details leaking between chats has been enabled as a “feature”, which is quickly making ChatGPT more like the search engine and social media echo chambers that I think lots of us want to escape. It’s also harmful for the already slim chances of having reproducible / deterministic results, which is bad since we’re using these things for code generation as well as rewriting emails and essays or whatever.
Why? Is this naive engineering refusing to acknowledge the same old design flaws? Nefarious management fast tracking enshittification? Or do users actually want their write-a-naughty-limerick goofs to get mixed up with their serious effort to fast track circuit design? I wouldn’t want to appear cynical but one of these explanations just makes more sense than the others!
The core tech such as it is is fine, great even. But it’s not hard to see many different ways that it’s already spiraling out of control.
(thank you!) 100% cosign. It breaks my. goddamn. heart. that [REDACTED], the consummate boring boneheaded SV lackey is [REDACTED] of [REDACTED], and can't think outside 6 week sprints and never finishes launching. This is technology that should be freeing us from random opaque algorithmic oppression and enabling us to take charge if we want. I left Google to do the opposite, and I'm honestly stunned that it's a year later and there's nothing on the market that challenges that. Buncha me-too nonsense doing all the shit I hate from the 2010s: bulk up on cash, buy users, do the recurring revenue thing and hope x > y, which inevitably, it won't be.
Why should we even?
The problem with everything today is not only that it’s hype-centric, but that the hype carries away those who were otherwise reasonable. AI isn’t special in this regard; it’s just the “crypto” of this decade.
I see this trend everywhere, in tech, socio, markets. Everything is way too fake, screamy and blown out of proportion.
Irony: humans think in very black-and-white terms, one could even say boolean; conversely, LLMs display subtlety and nuance.
When I was a kid, repeats of Trek had Spock and Kirk defeating robots with the liar's paradox, yet today it seems like humans are the ones who are broken by it while the machines are just going "I understood that reference!"
And yet we still don’t have Data or the Holographic Doctor.
You're demonstrating my point :)
When we get to that level, we're all out of work.
In the meantime, LLMs are already basically as good as the scriptwriters made the TNG-VOY era starship computers act.
Excellent point, it really is what it comes down to. There's people getting hoodwinked and people hoodwinking and then me, the one who sees them for what they are.
How long does it take for a child to start doing surgery? Publishing novel theorems? How long has the humble transformer been around?
Nobody is telling an experienced heart surgeon to step aside and let a child plan an open heart surgery. And yet AI, and LLMs in particular, are being sold as tools that can do complex tasks like that. But let's leave complex tasks and have a look at the marketing behind one of the tools aimed at business. The messaging of one of the ads I'm seeing promises that the tool in question can summarise a 150-page document into a 5-slide presentation. Now, that sounds amazing, if we ignore the fact that a person who wrote a 150-page document has already prepared an outline and is perfectly capable of summarising each section of the document. Writing a 150-page document without a plan and without being able to organise it would mean that people have evolved into content generators that need machines to help them write tables of contents and reformat them into a presentation. Coming back to your child analogy, why would a child be better at summarising content it knows nothing about than the person who wrote it?
We do get consultants coming into companies and telling the experienced professionals how to screw up stuff all the time, though. I think there are laws with teeth, and of course the immediate body to get rid of that, which helps surgeons maintain the integrity of their profession. When the outcome is far removed from the decision, you do get people like ministers meddling in things they don't understand and leaving the consequences for the next administration.
Wall-clock or subjective time?
I think it would take a human about 2.6 million (waking) years to actually read Common Crawl[0]; though obviously faster if they simply absorb token streams as direct sensory input.
The strength of computers is that transistors are (literally) faster than synapses to the degree to which marathon runners are faster than continental drift; the weakness is that they need to be — current-generation AI is only able to be this good because this advantage allows it to read far more than any human.
How much this difference matters depends on the use-case: if AI were as good at learning as we are, Tesla's FSD would be level 5 autonomy years ago already, even with just optical input.
[0] April 2024: 386 TiB; assuming 9.83 bits per word and 250 w.p.m: https://www.wolframalpha.com/input?i=386+TiB+%2F+9.83+bits+p...
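For concreteness, here is the arithmetic behind [0] as a quick Python sketch (same assumptions as the Wolfram Alpha query: 386 TiB of text, 9.83 bits per word, 250 words per minute, reading around the clock):

```python
# Back-of-the-envelope: how long would a human need to read Common Crawl?
CRAWL_BITS = 386 * (1024 ** 4) * 8    # April 2024 crawl: 386 TiB, in bits
BITS_PER_WORD = 9.83                  # rough information content per word
WORDS_PER_MINUTE = 250                # typical adult reading speed

words = CRAWL_BITS / BITS_PER_WORD
minutes = words / WORDS_PER_MINUTE
years = minutes / 60 / 24 / 365.25    # reading 24/7, no sleep

print(f"~{years / 1e6:.1f} million years")   # ~2.6 million years
```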
Subjective time doesn't really matter unless something is experiencing it. It could be 2.6 million years, but if the wall-clock time is half a year, then great - we've managed to brute-force some degree of intelligence in half a year! And we're at the beginning of this journey; there surely are many things to optimize that will decrease both wall-clock and subjective training time.
As the saying goes - "make it work, make it right, make it fast".
This post supports your case way less than you think. I've sent it to several EE friends and none have expressed your discontent. The general consensus has been "amazing what AI can do nowadays", and I agree. This would have been complete science-fiction just a couple of years ago.
My gut agrees with you that LLMs shouldn't do this well on a specialty domain.
But I think there's also the bitter lesson to be learned here: many times, when people say LLMs won't do well on a task, they are surprised either immediately or a few months later.
Overall not sure what to expect, but fine tuning experiments would be interesting regardless.
I doubt it'd work any better. Most of the EE time I have spent is swearing at stuff that looked like it'd work on paper but didn't, due to various nuances.
I have my own library of nuances, but how would you even fine-tune anything to understand the black-box abstraction of an IC well enough to work out whether a nuance applies between it and a load, or what a transmission line or edge would look like between the IC and the load?
This is where understanding trumps generative AI instantly.
I doubt it too, but I notice that I keep underestimating the models.
Do you have a challenge task I can try? What's the easiest thing I could get an LLM to do for circuit board design that would surprise you?
Make two separate signals arrive at exactly the same time on two 50 ohm transmission lines that start and end next to each other and go around a right hand bend. At 3.8GHz.
Edit: no VSWR constraint. Can add that later :)
Edit 2: oh or design a board for a simple 100Mohm input instrumentation amplifier which knows what a guard ring is and how badly the solder mask will screw it up :)
It would seem to me that the majority of boards would be a lot more forgiving. Are you saying you wouldn't be impressed if it could do only say 70% of board designs completely?
No, because it’s hard enough picking up an experienced human’s designs and working with them. A 70% done board is a headache to unwrap. I’d start again.
This is how I am with software. There's usually a reason I'm arriving at 70% done, and it's not often because it's well designed and documented...
Not the GP, but as an EE I can tell you that the majority of boards are not forgiving. One bad connection or one wrong component often means the circuit just doesn't work. One bad footprint often means the board is worthless.
On top of that, making an AI that can regurgitate simple textbook circuits and connect them together in reasonable ways is only the first step towards a much more difficult goal. More subtle problems in electronics design are all about context-dependent interactions between systems.
I hate that this is true. I think ML itself could be applied to the problem to help you catch mistakes in realtime, like language servers in software eng.
I have experience building boards in Altium and found it rather enjoyable; my own knowledge was often a constraint as I started out, but once I got proficient it just seemed to flow out onto the canvas.
There are some design considerations that would be awesome to farm out to genai, but I think we are far from that. As with stable-diffusion for images, the source data for text-to-PCB would need to be well-labeled in addition to being correlated with the physical PCB features themselves.
The part where I think we lose a lot of data in pursuit of something like this, is all of the research and integration work that went on behind everything that eventually got put into the schematic and then laid out on a board. I think it would be really difficult to "diffuse" a finished PCB from an RFQ-level description.
Right - LLMs would be a bit silly for these cases. Both overkill and underkill. Current approach for length matching is throw it off to a domain specific solver. Example test-circuit: https://x.com/DuncanHaldane/status/1803210498009342191
How exact is exactly the same time? Current solver matches to under 10fs, and I think at that level you'd have to fab it to see how close you get with fiber weave skew and all that.
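For a sense of scale, a rough sketch of what 10 fs of skew corresponds to in trace length. The effective dielectric constant here is an assumed value for a microstrip on FR-4; the real number depends on the stackup:

```python
# How much trace-length mismatch corresponds to 10 fs of skew?
C = 299_792_458            # speed of light in vacuum, m/s
EPS_EFF = 3.4              # assumed effective dielectric constant (microstrip, FR-4)

v = C / EPS_EFF ** 0.5     # propagation velocity along the trace, m/s
skew = 10e-15              # 10 femtoseconds

print(f"~{v * skew * 1e6:.1f} micrometers of trace")   # ~1.6 um
```

That is well below normal etch tolerances, which is why, as noted, you would have to fab it to see what fiber weave skew actually does to you.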
Do you have a test case for a schematic design task?
Yeah. But you need $200k worth of Keysight kit to test it.
The point is there’s a methodology to solve these problems already. Is this better? And can it use and apply it?
Really? Most of the time?
I find I spend an enormous amount of time on boring stuff like connecting VCC and ground with appropriate decoupling caps, tying output pins from one IC to the input pins on the other, creating library parts from data sheets, etc.
There's a handful of interesting problems in any good project where the abstraction breaks down and you have to prove your worth. But a ton of time gets spent on the equivalent of boilerplate code.
If I could tell an AI to generate a 100x100 prototype with such-and-such a microcontroller, this sensor and that sensor with those off-board connectors, with USB power, a regulator, a tag-connect header, a couple debug LEDs, and break out unused IO to a header...that would have huge value to my workflow, even if it gave up on anything analog or high-speed. Presumably you'd just take the first pass schematic/board file from the AI and begin work on anything with nuance.
If generative AI can do equivalent work for PCBs as it can do for text programming languages, people wouldn't use it for transmission line design. They'd use it for the equivalent of parsing some JSON or making a new class with some imports, fields, and method templates.
"Looks like you forgot pullups on your i2c lines" would be worth a big monthly subscription hahaha.
There are schematic analysis tools which do that now just based on the netlist
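The basic check is simple enough to sketch. Here is a toy version of that kind of netlist lint; the data structures are made up for illustration (real tools work from the full exported netlist plus part metadata):

```python
# Toy netlist lint: flag I2C nets that have no pull-up resistor to a supply rail.
# The netlist representation here is hypothetical: net name -> list of (refdes, pin).
netlist = {
    "SDA":     [("U1", "19"), ("U2", "4")],
    "SCL":     [("U1", "20"), ("U2", "5"), ("R5", "1")],
    "VCC_3V3": [("U1", "7"), ("U2", "8"), ("R5", "2")],
}
resistors = {"R5"}                    # refdes of resistor parts, from the BOM
rails = {"VCC_3V3", "VCC", "3V3"}     # nets treated as supply rails

def nets_containing(netlist, ref):
    """All net names that a given part reference appears on."""
    return {net for net, pins in netlist.items() if any(r == ref for r, _ in pins)}

def missing_pullups(netlist, resistors, rails):
    warnings = []
    for net, pins in netlist.items():
        if not any(key in net.upper() for key in ("SDA", "SCL")):
            continue                  # only check I2C-looking nets
        # A pull-up = a resistor on this net whose other pin lands on a supply rail.
        pulled_up = any(
            ref in resistors and nets_containing(netlist, ref) & rails
            for ref, _ in pins
        )
        if not pulled_up:
            warnings.append(f"Looks like you forgot a pull-up on {net}")
    return warnings

print(missing_pullups(netlist, resistors, rails))   # warns about SDA, not SCL
```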
This totally didn't happen to me again recently. But next time I surely won't forget those. (Cue to a few months from now...)
I've found that for speeding up design generation like that, most of the utility comes from the coding approach.
AI can't do it itself (yet), and having it call the higher level functions doesn't save that much time...
Heh. This is very true. I think perhaps the thing I'm most amazed by is that simple next-token prediction seems to work unreasonably well for a great many tasks.
I just don't know how well that will scale into more complex tasks. With simple next-token prediction there is little mechanism for the model to iterate or to revise or refine as it goes.
There have been some experiments with things like speculative generation (where multiple branches are evaluated in parallel) to give a bit of a lookahead effect and help avoid the LLM locking itself into dead-ends, but they don't seem super popular overall -- people just prefer to increase the power and accuracy of the base model and keep chugging forward.
I can't help feeling like a fundamental shift to something more akin to a diffusion-based approach would be helpful for such things. I just want some sort of mechanism where the model can "think" longer about harder problems. Whether you present a simple chess board or a complex board to an LLM and ask it to generate the next move, it always responds in the same amount of time. That alone should tell us that LLMs are not intelligent, that they are not "thinking", and that they will be insufficient for this going forward.
I believe Yann LeCun is right -- simply scaling LLMs is not going to get us to AGI. We need a fundamental structural shift to something new, but until we stop seeing such insane advancements in the quality of generation with LLMs (looking at you, Claude!!), I don't think we will move beyond. We have to get bored with LLMs first.
Is that true, especially if you ask it to think step-by-step?
I would think the model has certain associations for simple/common board states and different ones for complex/uncommon states, and when you ask it to think step-by-step it will explain the associations with a particular state. That "chattiness" may lead it to using more computation for complex boards.
That's fair -- there's a lot of room to grow in this area.
If an LLM has been trained to operate with a running internal monologue, then I believe it will operate better. I think this definitely needs to be explored more -- from what little I understand of this research, the results are sporadically promising, but getting something like ReAct (or other, similar structures) to work consistently is something I don't think I've seen yet.
There is such a mechanism - multiple rounds of prompting. You can implement diverse patterns (chains, networks) of prompts.
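In its simplest form that's just a generate/critique/revise loop. A minimal sketch; `call_llm` is a placeholder for whatever chat-completion API you actually use, not a real library function:

```python
# Minimal generate -> critique -> revise loop built from multiple prompting rounds.
def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to your model of choice."""
    raise NotImplementedError

def iterative_answer(task: str, rounds: int = 3) -> str:
    draft = call_llm(f"Solve the following task:\n{task}")
    for _ in range(rounds):
        critique = call_llm(
            f"Task:\n{task}\n\nDraft answer:\n{draft}\n\n"
            "List concrete problems with this draft, or reply LGTM if there are none."
        )
        if "LGTM" in critique:
            break
        draft = call_llm(
            f"Task:\n{task}\n\nDraft answer:\n{draft}\n\n"
            f"Problems found:\n{critique}\n\nWrite an improved answer."
        )
    return draft
```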
We have 0 y/o/y progress on Advent of Code, for example. Maybe we'll have some progress 6 months from now :) https://www.themotte.org/post/797/chatgpt-vs-advent-of-code
Have you tried using 4000x more samples?
https://redwoodresearch.substack.com/p/getting-50-sota-on-ar...
Some research to the contrary [1] - tldr is that they didn't find evidence that generative models really do zero shot well at all yet, if you show it something it literally hasn't seen before, it isn't "generally intelligent" enough to do it well. This isn't an issue for a lot of use-cases, but does seem to add some weight to the "giga-scale memorization" hypothesis.
[1] https://arxiv.org/html/2404.04125v2
One downside of diffusion-based systems (and I'm very much a noob in this) is that the model won't be able to see its input and output in the same space, and therefore wouldn't be able to follow up on instructions to fix things or improve on it. Whereas an LLM generating HTML could follow instructions to modify it as well: its input and output are the same format.
Oh? I would think that the input prompt to drive generation is not lost during generation iterations -- but I also don't know much about the architectural details.
I asked this question of Duncan Dec 22!
If you are interested, I highly recommend this + your favorite LLM. It does not do everything, but it is far superior to some highly expensive tools in flexibility and repeatability. https://github.com/devbisme/skidl
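For anyone who hasn't seen it: skidl lets you describe a circuit as Python and then emit a netlist for layout. A small sketch in the spirit of its README example; it assumes skidl plus the KiCad symbol libraries are installed, and footprints are omitted for brevity:

```python
from skidl import Part, Net, generate_netlist

# Rails and the I2C bus.
vcc, gnd = Net('VCC_3V3'), Net('GND')
sda, scl = Net('SDA'), Net('SCL')

# Parts from KiCad's generic 'Device' symbol library.
r_sda = Part('Device', 'R', value='4.7K')    # I2C pull-ups
r_scl = Part('Device', 'R', value='4.7K')
c_dec = Part('Device', 'C', value='100nF')   # decoupling cap for the IC on the bus

# Connections: += wires pins onto nets.
vcc += r_sda[1], r_scl[1], c_dec[1]
sda += r_sda[2]
scl += r_scl[2]
gnd += c_dec[2]

generate_netlist()   # writes a netlist you can import into your layout tool
```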
This tool looks really powerful, thanks for the link!
One thing I've been personally really intrigued by is the possibility of using self-play and adversarial learning as a way to advance beyond our current stage of imitation-only LLMs.
Having a strong rules-based framework to be able to measure the quality and correctness of solutions is necessary for any RL training setup to proceed. I think that skidl could be a really nice framework to be part of an RL-trained LLM's curriculum!
I've written down a bunch of thoughts [1] on using games or code-generation in an adversarial training setup, but I could see circuit design being a good training ground as well!
* [1] https://github.com/HanClinto/MENTAT
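Roughly the shape I have in mind, as a sketch; every name here is hypothetical scaffolding rather than a real framework. `generate_candidate` stands in for the model being trained, and `score_candidate` for whatever rules-based checks (e.g. skidl ERC output plus task-specific tests) end up providing the reward:

```python
# Sketch of an RL-style outer loop where rules-based checks provide the reward.
def generate_candidate(model, task_spec: str) -> str:
    """Ask the current policy for circuit-as-code (e.g. skidl) for this task."""
    raise NotImplementedError

def score_candidate(candidate_code: str) -> float:
    """Run the candidate through rules-based checks (ERC, connectivity,
    task-specific tests) and return a reward; violations score negatively."""
    raise NotImplementedError

def training_step(model, tasks, optimizer):
    for task in tasks:
        code = generate_candidate(model, task)
        reward = score_candidate(code)
        optimizer.update(model, task, code, reward)   # e.g. a policy-gradient update
```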
I agree diffusion makes more sense for optimizing code-like things. The tricky part is coming up with a reasonable set of "add noise" transformations.
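One starting point is the masking-style corruption used in discrete-diffusion work. A toy sketch over a token sequence (pure token-level noise, nothing code-aware):

```python
import random

MASK = "<mask>"

def add_noise(tokens, noise_level, vocab):
    """Corrupt a token sequence: each token is, with probability noise_level,
    replaced by <mask> (usually) or by a random vocabulary token (occasionally)."""
    noisy = []
    for tok in tokens:
        if random.random() < noise_level:
            noisy.append(MASK if random.random() < 0.8 else random.choice(vocab))
        else:
            noisy.append(tok)
    return noisy

# Toy example on a code-like token stream.
vocab = ["net", "part", "(", ")", "connect", "VCC", "GND", "R1", "C1"]
clean = ["connect", "(", "R1", "VCC", ")"]
print(add_noise(clean, noise_level=0.4, vocab=vocab))
```

The genuinely tricky part is making that corruption respect structure (syntax trees, netlist connectivity) rather than staying purely token-level.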
Yes, as well as dealing with a variable-length window.
When generating images with diffusion, one specifies the image dimensions ahead of time. When generating text with diffusion, it's a bit more open-ended. How long do we want this paragraph to go? Well, that depends on what goes into it -- so how do we adjust for that? Do we use a hierarchical tree-structure approach? Chunk it and do a chain of overlapping segments that are all of fixed length (could possibly be combined with a transformer model)?
Hard to say what would finally work in the end, but I think this is the sort of thing that YLC is talking about when he encourages students to look beyond LLMs. [1]
* [1] https://x.com/ylecun/status/1793326904692428907
I like how you called it holistic; it is maybe the first time I have seen this word not in a "bad" context.
As for the topic: it is impossible to synthesize STEM things other than in the manner an engineer does it. I mean, thou shalt know some typical solutions and have the calculations for everything that's happening in the schematic being developed.
Textbooks are not a joke, no matter who you are - a human or a device.