This is super cool. I share Francois' intuition that the presently data-hungry learning paradigm is not only ungeneralizable but also unsustainable: humans do not need 10,000 examples to tell the difference between cats and dogs, and the main reason computers can do it today is that we have millions of examples. As a result, it may be hard to transfer knowledge to more esoteric domains where data is expensive, rare, and hard to synthesize.
If I can make one criticism/observation of the tests: most of them involve reasoning over perfect information, in the game-theoretic sense. However, many if not most of the more challenging problems we encounter involve hidden information. Poker and negotiations are examples of problem solving in imperfect-information scenarios. Smoothly navigating social situations likewise involves working with hidden information.
One of the really interesting things we humans are able to do is to take the rules of a game and generate strategies. While we do have some algorithms which can "teach themselves" e.g. to play go or chess, those same self-play algorithms don't work on hidden information games. One of the really interesting capabilities of any generally-intelligent system would be synthesizing a general problem solver for those kinds of situations as well.
I swear, not enough people have kids.
Now, is it 10k examples? No, but I think it was on the order of hundreds, if not thousands.
One thing kids do is they'll ask for confirmation of their guess. You'll be reading a book you've read 50 times before and the kid will stop you, point at a dog in the book, and ask "dog?"
And there is a development phase where this happens a lot.
Also kids can get mad if they are told an object doesn't match up to the expected label, e.g. my son gets really mad if someone calls something by the wrong color.
Another thing toddlers like to do is play silly labeling games, which is different from calling something the wrong name by accident; this is done on purpose, for fun. E.g. you point to a fish and say "isn't that a lovely llama!" at which point the kid will fall down giggling at how silly you are being.
The human brain develops really slowly[1], and a sense of linear time encoding doesn't really exist for quite a while. (Even at 3, everything is either yesterday, today, or tomorrow.) So who the hell knows how things are being processed, but what we do know is that kids gather information through a bunch of senses that are operating at an absurd data collection rate 12-14 hours a day, with another 10-12 hours of downtime to process the information.
[1] Watch a baby discover they have a right foot. Then a few days later figure out they also have a left foot. Watch kids who are learning to stand develop a sense of "up above me" after they bonk their heads a few times on a table bottom. Kids only learn "fast" in the sense that they have nothing else to do for years on end.
I have kids so I'm presuming I'm allowed to have an opinion here.
This is ignoring the fact that babies are not just learning labels, they're learning the whole of language, motion planning, sensory processing, etc.
Once they have the basics down, concept acquisition time shrinks rapidly, and kids can easily learn their new favorite animal from as little as a single example.
Compare this to LLMs which can one-shot certain tasks, but only if they have essentially already memorized enough information to know about that task. It gives the illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts.
Beyond just learning a new animal, humans are able to learn entirely new systems of reasoning in surprisingly few examples (though it does take quite a bit of time to process them). How many homework questions did your entire calc 1 class have? I'm guessing less than 100 and (hopefully) you successfully learned differential calculus.
Not just that: people learn mathematics mainly by _thinking over and solving problems_, not by memorising solutions to problems. During my mathematics education I had to practice solving a lot of problems dissimilar to anything I had seen before. Even in the theory part, a lot of it was actually about filling in details in proofs and arguments, and reformulating challenging steps (in words or drawings). My notes on top of a mathematical textbook amount to much more than the text itself.
People think that knowledge lies in the texts themselves; it does not, it lies in what these texts relate to and the processes that they are part of, a lot of which are out in the real world and in our interactions. The original article is spot on that there is no AGI pathway in the current research direction. But there are huge incentives for ignoring this.
I think it's more accurate to say that they learn math by memorizing a sequence of steps that result in a correct solution, typically by following along with some examples. Hopefully they also remember why each step contributes to the answer as this aids recall and generalization.
The practice of solving problems that you describe is to ingrain/memorize those steps so you don't forget how to apply the procedure correctly. This is just standard training. Understanding the motivation of each step helps with that memorization, and also allows you to apply that step in novel problems.
I think you're wrong. The research on grokking shows that LLMs transition from memorization to generalized circuits for problem solving if trained enough, and parametric memory generalizes their operation to many more tasks.
They have now been able to achieve near perfect accuracy on comparison tasks, where GPT-4 is barely in the double digit success rate.
Composition tasks are still challenging, but parametric memory is a big step in the right direction for that too. Accurate comparative and compositional reasoning sound tantalizingly close to AGI.
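If you want to see that memorization-to-generalization transition first-hand, here's a minimal sketch of the classic grokking experiment: a tiny network on modular addition, trained with strong weight decay well past perfect training accuracy. This is just the standard toy setup, not the parametric-memory work I'm referring to; the model shape, hyperparameters and step counts are my own guesses, and the exact point where test accuracy jumps will vary.

    # Toy grokking sketch (assumed setup, not from any specific paper):
    # learn (a + b) mod P from half of all pairs, test on the other half.
    import torch
    import torch.nn as nn

    P = 97
    pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
    labels = (pairs[:, 0] + pairs[:, 1]) % P
    perm = torch.randperm(len(pairs))
    train, test = perm[: len(perm) // 2], perm[len(perm) // 2 :]

    model = nn.Sequential(
        nn.Embedding(P, 64),   # one embedding shared by both operands
        nn.Flatten(),          # (batch, 2, 64) -> (batch, 128)
        nn.Linear(128, 256), nn.ReLU(),
        nn.Linear(256, P),
    )
    # strong weight decay is what typically triggers the late generalization
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(50_000):
        opt.zero_grad()
        loss = loss_fn(model(pairs[train]), labels[train])
        loss.backward()
        opt.step()
        if step % 1000 == 0:
            with torch.no_grad():
                acc = (model(pairs[test]).argmax(-1) == labels[test]).float().mean()
            print(step, round(loss.item(), 4), round(acc.item(), 3))

Training accuracy saturates early; if grokking kicks in, held-out accuracy climbs much later, long after the loss on the training pairs has flatlined.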
Simply memorizing sequences of steps is not how mathematics learning works, otherwise we would not see so much variation in outcomes. Terence Tao and I, given the same exact math training data, would not end up as two mathematicians of similar skill.
While it's true that memorization of properties, structure, operations and what should be applied when and where is involved, there is a much deeper component of knowing how these all relate to each other. Grasping their fundamental meaning and structure, and some people seem to be wired to be better at thinking about and picking out these subtle mathematical relations using just the description or based off of only a few examples (or be able to at all, where everyone else struggles).
It's worth noting that for composition, key to abstract reasoning, LLMs failed to generalize to out of domain examples on simple synthetic data.
From: https://arxiv.org/abs/2405.15071
Everyone starts by memorizing how to do basic arithmetic on numbers, their multiplication tables and fractions. Only some then advance to understanding why those operations must work as they do.
Yes, I acknowledged that when I said "Composition tasks are still challenging". Comparisons and composition are both key to abstract reasoning. Clearly parametric memory and grokking have shown a fairly dramatic improvement in comparative reasoning with only a small tweak.
There is no evidence to suggest that compositional reasoning would not also fall to yet another small tweak. Maybe it will require something more dramatic, but I wouldn't bet on it. This pattern of thinking humans are special does not have a good track record. Therefore, I find the original claim that I was responding to ("there is no AGI pathway in the current research direction") completely unpersuasive.
I started by understanding. I could multiply by repeated addition (each addition counted one at a time with the aid of fingers) before I had the 10x10 addition table memorized. I learned university-level calculus before I had more than half of the 10x10 multiplication table memorized, and even that was from daily use, not from deliberate memorization. There wasn't a day in my life when I could recite the full table.
Maybe schools teach by memorization, but my mom taught me by explaining what it means, and I highly recommend this approach (and am a proof by example that humans can learn this way).
How did you learn what the symbols for numbers mean and how addition works? Did you literally just see "1 + 3 = 4" one day and intuit the meaning of all of those symbols? Was it entirely obvious to you from the get-go that "addition" was the same as counting using your fingers which was also the same as counting apples which was also the same as these little squiggles on paper?
There's no escaping the fact that there's memorization happening at some level because that's the only way to establish a common language.
The point is the memorization exercise requires orders of magnitude fewer examples for bootstrapping.
Does it though? It's a common claim but I don't think that's been rigorously established.
Perhaps that is how you learned math, but it is nothing like how I learned math. Memorizing steps does not help; I sucked at it. What works for me is understanding the steps and why we used them. Once I understood the process and why it worked, I was able to reason my way through it.
Did you look at the types of problems presented by the ARC-AGI test? I don't see how memorization plays any role.
Then let's see how they do on the ARC test. While it is possible that generalized circuits can develop in LLMs with enough training, I am pretty skeptical until we see results.
Memorization is literally how you learned arithmetic, multiplication tables and fractions. Everyone starts learning math by memorization, and only later start understanding why certain steps work. Some people don't advance to that point, and those that do become more adept at math.
I understood how to do arithmetic for numbers with multiple digits before I was taught a "procedure". Also, I am not even sure what you mean by "memorization is how you learned fractions". What is there to memorize?
What did you understand, exactly? You understood how to "count" using "numbers" that you also memorized? You intuitively understood that addition was counting up and subtraction was counting down, or did you memorize those words and what they meant in reference to counting?
The procedure to add or subtract fractions by establishing a common denominator, for instance. The procedure for how numerators and denominators are multiplied or divided. I could go on.
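For concreteness, here is that procedure spelled out as a trivial sketch (my own phrasing of the textbook steps, nothing more):

    # a/b + c/d via a common denominator, then reduce to lowest terms
    from math import gcd

    def add_fractions(a, b, c, d):
        num = a * d + c * b   # scale each numerator to the common denominator b*d
        den = b * d
        g = gcd(num, den)
        return num // g, den // g

    print(add_fractions(1, 2, 1, 3))  # (5, 6), i.e. 1/2 + 1/3 = 5/6

Memorizing the three steps gets you the right answer; understanding why scaling the numerators preserves the value is the part that transfers to new problems.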
I think there is a component of memorizing solutions. For example, for mathematical proofs there is a set of standard "tricks" that you should have memorized.
Sure, memory helps a lot; it allows you to concentrate your mental effort on the novel or unique parts of the problem.
And almost all of it is just more text, or described in more text.
You're very much right about this. And that's exactly why LLMs work as well as they do - they're trained on enough text of all kinds and topics, that they get to pick up on all kinds of patterns and relationships, big and small. The meaning of any word isn't embedded in the letters that make it, but in what other words and experiences are associated with it - and it so happens that it's exactly what language models are mapping.
It is not "just more text". That is an extremely reductive view of human cognition and experience, and it does it no favours. Describing things in text collapses too many dimensions. Human cognition is multimodal. Humans are not computational machines; we are attuned to, and in constant allostatic relationship with, the changing world around us.
Every time I see people online reduce the human thinking process to just production of a perceptible output, I start questioning myself, whether somehow I am the only human on this planet capable of thinking and everyone else is just pretending. That can't be right. It doesn't add up.
The answer is that both humans and the model are capable of reasoning, but the model is more restricted in the reasoning it can perform since it must conform to the dataset. This means the model is not allowed to invest tokens that do not immediately represent an answer but have to be derived on the way to the answer. Since these thinking tokens are not part of the dataset, the reasoning the LLM can perform is constrained to the parts of the model that are not subject to the straitjacket of the training loss. Therefore most of the reasoning occurs between the first and last layers and ends with the last layer, at which point the produced token must cross the training-loss barrier. Tokens that invest in the future but are not in the dataset get rejected, and that limits the ability of the LLM to reason.
"illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts"
Now imagine how much would your kid learn if the only input he ever received was a sequence of words?
Are you saying it's not fair for LLMs, because of the way they are taught is different?
The difference is that we don't know better methods for them, but we do know of better methods for people.
I think they're saying that it's silly to claim humans learn with less data than LLMs, when humans are ingesting a continuous video, audio, olfactory and tactile data stream for 16+ hours a day, every day. It takes at least 4 years for a human child to be in any way comparable in performance to GPT-4 on any task both of them could be tested on; do people really believe GPT-4 was trained with more data than a 4 year old?
Yeah, but they're seeing mostly the same thing day after day!
They aren't seeing 10k stills of 10k different dogs, then 10k stills of 10k different cats. They're seeing $FOO thousand images of the family dog and the family cat.
My (now 4.5yo) toddler did reliably tell the difference between cats and dogs the first time he went with us to the local SPCA and saw cats and dogs that were not our cats and dogs.
In effect, 2 cats and 2 dogs were all he needed to reliably distinguish between cats and dogs.
I assume he was also exposed to many images, photos and videos (realistic or animated) of cats and dogs in children books and toys he handled. In our case, this was a significant source of animal recognition skills of my daughters.
No images or photos (no books).
TV, certainly, but I consider it unlikely that animals in the animation style of Peppa Pig help the classifier.
Besides which, we're still talking under a dozen cats/dogs seen till that point.
Forget about cats/dogs. Here's another example: he only had to see a burger patty once to determine that it was an altogether new type of food, different from (for example) a sausage.
Anyone who has kids will have dozens of examples where the classifier worked without a false positive off a single novel item.
I think it was; the guesstimate I've seen is that GPT-4 was trained on 13e12 tokens, which over 4 years is 8.9e9/day, or about 1e5/s.
Then it's a question of how many bits per token; my expectation is that 100k/s is more than the number of token-equivalents we experience, even though it's much less than the bitrate of just our ears, let alone our eyes.
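A quick sanity check of that arithmetic (the inputs are guesstimates, not measured values):

    tokens = 13e12                 # guesstimated GPT-4 training tokens
    seconds = 4 * 365 * 24 * 3600  # four years
    print(tokens / (4 * 365))      # ~8.9e9 tokens/day
    print(tokens / seconds)        # ~1.0e5 tokens/s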
I think it's fair that both LMMs and people get a certain (even unbounded) amount of "pretraining" before actual tasks.
But after the training, people are much better equipped to do single-shot recognition and cognitive tasks on imagery and situations they have not encountered before, e.g. identifying (from pictures) which animal is being shown, even if it is only the second time seeing that animal (the first being shown that this animal is a zebra).
So, basically, after initial training, I believe people are superior in single-shot tasks—and things are going to get much more interesting once LMMs (or something after that?) are able to do that well.
It might be that GPT-4o can actually do that task well! Someone should demo it, I don't have access. Except, of course, GPT-4o already knows what zebras look like, so something else than exactly that..
So a billion years of evolutionary search plus 20 years of finetuning is a better method?
Until they encounter a similar animal and get confused, at which point you understand the implicit heuristic they were relying on. (E.g. they mistook a dairy cow for a zebra, which means their heuristic was "a black-and-white quadruped".)
Doesn't this seem remarkably close to how LLMs behave with one-shot or few-shot learning? I think there are a lot more similarities here than you give it credit for.
Also, I grew up in South Korea where early math education is highly prioritized (for better or for worse). I remember having to solve 2 dozen arithmetic problems every week after school with a private tutor. Yes, it was torture and I was miserable, but it did expose me to thousands more arithmetic questions than my American peers. All that misery paid off when I moved to the U.S. at the age of 12 and realized that my math level was 3-4 years above my peers. So yes, I think human intelligence accuracy also does improve with more training data.
Not many zebras where I live but lots of little dogs. Small dogs were clearly cats for a long time no matter what I said. The training can take a while.
This. My 2.5 y.o. still argues with me that a small dog she just saw in the park is a "cat". That's in contrast to her older sister, who at 5 is... begrudgingly accepting that I might be right about it after the third time I correct her.
And once they learn sarcasm, small dogs are cats again :-)
The thing is that the labels "cat" and "dog" reflect a choice in most languages to name animals based on species, which manifests in certain physical/behavioral attributes. Children need to learn by observation/teaching and generalization that these are the characteristics they need to use to conform to our chosen labelling/distinction, and that other things such as size/color/speed are irrelevant.
Of course it didn't have to be this way - in a different language animals might be named based on size or abilities/behavior, etc.
So, your daughter wanting to label a cat-sized dog as a "cat" is just a reflection of her not yet having aligned her own generalization with what you mean when you say "cat" vs "dog".
My favourite part of this is when they apply their new words to things that technically make sense, but don't. My daughter proudly pointed at a king wearing a crown as "sharp king" after learning about knives, saws, etc.
I’m quite surprised at this guess and intrigued by your school’s methodology. I would have estimated >30 problems average across 20 weeks for myself.
My kids are still in pre-algebra, but they get way more drilling still, well over 1000 problems per semester once Zern, IReady, etc. are factored in. I believe it’s too much, but it does seem like the typical approach here in California.
I preferred doing large problem sets in math class because that is the only way I felt like I could gain an innate understanding of the math.
For example after doing several hundred logarithms, I was eventually able to do logs to 2 decimal places in my head. (Sadly I cannot do that anymore!) I imagine if I had just done a dozen or so problems I would not have gained that ability.
Yes. All that learning is feeding off one another. They're learning how reality works. Every bit of new information informs everything else. It's something that LLMs demonstrate too, so it shouldn't be a surprising observation.
Sort of, kind of.
Under 5 they don't. Can't speak what happens later, as my oldest kid just had their 5th birthday. But below 5, all I've seen is kids being quick to remember a name, but taking quite a bit longer to actually distinguish between a new animal and similarly looking ones they already know. It takes a while to update the classifier :).
(And no, they aren't going to one-shot recognize an animal in a zoo that they saw first time on a picture hours earlier; it's a case I've seen brought up, and I maintain that even most adults will fail spectacularly at this test.)
Correct, in the sense that the models don't update their weights while you use them. But that just means you have to compare them with ability of humans to one-shot tasks on the spot, "thinking on their feet", which for most tasks makes even adults look bad compared to GPT-4.
I don't believe someone could learn calc in 100 exercises or less. Per concept like "addition of small numbers", or "long division", or "basic derivatives", or "trivial integrals", yes. Note that in-class exercises count too; learning doesn't happen primarily by homework (mostly because few have enough time in a day to do it).
This simply is not true as stated in the article. ARC-AGI is a one-shot task test that humans reliably do much, much better on than any AI model.
I learned the basics of integration in a foreign language I barely understood by watching a couple of diagrams get drawn out and seeing far less than 100 examples or exercises.
Sure, but they learn a lot of labels.
At least 20 to 30 a week, for about 10 weeks of class. Some weeks were more, and I remember plenty of days where we had 20 problems assigned a day.
Indeed, I am a huge fan of "the best way to learn math is to do hundreds upon hundreds of problems", because IMHO some concepts just require massive amounts of repetition.
Two other points - I've also forgotten a bunch, but also know I could "relearn" it faster than the first time around.
To continue your example, I know I've learned calculus and was lauded at the time. Now I could only give you the vague outlines, nothing practical. However, I know that if I was pressed, I could learn it again in short order.
My kid is about 3 and has been slow on language development. He can barely speak a few short sentences now. Learning names of things and concepts made a big difference for him and that's a fascinating watch and realization.
This reminds me of the story of Adam learning names, or how some languages can express a lot more in fewer words. And it makes sense that LLMs look intelligent to us.
My kid loves repeating the names of things he learned recently. For the past few weeks, after learning 'spider' and 'snake' and 'dangerous', he keeps finding spiders around; there are no snakes, so he makes up snakes from curly drawn lines and tells us they are dangerous.
I think we learn fast because of stereo (3d) vision. I have no idea how these models learn, and I don't know whether 3d vision will make multimodal LLMs better and require exponentially fewer examples.
I think stereo vision is not that important if you can move around and get spatial clues that way also.
Every animal/insect I can think of has more than 1 eye. Some have a lot more than 2 eyes. It has to be that important.
I haven't seen 1000 cats in my entire life. I'm sure I learned how to tell a dog from a cat after being exposed to just a single instance of each.
I'm sure you saw over 1B images of cats though, assuming 24 images per second from vision.
The AI models aren't seeing the same image 1B times.
Babies, unlike machine learning models, aren't placed in limbo when they aren't running back propagation.
Babies need few examples for complex tasks because they get constant infinitely complex examples on tasks which are used for transfer learning.
Current models take a nuclear reactor's worth of power to run backprop, on top of a small country's GDP worth of hardware.
They are _not_ going to generalize to AGI because we can't afford to run them.
Nice one. Perhaps we are to conclude the whole transformer architecture is amazingly overblown in storage/computation costs.
AGI or not, we need a better approach to what transformers are doing.
Not to mention that babies receive petabytes of visual input to go with other stimuli. It’s up for debate how sample efficient humans actually are in the first few years of their lives.
Hardly. Visual acuity is quite low (limited to a tiny area of the FoV), your brain is filling in all the blanks for you.
My friends toddler, who grew up with a cat in the house, would initially call all dogs "cat". :-D
My niece, 3yo, at the zoo, spent about 30 seconds trying to figure out whether a pig was a cat or a car.
I have a small kid. When they first saw some jackdaws, the first bird they noticed could fly, they thought it was terribly exciting and immediately learned the word for them, and generalised it to geese, crows, gulls and magpies (plus some less common species whose English names I don't know), pointing at them and screaming the equivalent of 'jackda! jackda!'.
I think your comment over intellectualises the way children experience the world.
My child experiences the world in a really pure way. They don't care much about labels or colours or any other human inventions like that. He picks up his carrot; he doesn't care about the name or the color. He just enjoys it through purely experiencing eating it. He can also find incredible flow-state-like joy from playing with river stones or looking at the moon.
I personally feel bad that I have to teach them to label things and put things in boxes. I think your child is frustrated at times because it's a punish of a game. The departure from "the oceanic feeling".
Your comment would make sense to me if the end game of our brains and human experience is labelling things. It’s not. It’s useful but it’s not what living is about.
Of course for a human this can either mean "I have an idea about what a dog is, but I'm not sure whether this is one" or it can mean "Hey this is a... one of those, what's the word for it again?"
That’s all true, yet my 2.5 year old sometimes one-shots specific information. I told my daughter that woodpeckers eat bugs out of trees after doing what you said and asking “what’s that noise?” for the fifth time in a few minutes when we heard some this spring. She brought it up again at least a week later, randomly. Developing brains are amazing.
She also saw an eagle this spring out the car window and said “an eagle! …no, it’s a bird,” so I guess she’s still working on those image classifications ;)
Second that. I think I've learned as much as my children have.
Watching a baby's awareness grow from pretty much nothing to a fully developed ability to understand the world around is one of the most fascinating parts of being a parent.
well, maybe. We view things in three dimensions at high fidelity: viewing a single dog or cat actually ends up being thousands of training samples, no?
Yes, but we do not call a couch in a leopard print a leopard. Because we understand that the print is secondary to the function.
I'm not sure it's as simple as you say. The first time my very young son saw a horse, he made the ASL sign for 'dog'.
He had only ever seen cats and dogs in his life previous to that.
Did he require 9,999 more examples of horses before learning the difference?
In another comment I replied that 3D high fidelity images do end up being thousands of training samples, so the answer is yes.
I'm deeply skeptical that training AI on (effectively) thousands of images of one horse will perform very well at training to recognize horses in general.
I'll double down with you on this.
Then train the AI using a binaural video of a thoroughbred and see if it can distinguish a draft horse and a quarter horse as horse...
Are you suggesting that if a group of kids were given a book of zoo animals before going to the zoo, they would have difficulties identifying any new animals, because they have only seen one picture of each?
I think that's an interesting question, and a possible counter to my argument.
Certainly kids learn and become better at extrapolation and need fewer and fewer samples in general as they get more life experience.
Hah. My toddler gladly calls her former walking aid toy a "lawn mower". Random toys become pie and cakes she brings to us to eat.
But we have a lot more sensory input and context to verify all of that.
If you kept training LLMs with all that data, it would be interesting to see what the results would be.
Eh, still doesn’t hold up. I really don’t think there’s many psychologists working on the posited mechanism of simple NN-like backprop learning. Aka conditioning, I guess. As Chomsky reminds us every time we let him: human children learn to understand and use language — an incredibly complex and nuanced domain, to say the least — with shockingly little data and often zero-to-none intentional instruction. We definitely employ principles and patterns that are far more complex (more “emergent”?) than linear regression.
Tho I only ever did undergrad stats, maybe ML isn’t even technically a linear regression at this point. Still, hopefully my gist is clear
Chomsky's arguments about "poverty of the stimulus" rely on using non-probabilistic grammars. Norvig discusses this here: https://norvig.com/chomsky.html
If I recall correctly, human toddlers hear about 3-13 million spoken words per year, and the higher ranges are correlated with better performance in school. Which:
- Is a lot, in an absolute sense.
- But is still much less training data than LLMs require.
Adult learners moving between English and romance languages can get a pretty decent grasp of the language (C1 or C2 reading ability) with about 3 million words of reading. Which is obviously exploiting transfer learning and prior knowledge, because it's harder in a less related language.
So yeah, humans are impressive. But Chomsky doesn't really seem to have the theoretical toolkit to deal with probabilistic or statistical learning. And LLMs are closer to statistical learning than to Chomsky's formal models.
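Back-of-envelope, combining those numbers with the 13e12-token guesstimate for GPT-4 mentioned elsewhere in the thread (and glossing over the fact that words and tokens aren't the same unit):

    years = 4
    toddler_words = (3e6 * years, 13e6 * years)   # 3-13M heard words/year
    gpt4_tokens = 13e12                           # guesstimate from the thread
    print(gpt4_tokens / toddler_words[1])         # ~2.5e5x
    print(gpt4_tokens / toddler_words[0])         # ~1.1e6x

So the gap is roughly five to six orders of magnitude, even before counting the toddler's non-linguistic input.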
This isn't an accurate comparison imo, because we're mapping language to a world model which was built through a ton of trial and error.
Children aren't understanding language at six months old, there seems to be a minimum amount of experience with physics and the world before language can click for them.
The optimization process that trained the human brain is called evolution, and it took a lot more than 10,000 examples to produce a system that can differentiate cats vs dogs.
Put differently, an LLM is pre-trained with very light priors, starting almost from scratch, whereas a human brain is pre-loaded with extremely strong priors.
Asserted without evidence. We have essentially no idea at what point living systems were capable of differentiating cats from dogs (we don't even know for sure which living systems can do this).
We know for a fact that cats, dogs, and humans do.
As adults, not (as per this thread) genetically.
Sure, but can earthworms? Butterflies? Oak trees? Slime mould? At what point in the history of life did sufficient discrimination to differentiate e.g. a cat and a dog actually arise? Are the mechanisms used for this universal? Are some better than others? etc.
A human brain that doesn't get visual stimulus at the critical age between 0 and 3 years old will never be able to tell the difference between a cat and a dog because it will be forevermore blind.
Commonly believed, but not so: https://www.sciencedaily.com/releases/2007/02/070220021337.h...
I heard of a similar case before I did my A-levels, so at least 22 years ago, where the person had cataracts removed and it took a while to learn to see; something about having to touch a statue (of a monkey?) before being able to recognise monkeys.
Humans, I would bet, could distinguish between two animals they've never seen based only on a loose or tangential description. I.e. "A dog hunts animals by tracking and chasing them long enough to exhaust their energy, but a cat is opportunistic and strikes using stealth and agility."
A human that has never seen a dog or a cat could probably determine which is which based on looking at the two animals and their adaptations. This would be an interesting test for AIs, but I'm not quite sure how one would formulate an eval for this.
Seems analogous to bouba/kiki effect:
https://en.m.wikipedia.org/wiki/Bouba/kiki_effect
Only after being exposed to (at least pictures and descriptions of) dozens if not hundreds of different types of animal and their different attributes. Literal decades of training time and carefully curated curriculum learning are required for a human to perform at what we consider ‘human level’.
A possible way to test this idea would be to draw two aliens with different hunting strategies and do a poll of which is which. I'd try it but my drawing skills are terrible and I'm averse to using generated images.
If a human eye works at, say, 10 fps, then 15-20 minutes with a cat is about 10k images :-D
I'd say that was more like a single instance, one interaction with a thing.
But in that single interaction, you might have seen the cat from all kinds of different angles, in various poses, doing various things, some of which are particularly not-dog-like.
I vaguely remember hearing that there's even ways to expand training data like that for neural networks, i.e. by presenting the same source image slightly rotated, partially obscured etc.
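That's standard data augmentation; a minimal sketch of the kind of pipeline I mean, using torchvision (the specific transform choices are just illustrative):

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomRotation(degrees=15),                 # slight rotation
        transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),   # crop / partial occlusion
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
    ])

    # each call yields a different random variant of the same (PIL) photo:
    # samples = [augment(cat_photo) for _ in range(100)]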
Do computers need 10,000 examples to distinguish dogs from cats when pretrained on other tasks?
No.
There's a great episode from Darkwish Patel's podcast discussing this today
https://youtu.be/UakqL6Pj9xo?si=iDH6iSNyz1Net8j7
Dwarkesh*
Neither do machines. Lookup few-shot learning with things like CLIP.
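For example, zero-shot classification with CLIP via Hugging Face transformers looks roughly like this (the model name is the standard public checkpoint; the image path and prompts are just illustrative, and no cat/dog-specific training is involved):

    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("mystery_animal.jpg")   # hypothetical local photo
    labels = ["a photo of a cat", "a photo of a dog"]
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))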
Humans learn through a lifetime.
Or are we talking about newborn infants?
I don’t know enough of biology or genetics or evolution, but surely the millions of years of training that is hardcoded into our genes and expressed in our biology had much larger “training” runs.
Humans don't need those examples because our brains are very pretrained. Natural fear of snakes and snakelike things, etc etc.
ML models are starting from absolute zero, single celled organism level.