Security/privacy concerns aside, Recall doesn't particularly feel like an "AI" feature to me. It's on-device OCR plus a SQLite database that you can then search, right?
Even by today's loose definition of "AI" I'm having trouble making that connection. I guess the OCR is based on machine learning?
Is there an LLM component to Recall that I've missed? If so, can I build websites with prompt injection attacks on them that will target Recall users directly, by getting my rogue instructions indexed into their SQLite database and later fed into the LLM?
It's long been noticed that every time AI researchers figure out how to do a thing, it goes from "this is impossible SciFi nonsense" to "that's not real AI".
It's weird for me to actually encounter people doing this.
I remember when OCR was impossible, at any decent quality level, for AI. We used that difficulty to stop bots logging into forums, with a thing we called a "Completely Automated Public Turing test to tell Computers and Humans Apart", or CAPTCHA for short. It started off as text, then the text got increasingly distorted until most humans had trouble reading it, then it became house numbers, then it became "click all pictures of XYZ", before mostly disappearing into analytics seeing where you hovered your mouse cursor and which other websites you visited.
If we are going to define AI as whatever AI researchers are working on (a definition having something of a bootstrap problem), then the only way the goalposts will not move is when they are not making progress.
I wonder which other professions exhibit the same effect?
Artists certainly don't, as art remains art even when the artist themselves becomes lost to time.
I suspect politicians may, since every problem they do solve becomes the status quo… though every problem they don't solve also becomes their fault, so perhaps not.
Civil engineers may be on the boundary: people forget that cities are built rather than natural when complaining about immigrants with phrases such as "we are full", yet remember well enough for politicians to score points with what they promise to get built.
I struggle to see how anything we have today is “AI”.
So you think we’ve done it?
We’ve solved the “AI” problem.
We can just stop working on it now?
Rather than posturing, perhaps you could provide us with the definition of “AI” so we can all agree it’s here.
And if statements remain if statements. What’s your point?
I disagree that there is any "effect" worth pondering, but here's a biting quote. Had it been written by a tech bro with unsubstantiated zeal for wasting planetary resources to engorge the wealth of unethical sociopaths, it would have had the word "goalposts" in it, and been the worse for it.
“Fashion is a form of ugliness so intolerable that we have to alter it every six months.” -Oscar Wilde
Your struggle is inherent in the "AI effect".
Calling it "the" AI problem is as wrong as calling all of medical science "the" problem of medicine.
Replace "AI" with "medicine" and see how ridiculous your words look.
We have plenty of medicine without anyone saying "aspirin isn't medicine" or "heart transplants aren't medicine" or similar, and because nobody is saying that, nobody is saying "oh, so you think we've solved medicine, we can all just stop researching it now?"
So yeah, we've repeatedly solved problems that are AI problems and which people were arguing that no computer could possibly do even as the computers were in fact doing them.
"""It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and discover which actions maximize their chances of achieving defined goals"""
Which is pretty close to the opening paragraph on Wikipedia, sans the recursion of the latter using the word "intelligence" to define "intelligence".
This is exactly why I suggested you define the mental model you’re working with, because now I agree with mannykannot’s first gp addressing your original lament over those you interpret as “moving the goalposts”:
Your “definition” of “AI” is full of wishy-washy anthropomorphisms like “perceive” and “discover”, and elsewhere it is so broad it could apply to just about anything.
An astable multivibrator with two LEDs at each output and a switch fits your definition.
The circuit is a “machine” that “perceives” the switch being pressed then “discovers” which side of the circuit will “maximize its chances of achieving the goal” of illuminating the environment.
People do in fact say that bloodletting to balance the humors “isn’t medicine”.
So your interpretation of “medicine” is yet another common example that also exhibits your apparently super rare “AI effect”?
But this attempt at analogy is just a distracting digression.
You appear to be changing the rigidity of your definition of “AI” ad hoc to satisfy whatever argument you’re trying to make in that moment.
In this thread alone you refer to products, the “aspirins”, as “AI”, but then claim your definition is that “AI” is a “field of research”, your “medicine”.
Take the press release from the product being discussed here and replace “AI” with “field of research”.
“Corpo just released a ‘field of research’ for your PC.”
Starting to “see how ridiculous your words look” yet?
Neither "perceive" nor "discover" is an anthropomorphism. Not only on a technicality such as "anthropomorphism excludes animals, so you would be saying you think animals can't do those things" (I wouldn't normally bring it up, but if you want to live by the technical-precision sword, you get this), but also for the much more important point that a chess engine doesn't need to have eyes, and neither does a chatbot, nor even a robot: they only need an input and an output.
Definitionally, all functions have an input and an output, and if you insist on a mathematically precise formulation of "discover", I can rephrase that statement without loss as:
"AI is the field of research for how to automatically create a best-fit function f(x) to maximise the expected value of some other reward function r(f(y)), given only examples x_0 … x_n".
Any specific AI model is thereby some f() produced by this research.
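To make that concrete, here's a toy sketch of that formulation and nothing more; the linear candidate functions, the random search, and the reward are all made up purely for illustration:

    import random

    # Given example inputs xs and a reward function that only sees the output,
    # search for a simple f (a slope/intercept pair here) that maximises the
    # average reward reward(f(x)) over the examples.
    def fit(xs, reward, trials=10_000):
        best_f, best_score = None, float("-inf")
        for _ in range(trials):
            a, b = random.uniform(-5, 5), random.uniform(-5, 5)
            f = lambda x, a=a, b=b: a * x + b            # one candidate f()
            score = sum(reward(f(x)) for x in xs) / len(xs)
            if score > best_score:
                best_f, best_score = f, score
        return best_f    # "the AI model": whichever f() the search produced

    # e.g. a toy reward that peaks when the output is close to 1
    model = fit(xs=[0.0, 0.5, 1.0, 2.0], reward=lambda out: -abs(out - 1.0))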
No, your multivibrator circuit doesn't fit that: there's no "discover" in it.
And at this point, I could replace you with an early version of ChatGPT and the prompt "give a deliberately obtuse response to the following comment: ${comment goes here}"
Those things, which are your examples not mine, were actively shown to not work, and this was shown by the field of research called medicine, so no.
(Also, "apparently super rare" is putting words into my mouth and wildly misrepresents "I wonder which other professions exhibit the same effect?").
Again, what you write here is so wildly wrong that I have to assume your brand-new account is either a deliberate trolling attempt or a ChatGPT session with a prompt of "miss the point entirely". But of course, I have met humans who were equally capable of non-comprehension before such tools were available. (I think those humans were doing arguments-as-soldiers, but it's hard to be sure.)
You asked for a definition, you got one, you complained about the definition. That's you being loose.
I have a fixed definition, and am noting how other people change theirs to always exclude anything that actually exists. Which you are doing, which is you being loose.
What would be shifting track, would be to use the observation that you are hard to distinguish from an LLM to introduce a new-to-this-thread definition of AI — but I'm going to say that Turing can keep the imitation game, because although his anthropomorphic model of intelligence has its uses, I view it as narrow and parochial compared to the field as a whole.
No.
Aspirin is a medicine, an example of a product of the field of research which is medicine.
The equivalence is "[[aspirin] is to [medical research]] as [[route finding] is to [AI research]]". One can shorten "I perform medical research" into "I work in medicine" and not be misunderstood, you are misunderstanding the contraction of "this is an AI algorithm" to "this is AI".
You are the one dismissing the existing solutions in the field of AI and sarcastically suggesting that anyone who says otherwise thinks we've solved all AI problems and can stop researching it now — which is as wrong as dismissing aspirin as "not a medicine" and sarcastically suggesting that anyone who says otherwise thinks "we've solved all medical problems and can stop researching it now".
I see you're unfamiliar with the entire history of scientific research software, too.
Wait, I’m trying to help someone who thinks that giving an idiosyncratic definition of a broadly used term when someone asks for one requires that the provided definition be applied universally and accepted as correct without scrutiny?
Cool, cool another ad hoc change. At least this one is more precise.
Project much?
https://ai100.stanford.edu/2016-report/section-i-what-artifi...
Read up on: Heuristic Search, Computer Vision, Natural Language Processing (NLP), Mobile Robotics, Artificial Neural Networks, and Expert Systems.
These fields of research either lack reward functions altogether, or did so in earlier iterations.
Oh, I see why you are so passionate about these products now.
You can replace or dismiss anyone who points out your shortcomings with them.
That's not even a coherent English sentence.
You asked for it. It's identical in meaning. There's nothing "ad hoc" about this in either the original or the precise form.
Deeply and fundamentally wrong.
But worse than that, the thing you linked to actively denies your own prior claim, which was (and I'm copy-pasting) "I struggle to see how anything we have today is “AI”", even though that quotation is a statement about your own beliefs.
Even by trying to use that source you are engaging in arguments-as-soldiers, using something that contradicts your own other points.
I've heard much the same projection about my inner state from an equally wrong Young Earth Baptist. Like him, you have yet to demonstrate any understanding.
I've been interested in this since back when NLP couldn't understand the word "not", and back when "I write AI" implicitly meant "for a computer game".
This is the insidiousness of your ad hoc definition flip-flopping. You can claim you meant one and I’m addressing the other, and vice versa, whenever it suits you.
The linked article is about your “field of research” definition of the term “AI” while the quote from my original reply is addressing your “product goalposts” definition.
Well I’ve tried but am unable to find a dictionary that defines “discover” as relying on reward functions. Can you link me to one?
Here’s a popular dictionary’s definition: https://www.merriam-webster.com/dictionary/discover
Nice, you removed the pertinent context of “or did so in earlier iterations” with that ellipsis to make your point. Nice.
Read up on Minsky's SNARCs, which used trial-and-error learning without the need for a predefined reward function.
That's a good question, though I think art is one of the few things that is in some sense a parallel. While masterpieces of the past remain masterpieces today, the scope of what is considered art has been expanding, and while perfectly good work is being done today in styles that would have been recognized as art in the past, it probably will not attract the attention it would have gained if it had been produced in the past.
Perhaps the thing that makes AI different from other aspects of computing, in terms of how its progress is regarded, is that the term invites lofty expectations.
Yeah, take this XKCD for instance: https://xkcd.com/1425/ That was published 10 years ago but now the second item is (nearly) as trivial to accomplish as the first.
To be fair, we've had multiple research teams and ten years :)
But that's not the point, I think. The point to me is: 10 years ago, this seemed nearly impossible to solve. "Weird, impossible sci-fi stuff". Now I have this exact feature in the Photos app of my iPhone. It not only finds my dog in the photos, but correctly determines its breed. It's amazing! And it works locally on my device, no servers involved.

But I'm sure many people would argue that it is not "real AI", because it is "just XYZ". So as soon as we figure out how to do it, it's declared banal. Like LLMs. I'm baffled by the people who think it's "just" this or "just" that. That little word "just" buries years of research and several breakthroughs. Somehow, AI is always what AI can't do yet. It's always the next thing. The noise of goalposts being moved is sometimes deafening. And people don't even realize they're doing it.
Mm.
I don't normally like reaching for SciFi in these conversations, as there's always at least one of someone who mistakes a story for reality, and someone who misses the point and accuses the person giving the SciFi as an example of the same…
…but there are two episodes of TNG Trek which come to mind:
1. Elementary, Dear Data: Dr. Pulaski asserts that Data, being a robot, is incapable of solving a mystery to which he does not already know the outcome. Data accepts Dr. Pulaski's challenge and invites her to join them on the holodeck. There, Geordi instructs the computer to create a unique Sherlock Holmes mystery, but accidentally specifies an adversary "who is capable of defeating Data" rather than "Sherlock" and thus shenanigans happen.
2. The Measure of a Man: Lawsuit over "is Data (a) a person with the right to refuse consent; or (b) a thing to be disassembled, studied, replicated, and used as a mechanical servant?"
We've had ongoing arguments about #1 with AI for a long time, even though Chess and Go playing models beating the best human players should've ended this one.
I think we're going to get a lot of real life cases like #2 even when they involve high-fidelity brain uploads.
Mobile version
https://m.xkcd.com/1425/
https://en.wikipedia.org/wiki/AI_effect
The most extreme example I can find of this is Google Translate; I don't know of anybody who thinks of it as AI.
It is only because it already existed that it isn't branded as AI.
It's also how Google generally brands consumer-facing products, which is just Google + Noun. Most of their non-enterprise products tend to have unambiguous names (Google Search, Google Maps, Google Calendar, Google Translate).
Huh.
I'm (to my own surprise, given how old the feature is) still astounding people by demonstrating that it has an augmented reality mode.
If someone created the first Bayesian filter for classifying spam today, they would call it 'AI'.
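For anyone who hasn't seen one, the whole thing fits in a few lines; this is just a toy naive-Bayes version with a made-up two-message corpus and add-one smoothing:

    from collections import Counter
    import math

    spam = ["buy cheap pills now", "cheap pills cheap"]
    ham  = ["meeting notes attached", "lunch now?"]

    def word_counts(docs):
        return Counter(w for d in docs for w in d.lower().split())

    spam_counts, ham_counts = word_counts(spam), word_counts(ham)

    def p_spam(text):
        # log-odds = log P(spam)/P(ham) + sum of log P(word|spam)/P(word|ham)
        log_odds = math.log(len(spam) / len(ham))
        for w in text.lower().split():
            p_w_spam = (spam_counts[w] + 1) / (sum(spam_counts.values()) + 2)
            p_w_ham  = (ham_counts[w] + 1) / (sum(ham_counts.values()) + 2)
            log_odds += math.log(p_w_spam / p_w_ham)
        return 1 / (1 + math.exp(-log_odds))   # back to a probability

    print(p_spam("cheap pills"))     # close to 1
    print(p_spam("meeting notes"))   # close to 0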
It's funny when you look at this list: https://en.wikipedia.org/wiki/Timeline_of_artificial_intelli...
The original bump-and-turn Roomba is listed for 2002.
P(what you did there | I see it) = P(I see it | what you did there) * P(what you did there) / P(I see it)
Do bot detectors check browsing history of a prospective bot? How?
All those tracking cookies, or so I'm told.
To paraphrase the Incredibles, "if everything is AI, then nothing is".
Well that's really just a side effect of capitalism and marketing. Some AI researchers are working through incremental discoveries needed to potentially have artificial intelligence. Companies come along and roll up each innovation into products and pitch them as some massive AI revolution.
LLMs, at least as they are publicly understood, aren't AI any more than OCR is.
Demos show that you can search for "Blue bag" even if the words "blue bag" don't appear on the screen. If there was a blue bag in a photo, say in a PowerPoint slide, it will find it.
To be fair, Apple Photos does this.
But it does it to your photos, not your entire universe.
Same for Google Photos.
I searched for "smiling in beach" and it worked flawlessly.
I reckon Microsoft will want to offload this kind of search processing to user machines with the advent of the "AI processor computers" we saw recently: https://blogs.microsoft.com/blog/2024/05/20/introducing-copi...
I believe Apple does it completely locally on the device, whereas Google searches in the cloud.
Spotlight is also indexing pretty much every file you create on your Mac.
Dropbox has also been doing this for years. It's helpful sometimes and other times completely misses the mark. Personally, I do find it to be creepy and it has me edging closer to using a more privacy-focused cloud solution to backup my important documents, photos, etc.
There’s a really interesting angle about how the model of the service fits with the user’s concept and consent here. Apple Photos only indexes things you choose to save there, and you have to grant access to other things like Siri using it, and most people seem to be comfortable with that because it’s easy to understand the scope and opt-out.
Recall is the opposite: on by default, and it records everything. That leaves people uncertain about whether they can even reliably opt out, or what opting out will mean if they're interacting with someone who hasn't, and it combines with other areas of Windows 11 where Microsoft has been doing things like putting ads in the Start menu, which really cuts into users' trust. The blowback has been pretty impressive for what might otherwise have been a relatively ignored feature.
I'd love to know how they are doing this! Are they running an on-device CLIP embedding model of some sort? Could they even be shipping a multi-modal LLM like Phi-3 Vision?
OneDrive and the local Microsoft photos app have been doing it for a decade already.
My favourite feature in Apple and Google Photos. I use this all the time and have gotten so used to it I get annoyed when it doesn't find something.
In some ways, I find this even creepier than when I thought it was more than an SQLite database.
Answering my own question (with the help of replies on Twitter: https://twitter.com/simonw/status/1798368111038779610 )
Best guess is that there's some level of semantic analysis going on beyond just OCR, which tips the balance (at least for me) to being an "AI" feature based on loose current usage of that term.
The clue is that in this demo https://www.youtube.com/watch?v=aZbHd4suAnQ&t=1062s they show searching for "blue pantsuit with sequin lace" finds something that was described as "peacock" in the text. It looks like this is based on embedding search against image embeddings.
And... one of the 3 SQLite databases created by Recall is called "SemanticImageStore" - which suggests to me there may be a CLIP-style image embeddings model being run on-device here.
Those databases contain "diskann" columns which are likely a reference to Microsoft's vector indexing library: https://github.com/microsoft/DiskANN
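If that guess is right, the overall shape might be something like the sketch below; to be clear, I'm inventing every name here, the embed functions are stand-ins for whatever on-device models Microsoft actually ships, and the brute-force scan is where a DiskANN index would presumably come in:

    import sqlite3
    import numpy as np

    def embed_image(path):   # placeholder for a CLIP-style image encoder
        return np.random.rand(512).astype(np.float32)

    def embed_text(query):   # placeholder for the matching text encoder
        return np.random.rand(512).astype(np.float32)

    db = sqlite3.connect("semantic_image_store_sketch.db")   # hypothetical schema
    db.execute("CREATE TABLE IF NOT EXISTS frames"
               " (id INTEGER PRIMARY KEY, path TEXT, embedding BLOB)")

    def index_frame(path):
        db.execute("INSERT INTO frames (path, embedding) VALUES (?, ?)",
                   (path, embed_image(path).tobytes()))

    def search(query, top_k=5):
        q = embed_text(query)
        q /= np.linalg.norm(q)
        scored = []
        for path, blob in db.execute("SELECT path, embedding FROM frames"):
            v = np.frombuffer(blob, dtype=np.float32)
            scored.append((float(q @ (v / np.linalg.norm(v))), path))  # cosine similarity
        return sorted(scored, reverse=True)[:top_k]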
My team worked on this feature and I can confirm that yes, semantic search is being done for both text and images. This allows you to do fuzzy searches like you mentioned in your comment and use words to match images. Everything runs on your device (all the models, the vector database, etc.) in order to preserve privacy.
That's really impressive. Can you share which embedding models you are using for this? Also, is Phi-3 or Phi-3 Vision involved?
I don't think they've announced publicly what models we're using. I don't think there's any particular reason for this, but just in case, I can't name them here. I'll see if this can be addressed in a blog post or something.
I can tell you that Recall isn't using Phi. Rather, it's using a collection of models that are much more tuned (and therefore much more efficient) for the feature.
It probably has RAG.
Is that confirmed?
Microsoft have been boasting that it all runs on-device, so are they running RAG using an on-device model (Phi-3 or similar) as part of the Recall feature?
I had this confirmed during the Q&A portion of a demo presented by Microsoft. All the processing is supposed to happen locally using the NPU capacity of the hardware.
I really can't fathom what it takes to casually be like "security/privacy concerns aside" here.
I decided not to fully type out "obviously the security/privacy concerns are the most important thing here and should not be ignored... but aside from that, does anyone know if..."
I think you are misreading the comment. Saying aside from those concerns does not dismiss the concerns themselves. It’s just expressing that this comment will speak about some other things, and not those things.
The word "AI" is becoming a pure marketing checkbox. You can't release any new product in 2024 without somehow attaching the word "AI" to it. It doesn't matter whether it actually uses AI or an LLM.
Well, this one does, so what’s the point of this observation in this context?
I assume you are working with the AI definition that is along the lines of: "Problems a computer may not be able to solve"?
OCR was absolutely considered AI before we knew how to do it. Now that we understand it, it's just computing, of course. But it still is reasonably considered AI by any other definition of AI.
Isn't SotA OCR done using CNNs? So it's AI even in the modern sense.
I have no actual info on this, but I always assumed they'd compute some multimodal embeddings of the screenshots to then retrieve semantically-relevant ones by text? And yeah, they'd have to do it using on-device models, which doesn't seem out of reach?
I assume the AI part is using some model to interpret the stuff on the screen (maybe even non-text) and using that in the search results. That's just a guess. I'm not really that interested in the Recall product, so I haven't dug into it.
But it doesn't just recognize characters, though? It recognizes things happening in video and scenes in pictures without text. For example, you could prompt "find my photos and videos with dogs in them".