I just tried. I asked a question on a research topic I'm digging into. It gave me some answers but no references. Then I copied the answers it gave me and specifically asked for references. Then I got:
I sincerely apologize for my earlier response. Upon reviewing the search results provided, I realize I made an error in referencing those specific studies. The search results don't contain any relevant information for the claims I mentioned earlier. As an AI assistant, I should be more careful in providing accurate and supported information. Thank you for bringing this to my attention. In this case, I don't have reliable references to support those particular statements about software tools and their impact on developer experience and software quality.
Not sure about this...
I hate this kind of thing so much.
Absolutely. Behaviour that in normal life, in healthy societies, would be seen as provocation: automated hypocritical lying, apologizing in form and not in substance, making statements based on what sounds plausible instead of what is true...
That's all they can do. They seem impressive at first because they're basically trained as an adversarial attack on the ways we express our own intelligence. But they fall apart quickly because they don't actually have any of the internal state that allows our words to mean anything. They're a mask with nothing behind it.
Ctrl+F for "Central nervous system":
https://en.wikipedia.org/wiki/List_of_human_cell_types
Choose any five wikilinks. Skim their distinct functions and pathologies:
https://en.wikipedia.org/wiki/List_of_regions_in_the_human_b...
https://en.wikipedia.org/wiki/Large-scale_brain_network
Evolution's many things, but maybe most of all lazy. Human intelligence has dozens of distinct neuron types and at least hundreds of differentiated regions/neural subnetworks because we need all those parts in order to be both sentient and sapient. If you lesion parts of the human brain, you lose the associated functions, and eventually end up with what we'd call mental/neurological illnesses. Delusions, obsessions, solipsism, amorality, shakes, self-contradiction, aggression, manipulation, etc.
LLMs don't have any of those parts at all. They only have pattern-matching. They can only lie, because they don't have the sensory, object permanence, and memory faculties to conceive of an immutable external "truth"/reality. They can only be hypocritical, because they don't have the internal identity and introspective abilities to hold consistent values. They cannot apologize in substance, because they have neither the theory of mind and self-awareness to understand what they did wrong, nor the social motivation to care, nor the neuroplasticity to change and be better. They can only ever be manipulative, because they don't have emotions to express honestly. And I think it speaks to a not-atypical Silicon Valley arrogance to pretend that they can replicate "intelligence", without apparently ever considering a high-school-level philosophy or psychology course to understand what actually lets human intelligence tick.
At most they're mechanical psychopaths [1]. They might have some uses, but never outweighing the dangers for anything serious. Some of the individuals who think this technology is anything remotely close to "intelligent" have probably genuinely fallen for it. The rest, I suppose, see nothing wrong because they've created a tool in their own image…
[1]: I use this term loosely. "Psychopathy" is not a diagnosis in the DSM-5, but psychopathic traits are associated with multiple disorders that share similar characteristics.
That is definitely not true.
Lying is a state of mind. LLMs can output true statements, and they can even do so consistently for a range of inputs, but unlike a human there isn't a clear distinction in an LLM's internal state based on whether its statements are true or not. The output's truthfulness is incidental to its mode of operation, which is always the same, and certainly not itself truthful.
In the context of the comment chain I replied to, and the behaviour in question, any statement by an LLM pretending to be capable of self-awareness/metacognition is also necessarily a lie. "I should be more careful", "I sincerely apologize", "I realize", "Thank you for bringing this to my attention", etc.
The problem is the anthropomorphization. Since it pretends to be like a person, if you ascribe intention to it then I think it is most accurately described as always lying. If you don't ascribe intention to it, then it's just a messy PRNG that aligns with reality an impressive amount of the time, and words like "lying" have no meaning. But again, it's presented and marketed as if it's a trustworthy sapient intelligence.
I am not sure that lying is structural to the whole system, though: some parts seem to encode a world model, and «the sensory, object permanence, and memory faculties» may not be crucial. What we surely need is a system that encodes a world model and refines it, that reasons on it and assesses its details in order to develop it (I have been insisting on this for years, also as the "look, there's something wrong here" reaction).
Some parts seemingly stopped at "output something plausible", but it does not seem theoretically impossible to steer the output towards "adhere to the truth", if a world model is there.
We would still need to implement the "reason on your world model and refine it" part for the purposes of AGI - meanwhile, fixing the "impersonation" fumble ("probabilistic calculus says your interlocutor should be offered stochastic condolences") would be a decent move. After a while with present chatbots it becomes clear that "this is writing fiction, not answering questions".
This is just the start. Imagine giving up on progressing these models because they're not yet perfect (and probably never will be). Humans wouldn't accomplish anything at all this way, aha.
And I wouldn't say lazy at _all_. I would say efficient. Even evolutionary features that look "bad" on the surface can still make sense if you look at the wider system they're a part of. If our tailbone caused us problems, then we'd evolve it away, but instead we have a vestigial part that remains because there are no forces driving its removal.
But the issue is with calling finished products what are laboratory partials. "Oh look, they invented a puppet" // "Oh, nice!" // "It's alive..."
Wait for the first large scale LLM using source-aware training:
https://github.com/mukhal/intrinsic-source-citation
This is not something that can be LoRa finetuned after the pretraining step.
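To make the idea concrete, here is a minimal sketch of what a source-aware pretraining example could look like: the document's source URL is packed into the training sequence as a learnable target, so the model is trained to emit provenance alongside content. The tag names and format are my own illustration, not the actual scheme used by the linked repo.

```python
def make_source_aware_example(text: str, url: str) -> str:
    """Pack a document and its source URL into one training sequence,
    so the model learns to associate claims with where they came from."""
    return f"<doc>{text}</doc><cite>{url}</cite>"

# Hypothetical document/URL pair, purely for illustration.
example = make_source_aware_example(
    "Blade fragments let a route return part of a view.",
    "https://laravel.com/docs/11.x/blade",
)
print(example)
```

At inference time, a model trained this way could be prompted to continue past a `<cite>` marker, turning "give me references" into an actual learned capability instead of a post-hoc guess.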
What we need is a human-curated benchmark for different types of source-aware training, to allow competition, plus an extra column in the most popular leaderboards (included in the Average column) to incentivize AI companies to train in a source-aware way. Of course, this would instantly invalidate the black-box veil LLM companies love to hide behind so as not to credit original authors and content creators; they prefer regulators to believe such a thing cannot be done.
In the meantime, such regulators are not thinking creatively and are clearly just looking for ways to tax AI companies, hiding behind copyright complications as an excuse to tax the flow of money wherever they smell it.
Source aware training also has the potential to decentralize search!
Yeah. Treating these things as advanced, semantically aware search engines would actually be really cool.
But I find the anthropomorphization and "AGI" narrative really creepy and grifty. Such a waste that that's the direction it's going.
What?
I've been playing with Gemma locally, and I've had some success by telling it to answer "I don't know" if it doesn't know the answer, or similar escape hatches.
Feels like they were trained with a gun to their heads. If I don't tell it that it doesn't have to answer, it'll generate nonsense in a confident voice.
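For anyone who wants to try the same trick, here's roughly the shape of the prompt I mean. The wrapper function and wording are just my own sketch of the "escape hatch" idea, not a specific Gemma API:

```python
def build_prompt(question: str) -> str:
    """Wrap a question with an explicit permission to refuse,
    which in practice reduces confident nonsense from small local models."""
    system = (
        "Answer the question below. If you do not know the answer, "
        "reply exactly: I don't know."
    )
    return f"{system}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What changed in Laravel 11's routing?"))
```

The model still has no real "don't know" signal internally, but giving refusal a high-probability template to latch onto seems to help.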
The model's weights are tuned in the direction that makes the model best fit the training set.
It turns out that this process makes the model useful for producing mostly sensible predictions (generating output) for text that is not present in the training set (generalization).
That works because there is a lot of pattern and redundancy in the stuff we feed the models and the stuff we ask them, so there is a good chance that interpolating between words, and between higher-level semantic relationships across sentences, will make sense quite often.
However that doesn't work all the time. And when it doesn't, current models have no way to tell they "don't know".
The whole point was to let them generalize beyond the training set and interpolate in order to make decent guesses.
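The "no way to say I don't know" point can be seen directly in the output layer's math. A toy softmax (the standard final step in these models) always normalizes whatever scores come out into a probability distribution over known tokens; there is no reserved mass for "none of the above":

```python
import math

def softmax(logits):
    """Standard softmax: turn arbitrary scores into probabilities."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Even for garbage or out-of-distribution inputs, the probabilities
# sum to 1 and some token is always the "best" answer.
probs = softmax([2.0, 1.0, 0.5])
```

So "I don't know" can only ever be another token sequence competing on likelihood, not a distinct internal state.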
There is a lot of research in making models actually reason.
In the Physics of Language Models talk [1], the speaker argues that the model knows it has made a mistake, sometimes even before it has made it. Apparently, though, training is crucial for the model to be able to use this constructively.
That being said, I'm aware that the model doesn't reason in the classical sense. Yet, as I mentioned, it does give me less confabulation when I tell it it's ok not to answer.
I will note that when I've tried the same kind of prompts with Phi 3 instruct, it's way worse than Gemma. Though I'm not sure if that's just because of a weak instruction tuning or the underlying training as well, as it frequently ignores parts of my instructions.
[1]: https://www.youtube.com/watch?v=yBL7J0kgldU
There are different ways to be wrong.
For example you can confabulate "facts" or you can make logical or coherence mistakes.
Current LLMs are encouraged to be creative and effectively "make up facts".
That's what created the first wow factor. The models can write Star Trek fan fiction in the style of Shakespeare. They can take a poorly written email and make it "sound" better (for some definition of better, e.g. more formal, less formal, etc.).
But then human psychology kicked in: as soon as you have something that can talk like a human and that some marketing folks label "AI", you start expecting it to be useful for other tasks too, some of which require factual knowledge.
Now, it's in theory possible to have a system that you can converse with which can _also_ search and verify knowledge. My point is that this is not where LLMs start from. You have to add stuff on top of them (and people are actively researching that).
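To illustrate the "add stuff on top" idea, here's a toy retrieval-then-answer sketch: fetch supporting text first, then constrain the model to answer only from what was retrieved. The word-overlap scoring and the document contents are stand-ins for a real retriever and corpus:

```python
# Toy corpus; in a real system this would be a search index.
DOCS = {
    "laravel-blade": "Blade fragments let a route return part of a view.",
    "star-trek": "Star Trek is a science-fiction franchise.",
}

def retrieve(query: str) -> str:
    """Pick the document sharing the most words with the query."""
    qwords = set(query.lower().split())
    def overlap(text: str) -> int:
        return len(qwords & set(text.lower().split()))
    return max(DOCS.values(), key=overlap)

def build_grounded_prompt(query: str) -> str:
    """Ask the model to answer only from retrieved context."""
    context = retrieve(query)
    return (f"Answer using only this context:\n{context}\n"
            f"Question: {query}\n"
            f"If the context is insufficient, say so.")
```

The LLM is then just the conversational front end; the factual claims come from the retrieved text, which can be cited and checked.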
Just to follow up on this: I asked it to give me a brief explanation of how to use Laravel 11 Blade fragments, which it did reasonably well.
I then offered three lines of code of a route I'm using in Laravel and asked it to tell me how to implement fragment usage where the parameter in the URL determines the fragment returned.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments'); });
It told me to make sure I have the right view created (which I did) and that was a good start. Then...
It recommended this?
Route::get('/vge-frags/{fragment}', function ($fragment) { return fragment($fragment); });
I immediately knew it was wrong (but somebody looking to learn might not). So I had to ask it: "Wait, how does the code know which view to use?"
Then it gave me the right answer.
Route::get('/vge-frags/{fragment}', function ($fragment) { return view('vge-fragments')->fragment($fragment); });
I dunno. It's really easy to find edge cases with any of these models and you have to essentially question everything you receive. Other times it's very powerful and useful.
Seems a little bit of an unfair generalisation.
I mean, this is an unsolvable problem with chat interfaces, right?
If you use a plugin integrated with tooling that checks whether generated code compiles / passes tests / whatever, a lot of this kind of problem goes away.
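The check-before-accept loop is simple to sketch. Here, model output is written to a temp file and only accepted if a verification command exits cleanly; the verification command itself (test runner, compiler, linter) is whatever your tooling provides:

```python
import os
import subprocess
import tempfile

def accept_if_tests_pass(code: str, check_cmd: list) -> bool:
    """Write generated code to a temp file, run a check command on it,
    and accept the code only if the check exits with status 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(check_cmd + [path], capture_output=True)
        return result.returncode == 0
    finally:
        os.unlink(path)
```

Even the weakest check (does the snippet run at all?) would have caught the `fragment($fragment)` mistake above, since that helper doesn't exist as a standalone function.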
Generally speaking these models are great at tiny self contained code fragments like what you posted.
It’s longer, more complex, logically difficult things with interconnected parts that they struggle with; mostly because the harder the task, the more constraints have to be satisfied simultaneously, and models don’t have the attention to fix things simultaneously, so it’s just endless fix-one-thing / break-something-else.
So… at least in my experience, yes, but honestly, for a trivial fragment like that most of the time is fine, especially for anything you can easily write a test for.
And you can have the LLM write the test, too.
This is a good point, and we have new application-level features coming soon to improve verifiability.
I dunno if you need it but I'd be happy to come up with some scenarios and help test
Sorry about that, could you make sure that "Always search" is enabled and try that first query again? It should be able to get the correct answer with references.
It was on. If I ask the same question again it now gets the right answer. Maybe a blip? Not sure.
To be fair, I don't expect these AI models to give me perfect answers every time. I'm just not sure people are vigilant enough to ask follow up questions that criticize how the AI got the answers to ensure the answers come from somewhere reasonable.
I found that quite often even though the always search option is on, it won’t search at times; maybe that was the case here.
Honestly, that's a lot of words and repetition to say "I bullshitted".
Though there are humans that also talk like this. Silver lining to this LLM craze: maybe it'll inoculate us against psychopaths.