I reckon these days search is pretty difficult and everyone knows how to game it. I recommend using a search engine that lets you effectively change which sites are shown. You can do this with Kagi, or with Google's Programmable Search Engines - I'm sure there are more too.
In particular I block YouTube, not because it isn't sometimes correct, but because I don't want videos polluting the regular results - it just takes too long to get information out of a video.
An ability to upvote results for a given query seems tantalizing but I bet it would be gamed too. The DIY approach seems to be the only tractable one.
In my case I only allow results from domains I believe are trustworthy. The whitelist approach does have downsides: usually I'll vet new potential domains through social means like Reddit and this site, rather than discovering them in the search results. I believe there's an inherent tradeoff between discoverability and the gameability of the results.
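The allowlist approach is simple enough to sketch. Here's a minimal Python version - the domains in the set are just placeholders for whatever you've personally vetted, not recommendations:

```python
from urllib.parse import urlparse

# Example allowlist -- substitute your own vetted domains.
ALLOWED_DOMAINS = {"stackoverflow.com", "developer.mozilla.org", "wikipedia.org"}

def is_allowed(url: str) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

def filter_results(urls: list[str]) -> list[str]:
    """Keep only results from allowlisted domains, preserving order."""
    return [u for u in urls if is_allowed(u)]
```

Kagi's site-blocking and Google's Programmable Search Engines do effectively this server-side; the point is just that the filtering logic itself is trivial - the hard part is curating the list.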
Though I do sympathize with folks who reminisce about 2008 Google Search results, there was probably orders of magnitude less content out there back then, near-total ignorance of how valuable your spot in the results was to your business, and thus no SEO.
I also personally disagree that yt-dlp is the "correct" result for the average user searching "YouTube download". I highly doubt the average user would know or care to use the command line; a website front end would be more actionable for them.
Funnily enough, lately I've been prioritizing YT videos more when searching. So many sites now are just regurgitated SEO farms with minimal quality, and it's easy to see why: they're minimal effort to produce and cheap to host. But making a video takes time and effort, so it has a much higher barrier to use as a click farm.
More than once when traditional search failed me, I went to YT and found some video from 2009 clearly and eloquently explaining exactly what I was looking for, in detail and without distractions, because the person who made it clearly wasn't a media specialist or interested in experimenting with the format.
I've found it to also be a better source when looking for a product to buy. Want to know which fan to get? Turns out there's a channel from a dedicated guy who keeps devising ways to test different fans, with multiple videos demonstrating his approach and findings. The mainstream channels aren't all that useful, but there's a ton of "old web" style videos (some even recent) passionately providing details on almost anything you'd think to search, and they're a gold mine.
Would a browser feature that skipped to the relevant parts of a video, based on closed captioning and an understanding of search intent, be useful? It seems like a good way for Google to stay relevant on UX versus the chat bots just quickly spitting out a readable answer. Hunting through ad-laden webpages is annoying, and seeking to the relevant section of a video is a solvable problem, especially for videos above some viewership threshold.
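Mechanically, the seeking part is a small amount of code once you have timestamped captions. A crude sketch - real search-intent matching would obviously be much fancier than word overlap:

```python
def best_timestamp(captions: list[tuple[float, str]], query: str) -> float:
    """Return the start time of the caption segment sharing the most words
    with the query -- a crude stand-in for real search-intent matching."""
    terms = set(query.lower().split())

    def overlap(text: str) -> int:
        return len(terms & set(text.lower().split()))

    # Pick the highest-overlap segment; ties resolve to the earliest one.
    return max(captions, key=lambda seg: overlap(seg[1]))[0]
```

Given segments like `(0.0, "hey guys welcome back")` and `(42.0, "how to replace the alternator belt")`, a query of "replace alternator belt" would seek to 42 seconds in. The hard part Google already has: captions and query understanding at scale.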
I've definitely seen Google do this already: https://searchengineland.com/google-tests-suggested-clip-sea...
Google seems to be taking much more advantage of YouTube's transcription feature lately. The first addition was the (OK, gimmicky) animation on the Subscribe button when someone says the dreaded "like and subscribe". Hopefully a sign of things to come.
Overall AI summaries are very welcome for a certain subset of YouTube which is sadly dominated by sponsored, clickbait, and ad-driven content.
...and it has already been solved, though partially: SponsorBlock allows people to add a "Highlight" section to a video, which denotes the part of the video which the user most likely wanted to see (sans the "what's up guys", "like and subscribe", etc.)
Of course, it's not perfect: it relies upon humans doing the work, though some may see that as a positive over something more computerized.
Didn’t Google try this already? It seems useful to me, at least. IMO the next frontier of search is not better hypertext, it’s podcasts, audio, and video.
Do you have some tips for finding concise videos that answer the question you are asking? I am finding more and more obvious LLM bullshit in results, so I am willing to try some other tactics. But I am not ready to spend the minutes watching videos to see if it is actually relevant or a waste of time, always artificially long to increase ad revenue.
For me, it really depends on the type of video. For fixing cars, I'm usually looking for something specific enough that there isn't a lot of chaff: probably recorded and edited on a phone just to splice the clips together, probably with the default thumbnail that YouTube extracted from the video.
For product videos, if Project Farm did it, look there first. Otherwise, I look for someone who has a lot of videos for competing products in basically the same format, each under 10 minutes.
Tech videos are the hardest, I often still prefer text. Maybe look for links to the docs in the description? I still get duds though.
I don’t know much about fixing cars, but yeah, YouTube is a treasure trove for tacit knowledge.
Wish I did, but here you're at the algorithm's mercy, unfortunately. One possibility is subbing/accruing watch time on channels that you find provide you the right value, so that the algorithm might recommend similar channels on other subject matters.
This won't be the case for long. YT is already starting to be polluted with spam and AI-generated content, which will only get more common. The same thing that happened to the web in text form will happen to videos.
I think the only solutions are allowlists for specific domains and, ironically enough, more AI to filter results. Or just straight-up LLMs instead of web search, assuming they're not trained on spam data themselves.
One critical difference is the date attached to youtube videos. It's easy to verify that a video was made before this tech was available, but you can't do that with websites, or search engine result pages.
It does limit utility for more modern needs, unfortunately.
Note that the problem of filtering bad data out of learning material isn’t inherently easier than filtering same out of search results.
Yeah. I was recently looking for videos comparing two smartphones, and among the top-ranked results were videos that just show the two phones' spec sheets side by side, and videos that are nothing but LLM-generated text read out over the footage by TTS.
That's curious; I generally hate video due to the inability to skim the content, and the few attempts I made to actually find useful information this way resulted in... spammy, extra-low-effort video content that didn't answer my questions.
Depends on what you’re looking for. A blog post about how to play Search and Destroy by The Stooges is not as useful as a video of James Williamson himself showing you the riffs!
I'm a big fan of the non-commercial site search engines because of the gaming aspect: if you're not generating revenue from the clicks, the game mostly goes away.
I'm not saying people aren't entitled to make some money, but it clearly incentivizes user hostile behavior.
Maybe make it an option, because legitimate sites, such as journalism outlets, also use this model.
Subscription model like Kagi seems to work pretty well against gaming the results.
Their only remaining incentive is to be good enough that people keep paying for the service.
It works not because they're somehow smarter or better resourced than Google at detecting spam/SEO; it's that, unlike Google and other ad-supported search engines, they make money from result quality and so have a direct interest in blocking spam.
Google on the other hand makes money off ads (whether on the search results page itself or on the spam sites), so spam sites are at best considered neutral and at worst considered beneficial (since they can embed Google ads/analytics, and make the ads on the search results page look relatively good compared to the spam).
Black-hat SEO has been around since the early days of search engines, and they managed to keep it at bay just fine. What changed isn't some sudden breakthrough in malicious SEO; it's that it became more profitable to keep the spammers around than to fight them, and with the entire tech industry settled on advertising/"engagement" as its business model, the risk of competition was nil, because competitors with the same business model would end up making the same decision.
The same reasoning is behind the neutering of advanced search features. Those have nothing to do with the supposed war on spam/SEO, so why were they removed? Because you'd spend less time on the search results page and be less likely to click on an ad or sponsored result - against Google's interests, so out they went.
Kagi works because there is no incentive for SEO manipulators to target it since their market share is so small.
Super tinfoil hat to believe Google wants to send users to blog spam websites (i.e. that the spam is beneficial to Google).
Anytime there is money to be made, there is an effectively infinite amount of people trying to game the system.
Google is a complex system, so "want" can just mean: we make money from the blog spam, and while we don't like it, other things take priority over fighting it as effectively as we could.
It's never tinfoil-hat to assume that a corporation is, at very least, making sure not to fight too hard against any activity that brings it more revenue.
But the author tried Kagi and the results don't appear to be noticeably different, filled with scammy adspam just like Google and Bing. Kagi's results seem to mostly aggregate existing search engines [1], so this isn't much of a surprise. Perhaps a subscription-based service that operates an index at Google's scale might help, but no such thing exists to my knowledge.
[1] https://help.kagi.com/kagi/search-details/search-sources.htm...
Right, but Kagi has built-in tools to make it easy to fix that: blocking those spammy sites from ever showing up again, moving certain sites up the ranking, and so on. These features mean that over time my Kagi results have become nearly perfect for me.
I have a hard time believing it's so difficult for a search engine to distinguish a credible, respected website that has been around a while from some generated garbage that exists only to be a search result. We humans can tell them apart, so in principle computers can too.
Yes, this should be table stakes for a classifier; a company with the resources of Google could definitely solve the problem if they weren't themselves in the business of spam (advertising) and didn't benefit from spam sites (which often include Google ads/analytics).
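Even the shallow first-pass signals are easy to sketch. A toy illustration - the feature names and thresholds here are entirely made up, and a real classifier would be learned rather than hand-tuned:

```python
def looks_like_seo_farm(page: dict) -> bool:
    """Crude heuristic spam flag from surface features of a crawled page.
    The features and thresholds are illustrative, not a real classifier."""
    signals = 0
    signals += page.get("ad_slots", 0) > 5            # ad-stuffed layout
    signals += (page.get("word_count", 0) > 3000      # padded text with
                and page.get("outbound_links", 0) < 2)  # nothing cited
    signals += page.get("domain_age_days", 0) < 90    # freshly registered
    return signals >= 2  # flag when multiple signals co-occur
```

The point isn't that these heuristics are good - spammers adapt to anything static - but that a baseline like this is cheap, which makes the persistence of obvious spam look like a choice rather than a hard technical problem.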
Google is quite quick at plugging holes in AdSense, but not AdWords.
Always “table stakes”. Do you think in buzzwords also? I’ve always wondered this. Or do you think normal words and then translate it into this bandwagoning / membership proving garbage ?
I guess this brings up the question of how good humans actually are at doing this across a wide range of domains, on average.
The other question I have is how long these garbage results stay up for a particular query, on average.
I can’t wait until video transcripts get fed into LLMs just to eliminate the whole “This video is sponsored by something-completely-unrelated, more about them later. What’s up YouTube, remember to like, share, subscribe… five entire minutes of similar drivel… then the actual thing you want, but stretched out to an agonizing length”.
You need SponsorBlock.
Usually people leave a "highlight" marker which tells you where you're supposed to jump to. Along with the regular "This video was brought to you by <insert>VPN".
That was a decade after Google was created; people certainly understood SEO by then, and Google was constantly updating its algorithm to punish those trying to game it.
The Wikipedia page on "link farming", for example, documents it happening as early as 1999, targeting SEO on Inktomi:
https://en.wikipedia.org/wiki/Link_farm
I remember some internal presentations at Amazon around ~2004 about how boosting Google SEO on Amazon web pages increased traffic and revenue (and Amazon was honestly a bit behind-the-curve due to a kind of NIH syndrome).
At the time it seemed like Google was winning, though. SEO seems to have gotten really good, or maybe Google just gave up.