I reckon these days search is pretty difficult and everyone knows how to game it. I recommend using a search engine that lets you effectively change which sites are shown. You can do this with Kagi, or with Google's Programmable Search Engines - I'm sure there are more too.
In particular I block YouTube, not because it isn't sometimes correct, but because I don't want videos polluting the regular results - it just takes too long to get information out of a video.
An ability to upvote results for a given query seems tantalizing but I bet it would be gamed too. The DIY approach seems to be the only tractable one.
In my case I only allow results from domains I believe are trustworthy. The whitelist approach does have downsides: usually I'll vet new potential domains through social means like Reddit and this site, rather than discovering them in the search results. I believe there's an inherent tradeoff between discoverability and the gameability of the results.
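The allowlist approach is simple enough to sketch. Here's a minimal Python version - the domains in the set are just placeholders for whatever you've personally vetted, not recommendations:

```python
from urllib.parse import urlparse

# Example allowlist -- substitute your own vetted domains.
ALLOWED_DOMAINS = {"stackoverflow.com", "developer.mozilla.org", "wikipedia.org"}

def is_allowed(url: str) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)

def filter_results(urls: list[str]) -> list[str]:
    """Keep only results from allowlisted domains, preserving order."""
    return [u for u in urls if is_allowed(u)]
```

Kagi's site-blocking and Google's Programmable Search Engines do effectively this server-side; the point is just that the filtering logic itself is trivial - the hard part is curating the list.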
Though I do sympathize with folks who reminisce about 2008 Google Search results, there was probably orders of magnitude less content out there back then, near-total ignorance of how valuable your spot in the results was to your business, and thus no SEO.
I also personally disagree that yt-dlp is the "correct" result for the average user searching "YouTube download". I highly doubt the average user would know or care to use the command line; a website front end would be more actionable for them.
Funnily enough, lately I've been prioritizing YT videos more when searching. So many sites now are just regurgitated SEO farms with minimal quality, and it's easy to see why: they're minimal effort to produce and cheap to host. But making a video takes time and effort, so it has a much higher barrier to use as a click farm.
More than once when traditional search failed me, I went to YT and found some video from 2009 clearly and eloquently explaining exactly what I was looking for, in detail and without distractions, because the person who made it clearly wasn't a media specialist or interested in experimenting with the format.
I've found it to also be a better source when looking for a product to buy. Want to know which fan to get? Turns out there's a channel from a dedicated guy who keeps devising ways to test different fans, with multiple videos demonstrating his approach and findings. The mainstream channels aren't all that useful, but there's a ton of "old web" style videos (some even recent) passionately providing details on almost anything you'd think to search, and they're a gold mine.
Would a browser feature that skipped to the relevant parts of a video, based on closed captioning and an understanding of search intent, be useful? It seems like a good way for Google to stay relevant on UX versus the chat bots just quickly spitting out a readable answer. Hunting through ad-laden webpages is annoying, and seeking to the relevant section of a video is a solvable problem, especially for videos above some viewership threshold.
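Mechanically, the seeking part is a small amount of code once you have timestamped captions. A crude sketch - real search-intent matching would obviously be much fancier than word overlap:

```python
def best_timestamp(captions: list[tuple[float, str]], query: str) -> float:
    """Return the start time of the caption segment sharing the most words
    with the query -- a crude stand-in for real search-intent matching."""
    terms = set(query.lower().split())

    def overlap(text: str) -> int:
        return len(terms & set(text.lower().split()))

    # Pick the highest-overlap segment; ties resolve to the earliest one.
    return max(captions, key=lambda seg: overlap(seg[1]))[0]
```

Given segments like `(0.0, "hey guys welcome back")` and `(42.0, "how to replace the alternator belt")`, a query of "replace alternator belt" would seek to 42 seconds in. The hard part Google already has: captions and query understanding at scale.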
I've definitely seen Google do this already: https://searchengineland.com/google-tests-suggested-clip-sea...
Google seems to be taking much more advantage of YouTube's transcription feature lately. The first addition was the (OK, gimmicky) animation on the Subscribe button when someone says the dreaded "like and subscribe". Hopefully a sign of things to come.
Overall AI summaries are very welcome for a certain subset of YouTube which is sadly dominated by sponsored, clickbait, and ad-driven content.
...and it has already been solved, though partially: SponsorBlock allows people to add a "Highlight" section to a video, which denotes the part of the video which the user most likely wanted to see (sans the "what's up guys", "like and subscribe", etc.)
Of course, it's not perfect: it relies upon humans doing the work, though some may see that as a positive over something more computerized.
Didn’t Google try this already? It seems useful to me, at least. IMO the next frontier of search is not better hypertext, it’s podcasts, audio, and video.
Do you have some tips for finding concise videos that answer the question you are asking? I am finding more and more obvious LLM bullshit in results, so I am willing to try some other tactics. But I am not ready to spend the minutes watching videos to see if it is actually relevant or a waste of time, always artificially long to increase ad revenue.
For me, it really depends on the type of video. For fixing cars, I'm usually looking for something specific enough that there isn't a lot of chaff: probably recorded and edited on a phone just to splice the clips together, probably with the default thumbnail that YouTube extracted from the video.
For product videos, if Project Farm did it, look there first. Otherwise, I look for someone who has a lot of videos for competing products in basically the same format, each under 10 minutes.
Tech videos are the hardest, I often still prefer text. Maybe look for links to the docs in the description? I still get duds though.
I don’t know much about fixing cars, but yeah, YouTube is a treasure trove for tacit knowledge.
Wish I did, but here you're at the algorithm's mercy, unfortunately. One possibility is subbing/accruing watch time on channels that you find provide you the right value, so that the algorithm might recommend similar channels on other subject matters.
This won't be the case for long. YT is already starting to be polluted with spam and AI-generated content, which will only get more common. The same thing that happened to the web in text form will happen to videos.
I think the only solutions are allowlists for specific domains and, ironically enough, more AI to filter results. Or just straight-up LLMs instead of web search, assuming they're not trained on spam data themselves.
One critical difference is the date attached to youtube videos. It's easy to verify that a video was made before this tech was available, but you can't do that with websites, or search engine result pages.
It does limit utility for more modern needs, unfortunately.
Note that the problem of filtering bad data out of learning material isn’t inherently easier than filtering same out of search results.
Yeah. I was recently looking for videos comparing two smartphones, and among the top-ranked results were videos that just show the two phones' spec sheets side by side, and videos that are nothing but LLM-generated text read out over the footage by TTS.
That's curious; I generally hate video due to the inability to skim the content, and the few attempts I made to actually find useful information this way resulted in... spammy, extra-low-effort video content that didn't answer my questions.
Depends on what you’re looking for. A blog post about how to play Search and Destroy by The Stooges is not as useful as a video of James Williamson himself showing you the riffs!
I'm a big fan of the non-commercial site search engines because of the gaming aspect: if you're not generating revenue from the clicks, the game mostly goes away.
I'm not saying people aren't entitled to make some money, but it clearly incentivizes user hostile behavior.
Maybe make it an option, because legitimate sites, such as journalism outlets, also use this model.
Subscription model like Kagi seems to work pretty well against gaming the results.
Their only remaining incentive is to be good enough that people keep paying for the service.
It works not because they're somehow smarter or better resourced than Google at detecting spam/SEO; it's that, unlike Google and other ad-supported search engines, they make money from result quality and so have a direct interest in blocking spam.
Google on the other hand makes money off ads (whether on the search results page itself or on the spam sites), so spam sites are at best considered neutral and at worst considered beneficial (since they can embed Google ads/analytics, and make the ads on the search results page look relatively good compared to the spam).
Black-hat SEO has been around since the early days of search engines, and they managed to keep it at bay just fine. What changed isn't some sudden breakthrough in malicious SEO; it's that it became more profitable to keep the spammers around than to fight them, and with the entire tech industry settled on advertising/"engagement" as its business model, the risk of competition was nil, because competitors with the same business model would end up making the same decision.
The same reasoning is behind the neutering of advanced search features. Those have nothing to do with the supposed war on spam/SEO, so why were they removed? Because you'd spend less time on the search results page and be less likely to click on an ad or sponsored result - against Google's interests, so out they went.
Kagi works because there is no incentive for SEO manipulators to target it since their market share is so small.
Super tinfoil hat to believe Google wants to send users to blog spam websites (i.e. that the spam is beneficial to Google).
Anytime there is money to be made, there is an effectively infinite amount of people trying to game the system.
Google is a complex system, so "want" can just mean: we make money from the blog spam, and while we don't like it, other things take priority over fighting it as effectively as we could.
It's never tinfoil-hat to assume that a corporation is, at very least, making sure not to fight too hard against any activity that brings it more revenue.
But the author tried Kagi and the results don't appear to be noticeably different, filled with scammy adspam just like Google and Bing. Kagi's results seem to mostly aggregate existing search engines [1], so this isn't much of a surprise. Perhaps a subscription-based service that operates an index at Google's scale might help, but no such thing exists to my knowledge.
[1] https://help.kagi.com/kagi/search-details/search-sources.htm...
Right, but Kagi has built-in tools to make it easy to fix that: blocking those spammy sites from ever showing up again, moving certain sites up the ranking, and so on. These features mean that over time my Kagi results have become nearly perfect for me.
I have a hard time believing it's so difficult for a search engine to distinguish a credible, respected website that has been around a while from some generated garbage that exists only to be a search result. We humans can tell them apart, so in principle computers can too.
Yes, this should be table stakes for a classifier; a company with the resources of Google could definitely solve the problem if they weren't themselves in the business of spam (advertising) and didn't benefit from spam sites (which often include Google ads/analytics).
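Even the shallow first-pass signals are easy to sketch. A toy illustration - the feature names and thresholds here are entirely made up, and a real classifier would be learned rather than hand-tuned:

```python
def looks_like_seo_farm(page: dict) -> bool:
    """Crude heuristic spam flag from surface features of a crawled page.
    The features and thresholds are illustrative, not a real classifier."""
    signals = 0
    signals += page.get("ad_slots", 0) > 5            # ad-stuffed layout
    signals += (page.get("word_count", 0) > 3000      # padded text with
                and page.get("outbound_links", 0) < 2)  # nothing cited
    signals += page.get("domain_age_days", 0) < 90    # freshly registered
    return signals >= 2  # flag when multiple signals co-occur
```

The point isn't that these heuristics are good - spammers adapt to anything static - but that a baseline like this is cheap, which makes the persistence of obvious spam look like a choice rather than a hard technical problem.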
Google is quite quick at plugging holes in AdSense, but not AdWords.
Always “table stakes”. Do you think in buzzwords also? I’ve always wondered this. Or do you think normal words and then translate it into this bandwagoning / membership proving garbage ?
I guess this brings up the question of how good humans actually are at doing this across a wide range of domains, on average.
The other question I have is how long these garbage results stay up for a particular query, on average.
I can’t wait until video transcripts get fed into LLMs just to eliminate the whole “This video is sponsored by something-completely-unrelated, more about them later. What’s up YouTube, remember to like, share, subscribe… five entire minutes of similar drivel… then the actual thing you want, but stretched out to an agonizing length”.
You need SponsorBlock.
Usually people leave a "highlight" marker which tells you where you're supposed to jump to. Along with the regular "This video was brought to you by <insert>VPN".
That was a decade after Google was created; people certainly understood SEO by then, and Google was constantly updating its algorithm to punish those trying to game it.
The Wikipedia page on "link farming", for example, documents it happening as early as 1999, targeting SEO on Inktomi:
https://en.wikipedia.org/wiki/Link_farm
I remember some internal presentations at Amazon around ~2004 about how boosting Google SEO on Amazon web pages increased traffic and revenue (and Amazon was honestly a bit behind-the-curve due to a kind of NIH syndrome).
At the time it seemed like Google was winning, though. SEO seems to have gotten really good, or maybe Google just gave up.