Stract: Open-source, non-profit search engine

Everyone here is complaining about the search results - but instead I think we should all take some time to appreciate that someone worked hard to create a search engine (including the scraper / crawler part) and made it open source (AGPL).

The results will be improved over time I guess, and for the few search queries I've done - I'm fairly happy with the results.

Kudos to the authors!

type google > get anything but google

It's an interesting case. The kneejerk reaction is "it should return google.com". But really... why? If I wanted google.com, I'd add the .com. If I wanted to search for something I wouldn't search for google first.

I guess my top candidates would be: wikipedia page about google, google stock chart, recent news about google. google.com would never be a result I want to click. The current results are not amazing, but also do we care what the results for that one are? It's like someone telling you "cow" and expecting you'll know the context of what they're thinking of at the moment. Maybe a heading like "I have no idea what you're on about, here are some clickable ideas: google news, google stock price, ..." would be the best solution?

I was amazed looking at people entering the first part of the domain into the search bar, then searching and clicking the first result. That was before chrome had the integrated google search in the address bar.

I had multiple people (albeit small sample size) who went to google.com, entered the domain they wanted (e.g. news.ycomb) then clicked the first suggestion which ran the actual search and then clicked on the first hit.

While if you wanted to do so you would probably just add the .com and skip the middle man, I know of quite a few people who use search to get to sites even though they know their URLs.

That isn’t to say I hate what this site is doing, I think it is quite neat, but I do think we have to consider that there are more ways used to get to Google than just entering Google.com

It should absolutely return the actual thing you search for if it exists, and as the first result.

It would require an extraordinary circumstance to justify anything else, like if a thing exists, but overwhelmingly the entire net is full of some other reference that is probably what everyone is looking for.

The overwhelming majority of people typing "google" into a search are not looking for the Wikipedia page about google.

I'm presently on leave at a university in Croatia and my research group's name is Group for Applications and Services on Exascale Research Infrastructure, so the acronym is GASERI. [1]

The first result for gaseri on Stract is our presentation of group's research work, and the second result is our landing page.

Can't complain.

Viva open source, viva AGPL search engine!

[1] https://group.miletic.net/

I've tried with an exotic keyword of mine that I use as testing for search engines.

Worked well.

Everyone here is complaining about the search results

It worked to find my obscure website for my company that has only been around for 3 years. I'd say that's pretty damn good.

I've been doing some extensive searching for a particular topic for months using primarily Google, and I just found a bunch of sites that I had not previously found just running one query on this.

I think that as with ChatGPT vs Bard, the result space is so huge, there are going to be many strength/weakness tradeoffs for any given query.

I searched for a few things that were of the class "you should have this and match DDG/Kagi/G/Y/B" and the top 2/3 were matching.

That's pretty good for a new-ish player.

I think we should all take some time to appreciate...

There are lots of things in the settings that I really like too: The Manage Optics (Copycats removal, Hacker News, Fediverse, more...), Site Rankings and Explore (similar sites).

I like the overall thinking behind this search engine. It feels like they are creating the kind of search engine I would spec out myself. It starts off on a strong foundation.

I do hope they crawl many of the older pages. Google, Bing (DDG), et al. seem to only index pages from the last 10-15 years or so.

Quibble: "Allow usage statistics" is on by default which leads to this:

"We primarily store the text you used for the search, and which results (if any) you clicked on."

If you regularly clear cookies or have an extension that does this: it's something to keep in mind.

While I intend to keep it _on_ to help the author(s) to help me find stuff - I hope it isn't used to once again make the popular sites - popular. In the process, bury the unpopular older sites (that have nearly completely vanished).

It is super hard to match the 'big' players. But straight programming a proper index listing is hard.

clearly labelled, contextual ads based on your current search query and a subscription option without ads

Perfect! This is the way the god of the internet intended search engines to work.

But DuckDuckGo does the same and currently provides superior results based on a very brief test.

So good luck with that.

DDG rubbed me the wrong way when they decided to "filter misinformation" which is an arbitrary and biased thing. I don't side with Russia or anything, but I've seen this sort of thing get out of hand.

One of my favorite topics is treated as misinformation and completely scrubbed from google and Bing. The idea seems to be the assumption that one must believe all that one reads. Imagine a discussion where everyone agrees?

Stract produced only good results. Unusually good.

I must know

The idea seems to be the assumption that one must believe all that one reads. Imagine a discussion where everyone agrees?

So I don't know what topic you're on about but the one you replied to I do. And in that context, these two statements unfortunately do not add up as the second one is a huge stretch from the first.

The misinformation in context (Russian propaganda, probably on rt.com or something) is designed to mislead. Not inform; mislead. This really is two steps further than a discussion between two equal discussion partners who disagree. A healthy discussion like that is based on arguments which are supported by premises. Russian propaganda is based on bullshit. It tries to spread as much bullshit as possible, then see what sticks.

Defaults should be reasonable for the general public, the average user. They should be harmless. The term used for that nowadays is SFW.

You don't want NSFW content by default; a search engine should by default remove that, but leave the option open to the user to sift through it. For example, you don't want naked ladies on your screen at work or at home (if your wife, kids, or parents are watching it might turn awkward).

DuckDuckGo also allows you to switch off ads, for free, without any fuss or adblocker needed. Just go to the settings page.

Although if you aren't going to support DDG with ad revenue, I'd suggest supporting with a donation if you can afford it and value their service.

I really don't mind helping DDG take advertisers for all they are worth as long as it doesn't cost me my privacy or waste too much of my time.

And if they take something away from Google in the process --- that's just an extra bonus.

Turnabout is fair play don't you think? Google has worked very hard to take privacy away from users.

DDG is amazing. Too good to be viable, I usually think.

DDG sucked hard enough I'm now paying for kagi.

What would happen to DDG if Microsoft stopped letting them use Bing? What does MS get out of this relationship?

What does MS get out of this relationship?

My guess --- a portion of the ad revenue.

The option to return from only sites popular on HN, blogroll, and the other "manage optics" settings are incredibly cool and useful, I could see myself using this just for that feature alone.

Exciting stuff.

I'm more pessimistic: that would drive bad actors to HN and pollute the site.

I take it you aren't showing dead threads? If you look at newer submissions you'll see people voting bad stuff to death. HN is insanely decent at community driven self moderation. Not to knock the mods who put a lot of work into the site of course, but I assume the community's own self-moderation helps some.

"Polluting HN" is more than just the vitriol that's flagged - there are plenty of comments that are self-promoting, downright wrong, or clearly unrelated to the posted article, where the author just gets to segue to their hobby-horse (off-topic discussions are annoyingly frequent, IMO). HN is better than most, but it's not immune to being gamed. Once there is a financial incentive for it, it will become more common - see how Twitter turned out after offering monetary incentives for engagement.

I mean, is your post off-topic? It is a tangent. Tangents are cool. You may or may not like one, that is OK.

I mean, is your post off-topic?

My comment directly answered parent comment on an issue pertaining to search engines. How is that a tangent or remotely off-topic? This meta-discussion, on the other hand...

I think HN could cope by weighting upvotes/downvotes/flags more heavily based on the age and reputation of the user taking that action. It would distribute moderation responsibility to users a bit more.
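To make the idea concrete, here is a rough sketch of what age- and reputation-weighted voting could look like. Everything in it is hypothetical: the `Voter` fields, the `vote_weight` formula and the 0.1 scaling factors are made up for illustration and have nothing to do with how HN actually scores anything.

```python
import math
from dataclasses import dataclass


@dataclass
class Voter:
    # Hypothetical fields; a real site would have its own notion of reputation.
    account_age_days: int
    karma: int


def vote_weight(voter: Voter) -> float:
    """Return a weight >= 1.0 that grows slowly (logarithmically) with age and karma."""
    age_factor = math.log1p(max(voter.account_age_days, 0))
    karma_factor = math.log1p(max(voter.karma, 0))
    # The 0.1 multipliers are arbitrary illustration values, not tuned constants.
    return 1.0 + 0.1 * age_factor + 0.1 * karma_factor


def weighted_score(votes: list[tuple[Voter, int]]) -> float:
    """Sum votes (+1 for up, -1 for down/flag), each scaled by the voter's weight."""
    return sum(direction * vote_weight(voter) for voter, direction in votes)
```

The logarithm is there so that a ten-year-old account counts for more than a fresh one, but not hundreds of times more, which keeps a handful of old accounts from dominating.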

Search still needs some improvement, I typed "gundam watch order reddit" and was expecting some reddit links, but none of the results are reddit links. Perhaps there's another way to limit search results to a particular site here?

Normal way is "site:reddit.com"

That is Google way, rather than "normal"

I don't use Google; that's how it works on DDG and Kagi.

Stract's GitHub page says they support site: queries

The site: operator seems to work in most search engines these days.

If you're looking for the answer to this question the "relation graph" from AniDB is probably the best thing around: https://anidb.net/anime/715/relation/graph

Wonderful project, congratulations! I love the speed, clean design, many options, multilingual results, overall very impressive!!

Some quibbles/points to consider:

* I can't find anything on the people/organisation behind it, and can only guess from the Terms that the team is based in DK.
* Search results are broad and interesting, maybe a bit more weighting for the joint occurrence of terms would be great.
* Developing a site weight over time might be interesting, maybe even with user votes. Currently minor and major sites appear all together and e.g. a search for "Donald" gives me an interesting ranking order that gives neither the most famous Donalds nor the most reliable sites first (not problematic per se - my fault for entering an unclear search term).
* There are some interesting result patterns, with often official sites quite low. For instance a search for "EU" with some term like subsidy (in any of the languages I speak) gives me random project websites but nothing from any of the official EU websites, or "Microsoft 365" (sorry...) gives me no MS website.
* Very minor but hopefully a very easy fix: at least on Firefox mobile there is no direct way to add the search to my search engines, I had to add it manually. For other engines I can long press on the search field and then get the option.

Great work, keep it up! I will certainly start using this :-)

"maybe even with user votes. "

I'm so mind-blown that this does not exist yet. Free ranking feedback (live training of the algo!) + better search results for everyone. win-win

Free ranking feedback

Free spam SEO ranking in practice. A spam site has 1000x the incentive to upvote its result than you have to downvote it. YaCy did a distributed index with filtering lists and you effectively had to keep a list of who you trust / your own filter.

Kinda solved with accounts, and/or subs, and monitoring/fraud detection? Or just turn it off for product related searches but keep it on for information related?

Have you seen any large site with user reviews? Amazon? Yelp? Any feedback on a page that matters will be gamed, because anything over what you spent to game it is pure profit. This is not a solved issue in any meaningful way.

So is this truly its own search engine / crawler / etc... and not using anyone else's searches? I know ddg / kagi often use results from bing and other places, so just want to make sure.

also, how can I add this to my firefox search inside the address bar / search field?

also, how can I add this to my firefox search inside the address bar / search field?

Navigate to https://stract.com/ then focus the url field: Firefox will display the new search engine at the bottom of the suggestions, on the "This time, search with:" line.

I didn't see that option on mobile, but I got it added. Click the search provider icon in the search bar, go to search settings, then manage search providers, then add new. Add it with this url: https://stract.com/search?q=%s
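For anyone scripting this rather than clicking through settings, the `%s` in that URL is just a placeholder for the encoded query. A minimal sketch of the substitution, assuming form-style encoding (spaces as `+`) is acceptable to the site; `build_search_url` is a made-up helper name, not something Stract or Firefox provides:

```python
from urllib.parse import quote_plus

# Template taken from the comment above; "%s" marks where the query goes.
SEARCH_TEMPLATE = "https://stract.com/search?q=%s"


def build_search_url(query: str, template: str = SEARCH_TEMPLATE) -> str:
    """Replace the %s placeholder with the URL-encoded query string."""
    return template.replace("%s", quote_plus(query))
```

Calling `build_search_url("gundam watch order")` gives `https://stract.com/search?q=gundam+watch+order`.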

That was hard to discover, thanks for your explanation.

I was looking in "Firefox Settings -> Search -> Search Shortcuts" for a way to add it. I guess the functionality is not used very often, but it would be nice to have a hint on how to add new Search Engines there.

thank you, wow that's buried deep. Been using firefox for forever, and don't think I've ever noticed / seen that button. Thank you.

Wanted to say congrats on launching! I'm building a search engine myself, I can tell a lot of work went into this.

I think the biggest thing you overlooked is page titles. When you issue a query it's a bit hard to quickly scan and judge what a site is about because the page titles are missing.

How do you crawl the web? Do you follow links around? How do you reach a page that isn't linked from anywhere you've crawled?

I mean that's what web crawling is, right? By extension, you just can't reach a page unless you stumble upon a link to it _somewhere_. Google gives you an option to submit a link and schedule a crawl that way, so that's another option if it's not being linked to from anywhere.
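A toy illustration of that point, using an in-memory link graph instead of real HTTP requests. This is a generic breadth-first traversal, not Stract's crawler; the `crawl` function and the page names are invented for the example:

```python
from collections import deque


def crawl(seeds: list[str], links: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over a link graph, returning pages in discovery order.

    `links` maps each page to the pages it links to. A page is only ever
    visited if it is a seed or some already-visited page links to it.
    """
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# "orphan" links out to "a", but nothing links *to* it.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": [], "orphan": ["a"]}
```

Starting from `["a"]`, the walk never finds `"orphan"`; it only shows up if you add it to the seed list, which is the in-code equivalent of submitting a URL by hand.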

I'm just using common crawl for now

Congrats!

I tried to search for a particular domain data but neither search nor the explore would have the domain listed. What's the process to get unlisted domains indexed?

awhh I can see DMOZ (https://en.wikipedia.org/wiki/DMOZ) is no longer! That used to be the seed for crawling the internet I believe, for search engines.

A static version (archive) of DMOZ is still available at http://www.odp.org/

how many bits are in a byte

I checked 11 pages and none of the results were relevant.

I searched instead for "size byte bits", third result has the answer. It seems like the engine gives equal weight to all words in the search, so "are", "in" and "a" throw it off.
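If the guess about equal term weights is right, the usual remedy is to weight rare terms more than common ones. Here is a small sketch using plain inverse document frequency over a toy corpus; this is a textbook technique shown under that assumption, not a description of Stract's actual ranking, and the documents are invented:

```python
import math


def idf_weights(query_terms: list[str], documents: list[set[str]]) -> dict[str, float]:
    """Weight each query term by log(N / df): terms found in every document get 0."""
    n = len(documents)
    weights = {}
    for term in set(query_terms):
        df = sum(1 for doc in documents if term in doc)
        # A term that appears nowhere cannot match anything, so give it weight 0.
        weights[term] = math.log(n / df) if df else 0.0
    return weights


def score(query_terms: list[str], doc: set[str], weights: dict[str, float]) -> float:
    """Sum the weights of the distinct query terms present in the document."""
    return sum(weights[t] for t in set(query_terms) if t in doc)


# Toy corpus: each document is just a set of tokens.
docs = [
    {"how", "many", "people", "are", "in", "a", "room"},
    {"a", "byte", "has", "eight", "bits"},
    {"how", "many", "days", "are", "in", "a", "week"},
    {"how", "many", "players", "are", "in", "a", "team"},
]
query = "how many bits are in a byte".split()
weights = idf_weights(query, docs)
```

Counting matches with equal weight, the "people in a room" document beats the one about bytes (five matching words against three). With the IDF weights, "byte" and "bits" dominate and the byte document scores highest.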

excellent! I'm tired of search engines that optimize for natural language queries because the inevitable trade-off is that they become useless at keyword/exact queries.

I searched for "horror movies" and the first result was a lemmy community that has literally "616 subscribers" "30 Posts" and "76 Comments" which is about as dead as you would expect from a lemmy instance.

I also searched for "league of legends", and it couldn't find its homepage.

I think its ranking algorithm may need improvement.

Edit: also, I'd rather not say this, but do we really need another DuckDuckGo? I don't think Google fails at its job because of financial incentives. I think it might fail at this job simply put because the web of 2024 isn't the web of 1990. For example, the lemmy result, it's a link aggregation about horror movie articles. The search engine could literally do the job of the link aggregator, as it has a SERP that aggregates links, and yet it's aggregating links to link aggregators. Why are the search engines doing this? Because it's 2024. I wish someone tried a new approach at this problem rather than just copying Google's design and saying "it's Google but not yucky".

It's so fucking easy to come in here and shit on things. I actually love that it suggests a Lemmy community over some fucking subreddit. And 616 subs does not sound totally dead either.

"I'd rather not say this" is a bullshit thing to say. So why are you saying it then? What follows is also complete bullshit. It's something COMPLETELY different from DDG. DDG is just another commercial closed-source woke censored search engine that just does not track. Are they not actually PAYING M$ for Bing results? Or was that another privacy search engine?

B4 I use DDG I might as well just use Startpage or something and get Google results without tracking.

There are plenty of "private" search engines now: DDG, Brave Search, Qwant, ... NONE of those are actually open source. There is SearX, but that is just a meta search engine proxy.

In fact I recently thought "why is there not actually a true open source search engine that can be self hosted and that actually indexes things?" ... assuming this does that and doesn't just proxy other search engines' results.

I am excited about this and hope it gets better, and that there will be communities of users who use it, improve it, build indexing clusters and whatever. I am not coming in here demanding perfect results and totally failing to understand what this actually is.

The reason I'd rather not say it is exactly because I don't want to sound like I'm just coming in here and shitting on things.

I think there are no open source search engines because the instant you realize you have to periodically scrape the trillions of web pages in your index for updates you just give up because there is no way you can afford that without a solid business plan, which is hard to have when you are a search engine with no users, because you have no results, because you can't scrape trillions of webpages to build an index. Hence, I don't think it makes sense to try to make a general-purpose search engine, especially as Google has mastered that art and Google results look like that.

Generates some pretty interesting results. No way to make it my default search engine?

On Firefox just add its search URL (https://stract.com/search?q=%s I think, it's somewhere above in this thread) to your list of search engines then set it as default. Idk about chrome

If you are interested in setting up your own non-profit org marketplace or know someone who does, I made an example one using free tools (https://donate.pcblues.com/) that costs me only about $10 USD per month to host the example because it is just a hosted linux VM and not Saas or software subscription based. I configured the VM myself and then "just" installed the software and configured it. It hasn't been down for ages. I only just remembered to check it. It does everything from merch and service websites to escrowed payment transactions, user reputation, etc.

It's just an offer to communicate, not a business.

It's failing (completely wrong results) my go-to query for testing search engines: "best sub 10 usd Linux single board computer". Try it out.

Damn Pine64 has some fun stuff happening.

Also I noticed DuckDuckGo performed much better than Google with this benchmark.

Have you thought of grabbing a big chunk of the Wayback Machine and having THAT in the index? There’s always so much good stuff that gets nuked, and being able to search across it properly would be potentially very interesting.

Holy shit, I never thought of this before but this is an excellent idea for a search engine feature. It would work really well as one of Stract's optics!

To make Stract usable for me (slightly reduced vision), I had to apply the following custom CSS:

```
html, body, div, td, th, p, h1, h2, h3, h4, h5, b, i, strong, li, button {
  font-family: ui-sans-serif, sans-serif !important;
  -webkit-font-smoothing: antialiased;
  font-weight: 400;
  text-rendering: geometricPrecision;
}
```

I would give stract.org a shot tho

I tried two queries about my own country, but nothing relevant showed up.

Very impressive, and kudos to the developers and originators.

I just hope Stract doesn't go 'corporate' the way DDG did. :(

this is a neat thing, i like it, i'll add it to my list of search engines i use

Really like the explore feature. It lets you put in a url and shows you similar sites. Very promising project. Love to see people actually thinking about what search would be rather than rehashing decades old ideas.

The search bar should really be full width.

It can be very annoying to have your query not fit it while the window has plenty of room left.

Sources: https://github.com/StractOrg/stract

Backend in Rust (axum web framework, rocksdb), frontend with Svelte.

In swagger/open API, why is everything a post?

I tried the first endpoint get suggestions and tried searching for Gemini or Gemin hoping it would at least auto complete a word but the result set is empty.

https://stract.com/beta/api/docs/#/autosuggest/route

Seems surprisingly OK for coding related queries ('celery rate limit'). I'm curious about their scraping setup; building that out must be quite a big task.

Can someone provide a bit of background how the crawling part works?

Optics are a great idea, something we don’t see on other engines.

Fully open source -`ღ´-

Haven’t dug in to see what’s powering the search. I think DDG uses Bing.

VERY cool product. I have a quibble. I searched for "cool pokemon to use" and the top result was "How to use Paypal on Amazon" from "online-tech-tips.com". Understandable that the search results are not perfect - the second result was a perfect match for what I searched for - but anyway, clicking the "dials" icon gets me the following options:

"""Do you like results from online-tech-tips.com? (thumbs down, thumbs up, or banned emoji options) <a href="make-an-llm-do-something-stupid.com">Summarize result</a> """

IMO this feedback widget and (maybe) its backing API could use work. It's not that I like or don't like results from online-tech-tips.com; it's that they're a bad result for the specific context of this search.

The only current search engine that I can use in my native language, Catalan, is Google. I can't wait for a project like this one to get good at that.

Great stuff!

Just want to mention, when I search for “ExpressLRS use uart on older f4 fcs” it gives me about 15 results, but only the first two are unique. The other 13 are a literal copy of the first, both in content and in URL. Probably best to filter for uniqueness.
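Filtering for uniqueness can be as simple as remembering which URLs have already been shown. A minimal sketch, assuming results arrive as dicts with a `"url"` key (an invented shape, not Stract's API); the light normalization in `normalize` is an extra assumption on top of the exact-copy case described above:

```python
from urllib.parse import urlsplit


def normalize(url: str) -> tuple[str, str, str]:
    """Build a comparison key: lower-cased host, path without trailing slash, query.

    The scheme is deliberately ignored, so http and https variants collapse.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return (parts.netloc.lower(), path, parts.query)


def dedupe(results: list[dict]) -> list[dict]:
    """Keep only the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

Since only the first occurrence is kept, the original ranking order of the surviving results is unchanged.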

I just set it as my default search engine for a day. It's not quite there for my use cases. Can we help improve the search results?

That looks quite promising! Thank you for crediting tantivy in the github README, that's well appreciated! Ping me if I can help with anything.

immensely customizable -- We aim to give you the ability to customize everything about the search. You can block sites, boost sites, prioritize links from specific sites and much, much more.

Great! Can I use more than one optic? The drop-down list seems to allow only 1.

Oh, and if we ever become evil (maybe by changing our motto) please take our code and start a competitor.

The most important part is the index data, what would be the deal with that?

Where does their crawl come from?

Fast, feels clean and uncluttered to use and the search results are fairly high quality. I like the “optic” idea.

After reading the about page, I’m not sure what the developers are trying to achieve? Perhaps a sort of alternative-Universe Google search funded by search-context AdWords?

Sad to say this for a promising idea, but the search results are objectively terrible. If it wants to succeed, it needs to nail the primary use case.

I think a lot of people will now go and benchmark queries only to report back disappointed with results.

Trying to build a generalized search engine for the modern internet that comes close to Google/Bing would require a "tech megaproject" level of investment and commitment. Most likely it would only end up with the same optimizations and architecture as existing big search and a very similar level of experience.

I think it's a better direction to build a search based on a more limited amount of topic-based data and focus on a great match engine within it, then just aggregate the relevant ones together. That is far more maintainable on the crawling side too. I can use google/bing to find the Honda dealership or read keyboard reviews, or get the 50 most useful unix commands.

I also wonder if with the rise of LLMs, while it still may not be feasible in such large scale production environment, those can serve as guides/agents to also improve the query itself and not the results of the query, for example - a chat-like search where user answers shift the relevancy metrics for returned documents. This would fit perfectly for smaller but open source, customizable and thematic search.

That being said, I think it's great that projects like this pop up more often. (Phind.com was also on my radar this year)

I searched for "calories in 450 gm of steak" and the top 3 results were:

1. Brexit as the start of the reversal of neoliberal globalization - softpanorama.org
2. Directory Search - Fulshear-Katy Area Chamber of Commerce - chamberorganizer.com
3. The 100 Best New Products of 2020 - gearpatrol.com

And none of the Page 1 results were related to my search query...

Searching for "adventure game studio", neither the website that has the forums nor the GitHub repository is on the first page. Most results on the first page of search are really old things. Neither Wikipedia nor Repology, which has the package info, is anywhere in the results.

Tried searching for Dota (the video game), and the game’s website is buried by a bunch of SEO spam. It might not even have been crawled because it doesn’t appear on the first or second page.

I just set up a YaCy jail on my truenas box at home. It's a distributed p2p system.

Haven't actually used it yet since I'm currently paying for kagi and it's good, and I only just set it up yesterday.

But this just struck me, I just said 2 things there and this post is yet another, between kagi, yacy, and now stract, not just 3 different names but 3 different types of solution to a problem, and all seemingly actually viable, that have popped up recently after decades of no one really feeling like they needed anything else.

I think something is changing.