Search could be better? Yes, yes it could.
I search for words, and can even indicate that I want results with a keyword included, and it will be ignored. Then I have to sift out what is a search result and what is an ad.
And if I get another Quora answer....
But this post? It was a waste. We do some hand-wavy stuff, come try us.
MatterRank uses LLMs to rank pages based on criteria you provide it with, not SEO tricks. It’s not meant to replace Google, but helps when you're looking for something specific and don't want to wade through tons of results that you don't care about. Still early, but useful for deeper searches.
I can't even imagine why they got rid of that, unless hundreds of thousands of people started pasting 1000-character search terms removing all the known ads currently flying around.
That seems extremely unlikely as the reason.
It's far more likely that some executives looked at the numbers and decided that removing search operators would make people more likely to click on ads, while leaving them in would make people click on the actual results that they were searching for.
make search worse = more searches are performed = more ads shown
It's anti-consumer but every company is like this now.
I am unsure if it is possible to run a "free" web search without having a benevolent benefactor paying for the scraping and maintenance and staffing. Furthermore, someone has to play mouse and mousetrap with the "gaming" of whatever "algo" one chooses to use to rank results. Maybe a list is the wrong way to display search results. Maybe a contemporary snapshot of the page with the search text highlighted might work better. It might even convince a lot of sites to clean up their landing pages and their blog/article formats.
I know how to stand up and start a web search engine, and probably could implement a decent chunk of functionality myself. It'd be slow and fall down if 100,000 people hit it at once, but nonetheless, the hard part isn't getting one running and starting the scraping. The hard part is results and funding.
I envisioned, last night, in a fever dream: maybe some metadata that the crawler and the sites share, to encourage Value for Value. If a site is willing to be scraped, but would like some nominal bandwidth costs recouped, or perhaps some sort of data agreement that is mutually - mutually - beneficial; or a site like NYT chipping in to the search hosting costs if the search company has really good results, like better than NYT could implement, then there could be some value for value there, too.
Search engines provide a valuable service for humanity, as a general concept. Search engines as they exist now provide a valuable service for their shareholders. Remove the shareholders, make the service valuable for humans, and the human stakeholders in the search company (employees, vendors, etc) might not be so greedy or "legally obligated to make numbers go up".
Encarta and Britannica existed. Wikipedia exists - as well as the forks and archives.
Does it? I understand there are issues with spam in search, but assuming we don't know what we want is not at all the conclusion I draw from using search engines.
For the former, I'm intrigued but unconvinced that it's what I actually want in a search engine.
For the latter, I imagine that's something that this search engine will need to contend with, although it could "just" be an LLM compute trade-off, where if you give enough results to an LLM to analyse you'll eventually find the good stuff. That said, SEO is going to rapidly become LLMEO and ruin the day again.
https://github.com/rumca-js/Internet-Places-Database
I must admit that this is a difficult task. There are many domains for "hotels" and "casinos", so I have to protect myself against spam, just as Google does.
Remove all of this, just let me directly use your app, I want to search and create engines on the fly.
I don't need to save them for future uses, if I am not going to use your app even once.
If you want this to take off, it needs to just work, no extra steps unless I want to.
The results for me were fairly high quality and moderately relevant but I think they could be improved as well.
You get pretty far by just blocking low quality blogspam and Medium, which would be a lot faster and could even be done on the frontend with a chrome plugin.
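A minimal sketch of that kind of blocklist filter, written in Python for brevity; the same host check would work in a browser extension. The blocklist entries and the function names are made up for illustration, not taken from any real plugin.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; these entries are illustrative, not a vetted list.
BLOCKED_DOMAINS = {"medium.com", "example-blogspam.com"}

def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

def filter_results(urls, blocked=BLOCKED_DOMAINS):
    """Keep only the URLs whose host is not blocked, preserving order."""
    return [u for u in urls if not is_blocked(u, blocked)]
```

Matching on the host rather than the whole URL string means a subdomain of a blocked site is caught, while an unrelated domain that merely ends in the same letters is not.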
As for the results, it's tough because we've made the deliberate decision to have no control over the reranking. What that means is that if your criterion is "written by a woman", for instance, then any result that meets that will be ranked equally at the top. In all engines I've built for myself, I have a relevance criterion that's weighted relative to how much I care that the result is exactly what I'm looking for. It's probably important to make that clearer to the end user.
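To make the weighting idea concrete, here is a rough sketch, not MatterRank's actual implementation. It assumes each result already carries a score between 0 and 1 per criterion (for example, from an LLM), and that the user supplies a weight per criterion; the data shape and the `rank` function are hypothetical.

```python
def rank(results, weights):
    """Sort results by the weighted sum of their per-criterion scores, highest first."""
    def total(result):
        # A criterion missing from a result counts as a score of 0.
        return sum(w * result["scores"].get(name, 0.0) for name, w in weights.items())
    # sorted() is stable, so results with equal totals keep their original order.
    return sorted(results, key=total, reverse=True)

# Made-up example data: two pages meet the criterion, one is more relevant.
results = [
    {"url": "a", "scores": {"written by a woman": 1.0, "relevance": 0.2}},
    {"url": "b", "scores": {"written by a woman": 1.0, "relevance": 0.9}},
    {"url": "c", "scores": {"written by a woman": 0.0, "relevance": 1.0}},
]

only_criterion = rank(results, {"written by a woman": 1.0})
with_relevance = rank(results, {"written by a woman": 1.0, "relevance": 0.5})
```

With only the one criterion, "a" and "b" tie at the top and stay in their incoming order. Adding a relevance weight of 0.5 breaks the tie in favour of "b" without letting the off-criterion page "c" overtake either.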
I'm hoping that as LLMs become more mainstream, more functionality is built into tech that doesn't treat consumers as idiots. This is one stab at it, but there are so many other opportunities imo.
What I prefer is interfaces that are more systematic and based on comprehensible principles. Like, for search (as someone mentioned in another comment), I want to be able to search for pages (or records, or whatever) that contain the text I searched for. I don't want an interface that tries to understand what I mean, I just want it to use the data I give it in a way that's deterministic enough that I can figure out how to make it do what I want.
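As a toy illustration of the deterministic behaviour described above, assuming an in-memory mapping of URL to page text (a real index would look nothing like this), a literal match could be as simple as:

```python
def exact_search(pages, required_terms):
    """Return URLs of pages whose text contains every required term (case-insensitive)."""
    terms = [t.lower() for t in required_terms]
    return [url for url, text in pages.items()
            if all(t in text.lower() for t in terms)]

# Hypothetical pages, for illustration only.
pages = {
    "https://example.org/a": "Notes on search operators and ranking",
    "https://example.org/b": "Ranking pages with an LLM",
}
hits = exact_search(pages, ["ranking", "operators"])
```

The same query on the same pages always returns the same list, which is the property that makes it possible to learn how to steer it.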
In my opinion, the best applications of LLM UX will have full clarity for the end user (something we're trying to do with MatterRank). The non-determinism should be something the user can control to get better results, not something the engineer has prompted that takes control away from the user.
Now, if the use case you're looking for is "give me results with x text", then yes I agree with you that LLMs are just getting in the way. But that's not always the case.
It doesn't even explain why it's better than Perplexity.
anyway, my test was to search for FOSS software, explicitly asking for "not big tech" and no ads. the contents of the results were fine, if repetitive - but i was a bit sad to see a lot of youtube and reddit in the results. does the "algorithm" not look at the actual domains?
Why is everyone so fixated on keywords for instance? They have their uses, but librarians and people who do research for a living also use subject headings. These are still human designated as far as I know.
People who are experts in an area often search directly by author. An actually useful tool would be something that cross-references advisor-advisee relationships, who was colleagues with who as a function of time, etc, and finds additional sources based on author networks. You maybe could do something like this for the web too, as I suspect a lot of high quality pages made by individuals are related by such interpersonal networks. A lot of spammy garbage sites probably have network relations to each other as well.
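One way the author-network idea could be sketched, assuming you already had a graph of who is connected to whom (the hard part, which this does not address): a breadth-first search that collects everyone within a few hops of a known author. The graph shape and the names are hypothetical.

```python
from collections import deque

def related_authors(graph, start, max_hops=2):
    """Breadth-first search: map each author reachable within max_hops to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        author = queue.popleft()
        if seen[author] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(author, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[author] + 1
                queue.append(neighbour)
    del seen[start]  # the starting author is not "related" to themselves
    return seen

# Made-up network: A advised B, B worked with C, C worked with D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
nearby = related_authors(graph, "A", max_hops=2)
```

The hop distance could then feed into ranking, with sources by closer authors weighted higher. The same traversal over a link graph of known spam sites would flag their neighbours instead.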
I think where MatterRank shines right now is for finding results where you wouldn't mind waiting an extra 20-30 seconds for an added layer of vetting, as opposed to just wanting a quick answer.
Having said that, we are definitely working on making it faster and more useful for everyday queries.
I've not used it, but anecdotally, I can refine my own search query to get what I want, or conclude it doesn't exist, within 20-30s. Assuming ~5s per search to write, search, read, decide, that's 4-6 searches.
Do you think you're getting more value than 4 iterations on the initial search term? Are you always getting it in one search, or do you end up still needing to refine the search term, extending it beyond that 20-30s?
Definitely not for all cases, but in some cases yes. Where it really makes a difference is when you're looking for qualitative attributes of the webpage, rather than what words show up in it (e.g. "written by a woman", "is likely to convince someone who supports Trump", "talks about X/Y/Z but not A/B"). It reads the actual content, so you can get oddly niche in a way you just can't with keywords alone.
Edit (hn doesn’t let me post this fast): is finding places to buy shit really an issue? How many times in your life have you thought “damn I know what I want to buy, I just don’t know from which site to buy it”? That’s hard to imagine of anyone. This user story just seems like a problem made up by search indexes to court capital.
Edit2: Kagi is great. I'm a full subscriber.
I find I do it quite a lot. When I was researching solar. When I needed some actuators recently. Now I'm looking for a trailer. And so on.
Obviously not groceries, but whenever I'm investigating something new I find commercial sites to be very helpful.
I've been Amazon-free for a while and generally I've had very good luck simply going directly to manufacturer's websites, but it seems like you might be searching for a class of products for which that strategy is ineffective?
That would definitely have enriched my comment, but, unfortunately, I couldn't think of anything in particular in the moment, and can't now. seb1204 (https://news.ycombinator.com/item?id=43564922) mentions one common kind of use case for me: I want to buy some small utility item that I'm used to finding in the hardware store, but it's sufficiently specialty that it's not worth it for the hardware store to carry it, and it's sufficiently small that its price would triple or quadruple if I paid the manufacturer's shipping costs.
> I've been Amazon-free for a while and generally I've had very good luck simply going directly to manufacturer's websites, but it seems like you might be searching for a class of products for which that strategy is ineffective?
As I say, I'm stuck in the situation of being vague because I can't think of the last specific time this affected me, but I have definitely dealt with relatively small sellers where the purchase option on their website is "here's a link to buy from our Amazon store."
If you mean pop-ups, MatterRank can't handle that at the moment because it evaluates markdown content, but it's something we're looking at adding. In the meantime, I'd recommend a good ad-blocker.
Use Brave Search with Goggles (https://search.brave.com/goggles/discover). It's great.