Unshittification, Exact matches, Booleans, Negation, Indexof etc

I like that there is a search engine such as this, but like every other modern search engine it has NOTHING on pre-2009 Google, and it still follows many of the enshittified anti-patterns of modern Google.

I noticed that you are giving altered matches: a query for “engine”, as in “search engine” or “game engine”, returns “engineer”, which is obviously bollocks/nonsense. When Google started doing this is when it first became really useless, in my opinion. This type of semantic redirection needs to go. I need matches for the terms I entered, not for what the engine, thinking it knows better than me, decides I needed but didn’t.
Giving partial or similar matches is fine, but they need to appear after the exact matches and must not override what was entered.
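
To make the "exact matches first" request concrete, here is a minimal Python sketch. The `VARIANTS` table and the `rank` function are hypothetical names invented for illustration; nothing here reflects how Mojeek actually ranks results.

```python
# Hypothetical table of looser forms an engine might also match for a term.
VARIANTS = {"engine": {"engines", "engineer", "engineers"}}

def rank(docs, term):
    """Return documents containing the exact term first, variant-only matches after."""
    exact, loose = [], []
    for doc in docs:
        words = set(doc.lower().split())
        if term in words:
            exact.append(doc)   # the user's literal term is present
        elif VARIANTS.get(term, set()) & words:
            loose.append(doc)   # only a related form is present
    return exact + loose

docs = [
    "senior engineer wanted",
    "game engine tutorial",
    "comparing search engines",
]
print(rank(docs, "engine"))
# ['game engine tutorial', 'senior engineer wanted', 'comparing search engines']
```

The looser matches are still returned, just never ahead of a document that contains the word as typed.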

In the past you could exclude an unwanted word with a minus, e.g. search engine -engineer, but support for this feature is sparse among search engines now, and Mojeek does not have it. In my opinion this is critical for filtering out results that are popular, altered, highly ranked, or SEO honeypots, but are not what you are actually looking for. A search engine without negation is basically useless, which means all of them nowadays.
Negation is just one of the boolean operators, another very useful feature that modern search engines have completely dropped for no apparent reason.
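
For anyone unfamiliar with the minus operator, a rough Python sketch of the behaviour follows. The function names `parse_query` and `matches` are made up for this example, and real engines apply this against an index rather than raw strings.

```python
def parse_query(query):
    """Split a query into required terms and terms excluded with a leading minus."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded

def matches(text, query):
    """True if the text has every required term and none of the excluded ones."""
    required, excluded = parse_query(query)
    words = set(text.lower().split())
    return all(t in words for t in required) and not any(t in words for t in excluded)

docs = [
    "open source search engine",
    "search engine engineer job listing",
]
hits = [d for d in docs if matches(d, "search engine -engineer")]
print(hits)  # ['open source search engine']
```

Required terms are implicitly ANDed here; OR and grouping would need a proper parser, which this sketch leaves out.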

Another powerful tool was the ability to search site indexes or FTP listings using the intitle:indexof command, which can locate files that are not directly linked on a page. This has also been removed from all modern search engines.

An operator reference card or page for all this would be nice to have too.

And finally, the real reason Google became dominant in the 2000s is not just better results but better localized results. All other search engines at the time were for America only; Google was the only company to realize there is a world outside the coasts, and it gave the same good results for international queries.

3 Likes

Welcome @BmB23 :wave:

The process that matches engine to engineer here is stemming rather than semantic redirection: words are matched based on their root parts. It’s not always going to get this right, and I’ve noted the example you’ve given, but it does also do a lot of good for matching.
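
As a rough illustration of how stemming can conflate these words, here is a toy suffix-stripping stemmer in Python. The suffix list and the `toy_stem` name are assumptions made up for this example; this is not Mojeek's stemmer, and real stemmers use far more careful rules.

```python
# Illustrative suffix list, longest first; an assumption, not a real rule set.
SUFFIXES = ("eers", "eer", "es", "er", "e", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("engine", "engines", "engineer", "engineers"):
    print(w, "->", toy_stem(w))
# engine -> engin
# engines -> engin
# engineer -> engin
# engineers -> engin
```

All four words collapse to the same root, so a query for one matches documents containing any of the others. That is helpful for engine/engines and an overstemming error for engine/engineer.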

This being said, examples like this are always massively useful. We are always looking at overstemming, and it helps to have queries or words that fit this category on hand. If you have any others, we’re always happy to receive them, here or via the contact page / aloe@mojeek.com.


A video explanation of how text enters the index exists here if useful.

2 Likes

How about giving users the power to control the stemming process? One way would be to respect quotes, so that stemming is not applied to any word enclosed in quotes. That could resolve @BmB23’s situation.
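
Here is a small Python sketch of what I mean by respecting quotes. The `STEMS` table stands in for whatever stemmer the engine really uses, and `parse` and `matches` are hypothetical names; I have no idea how this would map onto Mojeek's index.

```python
import re

# Hypothetical stem table standing in for a real stemmer.
STEMS = {"engine": "engin", "engines": "engin", "engineer": "engin"}

def stem(word):
    return STEMS.get(word, word)

def parse(query):
    """Return (exact_terms, stemmed_terms); quoted words are never stemmed."""
    exact, stemmed = [], []
    for quoted, bare in re.findall(r'"([^"]+)"|(\S+)', query.lower()):
        if quoted:
            exact.extend(quoted.split())
        else:
            stemmed.append(stem(bare))
    return exact, stemmed

def matches(text, query):
    words = text.lower().split()
    stems = {stem(w) for w in words}
    exact, stemmed = parse(query)
    return all(t in words for t in exact) and all(s in stems for s in stemmed)

print(matches("senior engineer wanted", 'engine'))    # True: stems collide
print(matches("senior engineer wanted", '"engine"'))  # False: quoted, exact only
print(matches("game engine tutorial", '"engine"'))    # True
```

The sketch treats a quoted phrase as a set of exact words and ignores word order, which a real phrase search would also need to check.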

Another would be for search operators to override stemming, as in the aforementioned example of “search engine -engineer”. That could help mitigate stemming problems through elimination.

It would be even more powerful if Mojeek could implement a checklist of the words it guesses through stemming, either a horizontal checklist above the search results or a vertical one using the free space on the left. For example:

:ballot_box_with_check: engine
:ballot_box_with_check: engines
:ballot_box_with_check: engineer

Then users can manually uncheck the ones that are wrong or not applicable to their search. This would also be a huge boon for feedback, since users could easily report wrong predictions.
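
In code terms, the checklist idea might look something like the Python sketch below. The `VARIANTS` table and both function names are invented for illustration, and the search is a naive scan over strings rather than anything resembling a real index lookup.

```python
# Hypothetical list of forms the engine would show as checkboxes for a term.
VARIANTS = {"engine": ["engine", "engines", "engineer"]}

def allowed_forms(term, unchecked):
    """Forms still ticked after the user unchecks some of them."""
    return [v for v in VARIANTS.get(term, [term]) if v not in unchecked]

def search(docs, term, unchecked=frozenset()):
    """Return documents containing at least one still-ticked form of the term."""
    forms = set(allowed_forms(term, unchecked))
    return [d for d in docs if forms & set(d.lower().split())]

docs = [
    "game engine tutorial",
    "comparing search engines",
    "senior engineer wanted",
]
print(search(docs, "engine"))
# ['game engine tutorial', 'comparing search engines', 'senior engineer wanted']
print(search(docs, "engine", unchecked={"engineer"}))
# ['game engine tutorial', 'comparing search engines']
```

The `unchecked` set is also exactly the feedback signal: logging it per query would show which guessed forms users reject most often.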

This is definitely something that we’re looking into, but it’s easier said/thought about than done, due to a quite early-on design decision.

That decision means it’s not possible right now, and would involve some thinking and some changes.

1 Like

It’s hard for me to picture how complicated the necessary changes are, but I would still like to express my gratitude to Mojeek for considering my idea, and I hope it can eventually be implemented.

As Google slowly deprecates its search operators and moves towards unreliable AI predictions, I can see search operators being a feature that differentiates Mojeek from its competitors, giving frustrated Google users back the control they sorely seek. That could actually make Mojeek’s search results even better than Google’s, as history has already shown that giving users the power to filter results with operators is vastly superior to guessing what they want.

Btw, if you do make the necessary changes to implement these additional search operators, I think it would be useful to make them flexible, so that you can easily introduce or deprecate operators in the future depending on usage and user feedback.

And here are some more stemming problems I’ve seen:

  • “lest” and “leste”
  • “sore” and “sorely”; probably most words ending in -ly are not interchangeable with the same word without it
2 Likes