I like that there is a search engine such as this, but like every other modern search engine it has NOTHING on pre-2009 Google, and it still follows many of the enshittified anti-patterns of modern Google.
I noticed that you are giving altered matches: a query for “engine”, as in “search engine” or “game engine”, returns “engineer”, which is obviously bollocks. In my opinion, Google first became really useless when it started doing this. This type of semantic redirection needs to go. I need matches for the terms I entered, not for what the engine, thinking it knows better than me, has decided I must have meant.
Giving partial matches or similar matches is fine, but they need to appear after the exact matches and not override what was entered thinking it knows better.
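To make the ordering I mean concrete, here is a minimal sketch. It assumes the engine already has a list of related variants for the typed term; the function name `rank_exact_first` and the sample documents are made up for illustration, and none of this reflects how Mojeek actually ranks.

```python
def rank_exact_first(term, related_terms, documents):
    """Return matching documents: literal matches first, variant-only matches after."""
    term = term.lower()
    related = {t.lower() for t in related_terms}
    exact, partial = [], []
    for doc in documents:
        words = set(doc.lower().split())
        if term in words:
            exact.append(doc)      # contains the word exactly as typed
        elif words & related:
            partial.append(doc)    # matches only through an expanded variant
    return exact + partial


# Hypothetical sample data, not real search results.
docs = ["hire an engineer today", "game engine tutorial", "cooking recipes"]
ranked = rank_exact_first("engine", ["engineer", "engines"], docs)
print(ranked)
```

The variant-only page still shows up, but only after every page that contains the word as entered.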
It used to be that you could negate the unwanted word with a minus, e.g. search engine -engineer, but support for this feature is sparse among search engines now, and Mojeek does not have it. In my opinion this is critical for filtering out results that are popular, altered, highly ranked, or SEO honeypots, but are not what you are actually looking for. A search engine without negation is basically useless, which means all of them nowadays.
Negation is just one part of the set of boolean operators, another very useful feature that has been completely dropped by modern search engines for no apparent reason.
Another powerful tool was the ability to search site indexes or ftp using the intitle:indexof command, which may be able to locate files that are not directly linked on a page. Also removed from all modern search engines.
An operator reference card or page for all this would be nice to have too.
And finally, the real reason Google became dominant in the 2000s is not just better results but better localized results. All the other search engines at the time were for America only; Google was the only company to realize there is a world outside the coasts, and it gave the same good results for international queries.
The process which matches engine to engineer here is stemming rather than semantic redirection: matching words based upon their root parts. It’s not always going to get this right, and I’ve noted the example you’ve given, but it does also do a lot of good for matching.
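For anyone unfamiliar with stemming, the toy sketch below shows how overstemming can collapse distinct words into one root. The suffix list and the `toy_stem` function are deliberately crude assumptions made up for this illustration; this is not the stemmer Mojeek uses.

```python
from collections import defaultdict

# Assumed, deliberately naive suffix list, checked in this order.
SUFFIXES = ("eers", "eer", "es", "e", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Group words that the toy stemmer treats as the same term."""
    groups = defaultdict(list)
    for w in words:
        groups[toy_stem(w)].append(w)
    return dict(groups)


groups = group_by_stem(["engine", "engines", "engineer", "suite", "suit"])
print(groups)
```

With this toy rule set, "engine" and "engines" are sensibly merged, but "engineer" lands in the same group, and "suite" collapses into "suit". That is the overstemming case in miniature.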
This being said, examples like this are always massively useful, and we are always looking at overstemming, with queries or words that fit in this category being very useful to have on hand. If you have any others then we’re always happy to receive them, here or via the contact page / aloe@mojeek.com.
A video explanation of how text enters the index exists here if useful.
How about giving users the power to control the stemming process? One way would be to respect quotes, so that stemming is not applied to any word enclosed in quotes. That could resolve @BmB23’s situation.
Another would be for search operators to override stemming, like the aforementioned example of “search engine -engineer”. That could help mitigate the problems of stemming through elimination.
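As a rough sketch of what I mean, the query could be split into three buckets before any stemming happens. The syntax here (straight double quotes for exact words, a leading minus for exclusion) and the names `ParsedQuery` and `parse_query` are my own assumptions, not Mojeek's actual query parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    stemmed: list = field(default_factory=list)   # plain terms, stemming allowed
    exact: list = field(default_factory=list)     # quoted terms, stemming bypassed
    excluded: list = field(default_factory=list)  # -terms, to be matched literally


# Either an optionally negated quoted phrase, or an optionally negated bare word.
TOKEN = re.compile(r'(-?)"([^"]+)"|(-?)(\S+)')


def parse_query(query):
    parsed = ParsedQuery()
    for neg_quoted, quoted, neg_plain, plain in TOKEN.findall(query):
        if quoted:
            words = quoted.lower().split()
            (parsed.excluded if neg_quoted else parsed.exact).extend(words)
        elif plain:
            (parsed.excluded if neg_plain else parsed.stemmed).append(plain.lower())
    return parsed


parsed = parse_query('search "engine" -engineer')
print(parsed)
```

Only the `stemmed` bucket would then be handed to the stemmer; the other two would be compared against the words exactly as typed.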
It would be even more powerful if Mojeek could implement a checklist for the words it guesses through stemming, either a horizontal checklist above the search results or a vertical checklist using the free space on the left, for example:
engine
engines
engineer
Then the users can manually uncheck those that are wrong or are not applicable to their search. This would also be a huge boon to feedback, where users can easily report back all wrong predictions.
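Behind the scenes, the checklist could be as simple as a mapping from each guessed variant to an on/off flag. This is only a sketch of the idea; the function names are hypothetical and the variant list is just the example above.

```python
def build_checklist(variants):
    """Every guessed variant starts out checked (included in the search)."""
    return {word: True for word in variants}


def uncheck(checklist, word):
    """Return a copy of the checklist with one variant switched off."""
    updated = dict(checklist)
    if word in updated:
        updated[word] = False
    return updated


def active_terms(checklist):
    """Variants the search should still match."""
    return [w for w, on in checklist.items() if on]


def rejected_terms(checklist):
    """Variants the user switched off; these could double as stemming feedback."""
    return [w for w, on in checklist.items() if not on]


checklist = build_checklist(["engine", "engines", "engineer"])
checklist = uncheck(checklist, "engineer")
print(active_terms(checklist), rejected_terms(checklist))
```

The rejected list is what would make the feedback loop cheap: every unchecked box is a report of a wrong guess.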
It’s hard for me to picture how complicated the changes needed to implement these would be, but I would still like to express my gratitude to Mojeek for considering my idea, and I hope that it can eventually be implemented.
As Google slowly deprecates its search operators and moves towards unreliable predictions using AI, I can see search operators being a feature that would differentiate Mojeek from its competitors, giving back the control that frustrated Google users sorely seek. That could actually make Mojeek’s search results even better than Google’s, as history has already shown that giving users the power to filter results using operators is vastly superior to guessing what they want.
Btw, if you do make the changes needed to implement these additional search operators, I think it would be useful to make them flexible, so that you can easily introduce or deprecate search operators in the future depending on usage and user feedback.
And here are some more stemming problems I’ve seen:
“lest” and “leste”
“sore” and “sorely”, and probably most words ending in -ly are not interchangeable with their forms without it
I just ran into the same problem, trying to search for the odd-sounding phrase “shortened suite” while excluding “suit”, and finding that when I use -suit it has no effect; either way I get the same set of results, most of them containing the word I specifically excluded and not even containing the word I searched for.
One way to fix this without tearing down how Mojeek has always worked – that is, by adding a new feature rather than rewriting an original one – might be to run two searches on every query that uses a “not” operator: the first one searching as usual, and the second one searching the list of results, removing every hit for the unwanted words.
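The second pass could look something like the sketch below. It assumes the first pass returns the text of each hit (here just plain strings) and that the excluded words are compared literally, with no stemming; `exclude_literal` is a made-up name, and the sample hits are invented.

```python
import re


def literal_words(text):
    """Lowercased words exactly as they appear in the text (no stemming)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def exclude_literal(results, excluded):
    """Second pass: drop every hit containing an excluded word verbatim."""
    banned = {w.lower() for w in excluded}
    return [r for r in results if not (literal_words(r) & banned)]


# Stand-in for the usual (stemmed) first-pass results.
first_pass = [
    "A shortened suite for solo strings",
    "Tailored suit sale this weekend",
    "Suits and suites compared",
]
filtered = exclude_literal(first_pass, ["suit"])
print(filtered)
```

Because the comparison is literal, "suits" and "suite" survive while the hit containing the bare word "suit" is dropped.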
I can understand why it’s useful to (by default) automatically broaden people’s “positive” search terms in certain ways, to include things they might have missed. However, the only reason someone would ever go to the trouble of excluding a search term is that they want it to be absolutely eliminated from the results. I’m certain that I speak for every user of “not” when I say I’m happy to risk losing some good results in which my desired term is misspelled, in return for truly eliminating the mountains of properly-spelled garbage results I currently get.
(It would also be very helpful to just be able to override the normal functionality by asking for a verbatim search. I’d be thrilled to receive “no results” instead of “here, try these useless suggestions instead”.)
It appears to me as if someone, somewhere, feels threatened by the idea that I might be permitted to choose not to read material that doesn’t interest me, someone who believes they ought to be allowed to compel everyone to see what they have written.
The issue which is cropping up here is one of stemming; the minus operator itself does work, for example see:
But in this case suite is being stemmed down to suit, so the -word is having no effect on the search itself. We have a few stemming issues that we’ve taken down recently and they’ve been added into the queue to fix.
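As a rough illustration of that collision (not our actual code), the check below flags excluded words that share a stem with a wanted word. The stem table is a hypothetical stand-in covering only this example.

```python
# Hypothetical stem table covering just the words in this example.
HYPOTHETICAL_STEMS = {"suite": "suit", "suit": "suit"}


def stem(word):
    """Look up the assumed stem; unknown words are left unchanged."""
    word = word.lower()
    return HYPOTHETICAL_STEMS.get(word, word)


def stem_conflicts(wanted, excluded):
    """Pairs (wanted, excluded) that differ as typed but share the same stem."""
    conflicts = []
    for w in wanted:
        for x in excluded:
            if w.lower() != x.lower() and stem(w) == stem(x):
                conflicts.append((w, x))
    return conflicts


conflicts = stem_conflicts(["shortened", "suite"], ["suit"])
print(conflicts)
```

A query that produces a non-empty list here is one where the wanted and excluded terms can no longer be told apart after stemming.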
Thanks - I understood that. The general question of what happens with other languages was an afterthought, I guess. If I search for “Wart” in German (which is a stem for warten, Wartung, etc.), it would be good to bypass “warts” and “wart remover”. But it isn’t necessarily easy to tell.