What's the thing holding back Mojeek the most?

Kind of new user here. I’ve been using Mojeek for a little while, and first of all I’d like to say I love the AI summaries on desktop.
Second, the quality of the results varies quite unpredictably. Sometimes I search for something that I think is quite easy and get really poor results, while at other times I search for keywords wondering if I shouldn’t just Google instead, and get great results. I can’t quite find a pattern in the quality of the results, but whatever it is, it is not what I expected.
My question is, what is holding you back the most in terms of improving the quality of the results? I ask out of curiosity and genuine interest. Is it that you don’t have enough users/searches? Is it that users aren’t giving enough feedback on search results? Is it a matter of revenue/costs/hiring more people? Is it that you don’t have a large enough index?

P.S.: Searching for “brand-name trustpilot” in an attempt to bring up Trustpilot reviews never works. Even when specifying site:uk.trustpilot.com, nothing comes up. Just a random thought.

3 Likes

I give so much feedback I wonder if it’s getting annoying…

This has also been my experience over the past few years. But some results have gotten way better.

In my opinion, the biggest problem is the size of the index. The most common reason for my results to be bad is that Mojeek just doesn’t have the page in its index. It’s less common for the result to exist in the index but not be pulled out when I search for it, but that’s possible too.

Your trustpilot example seems to be a case of Mojeek’s index not having the page you’re looking for.

Mojeek’s index is one of the biggest but I need to reach out to another search engine almost every day. Its focus on only a few languages is another thing that forces me to reach out for another search engine.

One of the problems I used to have was due to result clustering, which would only show 1 result per domain on the results page—unfortunately, Mojeek’s ranking usually meant that the 1 result it pulled was not useful to me. This has improved so much since that I hadn’t even noticed my result clustering was set to 1 instead of my usual 3.
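To make the clustering setting concrete, here is a minimal sketch of capping results per domain. It assumes a plain ranked list of URLs; the function name `cluster_by_domain` and the example URLs are made up for illustration, and this is not how Mojeek actually implements clustering.

```python
from urllib.parse import urlparse


def cluster_by_domain(ranked_urls, max_per_domain=3):
    """Keep at most max_per_domain results per host, preserving rank order."""
    seen = {}  # host -> number of results already kept
    kept = []
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        count = seen.get(host, 0)
        if count < max_per_domain:
            kept.append(url)
            seen[host] = count + 1
    return kept


# Hypothetical ranked results, best first.
results = [
    "https://example.org/a",
    "https://example.org/b",
    "https://example.com/x",
    "https://example.org/c",
]

print(cluster_by_domain(results, max_per_domain=1))
# ['https://example.org/a', 'https://example.com/x']
```

With a cap of 1, whichever page ranks first for a domain is the only one shown, which is why a weak top pick per domain hurts more at that setting than at 3.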

(I’ll let Mojeek staffers discuss this in detail; just wanted to give my perspective. I’m also curious. I know one of the issues they’ve had is sites like Reddit/LinkedIn blocking all search engine crawlers except for Google/Bing.)

2 Likes

Mojeek is committed to neutrality and privacy.

In broad terms, this means that a controversial subject won’t be censored, users won’t be tracked, and clicks won’t influence search results.

Two people searching with the same search settings (typically country and language) at the same time will get the same search results.

Mojeek is primarily based on lexical search. There, the search algorithm looks for your keywords in the page title or body or in incoming links. Web pages that more closely match your keywords are ranked higher in the search results. And, the algorithm makes no attempt to interpret your intent or to respond to trending topics.
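As a rough illustration of what lexical matching means, here is a toy sketch. It assumes each page is a dict with `title`, `body`, and `anchors` (incoming link text) fields, and the field weights are invented for the example; Mojeek’s real ranking is not public in this thread and is certainly more involved.

```python
import re

# Hypothetical field weights, chosen only for illustration.
FIELD_WEIGHTS = {"title": 3.0, "anchors": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_score(query, page):
    """Weighted count of query-term occurrences across the page's fields."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(page.get(field, ""))
        score += weight * sum(1 for token in tokens if token in terms)
    return score


def rank(query, pages):
    """Order pages by descending lexical score; no intent or click data."""
    return sorted(pages, key=lambda page: lexical_score(query, page), reverse=True)


# Made-up pages for the example.
pages = [
    {"url": "https://example.org/coffee", "title": "Coffee basics",
     "body": "Some people prefer tea.", "anchors": ""},
    {"url": "https://example.org/tea", "title": "Green tea guide",
     "body": "How to brew green tea.", "anchors": "tea brewing"},
]

for page in rank("green tea", pages):
    print(page["url"], lexical_score("green tea", page))
```

Note that the score depends only on the query and the page, so two people typing the same keywords get the same order.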

These commitments promote the health of the Internet. Mojeek users have a common view of the Internet. We don’t experience censorship. Mojeek organically links to other websites. And, Mojeek is incentivized to make a better search engine rather than better adtech.

That said, using Mojeek on a daily basis is not what I expect.

I am accustomed to typing keywords and either ending up where I intended or having a collection of high quality links: something I might collect if I spent hours researching a topic. This is often not what happens.

Often, lexical search places a result farther down than I expect.

Or, I have to pick through the results specifically because there is no click data or bias boosting the results that I might find useful.

So, there is a trade-off.

There are two things that help me.

First, I use a dedicated search box in Firefox and Vivaldi. And I have it set to preserve my keywords. This way I can start with Mojeek and then reuse the same keywords in other search engines without having to retype anything. You can do something similar within Mojeek by using their Search Choices feature. That turns on buttons which will resend your keywords to a search engine of your choice.

Second, the obvious alternative to lexical search is semantic search. That’s where the search algorithm has the ability to interpret your keywords. However, I don’t think that is necessary for Mojeek. For a while, I’ve been writing about Searching by Intent. And, in my view, today’s blog post is a version of that: giving Mojeek users the ability to search within a set of results. But, to simplify, I’d rather do something like a SQL INTERSECT between two sets of Mojeek search results. And, that is not directly possible for ordinary users today.
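For what it’s worth, the INTERSECT idea is easy to sketch outside Mojeek. This assumes you already have two lists of result URLs from two separate searches; the lists and the function name `intersect_results` are hypothetical, and nothing here is a Mojeek feature or API.

```python
def intersect_results(first, second):
    """Return URLs that appear in both lists, in the order of the first list."""
    in_second = set(second)
    seen = set()
    common = []
    for url in first:
        if url in in_second and url not in seen:
            common.append(url)
            seen.add(url)
    return common


# Made-up results for two different keyword sets.
results_a = ["https://example.org/1", "https://example.org/2", "https://example.org/3"]
results_b = ["https://example.org/3", "https://example.org/1", "https://example.org/9"]

print(intersect_results(results_a, results_b))
# ['https://example.org/1', 'https://example.org/3']
```

Keeping the order of the first list means the first search still decides the ranking, while the second search acts as a filter.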

I use Mojeek every day. And, I’m willing to work around its limitations. I think Mojeek is better for the Internet and civil society. And, I’m trying to give it every chance for success.

As Josh wrote in your other thread, the best things you can do to help Mojeek are to use it, give feedback, and share it with like-minded people.

3 Likes

Nobody is doubting Mojeek’s commitment to neutrality and privacy – I think if we’re here it’s because we all want to see Mojeek succeed because we think it’s better for competition, civil liberties, etc.

My question was about “what can make results better?” because at the moment I have to go through the tedious process of proxying my searches through to Google about 70% of the time. Yes, I’ve turned on the Google button so that I don’t have to retype everything, but that’s still a lot of effort. This leads to a situation where if I’m looking up something quickly, I start to wonder whether I should even use Mojeek to begin with. It’s not good, and certainly not something I can recommend to people in honesty. “Use this search engine, you have to go to Google anyway for most of your searches, but it’s better for society! They aren’t profitable though and it might shut down anytime” aren’t strong selling points among my friends. I’m not here to hate on Mojeek, I desperately want to see them succeed.

To @gnome’s point, yes it might very well be the size of the index. But trustpilot? It’s an extremely popular website. Could it be that they have a deal with Google/whoever that doesn’t allow Mojeek to index them?

I’m still curious to learn if more explicit user feedback is required, or if Mojeek is learning from the websites we’re clicking and that’s enough implicit feedback. Maybe indexing is the biggest problem after all. Idk, someone from Mojeek can comment.

3 Likes

While Mojeek usually has the sites I’m looking for in its index, it may not have the exact page from that site I’m looking for. For example, I know Mojeek indexes myanimelist, but it’s 50-50 whether the page for the show/manga I’m looking for will be in the index.

I’ve long since adopted the habit of adding every site like myanimelist/anilist with its own search engine to my list of search engines, and will search it directly instead. It doesn’t work in every case, though.

I would also like to know if sending feedback as aggressively as I have been has been useful :slight_smile:

4 Likes

Thank you @sqrl, @gnome, @mike for sharing your thoughts, suggestions and feedback.

The details of clustering, as raised by @gnome, are in this blog post.

User feedback is very important to us. We have been able to improve Mojeek in a lot of ways based on the feedback of many people here, and via email, and notably from that of @mike and @gnome. The problem you have highlighted @sqrl about Trustpilot is another piece of useful and actionable feedback. It seems that we are requested (by default) not to crawl the main domain of Trustpilot, even if we can crawl their subdomains. This appears to be an oversight, not deliberate, so we’ll get in touch with them and hopefully resolve this, as we have done in the past with other sites.
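To illustrate how a site can block a crawler “by default” on one host while leaving a subdomain open, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt contents, the `example.com` hostnames, and the `MojeekBot` user-agent token are all assumptions made for the example; they are not Trustpilot’s actual rules.

```python
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical main-host rules: one named crawler allowed, everyone else blocked.
MAIN_HOST_ROBOTS = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Hypothetical subdomain rules: everyone allowed.
SUBDOMAIN_ROBOTS = """\
User-agent: *
Allow: /
"""

main = parser_for(MAIN_HOST_ROBOTS)
sub = parser_for(SUBDOMAIN_ROBOTS)

main_allowed = main.can_fetch("MojeekBot", "https://www.example.com/review/brand-name")
sub_allowed = sub.can_fetch("MojeekBot", "https://uk.example.com/review/brand-name")
named_allowed = main.can_fetch("Googlebot", "https://www.example.com/review/brand-name")

print(main_allowed, sub_allowed, named_allowed)
```

Because robots.txt is served per host, the main domain and each subdomain can carry different rules, and a crawler that is not named explicitly falls under the `*` group.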

It is true that we are unable to crawl some big sites (Reddit, LinkedIn, Facebook), but in most cases where we are blocked we can get unblocked when we get in touch, given that we operate a respectful crawler and have done so for 20 years.

There are many reasons, @sqrl, for the variable search quality, and you have mentioned a few. We are a small team, so improvements take time. We also need to find ways to do things efficiently; we are not going to throw 100s of engineers or GPUs at challenges like Google and Microsoft can.

Explicit user feedback is really appreciated; that is, details of specific search queries that evidence poor quality. It is hard to predict what types of queries are helpful to us, so the more the better. We do look at all the feedback we get, even if we don’t reply to all of it. Occasionally one specific query can unlock a big improvement, and quite often queries reveal smaller problems, the fixes for which all add up to improving Mojeek.

4 Likes

Thanks @Colin, glad to hear the Trustpilot thing should be improved easily. This would go a long way in reducing some of the friction I’ve had using Mojeek.

Hopefully this will change if revenue streams improve. (may I once again say that I have not yet seen a single ad :sob: It makes me feel so bad knowing that I am costing you money with every search, I really wish you could make money out of my searches).

Lastly, do you learn from users’ clicks? E.g.: if the first result is irrelevant, but the second one is exactly what I need, will Mojeek learn from the fact that I clicked the first result, and then clicked the second? Similarly, will it learn from the fact that I’ve clicked the second result and skipped the first one? Just checking if I need to send feedback explicitly or if my clicks are already being used to train better results.

1 Like

@sqrl you started a great thread here that ended up answering questions I didn’t know I had.

Having spoken with my students many times over the years about how they choose what services to use online, I think I can partly answer the titular question.

Awareness

Some people are just not aware they have a choice. So many of my students use the default or whatever everyone uses.

I remember when Google search started, lots of people continued using Altavista. Then, poof, a certain percentage moved to Google and then everyone did. I’ve watched my students begin using Canva (one student in 25, now 24 in 25), ChatGPT, and other online services. Other things never take off: I have never seen anyone using DuckDuckGo in the wild, and I’ve only met 4 people that use OpenStreetMap. Mojeek is my homepage on my work computer, so maybe a student will try it someday.

Community

Onboarding people to this community, for example, would be beneficial and free. Feedback makes things better. Mojeek has gotten progressively and noticeably better since I discovered it. I wish I could make a Focus with hundreds of excluded sites, but most people wouldn’t ever want that.

Habits

Lots of people think the result should be right there at the top. Nobody wants to scroll anymore. Nobody learns to search anymore! In the 90s we had search engine lessons and contests at school. Now we let algorithms and SEO magic do their thing.

4 Likes

+1 from me :sweat_smile:

2 Likes

the index size is certainly a (big) issue, but that aside, i think it would benefit users greatly if they could switch between algorithms, one of them being where sites are ranked according to clicks/popularity

Are you saying that the current ranking doesn’t take popularity into account?

Why Mojeek?

We are not directing our crawler, and therefore populating our index, based on the clicks and navigation of users. Nor are we basing any part of our search indexing or ranking on results from other search engines like Google or Bing.

3 Likes

Hmm okay thanks for the info. That is a very interesting choice indeed. I don’t understand what drove this decision, but I can’t think of a “good enough” reason – like is it to protect privacy? Reduce bias? I don’t think it would make results more biased, if anything, it makes them more useful for the majority of people :thinking: I’m open to changing my mind though, I’m sure they’ve thought about this more than the 1 minute I’ve spent thinking about it just now.

reduce bias, and that can be a very important benefit when doing research, especially for controversial topics, but i see it as a hindrance when doing general searches where i want to see the most popular result at the top

my impression (i may be wrong) is that Moj seems to want to combine the two in a single algo which, IMO, i think results in an algo that performs poorly for general ‘i-don’t-care-about-neutrality’ searches and this may be where multiple, user-selectable algo’s could provide a significant benefit … but what the hell do i know … these guys have been at this for more than a decade, so it isn’t like they haven’t put a lot of thought into this :slight_smile:

The closest thing to automated feedback is /eval.

https://www.mojeek.com/eval

There, two different search algorithms present results side-by-side. And, you can use the ‘A is better’ or ‘B is better’ links at the top to vote (after searching). For example, that is how semantic matching was evaluated.

But, I have not heard new information about /eval in a while. And, I’m not sure if there is a useful algorithm test in progress. Typically, new algorithms are only evaluated for a few weeks. But, the page remains up afterward.

3 Likes

It is a great platform. I use it for searching. I think suggestions are important when I enter a keyword. User engagement is important for improving search quality. My blog, which already ranks on Google for many keywords, does not appear easily on Mojeek.

1 Like

Welcome @Roheveski_ou

My blog also took some time to get indexed. Mojeekbot will find yours too.

1 Like

Noted @bbbhltz and @gnome

This is both privacy and bias (and other things); the Mojeek tagline is, after all, No Tracking. Just Search…

Currently /eval isn’t being used to test a specific algorithm, but it was a great help when it came to testing out the newly-deployed algorithm we wrote about in February last year: Major Algorithm Update; Adding A Semantic Element | Mojeek Blog

2 Likes

This is both privacy and bias; the Mojeek tagline is, after all, No Tracking. Just Search…

I’ll have to give this a think. I am not sure I would consider learning from anonymous users’ clicks ‘tracking’, but I see your point. Mostly I think that without this vital implicit feedback from the users, the results might never become competitive quality-wise.

Which reminds me I wanted to make a point about something @bbbhltz said:

I remember when Google search started, lots of people continued using Altavista. Then, poof, a certain percentage moved to Google and then everyone did. I’ve watched my students begin using Canva (one student in 25, now 24 in 25), ChatGPT, and other online services. Other things never take off

People start adopting a product because it responds to a need they have, makes their lives easier, or does so to a greater degree than the product they’re already using. In this game there is, of course, a first-in-the-market competitive advantage. Once Google established itself as the default/market leader, it became hard for anyone else to gain market share. This isn’t just because Google spends billions in anti-competitive practices to maintain the de-facto monopoly over search, it’s also because none of the competitors have better results than Google. And if the results aren’t better, people simply won’t switch. Yeah, DuckDuckGo and others might be better at protecting your privacy, but it is convenience, not philosophical beliefs about civil liberties and the ultimate good for society, that wins people over. So if you want to build something that will win people over, you have to at least give people a high quality product. Some users you will gain because of philosophical/personal beliefs, but 99% of people will not switch until it is convenient to do so.

This is, by the way, why ChatGPT is such a threat to Google. Because finally after years of having a monopoly over search, Google is threatened by something that is definitely more convenient than its search engine for many use cases. People are still going to google “how tall is the eiffel tower” but many other use cases are now better captured by having a conversation with ChatGPT (planning a holiday, making a recipe, looking for advice on whatever, complex queries of any other kind, etc).

So basically my worry is that Mojeek is not doing all it can to maximise the “convenience factor”, which is what ultimately wins the majority of people over. Refusing to learn from users’ clicks may sound like a strong and unassailable moral ground, but in reality it just prevents Mojeek from becoming better – and thus being adopted by more users and ultimately protecting more of the world’s population.

I’m aware of the fact that on this forum my opinion might not be shared by most :slight_smile:

2 Likes

Sites with ranking can be manipulated. Rotten Tomatoes is a good example [1]. And Google just lets companies do whatever they pay for [2]. I also don’t see Mojeek as directly competing with Google (different segments, targets, positioning models; from my perspective, at least).

ChatGPT isn’t much of a product (my opinion there), and neither is Google’s Gemini (dumb name, also my opinion). With DeepSeek showing it is possible to do something similar on low-end hardware, for cheaper, it really takes the sparkle out of these predictive text machines. But, I see what you mean. For me, it still comes back to what I said above: different tools for different jobs, i.e. different markets.


  1. Rotten Tomatoes Still Has Hollywood in Its Grip

  2. How 16 Companies are Dominating the World’s Google Search Results (2024 Edition) - Detailed.com

1 Like