Mojeek Updates, January 2023

Here’s a rundown of what’s changed or new on Mojeek since our November update.

Crawling/Servers

During this period we crossed the 6.5bn page mark on 7th January, another big index milestone. Over the holidays we switched on 80 new servers, which means MojeekBot has considerably more infrastructure to crawl with and the general Mojeek stack is beefier than before.

We have also recently updated our adult page detector, and the update has now been applied across the majority of our index. This piece of work is pivotal to the implementation of a safesearch feature, which is coming soon. On top of this, we’ve amended crawl behaviour in several ways to improve our coverage of larger sites.

Results

  • We made multiple tweaks to our algorithm, improving result relevancy even further.
  • Some over-stemming issues have been identified and fixed.
  • We’ve worked to improve snippet lengths when a longer snippet is requested.

Web Content

If you’ve ever had someone ask you “Why Mojeek?” and wanted a link to send them, we’ve recently added a page which answers the question “Why is Mojeek Different?” It covers the index, speed, and a fair few other aspects of what we do and why. It might even be useful or interesting for people in this forum, who knows?

Mojeek Focus

  • We’ve removed the link from the homepage to Focus, so if you are enjoying using the tool, please spread the word!
  • We’ve rejigged how the Focus Dashboard looks, moving the Templates section to the bottom of the UI. This makes room for even more Templates; if you have a useful Focus that you’d like to tell us about, or a great idea you’ve not yet put into practice, please let us know.
  • The Focus Support Page now has more links through to useful resources, possibly saving you a few clicks.

News

  • We had a request to lengthen the titles in the Mojeek News Search, so a headline now gives more context before it is cut off with an ellipsis.

Language and Location

  • As mentioned before, we added Portuguese and Polish to the languages that Mojeek is searchable in, but some of the accented characters used in these two languages were not being highlighted in snippets. This has now been fixed 🇵🇹 🇵🇱
  • The Precise Location process, which uses browser coordinates to improve search relevancy, has been updated. It goes without saying that this process is as privacy respecting as it always was; it should now also be more accurate.

Quickboxes

  • The calculator previously had a problem with parentheses, and so calculations such as 5*(2+5) wouldn’t register as such; this has now been fixed.
  • We’ve added Bunq bank into our quickboxes. We take care in ensuring that trusted sources are escalated above others when it comes to banking in order to give people more financial security when using Mojeek.
  • When using the conversions quickbox to change one unit of measurement to another, the values you input into a query are now pulled through. You can now search 9 feet to inches for example, and have the result presented to you.
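To make the calculator and conversion items above concrete, here is a minimal Python sketch of how queries like 5*(2+5) and 9 feet to inches could be handled. It is an illustration only and says nothing about how Mojeek's quickboxes are actually built; the function names, the unit table, and the query pattern are all assumptions.

```python
import ast
import operator
import re

# Hypothetical sketch; none of these names come from Mojeek.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression):
    """Evaluate +, -, *, / with parentheses by walking the parsed syntax tree."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


# Length units expressed in inches (assumed, deliberately tiny table).
INCHES_PER_UNIT = {"inches": 1, "feet": 12, "yards": 36}

# Matches queries shaped like "<number> <unit> to <unit>".
_QUERY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\w+)\s+to\s+(\w+)\s*$", re.IGNORECASE)


def convert(query):
    """Return the converted value, or None if the query is not a known conversion."""
    match = _QUERY.match(query)
    if not match:
        return None
    value, source, target = match.groups()
    source, target = source.lower(), target.lower()
    if source not in INCHES_PER_UNIT or target not in INCHES_PER_UNIT:
        return None
    return float(value) * INCHES_PER_UNIT[source] / INCHES_PER_UNIT[target]


print(calculate("5*(2+5)"))         # 35
print(convert("9 feet to inches"))  # 108.0
```

Parsing the expression into a tree, rather than scanning it left to right, is what lets the bracketed part be evaluated first.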

Preferences

  • We reviewed Preferences and rearranged the order of the options to make them easier to navigate.

wow, lots of significant updates in this update

80 new servers? holy cow!

may i ask what the crawl rate was prior, and what it is expected to be since?

I was beginning to wonder why I was getting so many adult content websites for fairly innocuous search terms recently.

Now I can search for donkey videos on the BBC.


You have a duplicate line in the new about page:

This, along with our own search technology, means Mojeek returns search results for the majority of queries in less than 300ms


It looks like the status page has the option for an RSS feed. Have you considered enabling the feed?

But of course! Crawl rate varies depending upon what improvements we’re testing and the capacity of the index. It’s usually around 25m pages a day. We could easily crawl much more, but it’s set up so that crawling never gets too far in front of indexing. Out of these 80 new servers, 61 are in use as search nodes, which is a 24% increase and so should mean roughly an extra 6m pages a day. More importantly, this increases the overall maximum index size, relieving the other servers and enabling them to prioritise updating already indexed pages.
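The extra 6m a day follows from the two figures quoted above, if you assume the crawl rate scales in line with the number of search nodes. A quick Python check (the function name is made up for illustration):

```python
def extra_pages_per_day(baseline_pages_per_day, percent_increase):
    """Extra daily pages, assuming crawl rate scales linearly with search nodes."""
    return baseline_pages_per_day * percent_increase // 100


# Figures from the reply: about 25m pages a day, 24% more search nodes.
print(extra_pages_per_day(25_000_000, 24))  # 6000000
```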

These two things are actually unrelated. We’ve noticed it too and are working to reduce it. Safesearch will help, but if you’re able to use the feedback option to report when you see adult sites for innocuous searches, it would be of great assistance!

It’s fixed but not yet deployed, thanks for pointing it out though 🙏

This is actually already enabled 😃
