What's the thing holding back Mojeek the most?

Yes, and bots have become more sophisticated, as you know.

I occasionally talk to the eTools dev, and he says he spends the majority of his time battling bots. I suppose the problem is only getting worse with “AI”.

Q: Curious, what is the ratio of bots that do and don’t process JS nowadays?

We don’t track this, so it’s not going to be based on hard data, but it’s likely that the majority do not process JS. It’s also important to note that there is a wide range of different types of bots, such as fly-by crawlers and targeted scrapers, so it’s difficult to give a straight answer.
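One rough way a site could estimate this (not anything Mojeek has said it does, and all names below are hypothetical) is to compare bots that fetch a page against bots that also hit a resource which is only requested by JavaScript running on that page:

```python
# Rough sketch (not Mojeek's actual approach): estimate the share of bots that
# execute JS by comparing raw page requests against requests for a beacon URL
# that is only fetched by JavaScript on the page.
from collections import defaultdict

PAGE_PATH = "/some-page"          # hypothetical page being monitored
BEACON_PATH = "/js-beacon.gif"    # hypothetical resource requested only via JS

def js_execution_ratio(log_lines, bot_user_agents):
    """Return the fraction of bot user agents that also hit the JS beacon."""
    page_hits = defaultdict(int)
    beacon_hits = defaultdict(int)
    for line in log_lines:
        # assume each log line is "user_agent<TAB>path" for simplicity
        user_agent, path = line.rstrip("\n").split("\t", 1)
        if user_agent not in bot_user_agents:
            continue
        if path == PAGE_PATH:
            page_hits[user_agent] += 1
        elif path == BEACON_PATH:
            beacon_hits[user_agent] += 1
    crawled = set(page_hits)
    executed_js = {ua for ua in crawled if beacon_hits[ua] > 0}
    return len(executed_js) / len(crawled) if crawled else 0.0

logs = [
    "ExampleBot/1.0\t/some-page",
    "HeadlessBot/2.0\t/some-page",
    "HeadlessBot/2.0\t/js-beacon.gif",
]
print(js_execution_ratio(logs, {"ExampleBot/1.0", "HeadlessBot/2.0"}))  # 0.5
```

Even this only gives a lower bound for a single site, which is part of why a straight answer is hard.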

One thing I noticed is that the primary work product is new search algorithms. However, as the search index grows larger, it takes more time for new algorithms to take effect. New algorithms seem to change the way MojeekBot works: different algorithms collect different information about each web page, and each data set is apparently incompatible with the last. It can take weeks for a change to the algorithm to have a meaningful impact on the index and, consequently, on the search results seen by users.
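A toy illustration of that incompatibility, purely an assumption about how such a system might be structured and not Mojeek’s actual schema: if each algorithm version defines the per-page fields it needs, a record built by an older version is missing fields the new version relies on, so the page has to be re-crawled before the new algorithm can score it.

```python
# Toy illustration (hypothetical schema): records built under an older
# algorithm version lack fields the new version needs, forcing a re-crawl.
from dataclasses import dataclass, field

V1_FIELDS = {"title", "outlinks"}                          # hypothetical v1 signals
V2_FIELDS = {"title", "outlinks", "language", "headings"}  # hypothetical v2 signals

@dataclass
class PageRecord:
    url: str
    algorithm_version: int
    data: dict = field(default_factory=dict)

def needs_recrawl(record: PageRecord, required_fields: set) -> bool:
    """A record is stale if it lacks any field the current algorithm needs."""
    return not required_fields.issubset(record.data.keys())

old = PageRecord("https://example.com/", 1, {"title": "Example", "outlinks": []})
print(needs_recrawl(old, V2_FIELDS))  # True: v2 needs fields v1 never collected
```

Under that assumption, the index only reflects the new algorithm as fast as the crawler can revisit pages, which would explain the weeks-long lag.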

Perhaps this is an argument for page caching.

In the past, page caching has been discussed as a way for Mojeek to present historical versions of web pages for the purposes of archiving or research.

But a page cache might accelerate algorithm changes without requiring a change in crawl speed, and would therefore become an important prerequisite technology rather than just a user-facing feature.
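A minimal sketch of that idea, assuming raw HTML is stored at crawl time (the cache layout and the extractor below are hypothetical): a new extraction algorithm could be re-run over the cached pages offline, so the index rebuild runs at disk and CPU speed rather than crawl speed.

```python
# Sketch of re-processing a page cache with a new extraction algorithm,
# assuming raw HTML was stored at crawl time (hypothetical setup).
def rebuild_index(page_cache, extract):
    """Re-run a (new) extraction function over cached HTML.

    page_cache: iterable of (url, raw_html) pairs captured at crawl time.
    extract:    hypothetical function mapping raw HTML to index fields.
    """
    for url, raw_html in page_cache:
        yield url, extract(raw_html)

def extract_v2(raw_html: str) -> dict:
    # Stand-in signals for whatever a new algorithm would actually compute.
    return {
        "length": len(raw_html),
        "has_title": "<title>" in raw_html.lower(),
    }

cache = [("https://example.com/", "<html><title>Example</title></html>")]
print(dict(rebuild_index(cache, extract_v2)))
```

Whether this matches how Mojeek’s pipeline actually works is exactly the open question raised below.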

I’m not sure if this is an accurate description of how Mojeek works or whether a page cache would serve that function.


The response to this is essentially that it is sometimes correct and sometimes not. Algorithmic changes can range from taking effect right then and there to requiring a full refresh of the index (and absolutely everything in between). It depends a lot on the nature of the change.
