Major Bug: Wrong Dates Make Time Operators Useless

Here’s a search I did with a since operator, but the results broke common sense:

Here are some of the search results, where at first glance, they seem to follow the time operator…

However, when opened, the sites clearly show that the listed dates are wrong, and this problem appears on almost every result. Looking closely, the dates shown are the last-crawled dates instead (e.g., the date 24 Jan 2024 on the 1st result is indeed 2 months ago). According to the blog about search operators:

The date a page has will be either when Mojeek has noticed that it was last modified or, if the page has never been found as modified, when Mojeek first crawled the page. The dates do not reflect when a page was first published, although in some cases these won’t be far apart.
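To make the quoted rule concrete, here is a minimal Python sketch of how I read it. The `Page` class, its field names, the `index_date` function, and the 2019 publication date are all hypothetical, made up for illustration; this is not Mojeek's actual schema or code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Page:
    # All field names are hypothetical, chosen only for illustration.
    first_crawled: date                        # when the crawler first fetched the page
    last_modified_seen: Optional[date] = None  # when a change was last noticed, if ever
    published: Optional[date] = None           # real publication date; the rule ignores it


def index_date(page: Page) -> date:
    """Return the date the quoted rule would attach to a page."""
    if page.last_modified_seen is not None:
        return page.last_modified_seen
    return page.first_crawled


# An article written in 2019 (made-up date) but first crawled on 24 Jan 2024.
old_article = Page(first_crawled=date(2024, 1, 24), published=date(2019, 5, 1))
print(index_date(old_article))  # 2024-01-24
```

Under this reading, the publication date never enters the calculation, which would explain why an old article can carry a recent date.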

Do you mean that if a page is never modified, Mojeek would use the first crawled date instead of when the page was actually published? I find that logic, to put it kindly, extremely bizarre. Who, except the SEO folks, would prefer the first crawled date over the actual date? So here are my questions for Mojeek:

  1. Is this a bug, or is this intentional?
  2. If intentional, for what purpose? For the benefit of the users, or limitations of the search engine?
  3. If it can’t be repaired immediately or at all, can you at least indicate it to avoid confusing users? Something like “We did not find any articles within the date you specified”, then a button if they want to use the first crawled date instead.

This is one of those things which is very easy for a human to pick out, but more difficult to automate and scale. It is the reason why the since/before functionality is on the operators page but not in a UI element.

This is correct; the date of a page is when it first entered the index.

This should improve as we crawl more. We’re aware of it and return every now and again to thinking about ways it could be improved, but it’s a case of where to allocate effort.

So:

  1. intentional
  2. limitations
  3. the dates are all going to be last crawled/modified, so there would be no two steps to this process; there is only one date to offer. The text in Operators says:

to restrict results to those that were last modified since then

which I guess could be better as

to restrict results to those that were last modified since then (in cases where a page has not been modified this will be the date we found the page)

to cut out some confusion.
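As a rough sketch of what that wording means in practice, the Python below filters on the single available date. The `since` function, the sample results, and the assumption that the cutoff is inclusive are all hypothetical illustrations, not the real implementation.

```python
from datetime import date

# (description, index date) pairs; the data is made up for illustration.
results = [
    ("article written in 2019, first crawled in 2024", date(2024, 1, 24)),
    ("page last seen modified in 2023", date(2023, 6, 1)),
]


def since(results, cutoff):
    """Keep results whose index date is on or after the cutoff (inclusive by assumption)."""
    return [title for title, indexed in results if indexed >= cutoff]


matches = since(results, date(2024, 1, 1))
print(matches)
```

Here the 2019 article passes a 2024 cutoff because the only date on record is its crawl date.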

How about even more honesty?

First, put “beta” on date operators. I don’t think “operators” is shorthand for “incomplete features”.

Second, instead of tucking it into the description, make it more prominent. Something like:

On Mojeek there are two ways in which you can narrow down the pages returned based upon dates. (Note: Currently, we can only check if the page has been modified AFTER the first crawled date, but we’re working on fixing this.)

I believe the proper word is “fix”, not “improve”, as it can produce wrong results, breaking basic assumptions about such a feature. Anything less would sound deceptive.

That said, I appreciate your admission of this problem, and would be willing to help test any possible solutions you come up with.