Why does Mojeek Give Me a HTTP 403?

Vivaldi has a feature request section where I’ve just posted. Interested parties can visit while logged into vivaldi.net and click on the thumbs up button to ‘vote’ for the request.

Many thanks @mike, very much appreciated. I’ve responded to some of the replies there with more details of our perspective.

Mojeek will also take you to a 403 - Forbidden page if your search contains Hiragana or Katakana:

https://www.mojeek.com/search?q=%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AF%E3%81%A9%E3%81%93
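
For anyone reproducing this, here is a minimal Python sketch (standard library only) that decodes the link above and reports which scripts the query contains. The helper name `script_of` is just for illustration. The decoded query turns out to mix kanji with Hiragana.

```python
import unicodedata
from urllib.parse import urlsplit, parse_qs

url = "https://www.mojeek.com/search?q=%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AF%E3%81%A9%E3%81%93"

# parse_qs percent-decodes the query string as UTF-8 by default.
query = parse_qs(urlsplit(url).query)["q"][0]


def script_of(ch: str) -> str:
    """Return the first word of the character's Unicode name, e.g. 'HIRAGANA'."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


# Pair every character of the query with its script label.
scripts = [(ch, script_of(ch)) for ch in query]
for ch, script in scripts:
    print(ch, script)
```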

Thanks for the feedback @gnome. I’ll pass it on to the search team. We don’t index Japanese (yet) so this is the likely explanation here.

This is no longer a 403, but Japanese isn’t indexed, so there are no results.

Searching Mojeek support for information about conversions gave a 403. I am using the default algorithm.

# Query
conversion site:mojeek.com inurl:support

# URL
https://www.mojeek.com/search?q=conversion+site%3Amojeek.com+inurl%3Asupport

# Decoded URL
https://www.mojeek.com/search?q=conversion+site:mojeek.com+inurl:support

# Result
403 - Forbidden
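
The Query, URL and Decoded URL lines in this report can be derived from one another. A small Python sketch using only the standard library, in case it helps others file similar reports:

```python
from urllib.parse import urlsplit, parse_qs, unquote

url = "https://www.mojeek.com/search?q=conversion+site%3Amojeek.com+inurl%3Asupport"

# unquote() only resolves %XX escapes; '+' is left alone,
# which matches the "Decoded URL" line in the report.
decoded_url = unquote(url)

# parse_qs() also turns '+' into a space, recovering the query as typed.
query = parse_qs(urlsplit(url).query)["q"][0]

print(decoded_url)
print(query)
```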

Hey @mike, thanks for pointing this one out; the 403 itself is now fixed, but it’s given us another issue to fix with inurl. Results that don’t have the word in the URL are being demoted rather than removed. It’s something we’re going to get fixed, but there are no timescales on it yet.

I have another 403 while searching for the “.git” directory.

# Query
site:git-scm.com inurl:book/ ".git" directory

# URL
https://www.mojeek.com/search?q=site%3Agit-scm.com+inurl%3Abook%2F+%22.git%22+directory

# Decoded URL
https://www.mojeek.com/search?q=site:git-scm.com+inurl:book/+".git"+directory

# Result
403 - Forbidden

That might have given this page as a result:

I got a 403 when testing quotes in Vivaldi.

# Query
"mojeek search engine"

# URL
https://www.mojeek.com/search?q=%22mojeek%20search%20engine%22

# Decoded URL
https://www.mojeek.com/search?q="mojeek search engine"

# Result
403 - Forbidden
Sorry your network appears to be sending automated queries so we can't process your search at this time.


Also, searching from Vivaldi appears to use + by default now.

Above, I used an old link that used %20

Thanks for this @Mike, it looks like the combination of quotes and %20 encoding is tripping our automated search blocker. We’ve thought about ways this could be mitigated, but if you’re right on the Vivaldi encoding change then this will be a big help. Vivaldi has been quite unique in this respect.
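
To make the two encodings concrete, here is a short Python sketch. It only shows how the same quoted query can be written with %20 or with +; it says nothing about how the blocker itself works.

```python
from urllib.parse import quote, quote_plus, unquote_plus

query = '"mojeek search engine"'
base = "https://www.mojeek.com/search?q="

# quote() writes spaces as %20 (the style of the old link).
percent20_url = base + quote(query, safe="")

# quote_plus() writes spaces as '+' (the style Vivaldi now appears to use).
plus_url = base + quote_plus(query)

print(percent20_url)
print(plus_url)

# Both forms decode back to the same query text.
decoded = {unquote_plus(u[len(base):]) for u in (percent20_url, plus_url)}
print(decoded)
```

Both URLs carry the same query, so the difference is purely in how the space is written on the wire.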

same error for me - search query was inurl:"index of"

also, a query of inurl:index of without quotes leads to “Did you mean: inurl: index of” ← notice the space after ‘:’

I believe double quotes in an operator always give a 403.

inurl:"index of" 

Here is an example of the same search which works when using an ‘all’ operator. Though, such a specific query might not return high quality results.

allinurl:index of

In case you are wondering, ‘all’ operators can be combined with site: which might give better results since site: disables clustering.

allinurl:index of site:stackoverflow.com

Also

inurl:index of

works because it does not use double quotes in the operator.

Here, the operator value is index. And I believe the algorithm considers of to be a search term.
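
The reading above can be sketched in Python. This is a guess at the behaviour for illustration, not Mojeek's actual parser; `OPERATORS`, `TOKEN` and `split_query` are hypothetical names, and the operator list only covers the ones mentioned in this thread. It does not model how the 'all' operators absorb the terms that follow them.

```python
from __future__ import annotations

import re

# Operators that appear in this thread; the real list on Mojeek may differ.
OPERATORS = ("allintext", "allinurl", "inurl", "site")

# An optional "operator:" prefix, followed by a quoted phrase or a bare word.
TOKEN = re.compile(r'(?:(%s):)?("[^"]*"|\S+)' % "|".join(OPERATORS))


def split_query(query: str) -> list[tuple[str | None, str]]:
    """Split a query into (operator, value) pairs; plain terms get operator None."""
    return [(m.group(1), m.group(2)) for m in TOKEN.finditer(query)]


print(split_query("inurl:index of"))
print(split_query('inurl:"index of"'))
print(split_query("inurl: index of"))
```

Under this sketch, `inurl:index of` yields the operator value index plus a separate term of, while the suggested `inurl: index of` no longer attaches any value to the operator.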

In the past, I would have said that you needed at least one search term when searching with an operator. But, after testing today, it seems you can use inurl: by itself.

inurl:index

Currently, Mojeek doesn’t do a good job of correcting queries with operators. So, when dealing with operators, I tend to ignore the suggestions.

The suggestion is controlled by the Settings > Search Results > Spellings checkbox. You can disable the Spelling preference to remove the suggestion (and other spelling suggestions).

https://www.mojeek.com/search?q=inurl%3Aindex+of&spell=0
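
If you build such links in a script, the same URL can be produced with Python's standard library. The `spell=0` parameter is taken from the link above; nothing else about the search URL is assumed.

```python
from urllib.parse import urlencode

# q is the query text; spell=0 is the parameter seen in the link above.
params = {"q": "inurl:index of", "spell": "0"}

# urlencode() escapes ':' as %3A and writes spaces as '+'.
url = "https://www.mojeek.com/search?" + urlencode(params)
print(url)
```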

In my personal correspondence, I originally raised the issue on February 13. But I don’t see a clear reply. I don’t remember getting an official response about the added space.

At the time, I attributed the added space to ‘autocorrect’:

Also,

* autocorrect will ask you to revise your search to include a space between allintext: and the terms;
* and it will incorrectly reverse the order: breaking the query.

```
allintext:incapable of site:merriam-webster.com
site:merriam-webster.com allintext: incapable of
```


Hopefully, you will get an official reply about the issues you raised.

Apologies that this one wasn’t picked up before; at the moment the combination of quotes and operators is not supported, and it’s a tough one for us as this sort of combination is normally a good sign of traffic being automated (this thread notwithstanding) so it’s quite a difficult thing to change.
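
Until that changes, a rough client-side check can flag queries that mix the two before they are sent. This is only a sketch under my own assumptions: the function name and the regex are hypothetical, it has nothing to do with the actual blocker, and it can misfire on quoted text that itself contains a colon.

```python
import re

# A token that starts with "word:" followed by a non-space, e.g. site:x or inurl:"x".
OPERATOR = re.compile(r'(?<!\S)\w+:\S')


def mixes_quotes_and_operators(query: str) -> bool:
    """Return True when a query contains both a double quote and an operator."""
    return '"' in query and OPERATOR.search(query) is not None


for q in ('inurl:"index of"',
          'site:git-scm.com inurl:book/ ".git" directory',
          'inurl:index of',
          '"mojeek search engine"'):
    print(q, mixes_quotes_and_operators(q))
```

It does not cover the other cases reported in this thread, such as a quoted phrase sent with %20 encoding.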

On the spellcheck adding a space, I’m raising an issue for that and we’ll look into it - thanks for the submission :pray:

might simply allowing such queries, but with tighter per-IP rate limiting, work?

Unfortunately this doesn’t fix the issue too well, as the kind of traffic this is built to prevent comes from a massive pool of IP addresses.
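
A toy Python illustration of the point, with made-up numbers: a per-IP limit stops one busy address but lets the same volume through when it is spread over many addresses. `LIMIT_PER_IP` and `count_allowed` are hypothetical and unrelated to any real blocker.

```python
from __future__ import annotations

from collections import Counter

LIMIT_PER_IP = 3  # hypothetical allowance within one time window


def count_allowed(requests_by_ip: list[str]) -> int:
    """Count how many requests pass a per-IP limit in a single window."""
    seen: Counter[str] = Counter()
    allowed = 0
    for ip in requests_by_ip:
        seen[ip] += 1
        if seen[ip] <= LIMIT_PER_IP:
            allowed += 1
    return allowed


# 30 requests from one address versus 30 requests from 30 addresses
# (documentation-only IP ranges).
single_ip = ["203.0.113.1"] * 30
ip_pool = [f"198.51.100.{i}" for i in range(30)]

print(count_allowed(single_ip))
print(count_allowed(ip_pool))
```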

maybe a dumb idea, but what about using some JS to see how long it takes to key in the search terms for queries like this which are abused and if the time is < x ms then redirect to a verification page with a (self-hosted) captcha???

No such thing to us :smiley:

We normally like to work without JavaScript as much as we possibly can, but also a lot of the searches in question come from browser plugins/search bars or similar, and so they don’t type anything into the search box before arriving with a query.

yes, of course - i wasn’t thinking - plus some bots/scripts probably don’t process JS

I got a 403 while searching for Asian carp. I clicked on the “Next” button and received the splash page for automated queries.

https://www.mojeek.com/search?q=great+lakes+asian+carp&s=375

Today, I’ve been submitting other queries in a private window so that I could give more targeted feedback on specific issues.

@mike thanks for reporting this. Also acknowledging receipt of “submit feedback” reports. This particular problem was a rare coincidence of factors, and should now be fixed.
