Why Does Mojeek Give Me an HTTP 403?

I’ve had this 403 thing happen too. It was from the address bar in Vivaldi: a three-word search. I think my query string was “buy linux laptop” at the time.

The bulk of these, if not all of them, will be automated query blocks, i.e. measures put in place to try to avoid letting through searches made by bots. Knowing where they fall down is very useful, though, so I’m going to get all of these checked out.

EDIT: I also forgot to address the point you raised, @mike:

Vivaldi will encode spaces as %20 instead of +

is an issue that has been raised with them in the past. It causes problems specifically for Vivaldi+Mojeek, as I believe Vivaldi is fairly unique in encoding this way, and it trips the aforementioned blocks. It’s something we’ve talked about revisiting quite a lot to see if we can improve here too.
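For anyone unfamiliar with the difference, here is a minimal sketch using Python’s standard `urllib.parse` module. It only illustrates the two encodings of the query from the first post; it is not how Vivaldi or Mojeek actually build or decode their URLs.

```python
from urllib.parse import quote, quote_plus, unquote_plus

query = "buy linux laptop"

# Form-style encoding, which most browsers use for search queries: spaces become "+".
plus_form = quote_plus(query)
# Plain percent-encoding, as described for Vivaldi: spaces become "%20".
pct_form = quote(query)

print(plus_form)  # buy+linux+laptop
print(pct_form)   # buy%20linux%20laptop

# Once decoded, both forms carry exactly the same query text.
assert unquote_plus(plus_form) == unquote_plus(pct_form) == query
```

Both are valid ways to send a space, which is why the difference only matters to something that looks at the raw, undecoded request.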


Regarding Vivaldi, please note, @mike and @brad, that we have raised the encoding issue with them more than once, including directly with their management team, as you can see here: Only browser which handles spaces in search queries differently | Vivaldi Forum
Feel free to add something to the Vivaldi forum!

Vivaldi has a feature request section where I’ve just posted. Interested parties can visit while logged into vivaldi.net and click on the thumbs up button to ‘vote’ for the request.


Many thanks, @mike, very much appreciated. There have been some replies, to which I’ve responded with more details of our perspective.

Mojeek will also take you to a 403 - Forbidden page if your search contains Hiragana or Katakana:

https://www.mojeek.com/search?q=%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AF%E3%81%A9%E3%81%93
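Decoding the query parameter in that URL shows which characters are involved. The sketch below uses Python’s `urllib.parse.unquote` (which decodes as UTF-8 by default) and a hypothetical helper, `kana_chars`, that simply checks the Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks. It says nothing about how Mojeek itself classifies queries.

```python
from urllib.parse import unquote

encoded = "%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AF%E3%81%A9%E3%81%93"
query = unquote(encoded)  # percent-decoded as UTF-8

def kana_chars(text: str) -> list:
    """Hypothetical helper: return characters in the Hiragana or Katakana blocks."""
    # U+3040..U+309F is Hiragana and U+30A0..U+30FF is Katakana, so one range covers both.
    return [ch for ch in text if 0x3040 <= ord(ch) <= 0x30FF]

print(query)              # 日本語はどこ
print(kana_chars(query))  # ['は', 'ど', 'こ']
```

The first three characters are kanji (CJK ideographs); the last three are Hiragana.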


Thanks for the feedback, @gnome. I’ll pass it on to the search team. We don’t index Japanese (yet), so this is the likely explanation here.


This is no longer a 403, but since no Japanese is indexed, there are no results.


Searching Mojeek support for information about conversions gave a 403. I am using the default algorithm.

# Query
conversion site:mojeek.com inurl:support

# URL
https://www.mojeek.com/search?q=conversion+site%3Amojeek.com+inurl%3Asupport

# Decoded URL
https://www.mojeek.com/search?q=conversion+site:mojeek.com+inurl:support

# Result
403 - Forbidden
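For reference, the “URL” and “Decoded URL” lines above are related by standard form encoding. A minimal sketch with Python’s `urllib.parse.urlencode` (which uses `quote_plus` by default) reproduces the URL from the query and parses it back:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

query = "conversion site:mojeek.com inurl:support"

# urlencode() form-encodes the value: spaces -> "+", ":" -> "%3A".
url = "https://www.mojeek.com/search?" + urlencode({"q": query})
print(url)
# https://www.mojeek.com/search?q=conversion+site%3Amojeek.com+inurl%3Asupport

# Round trip: parsing the query string recovers the original text.
params = parse_qs(urlsplit(url).query)
assert params["q"] == [query]
```

This only confirms that the URL is an ordinary encoding of the query; it does not explain why the request was blocked.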


Hey @mike, thanks for pointing this one out. The 403 itself is now fixed, but it has shown us another issue to fix with inurl: results that don’t have the word in the URL aren’t being removed, only demoted. It’s something we’re going to fix, but there’s no timescale on it yet.


I have another 403 while searching for the “.git” directory.

# Query
site:git-scm.com inurl:book/ ".git" directory

# URL
https://www.mojeek.com/search?q=site%3Agit-scm.com+inurl%3Abook%2F+%22.git%22+directory

# Decoded URL
https://www.mojeek.com/search?q=site:git-scm.com+inurl:book/+".git"+directory

# Result
403 - Forbidden

That might have given this page as a result:

I got a 403 when testing quotes in Vivaldi.

# Query
"mojeek search engine"

# URL
https://www.mojeek.com/search?q=%22mojeek%20search%20engine%22

# Decoded URL
https://www.mojeek.com/search?q="mojeek search engine"

# Result
403 - Forbidden
Sorry your network appears to be sending automated queries so we can't process your search at this time.


Also, searching from Vivaldi appears to use + by default now.

Above, I used an old link that encoded spaces as %20.

Thanks for this, @Mike. It looks like the combination of quotes and %20 encoding is tripping our automated search blocker. We’ve thought about ways this could be mitigated, but if you’re right about the Vivaldi encoding change, this will be a big help. Vivaldi has been quite unique in this respect.
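Mojeek hasn’t described how its blocker works, so the following is only a sketch of why the encoding can matter at all. Using Python’s `urllib.parse.parse_qs`, the old %20 link and a form-encoded (+) version of the same quoted search decode to identical text; the two requests differ only in their raw, undecoded query strings.

```python
from urllib.parse import parse_qs

# The raw query string from the old %20 link, and the same search form-encoded with "+".
vivaldi_style = "q=%22mojeek%20search%20engine%22"
form_style = "q=%22mojeek+search+engine%22"

decoded_a = parse_qs(vivaldi_style)["q"][0]
decoded_b = parse_qs(form_style)["q"][0]

print(decoded_a)  # "mojeek search engine"
assert decoded_a == decoded_b == '"mojeek search engine"'

# Only the raw strings differ, so any check applied before decoding
# can treat these two otherwise identical searches differently.
assert vivaldi_style != form_style
```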


Same error for me; the search query was inurl:"index of"

Also, the query inurl:index of without quotes leads to “Did you mean: inurl: index of” (notice the space after ‘:’).

I believe double quotes in an operator always gives a 403.

inurl:"index of" 

Here is an example of the same search that works when using an ‘all’ operator, though such a specific query might not return high-quality results.

allinurl:index of

In case you are wondering, ‘all’ operators can be combined with site:, which might give better results since site: disables clustering.

allinurl:index of site:stackoverflow.com

Also

inurl:index of

works because it does not use double quotes in the operator.

Here, the operator value is index, and I believe the algorithm treats of as a search term.
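To make the distinction concrete, here is a minimal tokenizer sketch. The pattern and the `tokenize` function are hypothetical and are not Mojeek’s actual query grammar; they simply split a query into operators (with a bare or quoted value), quoted phrases, and plain terms, following the reading described above.

```python
import re
from typing import List, Tuple

# Hypothetical grammar, tried left to right at each position.
TOKEN = re.compile(
    r'(\w+):"([^"]*)"'  # operator with a quoted value, e.g. inurl:"index of"
    r'|(\w+):(\S+)'     # operator with a bare value, e.g. inurl:index
    r'|"([^"]*)"'       # quoted phrase, e.g. ".git"
    r'|(\S+)'           # anything else is a plain search term
)

def tokenize(query: str) -> List[Tuple[str, str, str]]:
    """Split a query into (kind, operator, value) tuples."""
    tokens = []
    for m in TOKEN.finditer(query):
        if m.group(1):
            tokens.append(("operator", m.group(1), m.group(2)))
        elif m.group(3):
            tokens.append(("operator", m.group(3), m.group(4)))
        elif m.group(5) is not None:
            tokens.append(("phrase", "", m.group(5)))
        else:
            tokens.append(("term", "", m.group(6)))
    return tokens

print(tokenize("inurl:index of"))
# [('operator', 'inurl', 'index'), ('term', '', 'of')]
print(tokenize('inurl:"index of"'))
# [('operator', 'inurl', 'index of')]
```

Under this sketch, the spelling suggestion’s extra space (“inurl: index of”) turns inurl: into an ordinary term, because nothing follows the colon.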

In the past, I would have said that you needed at least one search term when searching with an operator. But, after testing today, it seems you can use inurl: by itself.

inurl:index

Currently, Mojeek doesn’t do a good job of correcting queries with operators. So, when dealing with operators, I tend to ignore the suggestions.

The suggestion is controlled by the Settings > Search Results > Spellings checkbox. You can disable the Spelling preference to remove the suggestion (and other spelling suggestions).

https://www.mojeek.com/search?q=inurl%3Aindex+of&spell=0
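If you build such links by hand, Python’s `urllib.parse.urlencode` produces the same URL, with the `spell=0` parameter taken from the link above:

```python
from urllib.parse import urlencode

# spell=0 is the parameter used in the link above to turn off spelling suggestions.
params = {"q": "inurl:index of", "spell": "0"}
url = "https://www.mojeek.com/search?" + urlencode(params)
print(url)  # https://www.mojeek.com/search?q=inurl%3Aindex+of&spell=0
```

Dictionaries keep insertion order in Python 3.7+, so the parameters appear in the order given.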

I originally raised the issue in personal correspondence on February 13, but I don’t see a clear reply, and I don’t remember getting an official response about the added space.

At the time, I attributed the added space to ‘autocorrect’:

Also,

* autocorrect will ask you to revise your search to include a space between allintext: and the terms;
* and it will incorrectly reverse the order, breaking the query.

```
allintext:incapable of site:merriam-webster.com
site:merriam-webster.com allintext: incapable of
```


Hopefully, you will get an official reply about the issues you raised.

Apologies that this one wasn’t picked up before. At the moment, the combination of quotes and operators is not supported, and it’s a tough one for us, as this sort of combination is normally a good sign of automated traffic (this thread notwithstanding), so it’s quite a difficult thing to change.

On the spellcheck adding a space, I’m raising an issue for that and we’ll look into it - thanks for the submission :pray:


Might simply allowing such queries, but with tighter rate limiting/IP checks, work?

Unfortunately, this doesn’t fix the issue very well, as the automated traffic this is built to prevent comes from a massive pool of IP addresses.
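A small sketch shows why a per-IP limit struggles here. `PerIPRateLimiter` is a hypothetical sliding-window limiter (the limit of 5 requests per 60 seconds and the documentation-range IP addresses are arbitrary assumptions), not anything Mojeek runs. It caps each address individually, so the same volume of queries spread across many addresses passes untouched.

```python
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Hypothetical sliding-window limiter: at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        times = self.history[ip]
        # Forget requests that have fallen out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.limit:
            return False
        times.append(now)
        return True

limiter = PerIPRateLimiter(limit=5, window=60.0)

# 100 queries at once from a single address: only the first 5 are allowed.
single = sum(limiter.allow("203.0.113.7", now=0.0) for _ in range(100))

# The same 100 queries spread over a pool of 100 addresses: every one is allowed.
pool = [f"198.51.100.{i}" for i in range(100)]
spread = sum(limiter.allow(ip, now=0.0) for ip in pool)

print(single, spread)  # 5 100
```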

Maybe a dumb idea, but what about using some JS to measure how long it takes to key in the search terms for the kinds of queries that are abused, and, if the time is under x ms, redirecting to a verification page with a (self-hosted) CAPTCHA?

No such thing to us :smiley:

We normally prefer to work without JavaScript as much as we possibly can. Also, a lot of the searches in question come from browser plugins, search bars or similar, so nothing is typed into the search box before the query arrives.