Why Does Mojeek Give Me an HTTP 403?

In certain cases, searching on Mojeek can result in an HTTP 403 “Forbidden” message.

Operator with Double Quotes

One way to generate a 403 is to use double quotes with a Mojeek search operator:

intitle:"javascript switch" site:w3schools.com

That will produce the URL:

https://www.mojeek.com/search?q=intitle%3A%22javascript+switch%22+site%3Aw3schools.com

If all you have is the URL, you can decode it with urldecoder.org.

Spaces in the original search are replaced with + in the URL, so the decoded URL still shows + where the spaces were:

https://www.mojeek.com/search?q=intitle:"javascript+switch"+site:w3schools.com
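To reproduce both forms yourself, here is a minimal Python sketch using only the standard library's urllib.parse. It assumes Mojeek's search box encodes the q parameter the same way urlencode() does by default (spaces as +, other reserved characters percent-encoded); the output matches the URL above, but that is an observation, not a description of Mojeek's code. unquote() stands in for urldecoder.org here because it also leaves + signs alone.

```python
from urllib.parse import urlencode, unquote

BASE = "https://www.mojeek.com/search?"

query = 'intitle:"javascript switch" site:w3schools.com'

# urlencode() uses quote_plus() by default: spaces become "+",
# while ":" and '"' are percent-encoded as %3A and %22.
encoded_url = BASE + urlencode({"q": query})
print(encoded_url)

# unquote() decodes %XX escapes but does not turn "+" back into a space,
# which is why the decoded URL still shows "+" between the words.
print(unquote(encoded_url))
```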

One way to execute this search without error is to use an operator for each term:

# Query
intitle:javascript intitle:switch site:w3schools.com

# URL
https://www.mojeek.com/search?q=intitle%3Ajavascript+intitle%3Aswitch+site%3Aw3schools.com

# Decoded URL
https://www.mojeek.com/search?q=intitle:javascript+intitle:switch+site:w3schools.com

That gives W3Schools as its first result:

# Result
JavaScript Switch Statement
https://www.w3schools.com/js/js_switch.asp
... JS Arrays JS Array Methods JS Array Sort JS Array Iteration JS Array
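If you hit this often, the rewrite can be automated. Below is a hypothetical Python helper (split_quoted_operators() is an illustrative name, not a Mojeek feature) that turns op:"a b" into op:a op:b. One trade-off to keep in mind: presumably intitle:javascript intitle:switch only requires both words somewhere in the title, so it is looser than the exact phrase the quotes asked for.

```python
import re

# Matches an operator immediately followed by a double-quoted phrase,
# e.g. intitle:"javascript switch". Assumes operator names are word characters.
QUOTED_OPERATOR = re.compile(r'(\w+):"([^"]*)"')


def split_quoted_operators(query: str) -> str:
    """Rewrite op:"a b" as op:a op:b (hypothetical helper)."""

    def expand(match):
        operator, phrase = match.group(1), match.group(2)
        # Repeat the operator once for every word inside the quotes.
        return " ".join(f"{operator}:{word}" for word in phrase.split())

    return QUOTED_OPERATOR.sub(expand, query)


print(split_quoted_operators('intitle:"javascript switch" site:w3schools.com'))
# intitle:javascript intitle:switch site:w3schools.com
```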

Lone Operator

However, be careful not to use an operator by itself.

# Query
inanchor:mojeek

# URL
https://www.mojeek.com/search?q=inanchor%3Amojeek

# Decoded URL
https://www.mojeek.com/search?q=inanchor:mojeek

# Result
403 - Forbidden

Most operators will give a 403 when used alone.

site: used alone gives no results.

Lone Operator Table

Operator     | Result
------------ | ---------
inanchor:    | HTTP 403
intext:      | HTTP 403
intitle:     | HTTP 403
inurl:       | HTTP 403
allinanchor: | HTTP 403
allintext:   | HTTP 403
allintitle:  | HTTP 403
allinurl:    | HTTP 403
since:       | HTTP 403
before:      | HTTP 403
site:        | No result
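The table can be turned into a quick pre-flight check before sending a query. This is a hypothetical Python helper: lone_operator_result() and the dictionary are illustrative names, and the recorded results are only the observations in the table above, which may not hold as Mojeek's behaviour changes. "Lone" is taken to mean a query made of a single operator token with no other words.

```python
import re
from typing import Optional

# Observed results when each operator is used on its own (from the table above).
LONE_OPERATOR_RESULTS = {
    "inanchor": "HTTP 403", "intext": "HTTP 403",
    "intitle": "HTTP 403", "inurl": "HTTP 403",
    "allinanchor": "HTTP 403", "allintext": "HTTP 403",
    "allintitle": "HTTP 403", "allinurl": "HTTP 403",
    "since": "HTTP 403", "before": "HTTP 403",
    "site": "No result",
}

# A token of the form name:value, e.g. inanchor:mojeek or site:w3schools.com.
OPERATOR_TOKEN = re.compile(r"(\w+):\S*")


def lone_operator_result(query: str) -> Optional[str]:
    """Return the recorded result if the query is a single operator token."""
    tokens = query.split()
    if len(tokens) != 1:
        return None  # more than one token, so not a lone operator
    match = OPERATOR_TOKEN.fullmatch(tokens[0])
    if match is None:
        return None  # a plain search term, not an operator
    return LONE_OPERATOR_RESULTS.get(match.group(1).lower())


print(lone_operator_result("inanchor:mojeek"))  # HTTP 403
```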

Vivaldi Search

Currently, Vivaldi has some quirks with respect to Mojeek.

There are four ways to invoke a search in Vivaldi:

One way that Vivaldi differs from Mojeek.com is that Vivaldi encodes spaces as %20 instead of +.
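The difference comes down to two standard ways of percent-encoding a query. This Python sketch does not show Vivaldi's own code; it only reproduces the two encodings with urllib.parse, where quote() gives the %20 form and quote_plus() gives the + form.

```python
from urllib.parse import quote, quote_plus

BASE = "https://www.mojeek.com/search?q="
query = "upside-down house near Johannesburg"

# quote() encodes a space as %20 (the form Vivaldi was sending).
# safe="" makes it encode "/" too, matching quote_plus()'s default.
print(BASE + quote(query, safe=""))

# quote_plus() encodes a space as + (the form Mojeek.com's own search produces).
print(BASE + quote_plus(query))
```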

For simple searches, %20 works:

upside-down house near Johannesburg

https://www.mojeek.com/search?q=upside-down%20house%20near%20Johannesburg

https://www.mojeek.com/search?q=upside-down house near Johannesburg

South Africa's 'upside down' house attracts tourists
https://nypost.com/.../03/10/south-africas-upside-down-house-attrac...
HARTEBEESTPOORT, South Africa — With its roof on the ground and its floor 

Vivaldi with Double Quotes

However, including double quotes in the search will produce a 403:

"upside-down house" near Johannesburg

https://www.mojeek.com/search?q=%22upside-down%20house%22%20near%20Johannesburg

https://www.mojeek.com/search?q="upside-down house" near Johannesburg

403 - Forbidden

Vivaldi with Search Operators

Using a search operator will also give a 403:

double quote site:en.wikipedia.org

https://www.mojeek.com/search?q=double%20quote%20site%3Aen.wikipedia.org

https://www.mojeek.com/search?q=double quote site:en.wikipedia.org

403 - Forbidden
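If you build Mojeek URLs in a script, one way to avoid the %20 form is to re-encode the query string. A minimal Python sketch; plus_encode_query() is a hypothetical helper, and it only changes the encoding. It makes no claim about which queries Mojeek's filters will accept.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def plus_encode_query(url: str) -> str:
    """Re-encode a URL's query string so spaces become + instead of %20.

    parse_qsl() decodes both %20 and + to a space, and urlencode()
    writes spaces back out as +.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(parts._replace(query=urlencode(params)))


print(plus_encode_query(
    "https://www.mojeek.com/search?q=double%20quote%20site%3Aen.wikipedia.org"
))
# https://www.mojeek.com/search?q=double+quote+site%3Aen.wikipedia.org
```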


If you find any other examples that give an HTTP 403, please reply here or start a new post.

References

HTTP 403 | Wikipedia

Search Operators | Mojeek

URL Decoder

Quick Commands Video | Vivaldi

Quick Commands Help | Vivaldi

Upside-down House | BBC News


I’ve had this 403 thing happen too. It was from the address bar in Vivaldi: a three-word search. I think my query string was “buy linux laptop” at the time.

Most, if not all, of these will be automated query blocks, i.e. measures put in place to stop searches run by bots from getting through. Finding where they fall down is very useful, though, so I’m going to get all of these checked out.

EDIT: I also forgot to mention the thing you pointed out, @mike:

Vivaldi will encode spaces as %20 instead of +

is an issue that we have raised with them in the past. It causes problems when it comes to Vivaldi+Mojeek, and I believe Vivaldi is fairly unique in that respect; it causes us an issue with the aforementioned blocks. It’s something we’ve talked about revisiting a fair amount, to see if we can improve things here too.


Regarding Vivaldi, please note, @mike and @brad, that we have raised the encoding issue with them more than once, including directly with their management team, as you can see here: Only browser which handles spaces in search queries differently | Vivaldi Forum
Feel free to add something to the Vivaldi forum!

Vivaldi has a feature request section where I’ve just posted. Interested parties can visit while logged into vivaldi.net and click the thumbs-up button to ‘vote’ for the request.


Many thanks, @mike, very much appreciated. There have been some replies, and I’ve responded to them with more detail on our perspective.

Mojeek will also take you to a 403 - Forbidden page if your search contains Hiragana or Katakana:

https://www.mojeek.com/search?q=%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AF%E3%81%A9%E3%81%93
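Decoding that URL shows the query is 日本語はどこ, which mixes Kanji with Hiragana. A minimal Python sketch for spotting kana in a decoded query, using the standard Unicode block ranges (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF); kana_characters() is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

URL = "https://www.mojeek.com/search?q=%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AF%E3%81%A9%E3%81%93"

# parse_qs() percent-decodes the value as UTF-8 by default.
query = parse_qs(urlsplit(URL).query)["q"][0]

# Unicode blocks: Hiragana U+3040-U+309F, Katakana U+30A0-U+30FF.
HIRAGANA = range(0x3040, 0x30A0)
KATAKANA = range(0x30A0, 0x3100)


def kana_characters(text: str) -> list:
    """Return the Hiragana and Katakana characters found in text, in order."""
    return [ch for ch in text if ord(ch) in HIRAGANA or ord(ch) in KATAKANA]


print(query)                   # 日本語はどこ
print(kana_characters(query))  # ['は', 'ど', 'こ']
```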


Thanks for the feedback, @gnome. I’ll pass it on to the search team. We don’t index Japanese (yet), so this is the likely explanation here.


This no longer gives a 403, but no Japanese is indexed, so there are no results.


Searching Mojeek support for information about conversions gave a 403. I am using the default algorithm.

# Query
conversion site:mojeek.com inurl:support

# URL
https://www.mojeek.com/search?q=conversion+site%3Amojeek.com+inurl%3Asupport

# Decoded URL
https://www.mojeek.com/search?q=conversion+site:mojeek.com+inurl:support

# Result
403 - Forbidden


Hey @mike, thanks for pointing this one out; the 403 itself is now fixed, but it has revealed another issue with inurl:. Here, results that don’t have the word in the URL are being demoted rather than removed. It’s something we’re going to fix, but there’s no timescale on it yet.


I got another 403 while searching for the “.git” directory.

# Query
site:git-scm.com inurl:book/ ".git" directory

# URL
https://www.mojeek.com/search?q=site%3Agit-scm.com+inurl%3Abook%2F+%22.git%22+directory

# Decoded URL
https://www.mojeek.com/search?q=site:git-scm.com+inurl:book/+".git"+directory

# Result
403 - Forbidden

That might have given this page as a result:

I got a 403 when testing quotes in Vivaldi.

# Query
"mojeek search engine"

# URL
https://www.mojeek.com/search?q=%22mojeek%20search%20engine%22

# Decoded URL
https://www.mojeek.com/search?q="mojeek search engine"

# Result
403 - Forbidden
Sorry your network appears to be sending automated queries so we can't process your search at this time.


Also, searching from Vivaldi appears to use + by default now.

Above, I used an old link that used %20.

Thanks for this, @Mike; it looks like the combination of quotes and %20 encoding is tripping our automated search blocker. We’ve thought about ways this could be mitigated, but if you’re right about the Vivaldi encoding change, then this will be a big help. Vivaldi has been quite unique in this respect.


Same error for me. The search query was inurl:"index of"

Also, a query of inurl:index of without quotes leads to “Did you mean: inurl: index of” (notice the space after ‘:’).

I believe double quotes in an operator always gives a 403.

inurl:"index of" 

Here is an example of the same search that works when using an ‘all’ operator, though such a specific query might not return high-quality results.

allinurl:index of

In case you are wondering, ‘all’ operators can be combined with site:, which might give better results since site: disables clustering.

allinurl:index of site:stackoverflow.com

Also

inurl:index of

works because it does not use double quotes in the operator.

Here, the operator value is index. And I believe the algorithm considers of to be a search term.
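That reading can be expressed as a tiny tokenizer. This is a hypothetical Python sketch (parse_query() is an illustrative name), assuming only the word attached to the colon is the operator value and every other word is a plain search term. It does not model the ‘all’ operators, which presumably apply to the words that follow them.

```python
def parse_query(query: str):
    """Split a query into (operator, value) pairs and plain search terms."""
    operators = []
    terms = []
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and name.isalpha():
            # e.g. "inurl:index" -> ("inurl", "index")
            operators.append((name, value))
        else:
            terms.append(token)
    return operators, terms


print(parse_query("inurl:index of"))  # ([('inurl', 'index')], ['of'])
```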

In the past, I would have said that you needed at least one search term when searching with an operator. But, after testing today, it seems you can use inurl: by itself.

inurl:index

Currently, Mojeek doesn’t do a good job of correcting queries with operators. So, when dealing with operators, I tend to ignore the suggestions.

The suggestion is controlled by the Settings > Search Results > Spellings checkbox. You can disable the Spelling preference to remove the suggestion (and other spelling suggestions).

https://www.mojeek.com/search?q=inurl%3Aindex+of&spell=0
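If you link to searches from a script, the same parameter can be added when building the URL. A minimal Python sketch; mojeek_search_url() is a hypothetical helper, and the only assumption about Mojeek is the spell=0 parameter shown in the URL above.

```python
from urllib.parse import urlencode


def mojeek_search_url(query: str, spell: bool = True) -> str:
    """Build a Mojeek search URL, optionally turning off spelling suggestions."""
    params = {"q": query}
    if not spell:
        params["spell"] = "0"  # as in the URL above
    # urlencode() percent-encodes ":" and writes spaces as "+".
    return "https://www.mojeek.com/search?" + urlencode(params)


print(mojeek_search_url("inurl:index of", spell=False))
# https://www.mojeek.com/search?q=inurl%3Aindex+of&spell=0
```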

In personal correspondence, I originally raised the issue on February 13, but I don’t see a clear reply, and I don’t remember getting an official response about the added space.

At the time, I attributed the added space to ‘autocorrect’:

Also,

* autocorrect will ask you to revise your search to include a space between allintext: and the terms;
* and it will incorrectly reverse the order, breaking the query.

```
allintext:incapable of site:merriam-webster.com
site:merriam-webster.com allintext: incapable of

```


Hopefully, you will get an official reply about the issues you raised.

Apologies that this one wasn’t picked up before. At the moment, the combination of quotes and operators is not supported. It’s a tough one for us, as this sort of combination is normally a good sign of automated traffic (this thread notwithstanding), so it’s quite a difficult thing to change.
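Until that changes, a rough client-side check for the unsupported combination could look like this. It is a hypothetical Python heuristic (mixes_quotes_and_operators() is an illustrative name) built from the reports in this thread, not from Mojeek's actual rules. It treats any letters-then-colon token as an operator and will not catch blocks caused by %20 encoding alone.

```python
import re

# Heuristic: an operator is a run of letters followed by a colon at the
# start of a whitespace-separated token (site:, inurl:, intitle:, ...).
OPERATOR = re.compile(r"(?:^|\s)[a-z]+:", re.IGNORECASE)


def mixes_quotes_and_operators(query: str) -> bool:
    """Return True if the query contains both a double quote and an operator."""
    return '"' in query and OPERATOR.search(query) is not None


for q in [
    'intitle:"javascript switch" site:w3schools.com',
    'site:git-scm.com inurl:book/ ".git" directory',
    'inurl:"index of"',
    "intitle:javascript intitle:switch site:w3schools.com",
    '"upside-down house" near Johannesburg',
]:
    print(mixes_quotes_and_operators(q), q)
```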

On the spellcheck adding a space, I’m raising an issue for that and we’ll look into it. Thanks for the submission!


Might simply allowing such queries, but with tighter rate limiting per IP, work?

Unfortunately, this doesn’t fix the issue very well, as the kind of traffic this is built to prevent comes from a massive pool of IP addresses.

Maybe a dumb idea, but what about using some JS to measure how long it takes to key in the search terms for the kinds of queries that get abused, and if the time is under x ms, redirecting to a verification page with a (self-hosted) CAPTCHA?