It seems like I always get a “403 - Forbidden” error whenever my search query includes “powered by”. I tried a few combinations and got the error every time the query included those words in that sequence. I don’t get the error when the query includes “by powered” or whenever there’s another word between “powered” and “by”.
Did I bump into an error, or is this a ‘me’ issue?
seems a bit odd that a single query would result in "your network appears to be sending automated queries" - i think that determination should be made only after an ip sends multiple queries in rapid succession, however i think Josh mentioned that the problem is battling bot-nets
i am curious though as to why ‘powered by’ would be problematic
The powered by is a part of bot blocking as mentioned, as it was historically a very common term for looking for vulnerabilities. That being said, it’s not necessarily going to still be useful, so we’ll have a look at that and see if there’s still a good reason for having it, or a way of changing this.
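For anyone curious how a phrase rule could produce the behaviour described above (blocked for “powered by”, fine for “by powered” or with a word in between), here is a minimal sketch. It is only a guess at the general technique, not Mojeek's actual implementation: the `BLOCKED_PHRASES` list and the `is_blocked` function are hypothetical names, and “powered by” is the only phrase confirmed in this thread.

```python
import re

# Hypothetical blocklist; "powered by" is the only phrase mentioned in the thread.
BLOCKED_PHRASES = ["powered by"]


def normalize(query: str) -> str:
    """Lowercase the query and collapse runs of whitespace to single spaces."""
    return " ".join(query.lower().split())


def is_blocked(query: str) -> bool:
    """Return True if a blocked phrase appears as consecutive whole words."""
    normalized = normalize(query)
    for phrase in BLOCKED_PHRASES:
        # \b anchors keep "empowered by" from matching "powered by".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, normalized):
            return True
    return False


if __name__ == "__main__":
    for q in ["sites powered by wordpress", "by powered", "powered entirely by solar"]:
        print(q, "->", "403" if is_blocked(q) else "ok")
```

Because the match is on the exact word sequence, reordering the words or putting another word between them avoids the rule, which lines up with what was reported.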
I was just wondering: Would there be perhaps another way to handle these kinds of dilemmas?
So now you get blocked whenever you enter a search query that includes a term on the bot blocking list. You instantly get the 403. I’ve heard that captchas aren’t actually useful for blocking bots these days. I’m no expert on this myself, but isn’t there a way you could present a user some sort of “challenge”, something captcha-like (perhaps a honeypot mechanism) that actually works, so that human users can prove that they’re human and not a bot? This way, humans can still use queries that include keywords that are on the bot blocklist, but bots can’t.
I’d like to get some opinions on this and learn a bit more about bot blocking.
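On the honeypot idea, a rough sketch of how the server side usually looks: the search form carries an extra field that is hidden from human visitors, and any submission that fills it in is treated as automated. The field name `website` and the function `looks_like_bot` are made up for illustration, and this only catches naive bots that fill in every field they find.

```python
# Hypothetical name of a form field that is hidden from human visitors via CSS.
HONEYPOT_FIELD = "website"


def looks_like_bot(form: dict) -> bool:
    """Treat a submission as automated if the hidden honeypot field was filled in."""
    value = form.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


if __name__ == "__main__":
    human = {"q": "powered by", "website": ""}
    bot = {"q": "powered by", "website": "http://spam.example"}
    print(looks_like_bot(human), looks_like_bot(bot))
```

The appeal is that it needs no Javascript and no third-party service; the weakness is that a bot written specifically for the site can simply leave the field empty.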
Cloudflare’s Turnstile is part of the discussion. And I’m guessing mainstream solutions like this will draw the most resources and developer attention.
I often get 403 pages when I include too much of an error message. It isn’t too much work to include less of it, though.
Almost every bot-blocking method requires Javascript. One method that doesn’t is IP Address blocking. You could go pretty extreme with this and block everyone with a VPN, or just build up a blacklist of known bots (or use someone else’s list). This method can be pretty effective, if blunt. Mojeek probably already uses a hybrid of this and search term blocking.
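To make the IP blocklist idea concrete, here is a small sketch using Python's standard `ipaddress` module. The ranges in `BLOCKED_NETWORKS` are placeholders taken from the address blocks reserved for documentation, not real bot ranges, and `ip_is_blocked` is a hypothetical helper.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), standing in for a real blocklist.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def ip_is_blocked(addr: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # Membership tests across IPv4/IPv6 simply evaluate to False.
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for candidate in ["192.0.2.55", "198.51.100.1", "2001:db8::1"]:
        print(candidate, "->", "blocked" if ip_is_blocked(candidate) else "allowed")
```

A check like this runs entirely on the server, which is why it works without Javascript; the bluntness comes from the fact that everyone sharing a listed range is blocked together.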
I use Mojeek with Javascript nowadays, so blocking methods using first-party scripts don’t bother me much. However, I don’t want to enable third-party scripts to use Mojeek.
blocking VPN IPs in an age where more and more people are becoming concerned with privacy and depending on VPNs would be a really poor approach IMO - blocking IPs, VPN or otherwise, is not a very effective way to deter bots, one reason being bot nets where traffic is coming from many thousands of IPs
rate limiting similar queries, regardless of IP, may provide some effective relief, but i’m sure they’ve considered that long ago
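A minimal sketch of what rate limiting similar queries regardless of IP could look like, assuming "similar" means "same set of words in any order" (that definition, the class name `QueryRateLimiter` and its default limits are all assumptions for illustration). It keeps a sliding window of timestamps per query signature rather than per client.

```python
from collections import defaultdict, deque


class QueryRateLimiter:
    """Sliding-window limiter keyed by query signature instead of client IP."""

    def __init__(self, max_hits: int = 3, window_seconds: float = 10.0):
        self.max_hits = max_hits
        self.window = window_seconds
        self._hits = defaultdict(deque)  # signature -> timestamps of allowed hits

    @staticmethod
    def signature(query: str) -> str:
        """Assumed notion of similarity: same words, ignoring case and order."""
        return " ".join(sorted(set(query.lower().split())))

    def allow(self, query: str, now: float) -> bool:
        """Return True if this query may run at time `now` (in seconds)."""
        hits = self._hits[self.signature(query)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_hits:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = QueryRateLimiter(max_hits=2, window_seconds=10.0)
    for t, q in enumerate(["powered by foo", "foo powered by", "by foo powered"]):
        print(t, q, limiter.allow(q, now=float(t)))
```

The obvious trade-off is that popular legitimate queries would hit the limit too, so the thresholds would need to be generous or combined with other signals.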
i don’t recall what Mojeek’s stance is on JS, but i can’t imagine anyone would have a problem with it as long as it’s a) not provided by a 3rd party server or library, b) the source code is published and c) it’s kept to a bare minimum
bhphotovideo.com has a very simple method to combat bots that doesn’t rely on text, math or images - it’s a simple button that has to be clicked and held until a progress bar reaches 100% - it takes a few seconds to complete and is far less annoying than google’s stupid crap
in the age of so-called AI though, i’d wonder if there’s any effective deterrent to the problem
blocking VPN IPs in an age where more and more people are becoming concerned with privacy and depending on VPNs would be a really poor approach IMO
I agree with this. It doesn’t make sense given Mojeek’s values. I make use of VPNs myself (though more often Tor). I only meant to provide an example of extreme IP address blocking rather than recommending it; I think Startpage blocks VPNs?
in the age of so-called AI though, i’d wonder if there’s any effective deterrent to the problem
Making more effective spambots is certainly something LLMs seem to be good at.
not by default, but yeah, i suppose they block ‘bad’ IPs and, unfortunately, many of them are VPN IPs
i know eTools does the same but they provide a simple captcha to unblock (eTools (meta) is my go-to until Mojeek further expands their index, and it queries Mojeek, among others)
It’s something we’re consistently revisiting. I came back to this after spotting it on the weekend to find a fair few detailed explanations of options here, cheers.
One issue with the captcha stuff, if we were going that route, is that we’d need it to align with our approach to other things (privacy at the core etc.), and a lot of ways of doing this would introduce a dependency, something we attempt to avoid wherever possible.
Basically just that our offering should work with it disabled, it should not be a thing which is required to use Mojeek. There have been a few people who have specifically come to us, or in one case added us as default in their browser because of this (scroll down to privacy browser for dodging the preamble).
Captcha: IMHO it’s a bad way to go for a search engine. Back when I was testing Qwant I started getting reCAPTCHAs because they thought I was a bot. When I’m really testing a search engine or looking for info on a topic I know little about, I always start broad and then start narrowing. I’m not a fast typist but Qwant didn’t like it.
After this happened several times I terminated my long-term test and have not used that search engine since.
I really like how Mojeek works perfectly fine without Javascript enabled. It’s a rare trait for a search engine.
I despise CAPTCHAs. Part of it is because it feels like I’m doing more to train some robot to identify which part of the image contains a motorbike (or a bus, or a crosswalk) than proving I’m a human. Not to mention my consistent 60% failure rate. I’m apparently very bad at proving my humanity. Microsoft’s Captchas are somehow worse than Google’s. They demand crazy stuff like getting the right answer 15 times in a row. I am thoroughly sick of Captchas.
As much as I dislike Cloudflare’s challenge walls, at least I don’t need to do anything.