Do CDNs like Cloudflare block Mojeek and other Search Engines?

I happened to be reading the Gigablast blog via the Internet Archive.

I found this article:

The Danger of Content Distribution Networks
Sep 21, 2019

A lot of power in the hands of a few. It’s a recurring theme on the Internet. Companies like Cloudflare and Akamai, both CDNs, have blocked or impeded smaller search engines from spidering the content on the millions of websites for which they control the security and network logistics. Under the guise of protection their policies default to prohibiting smaller search engines from legitimately downloading and indexing this content.

It’s also quite interesting to note that Cloudflare is heavily funded by both Google and Bing, and also Baidu. So once you get the dominant search engines actively interfering with smaller search engines’ spidering of the content from millions of websites, it doesn’t really get any more anti-competitive. This is something I would not expect anyone except a handful of people to realize, as there are not that many people spidering website content for their search engine as I am, but, nonetheless, it is a critical piece of the entire anti-competitive racket orchestrated by the data barons.

This could be fixed with a Bot Bill of Rights that required all bots in the same category be treated equally. No favoritism.

Is this still an issue? Fanfiction.net is particularly infamous for hiding behind “the great Cloudflare wall”. From a cursory search, Mojeek seems to have indexed a good deal of fanfiction.net stories: https://www.mojeek.com/search?q=site%3Afanfiction.net+my+immortal

I don’t care for the site because I need to enable a lot of JavaScript to get past the Cloudflare challenge. I’m surprised Mojeek is able to index the site properly!

I can’t think of another site like this at the moment, but if it’s annoying to me, I have to believe it’s many times more annoying for search engines other than Google and Bing.

I’m not sure about outright access, but I believe Mojeek keeps its crawl rate low to avoid triggering any blocks.

The only site I know of that doesn’t work is Home Depot, which geo-blocks UK IP addresses.

It’s a really good thing that Matt’s posts are preserved, not least because of things like this:

For Cloudflare specifically, we are on their list of verified bots, so we don’t have this problem with them, but yes, we have had issues elsewhere. When I asked other team members, one they pointed out was Amazon CloudFront.

Yeah, much like other blocks we’ve encountered, it does cause some extra work on our side or make things not operate how you’d like them to.

MojeekBot is built to behave well and be a good bot in every way possible. Obeying robots.txt, and yes, not doing things that would make people think MojeekBot isn’t behaving that way, is of paramount importance.
