I happened to be reading the Gigablast blog retrieved from the Internet Archive.
I found this article:
The Danger of Content Distribution Networks
Sep 21, 2019
A lot of power in the hands of a few. It’s a recurring theme on the Internet. Companies like Cloudflare and Akamai, both CDNs, have blocked or impeded smaller search engines from spidering the content on the millions of websites for which they control the security and network logistics. Under the guise of protection their policies default to prohibiting smaller search engines from legitimately downloading and indexing this content.
It’s also quite interesting to note that Cloudflare is heavily funded by both Google and Bing, and also Baidu. So once you get the dominant search engines actively interfering with smaller search engines’ spidering of the content from millions of websites, it doesn’t really get any more anti-competitive. This is something I would not expect anyone except a handful of people to realize, as there are not that many people spidering website content for their search engine as I am, but, nonetheless, it is a critical piece of the entire anti-competitive racket orchestrated by the data barons.
This could be fixed with a Bot Bill of Rights that required all bots in the same category be treated equally. No favoritism.
Is this still an issue? Fanfiction.net is particularly infamous for hiding behind “the great Cloudflare wall”. From a cursory search, Mojeek seems to have indexed a good deal of fanfiction.net stories: https://www.mojeek.com/search?q=site%3Afanfiction.net+my+immortal
I don’t care for the site because I need to enable a lot of JavaScript to get beyond the Cloudflare challenge. I’m surprised Mojeek is able to index the site properly!
I can’t think of another site like this at the moment, but if it’s annoying to me, I have to believe it’s many times more annoying for search engines other than Google and Bing.