Crawling Mastodon

According to what I’ve found on the internet, Mastodon servers should be indexable unless the admin has set them otherwise.

Besides that, every user also has the option to opt out of search engine indexing.


In my experience, Google and Bing have indexed Mastodon server info as well as toots, but for searches not related to Mastodon these results rank low, if they show up at all.

Mojeek doesn’t seem to have crawled any Mastodon servers; I can’t find Mastodon servers or toots using Mojeek.

Is this done consciously/by design?


Mojeek already seems to be indexing another Fediverse participant, Lemmy (see the Mojeek search for: site:lemmy.ml linux).


OK… so… after trying again, I do now find Mastodon servers and toots.

Before, I was trying to find the Mojeek profile and @Josh with the ‘site’ operator. I also tried to find some specific toots I had seen by using keywords from them. I’m not sure why I didn’t find anything. Maybe the keywords I used still aren’t indexed, or perhaps I’m just terrible at search :man_shrugging:

I just tried the search term “linux” combined with the site operator for mstdn.social and for mastodon.social. Searching “Tilvids” with the mstdn.social site operator gave me 2 results. I guess I now have to find toots and tweets that are indexed and cover the same subject/keywords to see how they rank in Mojeek results.

For users and instances that state they should not be indexed, provided they use standard methods such as robots.txt, we will follow what they ask.
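The thread doesn’t say how Mojeek’s crawler checks these signals. As a generic sketch only, a Python crawler could consult robots.txt with the standard library’s urllib.robotparser and also look for a robots “noindex” meta tag in each page. The robots.txt content, the ExampleBot user agent and the example.social URLs are hypothetical, and the assumption that a per-user opt-out shows up as a robots meta tag is ours, not something stated in this thread.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might publish; real instances may differ.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def page_allows_indexing(html: str) -> bool:
    """Return False if the page carries a robots 'noindex' directive (assumed opt-out signal)."""
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" not in meta.directives


if __name__ == "__main__":
    print(allowed_by_robots(EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.social/@alice"))
    print(allowed_by_robots(EXAMPLE_ROBOTS_TXT, "ExampleBot", "https://example.social/private/x"))
    print(page_allows_indexing('<meta name="robots" content="noindex, noarchive">'))
```

A page would only be indexed when both checks pass; robots.txt governs whether a URL is fetched at all, while the meta tag governs whether a fetched page may be indexed.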

As for not finding many results for searches such as “site:mastodon.social mastodon” or the same for mstdn.social: we have methods in place to make sure we don’t spend lots of resources crawling a massive chunk of a user-generated content site in one go.
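Mojeek hasn’t described how these limits work. One common politeness approach, sketched here purely as an illustration, is a per-host scheduler that enforces a minimum delay between fetches to the same host and caps fetches per host within a sliding time window. The PoliteScheduler class and its min_delay, max_per_window and window parameters are hypothetical; the clock is passed in explicitly so the behaviour is easy to follow.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Optional
from urllib.parse import urlsplit


class PoliteScheduler:
    """Hypothetical per-host scheduler that spreads a large site's URLs over time.

    min_delay: seconds required between two fetches from the same host.
    max_per_window / window: cap on fetches per host within a sliding window (seconds).
    """

    def __init__(self, min_delay: float, max_per_window: int, window: float) -> None:
        self.min_delay = min_delay
        self.max_per_window = max_per_window
        self.window = window
        self.queues = defaultdict(deque)  # host -> pending URLs
        self.recent = defaultdict(deque)  # host -> timestamps of recent fetches

    def add(self, url: str) -> None:
        host = urlsplit(url).hostname or ""
        self.queues[host].append(url)

    def next_url(self, now: float) -> Optional[str]:
        """Return a URL that may be fetched at time `now`, or None if every host must wait."""
        for host, queue in self.queues.items():
            if not queue:
                continue
            history = self.recent[host]
            # Forget fetches that have slid out of the window.
            while history and now - history[0] >= self.window:
                history.popleft()
            if history and now - history[-1] < self.min_delay:
                continue  # too soon after the last fetch from this host
            if len(history) >= self.max_per_window:
                continue  # this host's budget for the current window is used up
            history.append(now)
            return queue.popleft()
        return None
```

With a scheme like this, a very large instance is still crawled and recrawled, just a bounded number of pages at a time, so its results accumulate in the index gradually rather than all at once.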

Looking at both of those big general instances, we are discovering links and crawling/recrawling; it is just being done at a measured pace. You’re for sure not “terrible at search” :smile:
