According to what I’ve found on the internet, Mastodon servers should be indexable unless the admin has set otherwise.
Besides that, every user also has the option to opt out of search engine indexing.
In my experience, Google and Bing have indexed Mastodon server info as well as toots, but these don’t rank high in search results, if they show up at all (for non-Mastodon-related searches).
Mojeek doesn’t seem to have crawled any Mastodon server. I can’t find Mastodon servers or toots using Mojeek.
Ok… so… after trying again, I now do find Mastodon servers and toots.
Before, I was trying to find the Mojeek profile and @Josh with the ‘site’ operator. I also tried to find some specific toots I had seen by using keywords from those toots. Not sure what I did wrong to not find anything. Maybe I just used keywords that aren’t indexed yet, or perhaps I’m just terrible at search.
I just tried the “linux” search term in combination with site operators for mstdn.social and mastodon.social. Tilvids in combo with the mstdn.social site operator gave me 2 results. I guess I now have to find toots and tweets that are indexed and have the same subject/keywords to see how they rank in Mojeek results.
As for the users/instances that state there should be no indexing: provided they’re using standard methods such as robots.txt, we will follow what they ask.
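For anyone curious what following robots.txt looks like in practice, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` and `OtherBot` names and the `example.social` host are all made up for illustration; this is not Mojeek’s crawler code, and it doesn’t cover per-user opt-outs that are signalled in other ways.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary instance (not taken from a real server).
ROBOTS_TXT = """\
User-agent: *
Disallow: /media_proxy/
Disallow: /interact/

User-agent: ExampleBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    profile = "https://example.social/@someone"
    # ExampleBot is blocked from the whole site; other bots only from two paths.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", profile))
    print(is_allowed(ROBOTS_TXT, "OtherBot", profile))
```

A crawler that respects these rules would skip the profile page when identifying as `ExampleBot`, and would skip only the two disallowed paths otherwise.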
As for not finding a large quantity of results through searches such as “site:mastodon.social mastodon” or the same for mstdn.social: we have methods in place to make sure we don’t use lots of resource just crawling a massive chunk of a user-generated content site in one go.
Looking at both those big general instances, we are discovering links and crawling/recrawling; it’s just being done at a measured pace. You’re for sure not “terrible at search”.
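Purely as an illustration of what “a measured pace” can mean, here is a small sketch that caps how many URLs from any one host go into a single crawl batch and defers the rest to a later pass. The function name `select_batch` and the cap value are assumptions made up for this example; it says nothing about how Mojeek’s crawler actually schedules its work.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def select_batch(urls, per_host_cap):
    """Split urls into (batch, deferred), taking at most per_host_cap per host."""
    taken = defaultdict(int)  # how many URLs each host already has in the batch
    batch, deferred = [], []
    for url in urls:
        host = urlsplit(url).hostname
        if taken[host] < per_host_cap:
            taken[host] += 1
            batch.append(url)
        else:
            # Over the cap for this host: leave it for a later crawl pass.
            deferred.append(url)
    return batch, deferred


if __name__ == "__main__":
    discovered = [
        "https://mastodon.social/@a/1",
        "https://mastodon.social/@a/2",
        "https://mastodon.social/@a/3",
        "https://mstdn.social/@b/1",
    ]
    batch, deferred = select_batch(discovered, per_host_cap=2)
    print(batch)
    print(deferred)
```

With a cap like this, a huge instance gets covered gradually over many passes rather than in one go, which fits with only seeing a modest number of results for a `site:` search early on.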
Coming back to this: wouldn’t it be an idea to have a fediverse search feature? Something similar to the Substack one, but for Mastodon and maybe other fediverse software.
Sure thing! I try to use this sometimes. I even throw in some other servers and platforms. Unfortunately, my Focus… (not sure how to write or say that in plural) are getting deleted because cookies get removed from my browser from time to time.
Having this as a Focus is also not super scalable, and with the size of the Fediverse, I thought that a Substack-like approach would be more convenient for users.
On the other hand, I can see that there might be other types of searches like these that someone would like to have a solution for (I even have some myself)… So perhaps it’s not feasible to create a different solution for every one of these requests. And now I’m thinking about some sort of more advanced “advanced search”, or more advanced search operators or Focus settings.
I’ll leave the decision as to what to do with this up to others. I just wanted to pour my brains out (if that’s an expression in English).
That’s one of the great things about the open source repo: if a Focus goes in there and gets folded into Mojeek, it’ll stay as an option even if you clear cookies.
I think this would depend on how much of it we have indexed at the moment; it’s going to be a case of which instances we’ve got coverage of. Of course, a fair few will block bots as standard. Putting the top 25 (or more if we push the cap further) instances by user numbers into a Focus could be an interesting starting point.
Obviously this would have to be dealt with sensitively, as the issue of fedi search and people’s feelings about it comes up cyclically.
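As a rough sketch of the “top instances by user numbers” idea, the snippet below ranks instances by user count and builds one `site:` query per instance, in the same form as the searches quoted earlier in the thread. The function names and the user counts are placeholders invented for this example (not real statistics), and it doesn’t reflect how a Focus is actually defined in the repo.

```python
def top_instances(user_counts, limit=25):
    """Return up to `limit` hostnames, ordered by user count (highest first)."""
    ranked = sorted(user_counts.items(), key=lambda item: item[1], reverse=True)
    return [host for host, _ in ranked[:limit]]


def site_queries(term, hosts):
    """Build one 'site:<host> <term>' query string per host."""
    return [f"site:{host} {term}" for host in hosts]


# Placeholder numbers for illustration only, NOT real user counts.
EXAMPLE_COUNTS = {
    "mstdn.social": 2,
    "example.social": 1,
    "mastodon.social": 3,
}

if __name__ == "__main__":
    hosts = top_instances(EXAMPLE_COUNTS, limit=2)
    for query in site_queries("linux", hosts):
        print(query)
```

Instances that block bots would simply return nothing for their query, so the list would still need checking against what’s actually indexed.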