Major players (Google, Bing, Yandex) offer documentation to help webmasters and authors. I’d be interested in seeing some for Mojeek.
A lot of advice seems to revolve around optimizing rankings, but this has the toxic side-effect of polluting SERPs with SEO spam. I’m more interested in advice concerning compatibility. Some things I’d like to know:
How does Mojeek handle JavaScript? Right now, it looks like sites that require JS only have <noscript> content show up in the snippets. I personally dislike JS so I don’t have a problem with this, but I don’t speak for other users who probably feel differently.
Are there any extensions to the robots.txt and the Sitemap protocol that Mojeek uses?
What markups/schemas does Mojeek support besides semantic HTML/XHTML and the Sitemap XML format? (e.g. RSS, Atom, HTML Sitemap, etc).
Part of my reason for asking: I’m writing a follow-up to my search-engines reference containing advice for authors to ensure compatibility with various agents, including but not limited to a diverse array of search engine crawlers/parsers. It’ll have information concerning the structured data formats with the widest support, the implications of using JS, etc.
I’ll dig into this for you @Seirdy, though it definitely is something which has resonance outside of this reference, so probably useful for us to either put out via a blog or publish somewhere. Will get back when I have something more fleshed out.
Hopefully these answers help; our intention is to put them into some kind of knowledge base soon:
Q. How does Mojeek handle JavaScript? Right now, it looks like sites that require JS only have <noscript> content show up in the snippets. I personally dislike JS so I don’t have a problem with this, but I don’t speak for other users who probably feel differently.
A. Mojeek currently indexes the contents of <noscript>. We do not index JS rendered content but have had initial discussions about the potential for doing it.
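For authors who want a rough idea of what a crawler that never runs JavaScript can read, here is a minimal sketch using Python's standard `html.parser`. It is not Mojeek's indexer; the class name `NoJsTextExtractor`, the helper `visible_text_without_js`, and the sample page are all made up for illustration. It simply drops `<script>` and `<style>` bodies and keeps everything else, including `<noscript>` content.

```python
from html.parser import HTMLParser


class NoJsTextExtractor(HTMLParser):
    """Collect the text a parser that never executes JavaScript can see.

    Hypothetical helper for illustration, not Mojeek's actual code.
    """

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text_without_js(html):
    parser = NoJsTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Sample page (an assumption): the main content is injected by JS,
# with a static fallback inside <noscript>.
PAGE = """
<html><body>
<div id="app"></div>
<script>document.getElementById("app").textContent = "Rendered by JS";</script>
<noscript><p>Static fallback summary of the page.</p></noscript>
</body></html>
"""

print(visible_text_without_js(PAGE))
```

On a page like this, only the `<noscript>` fallback survives, which matches the snippets described in the question. Putting a meaningful summary there (or, better, serving the content as plain HTML) is the safest option for crawlers that do not render JS.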
Q. How does Mojeek select favicons?
A. We do not fetch favicons or associate them with hosts/pages.
Q. Does Mojeek respect rel="canonical" tags?
A. Mojeek respects canonical elements.
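To check that a page actually exposes a canonical element in its static HTML (where a non-JS crawler can find it), something like the following sketch may help. The names `CanonicalFinder` and `find_canonical` and the example URL are assumptions for illustration; how Mojeek itself extracts the element is not described here.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated, case-insensitive token list.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# Hypothetical page used only for this example.
PAGE = """
<html><head>
<title>A post</title>
<link rel="canonical" href="https://example.org/post">
</head><body><p>Body text.</p></body></html>
"""

print(find_canonical(PAGE))
```

If this returns `None` for a page that sets its canonical URL from JavaScript, the element is not visible to crawlers that skip JS rendering.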
Q. Are there any extensions to the robots.txt and the Sitemap protocol that Mojeek uses?
A. Mojeekbot behaves identically to Googlebot: wildcards are supported, as are allow and disallow rules in general. We do not look at or crawl sitemaps, and we do not observe crawl-delay (Mojeekbot has a grace period of 4 seconds between requests).
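Python's built-in `urllib.robotparser` does not implement wildcard patterns, so it is not a good way to test rules written for Googlebot-style crawlers. Below is a rough sketch of wildcard matching as Google documents it for Googlebot: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and `allow` wins a tie. That Mojeekbot applies exactly the same precedence is an assumption based on the answer above; `rule_to_regex`, `is_allowed`, and the sample rules are hypothetical.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """rules: list of ("allow" | "disallow", pattern) pairs for one user agent."""
    best_len, best_allow = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if rule_to_regex(pattern).match(path):
            allow = kind == "allow"
            # Longest pattern wins; on equal length, allow beats disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# Sample rules (assumptions), as they might appear under one User-agent group.
RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/public-*.html$"),
    ("disallow", "/*.pdf$"),
]

for path in ["/private/notes.html", "/private/public-faq.html", "/docs/guide.pdf"]:
    print(path, is_allowed(path, RULES))
```

Since sitemaps are not read and crawl-delay is not observed, allow/disallow rules like these are the main lever an author has in robots.txt.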
Q. What markups/schemas does Mojeek support besides semantic HTML/XHTML and the Sitemap XML format? (e.g. RSS, Atom, HTML Sitemap, etc).
A. Organic search only considers X/HTML documents for now.