Ways to receive updates from a site

Say Mojeek discovered a website a while ago and ranks it favorably. How does Mojeek “notice” new content on that site, and does it plan to add new options?

Some that come to mind:

  • No special logic. Notice new pages the same way as always, by following referring links
  • Polling the sitemap
  • Polling individual already-known pages
  • Polling feeds (RSS, Atom, JSON feeds)
  • IndexNow API (maybe it doesn’t have to participate in the initiative but can still use the same API)
  • WebSub (Open W3C protocol specifically for the purpose of pushing updates! Google uses this)
  • Manual submission (discussed already; source of spam)
  • Data sharing with a partner
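
Sitemap polling is probably the easiest of these to picture. Here is a minimal sketch, assuming the crawler remembers when it last fetched a site and the site publishes a standard sitemaps.org `<urlset>` with `<lastmod>` dates. The function names are hypothetical and this is not how Mojeek (or anyone in particular) does it:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value: str) -> datetime:
    """Parse a <lastmod> value given as YYYY-MM-DD or a full ISO 8601 timestamp.

    Coarser W3C Datetime forms (year or year-month only) are not handled here.
    """
    value = value.strip()
    if len(value) == 10:  # date only, e.g. 2024-03-02
        return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Older Pythons' fromisoformat does not accept a trailing "Z", so rewrite it.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def changed_since(sitemap_xml: str, since: datetime) -> List[str]:
    """Return sitemap URLs whose <lastmod> is newer than `since`.

    URLs with no <lastmod> are returned too: the sitemap gives no evidence
    either way, so the crawler would have to fetch them to find out.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc is None:
            continue
        if lastmod is None or parse_lastmod(lastmod) > since:
            changed.append(loc.strip())
    return changed


EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old</loc><lastmod>2023-01-15</lastmod></url>
  <url><loc>https://example.com/new</loc><lastmod>2024-03-02T09:30:00Z</lastmod></url>
  <url><loc>https://example.com/undated</loc></url>
</urlset>"""

last_crawl = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(changed_since(EXAMPLE, last_crawl))
# ['https://example.com/new', 'https://example.com/undated']
```

The catch is that `<lastmod>` is only as trustworthy as the site that writes it, so a crawler would likely still re-fetch some pages on its own schedule.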

I’m planning on running a personal WebSub publisher, so the thought just crossed my mind again.
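
For context on the publisher side: a WebSub publisher advertises its hub(s) and the canonical topic URL with HTTP `Link` headers (`rel="hub"` and `rel="self"`), then tells the hub when the topic changes. The spec leaves that notification step up to each hub. The `hub.mode=publish` / `hub.url` form below is a common convention, but it is an assumption here, so check your hub's documentation. The helper names and URLs are hypothetical:

```python
from typing import List, Tuple
from urllib.parse import urlencode


def discovery_headers(topic_url: str, hub_urls: List[str]) -> List[Tuple[str, str]]:
    """Link headers served with the topic (e.g. a feed) so subscribers can find the hub."""
    headers = [("Link", f'<{hub}>; rel="hub"') for hub in hub_urls]
    # One rel="self" link naming the canonical topic URL subscribers should use.
    headers.append(("Link", f'<{topic_url}>; rel="self"'))
    return headers


def publish_ping_body(topic_url: str) -> str:
    """Form-encoded body for notifying a hub that the topic changed.

    Assumption: the hub accepts hub.mode=publish with hub.url; WebSub itself
    does not define this publisher-to-hub step.
    """
    return urlencode({"hub.mode": "publish", "hub.url": topic_url})


for name, value in discovery_headers("https://example.com/feed.xml", ["https://hub.example.net/"]):
    print(f"{name}: {value}")
# Link: <https://hub.example.net/>; rel="hub"
# Link: <https://example.com/feed.xml>; rel="self"

print(publish_ping_body("https://example.com/feed.xml"))
# hub.mode=publish&hub.url=https%3A%2F%2Fexample.com%2Ffeed.xml
```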

On your points: :+1: if we use it; :-1: if we don’t use it now; comments follow each mark.

  • No special logic. Notice new pages the same way as always, by following referring links: :+1:
  • Polling the sitemap: :-1:
  • Polling individual already-known pages: :+1:, we refresh/re-crawl pages and so find new links
  • Polling feeds (RSS, Atom, JSON feeds): :-1:
  • IndexNow API (maybe it doesn’t have to participate in the initiative but can still use the same API): :-1:, but we are well aware of it and keeping an eye on it :thinking:
  • WebSub (Open W3C protocol specifically for the purpose of pushing updates! Google uses this): :-1:. Personally, I had to look this up, only to find I’m not so much out of touch as out of date. I was looking at and working with PubSubHubbub and RSS Cloud at a previous gig, back when the battle was raging between them. I didn’t know PubSubHubbub got renamed to WebSub in 2017. Thanks for the reminder :grinning:
  • Manual submission (discussed already; source of spam): :-1:
  • Data sharing with a partner: :-1:
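
The two :+1: items boil down to the same loop: re-fetch a known page, pull out its links, and queue any that have not been seen before. A minimal standard-library sketch of the diffing step follows. It assumes an in-memory set of known URLs and only looks at `<a href>` links. It is an illustration, not Mojeek's crawler, and the names are hypothetical:

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect <a href> targets, resolved to absolute URLs against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop #fragments (same page, different anchor).
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def new_links(page_url: str, html: str, known: Set[str]) -> List[str]:
    """Return http(s) links on a re-crawled page that are not already known, in page order."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    parser.close()
    found: List[str] = []
    for link in parser.links:
        if link.startswith(("http://", "https://")) and link not in known and link not in found:
            found.append(link)
    return found


page = (
    '<a href="/blog/new-post">New</a> '
    '<a href="/about#team">About</a> '
    '<a href="mailto:x@example.com">Mail</a>'
)
known = {"https://example.com/about"}
print(new_links("https://example.com/blog/", page, known))
# ['https://example.com/blog/new-post']
```

How quickly this notices new content depends entirely on how often each known page is re-crawled. That is the gap the push-style options (IndexNow, WebSub) are meant to close.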

We have no specific plans to change what we are doing for now, but that’s mostly because of other priorities. All of these have merits, so it’s more a question of which and when than if. Thanks for sharing.