Search engines index web content and monetize it through subscriptions, ads and/or APIs. In doing so they also send traffic back to websites via hyperlinks, in line with the copyright principle of fair use. AI companies, by contrast, are now harvesting web content and compressing it into models while returning little traffic or attribution.
Various proposals have been made to address this threat to the web, but none of them unambiguously empowers creators and publishers. A “robots.txt”-style approach alone is limited, because most owners want their content to be searchable but not necessarily used for AI.
We are therefore proposing a “NoML” protocol that lets owners state clearly and explicitly whether they want their content to be searchable and, separately, whether they want it used for AI.
We have put together an open letter with the full details of this proposal and published it below. It has been co-signed by two other organisations. We will announce the proposal on Thursday this week, so we would appreciate any public sharing of it then. In the meantime, we invite you to become signatories to the open letter here: https://noml.info/