NoML Proposal for Fair Use of Content in AI and Search

Colin · 31 October 2023 10:54

Search engines index web content and monetize it with subscriptions, ads and/or APIs. In doing so they also send traffic to websites via hyperlinks, in line with the copyright principle of fair use. Now we have AI companies harvesting web content and compressing it into models, with little traffic or attribution returned to the sources.

Various proposals have been made to address this threat to the web, but they do not unambiguously empower creators and publishers. A “robots.txt” style approach alone has limitations, as most publishers want their content searchable but not necessarily used for AI.

We are thus proposing a “NoML” protocol to allow owners to clearly and explicitly express whether they want their content searchable or not, and if they want content used for AI or not.
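To illustrate the idea of two independent choices (searchable or not, usable for AI or not), here is a minimal Python sketch of how a crawler might read such a preference. It assumes a `noml` token carried in a comma-separated directive string next to the familiar `noindex`; that syntax, along with the names `ContentPermissions` and `parse_directives`, is hypothetical and only for illustration. The open letter has the actual details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentPermissions:
    allow_search: bool  # may the page be indexed for search?
    allow_ml: bool      # may the page be used for AI / machine learning?


def parse_directives(value: str) -> ContentPermissions:
    """Parse a comma-separated directive string such as "index, noml".

    Assumption: "noml" sits alongside the existing "noindex" token.
    Anything not explicitly refused is treated as allowed.
    """
    tokens = {t.strip().lower() for t in value.split(",") if t.strip()}
    return ContentPermissions(
        allow_search="noindex" not in tokens,
        allow_ml="noml" not in tokens,
    )


# A publisher who wants to be searchable but not used for AI:
print(parse_directives("index, noml"))
```

The point of keeping the two flags separate is that refusing AI use does not also remove the page from search.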

We have put together an open letter with the full details of this proposal and published it at the link below. It has been co-signed by two other organisations. We will announce this proposal on Thursday this week, so we would appreciate any public sharing of it then. In the meantime we invite you to become signatories to the open letter here: https://noml.info/

mike · 1 November 2023 04:46

This seems like a simple and effective modification to the existing crawler infrastructure. And this, unfortunately, is necessary as the AI companies like to trample copyrights.

Josh · 2 November 2023 13:24

And now we’re at “then”: if you’d like to share it, that’s much appreciated, as is reposting any of the Twitter, Facebook, LinkedIn or Mastodon posts if those are spaces where you choose to spend time.

ricardo81 · 2 November 2023 13:38

Shared on lemmy https://lemmy.ml/post/7384600

itsMe · 2 November 2023 14:27

this is all a symptom of a much larger problem – call it capitalism if you like, where childish monetary systems such as we have breed greed and greed breeds corruption

while i’m certainly not against such a proposal, i’m not seeing where it would be very effective, since the best one can hope for is voluntary compliance, and even that can fail when whoever indexes a site doesn’t include the noml request and that site is then indexed by whoever else

in short, there is no way to stop indexing and learning by or for AI; if the site is reachable, it can and will be indexed

i would also posit that copyright is the least of the problems regarding what is in store via AI

that said, good luck with the proposal and thanks for your ethical intention

ricardo81 · 2 November 2023 15:01

This has definitely been a problem in Google since time immemorial for people who chose to not be indexed, wrt duplicate content at the very least.

I’d always wondered about the potential for what I guess would have to be a centralised service that could discern unique content, since so much of the web nowadays is recycled. Ironically, with AI they may be able to see past the basic mutations of text made to make it apparently unique.

All the same, Google, Bing etc at least up until GPT4 have honoured robots.txt/elements in one way or another wrt web pages. I guess at least having a standard lets us see whether the big players continue to honour it or not. Interesting times for where the line is.