User Privacy and Large Language Models

Colin · 1 March 2026 16:32

An interesting paper, which I found with Mojeek: User Privacy and Large Language Models - Mojeek Search

From the abstract:
We find that all six developers appear to employ their users’ chat data to train and improve their models by default, and that some retain this data indefinitely. Developers may collect and train on personal information disclosed in chats, including sensitive information such as biometric and health data, as well as files uploaded by users. Four of the six companies we examined appear to include children’s chat data for model training, as well as customer data from other products.

No surprise to readers here, but pass it on.