TESTING: Mojeek Summary

I was doing research with the experimental Mojeek Summary feature and noticed a few problems that I want to share.

The issues all have one thing in common: a human cannot verify the information that the Summary provides.

Tag Feeds

One problem I noticed was that the Summary referenced tag feeds. The obvious problem is that information in a feed cannot be verified on a permanent basis because the page constantly changes. And the feed will only ever present generalized summaries of the articles, which is not as good as drawing from the articles directly.

example.com/tags/some-topic

Deleted Sources

Another problem was that the Mojeek Summary can reference deleted pages. I’m assuming this comes from latency either in the Mojeek index itself or between the index and the RAG API. Again, if the page being summarized is deleted, I can never verify the information in the Summary.

Paywall Peeking

The Summary can refer to information behind a paywall. While there is nothing inherently wrong with a paywall, there is a perverse incentive here. The crawler can peek at paywalled content and summarize it, but a person would have to pay to verify that information. Most people would just depend on the free summary in that case.

Requires JavaScript

I also noticed a relatively minor issue, but one which is likely to affect Mojeek users disproportionately. One of the referenced URLs relied on JavaScript to properly redirect the web browser to the article. While most people could reach the article in that circumstance, some Mojeek users have JavaScript turned off, and they might not be able to easily access the underlying source because of that preference.


I’m sure I’m getting some of the technical details wrong here. But I think my conclusions are valid.

These circumstances create claims which will show up in the Mojeek Summary but which can be difficult or impossible to verify.
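To make the four cases concrete, here is a rough sketch of the kind of check I have in mind before a page gets cited. I have no idea how Mojeek's pipeline actually works, so everything here is hypothetical: the Source record, its field names, and the list of "feed" path segments are my own assumptions, not anything from Mojeek.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Source:
    """Hypothetical record for one page a summary wants to cite."""
    url: str
    http_status: int = 200          # assumed: status seen on the latest re-check
    paywalled: bool = False         # assumed: flag set somewhere upstream
    needs_javascript: bool = False  # assumed: flag set somewhere upstream


# Assumed path segments that usually mark a listing page rather than an article.
FEED_SEGMENTS = {"tag", "tags", "category", "categories"}


def verifiability_concerns(source: Source) -> list[str]:
    """Return the reasons a reader might be unable to verify this source."""
    concerns = []
    # urlparse only separates host from path when a scheme is present.
    url = source.url if "://" in source.url else "https://" + source.url
    segments = [s for s in urlparse(url).path.lower().split("/") if s]
    if any(s in FEED_SEGMENTS for s in segments):
        concerns.append("tag feed")
    if source.http_status in (404, 410):
        concerns.append("deleted")
    if source.paywalled:
        concerns.append("paywalled")
    if source.needs_javascript:
        concerns.append("requires JavaScript")
    return concerns
```

A source with an empty list would be one a reader can open and check for themselves. What to do with the others (skip them, or just label them in the Summary) is a separate decision.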

While these large language model problems might not be specific to Mojeek, formally launching a retrieval-augmented generation feature would mean that Mojeek is endorsing and adopting these problems as its own. At that point, Mojeek would be reserving some fraction of its resources for dealing with RAG issues, and it might not be able to work on more innovative features.
