TESTING: Mojeek Summary

Josh · 14 March 2024 12:04

Hello @all, we’ve got something new to share for testing, currently via a cookie. I’ll explain the functionality first, then explain how to try it out.

What this is

This functionality is a first version of an AI search summary. It uses your input query to retrieve Mojeek search results as normal, and then creates a summary from those results. This uses an LLM (Large Language Model) to do what is known as RAG (Retrieval-Augmented Generation).

This can be triggered by clicking the “Summarise these results” button, which will appear just above the infobox (if there is one).

Clicking it will activate the feature and give you a summary of the top results.

How to test it out:

Much like with Focus, you can access this functionality by setting a cookie named FFID with the value B9FX3DK3 on path /.

Browser-specific methods are in the Focus How To Set Up guide; just change the value to B9FX3DK3.

After that, the option should appear when you refresh the page, and you should be good to go.
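For reference, the cookie is just three fields: a name (FFID), a value (B9FX3DK3) and a path (/). As a minimal sketch, Python’s standard http.cookies module shows how those fields look as HTTP headers. This is illustrative only: in a browser you would enter the same name, value and path through the developer tools as described in the Focus guide, and leaving out a domain and expiry is an assumption of the sketch.

```python
from http.cookies import SimpleCookie

# The three fields given in the post: name FFID, value B9FX3DK3, path /.
# No domain or expiry is set here (an assumption for this sketch).
cookie = SimpleCookie()
cookie["FFID"] = "B9FX3DK3"
cookie["FFID"]["path"] = "/"

# Header a server would send to set the cookie.
set_cookie_line = cookie.output()
# Header the browser sends back on later requests (name=value only).
request_header = cookie.output(attrs=[], header="Cookie:")

print(set_cookie_line)  # Set-Cookie: FFID=B9FX3DK3; Path=/
print(request_header)   # Cookie: FFID=B9FX3DK3
```

Because the cookie carries a single FFID value, it selects one experiment at a time.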

Display Options

You can also change the display in Preferences, toward the bottom of the Search Results tab.

Here you can choose not to show the Summary, or to show it on the right, above the results, or below the results.

As this is an early version, there are some bugs we’re working on, mostly around formatting. If you see any, please do report them. Beyond that, thoughts, questions, or anything else are much appreciated.

Videonas · 14 March 2024 19:12

I have a few thoughts at the moment:

Since Mojeek is privacy-focused, I’m assuming this AI is too and runs on Mojeek hardware. Perhaps you could elaborate on this? It might make a good blog post when you release this feature to the general public.
I haven’t yet decided whether this feature truly helps. On the one hand, I can see the arguments for how it could help. On the other hand… Mojeek by default includes only 1 result per domain in the search results. If a user doesn’t change this, the AI ends up summarising many different domains and websites as if they were all part of one theme or subject. For example, I searched for “CPA”, which can mean two different things, and both got mixed together in the summary.


One could also argue that search results are, by definition, a summary of all matches for a given query. In that light, if we need an AI summary of the search results, doesn’t that just mean the meta descriptions/SERPs are no longer doing what they were originally intended to do?

That being said, I am curious where this goes and whether/how it could be useful. I’m not yet convinced, but I’m always happy to be proven wrong. For that reason alone, I’ll leave the feature on for now.

mike · 15 March 2024 00:35

I’d prefer to visit a dedicated URL such as /rag instead of integrating machine learning output directly into the search results.

One of the criticisms of Google is that users can’t tell when information comes from Google and when it comes from a third party. For example, many people attribute the info box to Google and not to Wikipedia or another semantic web contributor. I see integrating ML output into the search results as undermining the average person’s online literacy. It is also a little too ‘scrapey’, if that’s a word: I’d rather a search engine direct me to a third party than trap me in a walled garden of scraped content.

One of the things I liked about the /rag demo was that it explicitly listed the sources underneath the main body.

I acknowledge that ML can be useful. But I’m concerned that we’re leading people down the primrose path, where they become dependent instead of informed.

Would it be possible to maintain the full response from the demo by giving RAG its own search tab? For example, when I switch to the News tab (&fmt=news), the web results are pushed to the right column and the news results appear in the main column.

I think having the full RAG response would give more of a clue that the user should pay attention and not take this information for granted.

Josh · 15 March 2024 10:44

This is still very much experimental, and right now there are no plans to summarise automatically or to turn this on by default. We share the concerns raised, and there will be a blog post about this soon.

On specific points

Since Mojeek is privacy-focused, I’m assuming this AI is too and runs on Mojeek hardware. Perhaps you could elaborate on this? It might make a good blog post when you release this feature to the general public.

This uses an external model/API. Requests to it include the query and the results; no other personal data is sent, and all requests come from Mojeek’s own IP addresses. For now this is an experiment, so we’re considering options for how it will be offered to the general public; full transparency about this will be given with the release.

I haven’t yet decided whether this feature truly helps. On the one hand, I can see the arguments for how it could help. On the other hand… Mojeek by default includes only 1 result per domain in the search results. If a user doesn’t change this, the AI ends up summarising many different domains and websites as if they were all part of one theme or subject. For example, I searched for “CPA”, which can mean two different things, and both got mixed together in the summary.

Duly noted; how this will evolve hasn’t been decided yet.

Would it be possible to maintain the full response from the demo by giving RAG its own search tab? For example, when I switch to the News tab (&fmt=news), the web results are pushed to the right column and the news results appear in the main column.

This is definitely a good suggestion, and one we’ve banked. The three placement options (plus the option not to show it) exist partly because we haven’t made a clear decision on this yet.

Thanks again, and please don’t hesitate to add more to this thread, whether from thinking things over or from usage.

Videonas · 16 March 2024 15:13

It’s neat that the summary refers to the search results by number and highlights the corresponding result when you hover over that number. Depending on where you have the summary placed, though, the highlighted result can quickly end up off-screen and therefore not visible. I’ve found myself counting the SERPs to find a result the summary referred to. Perhaps this is another good use case for numbering the SERPs.

I also noticed that sometimes all the references are collected together at the end of the summary. This doesn’t always happen, but when it does, this “summary of references” has helped too, though not always completely.
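The numbered references described above can be illustrated with a small sketch. It assumes citations appear as bracketed numbers such as [3] (the exact format isn’t stated in the thread), and the collect_references function is hypothetical, not Mojeek’s implementation. It gathers the distinct citation numbers in the order they are first cited and maps each one back to its result, which amounts to a “summary of references” list.

```python
import re

# Assumed citation format: a number in square brackets, e.g. "[3]".
CITATION = re.compile(r"\[(\d+)\]")


def collect_references(summary: str, result_urls: list[str]) -> list[tuple[int, str]]:
    """Map each distinct citation number to its result URL, in first-seen order.

    result_urls[0] is result 1, result_urls[1] is result 2, and so on.
    Numbers outside the range of results are skipped rather than guessed.
    """
    cited: list[int] = []
    for match in CITATION.finditer(summary):
        n = int(match.group(1))
        if 1 <= n <= len(result_urls) and n not in cited:
            cited.append(n)
    return [(n, result_urls[n - 1]) for n in cited]


summary = "First meaning [2]. Second meaning [1][2]. Unsupported claim [9]."
urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
for number, url in collect_references(summary, urls):
    print(f"[{number}] {url}")
```

Keeping the original result numbers in the list, rather than renumbering, is what would let a reader match each reference to a numbered SERP.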

itsMe · 17 March 2024 14:38

hey @Josh - is this still available?

i added the name/value pair and the summary preference is available and enabled in settings, but the summary column doesn’t display

anyway… this is an interesting feature that is potentially helpful; however, i also share mike’s concerns

i think ‘AI’ (or similar) should be included in the summary section/title, making the source of the information blatantly obvious

also, as Videonas alluded to, i’d like to know a lot more about exactly how the AI was developed and trained (data sets)

as mike i think alluded to, the problem here is one of credibility - so-called “AI” (it isn’t AI) has none, yet i think people are very likely to assume the text has some weight behind it, as many do with the MoT for example

Josh · 18 March 2024 10:36

Hello, yes, it should be. I just tried it in a different browser and it appeared. The first time you use it, there’s a box you have to click.

If you’d like, let me know your browser/settings combination and I can double-check there’s nothing funky going on.

The rest of the comments are noted. We’re working on a related blog post and messaging that will flag the “think for yourself” element of using these tools.

gnome · 18 March 2024 13:18

I haven’t done extensive testing, and I was holding off on commenting until I’d used it for more than half an hour, but I’ll share some of my thoughts now.

I think this feature is a good idea.

I’ve had trouble navigating Mojeek results because, while the page I want is usually there, it often isn’t ranked at the top. The summary is a quick way of getting the answers I want and identifying the pages that have those answers.

I like that it’s on the side and not at the top.

I like that you need to click it the first time to summarize it.

Finally, I can easily find out when Broken Flowers was released.

I’ve observed that when you click it, the whole page reloads instead of just the Summary box. It didn’t do this initially. I’m not sure why, because the Summary feature doesn’t work without JavaScript anyway (which is fine, by the way). Is it just so you can share the page with the Summary attached?

Anyway, I think it’s a great feature. Obviously, users need to use it responsibly, but this will go a long way in satisfying the appetite for Instant Answers. You get the helpfulness of Google’s widgets without the entire SERP being overcrowded by widgets.

My only question for now is: does the model use the queries as training data?

Also @Josh, the cookie expired on the same day I added it. I needed to delete the date field for the summary to come back.

itsMe · 18 March 2024 14:01

ah, needs JS, that’s why it wasn’t working for me - i thought Moj had a rather rigid anti-JS stance

Josh · 18 March 2024 14:19

Nope, this is pre-trained and there’s no continuous learning.

Where possible, we’ll always try to push out a non-JS version first, but the quickest and simplest way to do this was with JS. It’s an experiment at the moment, and the intention is to improve it over time; part of that would be offering a non-JS version.

A non-JS version would require the whole response to be complete before the page is returned, so it would be slower and we’d need to flag that; other than that, it’s quite easy to do.

Josh · 18 March 2024 14:21

Yes, this is a consequence of getting it out there quickly: it requires a full-page reload, but we’re aware and it’s on the list.

itsMe · 18 March 2024 14:44

my initial impression after using it isn’t good - it seems there’s little difference between the Moj summary and the Wikipedia summary

the text appears to be sourced only from mainstream sources and ‘mainstream’ does not imply accuracy and in many cases is simply disinformation or misinformation

in some cases the summary reads more like an advertisement rather than general information (“covid vaccine” for ex.)

i think it would be immensely more helpful if countering POVs were included - while sourcing reliable information from non-mainstream sources may be difficult, ignoring it will, without a doubt, lead people in a very wrong direction some of the time, especially with regard to politically sensitive topics, healthcare, etc.

if Mojeek is unbiased, then so too should the summary be IMO

i can provide examples if desired, but they are political hot potatoes

snatchlightning · 19 March 2024 08:55

I haven’t tested this summarizing feature, but I can share some thoughts on it from a practical perspective, which I hope you’ll find helpful in determining the direction of this project.

In my experience with Google and Brave Search, one thing AI is particularly bad at is serving as an infobox. Hallucination is not actually unique to AI, but while off-topic search results are easy to spot at a glance, AI summarizers create or string together wrong results that can sound plausible to someone who lacks deep knowledge of a topic (which, unfortunately, is often the reason someone is searching in the first place). So although many people love the convenience, it is muddled by so many unrecognizable false positives that it is terribly unreliable for me, and I frequently just ignore AI-made infoboxes.

On the other hand, I’ve noticed that AI excels at seeding ideas. When I have difficulty thinking of keywords due to a lack of knowledge of a subject, or when a question-form query is hard to distill into keywords, AI comes in handy by giving me keywords I can use, or websites where I can start the search.

With that in mind, I mostly agree with @mike’s idea of a separate URL, like what you did with the RAG search. Listing the referenced websites alongside the response makes it easier to check the quality of those websites’ contents and to spot hallucinations by comparing their contents to the response. Moreover, the related questions section helps in narrowing down vague searches.

Colin · 19 March 2024 11:10

Thank you @Videonas, @mike, @itsMe, @gnome, @snatchlightning for all this very interesting and useful feedback. Some of your questions will be answered in a blog post we will likely publish next week. For now, I wanted to clear up some possible misunderstandings about how the summariser works. Firstly, the summariser is not so different from the RAG prototype at labs.mojeek.com. The summaries on both pull the same information from the search results: specifically, from the snippets of the top 8 ranked search results. The differences are mostly in the UI; both show the search results (only 8 in the case of labs.mojeek.com) with those results cited in the “Summary”/“Answer”.
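The description above maps onto a simple retrieval-augmented generation step: take the snippets of the top 8 results, number them, and ask the model to answer while citing those numbers, which is what lets the summary point back to individual results. What follows is a minimal sketch under stated assumptions. SearchResult, build_summary_prompt and the instruction wording are all hypothetical, and the request to the external model/API mentioned earlier in the thread is not shown.

```python
from dataclasses import dataclass

TOP_K = 8  # per the post: snippets of the top 8 ranked results are used


@dataclass
class SearchResult:
    """Hypothetical shape of one ranked result."""
    title: str
    url: str
    snippet: str


def build_summary_prompt(query: str, results: list[SearchResult], k: int = TOP_K) -> str:
    """Number the top-k snippets and ask for an answer that cites them.

    The instruction wording is an illustrative assumption, not Mojeek's prompt.
    The call to the external LLM is deliberately left out.
    """
    lines = [
        "Answer the query using only the numbered search snippets below.",
        "Cite the snippet numbers you rely on in square brackets, e.g. [2].",
        "",
    ]
    for i, r in enumerate(results[:k], start=1):
        lines.append(f"[{i}] {r.title} <{r.url}>: {r.snippet}")
    lines += ["", f"Query: {query}"]
    return "\n".join(lines)


results = [SearchResult(f"Title {i}", f"https://example.com/{i}", f"Snippet {i}") for i in range(1, 11)]
prompt = build_summary_prompt("What do we know about UAPs?", results)
print(prompt)
```

Given 10 results, only the first 8 are numbered and passed on, matching the limit of 8 “Sources” on labs.mojeek.com. Because the model only sees what the search returned, a less mainstream result set yields a less mainstream summary.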

When search results give you links to less mainstream sources, the same will be true of both these AI answers. Here is an example:
https://www.mojeek.com/search?q=What+do+we+know+about+UAPs%3F&mal=1
You may notice that when you hover over a citation in the summary, the corresponding result in the vertical list of search results is highlighted.

Here is the same search in RAG at labs.mojeek.com; the main differences are the addition of suggested “Related” queries and a limit of 8 results, aka “Sources”:

On Google Gemini and Bing Copilot, you get the following, evidently with more mainstream sources:

itsMe · 19 March 2024 11:37

exactly, and given the focus of Mojeek, i’m not sure it’s a great fit in general, even if it is somewhat helpful some or perhaps most of the time (depending on the query)

i’ll give you folks a very good example of why the so-called “AI” the public has access to is not artificial intelligence: query RAG for “holocaust”, then ask it to “cite scientific evidence of homicidal gas chambers during world war 2”

the contradictions are obvious and this is one of the many reasons why “AI” can be quite dangerous (manipulation of public perception), yet the public is likely to view it as at least somewhat authoritative

i can provide examples all day long where these primitive chat-bots are factually and scientifically incorrect regarding many subjects because the information used to train them is also often largely and factually incorrect, but i only know this because i used search engines to navigate the weeds myself and find the necessary books, papers and other materials rather than relying on very flawed “AI” which, like Wikipedia in many cases, i see as a detour to critical thinking

“AI”, at this time, can easily amplify incorrectness and given the human biases in the algorithms, i think this is likely to only get worse as those algorithms are “adjusted” for accuracy

Josh · 20 March 2024 09:31

It’s good to see a variety of views on this, and this thread shows that anything like this has to be handled sensitively and cater to the varied opinions of people using Mojeek.

Just to reinforce something from up top: this is an experiment, not an indication of a change of direction for Mojeek, or any kind of change to the fabric of what we do.

Mojeek is still a keyword-based search engine (with some semantic elements) that is working hard to provide the world’s alternative in search. This is something that people will be able to use if they want to.

mike · 26 March 2024 04:12

Where’d my results go? Google Search’s chatbot is no longer opt-in | Ars Technica

Josh · 26 March 2024 16:10

If you have the cookie and visit a results page, you should now see a Summary tab. Clicking it takes you to a page where your query is passed to Summary and the output is generated below the search box and tabs.

This has been built to make the sources (on the right-hand side) easy to see and access.

As with the rest of the functionality, testing and feedback are very much welcomed and encouraged.

mike · 26 March 2024 17:37

Thank you. This new layout addresses my concerns.

It looks like my user script hijacked the formatting.

@Josh, this kicks you out of other experiments like Maps. In the future, will there be a way to participate in multiple experiments? Or are you willing to release Maps?