Algorithm Evaluation Tool

In our latest newsletter we announced an easier and better way for you to help us test algorithm changes. Those who used it provided some really useful feedback :+1:

You can use the Mojeek Algorithm Evaluation Tool here

Often the results will be very similar, and you can mark those as “Even”. If you think one is much better then please mark it as such. We’ll take a look at the details of these, so don’t worry if you are not sure of your assessment.

Keep it coming and thank you from the interweb :pray:


it’s great that you’ve done this(!), however a bit more information might be helpful

in my case every search i did produced the same results because i don’t know how to “test” whatever it is you’re testing - i tried keywords, phrases and programming code and they all produced the same results – not necessarily good, but the same – and that brings up my next thought…

i’d love to see another test much like this where people can rank the accuracy of the results, but there are potential problems with this…

  • accuracy can be subjective
  • biases, if not set aside, could taint the results

problems aside, one way to run such a test and minimize bias might be to run a hard-coded array of searches and ask for a vote after each - in other words, every participant does exactly the same searches and sees the same results
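
to make that concrete, here’s a rough python sketch of the fixed-query idea - the query list and both helper functions are made-up placeholders for illustration, not anything Mojeek actually exposes

```python
# rough sketch of the fixed-query protocol: every participant runs
# the exact same hard-coded searches in the same order, so votes are
# comparable across people. QUERIES, fetch_results and record_vote
# are illustrative placeholders, not Mojeek's actual tool.
QUERIES = [
    "open source search engine",
    "python csv parsing",
    "history of the printing press",
]

def run_session(fetch_results, record_vote):
    """fetch_results(query) -> (results_a, results_b), one list per
    algorithm; record_vote(query, vote) stores 'a', 'b' or 'even'."""
    for query in QUERIES:
        results_a, results_b = fetch_results(query)
        print(f"\nQuery: {query}")
        for rank, (a, b) in enumerate(zip(results_a, results_b), 1):
            print(f"{rank:2d}. {a:<40} | {b}")
        vote = ""
        while vote not in ("a", "b", "even"):
            vote = input("better side? [a/b/even] ").strip().lower()
        record_vote(query, vote)

# minimal demo wiring with canned results
votes = {}
run_session(
    fetch_results=lambda q: ([f"{q} #1", f"{q} #2", f"{q} #3"],
                             [f"{q} #1", f"{q} #3", f"{q} #2"]),
    record_vote=votes.__setitem__,
)
print(votes)
```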

such data may be useful for tweaking algorithms, or for tailoring results to a specific type of person by way of a user option of course - in other words, differentiating between someone who is largely interested in mainstream results and those with a… a more open mind, shall we say


Thanks @itsMe, you are right that more information would be helpful, so here goes.

When running the tool you will get the same, or very similar, results a lot (say results 6 and 7 swapped but the rest the same), and you should score those as “Even”. It’s not too important that you spend much time figuring out how to score those that differ; we will likely take a look at all results that are not marked as “Even”. It takes quite a bit of time to decide how to score these, and in any case scorings are subjective.
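
For the curious, here is a minimal sketch of the kind of check one could use to pre-flag such near-identical pairs automatically; the function, threshold and example lists are assumptions for illustration, not the tool’s actual logic.

```python
# flag two top-N result lists as effectively "Even" when they contain
# the same URLs with only tiny rank shifts (e.g. results 6 and 7
# swapped). Illustrative only; not how the evaluation tool scores.
def effectively_even(old, new, max_swaps=1):
    if set(old) != set(new):       # different URLs -> worth a human look
        return False
    # total positional displacement of the shared URLs
    displacement = sum(abs(old.index(url) - new.index(url)) for url in old)
    return displacement <= 2 * max_swaps  # one adjacent swap costs 2

old = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
new = ["r1", "r2", "r3", "r4", "r5", "r7", "r6"]  # 6 and 7 swapped
print(effectively_even(old, new))  # True: score as "Even"
```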

It’s the fact that so many results are the same which motivated this tool. We wanted to enable users to help us identify the rare queries where a new algorithm update we are testing needs a closer look.

So the main aim is to help us decide whether the current algorithm update is ready for deployment. We appreciate the time put in providing feedback, and that it tests one’s patience; it’s testing mine too. I ran 9 queries in a row the other day which gave identical results. I lost count, but on something like the 19th I found something apparently very odd.

The tool has already thrown up at least one result, from a user that I happen to know of, which highlighted a very significant issue and has helped us improve.

Many thanks to everybody for using this tool. We have just gone live with the new algorithm after getting your feedback and doing internal testing.

Around 10% of queries were ‘Better’ or ‘Even Better’ on the new algorithm, with 3% ‘Better’ on the old.

The tool will be retained, using the Old and New (now Live) algorithms, and can still provide useful feedback. We will report back here when we have switched it up for the next iteration.
