In our latest newsletter we announced an easier and better way for you to help us test algorithm changes. Those who have used it so far provided some really useful feedback.
Often the results will be very similar, and you can mark those as “Even”. If you think one is much better, then please mark it as such. We’ll take a look at the details of these, so don’t worry if you are not sure of your assessment.
it’s great that you’ve done this(!), however a bit more information might be helpful
in my case every search i did produced the same results because i don’t know how to “test” whatever it is you’re testing - i tried keywords, phrases and programming code and they all produced the same results - not necessarily good, but the same - and that brings up my next thought…
i’d love to see another test much like this where people can rank the accuracy of the results, but there are potential problems with this…
accuracy can be subjective
biases, if not set aside, could taint the results
problems aside, one way to run such a test and minimize bias might be to run a hard-coded array of searches and ask for a vote after each - in other words, every participant does exactly the same searches and sees the same results
such data may be useful for tweaking algorithms or tailoring results to a specific type of person by way of a user option of course - in other words, differing between someone who is largely interested in mainstream results and those with a… a more open mind, shall we say
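just to make the idea concrete, here’s a rough sketch of what a hard-coded list of searches with a vote after each might look like - everything in it (the query list, the search_old/search_new functions) is made up for illustration and not part of the actual tool:

```python
# Rough illustration only: a fixed list of queries that every participant runs,
# with a vote recorded after each one. search_old/search_new are placeholders
# for whatever actually fetches results from the two algorithms.

QUERIES = [
    "how to parse json in python",
    "http 301 vs 302",
    "css center a div",
]

def search_old(query):
    # placeholder: would return the ranked results from the current algorithm
    return []

def search_new(query):
    # placeholder: would return the ranked results from the candidate algorithm
    return []

def collect_votes():
    votes = []
    for query in QUERIES:
        print(f"\nQuery: {query}")
        print("Old results:", search_old(query))
        print("New results:", search_new(query))
        vote = ""
        while vote not in ("old", "new", "even"):
            vote = input("Which set is better? [old/new/even]: ").strip().lower()
        votes.append((query, vote))
    return votes

if __name__ == "__main__":
    for query, vote in collect_votes():
        print(f"{query}: {vote}")
```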
Thanks @itsMe, you are right that more information would be helpful, so here goes.
When running the tool, you will often get the same or very similar results (say, results 6 and 7 swapped but the rest the same), and you should score those as “Even”. It’s not too important that you spend much time figuring out how to score those that differ; we will likely take a look at all results that are not marked as “Even”. It takes quite a bit of time to decide how to score these, and scorings are subjective anyway.
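To make “the same or very similar” a bit more concrete, here is a small illustrative check of my own - not the tool’s actual scoring logic - that treats two ranked result lists as effectively even when they contain the same items and nothing has moved by more than one position:

```python
def effectively_even(old_results, new_results):
    """Illustrative sketch: two rankings count as 'Even' if they hold the same
    items and no item has moved by more than one position (e.g. results 6 and 7
    swapped). Not the scoring the tool itself uses."""
    if sorted(old_results) != sorted(new_results):
        return False
    positions = {item: i for i, item in enumerate(old_results)}
    return all(abs(positions[item] - i) <= 1 for i, item in enumerate(new_results))

# Example: only results 6 and 7 swapped -> still "Even".
old = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
new = ["r1", "r2", "r3", "r4", "r5", "r7", "r6", "r8"]
print(effectively_even(old, new))  # True
```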
It’s the fact that so many results are the same that motivated this tool. We wanted to enable users to help us identify the rare queries that highlight where a new algorithm update we are testing needs looking at.
So the main aim is to help us decide whether the current algorithm update is ready for deployment. We appreciate the time put in to provide feedback, and that it tests one’s patience; it’s testing mine too. I ran 9 queries in a row the other day which gave identical results, and on something like the 19th (I lost count) I found something that looked very odd.
The tool has already thrown up at least one result, from a user I happen to know of, which highlighted a very significant issue and has helped us improve.
Many thanks to everybody for using this tool. We have just gone live with the new algorithm after getting your feedback and doing internal testing.
Around 10% of queries were “Better” or “Even Better” on the new algorithm, with 3% “Better” on the old.
The tool will be retained, using the Old and New (now Live) algorithms, and can still provide useful feedback. We will report back here when we have switched it over for the next iteration.