[suggestion] avoid splitting domain names

searching for domain.com for example may return results containing domain, com, anotherdomain.com, etc.

i think a FQDN should be treated as 1 term instead of being split

given the plethora of new TLDs available however, determining what is a FQDN could be problematic/costly i suppose so maybe it might make more sense to simply treat \S\.\S as a single term???
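To make the suggestion concrete, here is a minimal Python sketch comparing a tokenizer that splits on every non-alphanumeric character with one that keeps dot-joined strings together. It assumes the intended pattern is closer to `\S+\.\S+` (the `\S\.\S` above would only match three characters), and both function names are made up for illustration. It says nothing about how Mojeek actually tokenizes queries.

```python
import re

# Hypothetical tokenizers, for illustration only.
# Splits on anything that is not a letter or digit.
SPLIT_RE = re.compile(r"[a-z0-9]+")
# First alternative: non-space, non-dot runs joined by dots (kept whole).
# Second alternative: a plain word.
KEEP_RE = re.compile(r"[^\s.]+(?:\.[^\s.]+)+|[a-z0-9]+")


def split_tokens(query: str) -> list[str]:
    """Break the query into words, splitting dotted strings apart."""
    return SPLIT_RE.findall(query.lower())


def keep_dotted_tokens(query: str) -> list[str]:
    """Break the query into words, keeping dotted strings as one term."""
    return KEEP_RE.findall(query.lower())


print(split_tokens("bread.com recipes"))        # ['bread', 'com', 'recipes']
print(keep_dotted_tokens("bread.com recipes"))  # ['bread.com', 'recipes']
```

No TLD list is consulted here, so the cost concern about new TLDs does not apply, at the price of also keeping things like `file.txt` together.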

Is this an example of the issue?

yes sir

This is an interesting one, as we actually don’t split domains, or any word, so it looks like this is a separate issue. It would be useful to see any examples if you have them (here or via support is all good).

@Mike for the "learn.microsoft.com" search, the expected result comes up top for me, so we will look into that search. If you’re able to let us know the parameters you’re using here, then that would also be useful :pray:

Mike provided one example and i attached another - here the search was for bread.com

I am not having a problem. And I was just trying to help with itsMe’s thread.

However, what I can add is that there seem to be two issues.

One, the search algorithm breaks up a string and searches for its parts. So, instead of looking for only bread.com, the algorithm appears to include results related to “bread” and “com”.

Two, visually, the snippets emphasize this error with highlights like “… to come play! …” from rudisbakery.com where .com has clearly been misinterpreted as “come”.

I mention the visual aspect because the individual parts of a query can be emphasized even when the entire query is present.

For example, thewindowsupdate.com shows this snippet:

Learn how to enable the Microsoft Loop ... Source:https://learn.microsoft.com/en-us/microsoft-365/admin/email/create-edit-or-delete-a-security-group

There we can see learn.microsoft.com in the snippet.

But, we also see “Learn how to enable the Microsoft” which emphasizes the parts of my query.

Another example is loginvast.com.

Microsoft Learn Blog – Microsoft Tech Community ... Introducing our New Learning Portal for Partners – Microsoft …

Here, only the parts of my query are shown in the snippet.

But, if we navigate to the web page, we can see learn.microsoft.com appear in print. So, the snippet crowded out my search term with examples of individual words that matched earlier in the body.
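The crowding-out effect described above can be illustrated with a small Python sketch. This is an assumption about snippet generation in general, not a description of Mojeek's code: `pick_snippet` is a hypothetical helper that prefers a window around the full query string and only falls back to the earliest individual part when the full string is absent.

```python
import re


def pick_snippet(body: str, query: str, width: int = 60) -> str:
    """Return a window of `body` around the best match for `query`."""
    lowered = body.lower()
    q = query.lower()

    # Prefer the whole query string, e.g. "learn.microsoft.com".
    pos = lowered.find(q)
    if pos == -1:
        # Fall back to the earliest occurrence of any individual part.
        parts = re.findall(r"[a-z0-9]+", q)
        hits = [lowered.find(p) for p in parts]
        hits = [h for h in hits if h != -1]
        if not hits:
            return body[:width]
        pos = min(hits)

    # Centre the window roughly on the match.
    start = max(0, pos - width // 2)
    end = min(len(body), pos + len(q) + width // 2)
    return body[start:end]


page = (
    "Learn how to enable the Microsoft Loop app. "
    + "Lots of unrelated text. " * 10
    + "Source: https://learn.microsoft.com/en-us/ more text"
)
print(pick_snippet(page, "learn.microsoft.com"))
```

With this ordering, a page that mentions learn.microsoft.com late in the body still gets a snippet containing it, instead of an early sentence that merely contains "Learn" and "Microsoft".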

Personally, I think there are two additional issues.

Three, I don’t have a way to group search terms as an atomic unit. So going from bread.com to "bread.com" makes no difference. And Mojeek still breaks up what I’ve typed and searches for those terms individually.

Four, because Mojeek searches lexically, the user might omit a mandatory search operator. Mojeek has several buckets like URL, title, anchor, and text. So, maybe I searched for bread.com and that appears in an anchor but the snippet can’t show anything visually. So, I’d want to do something like exactintext:bread.com to tell Mojeek that a) I want to see bread.com mentioned in writing, and b) I want only that full string.
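As a rough sketch of what I mean, here is how a query parser could separate such an operator and quoted phrases from ordinary terms, in Python. The `exactintext:` operator is only my proposal, not an existing Mojeek feature, and `ParsedQuery`, `parse_query`, and `matches_body` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    exact_in_text: list[str] = field(default_factory=list)  # must appear verbatim in body text
    phrases: list[str] = field(default_factory=list)        # quoted, kept as one unit
    terms: list[str] = field(default_factory=list)          # everything else


# Order matters: the operator first, then quoted phrases, then bare words.
TOKEN_RE = re.compile(r'exactintext:(\S+)|"([^"]+)"|(\S+)')


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    for match in TOKEN_RE.finditer(query):
        exact, phrase, term = match.groups()
        if exact:
            parsed.exact_in_text.append(exact)
        elif phrase:
            parsed.phrases.append(phrase)
        else:
            parsed.terms.append(term)
    return parsed


def matches_body(parsed: ParsedQuery, body: str) -> bool:
    """True only if every exactintext: string occurs verbatim in the body."""
    lowered = body.lower()
    return all(s.lower() in lowered for s in parsed.exact_in_text)


q = parse_query('exactintext:bread.com recipes "sourdough starter"')
print(q.exact_in_text, q.phrases, q.terms)
```

A page whose text only contains "bread" and "come" would then be rejected by `matches_body`, while one that mentions bread.com in writing would pass.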

I’m guessing that, today, Mojeek can’t recognize longer strings like "learn.microsoft.com". So, I’m not sure what the most efficient way would be to add that feature.


@Josh To answer your question, I was likely using:

https://www.mojeek.com/?arc=us&t=40&cdate=1&dlen=0&autocomp=0&qsba=1&qss=Bing,DuckDuckGo,Google&rp_i=0

But I’ve seen similar results in an incognito window with no preferences set.

The bread.com one is redirecting to breadfinancial.com, and as such I can’t find a search engine which brings back bread.com; they all break up the string. The site it redirects to does appear on the first page for Mojeek.

Snippets are generated separately from ranking; the “come” highlight is a visual/frontend error that we’ll look into. “Come” here is not stemmed to “com” as suggested above; it is only part of the snippet generation and is not used in scoring.

Mojeek does search for exact matches, and there is a marker that comes up to separate the exact matches from the rest of the results when we run out of exact matches.

This being said, looking at some of the results, they look a bit odd, so we’ll look into that. The bigger thing here is the snippet generation not catching the right phrase to display. It’s possible the page was too long; either way, it’s something we’ve put on the list.

Mojeek can handle long strings like learn.microsoft.com just fine. A different issue is causing them not to be displayed as required here. This being said, it is coming top for us, so this is a strange result.

Josh, i think there’s a misunderstanding here - using the learn.microsoft.com search as an example, the search “term” is split into 3 terms: ‘learn’, ‘microsoft’, ‘com’

we’re not using “site:” here - we’re just looking for pages with “learn.microsoft.com” as a phrase and instead are getting a lot of results containing ‘learn’, ‘learning’ and ‘com’ and i suspect most people wouldn’t expect this

tl;dr - treat “(sub.)?domain.tld(:port)?” as a single search term instead of splitting it
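for what it's worth, a stricter version of that pattern could look something like the Python sketch below. it's only an illustration under my own assumptions: `HOST_RE` and `find_host_terms` are made-up names, it only handles ASCII hostnames, and it doesn't check the last label against a real TLD list (so `file.txt` would also match).

```python
import re

# One or more dot-terminated labels, an alphabetic final label, optional :port.
HOST_RE = re.compile(
    r"(?<![\w.-])"                             # not in the middle of a longer token
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+"  # "sub." / "domain." labels
    r"[a-z]{2,}"                               # final label, letters only
    r"(?::\d{1,5})?"                           # optional ":port"
    r"(?![\w-])",                              # must end at a token boundary
    re.IGNORECASE,
)


def find_host_terms(query: str) -> list[str]:
    """Return hostname-like strings that should stay single search terms."""
    return [m.group(0).lower() for m in HOST_RE.finditer(query)]


print(find_host_terms("how to use learn.microsoft.com docs"))
print(find_host_terms("localhost.example:8080 bread.com"))
print(find_host_terms("version 1.2.3 ends. Next"))
```

anything this returns would be searched as one term, and the rest of the query would be handled as it is today.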

Howdy, apologies for missing this (was off on holiday). I have it on my list to look into :pray:

Noted and understood, it’s on the pile to look into, thanks for sending it in both of you.
