Mojeek onion service for use in Tor Browser

I think it would be very nice to have a Mojeek onion service for use in Tor Browser. This would have several benefits: onion services are self-authenticating, so they don’t need certificate authorities, and they can also be faster than connecting to the surface-web domain through Tor Browser.

By the way, even though onion service hostnames are randomly generated, you can brute-force generate them with special tools until the start matches ‘mojeek’. DuckDuckGo has done exactly that: http://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion
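To give a rough sense of the cost: v3 onion addresses are base32-encoded, so each extra character in a desired prefix multiplies the expected work by 32. A quick back-of-the-envelope calculation in plain Python (averages only, not a benchmark of any particular tool):

```python
# Expected number of random keys to try before a v3 onion address
# starts with a given prefix. Onion addresses use a 32-character
# base32 alphabet, so each prefix character multiplies the work by 32.
BASE32_ALPHABET_SIZE = 32

def expected_attempts(prefix: str) -> int:
    return BASE32_ALPHABET_SIZE ** len(prefix)

# A 6-character prefix like 'mojeek' needs about a billion tries on
# average, which is very feasible on commodity hardware.
print(f"~{expected_attempts('mojeek'):.2e} keys on average")  # ~1.07e+09
```

This is why short vanity prefixes are cheap while every further matching character gets 32 times harder.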

I think this step would be greatly appreciated in the privacy community.

What do you think?

6 Likes

Mojeek + Tor would be beastly. Looking forward to it

Cheers for the suggestion @Nova, I’ll raise it on our internal ideas board and get it looked at :pray:

3 Likes

This could also help mitigate impacts of spam and DDoS attacks: legitimate Tor users are more likely to use the dedicated hidden service. I believe that this was part of Facebook’s motivation for having a hidden service.

Tor users constitute an admittedly small, but viable and sustainable audience: since there aren’t many engines with hidden services to go around, Mojeek could make an impression.

Other engines with hidden services include DDG (a Bing/Yandex proxy with a focus on infoboxes/instant answers) and Brave (own index, onion URL). Various community proxies like SearX and Whoogle instances are also popular.

One thing worth considering: Mojeek’s current CA, DigiCert, offers (pricey) certificates for .onion domains. HTTPS on Tor has certain benefits unrelated to encryption (since Tor already takes care of that). It allows:

  • Users to verify that a hidden mirror is run by the same people who operate the clearnet version
  • Servers to use HTTP/2 (not HTTP/3, since Tor is only designed for TCP)
  • The Tor Browser to accept an Alt-Svc header instructing clients to connect to the hidden service instead.

The Alt-Svc support avoids the slow “load clearnet → redirect to hidden service → load hidden service” dance that the Onion-Location header entails: browsers just remember to connect directly to the hidden service when given the clearnet URL.
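For illustration, the response header on the clearnet site might look something like this (the onion hostname below is a placeholder, not a real address):

```
Alt-Svc: h2="abcdefghij1234567890abcdefghij1234567890abcdefghij123456.onion:443"; ma=86400
```

The `ma` (max-age) value tells the browser how long, in seconds, to remember the alternative service before re-checking.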

6 Likes

Nice, there was a thread about this already here. I just suggested the same thing on Twitter.

Also, using a configuration like this will improve performance:

# Disable the local SOCKS listener; this instance only serves the onion service.
SocksPort 0
# Trade the service’s own location anonymity for speed:
# build single-hop circuits to introduction and rendezvous points.
HiddenServiceNonAnonymousMode 1
HiddenServiceSingleHopMode 1

And as a bonus, it’s possible to select EntryNodes (which also act as the hops toward the rendezvous points in single-hop mode) from the same or nearby countries, or from popular ASes with good connectivity.
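For the record, pinning those nodes looks something like this in torrc (the country codes here are just an example, not a recommendation):

```
# Only build circuits through relays in these countries.
EntryNodes {de},{nl}
# Fail closed instead of falling back to other relays.
StrictNodes 1
```

Note that restricting node choice like this trades some anonymity-set size for latency, which matters less for a service already running in non-anonymous single-hop mode.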


@samleh I originally ran my own site as a single-hop onion service too, but didn’t see much of a performance boost in my benchmarks. Avoiding the exit-node bottleneck gives a much bigger speedup than a single-hop configuration does.

I wouldn’t worry about this unless network performance becomes a significant problem. Right now, network performance on Tor is much better than it was a few years ago: speeds in the megabit range aren’t uncommon.

Another advantage:

Tor Browser users on the “safest” setting have many features disabled: JS, all JIT compilation, SVG, MathML, lazy loading, and Graphite font rendering. A dedicated hidden service can be optimized for such a browser. Normally, optimizing specifically for a given user agent is bad practice; however, the Tor Browser is a special case because, to resist fingerprinting, it sometimes claims to have Firefox capabilities that it has actually disabled. Progressive enhancement and graceful degradation won’t work.

For example, my website’s clearnet version uses some SVG images. Some browsers can’t handle a given image format. The typical solution is to use a <picture> element containing <source> children with varying formats and a fallback <img> element using a legacy image format.

The Tor Browser will download whichever format Firefox would, rather than whichever formats it actually supports. A <picture> element containing an SVG and a raster fallback won’t help: to avoid fingerprinting, the Tor Browser will select the SVG, not the fallback. The image won’t be rendered, so users will have downloaded it only to see a white box.
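Concretely, the pattern that breaks down looks like this (the file paths are hypothetical):

```html
<!-- On most browsers, an engine that can't render SVG would pick the PNG.
     Tor Browser on "safest" reports Firefox's capabilities, so it selects
     the SVG source anyway and then refuses to render it. -->
<picture>
  <source srcset="/img/logo.svg" type="image/svg+xml">
  <img src="/img/logo.png" alt="Site logo" width="64" height="64">
</picture>
```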

I address the issue by not using any SVG images on my hidden service.

Setting up a .onion address for an already existing clearnet website is pretty easy with Enterprise Onion Toolkit (EOTK). This is how Twitter recently added .onion support to combat Russian censorship.

3 Likes

upvoting this one. this really needs to be done for many of us enthusiasts to take mojeek seriously in the privacy realm.

1 Like

I have no clue as to the technology and all the uses for the Tor Browser. Nor do I know anything about the costs for Mojeek to set up such a service.

That said, I’m looking at the banning of the last remains of a free press, and the state surveillance and repression, in Russia and Russia’s crony allies, not to mention in so many other authoritarian countries. If there is a way for Mojeek to provide the light of private, unbiased search for those who must search covertly out of fear for their own safety and who are living in that darkness, then it would be good to provide it if possible.

2 Likes

We do not provide access to Mojeek through a .onion address and are not intending to do so in the short term. This is something that we will be revisiting periodically. We agree with the aims of the Tor Project, which allows people to escape both surveillance and subsequent actions from freedom-limiting actors. We are thus increasingly inclined to provide this access, but at the moment we have made the decision to wait.

As a search engine we deal every day with the issue of automated queries from bot traffic. It is something that consumes a large portion of our resources. We have to regularly develop and deploy ways of blocking access to Mojeek for many bad actors, whilst most importantly preserving that access for the good ones. It is a major challenge to deal with this as an independent search engine, and we would prefer to devote more of our time and resources to improving Mojeek for users and customers.

This is our issue when it comes to providing a .onion address through which you can search on Mojeek; estimates in the past have suggested that up to 94% of traffic through the Tor network was malicious. Sadly, this reality is not very different for Mojeek. So whilst we deeply understand and resonate with the legitimate uses of the Tor network, we have to be careful to deploy our resources on efforts that will make the most difference. We are not currently in a place where we could handle the expected increase in automated traffic from bad actors. Dealing with it would detract from efforts to provide a great service to legitimate users, and from our focus on search quality and new products.

2 Likes

Thanks @Colin for answering. It’s perfectly reasonable to decline a .onion service when the costs are that high. Given that, I’m content to see the index keep growing and more customers paying for Mojeek’s results.

1 Like

While I understand where you’re coming from, I think there are some misunderstandings.

First of all, the statistic from Cloudflare that up to 94% of Tor traffic is malicious (if it’s even still valid six years later) is probably not representative of what you’d see as a search engine. Of the abuse they mention (vulnerability testing, scraping, login testing, spamming, etc.), only vulnerability testing and scraping apply to a search engine, because there are no accounts and nothing can be posted; information is only retrieved. The statistic may also suffer from selection bias: the sites most likely to use a service like Cloudflare are precisely the ones already seeing these issues, while sites with plenty of well-behaved Tor users have no reason to sign up, so those users never enter the statistic.

Tor is also fairly slow and computationally expensive (because of all the layers of cryptography involved) to use for DoS attempts. It would be both faster and cheaper for a would-be attacker to subscribe to a service that lets them make large numbers of requests, for example.

Secondly, Mojeek is already accessible over Tor. Tor has something called exit nodes, which allow Tor traffic to leave the Tor network and make requests to normal websites; that’s how I’ve been using Mojeek personally.

I think a .onion domain is actually a great opportunity to introduce more features to protect users and their Tor traffic from surveillance at more levels, without any (more) harm being done to Mojeek as a service. If you really are seeing lots of issues from Tor exit nodes (beyond the larger volume you’d expect simply because many people share the same exit node IP addresses), you could set up a simple captcha specifically for Tor users on your .onion domain, and perhaps even implement the Onion-Location header, which would prompt Tor users to use the onion domain instead. Not to mention that the Tor software has built-in solutions aimed at preventing onion addresses from being DDoSed from within the Tor network.
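The Onion-Location mechanism is just a single response header served by the clearnet site; Tor Browser then offers (or, if the user opts in, automatically performs) the switch. It would look something like this (placeholder hostname, not a real address):

```
Onion-Location: http://abcdefghij1234567890abcdefghij1234567890abcdefghij123456.onion/
```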

I get that this would probably be a lot of work, and as a small company you probably have things that your time is better spent on. However, I truly hope you reconsider your company’s stance on Tor. Perhaps look into it a bit more, and see that, as a privacy-oriented service, the benefits can outweigh the cons when done correctly. It has done so for countless news sites, human rights organizations, and search engines alike with the same or similar missions and core values.

1 Like

Thanks @Nova for sharing your views and constructive suggestions. As mentioned, we will be revisiting this periodically, so this is valuable feedback. This is a short and immediate response to your points.

You are quite correct to question the age and somewhat limited relevance of the 94% statistic from Cloudflare.

It’s a very good point about the costs of Tor. As you probably know, ordinary DDoS attacks are not possible over Tor. What matters in our case is whether Tor being fairly slow and computationally expensive deters automated search queries from bad actors.

The suggestion of a captcha is a good one. The issue here is the lack of no-tracking captcha services/projects; presently hCaptcha is the only one we are aware of that might be a viable external option.

1 Like

The captcha situation is an interesting one. By far the most common one is Google reCAPTCHA, which is incredibly hostile towards Tor users. Tor users will almost always be presented with a puzzle, and I’ve found myself completing upwards of 10 puzzles before I was let through, and sometimes it still fails even though I’ve definitely done everything correctly. But quite often, Google won’t even allow Tor users to attempt to solve it to prove they’re human. It’ll go “We’re seeing unusual amounts of traffic from your PC [so, the Tor exit node]” without showing a puzzle at all.

hCaptcha, which you mentioned, does not discriminate against Tor users in quite the same way. Tor users will still be shown captchas more often, naturally, but they won’t have to complete more of them than ordinary users who also get a captcha. In my experience with hCaptcha on Tor, I’ve had to solve two of them every time I was prompted, after which I was let through. If I’m not mistaken, hCaptcha even gives the sites it’s placed on a share of the money companies pay for data labeling, which might be financially supportive, but I have no clue how much that would pay.

But also, Tor recently got an update with a proof-of-work system that onion services can enable. When a configured number of active connections is exceeded, your onion service will start requiring Tor clients that want to connect to solve a cryptographic puzzle, with the difficulty scaling with the number of connections you’re seeing. That makes it highly inconvenient for bots to create lots and lots of circuits to your service. The downside is that during one of these scraping events, legitimate users similarly have to complete the puzzle, though Tor Browser shows them an estimate of how long it will take, so real users get there eventually.

However, this was mostly implemented to prevent a specific type of DoS attack where people flood the onion service with introduction requests. I suppose it might also prevent people from creating hundreds of connections to scrape search queries? It won’t stop a small-scale scraper that can do fine with one sustained connection (like mildly popular SearX instances), and it doesn’t actually make an effort to stop bots; it just makes it inconvenient to make lots of requests. Perhaps a combination of, say, a one-request-per-five-seconds-per-session rate limit and that PoW system might be enough to turn scrapers away? I have no idea. You can read more about it here:
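If I remember the option names correctly (worth double-checking against the current tor manual before relying on this), enabling the defense is a few torrc lines on the service side; the rate numbers below are arbitrary examples:

```
# Turn on the proof-of-work DoS defense for this onion service.
HiddenServicePoWDefensesEnabled 1
# Sustained rate and burst of rendezvous requests the service will
# accept before the required puzzle effort starts ramping up.
HiddenServicePoWQueueRate 250
HiddenServicePoWQueueBurst 2500
```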

Lots of interesting solutions, I am excited for the future :slight_smile:

1 Like