To Geolocate or Not To Geolocate
In its current state, menshen has dropped the mmdb for geolocation. It was a bit cumbersome to download the database, get an API token, etc.
I'll try to summarize our conversation from today about geolocation (and the need in bitmask-core to optionally provide a CC parameter in location queries):
- There are two different reasons to geolocate clients. One has to do with gateway selection (so that menshen uses the client's real location in addition to load balancing, in order to minimize the assignment of high-latency gateways); the other has to do with telemetry and reporting (we want CC and ASN).
- The needs of a medium-sized provider (Riseup) are different from the needs of a small (and obfuscated) provider.
- In the case of a manual location override, menshen can be queried directly by location. If a location is passed, the CC is not really needed and should be ignored.
The strategy in bitmask-core is to assume that the CC is known a priori (via a bunch of possible sources), and then pass it as an optional parameter to menshen. The current strategy in menshen (which still needs to be implemented in all queries) is to first filter the global gateway pool down to the N locations closest to the indicated CC, if any. No CC means that no filtering is done: we'll be considering all gateways, and thus potentially returning gateways that are lightly loaded but suboptimal because of high latency. Again, this is only relevant for providers with a global presence.
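To make the filtering step concrete, here is a minimal sketch of "keep the N closest locations, then prefer the least loaded gateway". Everything in it is an assumption made for illustration: the `Gateway` type, the `COST` table (which could come from any latency or distance estimate), the hostnames and the load values are hypothetical and are not taken from menshen.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Gateway:
    host: str
    location: str
    load: float  # 0.0 = idle, 1.0 = saturated


# Hypothetical cost (e.g. estimated latency in ms) from a country to each location.
COST: Dict[str, Dict[str, float]] = {
    "BR": {"sao-paulo": 20, "miami": 110, "amsterdam": 200, "tokyo": 280},
    "DE": {"amsterdam": 15, "miami": 120, "sao-paulo": 210, "tokyo": 240},
}

# Hypothetical global pool.
POOL = [
    Gateway("gw1", "sao-paulo", 0.7),
    Gateway("gw2", "miami", 0.3),
    Gateway("gw3", "amsterdam", 0.1),
    Gateway("gw4", "tokyo", 0.2),
]


def select_gateways(pool: List[Gateway], cc: Optional[str] = None,
                    n_locations: int = 2) -> List[Gateway]:
    """Return gateways ordered by load, optionally restricted to the
    n_locations closest to the given country code."""
    candidates = list(pool)
    costs = COST.get(cc.upper()) if cc else None
    if costs:
        # Sort location names first so ties are broken deterministically.
        locations = sorted({gw.location for gw in candidates})
        nearest = sorted(
            locations, key=lambda loc: costs.get(loc, float("inf"))
        )[:n_locations]
        candidates = [gw for gw in candidates if gw.location in nearest]
    # No CC (or an unknown CC) means no filtering: the whole pool competes on load.
    return sorted(candidates, key=lambda gw: gw.load)
```

With this toy data, a client reporting `BR` is offered the Miami and São Paulo gateways, while the same client without a CC is offered the Amsterdam gateway first, because it is the least loaded one in the whole pool.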
Another reason in favor of passing the CC is that, for background refreshes, we want to tell menshen our rough location so that it only recommends the "best" (i.e., least congested) gateways suitable for our location. Again, if the user wants a manual location override, this is not needed. But we consider "automatic" selection to be the default.
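On the client side, the precedence between a manual override and the optional CC could look roughly like the sketch below. The parameter names `loc` and `cc`, the function name and the example URL are placeholders I made up for illustration; they are not the actual menshen or bitmask-core API.

```python
from typing import Optional
from urllib.parse import urlencode


def gateway_query(base_url: str, cc: Optional[str] = None,
                  location: Optional[str] = None) -> str:
    """Build a gateway query URL (parameter names are hypothetical)."""
    params = {}
    if location:
        # Manual override: query by location, the CC is not needed.
        params["loc"] = location
    elif cc:
        # Automatic selection: give a rough hint of where we are.
        params["cc"] = cc.upper()
    # Neither given: plain query, so the server does no filtering.
    return f"{base_url}?{urlencode(params)}" if params else base_url
```

For example, `gateway_query("https://api.example.org/gateways", cc="br")` yields `https://api.example.org/gateways?cc=BR`, and adding `location="paris"` drops the CC from the query entirely.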
In addition, some testing indicates that the naive spherical calculations (assuming a homogeneous globe) are not too useful for the order of magnitude of gateways that providers like Riseup handle. An empirical latency model of the internet returns the locations closest to a client, and with it we can see that fine-grained location is not too important. Preferential assignment might be more of a problem.
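For reference, one common form of the naive spherical calculation is the haversine great-circle distance. Whether this is the exact formula used in the tests mentioned above is an assumption on my part, and the coordinates below are approximate values chosen only for illustration.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean radius, treating the Earth as a perfect sphere

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def great_circle_km(a: Coord, b: Coord) -> float:
    """Haversine distance between two points, in kilometres."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding errors near antipodal points.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, h)))


# Approximate coordinates of a few hypothetical gateway locations.
LOCATIONS: Dict[str, Coord] = {
    "amsterdam": (52.37, 4.90),
    "miami": (25.76, -80.19),
    "sao-paulo": (-23.55, -46.63),
}


def rank_by_distance(origin: Coord, locations: Dict[str, Coord]) -> List[str]:
    """Order location names from geometrically closest to farthest."""
    return sorted(locations, key=lambda name: great_circle_km(origin, locations[name]))
```

This ranks locations purely by geometry. It knows nothing about cable routes or peering, which is one reason an empirical latency model can order the same locations differently.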
By preferential assignment, I mean the cases in which some locations are optimal for one region (e.g., South America), but end up congested by users from another region (e.g., North America) who could perhaps avoid them. This perhaps has more to do with capacity planning and fairness than with finding optimal latency vs. load tradeoffs.
The general opinion seems to be that, for simplicity, we should drop the geolocation for gateway assignment for smaller deployments, and just let users pick their preferred location (or even gateway).
Now, the options for geolocation are:
- Outsource IP discovery to 3rd-party services. Ubuntu's service is the one used in bitmask-core now.
- Create and deploy an optional geolocation service. If integrated in menshen, this should be an optional module. This might not be very useful for an obfuscated provider if we assume that API contacts will be made via tunnels/introducer. The client needs to ensure that the query is made without any proxy (which is better, in terms of disguise: the signal of contacting cloudflare/ubuntu/etc. for geolocation, or the signal of contacting the provider's IP in the clear?)
- Get just the IP from a 3rd-party (or self-hosted) service, but embed the mmdb in the clients to look up the CC and ASN locally. This is useful for storing reports that will be sent later on.
- Mobile clients might also get the location from the phone (is this an option we're willing to consider?)
- Solve the allocation problem by letting users override their preferred region (I suspect we can make do with just continents, or regions bigger than just countries, which are also problematic for other reasons).
One argument in favor of geolocation is that gateway allocation is more or less tested (and we have fallbacks like the timezone or a home selection panel), while load balancing should be taken with a pinch of salt for the time being.
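The fallback idea can be sketched as an ordered chain of CC sources, where the first one that yields a plausible two-letter code wins. The function names and the tiny timezone table are hypothetical; a real table could be derived from the tz database, whose `zone.tab` file lists a country code for each zone.

```python
from typing import Callable, Iterable, Optional

# Hypothetical excerpt of a timezone -> country code table.
TZ_TO_CC = {
    "America/Sao_Paulo": "BR",
    "Europe/Berlin": "DE",
    "Asia/Tokyo": "JP",
}


def cc_from_timezone(tz_name: str) -> Optional[str]:
    """Guess the country code from the system timezone name, if known."""
    return TZ_TO_CC.get(tz_name)


def resolve_cc(sources: Iterable[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each source in order and return the first valid-looking CC."""
    for source in sources:
        try:
            cc = source()
        except Exception:
            # A failing source (e.g. geolocation service unreachable) is skipped.
            continue
        if cc and len(cc) == 2 and cc.isalpha():
            return cc.upper()
    return None  # Unknown: the caller queries without a CC.
```

A client could then call `resolve_cc` with its geolocation lookup first, the timezone guess second and the user's home selection last, or in whatever order we decide is most trustworthy.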
I personally think the way to go is to import an optional module (probably using mmdb) for the providers that do not want to rely on 3rd-party geolocation services. I would consider embedding the mmdb in special builds for monitoring (I'm skeptical of the added size in the binary). For Riseup, I think we need to ask them whether they prefer to deploy their own geolocation service/endpoint.