The Internet is a funny place. At the exact same moment that Russian troops are committing war crimes in the real world, Russian users online are just bopping around as if everything is cool. ShutdownHook is anything but a large-scale website, but it does get enough traffic to provide interesting insights in the form of global usage maps. And pretty much every day, browsers from Russia (and very occasionally Belarus) are stopping by to visit.
Well, at least they were until this afternoon. My love for free speech does not extend to aiding and abetting my enemies — and until the people of Russia and Belarus abandon their attacks on Ukraine, I’m afraid that is the best term for what they are. And before you spin up the de rigueur argument about not punishing people for the acts of their government, please just save it. I get the point, but there is nobody on earth that can fix these countries other than their citizens. They do bear responsibility — just as I and my fellow Americans did when we granted a cowardly, bullying toddler the United States’ nuclear codes for four years. Regardless of our individual votes.
Anyways, while I’m certainly not changing the world with my amateur postings here on ShutdownHook, I am trying in a very small way to share ideas and experience that will make folks better engineers and more creative and eclectic individuals. And I just don’t want to share that stuff with people who are, you know, helping to kill families and steal or destroy their homes. Weird, I know.
Enter RuBy — a tiny little web service that detects browsers from these two countries and replaces site content with a static Ukrainian Flag. You can add it to your web site too, and I hope you will. All it takes is one line anywhere on your site:
It’s not perfect — the same VPN functionality that folks use to stream The Great Pottery Throw Down before it’s available in the States will foil my script. But that’s fine — the point is to send a general message that these users are not welcome to participate in civilized company, and I think it does the trick.
If you’d rather not use the script from my server, the code is freely-available on github — go nuts. I’ll cover all the details in this post, so keep on reading.
Geolocation is a general term for a bunch of different ways to figure out where a particular device exists in the real world. The most precise of these is embedded GPS. Pretty much all of our phones can receive signals from the GPS satellite network and use that information to understand where they are — it’s how Google Maps shows your position as you sit in traffic during your daily commute. It’s amazing technology, and the speed with which we’ve become dependent on it is stunning.
Most other approaches to positioning are similar; they rely on databases that map some type of identifiable signal to known locations. For your phone that might be cell towers, each of which broadcasts a unique identifier. Combining this data (e.g., from opencellid.org) with real-time signal strength can give some pretty accurate results. You can do the same thing with a location-aware database of wifi networks like the one at wigle.net (the nostalgia behind “wardriving” is strong for this nerd). Even the old WWII-era LORAN system basically worked this way.
But the grand-daddy of location techniques on the Internet is IP-based geolocation, and it remains the most common for locating far-away clients without access to signal-based data. Each device on the Internet has an “IP Address” used to route messages — you can see yours at https://whatsmyip.com/ (ok technically that’s probably your router’s address, but close enough). This address is visible to both sides of a TCP/IP exchange (like a browser making a request to a web server), so if the server has access to a location-aware database of IP addresses, it can estimate the browser’s real-world location. The good folks at ip2location.com have been maintaining exactly this database for years, and insanely they still make a version available for free at https://lite.ip2location.com/.
The good news for IP-based geolocation is that it’s hard to technically spoof an IP address. The bad news is that it’s easy to insert devices between your browser and a server, so spoofing isn’t really even required to hide yourself. The most common approach is to use a virtual private network (“VPN”). With a VPN your browser doesn’t directly connect to the web server at all — instead, it connects to a VPN server and asks it to talk to the real server on your behalf. As far as the server is concerned, you live wherever your VPN server lives.
There are whole companies like NordVPN that deliver VPN services. They maintain thousands of VPN servers — one click makes your browser appear to be anywhere in the world. Great for getting around regional streaming restrictions! And to be fair, a really good way to increase your privacy profile on the Internet. But still, just a teeny bit shady.
There are a few ways to use IP-based location data to restrict who is allowed to visit a website. Most commercial or high-traffic sites sit behind some kind of a firewall, gateway or proxy, and most of these can automatically block traffic using location-based rules. This is actually pretty common, in particular to protect against countries (you know who you are) that tend to be havens for bad actors. Cloud providers like Azure and AWS are making this kind of protection more and more accessible, which is a great thing.
Another approach is to implement blocking at the application level, which is what I’ve done with RuBy. In theory this is super-simple, but there are some interesting quirks of the IP addressing landscape that make it worth some explanation.
But first a quick side note — there are no new ideas, and it turns out that I’m not the only person to have come up with this one. The folks over at redirectrussia.org have a script as well — it’s a little more complicated than mine, and a bit smarter — e.g., they limit web service calls by doing a first check on the browser’s timezone setting. They also allow the site owner to redirect blocked clients to a site of their choosing, whereas I just slap a flag over the page and call game over. Whichever you pick, you’re doing a solid for the good guys.
RuBy as a Web Service
Using the web service is about as simple as it gets; just add that one-line script fragment anywhere on your page and you’re done. Under the covers, what happens is this:
- The web service examines the incoming IP address and compares it to a list of known address ranges coming from Russia and Belarus. If the IP is not in one of those ranges, an empty script is returned and the page renders / behaves normally.
- If the IP is in one of those ranges, the returned script replaces the HTML of the page with a full-window rendering of the Ukrainian flag (complete with official colors #005BBB and #FFD500). I considered redirecting to another site, but preferred the vibe of fully dead-ending the page.
Most systems can pretty easily add script tags to template pages. For ShutdownHook it was a little harder because I was using a subscription plan at WordPress.com that doesn’t allow it. This isn’t a problem if you’re on the “business” plan (I chose to upgrade) or are hosting the WordPress software yourself or anywhere that allows plugins. After upgrading, I used the very nice “Insert Headers and Footers” plugin to insert the script tag into the HEAD section of my pages.
And really, that’s it. Done and done.
RuBy Under the Covers
The lookup code itself lives in RuBy.java. It depends on access to the IP2Location Lite “DB1” database; in particular the IPV6 / CSV version. Now, there are tons of ready-to-go libraries for working with this database, including for Java. I chose to implement my own because RuBy has very specific, simple requirements that lend themselves to a more space- and time-efficient implementation than a general-purpose library. A classic engineering tradeoff — are those benefits worth the costs of implementation and code ownership? In my case I think so, because I’m running the service for free and want to keep hardware costs to a minimum, but there are definitely arguments on both sides.
In a nutshell, RuBy is configured with a database file and a list of countries to block (specified as ISO-3166 alpha-2 codes). It makes a number of assumptions about the format of the data file (listed at the top of the source file), so be careful if you use another data source. Only matching ranges are loaded into an array sorted by the start of the range, and queries are handled by binary-searching into the array to find a potentially matching range and then checking its bounds. For Russia and Belarus, this ends up holding only about 18,000 records in memory, so resource use is pretty trivial.
IP addressing does get a little complicated though; converting text-based addresses to the integer values in the lookup array can be tricky.
Once upon a time we all used “v4” addresses, which you’ve surely seen and look like this: 127.0.0.1. Each of the four numbers are byte values from 0-255, so there are 8 * 4 = 32 bits available for a total of about 4.3 billion unique addresses. Converting these to a number is a simple matter that will look familiar to anyone who ever had to implement “atoi” in an interview setting:
a.b.c.d = (16777216 * a) + (65536 * b) + (256 * c) + d
Except, oops, it turns out that the Internet uses way more than 4.3 billion addresses. Back a few years ago this was the source of much hand-wringing and in fact the last IPv4 addresses were allocated to regional registries more than a decade ago. The long-term solution to the problem was to create “v6” addressing which uses 128 bits and can assign a unique address to a solid fraction of all the atoms that make up planet Earth. They’re pretty ugly (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334), but they do the trick.
Sadly though, change is hard, and IPv4 has stubbornly refused to die — only something like 20-40% of the traffic on the Internet is currently using IPv6. Mostly this is because somebody invented NAT (Network Address Translation) — a simple protocol that allows all of the dozens of network devices in your house or workplace to share a single public IP address. So at least for the foreseeable future, we’ll be living in a world where both versions are out in the wild.
To get the most coverage, we use the IP2Location database that includes both v4 and v6 addresses. All of the range values in this database are specified as v6 values, which we can manage because a v4 address can be converted to v6 just by adding “::FFFF:” to the front. This amounts to adding an offset of 281,470,681,743,360 to its natural value — you can see this and the other gyrations we do in the addressToBigInteger method (and for kicks its reverse in bigIntegerToAddress).
Spread the Word!
Technically, that’s about it — pretty simple at the end of the day. But getting everything lined up cleanly can be a bit of a hassle; I hope that between the service and the code I’ve made it a little easier.
Most importantly, I hope people actually use the code on their own websites. We really are at a critical moment in modern history — are we going to evolve into a global community able to face the big challenges, or will we slide back to 1850 and play pathetic imperialist games until we just extinguish ourselves? My generation hasn’t particularly distinguished itself yet in the face of this stuff, but I’m hopeful that this disaster is blatant enough that we’ll get it right. My call to action:
- If you run a website, consider blocking pariah nations. You can do this with your firewall or gateway, with the RuBy or Redirect Russia scripts, or just roll your own. The only sites I hope we’ll leave open are the ones that might help citizens in these countries learn the truth about what is really happening.
- Share this article with colleagues and friends on social media so they can do the same.
- And even more key, (1) give to causes like MSF that provide humanitarian aid, and (2) make sure our representatives continue supporting Ukraine with lethal aid and punishing Russia/Belarus with increasing sanctions.
If I can help with any of this, just drop me a line and let me know.
Attribution: This site or product includes IP2Location LITE data available from https://lite.ip2location.com.