Schroedinger's Honeypot

I’ve moved my website back to a self-managed server. c0t0d0s0.org now runs at netcup¹. I was getting increasingly annoyed at not being able to do anything administratively about the things I was observing in my logfile, for example cutting down the background noise in the Apache logfile by blocking accesses on an IP basis.

One of the problems was the anonymisation of the logfile. I knew that I was getting thousands of requests hitting WordPress components at times. But I haven’t used WordPress at all for almost 20 years. I switched to s9y afterwards and used it for years. And since the migration to Jekyll in 2021, not a single PHP file has been necessary for the operation of the blog. So I knew for sure all those WordPress .php accesses weren’t kosher.

However, thanks to the anonymisation in the Strato logfiles there was little I could do about it, because I only had a very imprecise idea of where they were coming from. That made it impossible to implement firewall blocks² or aggressive Apache access controls. The collateral damage would have been too large. The bot operators have also become too clever by now to offer any other distinguishing feature by which you could reliably identify and block them.

A dilemma arose. I don’t want to collect any PII, but IP addresses count as PII. So my new configuration on my own webserver would also have to anonymise the IP addresses.

At the same time, though, I wanted to be able to surgically block a single IP address at the network level. But for that I need the complete, non-anonymised IP address.

I have a solution that addresses both requirements. I answer all that nonsense traffic aimed at WordPress components³ with a 403 anyway. I could of course also let the requests run into a 404, but in my case that’s a page styled like the rest of the blog, which would generate a lot of follow-up traffic. Suboptimal. Let me show you the part of my .htaccess that handles all the WordPress scans.

RewriteCond %{REQUEST_URI} /wp-[a-z-]+\.php [NC]
RewriteRule .* - [E=honeypot:1,F,L]

RewriteCond %{REQUEST_URI} (wp-admin|wp-json|wp-signup|wp-cron) [NC]
RewriteRule .* - [E=honeypot:1,F,L]

RewriteCond %{REQUEST_URI} xmlrpc\.php$ [NC]
RewriteRule .* - [E=honeypot:1,F,L]

RewriteCond %{REQUEST_URI} wlwmanifest\.xml$ [NC]
RewriteRule .* - [E=honeypot:1,F,L]

RewriteCond %{REQUEST_URI} (wp-includes|wp-content) [NC]
RewriteRule .* - [E=honeypot:1,F,L]

RewriteCond %{REQUEST_URI} wp-config [NC]
RewriteRule .* - [E=honeypot:1,F,L]

I could surely fold the rules into a single one, but the attempt at doing so looked extremely messy and unmaintainable. If the multiple regexps ever cause problems down the road, I can still rebuild it then. Besides, I’m potentially saving a lot of requests from hitting the webserver daemon in the first place. I think I still come out ahead on net.
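Before deploying rules like these, it can help to run a list of your own URLs through the same patterns to see what would be flagged. The following is a minimal sketch in Python, not part of my setup: it copies the six patterns from above and uses `re.IGNORECASE` as a stand-in for the `[NC]` flag. The sample paths are made up for illustration.

```python
import re

# The six RewriteCond patterns from the .htaccess above, copied verbatim.
# [NC] in mod_rewrite corresponds to re.IGNORECASE here.
HONEYPOT_PATTERNS = [
    r"/wp-[a-z-]+\.php",
    r"(wp-admin|wp-json|wp-signup|wp-cron)",
    r"xmlrpc\.php$",
    r"wlwmanifest\.xml$",
    r"(wp-includes|wp-content)",
    r"wp-config",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in HONEYPOT_PATTERNS]


def is_honeypot_hit(request_uri: str) -> bool:
    """Return True if any pattern matches somewhere in the request path."""
    return any(rx.search(request_uri) for rx in COMPILED)


if __name__ == "__main__":
    # Hypothetical sample paths, not taken from a real log.
    samples = [
        "/wp-login.php",
        "/blog/wp-includes/wlwmanifest.xml",
        "/XMLRPC.PHP",
        "/2021/01/01/some-post.html",
        "/feed.xml",
    ]
    for path in samples:
        verdict = "honeypot" if is_honeypot_hit(path) else "ok"
        print(f"{path:40} {verdict}")
```

Because the patterns are not anchored, a legitimate URL that merely contains a string like wp-content somewhere in its path would be flagged as well. If your site has such URLs, a check like this should surface them before Fail2Ban does.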

The decisive bit here is the [E=honeypot:1,F,L]. L to end the discussion with mod_rewrite, and F to throw a 403. The E sets the environment variable honeypot.

It’s Schroedinger’s honeypot, so to speak. Without checking, a scanner doesn’t know whether the file is there or not. The honeypot sits in a superposition. I might be running a WordPress blog — or I might not. The scanner has to look, and gets a 403.

The interesting information for me is the fact that it asked at all: by doing so, the scanner has told me it’s a scanner, because there’s no other reason to be looking for WordPress artefacts on my Jekyll site.

You’d actually expect the 403 to be taken as a signal: “I know what you’re doing. I’ve done something about it. Stop it. Now!” But the scanners keep trying over and over.

So with those rules I’ve marked the requests that may not necessarily be malicious, but are certainly questionable. That’s something I can work with.

In a second step, I use conditional logging.

    LogFormat "%a %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined_anon
    CustomLog "|/usr/local/bin/anonymise-log /var/log/apache2/c0d0s0.org-access.log" combined_anon
    ErrorLog "|/usr/local/bin/anonymise-errorlog /var/log/apache2/c0d0s0.org-error.log"

These three lines set up the normal logging. I pipe the log through an AWK script that anonymises the IP addresses. For IPv4 addresses, that simply means replacing the last octet with a zero. For IPv6, I keep only the /48 prefix and zero the rest.
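To make the anonymisation rule concrete, here is a minimal sketch of the same idea in Python. It is not the AWK script I actually use, and the function names are invented for this example. It assumes the client address is the first space-separated field of each log line, as in the LogFormat above.

```python
import ipaddress


def anonymise_ip(addr: str) -> str:
    """Zero the last octet of an IPv4 address, or keep only the /48 of an IPv6 address."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its enclosing network back.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def anonymise_line(line: str) -> str:
    """Anonymise the first field of a log line; leave the line alone if it is not an IP."""
    first, sep, rest = line.partition(" ")
    try:
        return anonymise_ip(first) + sep + rest
    except ValueError:
        return line


if __name__ == "__main__":
    # Addresses from the documentation ranges, used purely as examples.
    print(anonymise_ip("198.51.100.77"))
    print(anonymise_ip("2001:db8:1234:5678::1"))
```

A piped logger would read lines from standard input and append the result of `anonymise_line` to the log file given as its argument.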

In the configuration fragments below you’ll find an IP address 203.0.113.1. That’s a placeholder for your own IP, the one you reach your own webserver from. In most cases this will be the external IP of your router.

Please add the following lines to the VHost configuration of your webserver. You have to repeat this in the TLS section of the VHost config.

    SetEnvIf REDIRECT_honeypot 1 honeypot
    SetEnvIf Remote_Addr "^127\." !honeypot
    SetEnvIf Remote_Addr "^::1$" !honeypot
    SetEnvIf Remote_Addr "^203\.0\.113\.1$" !honeypot
    

and

    LogFormat "%h %{%Y-%m-%dT%H:%M:%S%z}t \"%r\" %>s \"%{User-Agent}i\"" honeypot
    CustomLog /var/log/apache2/honeypot.log honeypot env=honeypot

The REDIRECT_ in the construct SetEnvIf REDIRECT_honeypot 1 honeypot cost me a moment and some debugging.

The F flag in the rewrite rule doesn’t produce a direct 403 internally; instead it produces an internal redirect to the error document. During that internal redirect, Apache prefixes all environment variables of the original request with REDIRECT_. That’s how honeypot turns into REDIRECT_honeypot. If you don’t notice this, you’ll spend hours trying SetEnvIf honeypot 1 honeypot and wonder with an increasingly furrowed brow and dwindling patience why nothing ever ends up in the log. This behaviour is documented.

Once the environment variable is set, a second log kicks in. And that one doesn’t anonymise the IP numbers.

At this point I’d like to mention that you can protect yourself even further against locking yourself out.

To do so, add the following lines to the .htaccess right at the very top, after enabling the rewrite engine:

RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.1$ 
RewriteRule .* - [E=honeypot_whitelist:1]

On its own, this rule only sets the environment variable honeypot_whitelist. The VHost configuration below uses it to unset honeypot again, whatever else happens further down the line.

In the VHost configuration you can then insert the following instead of the SetEnvIf block from before.

SetEnvIf REDIRECT_honeypot 1 honeypot
SetEnvIf REDIRECT_honeypot_whitelist 1 !honeypot 
SetEnvIf honeypot_whitelist 1 !honeypot
SetEnvIf Remote_Addr "^127\." !honeypot
SetEnvIf Remote_Addr "^::1$" !honeypot
SetEnvIf Remote_Addr "^203\.0\.113\.1$" !honeypot

The charm of this optional configuration is that your own IP is also stored in the .htaccess. That file is often part of the deployment process. It’s easier to transfer along with the website via rsync than to modify the Apache configuration on the webserver via SSH.
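The order of the SetEnvIf lines matters: the first one sets honeypot, and every later match removes it again. Here is a toy model of that evaluation in Python. It is only an illustration of the logic, not of how Apache implements it, and the function name and the env dictionary are assumptions made for this sketch.

```python
import re

# (attribute, regex, action) in the same order as the SetEnvIf lines above.
RULES = [
    ("REDIRECT_honeypot", r"1", "set"),
    ("REDIRECT_honeypot_whitelist", r"1", "unset"),
    ("honeypot_whitelist", r"1", "unset"),
    ("Remote_Addr", r"^127\.", "unset"),
    ("Remote_Addr", r"^::1$", "unset"),
    ("Remote_Addr", r"^203\.0\.113\.1$", "unset"),
]


def goes_to_honeypot_log(env: dict, remote_addr: str) -> bool:
    """Apply the rules top to bottom; the last matching rule decides."""
    honeypot = False
    for attribute, pattern, action in RULES:
        if attribute == "Remote_Addr":
            value = remote_addr
        else:
            value = env.get(attribute)
        if value is not None and re.search(pattern, value):
            honeypot = action == "set"
    return honeypot


if __name__ == "__main__":
    flagged = {"REDIRECT_honeypot": "1"}
    print(goes_to_honeypot_log(flagged, "198.51.100.23"))  # scanner from elsewhere
    print(goes_to_honeypot_log(flagged, "203.0.113.1"))    # my own placeholder IP
```

In this model a flagged request from a foreign address ends up in the honeypot log, while the same request from the whitelisted address does not.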

Now I can take this log and drop it at Fail2Ban’s feet. Configuring it can be dead simple: one strike and you’re out. If an IP address shows up in this logfile, I don’t have to give it any benefit of the doubt about maybe being a legitimate user. I know it isn’t.

The Fail2Ban configuration therefore looks like this:

# cat /etc/fail2ban/jail.d/apache-honeypot.conf

[apache-honeypot]
enabled   = true
filter    = apache-honeypot
logpath   = /var/log/apache2/honeypot.log
maxretry  = 1
findtime  = 60
bantime   = 86399
banaction = nftables-multiport
port      = http,https
protocol  = tcp

ignoreip  = 127.0.0.1/8 ::1 203.0.113.1

The ignoreip line isn’t strictly necessary, but it’s a second⁴ safety net in case I mess something up in the Apache config file. fail2ban has sent me to the console too many times because I managed to lock myself out. I’m not taking any more chances with that.

And then a filter rule that simply looks for the IP address at the beginning.

# cat /etc/fail2ban/filter.d/apache-honeypot.conf
[Definition]
failregex = ^<HOST>\s
ignoreregex =
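To see what the filter and the ignoreip list do together, here is a rough Python sketch. It does not use Fail2Ban's real `<HOST>` expansion, which is considerably more elaborate; as a simplifying assumption it takes the first whitespace-delimited token and validates it with the `ipaddress` module. The sample log line is made up to match the honeypot LogFormat.

```python
import ipaddress
import re

# Simplified stand-in for "^<HOST>\s": capture the first token of the line.
FAILREGEX = re.compile(r"^(?P<host>\S+)\s")

# Same entries as the ignoreip line in the jail above.
IGNOREIP = [
    ipaddress.ip_network(entry, strict=False)
    for entry in ("127.0.0.1/8", "::1", "203.0.113.1")
]


def host_to_ban(line: str):
    """Return the IP this line would get banned for, or None."""
    match = FAILREGEX.match(line)
    if not match:
        return None
    try:
        ip = ipaddress.ip_address(match.group("host"))
    except ValueError:
        return None  # first field is not an IP address
    if any(ip in network for network in IGNOREIP):
        return None  # safety net: never ban these
    return str(ip)


if __name__ == "__main__":
    # Hypothetical log line in the honeypot format.
    sample = ('198.51.100.23 2025-01-01T12:00:00+0100 '
              '"GET /wp-login.php HTTP/1.1" 403 "ExampleBot/1.0"')
    print(host_to_ban(sample))
```

For the real thing, Fail2Ban ships the `fail2ban-regex` tool, which takes a logfile and a filter file and reports how many lines match. That is the more trustworthy test, since it uses the actual `<HOST>` pattern.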

With that, every IP address caught red-handed in a scan gets blocked. A scanner can still get quite a few requests through, because it can take a second until fail2ban has added the firewall rule. Depending on how fast the scanner fires its requests, a handful of them arrive before the door slams shut and the “You shall not pass” is pronounced.

There are ways to speed up the processing, but since the request itself is already stopped by the 403, I didn’t want to raise the complexity of the solution.

Since then, my logfile has been a lot quieter. In return, though, the firewall configuration grows over time. At the time I’m writing this article, I have 39 entries after running this configuration for 3 hours. Let’s check with nft list ruleset:

	set addr-set-apache-honeypot {
		type ipv4_addr
		elements = { aaa.bbb.ccc.ddd, eee.fff.ggg.hhh,
			     [...]
			     qqq.xxx.yyy.zzz }
	}

For anyone wondering about the missing ports in that structure: They are defined in the f2b-chain nftables chain.

	chain f2b-chain {
		type filter hook input priority filter - 1; policy accept;
		[...]
		tcp dport { 80, 443 } ip saddr @addr-set-apache-honeypot reject with icmp port-unreachable
	}

The reason I use multiport here and not allport is that on top of belt and braces I also brought in double-sided tape on the waistband.⁵ Even if this mechanism accidentally blocks my own IP, only the webserver is affected. ssh stays available. For locking myself out of SSH, I have another Fail2Ban configuration.

¹ There are people whose technical judgement I trust, and they trust netcup. So I decided to trust them as well.
² Which I couldn’t have configured anyway, because I didn’t have admin access to the system.
³ requesting a lot of files like .env or .git
⁴ Or third …
⁵ In case you were wondering how Kylie Minogue’s costume in “Can’t Get You Out of My Head” stayed in place … it was explained to me: “double-sided tape”.