Alpine Linux: When /tmp is not so temporary
A couple of days ago, one of my machines started reporting that the disk was about to run out of free space. It turned out that /tmp was using multiple gigabytes and contained many files (mostly backup files staged for a borgbackup transfer). I found this a bit strange, yet didn't think too much of it initially and figured a reboot would sort this out. However, it wasn't long before my monitoring script annoyed me again.
So I looked into /tmp and to my surprise, everything was still there. Very strange: according to everything I know, /tmp gets cleared on reboot, either because it's a tmpfs or because init clears it on boot. It has been this way, on any distribution, since forever.
So clearly something must have changed. I checked the last 3 major stable releases and found only a release note entry stating that new installations would now use tmpfs. But that's not exactly my problem. (The text was expanded after I raised an issue.)
Getting to the bottom of this, I located where OpenRC, the init system used by Gentoo and Alpine Linux, runs the cleanup. It's controlled by the wipe_tmp variable in /etc/conf.d/bootmisc. And indeed, Alpine Linux did change that setting. With wipe_tmp="NO", only a handful of files such as locks and Unix domain sockets get deleted.
So, setting "wipe_tmp" to "YES" and rebooting fixed the issue. /tmp was cleared and things are as they were in the good old days. Yay.
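For reference, the fix boils down to the line wipe_tmp="YES" in /etc/conf.d/bootmisc. Since files in /etc/conf.d use shell syntax, the value that is currently set can be checked with a small helper. This is only a sketch: the function name wipe_tmp_setting is made up for this post, and it prints "unset" instead of guessing what OpenRC does when the variable is missing.

```shell
# Hypothetical helper: print the wipe_tmp value from an OpenRC conf.d file.
# The file is sourced in a subshell so nothing leaks into the caller.
wipe_tmp_setting() {
    conf="${1:-/etc/conf.d/bootmisc}"
    (
        wipe_tmp=""
        # Only source the file if it is readable.
        [ -r "$conf" ] && . "$conf"
        printf '%s\n' "${wipe_tmp:-unset}"
    )
}
```

Calling wipe_tmp_setting without arguments reads /etc/conf.d/bootmisc; passing a path reads that file instead.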
Nevertheless, this still leaves one question: Why did they change it? As the patch mentions an issue, let's take a look at it: https://gitlab.alpinelinux.org/alpine/aports/-/issues/13070
A user claims that the script rm -rf'd his machine! Wow.
As a result of this, Alpine Linux suggested that mounting /tmp as tmpfs should be the default and no wiping should occur, resulting of course in a discussion. Although the maintainers wrote some changes, none have been merged yet. While the data loss seemingly was never properly reproduced, there are a few theories about why it happened, such as bind mounts or a corrupted file system, but overall an explanation of why the code ascended to / is missing.
Perhaps surprising is the complexity of the cleanup code. It first tries 'find', apparently because it's supposed to be faster (in the past, users complained about a slow cleanup). Only if 'find' does not work does it fall back to plain 'rm', even then trying to match patterns. One can wonder whether it really has to be this way. Nevertheless, among other things, an option such as -xdev for the find command would not hurt, and neither would "--one-file-system" for rm.
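To illustrate what -xdev buys you, here is a minimal sketch of a cleanup that stays on one file system. To be clear, this is not OpenRC's code: the function name wipe_dir_one_fs and the guard against an empty path or "/" are my own assumptions, and -mindepth and -delete are extensions found in GNU and busybox find rather than POSIX options.

```shell
# Hypothetical helper, not taken from OpenRC: remove everything below a
# directory without descending into other mounted file systems.
wipe_dir_one_fs() {
    dir="$1"
    # Refuse obviously dangerous targets.
    case "$dir" in
        ""|"/") echo "refusing to wipe '$dir'" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || return 1
    # -mindepth 1: keep the directory itself.
    # -xdev: do not descend into directories on other file systems.
    # -delete: remove entries, contents before their parent directory.
    find "$dir" -mindepth 1 -xdev -delete
}
```

With this, a bind mount somewhere below the directory is not descended into, which addresses one of the theories from the issue.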
There are a few takeaways from this story.
(1) Alpine Linux did not initially mention its change, which is a bit unfortunate. As new machines will use tmpfs, this should no longer be an issue for most. However, changing existing installations can be debated. It errs on the side of safety while changing the behaviour on many machines based largely on a scenario that has not been reproduced. Overall, you can argue for this patch, but such a significant change that defies common practice should have been communicated more clearly.
(2) Alpine Linux release notes: There are announcements on the web page and also on the wiki, which are not in sync: https://gitlab.alpinelinux.org/alpine/aports/-/issues/14092#note_254873. This is suboptimal to say the least, as you are more likely to read the web page, in your feed reader for example, and you can hardly be blamed for thinking everything important has been mentioned there already. It's not just the case for this release, so it's something Alpine Linux users ought to keep in mind.
(3) As for upstream OpenRC, so far no patch has been merged to address the situation. Therefore, should more reports of similar incidents with users losing their files arise in the future, the lack of a fix will be criticized and Alpine Linux's decision will have proven to be the better way after all.
(4) tmpfs is the more reasonable option for the general case. It's faster and a quota can be applied easily in the mount options. In the context of this post, it also does not require extra, potentially problematic cleanup code. I actually could have seen myself just enabling the wiping again without switching to tmpfs. After all, the same code ran on my Gentoo for 8+ years and never caused problems on Alpine either. However, despite this being a memory-constrained machine, in the end I saw no good reason not to simply switch to tmpfs.
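As a sketch of what that looks like: the comment below shows an /etc/fstab entry with a size limit, where 256m is just an assumed value and not necessarily what I or Alpine use. The helper underneath, with the made-up name fs_type_of_mount, reads a mounts table in /proc/mounts format to check whether /tmp really ended up on tmpfs.

```shell
# Example /etc/fstab entry (size=256m is an assumed value, adjust to taste):
#   tmpfs  /tmp  tmpfs  rw,nosuid,nodev,size=256m,mode=1777  0  0

# Hypothetical helper: print the file system type mounted at a path.
# Prints nothing if the path is not a mount point in the table.
fs_type_of_mount() {
    target="$1"
    table="${2:-/proc/mounts}"
    # Fields: device, mount point, type, options, dump, pass.
    # Later entries shadow earlier ones, so the last match is kept.
    awk -v t="$target" '$2 == t { type = $3 } END { if (type != "") print type }' "$table"
}
```

After a reboot, fs_type_of_mount /tmp should print tmpfs if the fstab entry took effect.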
(5) What does the FHS say about /tmp? It turns out that clearing it on reboot is not a requirement, merely a recommendation. So for what it's worth, the change by Alpine Linux does not violate the FHS.
Way over a decade ago, a book taught me that programs that pollute /tmp with large files should simply be thrown away. It's not wrong. You are not supposed to do that.
Yet I created a script that's hardly better. While the backup files created by my scripts aren't large, many small ones accumulate over months. They get stored in /tmp before being transferred to the backup destination. It worked without problems for years, as I regularly reboot the machine for updates, so /tmp would be cleaned sooner or later. However, the fact that my scripts relied on that could have backfired if it weren't for the monitoring script. It's a good example of how a seemingly innocent decision that "just works", or something you didn't give much thought to at all, may come back to haunt you even years later due to changes in the environment your code runs in. Now, a simple "find . -xdev -mtime +2 -delete" in the backup dir avoids the problem.
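Wrapped into a function, the cleanup could look roughly like this. It's a sketch rather than my actual script: the name prune_staging is made up, and restricting the match to regular files with -type f is an addition that the one-liner above does not have. Note that -mtime +2 counts whole 24-hour periods, so with GNU find a file has to be at least three days old to match.

```shell
# Hypothetical helper: delete regular files older than a number of days
# from a staging directory, without crossing file system boundaries.
prune_staging() {
    dir="$1"
    days="${2:-2}"
    # Do nothing unless an existing directory was given.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    # -type f is an extra safety net: directories are left alone.
    find "$dir" -xdev -type f -mtime +"$days" -delete
}
```

The backup script can call this on its staging directory after each run, so the cleanup no longer depends on a reboot.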
Overall, nothing terrible happened here. The change made by Alpine for existing installations is debatable, but assuming that programs and scripts writing to /tmp behave properly, it shouldn't cause issues. It's thus my own backup script's fault that this could have become an issue. However, I wish such a significant deviation from common practice had been communicated more clearly by Alpine.