Fun with Gentoo: Why don't we just shuffle those ROP gadgets away?

2023-01-26

Update 2023-02-05: Mention clang / lld alternative approach

Introduction

Recently, I stumbled upon an OpenBSD effort that attempts to make ROP-based exploitation of sshd harder: sshd random relinking at boot.
It comes down to this line in the Makefile:
cc -o sshd `echo ${OBJS} | tr ' ' '\n' | sort -R` ${LDADD}
The essence of the idea is to simply pass the .o files to the linker in a random order, so their order inside the sshd binary won't be predictable. On reboot, OpenBSD relinks the binary. This ensures that it will differ between OpenBSD installations and thus, offsets for ROP gadgets will vary too. The idea is that this has the potential to make an attacker's life harder, as a standard ROP attack requires inspecting the target binaries.
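
As a rough illustration of the same idea outside of a Makefile, the following sketch shuffles a list of object files before they would be handed to the linker. The helper name shuffle_objs and the file names are made up for this example, and it uses shuf from GNU coreutils rather than OpenBSD's sort -R:

```sh
#!/bin/sh
# shuffle_objs is a hypothetical helper, not part of OpenBSD's build system.
# It prints its arguments one per line in random order (shuf: GNU coreutils).
shuffle_objs() {
    printf '%s\n' "$@" | shuf
}

# Made-up object names; a real link step would look something like:
#   cc -o prog $(shuffle_objs *.o) $LDADD
shuffle_objs auth.o session.o packet.o channels.o | tr '\n' ' '
echo
```

Every run prints the same four names, but most likely in a different order.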

The basic idea is not new: OpenBSD has been doing this since 2017 for the kernel and also for libc.
Modifying the internal structure of binaries to defend against ROP is something that actually goes way back. What we have here is, in its effect, comparable to weak versions of fine-grained ASLR approaches.

Yet its simplicity is what sold it to me. So I wondered: Can I have that too on my Gentoo machine? Also, why not randomly link the whole system?

Time to hack!

Explanation

First, let's get an idea how this differs from ASLR:


#include <stdio.h>
#include <stdlib.h>

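int main(void){

/* Storing function addresses in void * is a common extension (POSIX relies on it). */
void *fp = (void *)&printf;
void *fp2 = (void *)&malloc;

printf("printf addr: %p\n", fp);
printf("malloc addr: %p\n", fp2);
/* Subtract as char *, since arithmetic on void * is a GNU extension. */
printf("Offset: %li\n", (long)((char *)fp2 - (char *)fp));
return 0;
}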

$ ./a.out
printf addr: 0x7f4a61149460
malloc addr: 0x7f4a611efef0
Offset: 682640
$ ./a.out
printf addr: 0x7f3dd043a460
malloc addr: 0x7f3dd04e0ef0
Offset: 682640
Clearly, the addresses changed between both runs. ASLR, on Linux, does not shuffle around functions. The offset therefore remained the same.

Shuffling .o files

Let's create a quick example:

a.h:
int funca();
b.h:
int funcb();
a.c:
#include "a.h"
int funca(){ return 1; }
b.c:
#include "b.h"
int funcb(){ return 2; }
main.c:

#include <stdio.h>
#include "a.h"
#include "b.h"

int main(void)
{
	void *a = &funca;
	void *b = &funcb;

	printf("funca address: %p\n", a);
	printf("funcb address: %p\n", b);

	/* Subtract as char *, since arithmetic on void * is a GNU extension. */
	printf("Offset: %li\n", (long)((char *)a - (char *)b));
}
Let's compile them:
$ gcc -c main.c a.c b.c
Let's link it once:

$ gcc main.o a.o b.o
$ ./a.out
funca address: 0x55bebb56f1af
funcb address: 0x55bebb56f1ba
Offset: -11
Let's reorder b.o and a.o:

$ gcc main.o b.o a.o
$ ./a.out
funca address: 0x55ccba9791ba
funcb address: 0x55ccba9791af
Offset: 11
The offset changed simply by switching the order of the .o files.

Imagine a large project with hundreds of .o files. If we randomize their order before linking, an attacker can no longer conclude where an instruction from another .o resides. The structure of the binary, and thus also the offset to gadgets, has become much less predictable. Of course, all this is useless if the attacker has access to the binary. The idea is that this won't be the case, either due to relinking (OpenBSD's approach) or because we do not distribute it.


Shuffle the planet!

Doing it on the binary only has a limited effect. Ideally, we would want to shuffle its dependencies too.

But how are we going to do that?

A perfect task for Gentoo

If a binary distro decided to apply this approach, it would be pretty ineffective. An attacker would simply download the packages and we would have gained nothing.

Therefore, it would need to implement the "relink on reboot" concept as OpenBSD does, or at least ship compiled .o files and perform the shuffled link on the client that installs a package. While technically doable, it does not seem very practical.

This approach is much more practical on source-based distributions like Gentoo. You build the software you need from source, most often on the machine where it will run. As a side-effect, reproducible builds, which this technique breaks, are less of a concern anyway (because you've compiled your system from source).

Patch every Makefile?

So if you want to build every package from source, you are left with a problem: How do you actually pass the .o files in a shuffled order?

OpenBSD simply patches the Makefile of sshd. You can do this if you plan on selectively deploying this technique. But if you want to randomize the order of your whole system, are you going to patch around Makefiles etc. of every package you have installed?

ld wrapper
So my first idea was to simply patch ld in binutils. This way, it could be done centrally and no patching of Makefiles would be required. However, the thought of messing with its code base, and thus carrying a binutils patch around forever, was not particularly appealing to me.

Ultimately, gcc calls the linker. I initially renamed the "ld" called by gcc and made it a Python wrapper that would reorder the .o files before passing them to the real ld binary. This worked, but was still a bit dirty. Turns out there is a slightly cleaner way...

shuffleld: A wrapper script for gcc

So while making "ld" a wrapper worked, we can do better.

gcc is composed of several subcommands, such as "cc1", "as" and "collect2" (linker wrapper). When gcc does its thing, it calls them. It is however possible to influence this process using the "-wrapper" option.

To see what's going on, you can compile a hello world file like this:

gcc hello.c -o hello -wrapper echo
This would print (without doing anything) the commands and arguments gcc would use to actually produce a binary.
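
gcc puts the wrapper in front of every subprogram it runs, so the wrapper receives the subprogram's path as its first argument, followed by that subprogram's own arguments. A minimal logging wrapper, as a sketch (the function name log_and_run is made up here), could look like this:

```sh
#!/bin/sh
# Hypothetical logging wrapper for gcc's -wrapper option.
# $1 is the subprogram gcc wants to run (cc1, as, collect2, ...),
# the remaining arguments belong to that subprogram.
log_and_run() {
    printf 'gcc runs: %s (%d args)\n' "${1##*/}" "$(($# - 1))" >&2
    "$@"    # run the subprogram unchanged
}

# When installed as a script and used via "gcc ... -wrapper /path/to/script",
# the last line would be:
#   log_and_run "$@"
```

Unlike "-wrapper echo", this still produces a binary, because each subprogram is actually executed after being logged.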

As it turns out, the -wrapper switch is perfect for my needs. It allows reordering .o files without requiring any patches. So shuffleld was born, which intercepts the "collect2" call of gcc. It doesn't care about the "cc1" calls, as we only want to tinker with the linker call, not with the compilation stage. The current state (and possibly final, it's a hack after all) of shuffleld can be found in its GitHub repo.
A potential issue comes from the assumption that all .o files will be given contiguously on the command line. The assumption appears to hold, but could blow up down the road. But well, it's a hack.
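
To make the mechanism more concrete, here is a sketch of what such a wrapper could look like. It is not the actual shuffleld code, and all function names are made up. It also takes a slightly different route: instead of relying on the .o files forming one block, it permutes them among the positions they already occupy. Treating every file named like *crt*.o as a startup object is an assumption of this sketch, and it handles neither arguments containing newlines nor an output file that itself ends in .o:

```sh
#!/bin/sh
# Sketch of a gcc -wrapper script. This is NOT the actual shuffleld code.

# Succeeds for plain object files; startup objects (crt1.o, crtbeginS.o, ...)
# are excluded by a simple name pattern, which is an assumption of this sketch.
is_shuffle_obj() {
    case "${1##*/}" in
        *crt*.o) return 1 ;;
        *.o)     return 0 ;;
    esac
    return 1
}

# Print all arguments one per line. Object files are permuted among the
# positions they already occupy; every other argument keeps its place.
reorder_args() {
    for arg in "$@"; do
        if is_shuffle_obj "$arg"; then printf '%s\n' "$arg"; fi
    done | shuf | {
        for arg in "$@"; do
            # Swap each object slot for the next line of the shuffled list.
            if is_shuffle_obj "$arg"; then IFS= read -r arg; fi
            printf '%s\n' "$arg"
        done
    }
}

# Entry point: $1 is the subprogram chosen by gcc, the rest are its arguments.
run_wrapped() {
    prog=$1; shift
    if [ "${prog##*/}" = collect2 ]; then
        old_ifs=$IFS
        IFS='
'
        set -f                       # no globbing while splitting on newlines
        set -- $(reorder_args "$@")
        set +f
        IFS=$old_ifs
    fi
    exec "$prog" "$@"
}

# As a real wrapper script, the last line would be:
#   run_wrapped "$@"
```

Only the collect2 invocation is modified; cc1, as and everything else are executed with their arguments untouched.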

The rest is easy on Gentoo: You set LDFLAGS="${LDFLAGS} -wrapper /path/to/shuffleld" and you are basically done. Packages with those LDFLAGS will have their .o reordered when linked. You can do this in your make.conf or on a per-package basis using the package.env of emerge (see appendix at the end).

Experiment and Analysis

Figuring out how to measure this to make some general conclusions is beyond the scope of a hack. But still we want to have some idea what we get.

To this end, let's build openssl twice without shuffleld active and use ROPgadget to extract some gadgets. Due to different compiler versions, Gentoo USE flags and compiler options, numbers will vary between systems:


$ emerge openssl
$ ROPgadget --binary=/usr/lib64/libssl.so.1.1 --ropchain > result1.txt
$ emerge openssl
$ ROPgadget --binary=/usr/lib64/libssl.so.1.1 --ropchain > result2.txt
$ diff result1.txt result2.txt
We get silence from diff: no changes at all, because the linking order is deterministic. This was to be expected, and it is exactly what we want to change.

Activating our LDFLAGS modification from above, and running the same:


$ emerge openssl
$ ROPgadget --binary=/usr/lib64/libssl.so.1.1 --ropchain > result1.shuffleld.txt
$ emerge openssl
$ ROPgadget --binary=/usr/lib64/libssl.so.1.1 --ropchain >  result2.shuffleld.txt
Inspecting the output files, we can clearly see addresses of the gadgets changed this time. This is what we wanted.

If we diff the two files now, our console will be spammed. To get a feeling for how many gadgets remain at the same address, we can run:


grep "^0x" result1.shuffleld.txt > result1.shuffleld.gadgets.txt
grep "^0x" result2.shuffleld.txt > result2.shuffleld.gadgets.txt

cat result1.shuffleld.gadgets.txt | while read line ; do 
grep "$line" result2.shuffleld.gadgets.txt  
done | wc -l
Which gives us 581 gadgets that remain at the same offset, as opposed to 19634 before shuffleld. So ~3% of gadgets stayed in place for this openssl example.
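
The loop above hands every line to grep as a pattern, so characters such as [ or * in a gadget are interpreted as regular expression syntax. As a possible alternative, the following sketch counts exact whole-line matches instead and computes the percentage. The helper names count_common and percent are made up, and the resulting count may differ slightly from the number above:

```sh
#!/bin/sh
# count_common is a hypothetical helper: it prints how many distinct lines
# occur in both files, comparing whole lines literally (address and gadget).
count_common() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d | wc -l
}

# percent PART TOTAL: PART as a percentage of TOTAL, two decimals.
percent() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * p / t }'
}

# With the numbers reported above, this prints 2.96:
percent 581 19634
```

Usage would be along the lines of: count_common result1.shuffleld.gadgets.txt result2.shuffleld.gadgets.txt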

Performing the same experiment with qswiki I got 0.57%.

glibc
Of course, the most important target for this experiment should be glibc. However, glibc cannot be shuffled with this wrapper. It requires a small Makefile patch.
In my test, ~0.10% gadgets remained at the same offset as opposed to 100% without this patch.

So why did some gadgets offsets not change?
Even if you compile an empty C program, you end up with code that is not yours:


cd /tmp
echo "int main(){}" > a.c
gcc /tmp/a.c
objdump -D a.out
The linker will attach pre-compiled code, for example glibc startup code or gcc crt*.o files. The shuffleld wrapper also does not reorder those. I experienced crashes when doing so, which doesn't surprise me, so I decided not to touch them.

Naturally, somebody came up with an attack against this in the past, calling it "return2csu": https://i.blackhat.com/briefings/asia/2018/asia-18-Marco-return-to-csu-a-new-method-to-bypass-the-64-bit-Linux-ASLR-wp.pdf. However, recent glibc versions have hardened the startup code: https://sourceware.org/bugzilla/show_bug.cgi?id=23323. Thus, it remains to be seen what can be squeezed out these days.

Conclusion

Seemingly, this looks great. We are changing the offset of most gadgets and could decide to proudly present these results to the world. But can we?

Not quite. The analysis merely confirms we shuffled the gadgets found by ROPgadget. The fact you moved 99.90% may sound good, but maybe the rest contains all that is needed for an attack?

Even if remaining gadgets weren't a problem, the situation is like this:
- It's useless against local attackers who can read the binary or library. You could make a binary chmod o=x (as OpenBSD does with sshd), but not a library.
- An arbitrary read is devastating to this "mitigation".
- It's not the most fine-grained approach. Reordering .o files is not the same as reordering functions. It would be if every single function were placed in a separate .o, but while this depends on the project, it rarely occurs. Therefore, relative offsets inside a .o remain known. You can say this is a poor man's attempt at fine-grained ASLR.
- The glibc example showed that patching Makefiles may be required for some projects, so it may not be a simple "flip a switch" solution I hoped for.

But is this all in vain?

- A less-predictable memory structure is preferable over a predictable one. A single pointer leak gets you less.
- No runtime overhead (unless reordering symbols causes additional cache misses as a side effect, but that is nothing I am worrying about).
- It's dirt-cheap and not complex.

There is a certain irony in applying this hack on Gentoo though. While practical, it's not the distro that needs this the most. Due to differences in compiler versions, USE flags and compiler flags such as -march=, binaries will differ between Gentoo users anyway. Targeting Gentoo is therefore harder than, say, Ubuntu, where the set of binaries would practically be equal among users.

There are BROP (Blind ROP) approaches, which would allow ROP exploits without having access to the binary (http://www.scs.stanford.edu/brop/bittau-brop.pdf). The paper even mentions Gentoo as a target. While cool, it assumes processes automatically restart after a crash and that execve() is not involved in that, or that it doesn't randomize the address space. However, this doesn't quite reflect real-world conditions. There are also challenges in replicating that attack. So I consider this rather theoretical.

So to conclude: (1) It's not a silver bullet. (2) It should raise the bar, in particular for remote attacks. (3) I don't see what there is to lose.

Therefore, I will let emerge apply the glibc patch for me. I'll enable shuffleld for certain packages and, if nothing weird happens, eventually globally. If there are more libraries like glibc which require patches, so be it, but I probably won't bother. Should I get bored one day, I might see what can be done about the gadgets resulting from gcc crt files etc.

Alternatives
selfrando
selfrando offers a stronger version, attempting to randomize each function. For a while it was included in hardened Tor Browser builds, but it was removed. The exact reason was not given. The removal does not necessarily imply that it's ineffective in general: a browser attacker operates under different conditions than, say, an attacker against a network daemon. selfrando changes the binary on start, so it may be viable for binary distros. However, this might have a price in terms of performance.

Update 2023-02-05: clang / lld
A reader reached out to me and suggested an alternative approach using clang + the lld linker. The -ffunction-sections flag can be used to emit an ELF section for each function (this flag is also available on gcc). The lld linker has the --shuffle-sections=*=0 option which would shuffle those sections randomly. This approach is much less of a hack, since it doesn't require a shaky wrapper. Unlike the shuffleld hack, this is a more fine-grained approach as relative function offsets in .o files also change. Thanks koromix for letting me know!

Example:

clang-15 source.c -ffunction-sections -Wl,--shuffle-sections=*=0 -fuse-ld=lld
Other options such as -fdata-sections might improve on this, as it would change the offsets of static storage duration variables. As the flags may come with penalties (larger binaries, maybe also performance), enabling them globally may not be advisable.

Since I build some packages with clang, I have added those options to my clang emerge env. Whether these flags might do something for glibc (e.g. by trying to build it with gcc while using lld as the linker with the shuffle option) is something I might look into at some point.

Appendix: Gentoo environment

glibc Makefile patch
mkdir -p /etc/portage/patches/sys-libs/glibc
wget https://quitesimple.org/dl/glibc-shuffle.patch -O /etc/portage/patches/sys-libs/glibc/glibc-shuffle.patch


shuffleld env
echo 'LDFLAGS="${LDFLAGS} -wrapper /usr/local/bin/shuffleld"' > /etc/portage/env/shuffleld
echo 'dev-libs/openssl shuffleld' > /etc/portage/package.env/shuffleld
chmod o=r /etc/portage/env/shuffleld /etc/portage/package.env/shuffleld

For comments/remarks: Contact me