Possibly a bug in glibc around the getrandom(2) implementation.

Marcin Mielniczuk marmistrz.dev@zoho.eu
Fri Jul 14 13:11:00 GMT 2017


Hi!

While developing a ptrace-based utility, I came across some really weird
behavior.
After a lot of research on this topic, I believe that what I am seeing
is a bug in either glibc or the compiler.
I'm starting here, since glibc is a little higher-level.

First of all, I'm eager to dig deeper into this if you can guide me.
This is basically part of the project I'm working on, and I'm certainly
willing to keep hunting the bug.

Short statement of the problem: a Python script run by the CPython
interpreter smashes the stack if it is traced using ptrace. More details
below.
I developed a small utility to trace and intercept the getrandom(2)
syscall. Its original implementation is in Rust, but for the sake of
debugging I rewrote it in C; either way, it goes through the same
ptrace(2) API.
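For reference, the core of such a tracer looks roughly like the sketch
below. It is a simplified, x86-64-only illustration, not the code from
[1], and trace_getrandom is a hypothetical name. It stops the child at
every syscall entry and exit with PTRACE_SYSCALL, reads the registers at
entry with PTRACE_GETREGS, and reports getrandom(2) calls (syscall
number in orig_rax, buffer in RDI, length in RSI, flags in RDX). The
PTRACE_O_TRACESYSGOOD and PTRACE_O_TRACEEXEC options are set so that
syscall stops and exec stops can be told apart.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch, x86-64 only: run argv under ptrace, print every getrandom(2)
 * entry to stderr and return how many were seen, or -1 on error. */
static int trace_getrandom(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Become a tracee; the execvp() below then stops us with SIGTRAP. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* TRACESYSGOOD reports syscall stops as SIGTRAP|0x80; TRACEEXEC reports
     * later execve() calls by the tracee (e.g. env -> python) as events. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC)) < 0)
        return -1;

    int in_syscall = 0, calls = 0, sig = 0;
    for (;;) {
        /* Resume until the next syscall entry/exit, delivering any pending signal. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) < 0)
            return -1;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return calls;

        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            in_syscall = !in_syscall; /* entry and exit stops alternate */
            struct user_regs_struct regs;
            if (in_syscall && ptrace(PTRACE_GETREGS, child, NULL, &regs) == 0 &&
                regs.orig_rax == SYS_getrandom) {
                fprintf(stderr, "getrandom(buf=%#llx, len=%llu, flags=%llu)\n",
                        regs.rdi, regs.rsi, regs.rdx);
                calls++;
            }
        } else if ((status >> 16) == 0) {
            sig = WSTOPSIG(status); /* ordinary signal: pass it on */
        }
        /* Anything else is a ptrace event stop (here: exec); no toggle. */
    }
}
```

For example, `char *args[] = {"./pi.py", NULL};` passed to
trace_getrandom() would trace the script through its shebang line.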

The C utility [1] invokes a Python script [2]. If the Python script is
run standalone (without the tracing program), it runs to completion.
The same happens if the shebang is changed to

    #!/usr/bin/python

i.e. if the interpreter is specified directly. If `env` is used, on the
other hand, the script exits with a spectacular error message about the
stack being smashed [3].
What's even better, it happens only if env is the direct child of the
tracer: if the tracer execs `valgrind ./pi.py`, everything works
perfectly (remember that ./pi.py will invoke env!), but if the tracer
execs `env valgrind ./pi.py`, everything explodes.
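Since the difference between the two cases is an extra execve(2)
performed by the traced process itself, it may be worth ruling out on
the tracer side how its stops are classified. A hedged sketch
(classify_stop and the stop_kind values are hypothetical names),
assuming the tracee was set up with PTRACE_O_TRACESYSGOOD and
PTRACE_O_TRACEEXEC: only stops reported as SIGTRAP|0x80 are syscall
stops, while an exec by the tracee shows up as a PTRACE_EVENT_EXEC stop.
Without PTRACE_O_TRACEEXEC, a successful execve(2) in the tracee instead
produces an extra plain SIGTRAP, which a tracer that flips an entry/exit
flag on every SIGTRAP would count as a syscall stop.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

enum stop_kind { STOP_EXITED, STOP_SYSCALL, STOP_EXEC, STOP_OTHER_EVENT, STOP_SIGNAL };

/* Classify a waitpid() status for a tracee that has PTRACE_O_TRACESYSGOOD
 * and PTRACE_O_TRACEEXEC set. Only STOP_SYSCALL should flip the tracer's
 * entry/exit state. */
static enum stop_kind classify_stop(int status)
{
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_EXITED;                      /* tracee is gone */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;                     /* syscall entry or exit */
    if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
        return STOP_EXEC;                        /* tracee called execve() */
    if (status >> 16 != 0)
        return STOP_OTHER_EVENT;                 /* other PTRACE_EVENT_* stops */
    return STOP_SIGNAL;                          /* signal-delivery stop */
}
```

A STOP_SIGNAL stop carries a real signal, which would normally be passed
back to the tracee as the data argument of the next PTRACE_SYSCALL.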

The stack is indeed being smashed! If I patch CPython so that it prints
the address of the stack buffer passed to getrandom(2), it is exactly
the address my utility prints, and analyzing the memory map shows that
the buffer is only partially contained in the stack. So the stack
protector is doing its job.
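The memory-map check can be automated. The sketch below (range_in_stack
is a hypothetical helper, not part of my utility) reads
/proc/<pid>/maps, finds the [stack] line, and reports whether a buffer
lies entirely within it. The main thread's stack mapping can still grow
downwards on demand, so this only reflects the mapping at the moment of
the check.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if [addr, addr + len) lies entirely inside the [stack] mapping
 * of process pid, 0 if it does not (or only partially), -1 on error. */
static int range_in_stack(pid_t pid, unsigned long addr, unsigned long len)
{
    char path[64], line[4096];
    snprintf(path, sizeof path, "/proc/%d/maps", (int)pid);
    FILE *maps = fopen(path, "r");
    if (!maps)
        return -1;

    int result = -1; /* stays -1 if no [stack] line is found */
    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        if (!strstr(line, "[stack]") || sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        /* Compare against end - addr to avoid overflow in addr + len. */
        result = addr >= start && addr <= end && len <= end - addr;
        break;
    }
    fclose(maps);
    return result;
}
```

In the tracer, it would be called at the getrandom(2) entry stop with
the child's pid and the RDI/RSI values.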

Moreover, if I replace getrandom(2) using the LD_PRELOAD trick (define a
non-static function with the same signature and LD_PRELOAD it),
everything works correctly.
Everything works perfectly even if my replacement issues the getrandom
syscall itself via syscall(2). In other words, bypassing the glibc
getrandom(2) wrapper is a workaround.
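The workaround looks roughly like this (the file name getrandom_shim.c
is illustrative). It assumes the glibc 2.25 prototype ssize_t
getrandom(void *, size_t, unsigned int) from <sys/random.h>; the shim
does not include that header and simply forwards to the raw syscall.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Replacement for the glibc getrandom() wrapper that issues the raw
 * syscall via syscall(2). On failure it returns -1 and sets errno,
 * which is what syscall(2) does. */
ssize_t getrandom(void *buf, size_t buflen, unsigned int flags)
{
    return syscall(SYS_getrandom, buf, buflen, flags);
}
```

Built with `gcc -shared -fPIC -o getrandom_shim.so getrandom_shim.c`
and loaded with `LD_PRELOAD=$PWD/getrandom_shim.so`, this definition
takes precedence over the one in libc.so.6 for calls resolved through
the dynamic linker, such as CPython's.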

I suspected a CPython bug, but my research makes me believe it's not
one. The getrandom(2) syscall is invoked from the random_seed_urandom
[4] function.
It declares a stack variable which is passed through
_PyOS_URandomNonblock [5], pyurandom [6] and py_getrandom [7] to the
glibc getrandom(2) wrapper [8].
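To take CPython out of the picture, the same pattern could be reduced
to a few lines of C and run under the tracer instead of pi.py. This is
only an approximation of the CPython code path: fill_random and
seed_from_stack are hypothetical names, and the retry loop mirrors, but
is not copied from, py_getrandom. A buffer declared on the caller's
stack is passed down to the glibc getrandom() wrapper from
<sys/random.h>, retrying on EINTR and on short reads.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/* Fill buf through the glibc wrapper. Returns 0 on success, or -1 with
 * errno set on failure. */
static int fill_random(void *buf, size_t size)
{
    unsigned char *p = buf;
    while (size > 0) {
        ssize_t n = getrandom(p, size, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* large requests may be satisfied partially */
        size -= (size_t)n;
    }
    return 0;
}

/* Analogue of random_seed_urandom(): the key lives on this stack frame
 * and its address is handed down the call chain. */
static int seed_from_stack(unsigned long seed[4])
{
    unsigned long key[4];
    if (fill_random(key, sizeof key) < 0)
        return -1;
    for (int i = 0; i < 4; i++)
        seed[i] = key[i];
    return 0;
}
```

A main() that calls seed_from_stack() and prints the result could then
be built with both gcc and clang and run under the tracer, both directly
and through env.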

There's basically nothing suspicious in this code. But the real magic
happened when I tried to add some logging on the CPython side.
Adding a printf statement anywhere after the getrandom call resulted
in... an immediate fix! Suddenly, everything started to work as expected.
I have a hypothesis, with no evidence, that for some reason the stack
space is freed too early, but a read still succeeds since it doesn't
modify the values left by the stack protector.

If I rebuild CPython with clang instead of gcc, getrandom(2) returns -1,
indicating an error. Unfortunately, I haven't found a way to peek at
errno in the traced process.
Everything I described happens even with an unoptimized (-O0) build of
CPython.
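One approach that might work for the errno question (not yet tried on
this bug): at the syscall-exit stop the tracer sees the raw kernel
return value rather than the wrapper's -1, and on x86-64 it is in RAX
(regs.rax after PTRACE_GETREGS). Failures are returned as -errno in the
range -4095..-1, and it is the libc wrapper that turns this into -1
plus errno. A helper like the hypothetical one below would then recover
the error code without reading the tracee's thread-local errno.

```c
#include <errno.h>

/* Map a raw syscall return value (as seen by a tracer at the exit stop)
 * to an errno value, or 0 if the call succeeded. */
static int errno_from_syscall_return(long ret)
{
    return (ret < 0 && ret >= -4095) ? (int)-ret : 0;
}
```

For example, `int err = errno_from_syscall_return((long)regs.rax);`
followed by strerror(err) when err is non-zero.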

Another thing I noticed: if the execution chain is tracer -> python,
the RDI and RSI registers, which contain the syscall arguments, remain
valid until the syscall exit.
If tracer -> env -> python is used, on the other hand, the arguments are
lost somewhere along the way, and at syscall exit the registers contain
something else.
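A common tracer pattern is to capture the arguments at the entry stop
instead of re-reading them at exit, which also makes a mismatch like
this easy to detect. A hedged sketch with a hypothetical,
architecture-neutral register struct; on x86-64 its fields would be
filled from orig_rax, rdi, rsi and rax.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical register snapshot; on x86-64 it would be filled from
 * struct user_regs_struct (orig_rax, rdi, rsi, rax) via PTRACE_GETREGS. */
struct syscall_regs {
    long nr;
    unsigned long arg0, arg1;
    long ret;
};

/* Arguments remembered at the syscall-entry stop. */
struct pending_call {
    bool active;
    struct syscall_regs entry;
};

static void on_syscall_entry(struct pending_call *p, const struct syscall_regs *r)
{
    p->entry = *r;
    p->active = true;
}

/* At the matching exit stop, work from the saved arguments and flag any
 * disagreement with the registers. Returns true if they still match. */
static bool on_syscall_exit(struct pending_call *p, const struct syscall_regs *r)
{
    bool consistent = p->active && r->nr == p->entry.nr &&
                      r->arg0 == p->entry.arg0 && r->arg1 == p->entry.arg1;
    if (!consistent)
        fprintf(stderr, "exit stop does not match saved entry "
                        "(nr %ld, buf %#lx, len %lu)\n",
                p->entry.nr, p->entry.arg0, p->entry.arg1);
    p->active = false;
    return consistent;
}
```

In a tracer loop, on_syscall_entry() would run at each entry stop and
on_syscall_exit() at the following exit stop; a false return flags that
either the entry/exit pairing or the registers are not what was expected.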

My environment: Arch Linux, kernel 4.9.36-1-lts, glibc 2.25, Python
3.6.1 (and 3.7.0 from git).
The same error was reproduced on a similar setup, but with kernel
4.11.5-1-ARCH.

Do you have any ideas about what to do next?

Regards,
Marcin Mielniczuk

[1] https://gist.github.com/marmistrz/56eac71d3cb65fb22caa5de1c95300e3
[2] https://gist.github.com/marmistrz/787858bcc72884aff1cf881f45b8e962
[3] https://gist.github.com/marmistrz/5a26cecf438b592afcd2ce950609cba0
[4] https://github.com/python/cpython/blob/master/Modules/_randommodule.c#L204
[5] https://github.com/python/cpython/blob/master/Python/bootstrap_hash.c#L531
[6] https://github.com/python/cpython/blob/master/Python/bootstrap_hash.c#L465
[7] https://github.com/python/cpython/blob/master/Python/bootstrap_hash.c#L98
[8] https://github.com/python/cpython/blob/master/Python/bootstrap_hash.c#L125


