Kernel prctl feature for syscall interception and emulation

Thu Nov 19 20:54:14 GMT 2020

On 11/19/20 20:57, David Laight wrote:
>>> The Windows code is not completely loaded at initialization time.  It
>>> also has dynamic libraries loaded later.  yes, wine knows the memory
>>> regions, but there is no guarantee there is a small number of segments
>>> or that the full picture is known at any given moment.
>> Yes, I didn't mean it was known statically at init time (although
>> maybe it can be; see below) just that all the code doing the loading
>> is under Wine's control (vs having system dynamic linker doing stuff
>> it can't reliably see, which is the case with host libraries).
> Since wine must itself make the mmap() system calls that make memory
> executable can't it arrange for windows code and linux code to be
> above/below some critical address?
>
> IIRC 32bit windows has the user/kernel split at 2G, so all the
> linux code could be shoe-horned into the top 1GB.
>
> A similar boundary could be picked for 64bit code.
>
> This would probably require flags to mmap() to map above/below
> the specified address (is there a flag for the 2G boundary
> these days - wine used to do very horrid things).
> It might also need a special elf interpreter to load the
> wine code itself high.
>
Wine does not control the loading of native libraries (which are subject
to ASLR and thus do not necessarily follow mmap's top-down order
exactly). Wine is also not free to choose where to load the Windows
libraries. Some Windows libraries are relocatable, some are not. Even
the relocatable ones are often assumed to be loaded at the base address
specified in the PE header, with that assumption made either by the
library itself or by DRM, sandboxing, hotpatching, or interception code
around it.

Also, it is very common for DRM to unpack encrypted code into a newly
allocated segment (which gives no clue at allocation time whether it
is going to be executable later) and then make it executable. There are
a lot of tricks around that, and such code sometimes assumes very
specific (and Windows implementation dependent) things, in particular
about the memory layout. Windows VirtualAlloc[Ex] provides a way to
request top-down or bottom-up allocation order, as well as a specific
allocation address. The latter is not guaranteed to succeed, of course,
just like on Linux, for obvious reasons. But if specific (high) address
ranges always have some space available on Windows, then in our
experience there are apps in the wild which depend on that.
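
For illustration, here is a minimal sketch of what "request a specific
allocation address" looks like on the Linux side, and why it can fail.
It assumes Linux 4.17 or later for MAP_FIXED_NOREPLACE; map_at_exact()
is a hypothetical helper name, not anything taken from Wine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
/* asm-generic value (assumption: a few architectures use a different one) */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Hypothetical helper: map len bytes at exactly addr, or return NULL.
 * Unlike plain MAP_FIXED, an existing mapping is never overwritten. */
void *map_at_exact(void *addr, size_t len)
{
    assert(len > 0);

    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                   -1, 0);
    if (p == MAP_FAILED)
        return NULL;        /* range already occupied, or otherwise unusable */

    if (p != addr) {
        /* Kernels older than 4.17 ignore the flag and treat addr as a
         * hint, so the result has to be checked and undone if wrong. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

If a host library or its heap already sits in the requested range, the
call returns NULL, and the Windows code that expected that range to be
free has nowhere else to go.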

If we were given an mmap flag for specifying a memory allocation
boundary, and also some sort of process-wide dlopen() config option for
applying that boundary to every host shared library load, the address
space separation could probably work... until we hit a tricky case where
the app wants memory in a specific high address range. I think we can't
do that cleanly, as both Windows and Linux currently have the same 128TB
limit for user address space on x64, and we have no spare space where
native code could safely go without potentially interfering with Windows
code.
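
To make the "allocation boundary" idea concrete, here is a rough
userspace approximation, assuming Linux and anonymous mappings only.
map_below() is a hypothetical helper; no such mmap flag exists today. It
passes a hint just under the boundary and rejects the result if the
kernel placed the mapping elsewhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: return a mapping of len bytes that ends at or
 * below boundary, or NULL if the kernel would not place one there. */
void *map_below(uintptr_t boundary, size_t len)
{
    assert(len > 0);

    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    if (len >= boundary)
        return NULL;        /* cannot fit below the boundary at all */

    /* Page-aligned hint placed so the mapping ends at the boundary. */
    void *hint = (void *)((boundary - len) & ~(page - 1));

    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if ((uintptr_t)p + len > boundary) {
        /* The hint was not honored; give the range back. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

This only covers mappings the caller makes itself. It does nothing for
host shared libraries mapped by the dynamic linker, which is exactly the
part that would need the process-wide option, and it does not help when
the app asks for an address on the "native" side of the boundary.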