Bug 12376 - File offsets for PT_LOAD segments and resulting inequivalent memory aliases
Status: RESOLVED FIXED
Alias: None
Product: binutils
Classification: Unclassified
Component: ld
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: unassigned
Reported: 2011-01-08 19:22 UTC by John David Anglin
Modified: 2016-12-28 23:18 UTC
Target: hppa-linux


Description John David Anglin 2011-01-08 19:22:26 UTC
By default, the linker places sections in sequential order in executable
files for ELF targets like hppa-unknown-linux-gnu.  The glibc dynamic
linker uses mmap with MAP_FIXED to map PT_LOAD segments in the executable
file to the virtual address ranges specified in the program headers.
At the boundary between segments, we may have two different virtual
address pages with different protections mapping to the same physical
address page.

This results in two inequivalent aliases to the same physical page.  This
is inefficient, and on certain hardware it can lead to cache corruption and
segmentation faults.  For example, the HP PA8800 and PA8900 processors
do not support inequivalent aliases.

The issue can be seen with the following simple program:

int main () { return 0; }

Compiling with gcc -o xxx3 xxx3.c, I see with readelf:

ELF Header:
  Magic:   7f 45 4c 46 01 02 01 03 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - Linux
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           HPPA
  Version:                           0x1
  Entry point address:               0x10354
  Start of program headers:          52 (bytes into file)
  Start of section headers:          2880 (bytes into file)
  Flags:                             0x210, PA-RISC 1.1
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         31
  Section header string table index: 28

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        00010114 000114 00000d 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            00010124 000124 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            00010144 000144 000000 00   A  0   0  4
  [ 4] .hash             HASH            00010144 000144 000034 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          00010178 000178 000080 10   A  6   1  4
  [ 6] .dynstr           STRTAB          000101f8 0001f8 000080 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          00010278 000278 000010 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         00010288 000288 000020 00   A  6   1  4
  [ 9] .rela.dyn         RELA            000102a8 0002a8 00000c 0c   A  5   0  4
  [10] .rela.plt         RELA            000102b4 0002b4 000048 0c   A  5  23  4
  [11] .init             PROGBITS        000102fc 0002fc 000048 00  AX  0   0  4
  [12] .text             PROGBITS        00010344 000344 0003e0 00  AX  0   0  4
  [13] .fini             PROGBITS        00010724 000724 000028 00  AX  0   0  4
  [14] .rodata           PROGBITS        0001074c 00074c 000018 00   A  0   0  4
  [15] .PARISC.unwind    PROGBITS        00010764 000764 0000e0 04   A  0  12  4
  [16] .eh_frame_hdr     PROGBITS        00010844 000844 000014 00   A  0   0  4
  [17] .eh_frame         PROGBITS        00010858 000858 000034 00   A  0   0  4
  [18] .ctors            PROGBITS        0001188c 00088c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        00011894 000894 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        0001189c 00089c 000004 00  WA  0   0  4
  [21] .dynamic          DYNAMIC         000118a0 0008a0 0000c8 08  WA  6   0  4
  [22] .data             PROGBITS        00011968 000968 000008 00  WA  0   0  4
  [23] .plt              PROGBITS        00011970 000970 00004c 08 WAX  0   0  4
  [24] .got              PROGBITS        000119bc 0009bc 00001c 04  WA  0   0  4
  [25] .bss              NOBITS          000119d8 0009d8 000014 00  WA  0   0  4
  [26] .note             NOTE            00000000 0009d8 000028 00      0   0  1
  [27] .comment          PROGBITS        00000000 000a00 000038 01  MS  0   0  1
  [28] .shstrtab         STRTAB          00000000 000a38 000106 00      0   0  1
  [29] .symtab           SYMTAB          00000000 001018 0004c0 10     30  56  4
  [30] .strtab           STRTAB          00000000 0014d8 0002b1 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00010034 0x00010034 0x000e0 0x000e0 R E 0x4
  INTERP         0x000114 0x00010114 0x00010114 0x0000d 0x0000d R   0x1
      [Requesting program interpreter: /lib/ld.so.1]
  LOAD           0x000000 0x00010000 0x00010000 0x0088c 0x0088c R E 0x1000
  LOAD           0x00088c 0x0001188c 0x0001188c 0x0014c 0x00160 RWE 0x1000
  DYNAMIC        0x0008a0 0x000118a0 0x000118a0 0x000c8 0x000c8 RW  0x4
  NOTE           0x000124 0x00010124 0x00010124 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x000844 0x00010844 0x00010844 0x00014 0x00014 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .text .fini .rodata .PARISC.unwind .eh_frame_hdr .eh_frame 
   03     .ctors .dtors .jcr .dynamic .data .plt .got .bss 
   04     .dynamic 
   05     .note.ABI-tag 
   06     .eh_frame_hdr 

As can be seen, the .ctors section follows .eh_frame in the file with
only an adjustment for its alignment.  It is not page aligned in the file.
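
The overlap can be checked from the two LOAD entries alone.  A minimal
Python sketch, with offsets and sizes copied from the readelf output and
the 0x1000 page size taken from the Align column; the helper name
file_pages is made up for illustration:

```python
PAGE = 0x1000  # Align value of both LOAD entries in the readelf output

# (name, p_offset, p_vaddr, p_filesz) copied from the program headers above
SEGMENTS = [
    ("text", 0x000000, 0x00010000, 0x0088C),
    ("data", 0x00088C, 0x0001188C, 0x0014C),
]


def file_pages(offset, filesz, page=PAGE):
    """Return the set of file page numbers a segment touches (hypothetical helper)."""
    first = offset // page
    last = (offset + filesz - 1) // page
    return set(range(first, last + 1))


pages = {name: file_pages(off, size) for name, off, _, size in SEGMENTS}
shared = pages["text"] & pages["data"]

for name, off, vaddr, _ in SEGMENTS:
    # p_vaddr and p_offset agree modulo the page size for both segments
    assert vaddr % PAGE == off % PAGE
    print(f"{name}: file pages {sorted(pages[name])}, "
          f"first virtual page {vaddr // PAGE * PAGE:#x}")

print("file pages shared by both segments:", sorted(shared))
```

Both segments land in file page 0, while their virtual pages differ
(0x10000 and 0x11000), which is the situation described above.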

If we now look at how it is mapped into memory by the dynamic loader and
linux kernel, we see the following:

dave@gsyprf11:~$ cat /proc/26767/maps
00010000-00011000 r-xp 00000000 08:30 6615849                            /home2/dave/inequiv/xxx3
00011000-00012000 rwxp 00000000 08:30 6615849                            /home2/dave/inequiv/xxx3
40000000-40005000 rw-p 00000000 00:00 0 
40175000-40195000 r-xp 00000000 08:03 2502157                            /lib/ld-2.11.2.so
40195000-40199000 rwxp 0001f000 08:03 2502157                            /lib/ld-2.11.2.so
40199000-4019a000 rwxp 00000000 00:00 0 
403ee000-40547000 r-xp 00000000 08:03 2502172                            /lib/libc-2.11.2.so
40547000-4054e000 rwxp 00158000 08:03 2502172                            /lib/libc-2.11.2.so
4054e000-40550000 rwxp 00000000 00:00 0 
fdf00000-fdf23000 rwxp 00000000 00:00 0                                  [stack]

As can be seen in the first two maps, we have two maps pointing to the same
page in the file xxx3.  These in fact point to the same page in physical
memory.  So, the second map can apparently write to the address range
protected in the first map.  However, my main concern is the inequivalent
aliases to physical memory.
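
The same check can be made mechanically on a maps listing.  A rough Python
sketch, assuming the /proc/PID/maps field layout shown above and a 0x1000
page size; the function name aliased_file_pages is hypothetical, and the
sample is a subset of the listing above with the column padding removed:

```python
from collections import defaultdict

PAGE = 0x1000

# Subset of the /proc/26767/maps output quoted above
MAPS = """\
00010000-00011000 r-xp 00000000 08:30 6615849 /home2/dave/inequiv/xxx3
00011000-00012000 rwxp 00000000 08:30 6615849 /home2/dave/inequiv/xxx3
40175000-40195000 r-xp 00000000 08:03 2502157 /lib/ld-2.11.2.so
40195000-40199000 rwxp 0001f000 08:03 2502157 /lib/ld-2.11.2.so
"""


def aliased_file_pages(maps_text, page=PAGE):
    """Return {(dev, inode, file offset): [(virtual page, perms), ...]}
    for every file page that is mapped at more than one virtual address."""
    seen = defaultdict(list)
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[4] == "0":
            continue  # skip anonymous mappings, which have no backing file
        addr_range, perms, offset, dev, inode = fields[:5]
        start, end = (int(x, 16) for x in addr_range.split("-"))
        file_off = int(offset, 16)
        for vaddr in range(start, end, page):
            seen[(dev, inode, file_off + (vaddr - start))].append((vaddr, perms))
    return {key: maps for key, maps in seen.items() if len(maps) > 1}


aliases = aliased_file_pages(MAPS)
for (dev, inode, off), mappings in aliases.items():
    where = ", ".join(f"{vaddr:#x} ({perms})" for vaddr, perms in mappings)
    print(f"inode {inode} file offset {off:#x} mapped at {where}")
```

On this sample the sketch reports the boundary page of xxx3 (file offset 0)
and also the boundary page of ld-2.11.2.so (file offset 0x1f000).  It only
shows that one file page is mapped at two virtual addresses with different
protections; whether both are backed by the same physical page depends on
the kernel.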

The Open Group rationale for mmap mentions the following issues:

"If an application requests a mapping that would overlay existing mappings in the process, it might be desirable that an implementation detect this and inform the application. However, the default, portable (not MAP_FIXED) operation does not overlay existing mappings. On the other hand, if the program specifies a fixed address mapping (which requires some implementation knowledge to determine a suitable address, if the function is supported at all), then the program is presumed to be successfully managing its own address space and should be trusted when it asks to map over existing data structures. Furthermore, it is also desirable to make as few system calls as possible, and it might be considered onerous to require an munmap() before an mmap() to the same address range. This volume of IEEE Std 1003.1-2001 specifies that the new mappings replace any existing mappings, following existing practice in this regard.

It is not expected, when the Memory Protection option is supported, that all hardware implementations are able to support all combinations of permissions at all addresses. When this option is supported, implementations are required to disallow write access to mappings without write permission and to disallow access to mappings without any access permission. Other than these restrictions, implementations may allow access types other than those requested by the application. For example, if the application requests only PROT_WRITE, the implementation may also allow read access. A call to mmap() fails if the implementation cannot support allowing all the access requested by the application. For example, some implementations cannot support a request for both write access and execute access simultaneously. All implementations supporting the Memory Protection option must support requests for no access, read access, write access, and both read and write access. Strictly conforming code must only rely on the required checks. These restrictions allow for portability across a wide range of hardware."

My impression from this is that it is not up to the kernel to detect overlapping
maps.  As far as the linker goes, I can see two options:

1) Adjust hppalinux.sh to modify TEXT_ADDR and SHLIB_TEXT_ADDR.  I need to
   add 0x400000 (4 MB) to make the addresses equivalent.  This rather chews
   up the virtual address space for shared libraries.

2) Align the PT_LOAD segments in the file.  I'm not sure how to do this
   at the moment.  Currently, maxpagesize is 0x1000, so this might not be
   too onerous an increase in file size.  On the other hand, maxpagesize
   probably should be 0x10000.
Comment 1 John David Anglin 2011-01-09 17:05:02 UTC
This might also be a glibc bug:

$ strace /lib/ld-2.11.2.so ./xxx3
execve("/lib/ld-2.11.2.so", ["/lib/ld-2.11.2.so", "./xxx3"], [/* 17 vars */]) = 0
brk(0)                                  = 0x4119a000
open("./xxx3", O_RDONLY)                = 3
read(3, "\177ELF\1\2\1\3\0\0\0\0\0\0\0\0\0\2\0\17\0\0\0\1\0\1\3T\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=0, st_size=4380866642020, ...}) = 0
getcwd("/home2/dave/inequiv", 128)      = 20
mmap(0x10000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x10000
mmap(0x11000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x11000
close(3)                                = 0
newuname({sys="Linux", node="gsyprf11", ...}) = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 62087, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40192000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\2\1\3\0\0\0\0\0\0\0\0\0\3\0\17\0\0\0\1\0\2\0\350\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
mmap(NULL, 1449240, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x403ee000
mmap(0x40547000, 28672, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x158000) = 0x40547000
mmap(0x4054e000, 7448, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4054e000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40001000
mprotect(0x403ee000, 1413120, PROT_READ|PROT_WRITE) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40002000
mprotect(0x403ee000, 1413120, PROT_READ|PROT_EXEC) = 0
munmap(0x40192000, 62087)               = 0
exit_group(0)                           = ?

The same fd (3) is used for all mmap calls.  Possibly, different physical
memory pages would be allocated if different file descriptors were used.
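
The two MAP_FIXED calls for ./xxx3 in the trace follow from rounding each
segment to page boundaries.  A simplified Python model, not glibc's actual
code, with the segment values taken from the readelf output in the
description; the function names are hypothetical:

```python
PAGE = 0x1000


def page_down(x, page=PAGE):
    """Round x down to a page boundary."""
    return x & ~(page - 1)


def page_up(x, page=PAGE):
    """Round x up to a page boundary."""
    return (x + page - 1) & ~(page - 1)


def mmap_args(p_offset, p_vaddr, p_filesz, page=PAGE):
    """Page-granular (address, length, file offset) covering one PT_LOAD segment."""
    start = page_down(p_vaddr, page)
    end = page_up(p_vaddr + p_filesz, page)
    return start, end - start, page_down(p_offset, page)


text = mmap_args(0x000000, 0x00010000, 0x0088C)
data = mmap_args(0x00088C, 0x0001188C, 0x0014C)

for name, (addr, length, off) in (("text", text), ("data", data)):
    print(f"{name}: mmap({addr:#x}, {length}, ..., fd, {off:#x})")
```

Both segments round to file offset 0 with a length of 4096, matching the
mmap(0x10000, 4096, ..., 3, 0) and mmap(0x11000, 4096, ..., 3, 0) calls
in the trace.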
Comment 2 H.J. Lu 2011-01-09 18:46:06 UTC
(In reply to comment #0)
> 2) Align the PT_LOAD segments in the file.  I'm not sure how to do this
>    at the moment.  Currently, maxpagesize is 0x1000, so this might not be
>    a too onerous increase in file size.  On the otherhand, maxpagesize
>    probably should be 0x10000.

It can be changed by the maximum alignment of sections in the segment.
Comment 3 John David Anglin 2011-01-09 20:56:17 UTC
1) Adjust hppalinux.sh to modify TEXT_ADDR and SHLIB_TEXT_ADDR.  I need to
   add 0x400000 (4 MB) to make the addresses equivalent.  This rather chews
   up the virtual address space for shared libraries.

This breaks the glibc build.  My sense is there is no reasonable binutils fix
but I'm not sure.
Comment 4 Alan Modra 2011-02-14 02:14:06 UTC
> These in fact point to the same page in physical memory.

Really?  00010000-00011000 and 00011000-00012000 are not different pages?
Comment 5 dave@hiauly1.hia.nrc.ca 2011-02-14 03:40:21 UTC
On Mon, 14 Feb 2011, amodra at gmail dot com wrote:

> > These in fact point to the same page in physical memory.
> 
> Really?  00010000-00011000 and 00011000-00012000 are not different pages?

They map to the same page as far as I can tell (both maps appear in the
list iterated using vma_prio_tree_foreach(mpnt, &iter, &mapping->i_mmap,
pgoff, pgoff)).  This can also be seen by looking at /proc/$PID/maps.
When multiple shared writeable mappings exist, I believe they are COWed.
So, effectively only one map is writeable.

Non equivalent aliases are a problem for architectures such as PA8800/PA8900.
They don't support non equivalent aliases in the sense that a write doesn't
invalidate non equivalent aliases.  The only thing that saves us is the
former address range is write protected, and it's rare to try to read using
the text map.  It seems possible that the text map could be corrupted
via the data map.  So, this might be a security issue.

The V-Class machines are even worse than PA8800/PA8900 because they don't
support non equivalent aliases regardless of whether they are read-only
or not.

These non equivalent aliases occur typically on the boundary page between
text and data.  The linux dynamic loader mmaps these regions as MAP_FIXED.
They are not mapped with MAP_SHARED but it seems the maps are shared
for shared libraries.  So far, it seems the hppa linux dynamic loader
always maps shared pages with equivalent aliases except for the boundary
page.

I think this is potentially an issue for certain MIPS and ARM cpus but
I don't know the details on whether they support non equivalent aliases
or not.  As far as I can tell, the same occurs for x86, etc, but I don't
think the non equivalent aliases matter, at least on linux.  On the
other hand, it looks like windows starts sections on page boundaries.

Probably, it would be best if load segments were arranged in executables
to optionally start on a file page boundary.  This would avoid the double
flush and having two non equivalent address ranges map to the same page.
Don't really want to start all sections on a page boundary as this would
waste a lot of file space.

I have looked a bit at trying to do this, but don't have a solution at
the moment.

Dave
Comment 6 Alan Modra 2011-02-14 03:58:32 UTC
If they are the same page, doesn't that mean your maxpagesize is wildly incorrect?  You must have maxpagesize at least as large as a memory page.
Comment 7 dave@hiauly1.hia.nrc.ca 2011-02-14 14:51:27 UTC
> If they are the same page, doesn't that mean your maxpagesize is wildly
> incorrect?  You must have maxpagesize at least as large as a memory page.

maxpagesize is set to 0x1000 which is the standard page size for parisc
linux.

The issue is not the virtual addresses of the page but the placement
of the loadable segments in the file.  These segments are mmap'd from
the file to physical memory.  Although the pages could be different
in memory, they are not.

The issue can be seen by looking at the mappings for a trivial program like:

int main () { return 0; }

For the main executable, there is only one file page for text
and data.  Run the program under gdb with a break on main.  Then inspect
the mappings.

The two mappings could be made equivalent, but this messes up shared
library support.

The hardware can support larger page sizes, but as far as I know nobody
uses them on linux.  There was some effort to provide support for larger
page sizes in the linux kernel but I don't believe the cache flush and
TLB support is complete.

Possibly, ELF_COMMONPAGESIZE should be defined and ELF_MAXPAGESIZE
increased to the linux kernel maximum.

Dave
Comment 8 Ian Lance Taylor 2011-02-14 16:35:25 UTC
It sounds like you are saying that the PA does not operate as most other processors do.  It seems very odd to me that the PA Linux kernel can not map the same file page to two different locations in virtual memory.  That is what the Linux kernel does on other processors.  You are talking about two mappings to the same physical page, but that is not what the ELF executable is requesting.  It is requesting two different virtual pages mapping to the same page in the file.

If that is indeed a limitation of PA Linux, then the only fix is to change the default linker script so that the first page of the data segment does not overlap with the last page of the text segment.  The way to do that is to set DATA_ADDR in the appropriate ld/emulparams file.  For example,

DATA_ADDR="ALIGN(${MAXPAGESIZE})"

That should always force the data segment to start on a new page.
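
As a rough illustration of what the suggested setting changes, here is a
Python sketch of the address arithmetic.  The rounding in align_up follows
the ld manual's description of ALIGN; the "observed" formula (next page
plus the in-page offset) is an assumption that happens to reproduce the
.ctors address in the readelf output from the description, not a statement
about what the hppa-linux linker script contained:

```python
MAXPAGESIZE = 0x1000


def align_up(dot, boundary):
    """Round dot up to the next multiple of boundary (a power of two)."""
    return (dot + boundary - 1) & ~(boundary - 1)


# End of .eh_frame, the last text-segment section in the readelf output
text_end = 0x00010858 + 0x34

# Assumed default layout: skip one page but keep the in-page offset
observed_data = align_up(text_end, MAXPAGESIZE) + (text_end & (MAXPAGESIZE - 1))

# Layout with DATA_ADDR="ALIGN(${MAXPAGESIZE})"
aligned_data = align_up(text_end, MAXPAGESIZE)

print(f"text ends at      {text_end:#x}")
print(f"default data addr {observed_data:#x}")
print(f"aligned data addr {aligned_data:#x}")
```

With the default-style layout the data segment starts at 0x1188c, in the
middle of a page; with the ALIGN form it starts at 0x11000.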
Comment 9 dave@hiauly1.hia.nrc.ca 2011-02-14 17:26:00 UTC
On Mon, 14 Feb 2011, ian at airs dot com wrote:

> If that is indeed a limitation of PA Linux, then the only fix is to change the
> default linker script so that the first page of the data segment does not
> overlap with the last page of the text segment.  The way to do that is to set
> DATA_ADDR in the appropriate ld/emulparams file.  For example,
> 
> DATA_ADDR="ALIGN(${MAXPAGESIZE})"
> 
> That should always force the data segment to start on a new page.

I'll take another look but I think this only changes the virtual address
of the data segment and not the underlying file organization.

Looking at a typical file compiled for hppa64-hpux with readelf, we have:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align

  LOAD           0x0000000000000000 0x4000000000000000 0x0000000000000000
                 0x0000000000024bc8 0x0000000000024bc8  R E    8
  LOAD           0x0000000000025000 0x8000000100000000 0x0000000000000000
                 0x0000000000002ef0 0x0000000000003550  RW     8

Note the file offset for the data segment starts on a page boundary.
I think that I need to achieve the same on hppa-linux.

Dave
Comment 10 Ian Lance Taylor 2011-02-14 21:46:11 UTC
If you arrange for the data section to start on a new page, then the linker will always put that data section on a new page in the file.  It has to, because ELF specifies that the file offset is always equal to the page offset modulo the page size (unless you link with -N or -n).
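
The congruence rule can be turned into a small calculation.  A Python
sketch, assuming a 0x1000 page size and that the linker picks the smallest
file offset at or after the end of the previous segment that agrees with
the virtual address modulo the page size; next_file_offset is a
hypothetical helper, not a binutils function:

```python
PAGE = 0x1000


def next_file_offset(prev_end, vaddr, page=PAGE):
    """Smallest offset >= prev_end with offset % page == vaddr % page."""
    return prev_end + (vaddr - prev_end) % page


prev_end = 0x88C  # file offset where the text segment ends (readelf output)

# Data segment at its original address: packed right after the text
packed = next_file_offset(prev_end, 0x0001188C)

# Data segment forced onto a page boundary: file offset is page aligned too
aligned = next_file_offset(prev_end, 0x00011000)

print(f"data at 0x1188c -> file offset {packed:#x}")
print(f"data at 0x11000 -> file offset {aligned:#x}")
```

With the data segment at 0x1188c the offset stays at 0x88c, so text and
data share file page 0.  With the data segment at 0x11000 the offset moves
to 0x1000, so the two segments no longer share a file page.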
Comment 11 dave@hiauly1.hia.nrc.ca 2011-02-14 23:22:20 UTC
On Mon, 14 Feb 2011, ian at airs dot com wrote:

> If you arrange for the data section to start on a new page, then the linker
> will always put that data section on a new page in the file.  It has to,
> because ELF specifies that the file offset is always equal to the page offset
> modulo the page size (unless you link with -N or -n).

Ah, that's the clue!  Testing fix.

Dave
Comment 12 Sourceware Commits 2011-02-18 18:20:32 UTC
CVSROOT:	/cvs/src
Module name:	src
Changes by:	danglin@sourceware.org	2011-02-18 18:20:29

Modified files:
	ld             : ChangeLog 
	ld/emulparams  : hppalinux.sh 

Log message:
	PR ld/12376
	emulparams/hppalinux.sh (DATA_ADDR): Define.
	(SHLIB_DATA_ADDR): Likewise.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/ChangeLog.diff?cvsroot=src&r1=1.2283&r2=1.2284
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/emulparams/hppalinux.sh.diff?cvsroot=src&r1=1.14&r2=1.15
Comment 13 John David Anglin 2011-02-18 18:30:42 UTC
Fixed.
Comment 14 Sourceware Commits 2011-03-14 02:35:11 UTC
CVSROOT:	/cvs/src
Module name:	src
Branch: 	binutils-2_21-branch
Changes by:	danglin@sourceware.org	2011-03-14 02:35:08

Modified files:
	ld             : ChangeLog 
	ld/emulparams  : hppalinux.sh 

Log message:
	Backport from mainline:
	2011-02-18  John David Anglin  <dave.anglin@nrc-cnnrc.gc.ca>
	
	PR ld/12376
	emulparams/hppalinux.sh (DATA_ADDR): Define.
	(SHLIB_DATA_ADDR): Likewise.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/ChangeLog.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.2222.2.15&r2=1.2222.2.16
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/emulparams/hppalinux.sh.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.14&r2=1.14.10.1
Comment 15 Sourceware Commits 2016-03-03 03:09:00 UTC
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7bcc503f3ef52fcac0d9be31f1b82440ec7ed2ff

commit 7bcc503f3ef52fcac0d9be31f1b82440ec7ed2ff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Mar 2 19:07:01 2016 -0800

    Skip ld-elf/pr19162.d for hppa-*-*
    
    ld-elf/pr19162.d fails for hppa-*-* since Dave Anglin's fix for PR 12376
    makes the data segment always start on a page boundary.
    
    	* testsuite/ld-elf/pr19162.d: Skip hppa-*-*.