By default, the linker places sections in sequential order in executable files for ELF targets like hppa-unknown-linux-gnu. The glibc dynamic linker uses mmap with MAP_FIXED to map PT_LOAD segments in the executable file to the virtual address ranges specified in the program headers. At the boundary between segments, we may have two different virtual address pages with different protections mapping to the same physical address page. This results in two inequivalent aliases to the same physical page. This is inefficient, and on certain hardware it can lead to cache corruption and segmentation faults. For example, the HP PA8800 and PA8900 processors do not support inequivalent aliases.

The issue can be seen with the following simple program:

int main () { return 0; }

Compiling with gcc -o xxx3 xxx3.c, I see with readelf:

ELF Header:
  Magic:   7f 45 4c 46 01 02 01 03 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - Linux
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           HPPA
  Version:                           0x1
  Entry point address:               0x10354
  Start of program headers:          52 (bytes into file)
  Start of section headers:          2880 (bytes into file)
  Flags:                             0x210, PA-RISC 1.1
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         31
  Section header string table index: 28

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        00010114 000114 00000d 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            00010124 000124 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            00010144 000144 000000 00   A  0   0  4
  [ 4] .hash             HASH            00010144 000144 000034 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          00010178 000178 000080 10   A  6   1  4
  [ 6] .dynstr           STRTAB          000101f8 0001f8 000080 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          00010278 000278 000010 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         00010288 000288 000020 00   A  6   1  4
  [ 9] .rela.dyn         RELA            000102a8 0002a8 00000c 0c   A  5   0  4
  [10] .rela.plt         RELA            000102b4 0002b4 000048 0c   A  5  23  4
  [11] .init             PROGBITS        000102fc 0002fc 000048 00  AX  0   0  4
  [12] .text             PROGBITS        00010344 000344 0003e0 00  AX  0   0  4
  [13] .fini             PROGBITS        00010724 000724 000028 00  AX  0   0  4
  [14] .rodata           PROGBITS        0001074c 00074c 000018 00   A  0   0  4
  [15] .PARISC.unwind    PROGBITS        00010764 000764 0000e0 04   A  0  12  4
  [16] .eh_frame_hdr     PROGBITS        00010844 000844 000014 00   A  0   0  4
  [17] .eh_frame         PROGBITS        00010858 000858 000034 00   A  0   0  4
  [18] .ctors            PROGBITS        0001188c 00088c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        00011894 000894 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        0001189c 00089c 000004 00  WA  0   0  4
  [21] .dynamic          DYNAMIC         000118a0 0008a0 0000c8 08  WA  6   0  4
  [22] .data             PROGBITS        00011968 000968 000008 00  WA  0   0  4
  [23] .plt              PROGBITS        00011970 000970 00004c 08 WAX  0   0  4
  [24] .got              PROGBITS        000119bc 0009bc 00001c 04  WA  0   0  4
  [25] .bss              NOBITS          000119d8 0009d8 000014 00  WA  0   0  4
  [26] .note             NOTE            00000000 0009d8 000028 00      0   0  1
  [27] .comment          PROGBITS        00000000 000a00 000038 01  MS  0   0  1
  [28] .shstrtab         STRTAB          00000000 000a38 000106 00      0   0  1
  [29] .symtab           SYMTAB          00000000 001018 0004c0 10     30  56  4
  [30] .strtab           STRTAB          00000000 0014d8 0002b1 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00010034 0x00010034 0x000e0 0x000e0 R E 0x4
  INTERP         0x000114 0x00010114 0x00010114 0x0000d 0x0000d R   0x1
      [Requesting program interpreter: /lib/ld.so.1]
  LOAD           0x000000 0x00010000 0x00010000 0x0088c 0x0088c R E 0x1000
  LOAD           0x00088c 0x0001188c 0x0001188c 0x0014c 0x00160 RWE 0x1000
  DYNAMIC        0x0008a0 0x000118a0 0x000118a0 0x000c8 0x000c8 RW  0x4
  NOTE           0x000124 0x00010124 0x00010124 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x000844 0x00010844 0x00010844 0x00014 0x00014 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .text .fini .rodata .PARISC.unwind .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .data .plt .got .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr

As can be seen, the .ctors section follows .eh_frame in the file with only an adjustment for its alignment. It is not page aligned in the file.

If we now look at how it is mapped into memory by the dynamic loader and Linux kernel, we see the following:

dave@gsyprf11:~$ cat /proc/26767/maps
00010000-00011000 r-xp 00000000 08:30 6615849    /home2/dave/inequiv/xxx3
00011000-00012000 rwxp 00000000 08:30 6615849    /home2/dave/inequiv/xxx3
40000000-40005000 rw-p 00000000 00:00 0
40175000-40195000 r-xp 00000000 08:03 2502157    /lib/ld-2.11.2.so
40195000-40199000 rwxp 0001f000 08:03 2502157    /lib/ld-2.11.2.so
40199000-4019a000 rwxp 00000000 00:00 0
403ee000-40547000 r-xp 00000000 08:03 2502172    /lib/libc-2.11.2.so
40547000-4054e000 rwxp 00158000 08:03 2502172    /lib/libc-2.11.2.so
4054e000-40550000 rwxp 00000000 00:00 0
fdf00000-fdf23000 rwxp 00000000 00:00 0          [stack]

As can be seen in the first two maps, we have two maps pointing to the same page in the file xxx3. These in fact point to the same page in physical memory. So, the second map can apparently write to the address range protected in the first map.

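The file-page overlap can be checked from the two PT_LOAD entries alone. What follows is a minimal Python sketch, not something from the original report: the helper name `file_pages` is hypothetical, and the page size of 0x1000 is taken from the Align column of the program headers.

```python
PAGE_SIZE = 0x1000  # Align value of the PT_LOAD entries in the readelf output


def file_pages(offset, filesz, page_size=PAGE_SIZE):
    """Return the set of file page numbers covered by [offset, offset + filesz)."""
    first = offset // page_size
    last = (offset + filesz - 1) // page_size
    return set(range(first, last + 1))


# (Offset, FileSiz) of the two PT_LOAD program headers reported by readelf.
text_pages = file_pages(0x000000, 0x0088C)
data_pages = file_pages(0x00088C, 0x0014C)
shared = text_pages & data_pages

print("text pages:", sorted(text_pages))
print("data pages:", sorted(data_pages))
print("pages backing both segments:", sorted(shared))
```

Both segments fit entirely in file page 0, which is why both /proc maps entries show a file offset of 00000000.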
However, my main concern is the inequivalent aliases to physical memory. The Open Group rationale for mmap mentions the following issues: "If an application requests a mapping that would overlay existing mappings in the process, it might be desirable that an implementation detect this and inform the application. However, the default, portable (not MAP_FIXED) operation does not overlay existing mappings. On the other hand, if the program specifies a fixed address mapping (which requires some implementation knowledge to determine a suitable address, if the function is supported at all), then the program is presumed to be successfully managing its own address space and should be trusted when it asks to map over existing data structures. Furthermore, it is also desirable to make as few system calls as possible, and it might be considered onerous to require an munmap() before an mmap() to the same address range. This volume of IEEE Std 1003.1-2001 specifies that the new mappings replace any existing mappings, following existing practice in this regard. It is not expected, when the Memory Protection option is supported, that all hardware implementations are able to support all combinations of permissions at all addresses. When this option is supported, implementations are required to disallow write access to mappings without write permission and to disallow access to mappings without any access permission. Other than these restrictions, implementations may allow access types other than those requested by the application. For example, if the application requests only PROT_WRITE, the implementation may also allow read access. A call to mmap() fails if the implementation cannot support allowing all the access requested by the application. For example, some implementations cannot support a request for both write access and execute access simultaneously. 
All implementations supporting the Memory Protection option must support requests for no access, read access, write access, and both read and write access. Strictly conforming code must only rely on the required checks. These restrictions allow for portability across a wide range of hardware."

My impression from this is that it is not up to the kernel to detect overlapping maps.

As far as the linker goes, I can see two options:

1) Adjust hppalinux.sh to modify TEXT_ADDR and SHLIB_TEXT_ADDR. I need to add 0x400000 (4 MB) to make the addresses equivalent. This rather chews up the virtual address space for shared libraries.

2) Align the PT_LOAD segments in the file. I'm not sure how to do this at the moment. Currently, maxpagesize is 0x1000, so this might not be too onerous an increase in file size. On the other hand, maxpagesize probably should be 0x10000.
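To make option 1 concrete, here is a small Python sketch of the equivalence test it implies. It rests on an assumption that the thread does not spell out: that two virtual addresses of the same physical page count as equivalent when they differ by a multiple of 4 MB, which is what the 0x400000 figure suggests. The name `equivalent` and the constant `ALIAS_BOUNDARY` are hypothetical.

```python
ALIAS_BOUNDARY = 0x400000  # assumption: 4 MB, taken from option 1


def equivalent(vaddr_a, vaddr_b, boundary=ALIAS_BOUNDARY):
    """True if two virtual addresses of one physical page differ by a multiple of boundary."""
    return (vaddr_a - vaddr_b) % boundary == 0


text_map = 0x00010000  # first mapping of file page 0
data_map = 0x00011000  # second mapping of the same file page

# Layout from the report: the two mappings are one page apart.
print(equivalent(text_map, data_map))
# Hypothetical layout with the second mapping moved 4 MB above the first.
print(equivalent(text_map, text_map + ALIAS_BOUNDARY))
```

Under that assumption the reported layout is inequivalent, and only a 4 MB displacement would fix it, which is the address-space cost mentioned for shared libraries.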
This might also be a glibc bug:

$ strace /lib/ld-2.11.2.so ./xxx3
execve("/lib/ld-2.11.2.so", ["/lib/ld-2.11.2.so", "./xxx3"], [/* 17 vars */]) = 0
brk(0) = 0x4119a000
open("./xxx3", O_RDONLY) = 3
read(3, "\177ELF\1\2\1\3\0\0\0\0\0\0\0\0\0\2\0\17\0\0\0\1\0\1\3T\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=0, st_size=4380866642020, ...}) = 0
getcwd("/home2/dave/inequiv", 128) = 20
mmap(0x10000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x10000
mmap(0x11000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0) = 0x11000
close(3) = 0
newuname({sys="Linux", node="gsyprf11", ...}) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 62087, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40192000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\2\1\3\0\0\0\0\0\0\0\0\0\3\0\17\0\0\0\1\0\2\0\350\0\0\0004"..., 512) = 512
fstat64(3, {st_mode=0, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
mmap(NULL, 1449240, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x403ee000
mmap(0x40547000, 28672, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x158000) = 0x40547000
mmap(0x4054e000, 7448, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4054e000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40001000
mprotect(0x403ee000, 1413120, PROT_READ|PROT_WRITE) = 0
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40002000
mprotect(0x403ee000, 1413120, PROT_READ|PROT_EXEC) = 0
munmap(0x40192000, 62087) = 0
exit_group(0) = ?

The same fd (3) is used for all mmap calls. Possibly, different physical memory pages would be allocated if different file descriptors were used.
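The two mmap calls on ./xxx3 follow from rounding each PT_LOAD entry to page boundaries. The Python sketch below is a simplified illustration of that rounding, not glibc's actual code; `mmap_args` is a hypothetical helper and the 0x1000 page size is assumed from the program headers.

```python
PAGE_SIZE = 0x1000  # assumed page size (Align of the PT_LOAD entries)


def mmap_args(p_offset, p_vaddr, p_filesz, page_size=PAGE_SIZE):
    """Page-align one PT_LOAD entry; returns (address, length, file_offset)."""
    mask = page_size - 1
    start = p_vaddr & ~mask                    # round the address down
    end = (p_vaddr + p_filesz + mask) & ~mask  # round the end up
    return start, end - start, p_offset & ~mask


# (Offset, VirtAddr, FileSiz) of the two PT_LOAD entries of xxx3.
segments = [("text", (0x000000, 0x00010000, 0x0088C)),
            ("data", (0x00088C, 0x0001188C, 0x0014C))]

for name, hdr in segments:
    addr, length, off = mmap_args(*hdr)
    print(f"{name}: mmap({addr:#x}, {length}, ..., fd, {off:#x})")
```

Both segments round down to file offset 0, matching the two MAP_FIXED calls at 0x10000 and 0x11000 in the trace.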
(In reply to comment #0)
> 2) Align the PT_LOAD segments in the file. I'm not sure how to do this
> at the moment. Currently, maxpagesize is 0x1000, so this might not be
> a too onerous increase in file size. On the other hand, maxpagesize
> probably should be 0x10000.

It can be changed by the maximum alignment of sections in the segment.
> 1) Adjust hppalinux.sh to modify TEXT_ADDR and SHLIB_TEXT_ADDR. I need to
> add 0x400000 (4 MB) to make the addresses equivalent. This rather chews
> up the virtual address space for shared libraries.

This breaks the glibc build. My sense is that there is no reasonable binutils fix, but I'm not sure.
> These in fact point to the same page in physical memory.

Really? 00010000-00011000 and 00011000-00012000 are not different pages?
On Mon, 14 Feb 2011, amodra at gmail dot com wrote:

> > These in fact point to the same page in physical memory.
>
> Really? 00010000-00011000 and 00011000-00012000 are not different pages?

They map to the same page as far as I can tell (both maps appear in the list iterated using vma_prio_tree_foreach(mpnt, &iter, &mapping->i_mmap, pgoff, pgoff)). This can also be seen by looking at /proc/$PID/maps. When multiple shared writeable mappings exist, I believe they are COWed. So, effectively only one map is writeable.

Non-equivalent aliases are a problem for architectures such as PA8800/PA8900. They don't support non-equivalent aliases in the sense that a write doesn't invalidate non-equivalent aliases. The only thing that saves us is that the former address range is write protected, and it's rare to try to read using the text map. It seems possible that the text map could be corrupted via the data map. So, this might be a security issue. The V-Class machines are even worse than PA8800/PA8900 because they don't support non-equivalent aliases regardless of whether they are read-only or not.

These non-equivalent aliases typically occur on the boundary page between text and data. The linux dynamic loader mmaps these regions as MAP_FIXED. They are not mapped with MAP_SHARED but it seems the maps are shared for shared libraries. So far, it seems the hppa linux dynamic loader always maps shared pages with equivalent aliases except for the boundary page.

I think this is potentially an issue for certain MIPS and ARM cpus but I don't know the details on whether they support non-equivalent aliases or not. As far as I can tell, the same occurs for x86, etc, but I don't think the non-equivalent aliases matter, at least on linux. On the other hand, it looks like windows starts sections on page boundaries.

Probably, it would be best if load segments were arranged in executables to optionally start on a file page boundary. This would avoid the double flush and having two non-equivalent address ranges map to the same page. I don't really want to start all sections on a page boundary as this would waste a lot of file space. I have looked a bit at trying to do this, but don't have a solution at the moment.

Dave
If they are the same page, doesn't that mean your maxpagesize is wildly incorrect? You must have maxpagesize at least as large as a memory page.
> If they are the same page, doesn't that mean your maxpagesize is wildly
> incorrect? You must have maxpagesize at least as large as a memory page.

maxpagesize is set to 0x1000, which is the standard page size for parisc linux. The issue is not the virtual addresses of the page but the placement of the loadable segments in the file. These segments are mmap'd from the file to physical memory. Although the pages could be different in memory, they are not.

The issue can be seen by looking at the mappings for a trivial program like:

int main () { return 0; }

For the main executable, there is only one file page for text and data. Run the program under gdb with a break on main. Then inspect the mappings.

The two mappings could be made equivalent, but this messes up shared library support.

The hardware can support larger page sizes, but as far as I know nobody uses them on linux. There was some effort to provide support for larger page sizes in the linux kernel but I don't believe the cache flush and TLB support is complete. Possibly, ELF_COMMONPAGESIZE should be defined and ELF_MAXPAGESIZE increased to the linux kernel maximum.

Dave
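Inspecting the mappings can be automated. The Python sketch below parses text in /proc/PID/maps format and reports every file page that appears at more than one virtual address. The function name `aliased_file_pages` is hypothetical, the 0x1000 page size is assumed, and the sample lines are copied from the maps output in comment #0. It only compares file offsets, so it cannot tell whether a private mapping has since been copied on write.

```python
from collections import defaultdict

PAGE_SIZE = 0x1000  # assumed page size


def aliased_file_pages(maps_text, page_size=PAGE_SIZE):
    """Return {(dev, inode, file_offset): [virtual addresses]} for file pages mapped more than once."""
    seen = defaultdict(list)
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        addr_range, _perms, offset, dev, inode = fields[:5]
        if inode == "0":  # anonymous mapping, no backing file
            continue
        start, end = (int(x, 16) for x in addr_range.split("-"))
        file_off = int(offset, 16)
        for vaddr in range(start, end, page_size):
            seen[(dev, inode, file_off + vaddr - start)].append(vaddr)
    return {key: addrs for key, addrs in seen.items() if len(addrs) > 1}


sample = """\
00010000-00011000 r-xp 00000000 08:30 6615849 /home2/dave/inequiv/xxx3
00011000-00012000 rwxp 00000000 08:30 6615849 /home2/dave/inequiv/xxx3
40175000-40195000 r-xp 00000000 08:03 2502157 /lib/ld-2.11.2.so
40195000-40199000 rwxp 0001f000 08:03 2502157 /lib/ld-2.11.2.so
"""

for (dev, inode, off), vaddrs in aliased_file_pages(sample).items():
    print(f"{dev} inode {inode} offset {off:#x}:", [hex(v) for v in vaddrs])
```

On a live system the same function could be fed the contents of /proc/PID/maps for the process stopped in gdb. For the sample it reports the boundary page of xxx3 and also the boundary page of ld-2.11.2.so at file offset 0x1f000.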
It sounds like you are saying that the PA does not operate as most other processors do. It seems very odd to me that the PA Linux kernel cannot map the same file page to two different locations in virtual memory. That is what the Linux kernel does on other processors.

You are talking about two mappings to the same physical page, but that is not what the ELF executable is requesting. It is requesting two different virtual pages mapping to the same page in the file.

If that is indeed a limitation of PA Linux, then the only fix is to change the default linker script so that the first page of the data segment does not overlap with the last page of the text segment. The way to do that is to set DATA_ADDR in the appropriate ld/emulparams file. For example,

DATA_ADDR="ALIGN(${MAXPAGESIZE})"

That should always force the data segment to start on a new page.
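The effect of ALIGN on the data address can be worked through with the numbers from comment #0. This is a Python sketch of the rounding that ld's ALIGN builtin performs on the location counter, assuming a power-of-two boundary; `align_up` is a hypothetical name, and the end-of-text address is derived from the first PT_LOAD entry of xxx3.

```python
def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


MAXPAGESIZE = 0x1000
text_end = 0x00010000 + 0x0088C  # VirtAddr + MemSiz of the text PT_LOAD

data_start = align_up(text_end, MAXPAGESIZE)
print(hex(text_end), "->", hex(data_start))  # 0x1088c -> 0x11000
```

With this setting the data segment would begin at the start of a page rather than at in-page offset 0x88c.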
On Mon, 14 Feb 2011, ian at airs dot com wrote:

> If that is indeed a limitation of PA Linux, then the only fix is to change the
> default linker script so that the first page of the data segment does not
> overlap with the last page of the text segment. The way to do that is to set
> DATA_ADDR in the appropriate ld/emulparams file. For example,
>
> DATA_ADDR="ALIGN(${MAXPAGESIZE})"
>
> That should always force the data segment to start on a new page.

I'll take another look but I think this only changes the virtual address of the data segment and not the underlying file organization.

Looking at a typical file compiled for hppa64-hpux with readelf, we have:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x4000000000000000 0x0000000000000000
                 0x0000000000024bc8 0x0000000000024bc8  R E    8
  LOAD           0x0000000000025000 0x8000000100000000 0x0000000000000000
                 0x0000000000002ef0 0x0000000000003550  RW     8

Note the file offset for the data segment starts on a page boundary. I think that I need to achieve the same on hppa-linux.

Dave
If you arrange for the data section to start on a new page, then the linker will always put that data section on a new page in the file. It has to, because ELF specifies that the file offset is always equal to the page offset modulo the page size (unless you link with -N or -n).
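The rule above ties the file offset of a segment to its virtual address. The Python sketch below shows the consequence for the xxx3 layout, reading "page offset" as the segment's virtual address modulo the page size. The helper `next_file_offset` is hypothetical, and 0x11000 is an illustrative page-aligned data address, since the thread does not show the layout after the fix.

```python
PAGE_SIZE = 0x1000  # assumed page size


def next_file_offset(prev_end, p_vaddr, page_size=PAGE_SIZE):
    """Smallest file offset >= prev_end that equals p_vaddr modulo page_size."""
    return prev_end + (p_vaddr - prev_end) % page_size


text_file_end = 0x0088C  # Offset + FileSiz of the text PT_LOAD

# Reported layout: data VirtAddr 0x1188c keeps in-page offset 0x88c,
# so the data can follow the text directly in the file.
print(hex(next_file_offset(text_file_end, 0x0001188C)))  # 0x88c

# Page-aligned data address: the file offset is pushed to the next page.
print(hex(next_file_offset(text_file_end, 0x00011000)))  # 0x1000
```

A page-aligned data address therefore costs at most one page of padding in the file and removes the shared boundary page.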
On Mon, 14 Feb 2011, ian at airs dot com wrote:

> If you arrange for the data section to start on a new page, then the linker
> will always put that data section on a new page in the file. It has to,
> because ELF specifies that the file offset is always equal to the page offset
> modulo the page size (unless you link with -N or -n).

Ah, that's the clue! Testing fix.

Dave
CVSROOT:        /cvs/src
Module name:    src
Changes by:     danglin@sourceware.org  2011-02-18 18:20:29

Modified files:
        ld             : ChangeLog
        ld/emulparams  : hppalinux.sh

Log message:
        PR ld/12376
        emulparams/hppalinux.sh (DATA_ADDR): Define.
        (SHLIB_DATA_ADDR): Likewise.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/ChangeLog.diff?cvsroot=src&r1=1.2283&r2=1.2284
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/emulparams/hppalinux.sh.diff?cvsroot=src&r1=1.14&r2=1.15
Fixed.
CVSROOT:        /cvs/src
Module name:    src
Branch:         binutils-2_21-branch
Changes by:     danglin@sourceware.org  2011-03-14 02:35:08

Modified files:
        ld             : ChangeLog
        ld/emulparams  : hppalinux.sh

Log message:
        Backport from mainline:
        2011-02-18  John David Anglin  <dave.anglin@nrc-cnnrc.gc.ca>

        PR ld/12376
        emulparams/hppalinux.sh (DATA_ADDR): Define.
        (SHLIB_DATA_ADDR): Likewise.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/ChangeLog.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.2222.2.15&r2=1.2222.2.16
http://sourceware.org/cgi-bin/cvsweb.cgi/src/ld/emulparams/hppalinux.sh.diff?cvsroot=src&only_with_tag=binutils-2_21-branch&r1=1.14&r2=1.14.10.1
The master branch has been updated by H.J. Lu <hjl@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=7bcc503f3ef52fcac0d9be31f1b82440ec7ed2ff

commit 7bcc503f3ef52fcac0d9be31f1b82440ec7ed2ff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Mar 2 19:07:01 2016 -0800

    Skip ld-elf/pr19162.d for hppa-*-*

    ld-elf/pr19162.d fails for hppa-*-* since Dave Anglin's fix for PR 12376
    makes the data segment always start on a page boundary.

        * testsuite/ld-elf/pr19162.d: Skip hppa-*-*.