This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Re: [PATCH 00/19] libctf, and CTF support for objdump and readelf

From: Nick Alcock <nick dot alcock at oracle dot com>
To: Joseph Myers <joseph at codesourcery dot com>
Cc: <binutils at sourceware dot org>
Date: Fri, 03 May 2019 15:23:46 +0100
Subject: Re: [PATCH 00/19] libctf, and CTF support for objdump and readelf
References: <20190430225706.159422-1-nick.alcock@oracle.com> <alpine.DEB.2.21.1905021508230.4027@digraph.polyomino.org.uk>

[looking at your comments first because you were so very helpful last
 time I contributed to glibc. :) ]

(And thank you! I haven't done quite everything you suggested, at least
not yet, but the 90% I have done is entirely beneficial and you spotted
a lot of things I overlooked.)

On 2 May 2019, Joseph Myers spake thusly:

> This patch series introduces a dependency of binutils on libctf.
>
> This means libctf should be portable to the same range of hosts as 
> binutils, all of which can be used as hosts for cross toolchains for a 
> range of targets (ELF and non-ELF).  For example, it should be portable to 
> hosts such as MinGW or OS X.

Seems sensible. It might lose some functionality, though, at least to
start with. (Say, sharing string tables with the overarching container,
opening CTF files by handing them an fd to a containing binary, etc.
There are alternatives callers can use in all these cases.)

I'll probably arrange for the deduplicating linker plugin to be
ELF-only, at least to start with, because I have no way to test on
anything else, and it might always keep strings and symbols internal to
the CTF file rather than trying to share them with the enclosing binary,
until someone else contributes that sort of thing for non-ELF.

> Some apparent portability issues in this code include:
>
> * Use of dlfcn.h.  Such use in existing binutils code (e.g. bfd/plugin.c) 
> is conditional, to avoid trying to use it on hosts without that 
> functionality.

This was used by ancient code in the OpenSolaris days that endeavoured
to dlopen() zlib to avoid linking against it (why one would want to
avoid linking against zlib is opaque to me). No user these days:
dropped.

> * Use of sys/mman.h.  Again, mmap usage in existing code is appropriately 
> conditional.

We can fall back to copying or malloc in that situation, in most cases.

However, the CTF archive code would be made significantly more
complicated, more than cancelling out the implementation simplicity
which was one reason for using mmap() there in the first place. So for
now my no-mmap() CTF archive code just fails: callers can detect the
failure and fall back to storing CTF containers separately in that case.
(Both reading and writing fail symmetrically, so you aren't going to end
up creating containers you then can't read.)

If there are really still platforms relevant outside industrial museums
without mmap(), we can rethink this, but I bet there aren't, or that any
such platforms aren't going to be storing huge numbers of CTF containers
in any case. (The use case for this is if you have so many TUs that you
can't store one per section without risking blowing the 64K section
limit. Any machine new enough to be dealing with anything in that size
range is going to have mmap() as well, right? Or something we can use
instead of it with similar semantics...)


Note that it's only *creating* CTF archives without mmap() that is too
horrible to countenance. It is relatively easy to support reading CTF
archives on non-mmap-supporting systems, if quite inefficiently, so we
could arrange to fall back to read-and-copy in that case, allowing
people in cross environments to not need to worry about whether their
target supports mmap() before creating CTF archives. This might be a
reasonable middle ground, perhaps?


(Added fallbacks for mmap() in all cases but CTF archives: as noted
above, we can add fallbacks for archive usage, too, just not creation.)
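
As a rough illustration of the kind of fallback described above (a
sketch only, not the code in the patch series): the helper names
map_or_copy() and unmap_or_free() are hypothetical, and HAVE_MMAP is
assumed to be a configure-provided macro in the usual autoconf style.
Without mmap(), the data is simply read into a malloc()ed buffer.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef HAVE_MMAP		/* Assumed to come from configure.  */
#include <sys/mman.h>
#endif

/* Hypothetical helper: return a read-only view of LENGTH bytes of FD
   starting at OFFSET, or NULL on failure.  LENGTH must be nonzero, and
   on the mmap() path OFFSET must be page-aligned.  */
static void *
map_or_copy (int fd, size_t length, off_t offset)
{
#ifdef HAVE_MMAP
  void *p = mmap (NULL, length, PROT_READ, MAP_PRIVATE, fd, offset);

  return p == MAP_FAILED ? NULL : p;
#else
  char *buf = malloc (length);
  size_t done = 0;

  if (buf == NULL)
    return NULL;
  if (lseek (fd, offset, SEEK_SET) < 0)
    {
      free (buf);
      return NULL;
    }
  while (done < length)
    {
      ssize_t n = read (fd, buf + done, length - done);

      if (n < 0 && errno == EINTR)
        continue;		/* Interrupted: retry.  */
      if (n <= 0)
        {
          free (buf);		/* Real error or premature EOF.  */
          return NULL;
        }
      done += (size_t) n;
    }
  return buf;
#endif
}

/* Release whatever map_or_copy() returned.  */
static void
unmap_or_free (void *p, size_t length)
{
#ifdef HAVE_MMAP
  munmap (p, length);
#else
  (void) length;
  free (p);
#endif
}
```

Callers see the same interface either way, which is what keeps this
sort of fallback cheap for everything except archive creation.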

(Oh, by the way, you missed a bit: we use pread() too, and badly,
ignoring the possibility of short reads or EINTR failures. Fixing that,
and adding a fallback for hosts without pread() as well.)
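
For reference, the usual shape of a pread() wrapper that copes with
short reads and EINTR looks something like this. It is a minimal
sketch: full_pread() is a hypothetical name, not necessarily what
libctf ends up calling it.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read COUNT bytes from FD at OFFSET, retrying
   after short reads and EINTR.  Returns the number of bytes read
   (less than COUNT only at end of file), or -1 with errno set.  */
static ssize_t
full_pread (int fd, void *buf, size_t count, off_t offset)
{
  char *p = buf;
  size_t done = 0;

  while (done < count)
    {
      ssize_t n = pread (fd, p + done, count - done,
			 offset + (off_t) done);

      if (n < 0)
        {
          if (errno == EINTR)
            continue;		/* Interrupted before any data: retry.  */
          return -1;		/* Real error.  */
        }
      if (n == 0)
        break;			/* End of file: report the short count.  */
      done += (size_t) n;
    }
  return (ssize_t) done;
}
```

A caller that needs exactly COUNT bytes then only has to compare the
return value against COUNT.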

> * Use of sys/errno.h.  The standard name is errno.h.

Ancient historical wart: fixed, thank you! How did I miss that?!

> * Use of elf.h.  Non-ELF hosts won't have such a header.  You should be 
> working with the existing include/elf/*.h definitions of ELF data 
> structures in binutils.

This is all for build hosts that aren't ELF, right? I don't think we can
reasonably expect ctf_open() or ctf_fdopen() to work for anything but
raw CTF files on non-ELF hosts, given that by their very nature these
functions are getting CTF data out of ELF sections, and non-ELF formats
don't necessarily support anything like the named section concept ELF
has got at all.

The only other ELF-specificity is looking up types by symbol table
offset. Again, I don't know enough about non-ELF platforms to know if
this concept is even practical there, which would mean the data object
and function info sections would be empty on such hosts, and
ctf_lookup_by_symbol(), ctf_func_info() and ctf_func_args() would not
function or would be excluded from the set of exported symbols entirely.

This would reduce libctf's utility, but not eliminate it: external
systems can still look up types by name or CTF type ID even if they
can't do it by symbol.

It is possible that such things could be revived: all we'd need for a
given non-ELF platform would be a way to consistently split whatever
they use as a symbol table into an ordered list of data and function
objects that could be referenced by those CTF sections. However, for
now, this functionality is intrinsically ELF-only in the sense that
nobody has ever considered how it might work on non-ELF platforms and it
certainly has no users there.

Still, we can do a little better than this in the meantime: see below.

> * Use of gelf.h.  This seems to be something from some versions of libelf, 
> which isn't an existing build dependency of binutils at all (and given the 
> existence of multiple, incompatible versions of libelf, one should be wary 
> of depending on it).  The only gelf.h I have locally here is in a checkout 
> of prelink sources.  Again, use existing ELF structures in headers present 
> in binutils.

This is a historical thing: libelf was of course part of Solaris so its
usage was pervasive, even when unnecessary, as here. What we're actually
using is a few datatypes, nothing more: the Elf64_Sym, from <elf.h> (on
Linux, provided by glibc), the Elf*_Ehdr and Elf*_Shdr, and the
primitive ELF-sized datatypes like Elf64_Word that those structures use.

I don't see any immediate replacement for most of this stuff in
binutils, even though I'd expect to find it: the Elf64_External_Sym's
members are entirely the wrong type (char arrays), and there doesn't
seem to be any common non-architecture-dependent structure with more
useful types at all!

Elf64_Internal_Sym is very bfd-specific (and I'm trying not to have
libctf depend on libbfd unnecessarily, since it needs little of its
functionality), and the code in readelf that mocks up an internal_sym
from file data spends almost all its time getting around the problem
that its datatypes are different from the (standard-guaranteed) data
types in the ELF file itself. This is more futzing about than seems sane
given that we're not using the rest of bfd at all.

So I'd rather find a way to do the simple 'get a bit of very simple data
out of an ELF file we have an fd to (symbol lookups and a couple of
section lookups)' without needing to rejig everything to use bfd just to
do that, particularly given that libctf's APIs that involve the caller
passing info corresponding to a section into libctf do not require the
caller to use bfd and I have not the least idea how to go from
data+size-and-no-fd to a bfd_asection (it's probably not possible).


I could just copy the (fairly small number of) relevant types from
glibc's installed elf.h header into the ctf internals (the license is
compatible, after all, as is the copyright ownership), using a different
(CTF-internal) name to avoid clashes causing trouble at build time.
Would that be acceptable?
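
To make the proposal concrete, a private copy of the 64-bit ELF symbol
structure could look roughly like the following. The ctf_elf64_* names
are invented for illustration; the field order and widths follow the
standard Elf64_Sym layout, expressed with <stdint.h> types so that no
ELF header is needed on the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative CTF-internal stand-ins for the primitive ELF types.
   All names here are hypothetical.  */
typedef uint16_t ctf_elf64_half;
typedef uint32_t ctf_elf64_word;
typedef uint64_t ctf_elf64_xword;
typedef uint64_t ctf_elf64_addr;

/* Same layout as the standard Elf64_Sym: 24 bytes, no padding.  */
typedef struct ctf_elf64_sym
{
  ctf_elf64_word st_name;	/* Offset of the name in the string table.  */
  unsigned char st_info;	/* Symbol type and binding.  */
  unsigned char st_other;	/* Symbol visibility.  */
  ctf_elf64_half st_shndx;	/* Index of the defining section.  */
  ctf_elf64_addr st_value;	/* Symbol value.  */
  ctf_elf64_xword st_size;	/* Symbol size.  */
} ctf_elf64_sym;
```

Note that this only describes the layout: fields read straight from a
file are in the file's byte order, so a host of the opposite endianness
would still have to swap them.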

This lets us operate unchanged on non-ELF hosts and when not targeting
ELF, and leave this code in and even functional in that situation: it
detects ELF files by their magic number, which will presumably never
match things passed in to ctf_open() on non-ELF targets, and nothing
would ever generate contents for the function info or data object
sections on such non-ELF targets either (until we figured out how to do
so), so the ELF-specific code involved in reading those sections is also
not a problem.
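
The magic-number check itself needs nothing from any ELF header. A
minimal sketch (looks_like_elf() is a hypothetical name) is just a
comparison against the four-byte ELF identification prefix:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return nonzero if the first LEN bytes at BUF
   start with the ELF magic number, "\177ELF".  */
static int
looks_like_elf (const unsigned char *buf, size_t len)
{
  static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

  return len >= sizeof elf_magic
	 && memcmp (buf, elf_magic, sizeof elf_magic) == 0;
}
```

Anything that fails this test would then be treated as a raw CTF file
(or rejected), which is why the ELF path is harmless on hosts that
never see ELF input.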

Adding more magic numbers for more executable types is possible: if we
started handling COFF or PE or Mach-O or something like that, we would
probably soon hit a stage where it would become useful to start using
some bfd abstractions, but I think the time is not yet. (I don't know
enough about these formats to know if they even *have* named sections.)

> * Use of byteswap.h and endian.h.  Such headers are not portably 
> available.  Note how byteswap.h usage in gold / elfcpp is appropriately 
> conditional.

Makes sense. I can easily arrange to use code like elfcpp does in that
case.

... (done.)
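
For anyone following along, the conditional approach amounts to
something like this. It is a sketch in the spirit of what elfcpp does,
not a copy of it: HAVE_BYTESWAP_H is assumed to be set by configure,
and the ctf_swap_* names are hypothetical.

```c
#include <stdint.h>

#ifdef HAVE_BYTESWAP_H		/* Assumed to come from configure.  */
#include <byteswap.h>
#define ctf_swap_16(v) bswap_16 (v)
#define ctf_swap_32(v) bswap_32 (v)
#define ctf_swap_64(v) bswap_64 (v)
#else
/* Portable fallbacks for hosts without <byteswap.h>.  */
static inline uint16_t
ctf_swap_16 (uint16_t v)
{
  return (uint16_t) ((v >> 8) | (v << 8));
}

static inline uint32_t
ctf_swap_32 (uint32_t v)
{
  return ((v & 0xff000000u) >> 24)
	 | ((v & 0x00ff0000u) >> 8)
	 | ((v & 0x0000ff00u) << 8)
	 | ((v & 0x000000ffu) << 24);
}

static inline uint64_t
ctf_swap_64 (uint64_t v)
{
  /* Swap each half, then exchange the halves.  */
  return ((uint64_t) ctf_swap_32 ((uint32_t) v) << 32)
	 | ctf_swap_32 ((uint32_t) (v >> 32));
}
#endif
```

Using library-private names avoids clashing with the bswap_* macros on
hosts that do provide them.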

Follow-Ups:
- Re: [PATCH 00/19] libctf, and CTF support for objdump and readelf
  - From: Pedro Alves

References:
- Re: [PATCH 00/19] libctf, and CTF support for objdump and readelf
  - From: Joseph Myers
