This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] elfclassify tool

From: Florian Weimer <fweimer at redhat dot com>
To: Mark Wielaard <mark at klomp dot org>
Cc: elfutils-devel at sourceware dot org, Panu Matilainen <pmatilai at laiskiainen dot org>
Date: Tue, 16 Apr 2019 13:38:30 +0200
Subject: Re: [PATCH] elfclassify tool
References: <87k1fz8c9q.fsf@oldenburg2.str.redhat.com> <2e6a27c552ae5e365db54ca6b432c77c9ad5b041.camel@klomp.org>

* Mark Wielaard:

>> I still need to implement an --unstripped option and fix the
>> iteration over the dynamic section.
>
> We did already discuss some of this off-list.

Thanks for summarizing the previous discussion.

> --elf PATH return 0 whenever the file can be opened and a minimal ELF
> header can be read (it might not be a completely valid ELF file). Do we
> want or need to do any more verification (e.g. try to get the full ELF
> header, walk through all phdrs and shdrs)?

If we ever need that, I think we should add it as separate options,
detecting both separate debuginfo and regular ELF files.

> --unstripped (not yet implemented) would be a classification that
> indicates whether the ELF file can be stripped (further), that is has a
> .symtab (symbol table), .debug_* sections (and possibly any non-
> loadable sections -- "file" only detects the first two).

Some non-allocated sections are expected in stripped binaries:
.gnu_debuglink, .shstrtab, .gnu.build.attributes look relevant in this
context.  I'm not sure if we should flag any other non-allocated section
in this way.

> I am not sure --file=PATH is a useful option.

It's useful for determining if the file exists and can be mapped.

> But maybe we need some way to indicate whether a file is a real file or
> a symlink? But the current implementation returns 0 even for symlinks.
> As do every other option (if the file is a symlink to an ELF file of
> the requested classification). Is this what we want? I would suggest
> that we return 1 for anything that is not a regular file. But that
> would mean that for example eu-elfclassify --executable=/proc/$$/exe
> would also return 1 (currently it returns 0, which might be helpful in
> some cases).

I don't know what RPM needs in this context.  I expect that it can
easily filter out non-regular files.  My problem with symbolic link
detection is that it is so inconsistent—it generally applies to the
final pathname component, and that does not look useful to me.

> --loadable basically checks whether the given ELF file is not an object
> (ET_REL) file, so it will return 0 for either an executable, a shared
> object or core file, but not check whether any other attribute (like
> whether it has program headers and/or loadable segments). Personally I
> would like it if this at least included a check for a PT_LOAD segment.

Is a PT_LOAD segment required to make the PT_DYNAMIC segment visible?
It is possible to have mostly empty objects, after all.

> This does not classify kernel modules as loadable objects.
> rpm does contain a check for that, it might make sense to include that
> as a separate classification in elfclassify --kernel-module.
>
> Kernel modules are also special because they can be compressed ELF
> files. Do we want to support that? (It is easy with gelf_elf_begin).
> That could for example be an flag/option like --compressed which can be
> combined with any other classification option?

How relevant are kernel modules to eu-elfclassify?

Is path-based detection feasible for kernel modules?

> I think another useful classification would be --debugfile which
> succeeds if the primary function of the given ELF file is being a
> separete debug file (basically .debug, .dwo or dwz .multi file) which
> cannot be linked and loaded on its own

That is very difficult to detect reliably, unfortunately, and would best
be implemented in lib(g)elf itself because it would be generally useful,
for all kinds of tools which expect to process real ELF files only.

> BTW. Florian, the extra options are certainly not required for you to
> implement to get eu-elfclassify accepted. They are just suggestions,
> which we might decide not to do/add. Or they can be added by others if
> they think they are useful.

Understood.  I would rather fix the command line syntax as a priority,
implement --unstripped, and add a test suite.

>> Suggestions for improving the argp/help output are welcome as
>> well.  I'm not familiar with argp at all.
>
> You usage of argp seems fine.

Trust me, it's been a struggle so far.

> But I think you don't want to use
> ARGP_NO_EXIT. That causes standard options like --version and --help to
> not exit (with success). Which is generally what we want.
> We do want to want --version and --help to not return an error
> indicator (this is actually checked by make distcheck).

I want to exit with status 2 on usage error.  I couldn't make that
happen without ARGP_NO_EXIT.  I'm open to different suggestions.

> I think we might want to avoid specific ELF concepts in the
> classification descriptors though. For example people might have a
> different concept of DSO.
>
>> I'm keeping a branch with these changes here:
>> 
>>   <https://pagure.io/fweimer/elfutils/commits/elfclassify>
>>
>
>> +/* Name and version of program.  */
>> +ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
>> +
>> +/* Bug report address.  */
>> +ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
>> +
>> +enum classify_command
>> +{
>> +  classify_file = 1000,
>> +  classify_elf,
>> +  classify_executable,
>> +  classify_shared,
>> +  classify_loadable
>> +};
>> +
>> +/* Set by parse_opt.  */
>> +static enum classify_command command;
>> +static const char *command_path;
>> +static int verbose;
>> +
>> +/* Set by map_file.  */
>> +static int file_fd = -1;
>
> map_file?
>
>> +static void
>> +open_file (void)
>> +{
>> +  if (verbose > 1)
>> +    fprintf (stderr, "debug: processing file: %s\n", command_path);
>> +
>> +  file_fd = open (command_path, O_RDONLY);
>> +  if (file_fd < 0)
>> +    {
>> +      if (errno == ENOENT)
>> +        exit (1);
>> +      else
>> +        error (2, errno, N_("opening %s"), command_path);
>> +    }
>> +  struct stat st;
>> +  if (fstat (file_fd, &st) != 0)
>> +    error (2, errno, N_("reading %s\n"), command_path);
>> +  if (!S_ISREG (st.st_mode))
>> +    exit (1);
>> +}
>
> That is odd, I assumed !S_ISREG would by true for symlinks.

No, the open followed the symbolic link.  This is needed for rejecting
directories.  I've added a comment.

>> +  if (verbose)
>> +    {
>> +      fprintf (stderr, "info: ELF type: %d\n", elf_type);
>> +      if (has_program_interpreter)
>> +        fputs ("info: program interpreter found\n", stderr);
>
> You might want to print the program interpreter here.

I can't do that without detecting first if the file is separated
debuginfo.  (Separated debuginfo has a program header that points
nowhere.)

>> +      if (has_dynamic)
>> +        fputs ("info: dynamic segment found\n", stderr);
>> +      if (has_soname)
>> +        fputs ("info: soname found\n", stderr);
>
> You might want to print the soname found here.

This needs access to the dynamic string table.  I don't know how easy
this is to implement.

>> +      if (has_pie_flag)
>> +        fputs ("info: PIE flag found\n", stderr);
>
> Maybe call it DF_1_PIE flag?

Changed.

>> +      if (has_dt_debug)
>> +        fputs ("info: DT_DEBUG found\n", stderr);
>> +    }
>> +}
>
>> +  /* This is probably a PIE program: there is no soname, but a program
>> +     interpreter.  In theory, this file could be also  */
>> +  if (has_program_interpreter)
>> +    return false;
>
> Comment seems to end abruptly.

Fixed.

>> +static error_t
>> +parse_opt (int key, char *arg, struct argp_state *state)
>> +{
>> +  switch (key)
>> +    {
>> +    case classify_file:
>> +    case classify_elf:
>> +    case classify_executable:
>> +    case classify_shared:
>> +    case classify_loadable:
>> +      command = key;
>> +      command_path = arg;
>> +      break;
>
> If you want to only allow one classification at a time you should check
> whether command is already set and call something like:
> argp_error (state, N_("Can only use one classification at a time."));

I tried that, but I ran into issues with that.  It also breaks --help
with multiple/conflicting flags.

I'm trying to come up with a different command line syntax anyway.

Thanks,
Florian

Follow-Ups:
- Re: [PATCH] elfclassify tool
  - From: Florian Weimer

References:
- [PATCH] elfclassify tool
  - From: Florian Weimer
- Re: [PATCH] elfclassify tool
  - From: Mark Wielaard

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]