[PATCH] elfclassify tool

Mon Apr 15 15:39:00 GMT 2019

Hi,

On Fri, 2019-04-12 at 17:38 +0200, Florian Weimer wrote:
> This patch adds an elfclassify tool, mainly for the benefit of RPM's
> find-debuginfo.sh.

I have CCed Panu to see if he has any input.

> I still need to implement an --unstripped option and fix the
> iteration over the dynamic section.

We did already discuss some of this off-list.

The basic idea is that we provide a replacement for using "file" as an
ELF file classifier. It currently provides the following options:

    --elf=PATH         Check if the file at PATH is a valid ELF object
    --executable=PATH  Check if the file at PATH is an ELF program
                       executable
    --file=PATH        Check PATH is file that can be read
    --loadable=PATH    Check if the file at PATH is a loadable object
                       (program or shared object)
    --shared=PATH      Check if the file at PATH is an ELF shared object
                       (DSO)
-v, --verbose          Output additional information (can be specified
                       multiple times)

The program returns 0 on success (the given PATH is of the requested
classification), returns 1 on failure (the given PATH isn't of the
requested classification) or returns 2 on error.

Note that only one PATH can be given (the = is optional).

--elf PATH returns 0 whenever the file can be opened and a minimal ELF
header can be read (it might not be a completely valid ELF file). Do we
want or need to do any more verification (e.g. try to get the full ELF
header, walk through all phdrs and shdrs)?
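
To make the question concrete, a header-only check could look roughly
like the sketch below. looks_like_elf is a made-up name, not something
from the patch; it only inspects the e_ident bytes and verifies nothing
about phdrs or shdrs, which is exactly the gap I am asking about.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return true if BUF starts
   with a plausible ELF identification block.  Only e_ident is looked
   at; program and section headers are not checked.  */
static bool
looks_like_elf (const unsigned char *buf, size_t len)
{
  if (len < EI_NIDENT)
    return false;
  if (memcmp (buf, ELFMAG, SELFMAG) != 0)
    return false;
  /* Reject unknown class and data encoding values.  */
  if (buf[EI_CLASS] != ELFCLASS32 && buf[EI_CLASS] != ELFCLASS64)
    return false;
  if (buf[EI_DATA] != ELFDATA2LSB && buf[EI_DATA] != ELFDATA2MSB)
    return false;
  return buf[EI_VERSION] == EV_CURRENT;
}
```

A file that passes this can still have truncated or corrupt program
and section header tables.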

Only one of --executable and --shared can be true for an ELF file.
They indicate whether the primary purpose of an ELF file is to be an
executable or a shared library. This is, for example, how rpm can
decide whether to strip or keep the .symtab symbol table: you might
want to keep it for an ELF file that is used primarily as a common
shared library, but not if it is primarily used as an executable.

--unstripped (not yet implemented) would be a classification that
indicates whether the ELF file can be stripped (further), that is,
whether it has a .symtab (symbol table) or .debug_* sections (and
possibly any other non-loadable sections; "file" only detects the
first two).
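
As a rough illustration of the --unstripped rule described above, here
is a sketch that works on a simplified list of sections. The struct and
function names are invented for this example, and it deliberately
covers only the two cases "file" detects (.symtab and .debug_*), not
other non-loadable sections.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a section header: just the section
   name and its sh_type value.  */
struct section_info
{
  const char *name;
  unsigned int type;
};

/* Return true if any section is a symbol table or a .debug_* section,
   meaning the file could still be stripped.  */
static bool
is_unstripped (const struct section_info *sections, size_t count)
{
  for (size_t i = 0; i < count; ++i)
    {
      if (sections[i].type == SHT_SYMTAB)
        return true;
      if (strncmp (sections[i].name, ".debug_", strlen (".debug_")) == 0)
        return true;
    }
  return false;
}
```

A real implementation would get the names and types from the section
header table instead of a prepared array.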

I am not sure --file=PATH is a useful option.
But maybe we need some way to indicate whether a file is a real file or
a symlink? The current implementation returns 0 even for symlinks, as
does every other option (if the file is a symlink to an ELF file of
the requested classification). Is this what we want? I would suggest
that we return 1 for anything that is not a regular file. But that
would mean that for example eu-elfclassify --executable=/proc/$$/exe
would also return 1 (currently it returns 0, which might be helpful in
some cases).
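
For reference, open() follows symlinks unless O_NOFOLLOW is given, so
an fstat() on the resulting descriptor describes the target, not the
link. A check that treats symlinks as "not a regular file" would have
to use lstat() on the path instead. A minimal sketch, with a made-up
helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper (not from the patch): return true only if PATH
   itself is a regular file.  A symlink to a regular file yields false
   because lstat does not follow the link.  */
static bool
is_regular_file_nofollow (const char *path)
{
  struct stat st;
  if (lstat (path, &st) != 0)
    return false;
  return S_ISREG (st.st_mode);
}
```

With this, /proc/$$/exe would be rejected, since it is a symlink.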

--loadable basically checks whether the given ELF file is not an object
(ET_REL) file, so it will return 0 for an executable, a shared
object or a core file, but it does not check any other attribute (like
whether it has program headers and/or loadable segments). Personally I
would like it if this at least included a check for a PT_LOAD segment.
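
The PT_LOAD check I have in mind is small. The sketch below assumes the
program headers are already available as an array of Elf64_Phdr (in the
tool they would more likely be fetched one by one with elf_getphdrnum
and gelf_getphdr); has_load_segment is an invented name.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if at least one of the PHNUM
   program headers in PHDRS describes a loadable segment.  */
static bool
has_load_segment (const Elf64_Phdr *phdrs, size_t phnum)
{
  for (size_t i = 0; i < phnum; ++i)
    if (phdrs[i].p_type == PT_LOAD)
      return true;
  return false;
}
```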

This does not classify kernel modules as loadable objects.
rpm does contain a check for that, so it might make sense to include it
as a separate classification in elfclassify: --kernel-module.

Kernel modules are also special because they can be compressed ELF
files. Do we want to support that? (It is easy with dwelf_elf_begin.)
That could for example be a flag/option like --compressed which can be
combined with any other classification option.

I think another useful classification would be --debugfile, which
succeeds if the primary function of the given ELF file is being a
separate debug file (basically a .debug, .dwo or dwz .multi file) which
cannot be linked and loaded on its own.

BTW. Florian, the extra options are certainly not required for you to
implement to get eu-elfclassify accepted. They are just suggestions,
which we might decide not to do/add. Or they can be added by others if
they think they are useful.

> Suggestions for improving the argp/help output are welcome as
> well.  I'm not familiar with argp at all.

Your usage of argp seems fine. But I think you don't want to use
ARGP_NO_EXIT. That flag keeps standard options like --version and
--help from exiting (with success), and exiting with success is
generally what we want. We do want --version and --help to not return
an error indicator (this is actually checked by make distcheck).

I think we might want to avoid specific ELF concepts in the
classification descriptors though. For example people might have a
different concept of DSO.

> I'm keeping a branch with these changes here:
> 
>   <https://pagure.io/fweimer/elfutils/commits/elfclassify>
>

> +/* Name and version of program.  */
> +ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
> +
> +/* Bug report address.  */
> +ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
> +
> +enum classify_command
> +{
> +  classify_file = 1000,
> +  classify_elf,
> +  classify_executable,
> +  classify_shared,
> +  classify_loadable
> +};
> +
> +/* Set by parse_opt.  */
> +static enum classify_command command;
> +static const char *command_path;
> +static int verbose;
> +
> +/* Set by map_file.  */
> +static int file_fd = -1;

map_file? The function below that sets this is called open_file.

> +static void
> +open_file (void)
> +{
> +  if (verbose > 1)
> +    fprintf (stderr, "debug: processing file: %s\n", command_path);
> +
> +  file_fd = open (command_path, O_RDONLY);
> +  if (file_fd < 0)
> +    {
> +      if (errno == ENOENT)
> +        exit (1);
> +      else
> +        error (2, errno, N_("opening %s"), command_path);
> +    }
> +  struct stat st;
> +  if (fstat (file_fd, &st) != 0)
> +    error (2, errno, N_("reading %s\n"), command_path);
> +  if (!S_ISREG (st.st_mode))
> +    exit (1);
> +}

That is odd, I assumed !S_ISREG would be true for symlinks.

> +  if (verbose)
> +    {
> +      fprintf (stderr, "info: ELF type: %d\n", elf_type);
> +      if (has_program_interpreter)
> +        fputs ("info: program interpreter found\n", stderr);

You might want to print the program interpreter here.

> +      if (has_dynamic)
> +        fputs ("info: dynamic segment found\n", stderr);
> +      if (has_soname)
> +        fputs ("info: soname found\n", stderr);

You might want to print the soname found here.

> +      if (has_pie_flag)
> +        fputs ("info: PIE flag found\n", stderr);

Maybe call it DF_1_PIE flag?

> +      if (has_dt_debug)
> +        fputs ("info: DT_DEBUG found\n", stderr);
> +    }
> +}

> +  /* This is probably a PIE program: there is no soname, but a program
> +     interpreter.  In theory, this file could be also  */
> +  if (has_program_interpreter)
> +    return false;

Comment seems to end abruptly.

> +static bool
> +is_executable (void)
> +{
> +  if (!is_loadable ())
> +    return false;
> +
> +  /* A loadable object which is not a shared object is treated as an
> +     executable.  */
> +  return !is_shared ();
> +}
> +
> +static error_t
> +parse_opt (int key, char *arg, struct argp_state *state)
> +{
> +  switch (key)
> +    {
> +    case classify_file:
> +    case classify_elf:
> +    case classify_executable:
> +    case classify_shared:
> +    case classify_loadable:
> +      command = key;
> +      command_path = arg;
> +      break;

If you want to only allow one classification at a time you should check
whether command is already set and call something like:
argp_error (state, N_("Can only use one classification at a time."));
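
Spelled out, that could look something like the sketch below. It is a
cut-down stand-in for the patch's parser, not the real thing: only two
classification keys are kept, classify_none and parse_command_line are
names I made up, and the N_() markers are left out so it builds on its
own. Note that argp_parse is called with flags 0 (no ARGP_NO_EXIT), so
argp_error prints the message and exits.

```c
#include <argp.h>
#include <stddef.h>

/* Hypothetical, reduced set of classification keys.  */
enum classify_command
{
  classify_none = 0,
  classify_elf = 1000,
  classify_shared
};

static enum classify_command command = classify_none;
static const char *command_path;

static const struct argp_option options[] =
  {
    { "elf", classify_elf, "PATH", 0, "Check if PATH is an ELF object", 0 },
    { "shared", classify_shared, "PATH", 0,
      "Check if PATH is a shared object", 0 },
    { NULL, 0, NULL, 0, NULL, 0 }
  };

static error_t
parse_opt (int key, char *arg, struct argp_state *state)
{
  switch (key)
    {
    case classify_elf:
    case classify_shared:
      /* Refuse a second classification.  With default flags
         argp_error does not return.  */
      if (command != classify_none)
        argp_error (state, "Can only use one classification at a time.");
      command = (enum classify_command) key;
      command_path = arg;
      break;

    default:
      return ARGP_ERR_UNKNOWN;
    }
  return 0;
}

static const struct argp argp = { .options = options, .parser = parse_opt };

/* Hypothetical wrapper: reset the state and parse with flags 0.  */
static int
parse_command_line (int argc, char **argv)
{
  command = classify_none;
  command_path = NULL;
  return argp_parse (&argp, argc, argv, 0, NULL, NULL);
}
```

By default argp_error exits with argp_err_exit_status, which is
EX_USAGE unless changed, so it would have to be set to 2 if usage
errors should match the exit codes described earlier.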

> +    case 'v':
> +      ++verbose;
> +      break;
> +
> +    case ARGP_KEY_ARG:
> +      argp_usage (state);
> +      exit (2);
> +    }
> +
> +  return 0;
> +}

Thanks,

Mark