[PATCH] elfclassify tool

Mark Wielaard mark@klomp.org
Fri Jul 19 13:24:00 GMT 2019


Hi,

Some answers to this older discussion to explain some of my recent
commits suggested for elfclassify.

On Tue, 2019-04-16 at 13:38 +0200, Florian Weimer wrote:
> * Mark Wielaard:
> > --elf PATH return 0 whenever the file can be opened and a minimal ELF
> > header can be read (it might not be a completely valid ELF file). Do we
> > want or need to do any more verification (e.g. try to get the full ELF
> > header, walk through all phdrs and shdrs)?
> If we ever need that, I think we should add it as separate options,
> detecting both separate debuginfo and regular ELF files.

You are probably right that a real verification isn't the task of
elfclassify. We already have elflint. But I did make --elf required by
default (with as check simply that libelf could open the file without
marking it as ELF_K_NONE). The user can disable it again with --not-
elf. It felt slightly odd to not have any filter active by default.

I did add two "sub" classifications --elf-file and --elf-archive
because I believe people often make a distinction between "normal" ELF
files and ELF archives (containers of ELF objects).

I also added detection of --debug-only ELF file using the heuristic
that Frank also came up with for the dbgserver (contains .debug_*
sections, but no allocated SHT_PROGBIT sections).

> > --unstripped (not yet implemented) would be a classification that
> > indicates whether the ELF file can be stripped (further), that is has a
> > .symtab (symbol table), .debug_* sections (and possibly any non-
> > loadable sections -- "file" only detects the first two).
> 
> Some non-allocated sections are expected in stripped binaries:
> .gnu_debuglink, .shstrtab, .gnu.build.attributes look relevant in this
> context.  I'm not sure if we should flag any other non-allocated section
> in this way.

I don't think those sections need special flagging.

> > I am not sure --file=PATH is a useful option.
> 
> It's useful for determining if the file exists and can be mapped.
> 
> > But maybe we need some way to indicate whether a file is a real file or
> > a symlink? But the current implementation returns 0 even for symlinks.
> > As do every other option (if the file is a symlink to an ELF file of
> > the requested classification). Is this what we want? I would suggest
> > that we return 1 for anything that is not a regular file. But that
> > would mean that for example eu-elfclassify --executable=/proc/$$/exe
> > would also return 1 (currently it returns 0, which might be helpful in
> > some cases).
> 
> I don't know what RPM needs in this context.  I expect that it can
> easily filter out non-regular files.  My problem with symbolic link
> detection is that it is so inconsistent—it generally applies to the
> final pathname component, and that does not look useful to me.

Since --file was available again I reused it to indicate that the final
pathname component should be a regular file (not a symlink or special
device). I think that is useful in general. But you are right there are
other tools to filter out symlinks/special device files. It was useful
for quick and dirty testing though.

> > --loadable basically checks whether the given ELF file is not an object
> > (ET_REL) file, so it will return 0 for either an executable, a shared
> > object or core file, but not check whether any other attribute (like
> > whether it has program headers and/or loadable segments). Personally I
> > would like it if this at least included a check for a PT_LOAD segment.
> 
> Is a PT_LOAD segment required to make the PT_DYNAMIC segment visible?
> It is possible to have mostly empty objects, after all.

I did add this extra check, because I am a little paranoid about having
to deal with totally broken ELF files. My reasoning was basically that
without a PT_LOAD there is nothing in memory to load/execute.

> > This does not classify kernel modules as loadable objects.
> > rpm does contain a check for that, it might make sense to include that
> > as a separate classification in elfclassify --kernel-module.
> > 
> > Kernel modules are also special because they can be compressed ELF
> > files. Do we want to support that? (It is easy with gelf_elf_begin).
> > That could for example be an flag/option like --compressed which can be
> > combined with any other classification option?
> 
> How relevant are kernel modules to eu-elfclassify?
> 
> Is path-based detection feasible for kernel modules?

Sadly kernel modules often need special treatment that you wouldn't
need for "normal" ET_REL files. path-based detection is only partially
possible (rpm used to use the extension name, but that was fragile). So
I added an option --linux-kernel-module that detects them. I also added
a -z option to try to detect ELF files inside compressed images because
it was easy and because these days kernel modules are often compressed.

> > I think another useful classification would be --debugfile which
> > succeeds if the primary function of the given ELF file is being a
> > separete debug file (basically .debug, .dwo or dwz .multi file) which
> > cannot be linked and loaded on its own
> 
> That is very difficult to detect reliably, unfortunately, and would best
> be implemented in lib(g)elf itself because it would be generally useful,
> for all kinds of tools which expect to process real ELF files only.

I think the heuristic mentioned above for --debug-only works pretty
well. I haven't spotted false positives yet. It is hard to do this in
libelf because the public interface is mostly locked and libelf treats
the segment and section views completely independently.

> > But I think you don't want to use
> > ARGP_NO_EXIT. That causes standard options like --version and --
> > help to
> > not exit (with success). Which is generally what we want.
> > We do want to want --version and --help to not return an error
> > indicator (this is actually checked by make distcheck).
> 
> I want to exit with status 2 on usage error.  I couldn't make that
> happen without ARGP_NO_EXIT.  I'm open to different suggestions.

There is one patch in my tree that does remove the ARGP_NO_EXIT
(because it breaks --help and friends). It documents that usage errors
produce an status code of 64. Processing errors cause a status of 2.
And matching results produce a status exit of 0 or 1.

If you really want to produce 2 for usage errors then we could try to
check things during ARGP_KEY_FINI. But I don't think it is too bad for
usage errors to produce a different status code from process errors.

Another change I made is that process errors (either opening/reading
files or parsing the ELF structures) exit the elfclassify directly. The
status is recorded and the next file is processed first. Only once all
inputs have been classified is the exit status set.

> > That is odd, I assumed !S_ISREG would by true for symlinks.
> 
> No, the open followed the symbolic link.  This is needed for rejecting
> directories.  I've added a comment.

I finally settled on treating directories special (and immediately
"reject" them). I think this matches mostly how you treated them
originally.

> > > +  if (verbose)
> > > +    {
> > > +      fprintf (stderr, "info: ELF type: %d\n", elf_type);
> > > +      if (has_program_interpreter)
> > > +        fputs ("info: program interpreter found\n", stderr);
> > 
> > You might want to print the program interpreter here.
> 
> I can't do that without detecting first if the file is separated
> debuginfo.  (Separated debuginfo has a program header that points
> nowhere.)

Yes. That is a bit of a pain :{ I didn't try this myself. It does seem
too tricky.

> > > +      if (has_dynamic)
> > > +        fputs ("info: dynamic segment found\n", stderr);
> > > +      if (has_soname)
> > > +        fputs ("info: soname found\n", stderr);
> > 
> > You might want to print the soname found here.
> 
> This needs access to the dynamic string table.  I don't know how easy
> this is to implement.

I also didn't do this. It shouldn't be that hard. We should probably
use the PT headers instead of the sections. But it was indeed more work
than I had.

> > If you want to only allow one classification at a time you should check
> > whether command is already set and call something like:
> > argp_error (state, N_("Can only use one classification at a time."));
> 
> I tried that, but I ran into issues with that.  It also breaks --help
> with multiple/conflicting flags.
> 
> I'm trying to come up with a different command line syntax anyway.

I like the new command line syntax. Especially the --stdin[0] is very
powerful! I tried briefly to detect "conflicting" classification
options. But it isn't easy to do so consistently. So just let the user
do whatever they want, even if that means nothing will ever match all
classifications requested.

Cheers,

Mark



More information about the Elfutils-devel mailing list