[PATCH] Add pdb archive target

Jan Beulich jbeulich@suse.com
Fri Aug 12 06:14:09 GMT 2022


On 12.08.2022 01:26, Mark Harmstone wrote:
> On 11/8/22 14:02, Jan Beulich wrote:
>  > On 26.07.2022 01:44, Mark Harmstone wrote:
>  >> This adds support for the "Multi-Stream Format" container format that
>  >> MSVC uses for its PDB debugging files, as documented at
>  >> https://llvm.org/docs/PDB/MsfFile.html.
>  >
>  > Looking at binutils/testsuite/binutils-all/pdb*.d I wonder what "support"
>  > here means: What is dumped is the binary contents of the file (claimed
>  > to be coming from section .data) rather than the inner file structure.
> 
> I'm not quite sure what you're getting at. This is purely for PDB files as
> archives, there is no inner file structure. The tests check that the hex
> dump of the files matches one possible way to represent an archive of the
> dummy files.

If it's an archive (and hence can hold multiple files), then surely it
has an internal structure. Or else your patch also wouldn't be needed,
btw. Dumping a *.a file gives you an idea what's in the file. I would
have expected the same for *.pdb (to a reasonable extent at least).
One might then easily see number of members, block size, etc. Perhaps
even the sizes of the individual members.

>  > Also this looks to cover only one of several flavors/versions, which may
>  > want calling out here and which also may influence the naming of certain
>  > things throughout the patch.
> 
> I assume you're basing this off the "Microsoft C/C++ MSF 7.00" in the header.
> MSVC++ 7 came out in 2002, and it's been the same since... there's only one
> living version of the format.

Well, bad luck with my picking of an older version of VC then - I
ended up checking a VC6 PDB file, which is version 2.00.

>  >> --- /dev/null
>  >> +++ b/bfd/pdb.c
>  >> @@ -0,0 +1,804 @@
>  >> +/* BFD back-end for PDB Multi-Stream Format archives.
>  >> +   Copyright (C) 2022 Free Software Foundation, Inc.
>  >> +
>  >> +   This file is part of BFD, the Binary File Descriptor library.
>  >> +
>  >> +   This program is free software; you can redistribute it and/or modify
>  >> +   it under the terms of the GNU General Public License as published by
>  >> +   the Free Software Foundation; either version 3 of the License, or
>  >> +   (at your option) any later version.
>  >> +
>  >> +   This program is distributed in the hope that it will be useful,
>  >> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>  >> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>  >> +   GNU General Public License for more details.
>  >> +
>  >> +   You should have received a copy of the GNU General Public License
>  >> +   along with this program; if not, write to the Free Software
>  >> +   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
>  >> +   MA 02110-1301, USA. */
>  >> +
>  >> +/* This describes the MSF file archive format, which is used for the
>  >> +   PDB debug info generated by MSVC. See https://llvm.org/docs/PDB/MsfFile.html
>  >> +   for a full description of the format. */
>  >> +
>  >> +#include "sysdep.h"
>  >> +#include "bfd.h"
>  >> +#include "libbfd.h"
>  >> +
>  >> +/* "Microsoft C/C++ MSF 7.00\r\n\x1a\x44\x53\0\0\0" */
>  >> +static const uint8_t pdb_magic[] =
>  >> +{ 0x4d, 0x69, 0x63, 0x72, 0x6f, 0x73, 0x6f, 0x66,
>  >> +  0x74, 0x20, 0x43, 0x2f, 0x43, 0x2b, 0x2b, 0x20,
>  >> +  0x4d, 0x53, 0x46, 0x20, 0x37, 0x2e, 0x30, 0x30,
>  >> +  0x0d, 0x0a, 0x1a, 0x44, 0x53, 0x00, 0x00, 0x00 };
>  >> +
>  >> +#define arch_eltdata(bfd) ((struct areltdata *) ((bfd)->arelt_data))
>  >> +
>  >> +static bfd_cleanup
>  >> +pdb_archive_p (bfd *abfd)
>  >> +{
>  >> +  int ret;
>  >> +  char magic[sizeof (pdb_magic)];
>  >> +
>  >> +  ret = bfd_bread (magic, sizeof (magic), abfd);
>  >> +  if (ret != sizeof (magic))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_wrong_format);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  if (memcmp (magic, pdb_magic, sizeof (magic)))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_wrong_format);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  return _bfd_no_cleanup;
>  >> +}
>  >> +
>  >> +static bfd *
>  >> +pdb_get_elt_at_index (bfd *abfd, symindex sym_index)
>  >> +{
>  >> +  char int_buf[sizeof (uint32_t)];
>  >> +  uint32_t block_size, block_map_addr, block, num_files;
>  >> +  uint32_t first_dir_block, dir_offset, file_size, block_off, left;
>  >> +  char name[10];
>  >> +  bfd *file;
>  >> +  char *buf;
>  >> +
>  >> +  /* get block_size */
>  >> +
>  >> +  if (bfd_seek (abfd, sizeof (pdb_magic), SEEK_SET))
>  >> +    return NULL;
>  >> +
>  >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_malformed_archive);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  block_size = bfd_getl32 (int_buf);
>  >> +
>  >> +  /* get block_map_addr */
>  >> +
>  >> +  if (bfd_seek (abfd, 4 * sizeof (uint32_t), SEEK_CUR))
>  >> +    return NULL;
>  >> +
>  >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_malformed_archive);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  block_map_addr = bfd_getl32 (int_buf);
>  >> +
>  >> +  /* get num_files */
>  >> +
>  >> +  if (bfd_seek (abfd, block_map_addr * block_size, SEEK_SET))
>  >> +    return NULL;
>  >> +
>  >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_malformed_archive);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  first_dir_block = bfd_getl32 (int_buf);
>  >> +
>  >> +  if (bfd_seek (abfd, first_dir_block * block_size, SEEK_SET))
>  >> +    return NULL;
>  >> +
>  >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_malformed_archive);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  num_files = bfd_getl32 (int_buf);
>  >> +
>  >> +  if (sym_index >= num_files)
>  >> +    {
>  >> +      bfd_set_error (bfd_error_no_more_archived_files);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  /* read file size */
>  >> +
>  >> +  dir_offset = sizeof (uint32_t) * (sym_index + 1);
>  >> +
>  >> +  if (dir_offset >= block_size)
>  >> +    {
>  >> +      uint32_t block_map_addr_off;
>  >> +
>  >> +      block_map_addr_off = ((dir_offset / block_size) * sizeof (uint32_t));
>  >> +
>  >> +      if (bfd_seek (abfd, (block_map_addr * block_size) + block_map_addr_off,
>  >> +            SEEK_SET))
>  >> +    return NULL;
>  >> +
>  >> +      if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_malformed_archive);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +      block = bfd_getl32 (int_buf);
>  >> +    }
>  >> +  else
>  >> +    {
>  >> +      block = first_dir_block;
>  >> +    }
>  >> +
>  >> +  if (bfd_seek (abfd, (block * block_size) + (dir_offset % block_size),
>  >> +        SEEK_SET))
>  >> +    return NULL;
>  >> +
>  >> +  if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
>  >> +    {
>  >> +      bfd_set_error (bfd_error_malformed_archive);
>  >> +      return NULL;
>  >> +    }
>  >> +
>  >> +  file_size = bfd_getl32 (int_buf);
>  >> +
>  >> +  /* create BFD */
>  >> +
>  >> +  sprintf (name, "%04lx", sym_index);
>  >
>  > Is there a reason for this 4-or-more digits naming of the file? Would
>  > it make sense to use 8 digits (beyond which the index apparently
>  > cannot grow)?
> 
> In practice, 4 digits is plenty. The number of files in the archive is
> proportional to the number of object files linked into the image... for the
> NT kernel, which is probably the most complicated EXE out there, the PDB
> has 1,100 files. I can't imagine anybody will ever go over 65,535 - and it's
> not visible anyway, unless you play around with ar.

Since you know the number of files in the archive, may I suggest that you
base the number of digits on that number of members, such that all
elements would be extracted to files with names of identical length?

>  >> --- /dev/null
>  >> +++ b/binutils/testsuite/binutils-all/pdb.exp
>  >> @@ -0,0 +1,157 @@
>  >> +#   Copyright (C) 2022 Free Software Foundation, Inc.
>  >> +
>  >> +# This file is part of the GNU Binutils.
>  >> +#
>  >> +# This program is free software; you can redistribute it and/or modify
>  >> +# it under the terms of the GNU General Public License as published by
>  >> +# the Free Software Foundation; either version 3 of the License, or
>  >> +# (at your option) any later version.
>  >> +#
>  >> +# This program is distributed in the hope that it will be useful,
>  >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>  >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>  >> +# GNU General Public License for more details.
>  >> +#
>  >> +# You should have received a copy of the GNU General Public License
>  >> +# along with this program; if not, write to the Free Software
>  >> +# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
>  >> +
>  >> +proc pdb_archive_1 { } {
>  >> +    global AR
>  >> +    global OBJDUMP
>  >> +    global srcdir
>  >> +    global subdir
>  >> +
>  >> +    set testname "pdb archive 1"
>  >> +
>  >> +    file delete tmpdir/test.pdb
>  >> +
>  >> +    # add short file (less than block size)
>  >> +
>  >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile1"]
>  >> +    if ![string match "" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"]
>  >> +    if ![string match "rw-r--r-- 0/0      3 *0000*" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
>  >> +    set exp [file_contents "$srcdir/$subdir/pdb1.d"]
>  >> +    if ![string equal $got $exp] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0000 --output=tmpdir"]
>  >> +    if ![string match "" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [file_contents tmpdir/0000]
>  >> +    set exp [file_contents "$srcdir/$subdir/pdbfile1"]
>  >> +    if ![string equal $got $exp] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    pass $testname
>  >> +}
>  >> +
>  >> +proc pdb_archive_2 { } {
>  >> +    global AR
>  >> +    global OBJDUMP
>  >> +    global srcdir
>  >> +    global subdir
>  >> +
>  >> +    set testname "pdb archive 2"
>  >> +
>  >> +    # add empty file
>  >> +
>  >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb /dev/null"]
>  >> +    if ![string match "" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"]
>  >> +    if ![string match "*\nrw-r--r-- 0/0      0 *0001*" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
>  >> +    set exp [file_contents "$srcdir/$subdir/pdb2.d"]
>  >> +    if ![string equal $got $exp] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0001 --output=tmpdir"]
>  >> +    if ![string match "" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [file_contents tmpdir/0001]
>  >> +    if ![string equal $got ""] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    pass $testname
>  >> +}
>  >> +
>  >> +proc pdb_archive_3 { } {
>  >> +    global AR
>  >> +    global OBJDUMP
>  >> +    global srcdir
>  >> +    global subdir
>  >> +
>  >> +    set testname "pdb archive 3"
>  >> +
>  >> +    # add long file (greater than block size)
>  >> +
>  >> +    set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile2"]
>  >> +    if ![string match "" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $AR "tv tmpdir/test.pdb"]
>  >> +    if ![string match "*\nrw-r--r-- 0/0   1032 *0002*" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
>  >> +    set exp [file_contents "$srcdir/$subdir/pdb3.d"]
>  >> +    if ![string equal $got $exp] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [binutils_run $AR "x tmpdir/test.pdb 0002 --output=tmpdir"]
>  >> +    if ![string match "" $got] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    set got [file_contents tmpdir/0002]
>  >> +    set exp [file_contents "$srcdir/$subdir/pdbfile2"]
>  >> +    if ![string equal $got $exp] {
>  >> +    fail $testname
>  >> +    return
>  >> +    }
>  >> +
>  >> +    pass $testname
>  >> +}
>  >> +
>  >> +pdb_archive_1
>  >> +pdb_archive_2
>  >> +pdb_archive_3
>  >
>  > The three functions look pretty similar. Any chance of folding them into
>  > just one, suitably parametrized?
> 
> I'm not sure - to my mind, that would imply that the functions were
> independent, when they need to be run one after the other.

Oh, that wasn't obvious to me at all. Would you mind adding a comment to
that effect then?

> I think we do need to test a small file, an empty file, and a long file, but
> could I get away with just having one test, which adds all three files?

This certainly makes sense.

Jan


More information about the Binutils mailing list