[PATCH] Add pdb archive target
Jan Beulich
jbeulich@suse.com
Fri Aug 12 06:14:09 GMT 2022
On 12.08.2022 01:26, Mark Harmstone wrote:
> On 11/8/22 14:02, Jan Beulich wrote:
> > On 26.07.2022 01:44, Mark Harmstone wrote:
> >> This adds support for the "Multi-Stream Format" container format that
> >> MSVC uses for its PDB debugging files, as documented at
> >> https://llvm.org/docs/PDB/MsfFile.html.
> >
> > Looking at binutils/testsuite/binutils-all/pdb*.d I wonder what "support"
> > here means: What is dumped is the binary contents of the file (claimed
> > to be coming from section .data) rather than the inner file structure.
>
> I'm not quite sure what you're getting at. This is purely for PDB files as
> archives, there is no inner file structure. The tests check that the hex
> dump of the files matches one possible way to represent an archive of the
> dummy files.
If it's an archive (and hence can hold multiple files), then surely it
has an internal structure. Or else your patch also wouldn't be needed,
btw. Dumping a *.a file gives you an idea what's in the file. I would
have expected the same for *.pdb (to a reasonable extent at least).
One might then easily see number of members, block size, etc. Perhaps
even the sizes of the individual members.
> > Also this looks to cover only one of several flavors/versions, which may
> > want calling out here and which also may influence the naming of certain
> > things throughout the patch.
>
> I assume you're basing this off the "Microsoft C/C++ MSF 7.00" in the header.
> MSVC++ 7 came out in 2002, and it's been the same since... there's only one
> living version of the format.
Well, bad luck with my picking of an older version of VC then - I
ended up checking a VC6 PDB file, which is version 2.00.
> >> --- /dev/null
> >> +++ b/bfd/pdb.c
> >> @@ -0,0 +1,804 @@
> >> +/* BFD back-end for PDB Multi-Stream Format archives.
> >> + Copyright (C) 2022 Free Software Foundation, Inc.
> >> +
> >> + This file is part of BFD, the Binary File Descriptor library.
> >> +
> >> + This program is free software; you can redistribute it and/or modify
> >> + it under the terms of the GNU General Public License as published by
> >> + the Free Software Foundation; either version 3 of the License, or
> >> + (at your option) any later version.
> >> +
> >> + This program is distributed in the hope that it will be useful,
> >> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> >> + GNU General Public License for more details.
> >> +
> >> + You should have received a copy of the GNU General Public License
> >> + along with this program; if not, write to the Free Software
> >> + Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston,
> >> + MA 02110-1301, USA. */
> >> +
> >> +/* This describes the MSF file archive format, which is used for the
> >> + PDB debug info generated by MSVC. See https://llvm.org/docs/PDB/MsfFile.html
> >> + for a full description of the format. */
> >> +
> >> +#include "sysdep.h"
> >> +#include "bfd.h"
> >> +#include "libbfd.h"
> >> +
> >> +/* "Microsoft C/C++ MSF 7.00\r\n\x1a\x44\x53\0\0\0" */
> >> +static const uint8_t pdb_magic[] =
> >> +{ 0x4d, 0x69, 0x63, 0x72, 0x6f, 0x73, 0x6f, 0x66,
> >> + 0x74, 0x20, 0x43, 0x2f, 0x43, 0x2b, 0x2b, 0x20,
> >> + 0x4d, 0x53, 0x46, 0x20, 0x37, 0x2e, 0x30, 0x30,
> >> + 0x0d, 0x0a, 0x1a, 0x44, 0x53, 0x00, 0x00, 0x00 };
> >> +
> >> +#define arch_eltdata(bfd) ((struct areltdata *) ((bfd)->arelt_data))
> >> +
> >> +static bfd_cleanup
> >> +pdb_archive_p (bfd *abfd)
> >> +{
> >> + int ret;
> >> + char magic[sizeof (pdb_magic)];
> >> +
> >> + ret = bfd_bread (magic, sizeof (magic), abfd);
> >> + if (ret != sizeof (magic))
> >> + {
> >> + bfd_set_error (bfd_error_wrong_format);
> >> + return NULL;
> >> + }
> >> +
> >> + if (memcmp (magic, pdb_magic, sizeof (magic)))
> >> + {
> >> + bfd_set_error (bfd_error_wrong_format);
> >> + return NULL;
> >> + }
> >> +
> >> + return _bfd_no_cleanup;
> >> +}
> >> +
> >> +static bfd *
> >> +pdb_get_elt_at_index (bfd *abfd, symindex sym_index)
> >> +{
> >> + char int_buf[sizeof (uint32_t)];
> >> + uint32_t block_size, block_map_addr, block, num_files;
> >> + uint32_t first_dir_block, dir_offset, file_size, block_off, left;
> >> + char name[10];
> >> + bfd *file;
> >> + char *buf;
> >> +
> >> + /* get block_size */
> >> +
> >> + if (bfd_seek (abfd, sizeof (pdb_magic), SEEK_SET))
> >> + return NULL;
> >> +
> >> + if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
> >> + {
> >> + bfd_set_error (bfd_error_malformed_archive);
> >> + return NULL;
> >> + }
> >> +
> >> + block_size = bfd_getl32 (int_buf);
> >> +
> >> + /* get block_map_addr */
> >> +
> >> + if (bfd_seek (abfd, 4 * sizeof (uint32_t), SEEK_CUR))
> >> + return NULL;
> >> +
> >> + if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
> >> + {
> >> + bfd_set_error (bfd_error_malformed_archive);
> >> + return NULL;
> >> + }
> >> +
> >> + block_map_addr = bfd_getl32 (int_buf);
> >> +
> >> + /* get num_files */
> >> +
> >> + if (bfd_seek (abfd, block_map_addr * block_size, SEEK_SET))
> >> + return NULL;
> >> +
> >> + if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
> >> + {
> >> + bfd_set_error (bfd_error_malformed_archive);
> >> + return NULL;
> >> + }
> >> +
> >> + first_dir_block = bfd_getl32 (int_buf);
> >> +
> >> + if (bfd_seek (abfd, first_dir_block * block_size, SEEK_SET))
> >> + return NULL;
> >> +
> >> + if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
> >> + {
> >> + bfd_set_error (bfd_error_malformed_archive);
> >> + return NULL;
> >> + }
> >> +
> >> + num_files = bfd_getl32 (int_buf);
> >> +
> >> + if (sym_index >= num_files)
> >> + {
> >> + bfd_set_error (bfd_error_no_more_archived_files);
> >> + return NULL;
> >> + }
> >> +
> >> + /* read file size */
> >> +
> >> + dir_offset = sizeof (uint32_t) * (sym_index + 1);
> >> +
> >> + if (dir_offset >= block_size)
> >> + {
> >> + uint32_t block_map_addr_off;
> >> +
> >> + block_map_addr_off = ((dir_offset / block_size) * sizeof (uint32_t));
> >> +
> >> + if (bfd_seek (abfd, (block_map_addr * block_size) + block_map_addr_off,
> >> + SEEK_SET))
> >> + return NULL;
> >> +
> >> + if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
> >> + {
> >> + bfd_set_error (bfd_error_malformed_archive);
> >> + return NULL;
> >> + }
> >> +
> >> + block = bfd_getl32 (int_buf);
> >> + }
> >> + else
> >> + {
> >> + block = first_dir_block;
> >> + }
> >> +
> >> + if (bfd_seek (abfd, (block * block_size) + (dir_offset % block_size),
> >> + SEEK_SET))
> >> + return NULL;
> >> +
> >> + if (bfd_bread (int_buf, sizeof (uint32_t), abfd) != sizeof (uint32_t))
> >> + {
> >> + bfd_set_error (bfd_error_malformed_archive);
> >> + return NULL;
> >> + }
> >> +
> >> + file_size = bfd_getl32 (int_buf);
> >> +
> >> + /* create BFD */
> >> +
> >> + sprintf (name, "%04lx", sym_index);
> >
> > Is there a reason for this 4-or-more digits naming of the file? Would
> > it make sense to use 8 digits (beyond which the index apparently
> > cannot grow)?
>
> In practice, 4 digits is plenty. The number of files in the archive is
> proportional to the number of object files linked into the image... for the
> NT kernel, which is probably the most complicated EXE out there, the PDB
> has 1,100 files. I can't imagine anybody will ever go over 65,535 - and it's
> not visible anyway, unless you play around with ar.
Since you know the number of files in the archive, may I suggest that you
base the number of digits on that number of members, such that all
elements would be extracted to files with names of identical length?
> >> --- /dev/null
> >> +++ b/binutils/testsuite/binutils-all/pdb.exp
> >> @@ -0,0 +1,157 @@
> >> +# Copyright (C) 2022 Free Software Foundation, Inc.
> >> +
> >> +# This file is part of the GNU Binutils.
> >> +#
> >> +# This program is free software; you can redistribute it and/or modify
> >> +# it under the terms of the GNU General Public License as published by
> >> +# the Free Software Foundation; either version 3 of the License, or
> >> +# (at your option) any later version.
> >> +#
> >> +# This program is distributed in the hope that it will be useful,
> >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> >> +# GNU General Public License for more details.
> >> +#
> >> +# You should have received a copy of the GNU General Public License
> >> +# along with this program; if not, write to the Free Software
> >> +# Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.
> >> +
> >> +proc pdb_archive_1 { } {
> >> + global AR
> >> + global OBJDUMP
> >> + global srcdir
> >> + global subdir
> >> +
> >> + set testname "pdb archive 1"
> >> +
> >> + file delete tmpdir/test.pdb
> >> +
> >> + # add short file (less than block size)
> >> +
> >> + set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile1"]
> >> + if ![string match "" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $AR "tv tmpdir/test.pdb"]
> >> + if ![string match "rw-r--r-- 0/0 3 *0000*" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
> >> + set exp [file_contents "$srcdir/$subdir/pdb1.d"]
> >> + if ![string equal $got $exp] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $AR "x tmpdir/test.pdb 0000 --output=tmpdir"]
> >> + if ![string match "" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [file_contents tmpdir/0000]
> >> + set exp [file_contents "$srcdir/$subdir/pdbfile1"]
> >> + if ![string equal $got $exp] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + pass $testname
> >> +}
> >> +
> >> +proc pdb_archive_2 { } {
> >> + global AR
> >> + global OBJDUMP
> >> + global srcdir
> >> + global subdir
> >> +
> >> + set testname "pdb archive 2"
> >> +
> >> + # add empty file
> >> +
> >> + set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb /dev/null"]
> >> + if ![string match "" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $AR "tv tmpdir/test.pdb"]
> >> + if ![string match "*\nrw-r--r-- 0/0 0 *0001*" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
> >> + set exp [file_contents "$srcdir/$subdir/pdb2.d"]
> >> + if ![string equal $got $exp] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $AR "x tmpdir/test.pdb 0001 --output=tmpdir"]
> >> + if ![string match "" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [file_contents tmpdir/0001]
> >> + if ![string equal $got ""] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + pass $testname
> >> +}
> >> +
> >> +proc pdb_archive_3 { } {
> >> + global AR
> >> + global OBJDUMP
> >> + global srcdir
> >> + global subdir
> >> +
> >> + set testname "pdb archive 3"
> >> +
> >> + # add long file (greater than block size)
> >> +
> >> + set got [binutils_run $AR "cr --target=pdb tmpdir/test.pdb $srcdir/$subdir/pdbfile2"]
> >> + if ![string match "" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $AR "tv tmpdir/test.pdb"]
> >> + if ![string match "*\nrw-r--r-- 0/0 1032 *0002*" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $OBJDUMP "-s --target=binary tmpdir/test.pdb"]
> >> + set exp [file_contents "$srcdir/$subdir/pdb3.d"]
> >> + if ![string equal $got $exp] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [binutils_run $AR "x tmpdir/test.pdb 0002 --output=tmpdir"]
> >> + if ![string match "" $got] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + set got [file_contents tmpdir/0002]
> >> + set exp [file_contents "$srcdir/$subdir/pdbfile2"]
> >> + if ![string equal $got $exp] {
> >> + fail $testname
> >> + return
> >> + }
> >> +
> >> + pass $testname
> >> +}
> >> +
> >> +pdb_archive_1
> >> +pdb_archive_2
> >> +pdb_archive_3
> >
> > The three functions look pretty similar. Any chance of folding them into
> > just one, suitably parametrized?
>
> I'm not sure - to my mind, that would imply that the functions were
> independent, when they need to be run one after the other.
Oh, that wasn't obvious to me at all. Would you mind adding a comment to
that effect then?
> I think we do need to test a small file, an empty file, and a long file, but
> could I get away with just having one test, which adds all three files?
This certainly makes sense.
Jan
More information about the Binutils
mailing list