[COMMITTED htdocs] Add old wiki pages from fedorahosted.

Mark Wielaard mark@klomp.org
Tue Jan 3 14:15:00 GMT 2017

Most of these pages are years out of date, but they contain useful
(historical) information. Add an index of old wiki pages and a link
from the main page. All pages are the text versions of the old wiki

Signed-off-by: Mark Wielaard <mark@klomp.org>
 DebugInfo                |  21 ++++++
 DebugInfoTesting         |   4 +
 DwarfArchive             | 141 ++++++++++++++++++++++++++++++++++
 DwarfCmp                 |  19 +++++
 DwarfInterObject         |  57 ++++++++++++++
 DwarfLint                | 193 +++++++++++++++++++++++++++++++++++++++++++++++
 DwarfLocations           |  66 ++++++++++++++++
 DwarfOutput              |  51 +++++++++++++
 DwarfProducer            |  79 +++++++++++++++++++
 DwarfReaderSharing       |  96 +++++++++++++++++++++++
 DwarfRelocatable         | 158 ++++++++++++++++++++++++++++++++++++++
 DwarfRelocs              |  16 ++++
 DwarfTasks               |  84 +++++++++++++++++++++
 DwarfUnwinder            |  20 +++++
 DwarfValues              | 123 ++++++++++++++++++++++++++++++
 DwarfXml                 |  80 ++++++++++++++++++++
 DwflProjects             |  28 +++++++
 OldWikiIndex             |  61 +++++++++++++++
 RoadMap                  |  48 ++++++++++++
 RpmDebugInfo             |  22 ++++++
 SuspiciousDebuginfoCases | 148 ++++++++++++++++++++++++++++++++++++
 ThreadSafety             |  64 ++++++++++++++++
 index.html               |   1 +
 23 files changed, 1580 insertions(+)
 create mode 100644 DebugInfo
 create mode 100644 DebugInfoTesting
 create mode 100644 DwarfArchive
 create mode 100644 DwarfCmp
 create mode 100644 DwarfInterObject
 create mode 100644 DwarfLint
 create mode 100644 DwarfLocations
 create mode 100644 DwarfOutput
 create mode 100644 DwarfProducer
 create mode 100644 DwarfReaderSharing
 create mode 100644 DwarfRelocatable
 create mode 100644 DwarfRelocs
 create mode 100644 DwarfTasks
 create mode 100644 DwarfUnwinder
 create mode 100644 DwarfValues
 create mode 100644 DwarfXml
 create mode 100644 DwflProjects
 create mode 100644 OldWikiIndex
 create mode 100644 RoadMap
 create mode 100644 RpmDebugInfo
 create mode 100644 SuspiciousDebuginfoCases
 create mode 100644 ThreadSafety

diff --git a/DebugInfo b/DebugInfo
new file mode 100644
index 0000000..51ca79b
--- /dev/null
+++ b/DebugInfo
@@ -0,0 +1,21 @@
+== separate debuginfo conventions ==
+We have the existing convention of a debuginfo path, with directories like /usr/lib/debug wherein we look up individual ELF files' .debug counterpart file.  The old convention is by name (/usr/lib/debug/usr/bin/foo.debug for /usr/bin/foo with .gnu_debuglink foo.debug).  The new convention (tried first) is by build ID (/usr/lib/debug/.build-id/xx/xxx... in lowercase hex of the build ID bits), yielding a symlink to the individual .debug file (-> ../../usr/bin/foo.debug).
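Here is a minimal sketch of the build-ID half of that lookup. It assumes the usual layout: the first ID byte names the subdirectory, and the remaining bytes, in lowercase hex with a `.debug` suffix, name the symlink. GDB and elfutils look the file up this way. The function name `build_id_debug_path` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the build-ID lookup path for ID (LEN bytes)
   into BUF, e.g. /usr/lib/debug/.build-id/ab/cdef01.debug.
   Returns 0 on success, -1 if the ID is too short or BUF too small.  */
int
build_id_debug_path (const unsigned char *id, size_t len,
                     char *buf, size_t bufsize)
{
  static const char prefix[] = "/usr/lib/debug/.build-id/";
  static const char suffix[] = ".debug";
  static const char hex[] = "0123456789abcdef";

  if (len < 2)
    return -1;
  /* prefix + "xx/" + two hex digits per remaining byte + suffix + NUL */
  size_t need = (sizeof prefix - 1) + 3 + 2 * (len - 1) + sizeof suffix;
  if (bufsize < need)
    return -1;

  char *p = buf;
  memcpy (p, prefix, sizeof prefix - 1);
  p += sizeof prefix - 1;

  /* The first byte of the ID names the subdirectory.  */
  *p++ = hex[id[0] >> 4];
  *p++ = hex[id[0] & 0xf];
  *p++ = '/';

  /* The remaining bytes name the symlink inside it.  */
  for (size_t i = 1; i < len; i++)
    {
      *p++ = hex[id[i] >> 4];
      *p++ = hex[id[i] & 0xf];
    }
  memcpy (p, suffix, sizeof suffix);  /* copies the terminating NUL too */
  return 0;
}
```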
+If we go whole hog on DwarfInterObject debug.a format, then we no longer have individual .debug ELF files to find.
+ * Compatibility:
+   For older tools (gdb, crash, etc), have an exploder tool that turns debug.a back into individual .debug files (with copies in place of interobject refs).
+ * New lookup convention.
+   1. Could keep .build-id/xx/... symlink convention, but when the symlink is to an archive, find the ELF member with matching ID.
+      a. Symlink has automagic tie in for RpmDebugInfo to enable {{{yum install /usr/lib/debug/.build-id/...}}}
+   1. Just check all of /usr/lib/debug/*/debug.a by that name convention, match IDs inside (same even by name w/o IDs?)
+   1. Some new kind of lookup database in /usr/lib/debug/.build-id/something
+      a. Separate ones would be same as */debug.a with internal ID map.
+      b. Consolidated database in /usr/lib/debug needs a fiddler tool and RpmDebugInfo %post magic (or the equivalent in other packaging systems)
+==== build ID lookup map ====
+Build IDs lend themselves to a sorted table optimized for binary search.  A quick lookup table could be inside each package's debug.a like a {{{__.SYMDEF}}} pointing to the archive member with the ID.  A consolidated table in a file could point to the archive files by name; putting it like that makes the symlink hack sound OK.
+Lookup table would indicate: number B of bytes in a build ID, number N of IDs in the table.  Then an array of N offsets (archive member offsets or offsets into a string table), followed by N*B bytes giving the corresponding array of N B-byte IDs.  The arrays are sorted by memcmp ordering of the IDs.  Binary search by ID value is optimal, and the offset fields are aligned regardless of B.
\ No newline at end of file
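The lookup table layout described above can be sketched in C under some assumptions the page does not fix. Counts and offsets are taken to be 32-bit and native-endian, and the buffer is assumed to be at least 4-byte aligned, as mmap'd data would be. The struct and function names are hypothetical. The offset array comes before the B-byte IDs, so every offset stays aligned whatever B is. Lookup is then a plain binary search with `memcmp`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* View of a build ID lookup table.  32-bit native-endian fields are an
   assumption of this sketch.  */
struct build_id_table
{
  uint32_t id_len;           /* B: bytes per build ID */
  uint32_t count;            /* N: number of IDs */
  const uint32_t *offsets;   /* N offsets (member or string table offsets) */
  const unsigned char *ids;  /* N * B bytes, sorted in memcmp order */
};

/* Parse raw table bytes: B, N, then N offsets, then N * B ID bytes.
   DATA must be 4-byte aligned.  Returns 0 on success, -1 if malformed.  */
int
build_id_table_from_bytes (const void *data, size_t size,
                           struct build_id_table *t)
{
  const unsigned char *p = data;
  uint32_t hdr[2];

  if (size < sizeof hdr)
    return -1;
  memcpy (hdr, p, sizeof hdr);
  /* Reject absurd ID lengths; this also keeps the size check below
     from overflowing.  */
  if (hdr[0] == 0 || hdr[0] > 64)
    return -1;
  uint64_t need = sizeof hdr + (uint64_t) hdr[1] * (4 + hdr[0]);
  if (need > size)
    return -1;

  t->id_len = hdr[0];
  t->count = hdr[1];
  /* Offsets come first, so they stay aligned whatever B is.  */
  t->offsets = (const uint32_t *) (p + sizeof hdr);
  t->ids = p + sizeof hdr + (size_t) t->count * 4;
  return 0;
}

/* Binary search for ID.  On success store its offset and return 1.  */
int
build_id_table_lookup (const struct build_id_table *t,
                       const unsigned char *id, size_t len,
                       uint32_t *offset)
{
  if (len != t->id_len)
    return 0;
  size_t lo = 0, hi = t->count;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      int cmp = memcmp (id, t->ids + mid * t->id_len, t->id_len);
      if (cmp == 0)
        {
          *offset = t->offsets[mid];
          return 1;
        }
      if (cmp < 0)
        hi = mid;
      else
        lo = mid + 1;
    }
  return 0;
}
```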
diff --git a/DebugInfoTesting b/DebugInfoTesting
new file mode 100644
index 0000000..2f34ade
--- /dev/null
+++ b/DebugInfoTesting
@@ -0,0 +1,4 @@
+=== mass-testing against distro debuginfo files ===
+stub page, fill in details from
diff --git a/DwarfArchive b/DwarfArchive
new file mode 100644
index 0000000..ba003c3
--- /dev/null
+++ b/DwarfArchive
@@ -0,0 +1,141 @@
+== consolidated debug archive ==
+I've had an idea for a long time about sharing of DWARF data (and some
+other ELF bits) across files.  The plan is to have one big file that is a
+container for mostly-normal ELF files, but modified such that they can
+share some data.  I call this big file a ''consolidated debug archive'',
+abbreviated ''CDAR''.
+The original plan was to use a traditional '''ar''' archive.  But now I
+think it should be a special file format, somewhat inspired by glibc's
+''locale archive'' files.  I've thought about the details of this quite a
+lot, but never worked it all out concretely.  So everything here is subject
+to heavy reworking if someone actually tries to implement it.
+=== concept ===
+The observation is that usually many related ''.debug'' files travel
+together, such as in a '''-debuginfo''' rpm.  Just as many CUs in the same
+module repeat the same information, many files in a package repeat the same
+information.  This includes whole DWARF trees from the several modules
+using the same types from the same header files and so forth.  It also
+includes the same names used in many places: symbol names, section names,
+DWARF string tables, source file names.
+ * Every file pretty much has the same section names, so all
+   '''.shstrtab''' sections are redundant.
+ * When a DSO defines exported symbols, then each DSO or executable that
+   links to those symbols repeats the same symbol names in its .strtab too.
+ * The ELF symbol names in .strtab sections are the same names that appear
+   in DW_AT_name (for C) or DW_AT_linkage_name (for C++ mangled names)
+   values in DWARF data.
+ * The source file names that appear in .debug_line file tables also appear
+   in the DW_AT_name attributes of DW_TAG_compile_unit entries.
+=== archive format ===
+The archive consists of these sections:
+ * an archive header
+ * a build ID table
+ * a file name table
+ * a ''constant pool''
+ * a ''CU pool''
+ * individual files
+=== files in a CDAR ===
+The main contents of a CDAR are individual ELF files.  These are the same
+things we see today in ''.debug'' files, with a few differences.
+ * Each "secondary" section in the file might be '''SHT_NOBITS''' instead
+   of its normal type.  This includes:
+   * string tables: .strtab, .shstrtab, .debug_str
+   * secondary DWARF sections: .debug_abbrev, .debug_line, .debug_macinfo,
+     .debug_loc, .debug_ranges
+   When a section is SHT_NOBITS, that means that its contents are part of
+   the ''pool'' (see below).  The offsets that would normally refer to the
+   section ('''st_name''', '''sh_name''', '''DW_FORM_strp''',
+   '''DW_FORM_sec_offset''', etc.) are instead interpreted as absolute
+   offsets into the ''pool''.
+   This is easy to integrate into the reader code.  When the reader
+   initializes the pointers into the mapped section data at startup and
+   finds that a section is '''SHT_NOBITS''', it instead replaces that
+   '''Elf_Data''' with one pointing to the ''constant pool''.  The existing
+   reader code then automatically finds offsets inside the pool.  Existing
+   code that maintains caches indexed by the mapped data pointer will
+   automatically reuse and share its caches for data shared in the pool by
+   multiple CUs or multiple files.
+ * The DWARF entries in the file can use the new form
+   '''DW_FORM_GNU_ref_cdar'''.  This is treated similarly to
+   '''DW_FORM_ref_addr''', but its offset is taken as a position in the
+   ''CU pool'' rather than in the file's own .debug_info section.
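Here is a simplified sketch of the SHT_NOBITS redirection described above. It uses hypothetical stand-in structs rather than libelf's `Elf_Data`. When a member's section is NOBITS, offsets such as `st_name` or `DW_FORM_strp` are resolved against the shared constant pool instead of the section. The string reader itself is unchanged. A `DW_FORM_GNU_ref_cdar` offset could be handled the same way against the CU pool.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a member file's section data and for the
   archive; real reader code would work with libelf's Elf_Data.  */
struct section_view
{
  int is_nobits;              /* section is SHT_NOBITS in this member */
  const unsigned char *data;  /* mapped contents, if not NOBITS */
  size_t size;
};

struct cdar_view
{
  const unsigned char *const_pool;
  size_t const_pool_size;
};

/* A NOBITS section is replaced by a view of the whole constant pool, so
   the unchanged offset-reading code lands inside the pool.  */
static struct section_view
effective_section (const struct cdar_view *ar, struct section_view sec)
{
  if (sec.is_nobits)
    {
      struct section_view pool = { 0, ar->const_pool, ar->const_pool_size };
      return pool;
    }
  return sec;
}

/* Read the string at OFFSET (st_name, sh_name, DW_FORM_strp, ...).
   Returns NULL if OFFSET is out of range or the string is unterminated.  */
const char *
section_string (const struct cdar_view *ar, struct section_view sec,
                size_t offset)
{
  struct section_view eff = effective_section (ar, sec);
  if (eff.data == NULL || offset >= eff.size)
    return NULL;
  const unsigned char *s = eff.data + offset;
  if (memchr (s, '\0', eff.size - offset) == NULL)
    return NULL;
  return (const char *) s;
}
```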
+The storage of each individual file's contents is preceded by a simple file
+header that just gives its size.  It would be possible to extend this to
+include other fields like owner, mode, and mtime, like an '''ar''' file
+header has.  But it's not clear any of these are useful to have in a CDAR.
+If any, perhaps just mtime, but even that doesn't seem especially useful.
+=== build ID table ===
+This table supports quick lookup of files by their build IDs.  It
+associates a build ID with an entry in the file name table and with a file
+data record.
+My original idea is that its entries would be sorted by the build ID bits
+so that consumers can use binary search for a build ID.  Alternatively, it
+could be some scheme encoding a hash table, if that makes for faster
+lookups without the table being too much larger.
+The table format would be designed to be compact and alignment-friendly.
+Since build IDs are in general of arbitrary length, the archive header or
+the table's own header would need to indicate the length of IDs being used.
+There is no need to support disparate ID lengths inside a single CDAR.
+I imagine that each table entry would be something like three aligned words:
+ID table index, name table index, file record's absolute offset.
+=== file name table ===
+This table associates file names with the files and build IDs.
+Because of hard links, symlinks, or copies among the input files,
+there might be multiple file name table entries for a single file.
+I imagine that each table entry would be something like two aligned words:
+''constant pool'' offset of the file name, and file record's absolute offset.
+=== constant pool ===
+This is the "big soup" for everything that does not need any more outside
+structure.  It doesn't need any kind of header; the archive header could
+just give its position and length.
+All string tables can be merged together and be part of this.  Also all
+.debug_line tables can just live here, etc.  Because they are all in the
+same pool, strings that match a file name in some file table need not
+appear separately in a string table.  Those string table offsets will just
+be pool offsets, so they can point directly into part of a .debug_line file
+table where the same string appears.  This will avoid duplicating the same
+string that appears both in a file table and in a CU's '''DW_AT_name'''.
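Here is a naive sketch of interning strings into the constant pool, using a hypothetical fixed-size `struct pool`. It reuses any existing NUL-terminated occurrence of a string. That includes the tail of a longer string, or a name inside raw bytes appended from a .debug_line file table, which is the kind of sharing described above. The linear scan is quadratic. A real packer would index the pool, for example with a hash table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity constant pool used while packing.  */
struct pool
{
  char data[4096];
  size_t used;
};

/* Append raw bytes, such as a .debug_line file table, unchanged.
   Returns their pool offset, or -1 if the pool is full.  */
long
pool_append (struct pool *p, const void *bytes, size_t n)
{
  if (n > sizeof p->data - p->used)
    return -1;
  memcpy (p->data + p->used, bytes, n);
  p->used += n;
  return (long) (p->used - n);
}

/* Return a pool offset for the NUL-terminated string STR.  Any existing
   occurrence of STR followed by NUL is reused, even the tail of a longer
   string or a name inside appended raw bytes; otherwise STR is appended.  */
long
pool_intern (struct pool *p, const char *str)
{
  size_t n = strlen (str) + 1;   /* match the terminating NUL as well */
  for (size_t i = 0; i + n <= p->used; i++)
    if (memcmp (p->data + i, str, n) == 0)
      return (long) i;
  return pool_append (p, str, n);
}
```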
+Whatever else can be shared goes in here too.  All the "secondary" sections
+are not read by themselves, but only in chunks that have their own headers
+(or none) starting at an offset given in an attribute value or a CU header
+or similar place, and their own terminators.
+=== CU pool ===
+This is like a big .debug_info section.  The archive header gives its
+position in the file and its length.  Its internal structure is nothing but
+the normal sequence of CU headers and DIE data.
+It contains all '''DW_TAG_partial_unit''' entries, so enumerating any
+particular file's .debug_info section does not come across them to skip.
+When a top-level CU uses '''DW_TAG_imported_unit''', its '''DW_AT_import'''
+uses a '''DW_FORM_GNU_ref_cdar''' reference to point into the CU pool.
+Other references into shared entries do the same.
diff --git a/DwarfCmp b/DwarfCmp
new file mode 100644
index 0000000..b6e9eb2
--- /dev/null
+++ b/DwarfCmp
@@ -0,0 +1,19 @@
+== DWARF comparison tool ==
+dwarfcmp will be to DWARF as elfcmp is to ELF.  Compare two DWARF files, semantically at the DWARF level.
+This will be an important tool to check up on the DwarfProducer.
+ * compare DIE trees
+   * ignore offsets
+   * compare attr values
+     * ignore form encoding
+   * options to compare actual trees, or compare "semantic trees"
+     * flatten imported_unit into inlined contents of partial_unit
+     * follow inter-CU/interobject refs
+ * compare other sections semantically
+   * driven from tree/attr comparison: *ptr class forms hook into other comparators
+   * .debug_loc
+   * .debug_frame
+   * .debug_ranges
+ * compare quick lookup tables? (.debug_aranges, .debug_pubnames et al)
+   * or just let DwarfLint diagnose quick lookup tables that don't match info gleaned from DIEs
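Here is a sketch of the core tree comparison. It uses a hypothetical DIE model in which attribute values are already decoded, so two DIEs compare equal regardless of their offsets and of which forms encoded their values. It also assumes attributes are sorted by name. It does not handle reference attributes, imported_unit flattening, or the *ptr-class hooks into other comparators listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical normalized DIE model: values are decoded before
   comparison, so the DW_FORM that encoded them is not recorded.  */
enum value_kind { VAL_CONST, VAL_STRING };

struct attr
{
  unsigned int name;      /* DW_AT_* */
  enum value_kind kind;
  uint64_t constant;      /* VAL_CONST, whether data1, data4, udata... */
  const char *string;     /* VAL_STRING, whether inline or strp */
};

struct die
{
  uint64_t offset;        /* ignored by the comparison */
  unsigned int tag;
  size_t nattrs;
  const struct attr *attrs;   /* assumed sorted by name */
  size_t nchildren;
  const struct die *children;
};

static int
attr_equal (const struct attr *a, const struct attr *b)
{
  if (a->name != b->name || a->kind != b->kind)
    return 0;
  if (a->kind == VAL_CONST)
    return a->constant == b->constant;
  return strcmp (a->string, b->string) == 0;
}

/* Return 1 if the two trees match semantically, ignoring offsets.  */
int
die_tree_equal (const struct die *a, const struct die *b)
{
  if (a->tag != b->tag || a->nattrs != b->nattrs
      || a->nchildren != b->nchildren)
    return 0;
  for (size_t i = 0; i < a->nattrs; i++)
    if (!attr_equal (&a->attrs[i], &b->attrs[i]))
      return 0;
  for (size_t i = 0; i < a->nchildren; i++)
    if (!die_tree_equal (&a->children[i], &b->children[i]))
      return 0;
  return 1;
}
```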
diff --git a/DwarfInterObject b/DwarfInterObject
new file mode 100644
index 0000000..17f4902
--- /dev/null
+++ b/DwarfInterObject
@@ -0,0 +1,57 @@
+== DWARF with inter-object references ==
+This is a proposal to extend the ELF formats used for DWARF so that DW_FORM_ref* can be resolved to a DIE in a CU in another object file.
+==== DWARF relocs & .debug_symtab ====
+In ET_REL .debug files, the existing .rel.debug_info sections are split.  The relocs referring to symbols in allocated sections stay as they were.
+Any relocs resolved to .debug_* sections are moved to new .debug_rel.* sections.
+In other .debug files, compression can generate new relocs in .debug_rel.* sections.  These are nonallocated sections that apply to the nonallocated sections in the .debug file.
+.debug_rel* sections are SHT_REL(A) and follow the usual rules: sh_info points to the .debug_* section needing relocation, and sh_link to the symbol table.
+However, all .debug_rel* sections point to a new .debug_symtab instead of the existing symtab.
+.debug_symtab is SHT_SYMTAB but contains only symbols needed for .debug_rel.debug_* relocs, disjoint from real program symbols.
+These can be SHN_UNDEF in one object if the same-named symbol is defined in another object.
+(???) Maybe don't support SHN_UNDEF at all, only support archive convention hack (below)?
+DW_FORM_ref_addr and similar forms that encode a relocatable offset into a .debug_* section in DWARF are subject to inter-object refs.
+That means .debug_rel.debug_* has a reloc at the offset in the DWARF section where this relocatable offset appears.  The decoder has to check for such relocs when consuming a file with .debug_rel.debug_* reloc sections.  When resolving formref, or equivalent header field, the reloc's symbol takes you to another object that defines the symbol, and the DIE/etc at that symbol's value in that other object.
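Here is a sketch of that consumer-side check, using hypothetical simplified structs. The reader looks for a .debug_rel.debug_* reloc at the offset where the DW_FORM_ref_addr value was read. If one exists, the reloc's symbol gives the defining object and the DIE offset within it. The sketch assumes SHT_REL semantics, where the stored field acts as the addend, and relocs sorted by r_offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified forms of a .debug_rel.debug_info entry and a
   .debug_symtab symbol that records which object defines it.  */
struct debug_reloc
{
  uint64_t r_offset;   /* where in .debug_info the relocated field is */
  uint32_t sym;        /* index into the symbol array */
};

struct debug_sym
{
  int object;          /* index of the defining object in the archive */
  uint64_t value;      /* offset of the DIE in that object's .debug_info */
};

struct ref_target
{
  int object;
  uint64_t die_offset;
};

/* Resolve a DW_FORM_ref_addr value RAW read at FIELD_OFFSET in object
   SELF.  RELOCS must be sorted by r_offset.  With SHT_REL semantics the
   stored field acts as the addend.  */
struct ref_target
resolve_ref_addr (int self, uint64_t field_offset, uint64_t raw,
                  const struct debug_reloc *relocs, size_t nrelocs,
                  const struct debug_sym *syms)
{
  size_t lo = 0, hi = nrelocs;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (relocs[mid].r_offset == field_offset)
        {
          const struct debug_sym *s = &syms[relocs[mid].sym];
          struct ref_target r = { s->object, s->value + raw };
          return r;
        }
      if (relocs[mid].r_offset < field_offset)
        lo = mid + 1;
      else
        hi = mid;
    }
  /* No reloc: an ordinary reference within this object.  */
  struct ref_target r = { self, raw };
  return r;
}
```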
+==== DWARF archive convention ====
+Normally inter-object refs would only be supported when all the objects referring to each other are put together into an archive (debug.a).
+The archive members with normal names are the ELF .debug files (as from eu-strip -f), by convention named with what the full path name below the location of debug.a would be (usr/bin/foobar.debug, usr/lib/libfoo.so.1.debug, etc) if the old-style .debug files were separately installed.  If there are ELF files with .debug_symtab sections, the archive symtab would refer to those (?).
+====== consolidated sections ======
+Any sections that would normally be SHT_STRTAB can be SHT_NOBITS in an ELF file inside the archive.  That means that there is a file in the archive with the name of the section (.strtab) that can be used instead.  In this way, all the string tables in all the ELF files can be consolidated and uniquified in the common .strtab file.  The .strtab (or whatever name) archive member has exactly the contents that the ELF section would have.  Maybe even permit this for .shstrtab.  Perhaps instead always have a fixed-named file for all strtabs, being SHT_STRTAB (.strtab, .shstrtab) and .debug_str, merging all strings.
+The same can be done with .debug_{abbrev,str,loc,ranges}.  Could be one archive member for each.
+Or maybe it could be a single fixed name of the special archive member that is the total merge of all the nonallocated SHT_NOBITS sections.
+All offsets (loclistptr, abbrev offset, rangesptr?, etc) relative to this member contents instead of the normal section.
+====== consolidated .debug_symtab ======
+A similar treatment could be done for all the ELF files' .debug_symtab sections, used for the inter-object ref relocs.
+That is, consolidate into a single big symtab that all .debug_rel.* relocs refer to, kept in a special archive member.
+This member has contents like an SHT_SYMTAB section's contents, but st_value is an absolute position in the whole archive (lying in the middle of some ELF file, or some special archive member).
+If all inter-object relocs point to .debug_symtab that is SHT_NOBITS and shared in the consolidated .debug_symtab, then:
+ 1. No SHN_UNDEF symbols, all resolved at debug.a packing time.
+ 2. No need for symbol names, all st_name can be 0.
+This makes it attractive to mandate this and so not implement any SHN_UNDEF symbol resolution logic at all in the reader support for debug.a format.
+====== consolidated partial_unit .debug_info ======
+If using a consolidated .debug_symtab, interobject DW_FORM_ref_addr relocs can resolve to inside a consolidated .debug_info archive member.  This can contain all the partial_unit DIEs that the ELF files' imported_unit entries point to.  That keeps those DIEs out of any real file's top-level tree, so they don't have to be iterated over.
+====== archive quick-lookup ======
+Could have some more special archive members containing lookup tables for quick access.
+ 1. Look up by build ID.  Tie in to DebugInfo finding conventions.
+ 1. consolidated .debug_pubnames? .debug_pubtypes?  Seems handy for !DSOs in many-libraries package's archive.
+ 1. source file names -> CU
\ No newline at end of file
diff --git a/DwarfLint b/DwarfLint
new file mode 100644
index 0000000..a7035a9
--- /dev/null
+++ b/DwarfLint
@@ -0,0 +1,193 @@
+= DWARF lint tool =
+This will be a new tool analogous to elflint, to check DWARF formats for
+correctness and consistency.  The initial version can operate on a single
+DWARF file only.  In future if we do advanced size reduction techniques via
+interobject references, it would also have modes to check those references
+and the composite DWARF trees in whole.
+dwarflint is being developed on the "dwarf" branch of the elfutils GIT repository.
+dwarflint discovered a couple of SuspiciousDebuginfoCases.
+== What it should do in future ==
+... can be found in [source:dwarflint/TODO@dwarf TODO file].
+== What it can do now ==
+What follows is a list of checks that dwarflint executes on DWARF files.
+You can always get the accurate list of checks by launching
+{{{dwarflint --list-checks --verbose}}}
+That will generate an output quite similar to what you see below.
+If you drop the `--verbose` option, the list will be more concise.
+=== check_debug_aranges ===
+Checks for low-level structure of .debug_aranges.  In addition it checks:
+ - that relocations are valid.  In ET_REL files that certain fields
+   are relocated
+ - for dangling and duplicate CU references
+ - for garbage inside padding
+ - for zero-length ranges
+ - that the ranges cover all the address range covered by CUs
+=== check_debug_ranges ===
+Checks for low-level structure of .debug_ranges.  In addition it checks:
+ - for overlapping and dangling references from .debug_info
+ - that base address is set and that it actually changes the address
+ - that ranges have a positive size
+ - that there are no unreferenced holes in the section
+ - that relocations are valid.  In ET_REL files that certain fields
+   are relocated
+ - neither or both of range start and range end are expected to be
+   relocated.  It's expected that they are both relocated against the
+   same section.
+=== check_debug_loc ===
+Checks for low-level structure of .debug_loc.  In addition it
+makes the same checks as .debug_ranges.  For location expressions
+it further checks:
+ - that DW_OP_bra and DW_OP_skip arguments are non-zero and don't
+   escape the expression.  In addition it is required that the jump
+   ends on another instruction, not arbitrarily in the middle of the
+   byte stream, even if that position happened to be interpretable as
+   another well-defined instruction stream.
+ - on 32-bit machines it rejects DW_OP_const8u and DW_OP_const8s
+ - on 32-bit machines it checks that ULEB128-encoded arguments fit
+   into 32 bits
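Two of the location-expression checks above can be sketched as follows. The function names are hypothetical and this is not dwarflint's code. `read_uleb128` decodes a ULEB128 operand and reports whether the value fits in 32 bits. `jump_target_ok` checks a DW_OP_bra or DW_OP_skip operand. Per the DWARF specification, the signed 2-byte offset counts from just after the operand. The sorted list of instruction start offsets is assumed to come from an earlier decoding pass. Accepting a jump to exactly the end of the expression is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a ULEB128 value from P (not reading past END).  Returns the
   number of bytes consumed, or 0 if the encoding is truncated.  *FITS32
   is cleared when the value does not fit in 32 bits.  */
size_t
read_uleb128 (const unsigned char *p, const unsigned char *end,
              uint64_t *value, int *fits32)
{
  uint64_t result = 0;
  unsigned int shift = 0;
  size_t n = 0;

  *fits32 = 1;
  while (p + n < end)
    {
      unsigned char byte = p[n++];
      uint64_t bits = byte & 0x7f;
      /* Would any of these 7 bits land at or above bit 32?  */
      if (shift >= 32 ? bits != 0 : ((bits << shift) >> 32) != 0)
        *fits32 = 0;
      if (shift < 64)
        {
          result |= bits << shift;   /* bits beyond 64 are dropped */
          shift += 7;
        }
      if ((byte & 0x80) == 0)
        {
          *value = result;
          return n;
        }
    }
  return 0;
}

/* Check a DW_OP_bra or DW_OP_skip with signed 2-byte operand DELTA,
   where NEXT is the offset just past the operand.  STARTS holds the
   sorted start offsets of all instructions in an expression of EXPR_LEN
   bytes.  Returns 1 if the jump is acceptable.  */
int
jump_target_ok (int16_t delta, size_t next, size_t expr_len,
                const size_t *starts, size_t nstarts)
{
  if (delta == 0)
    return 0;                       /* zero jumps are rejected */
  long long target = (long long) next + delta;
  if (target < 0 || (unsigned long long) target > expr_len)
    return 0;                       /* escapes the expression */
  if ((size_t) target == expr_len)
    return 1;                       /* assumption: end of expression is OK */
  size_t lo = 0, hi = nstarts;
  while (lo < hi)                   /* must land on an instruction start */
    {
      size_t mid = lo + (hi - lo) / 2;
      if (starts[mid] == (size_t) target)
        return 1;
      if (starts[mid] < (size_t) target)
        lo = mid + 1;
      else
        hi = mid;
    }
  return 0;
}
```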
+=== check_debug_pubnames ===
+Checks for low-level structure of .debug_pubnames.  In addition it checks:
+ - for garbage inside padding
+ - that relocations are valid.  In ET_REL files that certain fields
+   are relocated
+Furthermore, if .debug_info is valid, it is checked:
+ - that references point to actual CUs and DIEs
+ - that there's only one pub section per CU
+=== check_debug_pubtypes ===
+Checks for low-level structure of .debug_pubtypes.  In addition it
+makes the same checks as check_debug_pubnames.
+=== check_debug_line ===
+Checks for low-level structure of .debug_line.  In addition it checks:
+ - for normalized values of certain attributes (such as that
+   default_is_stmt is 0 or 1, even though technically any non-zero
+   value is allowed).
+ - for valid setting of opcode base (i.e. non-zero) and any file
+   indices
+ - that all include directories and all files are used
+ - that files with absolute paths don't refer to include directories,
+   and otherwise that the directory reference is valid
+ - that each used standard or extended opcode is known (note that this
+   assumes that elfutils knows about all opcodes used in practice.  Be
+   sure to build against recent-enough version).
+ - that the line number program is properly terminated with the
+   DW_LNE_end_sequence instruction and that it contains at least one
+   other instruction
+ - that relocations are valid.  In ET_REL files that certain fields
+   are relocated
+Furthermore, if .debug_info is valid, it is checked:
+ - that each line table is used by some CU
+ - that the line table references at CUs point to actual line tables
+ - overlaps in defined addresses are probably OK, one instruction can
+   be derived from several statements.  But certain flags in table
+   should be consistent in that case, namely is_stmt, basic_block,
+   end_sequence, prologue_end, epilogue_begin, isa.
+=== check_debug_info ===
+Checks for low-level structure of .debug_info.  In addition it checks:
+ - for dangling reference to .debug_abbrev section
+ - that reported CU address sizes are consistent
+ - that rangeptr values are aligned to CU address size
+ - that DW_AT_low_pc and DW_AT_high_pc are relocated
+   consistently
+ - that DIE references are well formed (both intra-CU and inter-CU)
+   and that local reference isn't needlessly formed as global
+ - that .debug_str references are well formed and referred strings
+   are properly NUL-terminated
+ - that referenced abbreviations actually exist
+ - that DIEs with children have the DW_AT_sibling attribute and that
+   the sibling actually is at the address reported at that attribute
+ - that the DIE chain is terminated
+ - that the last sibling in chain has no DW_AT_sibling attribute
+ - that the DIE with children actually has children (i.e. that the
+   chain is not empty)
+ - for format constraints (such as that there are no 64-bit CUs inside
+   DWARF 2 file)
+ - in 32-bit CUs, that location attributes are not formed with
+   DW_FORM_data8
+ - all the attribute checks done by check_debug_abbrev are done here
+   for attributes with DW_FORM_indirect.  An indirect form may not
+   itself be DW_FORM_indirect again
+ - that all abbreviations are used
+ - that relocations are valid.  In ET_REL files that certain fields
+   are relocated
+=== check_debug_abbrev ===
+Checks for low-level structure of .debug_abbrev.  In addition it checks:
+ - that all abbreviation tables are non-empty
+ - that certain attribute forms match expectations (mainly those that
+   we have to work with in subsequent check passes.  For example we
+   check that DW_AT_low_pc has a form of DW_FORM_{,ref_}addr)
+ - that all CUs that share an abbrev table are of the same DWARF
+   version
+ - that each abbrev table is used
+ - that abbrevs don't share abbrev codes
+ - that abbrev tags, attribute names and attribute forms are all known
+   (note that this assumes that elfutils knows about all tags used in
+   practice.  Be sure to build against recent-enough version)
+ - that the value of has_children is either 0 or 1
+ - that DW_AT_sibling isn't formed as DW_FORM_ref_addr, and that it
+   isn't present at childless abbrevs
+ - that attributes are not duplicated at abbrev
+ - that DW_AT_high_pc is never used without DW_AT_low_pc.  If both are
+   used, that DW_AT_ranges isn't also used
+This check generally requires CU headers to be readable, i.e. that the
+.debug_info section is roughly well-defined.  If that isn't the case,
+many checks will still be done, operating under assumption that what
+we see is the latest DWARF format.  This may render some checks
+=== check_duplicate_DW_tag_variable ===
+Implements a check for two full DW_TAG_variable DIEs with the same
+DW_AT_name value.  This covers duplicate declaration, duplicate
+definition and declaration with definition.
+ https://lists.fedorahosted.org/pipermail/elfutils-devel/2010-July/001497.html
+ http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39524
+=== check_dups_abstract_origin ===
+If a given attribute name is present on a DIE, it is
+suspicious if that attribute name appears on the DIE that's the
+first DIE's DW_AT_abstract_origin or DW_AT_specification.
+ https://bugzilla.redhat.com/show_bug.cgi?id=527430
+=== check_expected_trees ===
+Checks whether all DIEs have the right attributes and the right children.
+Currently this is very much a work in progress.
+=== check_range_out_of_scope ===
+Check whether PC ranges reported at DIEs fall into the containing scope.
+=== check_matching_ranges ===
+Check that the ranges in .debug_aranges and .debug_ranges match.
+=== check_nodebug ===
+Checks that there are at least essential debuginfo sections present
+in the ELF file.
+=== locstats ===
+Computes location info coverage statistics.  Goes through the whole
+DIE graph, looking at each variable and formal parameter, and
+determining scope coverage of its location information.  In other
+words for how big a part of scope we know, where the variable
+ https://lists.fedorahosted.org/pipermail/elfutils-devel/2010-July/001498.html
+ https://lists.fedorahosted.org/pipermail/elfutils-devel/2010-September/001602.html
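Here is a rough sketch of the per-variable coverage figure, with hypothetical function names (not dwarflint's code). It assumes half-open PC ranges, a scope made of non-overlapping ranges, and location list entries that do not overlap one another.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open PC range [lo, hi).  */
struct pc_range
{
  uint64_t lo, hi;
};

/* Bytes of SCOPE covered by LOCS.  Both lists are assumed to consist of
   pairwise non-overlapping ranges, so the overlaps can simply be summed.  */
uint64_t
covered_bytes (const struct pc_range *scope, size_t nscope,
               const struct pc_range *locs, size_t nlocs)
{
  uint64_t total = 0;
  for (size_t i = 0; i < nscope; i++)
    for (size_t j = 0; j < nlocs; j++)
      {
        uint64_t lo = scope[i].lo > locs[j].lo ? scope[i].lo : locs[j].lo;
        uint64_t hi = scope[i].hi < locs[j].hi ? scope[i].hi : locs[j].hi;
        if (lo < hi)
          total += hi - lo;
      }
  return total;
}

/* Coverage as a whole percentage (rounded down), or -1 for an empty
   scope.  */
int
coverage_percent (const struct pc_range *scope, size_t nscope,
                  const struct pc_range *locs, size_t nlocs)
{
  uint64_t size = 0;
  for (size_t i = 0; i < nscope; i++)
    if (scope[i].hi > scope[i].lo)
      size += scope[i].hi - scope[i].lo;
  if (size == 0)
    return -1;
  return (int) (covered_bytes (scope, nscope, locs, nlocs) * 100 / size);
}
```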
diff --git a/DwarfLocations b/DwarfLocations
new file mode 100644
index 0000000..7cd32eb
--- /dev/null
+++ b/DwarfLocations
@@ -0,0 +1,66 @@
+== location hacking ==
+I've long thought we should have some fancy stuff for doing clever things
+with DWARF location expressions (actually DWARF expressions, value
+expressions, and location expressions, which are a family of related
+I always imagined this would be implemented only in C++ and thought about a
+few details but not really much concrete.  
+A little background:
+ * a ''DWARF expression'' is the basic stack machine program that can
+   compute a value of target address size
+ * a ''value expression'' is what's used for '''DW_AT_frame_base'''.  It
+   can be a DWARF expression or it can be a register name, which is a more
+   compact encoding of the program to simply fetch the register's value.
+   What it yields is a value, not the address or register location of a value.
+ * a ''location expression'' is the most general thing.  When it's a simple
+   DWARF expression, that yields a memory address that is the location
+   where the value of interest (e.g. a variable) is stored.  It can also be
+   a composite using '''DW_OP_piece''' or '''DW_OP_bit_piece'''.  The
+   components of such a composite can be read-only values that are computed
+   and marked with '''DW_OP_stack_value''', or read-only literals from
+   '''DW_OP_implicit_value''', or virtual pointers from
+   '''DW_OP_GNU_implicit_pointer'''.  The whole thing is the location of
+   the value of interest, some of which might be immutable, or some of
+   which might be mutable in memory or registers (or parts of them).
+ * a ''relative location expression'' is what's used in
+   '''DW_AT_data_member_location''': it is like the tail of a location
+   expression, but it's defined only relative to some other location
+Some of the things this would do:
+ * internalize DWARF expressions, value expressions, location
+   expressions, and relative location expressions into C++ objects that
+   know which kind of thing they are
+ * optimize expressions statically
+ * combine a relative location with a location to produce a new location
+ * partially resolve values based on available dynamic information,
+   up to full resolution if all register and memory data is available
+ * map an optimized and partially resolved location into the terms of
+   another kind of evaluator
+The dynamic information includes the results of CFI in a given PC context.
+If the PC is known, then CFI becomes static location information.  In
+something like a systemtap probe or a GDB conditional breakpoint
+expression, this can be resolved statically and combined to produce an
+optimized location.
+I imagine the internalized and optimized form being a simple kind of
+expression tree produced from the stack machine program, where each node
+has some operand nodes and an operation that's still represented with a
+'''DW_OP_*''' opcode.  There could then be interfaces for visiting the
+nodes of the tree.  In systemtap, this would be used to replace the
+existing '''loc2c.c''' code with something that turns an optimized and
+partially-resolved location calculation into the systemtap translator's own
+expression tree form to be translated directly.
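Here is a small sketch of that internalized form, in C rather than the C++ the page imagines, and under heavy assumptions. Instructions arrive already decoded, and only DW_OP_litN, DW_OP_bregN, DW_OP_plus and DW_OP_plus_uconst are handled. Folded constants are stored as DW_OP_lit0 nodes carrying a value. The opcode values come from the DWARF specification; every other name is hypothetical. The stack program becomes a tree whose nodes keep their DW_OP_* codes. Static folding then turns, for example, `DW_OP_breg6 -16; DW_OP_plus_uconst 8` into a single `breg6 -8` node.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values as assigned by the DWARF specification.  */
enum
{
  DW_OP_plus = 0x22, DW_OP_plus_uconst = 0x23,
  DW_OP_lit0 = 0x30, DW_OP_lit31 = 0x4f,
  DW_OP_breg0 = 0x70, DW_OP_breg31 = 0x8f
};

/* An instruction whose operand was already decoded (hypothetical).  */
struct op { uint8_t code; int64_t operand; };

/* Tree node keeping its DW_OP_* code.  Constants are DW_OP_lit0 nodes
   carrying VALUE; DW_OP_bregN nodes carry their offset in VALUE.  */
struct node
{
  uint8_t code;
  int64_t value;
  const struct node *kids[2];
};

#define MAX_NODES 64
struct tree { struct node nodes[MAX_NODES]; size_t used; };

static const struct node *
new_node (struct tree *t, uint8_t code, int64_t value,
          const struct node *a, const struct node *b)
{
  if (t->used == MAX_NODES)
    return NULL;
  struct node *n = &t->nodes[t->used++];
  n->code = code;
  n->value = value;
  n->kids[0] = a;
  n->kids[1] = b;
  return n;
}

#define IS_BREG(c) ((c) >= DW_OP_breg0 && (c) <= DW_OP_breg31)

/* Turn the stack program into a tree, folding constant additions.
   Returns the root, or NULL on unsupported opcodes, stack errors, or
   a program that does not leave exactly one value.  */
const struct node *
build_tree (struct tree *t, const struct op *ops, size_t nops)
{
  const struct node *stack[16];
  size_t sp = 0;

  t->used = 0;
  for (size_t i = 0; i < nops; i++)
    {
      uint8_t c = ops[i].code;
      const struct node *n, *a, *b;

      if (c >= DW_OP_lit0 && c <= DW_OP_lit31)
        n = new_node (t, DW_OP_lit0, c - DW_OP_lit0, NULL, NULL);
      else if (IS_BREG (c))
        n = new_node (t, c, ops[i].operand, NULL, NULL);
      else if (c == DW_OP_plus || c == DW_OP_plus_uconst)
        {
          if (sp < (c == DW_OP_plus ? 2u : 1u))
            return NULL;
          b = c == DW_OP_plus ? stack[--sp]
              : new_node (t, DW_OP_lit0, ops[i].operand, NULL, NULL);
          a = stack[--sp];
          if (b == NULL)
            return NULL;
          /* Wrapping addition avoids signed overflow.  */
          int64_t sum = (int64_t) ((uint64_t) a->value + (uint64_t) b->value);
          if (a->code == DW_OP_lit0 && b->code == DW_OP_lit0)
            n = new_node (t, DW_OP_lit0, sum, NULL, NULL);   /* k1 + k2 */
          else if (IS_BREG (a->code) && b->code == DW_OP_lit0)
            n = new_node (t, a->code, sum, NULL, NULL);      /* bregN + k */
          else
            n = new_node (t, DW_OP_plus, 0, a, b);
        }
      else
        return NULL;

      if (n == NULL || sp == 16)
        return NULL;
      stack[sp++] = n;
    }
  return sp == 1 ? stack[0] : NULL;
}
```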
+For dynamic kinds of uses, there would be interfaces to have a location
+object say what kind of information it needed to reduce to a simpler form,
+such as CFI, register values, memory values.  Then as much as is
+immediately available can be supplied to make it simpler for the next phase
+of use.  In fully-dynamic situations, that would include supplying all
+current register and memory data needed to resolve to a final mutable
+location in registers or memory or a final value.
diff --git a/DwarfOutput b/DwarfOutput
new file mode 100644
index 0000000..0e9ac4c
--- /dev/null
+++ b/DwarfOutput
@@ -0,0 +1,51 @@
+==== dwarf_output ====
+The dwarf_output object will represent one DWARF file, output version.
+The dwarf_edit object is the model.  Unlike dwarf_edit, dwarf_output
+and its constituent objects are not mutable (all const all the time).
+Instead, the *only* way to construct the dwarf_output object is with a
+copy constructor from a template-compatible object (dwarf or dwarf_edit).
+This does copy construction down the tree, resulting in a dwarf_output
+with trees and values that are all const.
+==== collector ====
+The "collector" is an object (and hierarchy of contained objects) that
+holds all the potentially shareable data for dwarf_output objects.  The
+collector is separate from the object for an individual output file.  This
+is in preparation for the "consolidated debug archive" idea, where there
+would be sharing across multiple DWARF "file"s.  For the time being, the
+collector is only functioning as a subsidiary writer data structure for a
+single dwarf_output object for a single file.
+The collector keeps "append-only" sets of each type of object it manages.
+dwarf_output::type constructors work by inserting an object into the set,
+and returning a "const type &" reference pointing into the set.  Thus it
+takes care of uniquifying automagically as it goes.
+At the top, there is a debug_info_entry set.  This necessitates having
+a set for each constituent object that entails:
+ * debug_info_entry
+ * attributes (const unordered_set<const attribute>)
+ * attribute/attr_value
+ * string (use Ebl_Strtab)
+ * line table
+ * macinfo
+ * ranges
+ * location
+ * constant_block
+Each of those types has an "unordered_set<const type>" object in the
+collector.  Each type should have a hash function that returns a hash
+value stored at construction.  The constructors for each type need to
+collect their constituents' hashes along the way, so by the time the
+object is constructed, its hash value is stored and later hash
+computations are free.
+The dwarf_output template copy constructors work by calling methods on
+the collector object that construct the corresponding collector datum
+object, insert it into the corresponding set (i.e. discard the new
+object if it's already in the set), and return a reference to the
+object in the set.  The collector sets should also track for each
+object in the set how many duplicates of that object were inserted.
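Below is a minimal C++ sketch of the DwarfOutput collector idea. It is an append-only uniquifying set with three properties:

* `add` returns a reference into the set.
* Each object's hash is computed once at construction and cached.
* Every insertion is counted, so duplicates can be tallied.

The names `datum`, `datum_hasher` and `collector_set` are hypothetical. A plain string stands in for the real collected types (attributes, locations, line tables, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for one collected datum type.  The hash is computed
// once, at construction, so later hash computations are free.
struct datum
{
  std::string contents;
  std::size_t hash;

  explicit datum (std::string s)
    : contents (std::move (s)), hash (std::hash<std::string> () (contents))
  {}

  bool operator== (const datum &other) const
  {
    return hash == other.hash && contents == other.contents;
  }
};

// Hash functor that just returns the cached value.
struct datum_hasher
{
  std::size_t operator() (const datum &d) const { return d.hash; }
};

// Append-only set: add() uniquifies, returns a reference to the stored
// object, and counts how many times an equal object was inserted.
template<typename T, typename Hash>
class collector_set
{
  std::unordered_map<T, std::size_t, Hash> set_;

public:
  const T &add (T &&item)
  {
    auto result = set_.emplace (std::move (item), 0);
    ++result.first->second;       // 1 for a new object, >1 for duplicates
    return result.first->first;   // element references survive rehashing
  }

  std::size_t uses (const T &item) const
  {
    auto it = set_.find (item);
    return it == set_.end () ? 0 : it->second;
  }

  std::size_t size () const { return set_.size (); }
};
```

Nothing is ever erased from the set. References handed out by `add` therefore stay valid even when the underlying hash table rehashes.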
diff --git a/DwarfProducer b/DwarfProducer
new file mode 100644
index 0000000..113eac5
--- /dev/null
+++ b/DwarfProducer
@@ -0,0 +1,79 @@
+== libdw DWARF writing support ==
+The current thinking is to have a DWARF writer entirely in C++ that is
+wholly independent of the libdw (C) reader code.  That is, DIEs being
+constructed for the writer can't be used with Dwarf_Die et al.  Only the
+C++ interfaces for handling DIE trees and attributes will be compatible
+(via templates? or just source-compatible?) between the C++ front end to
+the existing libdw reader code and the C++ writer code.
+To make the libdw reader code work with writer-constructed DIEs, either the
+reader would need special internal hooks to distinguish Dwarf_Die's taken
+from writer DIEs, or the writer would need to keep DWARF file format mocked
+up in memory so you can use the reader on it.  That means constructing
+abbrev tables for DIEs whose attributes are being composed on the fly, etc.
+Conversely, by keeping it to the C++ interface level only, we can have
+writer data structures that use straightforward STL types for attributes,
+child lists, etc., while collecting the data.  The writer need only lay out
+abbrevs for the file format when it actually goes to produce the output
+file format.
+=== abbrev generation ===
+The main task of the output generation is creating the optimal
+.debug_abbrev table to keep the .debug_info as small as possible.
+ 1. Collect all the DIE shapes used: haschildren + set of attr (name, form*) pairs (ignore DW_AT_sibling)
+    Note form* should be "form flavor", independent of exact encoding:
+    block, udata, sdata, string, addr, flag, ref.
+    Use shape as a key, and note all users of each shape.
+    (Note attrs are unordered, canonicalize order for key comparison.)
+ 1. On each shape
+    * For each block/*data/ref used, calculate smallest flavor that covers
+      all users.
+      * big enough for value
+      * relocatable form if value must be
+    * Assign the next abbrev code, store it in DIE objects of users.
+=== relocation writing ===
+DwarfRelocs will discuss smart reloc handling on the reader side, to
+replace the libdwfl relocate-en-masse code for .ko.debug files.  On the
+writer side we need the same level of sophistication.  At the C++ layer
+both sides should have compatible ways of describing relocatable forms such
+as target address.
+The writer can also generate internal relocs for ref_addr forms and
+fixed-size offset/address fields in headers and .debug_* sections.
+=== format feature compatibility ===
+The writer will have knobs for which DWARF features can be used in the
+output.  We'll use a common argp child for parsing a list of feature names
+into a flag-set; also canonical aliases for "set that gdb version N groks",
+etc.  An option to DwarfLint can complain about using features outside a
+given set.  The writer can be set to flag them, and/or have ways to
+transform them.  e.g. ref_addr, DwarfInterObject, imported_unit, etc.
+=== compressor/exploder ===
+The compressor or exploder logic will consist largely of transformations to
+exploit or get rid of constructs in the feature flag set.  Compressing will
+also do pure consolidation of duplicates within a CU, which exploding won't.
+==== inter-CU refs ====
+When imported_unit/partial_unit format feature is enabled,
+duplicate reduction can look across CUs for matching subtrees.
+ 1. When two CUs contain an identical subtree, generate a potential partial_unit containing it and replace each original copy with an imported_unit referring to it.
+ 2. Later, coalesce each partial_unit with all other partial_units pointed to by the same set of CUs.
+ 3. Finally, if any CU has been reduced to nothing but one imported_unit, coalesce the referenced partial_unit back into being that CU itself, with other CUs' imported_units referring to it.
+==== multi-object compression ====
+DwarfInterObject will discuss compressing multiple separate .debug files
+together so the resultant files can refer to each other's CUs.  The method
+would be the same as for inter-CU refs, stretched across CUs in many objects.
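The abbrev generation steps from DwarfProducer can be sketched in C++ as below. This is an illustration under simplifying assumptions, not the writer's actual code:

* The `die`, `attr`, `shape` and `abbrev` types are hypothetical.
* Only the `udata` flavor is widened, to the smallest 1/2/4/8-byte data encoding covering all users.
* Choosing relocatable forms is left out.

As the notes describe, shapes ignore `DW_AT_sibling` and sort attributes, so attribute order does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Form "flavors": independent of the exact encoding.
enum class flavor { block, udata, sdata, string, addr, flag, ref };

constexpr unsigned DW_AT_sibling = 0x01;   // value from the DWARF spec

// Hypothetical in-memory DIE: attributes as (name, flavor, value).
struct attr { unsigned name; flavor fl; std::uint64_t value; };
struct die { bool has_children; std::vector<attr> attrs; };

// Shape = has_children + canonically ordered (name, flavor) pairs.
using shape = std::pair<bool, std::vector<std::pair<unsigned, flavor>>>;

shape shape_of (const die &d)
{
  shape s { d.has_children, {} };
  for (const attr &a : d.attrs)
    if (a.name != DW_AT_sibling)            // ignored when keying shapes
      s.second.emplace_back (a.name, a.fl);
  std::sort (s.second.begin (), s.second.end ());   // canonicalize order
  return s;
}

// Smallest fixed-size data encoding (1, 2, 4 or 8 bytes) that holds v.
unsigned width_for (std::uint64_t v)
{
  if (v <= 0xff) return 1;
  if (v <= 0xffff) return 2;
  if (v <= 0xffffffffu) return 4;
  return 8;
}

struct abbrev
{
  unsigned code;
  std::map<unsigned, unsigned> udata_width;  // attr name -> bytes, all users
};

// Collect shapes, widen each udata slot to cover every user of the shape,
// and record the abbrev code chosen for each DIE (in input order).
std::map<shape, abbrev> make_abbrevs (const std::vector<die> &dies,
                                      std::vector<unsigned> &codes_out)
{
  std::map<shape, abbrev> table;
  unsigned next_code = 1;
  codes_out.clear ();
  for (const die &d : dies)
    {
      shape s = shape_of (d);
      auto it = table.find (s);
      if (it == table.end ())
        it = table.emplace (s, abbrev { next_code++, {} }).first;
      for (const attr &a : d.attrs)
        if (a.fl == flavor::udata)
          {
            unsigned &w = it->second.udata_width[a.name];
            w = std::max (w, width_for (a.value));
          }
      codes_out.push_back (it->second.code);
    }
  return table;
}
```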
diff --git a/DwarfReaderSharing b/DwarfReaderSharing
new file mode 100644
index 0000000..0274b5f
--- /dev/null
+++ b/DwarfReaderSharing
@@ -0,0 +1,96 @@
+== libdw changes to exploit sharing better ==
+There are various internal reorganizations of libdw data structures that we
+should do to better exploit the shared data we expect to have in DWARF
+files from the compression/writer work.
+=== .debug_abbrev sharing ===
+The plan for the writer is to emit just one big .debug_abbrev chunk that
+will be used by all CUs in a file.  That is not a change of the format per
+se, and existing reader code will handle it fine.  The CU header fields
+will all just refer to the same .debug_abbrev offset (i.e. all of them
+being 0 in a normal final-linked file).
+The current libdw code will redundantly decode and internalize the abbrev
+details separately for each CU.  We should change this.
+The ''abbrev_hash'' field and related fields should move from '''struct
+Dwarf_CU''' into a place that can be shared.  This would be a new structure
+for the abbrev info.  Then that would be stored in a central table that's
+keyed by the pointer into the .debug_abbrev section.
+=== .debug_loc sharing ===
+Similarly we now have a table of previously-decoded location expressions
+that is indexed by the pointer into the .debug_loc section.  This is now
+rooted in the ''locs'' field of '''struct Dwarf_CU'''.
+In the writer, there is no reason not to put all location lists from all
+CUs into one big pool and generate just one unified .debug_loc section.
+It's not very likely we'd get much extra sharing from this, since the CUs'
+addresses won't overlap.  But perhaps we'd get some in a DwarfArchive or
+perhaps there are cases we haven't thought of within a single file.
+We can move ''locs'' into someplace shared across CUs (and across files).
+It's already indexed by the pointer into the .debug_loc data, so nothing
+about how it's used would change.
+=== .debug_line sharing ===
+In the writer, it will make sense to unify some separate CUs' .debug_line
+chunks together.  To start with, each ''partial_unit'' will normally
+contain no code definitions, so no actual line program, but it will need a
+''stmt_list'' pointer to a .debug_line chunk giving its file table.  We can
+certainly consolidate all such file tables together into one that has the
+union of all the file names used in any CU that has no line program.
+It might even make sense to unify all .debug_line chunks into one single
+one.  No two CUs will have overlapping addresses.  So it possibly makes
+sense to join all CUs' line tables together, thus sharing all file tables
+completely with no repetition of any file name.  It's not clear that is a
+net win, since consumers then would have to process the entire joined line
+program to populate their address<->line tables.  That might not be
+worthwhile if a consumer was only going to look at a few CUs and there are
+many.  There are also ideas about a new overarching by-address index that
+could optimize the lookup of a large single line program without playing
+out the whole thing.  That all remains to be resolved.
+But it's quite certain that for ''partial_unit'' .debug_line chunks with no
+line program, it is advantageous to join them together as a single unified
+file table.  When that's done, libdw should avoid decoding the same table
+more than once.
+Similar to .debug_abbrev, we can move the ''lines'' and ''files'' pointers
+out of '''struct Dwarf_CU''' into someplace shared.  That would be keyed by
+the pointer into .debug_line data that we get from ''DW_AT_stmt_list''.
+=== inter-file sharing ===
+The first obvious thing is to have these various pointer-indexed tables
+live in the '''Dwarf''' object.  But that only gets them shared across
+multiple CUs in the same file.  In the DwarfArchive plan, there is the
+possibility for sharing across files too.
+So it is better to define a new struct e.g. '''dwarf_sharing_context'''.
+That would contain all these tables.  Each table is indexed by the actual
+pointer to the mapped file data, rather than by section or file offset
+values.  In normal single-file reading, there would be one
+'''dwarf_sharing_context''' object allocated per file and '''Dwarf''' can
+just point to that.  With DwarfArchive, there would be one
+'''dwarf_sharing_context''' for the whole archive, and all the '''Dwarf'''
+objects for individual files would point to that same object.
+=== libdwfl sharing ===
+With the DwarfRelocatable ideas about revamping libdwfl, it could make
+sense to move from sharing one '''Dwarf''' object across many
+'''Dwfl_Module''' objects (probably in different '''Dwfl''' address spaces)
+to having one '''Dwarf''' object per '''Dwfl_Module'''.  Then almost
+everything that is now in '''Dwarf''' should move to a new internal
+structure that is shared across all the different '''Dwarf''' objects
+referring to the same file.  All the uses that can refer to addresses would
+have to supply both the '''Dwarf''' pointer through which the access came
+as well as the pointers into the internal structures.  Most of the DWARF
+resolution and caching stuff would be shared in those internal structures.
+But actually resolving addresses would refer to the '''Dwarf''' object in
+question to find its libdwfl hooks.
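Here is a sketch of the proposed `dwarf_sharing_context` from DwarfReaderSharing. It is written in C++ for brevity, although libdw itself is C.

Only the abbrev cache is shown. The names `abbrev_table`, `get_abbrevs` and `dwarf_file` (which stands in for `Dwarf`) are hypothetical. The key point is that the cache is keyed by the pointer into mapped data. As a result, any number of per-file objects sharing one context decode each chunk only once.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Placeholder for a decoded abbrev table; a real one would hold the parsed
// abbrev codes, tags and attribute specifications.
struct abbrev_table
{
  const unsigned char *start;   // where in the mapped data it came from
};

// Sketch of a sharing context: caches are keyed by the pointer into mapped
// file data, not by section or file offset.
struct dwarf_sharing_context
{
  std::map<const unsigned char *, std::unique_ptr<abbrev_table>> abbrevs;
  unsigned decode_count = 0;    // only here so the sketch can be checked

  const abbrev_table &get_abbrevs (const unsigned char *p)
  {
    std::unique_ptr<abbrev_table> &slot = abbrevs[p];
    if (!slot)                  // first use of this chunk: decode it once
      {
        slot = std::make_unique<abbrev_table> (abbrev_table { p });
        ++decode_count;
      }
    return *slot;
  }
};

// Hypothetical stand-in for struct Dwarf.  Normally each file has its own
// context; in a DwarfArchive, all files would point at one shared context.
struct dwarf_file
{
  std::shared_ptr<dwarf_sharing_context> sharing;
};
```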
diff --git a/DwarfRelocatable b/DwarfRelocatable
new file mode 100644
index 0000000..0b20649
--- /dev/null
+++ b/DwarfRelocatable
@@ -0,0 +1,158 @@
+== libdw interfaces for relocatable addresses ==
+The git branch roland/relocate is a fork from the mainline that has pretty
+full support in libdw proper for handling relocatable addresses.  This has
+only pure C additions to the API/ABI.  There are some mailing list postings
+about this from the time I did it; I haven't dug up the archive pointers,
+but they should be added here.  This branch needs some updating before it
+can be merged, but was mostly done as far as the plain C interfaces, though
+not tested very well.
+The crux of the API additions is the new type Dwarf_Relocatable.  This is
+meant to be an opaque structure in the API that can serve in every place
+Dwarf_Addr is used.  All interfaces dealing with addresses get variants
+that use Dwarf_Relocatable instead.  Then there is the new function
+dwarf_relocatable_info to extract the contents of a Dwarf_Relocatable.
+That yields a GElf_Sym and associated symbol name and section name,
+and an addend from that symbol/section (i.e. r_addend from the reloc,
+or in-place contents for SHT_REL).
+These interfaces make it possible to deal with ET_REL (.o/.ko) files
+without first relocating them to fixed addresses as libdwfl does today.
+This is necessary for DWARF compression to operate on .ko files, which is a
+key target.  The original relocatable addresses must be preserved as
+relocatable in the output.
+=== plain reader benefits ===
+On the roland/relocate branch, libdwfl no longer applies relocations to
+ET_REL files when they are first read in.  Instead, the relocation support
+just interns the reloc sections when they are first needed.  This turns
+them into some simple internal arrays that are sorted for binary search.
+The individual reloc records are looked up here when they are needed to
+resolve a relocatable address value.  In the traditional interfaces, this
+leads back to using the libdwfl hooks to resolve section addresses or
+symbols on demand.  In the new '''dwarf_*_relocatable''' interfaces, it
+doesn't resolve anything but instead just fills in '''Dwarf_Relocatable'''
+objects to describe the reloc.
+This should improve the performance of libdwfl-using readers
+(e.g. systemtap) that don't exhaustively use all values in the file.  Now
+the relocation work will be done lazily, and mapped DWARF file data will
+never take COW faults because it's never modified in place.
+=== libdwfl reworking ===
+This stuff also offers the possibility of revamping the libdwfl
+implementation and interfaces to be somewhat less cumbersome to use.
+Originally libdwfl was a pure layer on top of libdw, so it is organized
+around giving access to the libdw data as it is and then giving address
+bias values that must be applied to addresses from libdw or libelf.
+Now that libdw has reloc hooks in its innards, we could change this.
+It could make sense to reorganize the libdw data structures in some
+ways related to DwarfReaderSharing.  Then we could get all the material
+sharing we want, but have each '''Dwarf''' object you get from libdwfl
+actually be tied to a single '''Dwfl_Module''' instead of being shared.
+Using plain libdw interfaces would then go via the reloc hooks back into
+libdwfl code to fully resolve to absolute addresses.  
+We might retain all the existing libdwfl interfaces, but all the ones
+wrapping libdw interfaces (not libelf interfaces) would then yield
+bias values of zero.
+=== C++ interfaces ===
+The unfinished precursor to relocatable address support in the writer is
+the reader-side C++ interfaces for relocatable addresses.  I didn't work
+these out in detail, but had some ideas.
+The core would be a new C++ type '''symbolic_address''', which is probably
+just a thin C++ wrapper object around Dwarf_Relocatable.  This would have
+query methods like '''is_absolute''', returning a '''bool''' saying whether
+this is really a resolved address or is truly relocatable.  An
+'''address''' method would yield the '''Dwarf_Addr''' or throw an exception
+if applied to an unresolved relocatable address.
+For relocatable addresses, my idea was that accessors would extract a
+symbol table iterator and an addend.  The symbol table iterator requires
+first defining some C++ interfaces for pure libelf stuff, not just libdw.
+Those would give an STL-style container interface for a symbol table, so
+you could iterate through it or index it with the usual interfaces.
+Various places in the '''dwarf''' branch code are marked with "XXX reloc"
+comments where some attention for this stuff is needed.  The
+'''dwarf::attr_value::address''' method would be replaced or supplemented
+with a '''symbolic_address''' method that gives the new object, and its
+'''dwarf_edit'''/'''dwarf_output''' cousins would permit assigning one.  In
+the writer, this eventually leads to generating reloc records in the
+Note that it's possible the '''constant''' and '''signed_constant''' also
+need to have relocatable variants.  I'm no longer really sure about that,
+but it needs to be investigated.  dwarflint should be able to tell us
+whether relocs are applied to any attribute values other than
+'''DW_FORM_addr''' ones.
+It's certainly the case that '''constant_block''' can need embedded relocs.
+This would need a new interface more sophisticated than the
+'''const_vector''' hack for those values.  I imagine something like an
+STL-style container whose elements are either literal-byte blocks or are
+relocatable values.
+Similarly, '''range_list''', '''line_entry''', '''ranges''',
+'''arange_list''', and '''location_attr''' need to be changed so that their
+address pieces can be '''symbolic_address''' objects.  (The corresponding
+new libdw C interfaces support that on the roland/relocate branch already.)
+Note that an individual '''location_attr::mapped_type''' needs to change
+from being a plain '''const_vector<uint8_t>''' to being a structured type
+as well.  That can do something to wrap the '''dwarf_getlocation*'''
+interfaces that yield a vector of distinct operations.  Each individual
+operation might be a '''DW_OP_addr''' that needs to be relocatable.
+My original thinking was that this would tie in to the much higher-level
+DwarfLocations ideas.
+=== dwarf_output emitting ET_REL ===
+There are two kinds of relocations in DWARF data.
+ 1. References to other DWARF sections.
+    Here the target of the reloc records is in some '''.debug_*''' section.
+    These don't need to exist as relocatable entities in the API at all.
+    In the reloc-aware libdw reading code, they are resolved automatically.
+    In the writer code to emit ET_REL files, these relocs would be
+    generated implicitly wherever they are needed to make the output files
+    properly linkable.
+ 2. Address and constant data.
+    Here the target of the reloc records is in some '''SHF_ALLOC'''
+    section, i.e. it's a real address.  For these, the reader has to yield
+    '''symbolic_address''' objects.  In the writer, these would emit relocs
+    matching what the original input relocs were.
+There are also two kinds of '''ET_REL''' files that we write:
+ 1. Actually relocatable objects.
+    This is for when we use the writer to produce normal ''.o'' files.
+    Here the internal DWARF references need to be relocatable so that they
+    can be linked together later.
+ 2. ''Final'' objects.
+    This is for cases like ''.ko'' files (Linux kernel modules).
+    Here the DWARF data stands on its own and will never be fed to the
+    linker.  So they don't actually need to have relocs for the internal
+    DWARF references at all.  Among the writer's output format knobs will
+    be ''final'' vs. ''reloc'' mode.  In ''final'' mode, we don't emit
+    relocs for DWARF references, we just encode them directly.  This is
+    somewhat akin to how output files come from '''eu-unstrip -R'''.
+    Then when the reader is dealing with these input files, there is no
+    extra work for these internal references at all, they are just like a
+    real final linked object.
+    References to real '''SHF_ALLOC''' sections still need relocs even in a
+    ''final'' object like a .ko file.
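The two kinds of relocation and two kinds of ET_REL output on the DwarfRelocatable page reduce to a small decision. It is sketched below with hypothetical enum and function names:

* Relocs against `SHF_ALLOC` sections are always kept.
* Relocs for references between `.debug_*` sections are emitted only in reloc mode.
* In final mode, those `.debug_*` references are encoded directly.

```cpp
#include <cassert>

// Hypothetical names for the two reloc targets and two output modes.
enum class reloc_target { debug_section, alloc_section };
enum class output_mode { reloc, final };

// Whether the writer must emit a reloc record for one reference.
bool needs_reloc (reloc_target target, output_mode mode)
{
  if (target == reloc_target::alloc_section)
    return true;                        // real addresses: keep the reloc
  return mode == output_mode::reloc;    // .debug_* refs: only for linkable .o
}
```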
diff --git a/DwarfRelocs b/DwarfRelocs
new file mode 100644
index 0000000..60fa9b4
--- /dev/null
+++ b/DwarfRelocs
@@ -0,0 +1,16 @@
+== smart reloc handling in DWARF reader ==
+Today's libdwfl applies relocs at the ELF level to DWARF section data so that libdw can look at it and decode final offsets from it.
+A smarter approach is to hook into libdw at the limited number of places where it reads parts of the format that can contain relocated values:
+ * DW_FORM_ref_addr
+ * DW_FORM_data4/8, others?
+ * DW_FORM_addr
+ * header fields that are target address or fixed-size section offset (so could be reloc'd, i.e. 4/8 byte int forms)
+The library can keep a map of relocs indexed by the mapped address their r_offset corresponds to, so the dwarf_form* calls can look up the Dwarf_Attr.valp pointer directly to see whether it's relocated.
+=== smart form data interfaces ===
+For relocatable quantities, we need form interfaces that give some means to indicate a reloc/symbol.
+Compatible form generators in the DwarfProducer can then produce relocs from scratch in a natural way.
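Here is a C++ sketch of the reloc map described in DwarfRelocs. Relocs are converted once into an array sorted by the mapped address that each `r_offset` corresponds to. Looking up a value pointer (such as `Dwarf_Attr.valp`) is then a binary search.

`raw_reloc`, `interned_reloc` and `reloc_map` are hypothetical names. A real version would read `GElf_Rela`/`GElf_Rel` records through libelf and would also need the reloc type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical raw reloc as read from a reloc section.
struct raw_reloc { std::uint64_t r_offset; std::uint32_t symndx; std::int64_t addend; };

// Interned form: r_offset already turned into a pointer into mapped data.
struct interned_reloc
{
  const unsigned char *where;
  std::uint32_t symndx;
  std::int64_t addend;
};

class reloc_map
{
  std::vector<interned_reloc> relocs_;

public:
  // Intern a section's relocs once, sorted for binary search.
  reloc_map (const unsigned char *section_data,
             const std::vector<raw_reloc> &raw)
  {
    for (const raw_reloc &r : raw)
      relocs_.push_back ({ section_data + r.r_offset, r.symndx, r.addend });
    std::sort (relocs_.begin (), relocs_.end (),
               [] (const interned_reloc &a, const interned_reloc &b)
               { return a.where < b.where; });
  }

  // Reloc applying exactly at valp, or nullptr if the value is not relocated.
  const interned_reloc *find (const unsigned char *valp) const
  {
    auto it = std::lower_bound (relocs_.begin (), relocs_.end (), valp,
                                [] (const interned_reloc &r,
                                    const unsigned char *p)
                                { return r.where < p; });
    return (it != relocs_.end () && it->where == valp) ? &*it : nullptr;
  }
};
```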
diff --git a/DwarfTasks b/DwarfTasks
new file mode 100644
index 0000000..ae4f41d
--- /dev/null
+++ b/DwarfTasks
@@ -0,0 +1,84 @@
+== DWARF tasks & milestones ==
+==== testing note ====
+All the early milestones (all we have so far) have a testing goal
+in terms of a pass/fail test on a single file.  For each milestone,
+we'll consider it accomplished when the `single-file-test.sh`
+run of that test over all the DebugInfoTesting data has no failures.
+=== tasks ===
+ 1. C++ interface (libdw reader front-end)
+   1. containers/iterators: CU, DIE, attrs
+      * CU: iterable container of DIEs
+      * DIE
+        1. iterable container of children
+        1. iterable container of attrs
+      * attrs: dictionary of key(int)=>value
+        * no values yet
+        * container/iterator flavors:
+          1. raw pair iterator
+          1. dictionary w/o dup keys
+          1. hides sibling
+   1. '''Milestone:'''
+      iterate full tree, see tags, attr names (int)
+      * skeleton DwarfCmp, run self-to-self comparisons of .debug files
+        1. DebugInfoTesting `norel` set --- '''DONE (no FAIL) 2009-1-5'''
+ 1. C++ interface for writer
+   1. template-compatible with reader iterators + writable
+   1. simple pure memory data structures for now
+   1. no attr values yet
+   1. deep-copy from reader trees via templates
+   1. '''Milestone:'''
+      copy input tree to writer data, apply comparator template widget to input
+      * hacked DwarfCmp, input-to-writer comparisons of .debug files.
+        1. DebugInfoTesting `norel` set
+ 1. attr value interfaces
+    1. background for design: DwarfValues
+    1. reader interfaces by value-space
+       a. stubs for relocation details
+       a. refine value class: e.g. src file name string
+    1. known-attribute map
+       a. '''Milestone:''' DwarfLint checks for known attrs with expected class
+        1. DebugInfoTesting `norel` set
+    1. '''Milestone:''' DwarfCmp compares values
+        1. test without meaningful reference comparisons
+        1. DebugInfoTesting `norel` set
+    1. compatible writer interfaces
+       a. simple pure memory data structures for now
+       a. copy/initialize from reader counterpart
+       a. xfrmers
+        1. src file name rewrite/collect
+        1. hashcons for dedup
+       a. '''Milestone:''' hacked copying-DwarfCmp compares values
+        1. DebugInfoTesting `norel` set
+ 1. relocatable values (see DwarfRelocs)
+   1. DwarfLint checks relocs
+      a. .debug_info: r_offset+r_type match form there
+      a. other sections
+   1. smart reader
+      a. convert libdw parsers to use hooks for address/offset handling
+       https://lists.fedorahosted.org/pipermail/elfutils-devel/2009-March/000208.html
+       1. hand-kludge test hooks that produce stderr output
+       1. hack/enhance dwarflint to emit similar line for each reloc
+       1. run e.g. dwarf-print or dwarfcmp to traverse data so as to
+          pass all data through those hooks, get the output
+       1. compare dwarflint and libdw-emitted lists, make sure they match
+       1. '''Milestone:''' comparison matches
+        a. self-test on build's own .o files
+        a. DebugInfoTesting `rel` set
+   1. value interfaces for relocatable value classes
+    a. `elfutils::symbolic_address`
+     * `DW_FORM_addr` attributes
+     * line info entries
+     * ranges/aranges
+    a. DWARF constant blocks with relocs in middle
+    a. location expressions (`DW_OP_addr`)
+     * until proper locexpr interface, treat as constant blocks
+   1. reloc generation
+    a. common reloc-generation internals
+     1. `ebl_simple_reloc_type` reciprocal to choose `r_type`s
+    a. hooks in all those value places
+    a. hooks in all places producing `.debug_*` offsets
+     1. "final" and fully-relocatable options
diff --git a/DwarfUnwinder b/DwarfUnwinder
new file mode 100644
index 0000000..b3905b1
--- /dev/null
+++ b/DwarfUnwinder
@@ -0,0 +1,20 @@
+== unwinders ==
+What's commonly called a "DWARF unwinder" is in fact a conflation of two
+or three separate things whose issues are quite different.
+ * the ''static'' component is what DWARF CFI is about.
+   This entails resolving for a given PC what the rules are for
+   finding the CFA and the caller's registers (including return address).
+   We already have this implemented in libdw, the interfaces using
+   '''Dwarf_Frame''' are where it is.
+ * the ''dynamic'' component is applying those rules to particular register
+   and memory contents, which is where the actual "unwinding" happens.
+The libdw CFI interfaces turn the CFI data for a given PC into the terms of
+DWARF location expressions.  This should be wed with DwarfLocations stuff
+and with components for gleaning register and memory data from the ELF
+notes and data segments of a core file, or other sources like live
+debugging interfaces, to yield the unwinding functionality.
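To make the static/dynamic split concrete, here is a toy C++ sketch. It applies already-resolved CFI rules for one PC to a frame's register and memory contents.

The rule types are a hypothetical simplification of what DWARF CFI can express. Only four rules are modeled: "CFA = register + offset", "undefined", "same value", and "saved at CFA + offset". libdw's `Dwarf_Frame` interfaces express the rules as location expressions instead. Memory is modeled as a map of 8-byte words.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Static part: rules for one PC (hypothetical simplified form).
struct cfa_rule { unsigned reg; std::int64_t offset; };   // CFA = reg + offset
struct reg_rule
{
  enum kind { undefined, same_value, offset } how;  // offset: saved at CFA+off
  std::int64_t off;
};

// Dynamic part: the actual register and memory contents of a frame.
using registers = std::map<unsigned, std::uint64_t>;
using memory = std::map<std::uint64_t, std::uint64_t>;   // 8-byte words

std::uint64_t compute_cfa (const cfa_rule &rule, const registers &regs)
{
  return regs.at (rule.reg) + rule.offset;
}

// The caller's value of one register, or nullopt if the rule says it is lost.
std::optional<std::uint64_t> recover (const reg_rule &rule, unsigned reg,
                                      std::uint64_t cfa,
                                      const registers &regs, const memory &mem)
{
  switch (rule.how)
    {
    case reg_rule::same_value: return regs.at (reg);
    case reg_rule::offset:     return mem.at (cfa + rule.off);
    default:                   return std::nullopt;
    }
}
```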
diff --git a/DwarfValues b/DwarfValues
new file mode 100644
index 0000000..c5a10c0
--- /dev/null
+++ b/DwarfValues
@@ -0,0 +1,123 @@
+== DWARF attribute values ==
+The DWARF specification (2.2) describes attribute values and their
+encodings in terms of '''forms''' and '''classes'''.  We refine the
+taxonomy with the notion of '''value spaces'''.
+==== form ====
+The `DW_FORM_*` constants are the form encodings that appear in the actual
+DWARF data.  There are a few forms, and some have multiple particular
+encodings.  For example, the `udata` form comes in `data[1248]`, `udata`,
+and `sdata` encodings, the `string` form comes in `string` and `strp`
+encodings, etc.  The libdw reading interfaces `dwarf_form*` gloss over
+these encoding differences, and users never care about them.
+The `block` and `data` forms are ambiguous.  They indicate the encoding so
+you can read the value from the file, but not what kind of value it really is.
+To disambiguate you have to know the '''class''' you are looking for.
+==== class ====
+DWARF (7.5.4) describes ''classes'' of attribute value.
+These do not exist in the format, but only in the consumer's perspective.
+Some classes unify several forms.  Some forms multiplex several classes.
+The consumer distinguishes what kind of value an attribute has by looking
+at the value's form as written, and the set of expected classes for the
+particular attribute (based on the known `DW_AT_*` values).
+==== value space ====
+The '''class''' delineation is mostly adequate to disambiguate attribute
+forms.  But it's not entirely so, while conversely in some cases it is
+overly specific for the consumer's abstract view.
+We refine DWARF's '''class''' with a slightly different concept that
+we'll call '''value space'''.  The value spaces are the categories of
+attribute value that a consumer really wants to think about.
+ reference::
+  class reference, pointer to another DIE.
+  Pointers outside the containing CU can be relocatable (`DW_FORM_ref_addr`).
+ CU-reference::
+  class reference, but CU-relative offset forms are invalid.
+  This has to point to a different CU's top-level DIE.
+ address::
+  class address, `DW_FORM_addr`.
+  Relocatable.
+ flag::
+  class flag, `DW_FORM_flag`: simple Boolean.
+ rangelistptr::
+  class rangelistptr, `data` forms: offset in `.debug_ranges`
+ lineptr::
+  class lineptr, `data` forms: offset in `.debug_line`
+ macptr::
+  class macptr, `data` forms: offset in `.debug_macinfo`
+ location::
+  class loclistptr or block
+ identifier::
+  class string, an identifier in the CU's language
+ filename::
+  class string, name of a source file or directory
+ fileidx::
+  class constant, an index into the CU's file table (e.g. DW_AT_decl_file)
+ lineidx::
+  class constant, a line number (e.g. DW_AT_decl_line)
+ string::
+  class string, string not an identifier or filename
+ enum-constant::
+  class constant, with a known set of values `DW_FOO_*`[[br]]
+  This is actually numerous value spaces all treated similarly.
+ constant::
+  class constant or block or string, a target value[[br]]
+  If certain `data` forms, might be relocatable; if `block` form,
+  might contain relocatable portions.
+To interpret an attribute's value, you must know what value space that
+attribute is in.  This comes from fixed knowledge of the known attribute
+names (`DW_AT_*`).  For the most part, just the attribute name tells you
+the value space.  However, e.g. `DW_AT_name` is overloaded as filename and
+identifier.  So for the full general case, you need to know the tag name
+and the attribute name (`DW_TAG_*`, `DW_AT_*`).  This pair maps into a set
+of value spaces that are expected for that attribute.  If the attribute has
+a form that can't be one of those value spaces, then the consumer barfs.
+When a transformation (such as compression) comes across an attribute whose
+name is unrecognized and whose form is ambiguous (`string`, `data`), then
+it cannot necessarily complete a safe transformation.  For example, any
+`data` form might be a `loclistptr`, so you can't rewrite the `.debug_loc`
+section in case the unknown attribute encoded an offset into the section;
+any `string` form might be a file name, so you can't rewrite file names.
+Some combinations of value spaces create new ambiguities.
+For example, if something is either a location or a constant,
+then a `data` form is either an integer constant or a loclistptr.
+If there are in fact any such combinations in the known set,
+there has to be some priority chosen to disambiguate.
+==== relocatable values ====
+The address and constant value spaces can have values determined by
+relocations to the allocated sections.  A consumer either wants implicitly
+relocated values (libdwfl) or explicit relocation information (compression
+and other transformations), e.g. `GElf_Rela` + `GElf_Sym` + name.
+There can also be relocations to the `.debug_*` sections,
+in `DW_FORM_ref_addr`, `DW_FORM_strp`, and all the *ptr classes.
+These are not interesting to a consumer or producer application,
+and can be handled (and generated) entirely under the covers in
+libdw.  In "final DWARF" (i.e. final links plus .ko), all of these
+can be applied in place and the relocs dropped.
+==== C++ interfaces ====
+Using `attr_value` objects will be based on the value space.
+This means it will depend on the attribute and tag.
+We want simple methods to extract in the expected value space and throw if
+the form is a mismatch.  We also want methods to ask which value space it
+is, and some polymorphic methods like generating printable strings.
+The reference values are their own can of worms, unlike the others.
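DwarfValues describes two pieces: a (tag, attribute) to value-space lookup, and accessors that "extract in the expected value space or throw". They could look roughly like the C++ sketch below.

Everything here is hypothetical:

* Only a few value spaces and known attributes are included.
* `attr_value` is not the real '''dwarf''' branch class.
* Taking the first expected space in enum order is just a stand-in for whatever priority rule resolves ambiguous combinations.

The `DW_TAG_*`/`DW_AT_*` numbers are the ones from the DWARF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// A subset of the value spaces listed above.
enum class value_space { identifier, filename, fileidx, lineidx, address, constant };

constexpr unsigned DW_TAG_compile_unit = 0x11, DW_TAG_variable = 0x34;
constexpr unsigned DW_AT_name = 0x03, DW_AT_decl_line = 0x3b;

// Known-attribute map: (DW_TAG_*, DW_AT_*) -> expected value spaces.
// DW_AT_name is a filename on a CU but an identifier on a variable.
const std::map<std::pair<unsigned, unsigned>, std::set<value_space>> known = {
  { { DW_TAG_compile_unit, DW_AT_name }, { value_space::filename } },
  { { DW_TAG_variable, DW_AT_name }, { value_space::identifier } },
  { { DW_TAG_variable, DW_AT_decl_line }, { value_space::lineidx } },
};

class attr_value
{
  value_space space_;
  std::string str_;
  std::uint64_t num_ = 0;

  void check (value_space want) const
  {
    if (space_ != want)
      throw std::domain_error ("attribute is not in that value space");
  }

public:
  attr_value (value_space s, std::string str) : space_ (s), str_ (std::move (str)) {}
  attr_value (value_space s, std::uint64_t n) : space_ (s), num_ (n) {}

  value_space what_space () const { return space_; }

  // Typed accessors: throw on a value-space mismatch.
  const std::string &identifier () const { check (value_space::identifier); return str_; }
  const std::string &filename () const { check (value_space::filename); return str_; }
  std::uint64_t lineidx () const { check (value_space::lineidx); return num_; }
};

// Classify a string-form value by (tag, attr).  Unknown attributes are
// refused rather than guessed, as a safe transformation requires.
attr_value classify_string (unsigned tag, unsigned at, std::string s)
{
  auto it = known.find ({ tag, at });
  if (it == known.end ())
    throw std::runtime_error ("unknown attribute: cannot classify safely");
  value_space vs = *it->second.begin ();   // stand-in for a priority rule
  if (vs != value_space::identifier && vs != value_space::filename)
    throw std::runtime_error ("string form not valid for this attribute");
  return attr_value (vs, std::move (s));
}
```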
diff --git a/DwarfXml b/DwarfXml
new file mode 100644
index 0000000..2e30bc2
--- /dev/null
+++ b/DwarfXml
@@ -0,0 +1,80 @@
+== XML representation of DWARF ==
+The XML/DOM view of data is a pretty good fit for DWARF trees.
+We use it informally all the time to describe fragments of DWARF
+in discussion.  On the dwarf branch, tests/dwarf-print produces
+an almost-XML format of DWARF data.
+It would be valuable in several ways to define a rigorous and proper
+mapping of DWARF to XML.
+=== implementation ===
+If the world were built of ideal modular components, it would be very easy
+to write adapters that map between DOM interfaces and the C++ dwarf
+interfaces in both directions.  I still think this is the right way to
+approach it.  But when I looked at some XML implementation libraries, I
+didn't find any where it really seemed they were done in the right modular
+way to make this easy or clean.
+=== output uses ===
+The obvious use for XML output of DWARF is just to look at it, like we do
+with dwarf-print.  Having it be true XML (or rather, a true DOM, which can
+be printed as XML) has several advantages.
+ * You can use fancy XML-based viewers on it.
+ * You can apply XML-based technologies like XPath to it.
+   This could be an interesting way to do lots of prototyping work,
+   using stock XML tools to do queries, subsets, and transformations on
+   real DWARF data and see what you get out of it.  It's even possible
+   that using something like XPath expressions on a DOM implementation
+   that is backed by DWARF rather than XML could be a worthwhile way
+   to implement something for real.
+=== input uses ===
+One thing that's long been needed is a way to hand-write (or hand-modify)
+DWARF data.  This would be great for things like test cases for elfutils
+and gdb.  You could maintain XML source files and then use elfutils tools
+to produce DWARF output files from that.
+=== schema ===
+What seems worthwhile to express in XML is the "semantic view" of DWARF
+trees.  The encoding details like abbrevs and forms are not very
+interesting.  What maps extremely well is the basic DIE tree structure,
+which is in essence just like the XML DOM: a tree where each node has an
+element type (tag), an unordered dictionary of key/value attribute pairs,
+and an arbitrary number of ordered child nodes.
+In DWARF, some attribute values are complex or indirect things such as
+constant blocks, location expressions, and line information tables.  These
+can't be represented simply in XML as attribute values.  Instead, there
+needs to be an additional family of XML elements outside the ones directly
+representing the DIE tree, and attributes that point to those.
+==== attribute values ====
+In DWARF, each attribute value is encoded in a form, and the combination
+of the form and the tag/attribute where it appears indicates a ''value
+space''.  The values of XML attributes are not distinguished this way.
+So a thorough XML representation would need to use some text encoding to
+indicate the DWARF value space.  Here are some examples:
+ * unadorned integers mean a '''constant'''
+ * ''addr:0x123'' means a literal '''address'''
+ * ''addr:foo+5'' means a symbolic '''address''' formed with a symbol
+   reference and an addend
+ * ''addr:foo'' means a symbolic '''address''' with an addend of zero
+ * ''addr:(.foo)0x234'' means a section-relative '''address'''
+ * ''#123'' means a '''reference'''.  The XML representation of the
+   referent DIE would use a fake attribute like ''id=#123'' to match.
+ * ''#loc_123'' means a location expression.  There would be an additional
+   tree of XML elements outside the '''<compile_unit>''' tree that defines
+   referent location expressions.
+ * ''#loclist_123'' means a location list.  Another outside tree defines
+   these.
+ * ''#rangelist_123'' for a range list, same story.
+ * ''#const_123'' for a constant block, similar again.
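As an illustration of the proposed text encoding, here is a minimal C sketch. It sorts an XML attribute string into one of the value spaces listed above by looking only at its prefix. The names `enum value_space` and `classify_value` are hypothetical and are not part of any elfutils API. The sketch assumes that unadorned integers are decimal with an optional minus sign. It does not validate the `addr:` sub-forms (literal, symbolic with addend, section-relative).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tags for the value spaces proposed above.  */
enum value_space
{
  VS_UNKNOWN,
  VS_CONSTANT,     /* unadorned integer: "42", "-1" */
  VS_ADDRESS,      /* "addr:0x123", "addr:foo+5", "addr:foo", "addr:(.foo)0x234" */
  VS_REFERENCE,    /* "#123", matched by a fake id=#123 on the referent DIE */
  VS_LOCATION,     /* "#loc_123" */
  VS_LOCLIST,      /* "#loclist_123" */
  VS_RANGELIST,    /* "#rangelist_123" */
  VS_CONST_BLOCK   /* "#const_123" */
};

/* Nonzero if S is a non-empty string of decimal digits.  */
static int
is_digits (const char *s)
{
  if (*s == '\0')
    return 0;
  for (; *s != '\0'; ++s)
    if (*s < '0' || *s > '9')
      return 0;
  return 1;
}

/* Classify TEXT by its prefix only; addr: sub-forms are not checked.  */
enum value_space
classify_value (const char *text)
{
  static const struct
  {
    const char *prefix;
    enum value_space space;
  } indirect[] =
  {
    { "#loc_", VS_LOCATION },
    { "#loclist_", VS_LOCLIST },
    { "#rangelist_", VS_RANGELIST },
    { "#const_", VS_CONST_BLOCK },
    { "#", VS_REFERENCE },   /* last: "#" is a prefix of all the above */
  };

  assert (text != NULL);

  if (strncmp (text, "addr:", 5) == 0)
    return text[5] != '\0' ? VS_ADDRESS : VS_UNKNOWN;

  for (size_t i = 0; i < sizeof indirect / sizeof indirect[0]; ++i)
    {
      size_t len = strlen (indirect[i].prefix);
      if (strncmp (text, indirect[i].prefix, len) == 0)
        return is_digits (text + len) ? indirect[i].space : VS_UNKNOWN;
    }

  /* Anything else must be an unadorned decimal integer constant.  */
  return is_digits (text[0] == '-' ? text + 1 : text) ? VS_CONSTANT : VS_UNKNOWN;
}
```

The prefix table is ordered so that bare `#` is tried last. Otherwise `#loc_123` would be rejected as a malformed reference. A fuller reader would then look up `VS_LOCATION`, `VS_LOCLIST`, `VS_RANGELIST` and `VS_CONST_BLOCK` ids in the extra element trees outside `<compile_unit>`.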
diff --git a/DwflProjects b/DwflProjects
new file mode 100644
index 0000000..6891923
--- /dev/null
+++ b/DwflProjects
@@ -0,0 +1,28 @@
+= libdwfl project ideas =
+ * `com.redhat.elfutils.roland.segments` has unfinished core-file support work, among other things.
+   Enable `eu-unstrip -n --core=file` as well as `eu-unstrip -n -p PID`.
+ * New standard report option to read a file in a format like `eu-unstrip -n` output, or similar.
+ * `"oops2line"`
+== The `com.redhat.elfutils.pmachata.sharing` branch ==
+ * The typical debugger of the future will handle multiple processes and have DWARF data loaded for
+   all of them.  Most of the libraries that these processes use are the same each time (think
+   e.g. libc, the widget toolkit, etc.).  So the goal
+   is for libdwfl to handle this case smartly and share the data where possible.  That was the original
+   goal, anyway; the branch has since diverged towards more general libdwfl cleanup.
+ * Current todo:
+   - clean up error handling
+   - share dwarf structures (see 20071108225922.200534D0564@magilla.localdomain 8 Nov 2007)
+   - `__libdwfl_open_by_build_id` can yield the dwfl_file
+   - move .sym* members into shared cache (see 20071206015343.3959F26F8EA@magilla.localdomain 5 Dec 2007)
+ * Being reviewed:
+   - ELF_C_FDREAD hack fixup (see 20080114021912.68FA726FA0B@magilla.localdomain)
+   - share build id, remove dwfl_file.valid (20080114021912.68FA726FA0B@magilla.localdomain)
+ * Done (old todo items)
\ No newline at end of file
diff --git a/OldWikiIndex b/OldWikiIndex
new file mode 100644
index 0000000..2d6046a
--- /dev/null
+++ b/OldWikiIndex
@@ -0,0 +1,61 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+   "http://www.w3.org/TR/html4/loose.dtd">
+  <HEAD>
+    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
+    <LINK HREF="default.css" REL="stylesheet" TYPE="text/css">
+    <TITLE>Old elfutils wiki index</TITLE>
+  </HEAD>
+  <BODY>
+    <H1>Old elfutils wiki index</H1>
+    <DIV CLASS="content">
+    <CENTER>
+    <TABLE><TR><TD>
+        <DIV CLASS="abstract">
+          This is an index of pages from the old
+	  <A HREF="http://elfutils.org/">elfutils</A> fedorahosted trac wiki.
+	  The pages are most likely years out of date, but they contain interesting
+	  history about the project and some (unfinished) goals.
+        </DIV>
+    </TD></TR>
+    </TABLE>
+    </CENTER>
+    <UL>
+      <LI><A HREF="RoadMap">RoadMap</A></LI>
+      <LI><A HREF="RpmDebugInfo">RpmDebugInfo</A></LI>
+      <LI><A HREF="DebugInfo">DebugInfo</A></LI>
+      <LI><A HREF="DebugInfoTesting">DebugInfoTesting</A></LI>
+      <LI><A HREF="DwarfArchive">DwarfArchive</A></LI>
+      <LI><A HREF="DwarfCmp">DwarfCmp</A></LI>
+      <LI><A HREF="DwarfInterObject">DwarfInterObject</A></LI>
+      <LI><A HREF="DwarfLint">DwarfLint</A></LI>
+      <LI><A HREF="DwarfLocations">DwarfLocations</A></LI>
+      <LI><A HREF="DwarfOutput">DwarfOutput</A></LI>
+      <LI><A HREF="DwarfProducer">DwarfProducer</A></LI>
+      <LI><A HREF="DwarfReaderSharing">DwarfReaderSharing</A></LI>
+      <LI><A HREF="DwarfRelocatable">DwarfRelocatable</A></LI>
+      <LI><A HREF="DwarfRelocs">DwarfRelocs</A></LI>
+      <LI><A HREF="DwarfTasks">DwarfTasks</A></LI>
+      <LI><A HREF="DwarfUnwinder">DwarfUnwinder</A></LI>
+      <LI><A HREF="DwarfValues">DwarfValues</A></LI>
+      <LI><A HREF="DwarfXml">DwarfXml</A></LI>
+      <LI><A HREF="DwflProjects">DwflProjects</A></LI>
+      <LI><A HREF="SuspiciousDebuginfoCases">SuspiciousDebuginfoCases</A></LI>
+      <LI><A HREF="ThreadSafety">ThreadSafety</A></LI>
+    </UL>
+    <CENTER><HR WIDTH="95%"></CENTER>
+    <DIV ALIGN="center">
+      Questions?
+      Send them to <A HREF="mailto:elfutils-devel@sourceware.org">elfutils-devel@sourceware.org</A>
+      or visit us on <TT>irc.freenode.net</TT> channel <TT>#elfutils</TT>.
+    </DIV>
+    </DIV>
+  </BODY>
diff --git a/RoadMap b/RoadMap
new file mode 100644
index 0000000..6b7567b
--- /dev/null
+++ b/RoadMap
@@ -0,0 +1,48 @@
+= elfutils development roadmap =
+ * project admin & community-building
+   * git migration
+   * roadmaps on fedorahosted wiki, planning on mailing list
+   * weekly status postings by active contributors (roland, pmachata)
+     * proposal: post each week, by Monday 11:59pm UTC at the latest
+   * f11 feature page for debuginfo revamp
+     * tasks & delivery schedule
+   * ping possible future contributors
+   * buildbot?
+ * regular maintenance
+   * Fedora bugzilla
+     https://bugzilla.redhat.com/buglist.cgi?query_format=advanced&classification=Fedora&component=elfutils&component=libelf&bug_status=NEW&bug_status=ASSIGNED&bug_status=MODIFIED&known_name=Fedora%20elfutils%20bugs&query_based_on=Fedora%20elfutils%20bugs
+   * 0.138 release
+     * pending fixes already on trunk
+     * any bad bugs open (?)
+   * test coverage
+     * autotest conversion?
+       * handy for finer-grained lcov integration
+       * maybe less typing for new test scripts?
+ * dwarf compression plan: DwarfTasks
+   * DwarfProducer
+     * DwarfLint
+     * DwarfCmp
+     * DwarfOutput
+   * RpmDebugInfo
+ * consumer support
+   * main driver is systemtap
+   * libdwfl features
+     * zlib [https://bugzilla.redhat.com/show_bug.cgi?id=472136 RHBZ#472136]
+     * sharing
+       * pmachata/sharing branch
+       * not (yet?) priority for stap
+ * unscheduled projects
+   * DwarfRelocatable
+   * DwarfReaderSharing
+   * DwarfArchive
+   * DwarfXml
+   * DwarfUnwinder
+   * DwarfLocations
+   * DwflProjects
+     * migrate details here or to individual wiki pages
+     * ThreadSafety
+     * location hacking
+     * unwind branch
+   * acme's dwarves
+     * figure out which bits of this work are worth modularizing/incorporating
diff --git a/RpmDebugInfo b/RpmDebugInfo
new file mode 100644
index 0000000..b274c71
--- /dev/null
+++ b/RpmDebugInfo
@@ -0,0 +1,22 @@
+== debuginfo changes for rpmbuild ==
+https://fedoraproject.org/wiki/Features/DebugInfoRevamp targets Fedora 12 for deploying new DebugInfo schemes.
+=== separate -debuginfo from -debuginfo-src ===
+rpm macro/script changes to put the /usr/src/debug files in one rpm and the /usr/lib/debug files in another.
+Do some measurements on the rawhide files to see how big the sources actually are.
+ * rawhide 2009-12-15 x86_64 has ~24G of /usr/lib/debug files and ~8G of /usr/src/debug files
+   * note: need to do an rpmbuild experiment to estimate compressed source sizes; sources probably compress much better than debug files do
+=== replace find-debuginfo.sh ===
+Possibly also tweak the macros that invoke it and use its output.
+We hope some elfutils components will be doing most of the complex DebugInfo work for the archive plan.
+The rpm script will invoke that once for each (class,data,machine) tuple represented by the binaries being processed.
+It will do what eu-strip now does, but roll the .debug files straight into the archive.
+The DWARF processor will also have options to rewrite the source directory names and collect source file names, replacing debugedit.
+Perhaps all these components will be accessible via Python bindings; Python would be a natural language for rpmites to write the rpm script in.
\ No newline at end of file
diff --git a/SuspiciousDebuginfoCases b/SuspiciousDebuginfoCases
new file mode 100644
index 0000000..5d71d68
--- /dev/null
+++ b/SuspiciousDebuginfoCases
@@ -0,0 +1,148 @@
+= Suspicious Debuginfo Cases =
+DwarfLint discovered a couple of interesting cases that need to be investigated and tracked down either to a bug in our toolchain or to a bug in dwarflint.
+The following happens /a lot/.  Each of the following messages is just a sample of a much larger set of messages.  The output has around 300K lines total right now.
+$ ~/elfutils/build/src/dwarflint --strict libsclx.so.debug 
+warning: .debug_abbrev: 0x124993..0x124993: unnecessary padding with zero bytes.
+Another file (darkice-debuginfo-0.19-3.fc11.i386:usr:lib:debug:usr:bin:darkice.debug/darkice.debug):
+warning: .debug_abbrev: 0x76a8..0x76a9: unnecessary padding with zero bytes.
+warning: .debug_abbrev: 0x81da..0x81da: unnecessary padding with zero bytes.
+Associated eu-readelf output:
+          attr: declaration, form: flag, offset: 0x769d
+          attr: const_value, form: data2, offset: 0x769f
+Abbreviation section at offset 30376:
+Abbreviation section at offset 30377:
+Abbreviation section at offset 30378:
+ [    1] offset: 30378, children: yes, tag: compile_unit
+          attr: producer, form: strp, offset: 0x76aa
+          attr: language, form: data1, offset: 0x76ac
+          attr: name, form: strp, offset: 0x76ae
+(The padding shows up as dummy empty abbreviation sections in the DWARF file.)
+warning: .debug_info: DIE 0x3d9056 (abbr. attribute 0x1020c): caused by this reference.
+warning: .debug_info: DIE 0x3df566 (abbr. attribute 0x101d2): caused by this reference.
+error: .debug_ranges: rangelist 0x29e0: the range 0x1ef920..0x1ef92f overlaps with another one.
+error: .debug_ranges: rangelist 0x29f0: the range 0x1ef930..0x1ef938 overlaps with another one.
+error: .debug_ranges: rangelist 0x2a00: the range 0x1ef940..0x1ef948 overlaps with another one.
+warning: .debug_ranges: 0xc720..0xc73f: unnecessary padding with zero bytes.
+warning: .debug_ranges: 0xc750..0xc76f: unnecessary padding with zero bytes.
+warning: .debug_ranges: 0xc780..0xc79f: unnecessary padding with zero bytes.
+warning: .debug_ranges: 0xdbd0..0xdd4f: unreferenced non-zero bytes.
+warning: .debug_ranges: 0x1e0c0..0x1e0df: unnecessary padding with zero bytes.
+error: .debug_aranges: addresses 0x1ef830..0x1ef91f of section .text are not covered.
+error: .debug_aranges: addresses 0x1f5ea6..0x1f5eaf of section .text are not covered.
+error: .debug_aranges: addresses 0x1f5eba..0x1f5ebf of section .text are not covered.
+error: .debug_aranges: addresses 0x1f5ece..0x1f5ecf of section .text are not covered.
+warning: .debug_loc: loclist 0x32b6b: entry covers no range.
+warning: .debug_loc: loclist 0x69232: entry covers no range.
+warning: .debug_loc: loclist 0x69245: entry covers no range.
+warning: .debug_loc: loclist 0x73795: entry covers no range.
+warning: .debug_loc: 0x116d3..0x116fb: unreferenced non-zero bytes.
+warning: .debug_loc: 0x1170c..0x11731: unreferenced non-zero bytes.
+warning: .debug_loc: 0x11742..0x11768: unreferenced non-zero bytes.
+warning: .debug_loc: 0x11779..0x1178b: unreferenced non-zero bytes.
+error: .debug_loc: addresses 0x1ef830..0x1ef9bf of section .text are not covered.
+error: .debug_loc: addresses 0x1efb3f..0x1efb3f of section .text are not covered.
+error: .debug_loc: addresses 0x1efbd8..0x1efbdf of section .text are not covered.
+error: .debug_loc: addresses 0x1efbe5..0x1efbef of section .text are not covered.
+error: .debug_aranges: arange 0x160 (CU 88194504): couldn't find a section that the range 0..0x6c covers.
+error: .debug_aranges: arange 0x170 (CU 88194504): the range 0x415180..0x4151c0 overlaps with another one.
+error: .debug_aranges: arange 0x180 (CU 88194504): the range 0x1f6570..0x1f65b3 overlaps with another one.
+error: .debug_aranges: arange 0x190 (CU 88194504): couldn't find a section that the range 0..0xa6 covers.
+error: .debug_aranges: arange 0x1a0 (CU 88194504): the range 0x200270..0x2002b0 overlaps with another one.
+error: .debug_aranges: arange 0x1b0 (CU 88194504): couldn't find a section that the range 0..0x57 covers.
+error: .debug_aranges: arange 0x1d0 (CU 88194504): the range 0x3b2db0..0x3b2e6a overlaps with another one.
+The coverage errors above seem to be real.  This is eu-readelf output of another file with that problem:
+DWARF section [27] '.debug_aranges' at offset 0x30f contains 85 entries:
+ [ 0] start: 0000000000, length:   321, CU DIE offset: 240301
+ [ 1] start: 0000000000, length:   262, CU DIE offset: 240301
+error: .debug_aranges: addresses 0x4980ae..0x4980af of section .text are not covered.
+error: .debug_aranges: addresses 0x4987a9..0x4987af of section .text are not covered.
+error: .debug_aranges: addresses 0x4987ca..0x4987cf of section .text are not covered.
+error: .debug_aranges: addresses 0x4987d8..0x4987df of section .text are not covered.
+error: .debug_aranges: addresses 0x49881a..0x49884f of section .text are not covered.
+warning: .debug_aranges (DIE 0x523a4): missing range 0x1f7000..0x1f729e, present in .debug_ranges.
+warning: .debug_aranges (DIE 0x523a4): missing range 0x1f72a0..0x1f753e, present in .debug_ranges.
+warning: .debug_ranges (DIE 0x89519): missing range 0x1f7570..0x1f7983, present in .debug_aranges.
+warning: .debug_aranges (DIE 0x9474b): missing range 0x1f7990..0x1f799e, present in .debug_ranges.
+warning: .debug_aranges (DIE 0x9474b): missing range 0x1f79a0..0x1f79a8, present in .debug_ranges.
+Another file (cone-debuginfo-0.75-2.fc11.i386:usr:lib:debug:usr:libexec:cone.debug/cone.debug)
+warning: .debug_ranges: rangelist 0xea558: entry covers no range.
+warning: .debug_ranges: rangelist 0xec8c8: entry covers no range.
+And associated eu-readelf printout:
+ [ ea558]  0x081c7270 <_ZN4mail11addressbook14GetAddressListINS_7addressEE14reportProgressEjjjj>..0x081c7270 <_ZN4mail11addressbook14GetAddress
+Some files lack .debug_pubnames (frysk-debuginfo-0.4-8.fc11.i386:usr:lib:debug:usr:lib:frysk:funit-*.debug/*.debug).  Perhaps this is related to Java?
+=== Location leaks out of DIE scope ===
+dwarflint output:
+$ ./proj/elfutils/dwarflint/build/src/dwarflint location-leaks.3dd8cc519e012e6aab3ed42effc9704f
+warning: .debug_loc: loclist 0x38: entry covers no range.
+error: .debug_info: DIE 0x62: attribute `location': PC range [0x400495, 0x40049a) outside containing scope
+error: .debug_info: DIE 0x51: in this context: [0x400498, 0x4004b2)
+appears in:
+ * GNU C 4.4.1 20090725 (Red Hat 4.4.1-2) [attachment:location-leaks.3dd8cc519e012e6aab3ed42effc9704f binary]
+fixed in:
+ * GNU C 4.4.1 20091005 (Red Hat 4.4.1-19)  (That's 4.4.1 with VTA backported)
+== Relocations ==
+=== SHN_COMMON relocations ===
+I found a case where a relocation of an address is formed against an SHN_COMMON symbol.  tests/test-nlist.o in the elfutils "dwarf" branch exhibits this on x86_64: the last relocation in .rela.debug_info is formed against symbol #25, which is:
+  Num:            Value   Size Type    Bind   Vis          Ndx Name
+   25: 0000000000000004      4 OBJECT  GLOBAL DEFAULT   COMMON bss
+How to handle this?
+=== R_X86_64_DTPOFF32 relocations ===
+Another case from elfutils: ./libelf/elf_error.os contains a relocation of type "R_X86_64_DTPOFF32".  EBL punts on it.  Is the bug in the EBL library or in the file?
\ No newline at end of file
diff --git a/ThreadSafety b/ThreadSafety
new file mode 100644
index 0000000..05e26e9
--- /dev/null
+++ b/ThreadSafety
@@ -0,0 +1,64 @@
+= Thread Safety Branch =
+This is related to the `com.redhat.elfutils.pmachata.threads` branch.  The goal here is to make elfutils thread-safe.
+== libelf ==
+libelf is about done and has been merged to trunk.
+ - The approach taken was to have an rwlock per Elf.  That lock is rdlocked or wrlocked according to use.
+   wrlocking is relatively rare, often necessary only to initialize write-once caches.
+ - All externally
+   visible functions lock on entry, unlock on exit, and are thin wrappers around functions that do the
+   actual work.  The pattern is that for externally visible `elf_X`, the workers are called `__libelf_X_rdlock`
+   and `__libelf_X_wrlock`.  These are called internally: the wrlock worker assumes that the caller holds a write
+   lock; the rdlock worker assumes a read lock.
+ - The worker may need to relock, e.g. to update a cache.  Because
+   pthreads has no operation to upgrade a lock, the worker does that by first releasing the lock
+   and then immediately wrlocking it.  That means it loses the lock for a while, and the Elf can meanwhile be transformed
+   in all kinds of ways.  The caller needs to be aware of that and take care not to cache any data that might be
+   invalid after the lock is lost.
+== libdw ==
+This is work in progress.  It was dropped for now in favour of other elfutils work.
+ - The approach taken is similar to that taken for libelf.  We have a lock per Dwarf, and rdlock or wrlock it
+   as appropriate whenever work is done on a data structure "descended" from that Dwarf.  E.g. the lock can be
+   taken like this: `rwlock_rdlock (attr->cu->dbg->lock);`
+ - `__libdw_visit_scopes` doesn't do any lock handling itself.  It needs 
+   at least a read lock, but the caller needs to use the right locking 
+   level with respect to the visitor that is called.  It's assumed that 
+   previsit and postvisit may relock.
+ - When handing control over to an external callback (for example in
+   `dwarf_func_inline_instances`), the visitor may need to unlock so that
+   the callback can use the official (locking) elfutils API, and then lock
+   again.  The plan is to use this unlock-callback-relock approach in all
+   places where callbacks are used.
+ - The lock that is taken after the callback returns is a rdlock.  That makes
+   sense, because a write lock is typically needed only to init caches and
+   similar, and that is a relatively infrequent operation.  However, that also
+   means that the lock level can actually be downgraded.  When taking a
+   wrlock in advance ("we are going to need a wrlock anyway, so take it right
+   away"), care has to be taken not to call functions that can
+   downgrade the lock this way.
+ - I've been generous with comments that explain why we don't mind that this
+   or that function may relock.  This is to mark calls where the relocking
+   analysis has been done.  These marks, however, may become invalid as the
+   code evolves.  I don't really know what to do about that, apart from
+   stripping these marks when the branch stabilizes, and insisting that any
+   future developers analyse the code again to see whether relocking takes
+   place and whether it is a problem with respect to their patch.
+ - If a function takes a `Dwarf_Die *` as one of its arguments and gives
+   up the lock (perhaps indirectly via another function), another
+   function can step in and modify that DIE, either directly or via
+   `dwarf_child`, `dwarf_siblingof`, etc.  So all functions that (indirectly)
+   call functions that lose the lock need to be checked for `die->`
+   references used after the lock may have been lost.  This analysis was not done.
\ No newline at end of file
diff --git a/index.html b/index.html
index c79dccc..0f0b9fe 100644
--- a/index.html
+++ b/index.html
@@ -154,6 +154,7 @@
       <LI>Some design decisions and <A HREF="https://sourceware.org/git/?p=elfutils.git;a=blob_plain;f=NOTES;hb=HEAD">NOTES</A>.</LI>
       <LI>List of <A HREF="DwarfExtensions">DwarfExtensions</A> recognized.</LI>
       <LI>List of <A HREF="ElflintGNU">ElflintGNU</a> issues recognized.</LI>
+      <LI><A HREF="OldWikiIndex">OldWikiIndex</A> (outdated) project list.</LI>
     <CENTER><HR WIDTH="95%"></CENTER>
