[PATCH 4/5] libctf: deduplicate and sort the string table

Nick Alcock nick.alcock@oracle.com
Sun Jun 30 18:14:00 GMT 2019


ctf.h states:

> [...] the CTF string table does not contain any duplicated strings.

Unfortunately this is entirely untrue: until now, libctf has made no
attempt whatsoever to deduplicate the string table.  It computes the
string table's length on the fly as it adds new strings to the dynamic
CTF file, and ctf_update() just writes each string to the table and
notes the current write position as it traverses the dynamic CTF file's
data structures and builds the final CTF buffer.  There is no global
view of the strings and no deduplication.

Fix this by erasing the ctf_dtvstrlen dead-reckoning length, and adding
a new dynhash table ctf_str_atoms that maps unique strings to a list
of references to those strings: a reference is a simple uint32_t * to
some value somewhere in the under-construction CTF buffer that needs
updating to note the string offset when the strtab is laid out.
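
As a rough illustration of the atom/ref relationship described above,
here is a minimal standalone sketch.  It is not the libctf code: the
toy_* names are hypothetical, and a plain linked list stands in for the
ctf_dynhash table, purely to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ctf_str_atom_ref_t: one location to patch.  */
struct toy_ref
{
  struct toy_ref *next;
  uint32_t *where;
};

/* Hypothetical stand-in for ctf_str_atom_t: one unique string.  */
struct toy_atom
{
  struct toy_atom *next;
  char *str;
  struct toy_ref *refs;
  size_t nrefs;
};

/* Return the atom for STR in *TABLE, creating it if it is not yet present,
   and record WHERE as one more reference to it.  Returns NULL if out of
   memory.  */
struct toy_atom *
toy_add_ref (struct toy_atom **table, const char *str, uint32_t *where)
{
  struct toy_atom *atom;
  struct toy_ref *ref;

  assert (table != NULL && str != NULL && where != NULL);

  if ((ref = malloc (sizeof (*ref))) == NULL)
    return NULL;
  ref->where = where;

  /* Linear search: the real table is a hash, this is only a sketch.  */
  for (atom = *table; atom != NULL; atom = atom->next)
    if (strcmp (atom->str, str) == 0)
      break;

  if (atom == NULL)
    {
      size_t len = strlen (str) + 1;

      atom = calloc (1, sizeof (*atom));
      if (atom == NULL || (atom->str = malloc (len)) == NULL)
        {
          free (atom);
          free (ref);
          return NULL;
        }
      memcpy (atom->str, str, len);
      atom->next = *table;
      *table = atom;
    }

  /* Whether the atom is new or not, remember who refers to it.  */
  ref->next = atom->refs;
  atom->refs = ref;
  atom->nrefs++;
  return atom;
}
```

Adding the same string twice yields the same atom with two refs, which
is what lets a single strtab entry serve every user of that string.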

Adding a string is now a simple matter of calling ctf_str_add_ref(),
which adds a new atom to the atoms table, if one doesn't already exist,
and adds the location of the reference to the refs list attached to
that atom.  This works reliably as long as one takes care to call
ctf_str_add_ref() only once the final location of the offset is known:
you can't call it on a temporary structure and then memcpy() that
structure into place in the CTF buffer, because the ref will still
point to the old location.  ctf_update() changes accordingly.
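
The ordering constraint can be sketched in isolation as follows.  This
is an illustration only: struct toy_member and the toy_* functions are
hypothetical stand-ins for ctf_member_t and the ctf_copy_*members()
helpers, and the "ref" is just the raw pointer that would be handed to
something like ctf_str_add_ref().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ctf_member_t.  */
struct toy_member
{
  uint32_t name;                /* Strtab offset, filled in later.  */
  uint32_t type;
};

/* Copy one member record into BUF and return the address of its name field
   in its final resting place: the only address that is safe to record as a
   ref.  Taking &tmp.name instead would leave the ref pointing at a dead
   stack slot.  */
uint32_t *
toy_emit_member (unsigned char *buf, uint32_t type)
{
  struct toy_member tmp;
  struct toy_member *copied;

  assert (buf != NULL);

  tmp.name = 0;                 /* 0 means 'no name' for now.  */
  tmp.type = type;

  memcpy (buf, &tmp, sizeof (tmp));
  copied = (struct toy_member *) buf;
  return &copied->name;
}

/* Emit one record, then patch OFFSET in through the ref, as strtab layout
   would do later, and return the name offset now stored in the buffer.  */
uint32_t
toy_patch_demo (uint32_t offset)
{
  unsigned char *buf = malloc (sizeof (struct toy_member));
  struct toy_member result;
  uint32_t *ref;

  assert (buf != NULL);

  ref = toy_emit_member (buf, 7);
  *ref = offset;

  memcpy (&result, buf, sizeof (result));
  free (buf);
  return result.name;
}
```

Because the ref is taken after the memcpy(), the later write lands in
the buffer itself rather than in the temporary.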

Generating the CTF string table is a matter of calling
ctf_str_write_strtab(), which counts the length and number of elements
in the atoms table using the ctf_dynhash_iter() function we just added,
populating an array of pointers into the atoms table and sorting it into
order (to help compressors), then traversing this table and emitting it,
updating the refs to each atom as we go.  The only complexity here is
arranging to keep the null string at offset zero, since a lot of code in
libctf depends on being able to leave strtab references at 0 to indicate
'no name'.  Once the table is constructed and the refs updated, we know
how long it is, so we can realloc() the partial CTF buffer we allocated
earlier and can copy the table on to the end of it (and purge the refs
because they're not needed any more and have been invalidated by the
realloc() call in any case).
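
The sort-and-emit step, including the reserved null string at offset
zero, might look roughly like this in miniature.  This is a sketch
under simplifying assumptions rather than the code in ctf-string.c:
the toy_* names are hypothetical, each atom carries at most one ref
instead of a list, and the caller is assumed not to pass in the empty
string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical atom: one unique string plus one location to patch.  */
struct toy_atom
{
  const char *str;
  uint32_t *ref;                /* May be NULL if nothing refers to it.  */
};

static int
toy_atom_cmp (const void *a, const void *b)
{
  const struct toy_atom *one = a;
  const struct toy_atom *two = b;

  return strcmp (one->str, two->str);
}

/* Sort ATOMS, lay them out in a newly-allocated string table whose first
   byte is the null string, and store each string's offset through its ref.
   Returns the table (NULL if out of memory) and sets *LEN to its size.  */
char *
toy_write_strtab (struct toy_atom *atoms, size_t natoms, size_t *len)
{
  size_t i;
  size_t total = 1;             /* One byte for the null string.  */
  size_t off = 1;
  char *strtab;

  for (i = 0; i < natoms; i++)
    total += strlen (atoms[i].str) + 1;

  if ((strtab = malloc (total)) == NULL)
    return NULL;

  qsort (atoms, natoms, sizeof (*atoms), toy_atom_cmp);

  strtab[0] = '\0';             /* Offset 0 always means 'no name'.  */
  for (i = 0; i < natoms; i++)
    {
      size_t n = strlen (atoms[i].str) + 1;

      memcpy (strtab + off, atoms[i].str, n);
      if (atoms[i].ref != NULL)
        *atoms[i].ref = (uint32_t) off;
      off += n;
    }

  assert (off == total);
  *len = total;
  return strtab;
}
```

Only once this has run is the table's length known, which is why the
real code appends the table to the CTF buffer afterwards.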

The net effect of all this is a reduction in uncompressed strtab sizes
of about 30% (perhaps a quarter to a half of all strings across the
Linux kernel are eliminated as duplicates). Of course, duplicated
strings are highly redundant, so the space saving after compression is
only about 20%: when the other non-strtab sections are factored in, CTF
sizes shrink by about 10%.

No change in externally-visible API or file format (other than the
reduction in pointless redundancy).

libctf/
	* ctf-impl.h (struct ctf_strs_writable): New, non-const version of
	struct ctf_strs.
	(struct ctf_dtdef): Note that dtd_data.ctt_name is unpopulated.
	(struct ctf_str_atom): New, disambiguated single string.
	(struct ctf_str_atom_ref): New, points to some other location that
	references this string's offset.
	(struct ctf_file): New members ctf_str_atoms and ctf_str_num_refs.
	Remove member ctf_dtvstrlen: we no longer track the total strlen
	as we add strings.
	(ctf_str_create_atoms): Declare new function in ctf-string.c.
	(ctf_str_free_atoms): Likewise.
	(ctf_str_add): Likewise.
	(ctf_str_add_ref): Likewise.
	(ctf_str_rollback): Likewise.
	(ctf_str_purge_refs): Likewise.
	(ctf_str_write_strtab): Likewise.
	(ctf_realloc): Declare new function in ctf-util.c.

	* ctf-open.c (ctf_bufopen): Create the atoms table.
	(ctf_file_close): Destroy it.
	* ctf-create.c (ctf_update): Copy-and-free it on update.  No longer
	special-case the position of the parname string.  Construct the
	strtab by calling ctf_str_add_ref and ctf_str_write_strtab after the
	rest of each buffer element is constructed, not via open-coding:
	realloc the CTF buffer and append the strtab to it.  No longer
	maintain ctf_dtvstrlen.  Sort the variable entry table later, after
	strtab construction.
	(ctf_copy_membnames): Remove: integrated into ctf_copy_{s,l,e}members.
	(ctf_copy_smembers): Drop the string offset: call ctf_str_add_ref
	after buffer element construction instead.
	(ctf_copy_lmembers): Likewise.
	(ctf_copy_emembers): Likewise.
	(ctf_create): No longer maintain the ctf_dtvstrlen.
	(ctf_dtd_delete): Likewise.
	(ctf_dvd_delete): Likewise.
	(ctf_add_generic): Likewise.
	(ctf_add_enumerator): Likewise.
	(ctf_add_member_offset): Likewise.
	(ctf_add_variable): Likewise.
	(membadd): Likewise.
	* ctf-util.c (ctf_realloc): New, wrapper around realloc that aborts
	if there are active ctf_str_num_refs.
	(ctf_strraw): Move to ctf-string.c.
	(ctf_strptr): Likewise.
	* ctf-string.c: New file, strtab manipulation.

	* Makefile.am (libctf_a_SOURCES): Add it.
	* Makefile.in: Regenerate.
---
 libctf/Makefile.am  |   2 +-
 libctf/Makefile.in  |  12 +-
 libctf/ctf-create.c | 183 ++++++++++--------------
 libctf/ctf-impl.h   |  43 +++++-
 libctf/ctf-open.c   |   2 +
 libctf/ctf-string.c | 330 ++++++++++++++++++++++++++++++++++++++++++++
 libctf/ctf-util.c   |  35 ++---
 7 files changed, 469 insertions(+), 138 deletions(-)
 create mode 100644 libctf/ctf-string.c

Extracted from a much larger patch that tries to slice up identifiers at
underscores and the like to reduce strtab size.  (Doing that reduces strtab size
drastically, but the table of pointers to chunks worsens compressibility by more
than enough to cancel out the gain, at least until I think of cleverer ways to
decide which things to encode directly in the strtab to keep chunk table size
down.  But this part of it is worthwhile.)

diff --git a/libctf/Makefile.am b/libctf/Makefile.am
index 926c9919c5..43fc78a412 100644
--- a/libctf/Makefile.am
+++ b/libctf/Makefile.am
@@ -34,7 +34,7 @@ noinst_LIBRARIES = libctf.a
 
 libctf_a_SOURCES = ctf-archive.c ctf-dump.c ctf-create.c ctf-decl.c ctf-error.c \
 		   ctf-hash.c ctf-labels.c ctf-lookup.c ctf-open.c ctf-open-bfd.c \
-		   ctf-subr.c ctf-types.c ctf-util.c
+		   ctf-string.c ctf-subr.c ctf-types.c ctf-util.c
 if NEED_CTF_QSORT_R
 libctf_a_SOURCES += ctf-qsort_r.c
 endif
diff --git a/libctf/Makefile.in b/libctf/Makefile.in
index 4fea156c44..c898eb4941 100644
--- a/libctf/Makefile.in
+++ b/libctf/Makefile.in
@@ -132,14 +132,15 @@ libctf_a_AR = $(AR) $(ARFLAGS)
 libctf_a_LIBADD =
 am__libctf_a_SOURCES_DIST = ctf-archive.c ctf-dump.c ctf-create.c \
 	ctf-decl.c ctf-error.c ctf-hash.c ctf-labels.c ctf-lookup.c \
-	ctf-open.c ctf-open-bfd.c ctf-subr.c ctf-types.c ctf-util.c \
-	ctf-qsort_r.c
+	ctf-open.c ctf-open-bfd.c ctf-string.c ctf-subr.c ctf-types.c \
+	ctf-util.c ctf-qsort_r.c
 @NEED_CTF_QSORT_R_TRUE@am__objects_1 = ctf-qsort_r.$(OBJEXT)
 am_libctf_a_OBJECTS = ctf-archive.$(OBJEXT) ctf-dump.$(OBJEXT) \
 	ctf-create.$(OBJEXT) ctf-decl.$(OBJEXT) ctf-error.$(OBJEXT) \
 	ctf-hash.$(OBJEXT) ctf-labels.$(OBJEXT) ctf-lookup.$(OBJEXT) \
-	ctf-open.$(OBJEXT) ctf-open-bfd.$(OBJEXT) ctf-subr.$(OBJEXT) \
-	ctf-types.$(OBJEXT) ctf-util.$(OBJEXT) $(am__objects_1)
+	ctf-open.$(OBJEXT) ctf-open-bfd.$(OBJEXT) ctf-string.$(OBJEXT) \
+	ctf-subr.$(OBJEXT) ctf-types.$(OBJEXT) ctf-util.$(OBJEXT) \
+	$(am__objects_1)
 libctf_a_OBJECTS = $(am_libctf_a_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -331,7 +332,7 @@ AM_CFLAGS = -std=gnu99 @ac_libctf_warn_cflags@ @warn@ @c_warn@ @WARN_PEDANTIC@ @
 noinst_LIBRARIES = libctf.a
 libctf_a_SOURCES = ctf-archive.c ctf-dump.c ctf-create.c ctf-decl.c \
 	ctf-error.c ctf-hash.c ctf-labels.c ctf-lookup.c ctf-open.c \
-	ctf-open-bfd.c ctf-subr.c ctf-types.c ctf-util.c \
+	ctf-open-bfd.c ctf-string.c ctf-subr.c ctf-types.c ctf-util.c \
 	$(am__append_1)
 all: config.h
 	$(MAKE) $(AM_MAKEFLAGS) all-am
@@ -412,6 +413,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ctf-open-bfd.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ctf-open.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ctf-qsort_r.Po@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ctf-string.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ctf-subr.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ctf-types.Po@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ctf-util.Po@am__quote@
diff --git a/libctf/ctf-create.c b/libctf/ctf-create.c
index 86695f5abf..6ab0cf3b88 100644
--- a/libctf/ctf-create.c
+++ b/libctf/ctf-create.c
@@ -29,10 +29,8 @@
 
 /* To create an empty CTF container, we just declare a zeroed header and call
    ctf_bufopen() on it.  If ctf_bufopen succeeds, we mark the new container r/w
-   and initialize the dynamic members.  We set dtvstrlen to 1 to reserve the
-   first byte of the string table for a \0 byte, and we start assigning type
-   IDs at 1 because type ID 0 is used as a sentinel and a not-found
-   indicator.  */
+   and initialize the dynamic members.  We start assigning type IDs at 1 because
+   type ID 0 is used as a sentinel and a not-found indicator.  */
 
 ctf_file_t *
 ctf_create (int *errp)
@@ -82,7 +80,6 @@ ctf_create (int *errp)
   fp->ctf_dtbyname = dtbyname;
   fp->ctf_dthash = dthash;
   fp->ctf_dvhash = dvhash;
-  fp->ctf_dtvstrlen = 1;
   fp->ctf_dtnextid = 1;
   fp->ctf_dtoldid = 0;
   fp->ctf_snapshots = 0;
@@ -101,25 +98,24 @@ ctf_create (int *errp)
 }
 
 static unsigned char *
-ctf_copy_smembers (ctf_dtdef_t *dtd, uint32_t soff, unsigned char *t)
+ctf_copy_smembers (ctf_file_t *fp, ctf_dtdef_t *dtd, unsigned char *t)
 {
   ctf_dmdef_t *dmd = ctf_list_next (&dtd->dtd_u.dtu_members);
   ctf_member_t ctm;
 
   for (; dmd != NULL; dmd = ctf_list_next (dmd))
     {
-      if (dmd->dmd_name)
-	{
-	  ctm.ctm_name = soff;
-	  soff += strlen (dmd->dmd_name) + 1;
-	}
-      else
-	ctm.ctm_name = 0;
+      ctf_member_t *copied;
 
+      ctm.ctm_name = 0;
       ctm.ctm_type = (uint32_t) dmd->dmd_type;
       ctm.ctm_offset = (uint32_t) dmd->dmd_offset;
 
       memcpy (t, &ctm, sizeof (ctm));
+      copied = (ctf_member_t *) t;
+      if (dmd->dmd_name)
+	ctf_str_add_ref (fp, dmd->dmd_name, &copied->ctm_name);
+
       t += sizeof (ctm);
     }
 
@@ -127,26 +123,25 @@ ctf_copy_smembers (ctf_dtdef_t *dtd, uint32_t soff, unsigned char *t)
 }
 
 static unsigned char *
-ctf_copy_lmembers (ctf_dtdef_t *dtd, uint32_t soff, unsigned char *t)
+ctf_copy_lmembers (ctf_file_t *fp, ctf_dtdef_t *dtd, unsigned char *t)
 {
   ctf_dmdef_t *dmd = ctf_list_next (&dtd->dtd_u.dtu_members);
   ctf_lmember_t ctlm;
 
   for (; dmd != NULL; dmd = ctf_list_next (dmd))
     {
-      if (dmd->dmd_name)
-	{
-	  ctlm.ctlm_name = soff;
-	  soff += strlen (dmd->dmd_name) + 1;
-	}
-      else
-	ctlm.ctlm_name = 0;
+      ctf_lmember_t *copied;
 
+      ctlm.ctlm_name = 0;
       ctlm.ctlm_type = (uint32_t) dmd->dmd_type;
       ctlm.ctlm_offsethi = CTF_OFFSET_TO_LMEMHI (dmd->dmd_offset);
       ctlm.ctlm_offsetlo = CTF_OFFSET_TO_LMEMLO (dmd->dmd_offset);
 
       memcpy (t, &ctlm, sizeof (ctlm));
+      copied = (ctf_lmember_t *) t;
+      if (dmd->dmd_name)
+	ctf_str_add_ref (fp, dmd->dmd_name, &copied->ctlm_name);
+
       t += sizeof (ctlm);
     }
 
@@ -154,41 +149,25 @@ ctf_copy_lmembers (ctf_dtdef_t *dtd, uint32_t soff, unsigned char *t)
 }
 
 static unsigned char *
-ctf_copy_emembers (ctf_dtdef_t *dtd, uint32_t soff, unsigned char *t)
+ctf_copy_emembers (ctf_file_t *fp, ctf_dtdef_t *dtd, unsigned char *t)
 {
   ctf_dmdef_t *dmd = ctf_list_next (&dtd->dtd_u.dtu_members);
   ctf_enum_t cte;
 
   for (; dmd != NULL; dmd = ctf_list_next (dmd))
     {
-      cte.cte_name = soff;
+      ctf_enum_t *copied;
+
       cte.cte_value = dmd->dmd_value;
-      soff += strlen (dmd->dmd_name) + 1;
       memcpy (t, &cte, sizeof (cte));
+      copied = (ctf_enum_t *) t;
+      ctf_str_add_ref (fp, dmd->dmd_name, &copied->cte_name);
       t += sizeof (cte);
     }
 
   return t;
 }
 
-static unsigned char *
-ctf_copy_membnames (ctf_dtdef_t *dtd, unsigned char *s)
-{
-  ctf_dmdef_t *dmd = ctf_list_next (&dtd->dtd_u.dtu_members);
-  size_t len;
-
-  for (; dmd != NULL; dmd = ctf_list_next (dmd))
-    {
-      if (dmd->dmd_name == NULL)
-	continue;			/* Skip anonymous members.  */
-      len = strlen (dmd->dmd_name) + 1;
-      memcpy (s, dmd->dmd_name, len);
-      s += len;
-    }
-
-  return s;
-}
-
 /* Sort a newly-constructed static variable array.  */
 
 static int
@@ -220,15 +199,16 @@ int
 ctf_update (ctf_file_t *fp)
 {
   ctf_file_t ofp, *nfp;
-  ctf_header_t hdr;
+  ctf_header_t hdr, *hdrp;
   ctf_dtdef_t *dtd;
   ctf_dvdef_t *dvd;
   ctf_varent_t *dvarents;
+  ctf_strs_writable_t strtab;
 
-  unsigned char *s, *s0, *t;
+  unsigned char *t;
   unsigned long i;
   size_t buf_size, type_size, nvars;
-  void *buf;
+  unsigned char *buf, *newbuf;
   int err;
 
   if (!(fp->ctf_flags & LCTF_RDWR))
@@ -247,9 +227,6 @@ ctf_update (ctf_file_t *fp)
   hdr.cth_magic = CTF_MAGIC;
   hdr.cth_version = CTF_VERSION;
 
-  if (fp->ctf_flags & LCTF_CHILD)
-    hdr.cth_parname = 1;		/* parname added just below.  */
-
   /* Iterate through the dynamic type definition list and compute the
      size of the CTF type section we will need to generate.  */
 
@@ -298,15 +275,13 @@ ctf_update (ctf_file_t *fp)
   for (nvars = 0, dvd = ctf_list_next (&fp->ctf_dvdefs);
        dvd != NULL; dvd = ctf_list_next (dvd), nvars++);
 
-  /* Fill in the string table and type offset and size, compute the size
-     of the entire CTF buffer we need, and then allocate a new buffer and
-     memcpy the finished header to the start of the buffer.  */
+  /* Compute the size of the CTF buffer we need, sans only the string table,
+     then allocate a new buffer and memcpy the finished header to the start of
+     the buffer.  (We will adjust this later with strtab length info.)  */
 
   hdr.cth_typeoff = hdr.cth_varoff + (nvars * sizeof (ctf_varent_t));
   hdr.cth_stroff = hdr.cth_typeoff + type_size;
-  hdr.cth_strlen = fp->ctf_dtvstrlen;
-  if (fp->ctf_parname != NULL)
-    hdr.cth_strlen += strlen (fp->ctf_parname) + 1;
+  hdr.cth_strlen = 0;
 
   buf_size = sizeof (ctf_header_t) + hdr.cth_stroff + hdr.cth_strlen;
 
@@ -315,63 +290,45 @@ ctf_update (ctf_file_t *fp)
 
   memcpy (buf, &hdr, sizeof (ctf_header_t));
   t = (unsigned char *) buf + sizeof (ctf_header_t) + hdr.cth_varoff;
-  s = s0 = (unsigned char *) buf + sizeof (ctf_header_t) + hdr.cth_stroff;
-
-  s[0] = '\0';
-  s++;
 
-  if (fp->ctf_parname != NULL)
-    {
-      memcpy (s, fp->ctf_parname, strlen (fp->ctf_parname) + 1);
-      s += strlen (fp->ctf_parname) + 1;
-    }
+  hdrp = (ctf_header_t *) buf;
+  if ((fp->ctf_flags & LCTF_CHILD) && (fp->ctf_parname != NULL))
+    ctf_str_add_ref (fp, fp->ctf_parname, &hdrp->cth_parname);
 
-  /* Work over the variable list, translating everything into
-     ctf_varent_t's and filling out the string table, then sort the buffer
-     of ctf_varent_t's.  */
+  /* Work over the variable list, translating everything into ctf_varent_t's and
+     prepping the string table.  */
 
   dvarents = (ctf_varent_t *) t;
   for (i = 0, dvd = ctf_list_next (&fp->ctf_dvdefs); dvd != NULL;
        dvd = ctf_list_next (dvd), i++)
     {
       ctf_varent_t *var = &dvarents[i];
-      size_t len = strlen (dvd->dvd_name) + 1;
 
-      var->ctv_name = (uint32_t) (s - s0);
+      ctf_str_add_ref (fp, dvd->dvd_name, &var->ctv_name);
       var->ctv_type = dvd->dvd_type;
-      memcpy (s, dvd->dvd_name, len);
-      s += len;
     }
   assert (i == nvars);
 
-  ctf_qsort_r (dvarents, nvars, sizeof (ctf_varent_t), ctf_sort_var, s0);
   t += sizeof (ctf_varent_t) * nvars;
 
   assert (t == (unsigned char *) buf + sizeof (ctf_header_t) + hdr.cth_typeoff);
 
-  /* We now take a final lap through the dynamic type definition list and
-     copy the appropriate type records and strings to the output buffer.  */
+  /* We now take a final lap through the dynamic type definition list and copy
+     the appropriate type records to the output buffer, noting down the
+     strings as we go.  */
 
   for (dtd = ctf_list_next (&fp->ctf_dtdefs);
        dtd != NULL; dtd = ctf_list_next (dtd))
     {
-
       uint32_t kind = LCTF_INFO_KIND (fp, dtd->dtd_data.ctt_info);
       uint32_t vlen = LCTF_INFO_VLEN (fp, dtd->dtd_data.ctt_info);
 
       ctf_array_t cta;
       uint32_t encoding;
       size_t len;
+      ctf_stype_t *copied;
 
-      if (dtd->dtd_name != NULL)
-	{
-	  dtd->dtd_data.ctt_name = (uint32_t) (s - s0);
-	  len = strlen (dtd->dtd_name) + 1;
-	  memcpy (s, dtd->dtd_name, len);
-	  s += len;
-	}
-      else
-	dtd->dtd_data.ctt_name = 0;
+      dtd->dtd_data.ctt_name = 0;
 
       if (dtd->dtd_data.ctt_size != CTF_LSIZE_SENT)
 	len = sizeof (ctf_stype_t);
@@ -379,6 +336,9 @@ ctf_update (ctf_file_t *fp)
 	len = sizeof (ctf_type_t);
 
       memcpy (t, &dtd->dtd_data, len);
+      copied = (ctf_stype_t *) t;  /* name is at the start: constant offset.  */
+      if (dtd->dtd_name)
+	ctf_str_add_ref (fp, dtd->dtd_name, &copied->ctt_name);
       t += len;
 
       switch (kind)
@@ -432,24 +392,47 @@ ctf_update (ctf_file_t *fp)
 	case CTF_K_STRUCT:
 	case CTF_K_UNION:
 	  if (dtd->dtd_data.ctt_size < CTF_LSTRUCT_THRESH)
-	    t = ctf_copy_smembers (dtd, (uint32_t) (s - s0), t);
+	    t = ctf_copy_smembers (fp, dtd, t);
 	  else
-	    t = ctf_copy_lmembers (dtd, (uint32_t) (s - s0), t);
-	  s = ctf_copy_membnames (dtd, s);
+	    t = ctf_copy_lmembers (fp, dtd, t);
 	  break;
 
 	case CTF_K_ENUM:
-	  t = ctf_copy_emembers (dtd, (uint32_t) (s - s0), t);
-	  s = ctf_copy_membnames (dtd, s);
+	  t = ctf_copy_emembers (fp, dtd, t);
 	  break;
 	}
     }
   assert (t == (unsigned char *) buf + sizeof (ctf_header_t) + hdr.cth_stroff);
 
+  /* Construct the final string table and fill out all the string refs with the
+     final offsets.  Then purge the refs list, because we're about to move this
+     strtab onto the end of the buf, invalidating all the offsets.  */
+  strtab = ctf_str_write_strtab (fp);
+  ctf_str_purge_refs (fp);
+
+  /* Now the string table is constructed, we can sort the buffer of
+     ctf_varent_t's.  */
+  ctf_qsort_r (dvarents, nvars, sizeof (ctf_varent_t), ctf_sort_var,
+	       strtab.cts_strs);
+
+  if ((newbuf = ctf_realloc (fp, buf, buf_size + strtab.cts_len)) == NULL)
+    {
+      ctf_free (buf);
+      ctf_free (strtab.cts_strs);
+      return (ctf_set_errno (fp, EAGAIN));
+    }
+  buf = newbuf;
+  memcpy (buf + buf_size, strtab.cts_strs, strtab.cts_len);
+  hdrp = (ctf_header_t *) buf;
+  hdrp->cth_strlen = strtab.cts_len;
+  buf_size += hdrp->cth_strlen;
+  ctf_free (strtab.cts_strs);
+
   /* Finally, we are ready to ctf_simple_open() the new container.  If this
      is successful, we then switch nfp and fp and free the old container.  */
 
-  if ((nfp = ctf_simple_open (buf, buf_size, NULL, 0, 0, NULL, 0, &err)) == NULL)
+  if ((nfp = ctf_simple_open ((char *) buf, buf_size, NULL, 0, 0, NULL,
+			      0, &err)) == NULL)
     {
       ctf_free (buf);
       return (ctf_set_errno (fp, err));
@@ -466,7 +449,6 @@ ctf_update (ctf_file_t *fp)
   nfp->ctf_dtbyname = fp->ctf_dtbyname;
   nfp->ctf_dvhash = fp->ctf_dvhash;
   nfp->ctf_dvdefs = fp->ctf_dvdefs;
-  nfp->ctf_dtvstrlen = fp->ctf_dtvstrlen;
   nfp->ctf_dtnextid = fp->ctf_dtnextid;
   nfp->ctf_dtoldid = fp->ctf_dtnextid - 1;
   nfp->ctf_snapshots = fp->ctf_snapshots + 1;
@@ -476,6 +458,9 @@ ctf_update (ctf_file_t *fp)
 
   fp->ctf_dtbyname = NULL;
   fp->ctf_dthash = NULL;
+  ctf_str_free_atoms (nfp);
+  nfp->ctf_str_atoms = fp->ctf_str_atoms;
+  fp->ctf_str_atoms = NULL;
   memset (&fp->ctf_dtdefs, 0, sizeof (ctf_list_t));
 
   fp->ctf_dvhash = NULL;
@@ -559,10 +544,7 @@ ctf_dtd_delete (ctf_file_t *fp, ctf_dtdef_t *dtd)
 	   dmd != NULL; dmd = nmd)
 	{
 	  if (dmd->dmd_name != NULL)
-	    {
-	      fp->ctf_dtvstrlen -= strlen (dmd->dmd_name) + 1;
 	      ctf_free (dmd->dmd_name);
-	    }
 	  nmd = ctf_list_next (dmd);
 	  ctf_free (dmd);
 	}
@@ -579,8 +561,6 @@ ctf_dtd_delete (ctf_file_t *fp, ctf_dtdef_t *dtd)
       name = ctf_prefixed_name (kind, dtd->dtd_name);
       ctf_dynhash_remove (fp->ctf_dtbyname, name);
       free (name);
-
-      fp->ctf_dtvstrlen -= strlen (dtd->dtd_name) + 1;
       ctf_free (dtd->dtd_name);
     }
 
@@ -638,8 +618,6 @@ void
 ctf_dvd_delete (ctf_file_t *fp, ctf_dvdef_t *dvd)
 {
   ctf_dynhash_remove (fp->ctf_dvhash, dvd->dvd_name);
-
-  fp->ctf_dtvstrlen -= strlen (dvd->dvd_name) + 1;
   ctf_free (dvd->dvd_name);
 
   ctf_list_delete (&fp->ctf_dvdefs, dvd);
@@ -763,9 +741,6 @@ ctf_add_generic (ctf_file_t *fp, uint32_t flag, const char *name,
   dtd->dtd_name = s;
   dtd->dtd_type = type;
 
-  if (s != NULL)
-    fp->ctf_dtvstrlen += strlen (s) + 1;
-
   if (ctf_dtd_insert (fp, dtd) < 0)
     {
       ctf_free (dtd);
@@ -1272,7 +1247,6 @@ ctf_add_enumerator (ctf_file_t *fp, ctf_id_t enid, const char *name,
   dtd->dtd_data.ctt_info = CTF_TYPE_INFO (kind, root, vlen + 1);
   ctf_list_append (&dtd->dtd_u.dtu_members, dmd);
 
-  fp->ctf_dtvstrlen += strlen (s) + 1;
   fp->ctf_flags |= LCTF_DIRTY;
 
   return 0;
@@ -1392,9 +1366,6 @@ ctf_add_member_offset (ctf_file_t *fp, ctf_id_t souid, const char *name,
   dtd->dtd_data.ctt_info = CTF_TYPE_INFO (kind, root, vlen + 1);
   ctf_list_append (&dtd->dtd_u.dtu_members, dmd);
 
-  if (s != NULL)
-    fp->ctf_dtvstrlen += strlen (s) + 1;
-
   fp->ctf_flags |= LCTF_DIRTY;
   return 0;
 }
@@ -1456,7 +1427,6 @@ ctf_add_variable (ctf_file_t *fp, const char *name, ctf_id_t ref)
       return -1;			/* errno is set for us.  */
     }
 
-  fp->ctf_dtvstrlen += strlen (name) + 1;
   fp->ctf_flags |= LCTF_DIRTY;
   return 0;
 }
@@ -1536,9 +1506,6 @@ membadd (const char *name, ctf_id_t type, unsigned long offset, void *arg)
 
   ctf_list_append (&ctb->ctb_dtd->dtd_u.dtu_members, dmd);
 
-  if (s != NULL)
-    ctb->ctb_file->ctf_dtvstrlen += strlen (s) + 1;
-
   ctb->ctb_file->ctf_flags |= LCTF_DIRTY;
   return 0;
 }
diff --git a/libctf/ctf-impl.h b/libctf/ctf-impl.h
index 2601e1b082..b51118cc6f 100644
--- a/libctf/ctf-impl.h
+++ b/libctf/ctf-impl.h
@@ -71,6 +71,12 @@ typedef struct ctf_strs
   size_t cts_len;		/* Size of string table in bytes.  */
 } ctf_strs_t;
 
+typedef struct ctf_strs_writable
+{
+  char *cts_strs;		/* Base address of string table.  */
+  size_t cts_len;		/* Size of string table in bytes.  */
+} ctf_strs_writable_t;
+
 typedef struct ctf_dmodel
 {
   const char *ctd_name;		/* Data model name.  */
@@ -147,7 +153,7 @@ typedef struct ctf_dtdef
   ctf_list_t dtd_list;		/* List forward/back pointers.  */
   char *dtd_name;		/* Name associated with definition (if any).  */
   ctf_id_t dtd_type;		/* Type identifier for this definition.  */
-  ctf_type_t dtd_data;		/* Type node (see <ctf.h>).  */
+  ctf_type_t dtd_data;		/* Type node: name left unpopulated.  */
   union
   {
     ctf_list_t dtu_members;	/* struct, union, or enum */
@@ -173,6 +179,30 @@ typedef struct ctf_bundle
   ctf_dtdef_t *ctb_dtd;		/* CTF dynamic type definition (if any).  */
 } ctf_bundle_t;
 
+/* Atoms associate strings with a list of the CTF items that reference that
+   string, so that ctf_update() can instantiate all the strings using the
+   ctf_str_atoms and then reassociate them with the real string later.
+
+   Strings can be interned into ctf_str_atom without having refs associated
+   with them, for values that are returned to callers, etc.  Items are only
+   removed from this table on ctf_close(), but on every ctf_update(), all the
+   csa_refs in all entries are purged.  */
+
+typedef struct ctf_str_atom
+{
+  const char *csa_str;		/* Backpointer to string (hash key).  */
+  ctf_list_t csa_refs;		/* This string's refs.  */
+  unsigned long csa_snapshot_id; /* Snapshot ID at time of creation.  */
+} ctf_str_atom_t;
+
+/* The refs of a single string in the atoms table.  */
+
+typedef struct ctf_str_atom_ref
+{
+  ctf_list_t caf_list;		/* List forward/back pointers.  */
+  uint32_t *caf_ref;		/* A single ref to this string.  */
+} ctf_str_atom_ref_t;
+
 /* The ctf_file is the structure used to represent a CTF container to library
    clients, who see it only as an opaque pointer.  Modifications can therefore
    be made freely to this structure without regard to client versioning.  The
@@ -198,6 +228,8 @@ struct ctf_file
   ctf_hash_t *ctf_names;	    /* Hash table of remaining type names.  */
   ctf_lookup_t ctf_lookups[5];	    /* Pointers to hashes for name lookup.  */
   ctf_strs_t ctf_str[2];	    /* Array of string table base and bounds.  */
+  ctf_dynhash_t *ctf_str_atoms;	  /* Hash table of ctf_str_atom_t.  */
+  uint64_t ctf_str_num_refs;	  /* Number of refs to ctf_str_atoms.  */
   const unsigned char *ctf_base;  /* Base of CTF header + uncompressed buffer.  */
   const unsigned char *ctf_buf;	  /* Uncompressed CTF data buffer.  */
   size_t ctf_size;		  /* Size of CTF header + uncompressed data.  */
@@ -223,7 +255,6 @@ struct ctf_file
   ctf_list_t ctf_dtdefs;	  /* List of dynamic type definitions.  */
   ctf_dynhash_t *ctf_dvhash;	  /* Hash of dynamic variable mappings.  */
   ctf_list_t ctf_dvdefs;	  /* List of dynamic variable definitions.  */
-  size_t ctf_dtvstrlen;		  /* Total length of dynamic type+var strings.  */
   unsigned long ctf_dtnextid;	  /* Next dynamic type id to assign.  */
   unsigned long ctf_dtoldid;	  /* Oldest id that has been committed.  */
   unsigned long ctf_snapshots;	  /* ctf_snapshot() plus ctf_update() count.  */
@@ -341,6 +372,13 @@ extern char *ctf_decl_buf (ctf_decl_t *cd);
 
 extern const char *ctf_strraw (ctf_file_t *, uint32_t);
 extern const char *ctf_strptr (ctf_file_t *, uint32_t);
+extern int ctf_str_create_atoms (ctf_file_t *);
+extern void ctf_str_free_atoms (ctf_file_t *);
+extern const char *ctf_str_add (ctf_file_t *, const char *);
+extern const char *ctf_str_add_ref (ctf_file_t *, const char *, uint32_t *);
+extern void ctf_str_rollback (ctf_file_t *, ctf_snapshot_id_t);
+extern void ctf_str_purge_refs (ctf_file_t *);
+extern ctf_strs_writable_t ctf_str_write_strtab (ctf_file_t *);
 
 extern struct ctf_archive *ctf_arc_open_internal (const char *, int *);
 extern struct ctf_archive *ctf_arc_bufopen (const void *, size_t, int *);
@@ -356,6 +394,7 @@ extern ssize_t ctf_pread (int fd, void *buf, ssize_t count, off_t offset);
 _libctf_malloc_
 extern void *ctf_alloc (size_t);
 extern void ctf_free (void *);
+extern void *ctf_realloc (ctf_file_t *, void *, size_t);
 
 _libctf_malloc_
 extern char *ctf_strdup (const char *);
diff --git a/libctf/ctf-open.c b/libctf/ctf-open.c
index 777a6b5ca6..8fc854ae67 100644
--- a/libctf/ctf-open.c
+++ b/libctf/ctf-open.c
@@ -1373,6 +1373,7 @@ ctf_bufopen (const ctf_sect_t *ctfsect, const ctf_sect_t *symsect,
 
   memset (fp, 0, sizeof (ctf_file_t));
   ctf_set_version (fp, &hp, hp.cth_version);
+  ctf_str_create_atoms (fp);
 
   if (_libctf_unlikely_ (hp.cth_version < CTF_VERSION_2))
     fp->ctf_parmax = CTF_MAX_PTYPE_V1;
@@ -1528,6 +1529,7 @@ ctf_file_close (ctf_file_t *fp)
       ctf_dvd_delete (fp, dvd);
     }
   ctf_dynhash_destroy (fp->ctf_dvhash);
+  ctf_str_free_atoms (fp);
 
   ctf_free (fp->ctf_tmp_typeslice);
 
diff --git a/libctf/ctf-string.c b/libctf/ctf-string.c
new file mode 100644
index 0000000000..27bd7c2bba
--- /dev/null
+++ b/libctf/ctf-string.c
@@ -0,0 +1,330 @@
+/* CTF string table management.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of libctf.
+
+   libctf is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   This program is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+   See the GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; see the file COPYING.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <ctf-impl.h>
+#include <string.h>
+
+/* Convert an encoded CTF string name into a pointer to a C string by looking
+   up the appropriate string table buffer and then adding the offset.  */
+const char *
+ctf_strraw (ctf_file_t *fp, uint32_t name)
+{
+  ctf_strs_t *ctsp = &fp->ctf_str[CTF_NAME_STID (name)];
+
+  if (ctsp->cts_strs != NULL && CTF_NAME_OFFSET (name) < ctsp->cts_len)
+    return (ctsp->cts_strs + CTF_NAME_OFFSET (name));
+
+  /* String table not loaded or corrupt offset.  */
+  return NULL;
+}
+
+/* Return a guaranteed-non-NULL pointer to the string with the given CTF
+   name.  */
+const char *
+ctf_strptr (ctf_file_t *fp, uint32_t name)
+{
+  const char *s = ctf_strraw (fp, name);
+  return (s != NULL ? s : "(?)");
+}
+
+/* Remove all refs to a given atom.  */
+static void
+ctf_str_purge_atom_refs (ctf_str_atom_t *atom)
+{
+  ctf_str_atom_ref_t *ref, *next;
+
+  for (ref = ctf_list_next (&atom->csa_refs); ref != NULL; ref = next)
+    {
+      next = ctf_list_next (ref);
+      ctf_list_delete (&atom->csa_refs, ref);
+      ctf_free (ref);
+    }
+}
+
+/* Free an atom (only called on ctf_close().)  */
+static void
+ctf_str_free_atom (void *a)
+{
+  ctf_str_atom_t *atom = a;
+
+  ctf_str_purge_atom_refs (atom);
+  ctf_free (atom);
+}
+
+/* Create the atoms table.  There is always at least one atom in it, the null
+   string.  */
+int
+ctf_str_create_atoms (ctf_file_t *fp)
+{
+  fp->ctf_str_atoms = ctf_dynhash_create (ctf_hash_string, ctf_hash_eq_string,
+					  ctf_free, ctf_str_free_atom);
+  if (fp->ctf_str_atoms == NULL)
+    return -ENOMEM;
+
+  ctf_str_add (fp, "");
+  return 0;
+}
+
+/* Destroy the atoms table.  */
+void
+ctf_str_free_atoms (ctf_file_t *fp)
+{
+  ctf_dynhash_destroy (fp->ctf_str_atoms);
+}
+
+/* Add a string to the atoms table and return it, or return an existing string
+   if present, copying the passed-in string.  Returns NULL only when out of
+   memory (and does not touch the passed-in string in that case).  Possibly
+   augment the ref list with the passed-in ref.  */
+static const char *
+ctf_str_add_ref_internal (ctf_file_t *fp, const char *str,
+			  int add_ref, uint32_t *ref)
+{
+  char *newstr = NULL;
+  ctf_str_atom_t *atom = NULL;
+  ctf_str_atom_ref_t *aref = NULL;
+
+  atom = ctf_dynhash_lookup (fp->ctf_str_atoms, str);
+
+  if (add_ref)
+    {
+      if ((aref = ctf_alloc (sizeof (struct ctf_str_atom_ref))) == NULL)
+	return NULL;
+      aref->caf_ref = ref;
+    }
+
+  if (atom)
+    {
+      if (add_ref)
+	{
+	  ctf_list_append (&atom->csa_refs, aref);
+	  fp->ctf_str_num_refs++;
+	}
+      return atom->csa_str;
+    }
+
+  if ((atom = ctf_alloc (sizeof (struct ctf_str_atom))) == NULL)
+    goto oom;
+  memset (atom, 0, sizeof (struct ctf_str_atom));
+
+  if ((newstr = ctf_strdup (str)) == NULL)
+    goto oom;
+
+  if (ctf_dynhash_insert (fp->ctf_str_atoms, newstr, atom) < 0)
+    goto oom;
+
+  atom->csa_str = newstr;
+  atom->csa_snapshot_id = fp->ctf_snapshots;
+  if (add_ref)
+    {
+      ctf_list_append (&atom->csa_refs, aref);
+      fp->ctf_str_num_refs++;
+    }
+  return newstr;
+
+ oom:
+  ctf_free (atom);
+  ctf_free (aref);
+  ctf_free (newstr);
+  return NULL;
+}
+
+/* Add a string to the atoms table and return it, without augmenting the ref
+   list for this string.  */
+const char *
+ctf_str_add (ctf_file_t *fp, const char *str)
+{
+  if (str)
+    return ctf_str_add_ref_internal (fp, str, FALSE, 0);
+  return NULL;
+}
+
+/* A ctf_dynhash_iter_remove() callback that removes atoms later than a given
+   snapshot ID.  */
+static int
+ctf_str_rollback_atom (void *key _libctf_unused_, void *value, void *arg)
+{
+  ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
+  ctf_snapshot_id_t *id = (ctf_snapshot_id_t *) arg;
+
+  return (atom->csa_snapshot_id > id->snapshot_id);
+}
+
+/* Roll back, deleting all atoms created after a particular ID.  */
+void
+ctf_str_rollback (ctf_file_t *fp, ctf_snapshot_id_t id)
+{
+  ctf_dynhash_iter_remove (fp->ctf_str_atoms, ctf_str_rollback_atom, &id);
+}
+
+/* Like ctf_str_add(), but additionally augment the atom's refs list with the
+   passed-in ref, whether or not the string is already present.  There is no
+   attempt to deduplicate the refs list (but duplicates are harmless).  */
+const char *
+ctf_str_add_ref (ctf_file_t *fp, const char *str, uint32_t *ref)
+{
+  if (str)
+    return ctf_str_add_ref_internal (fp, str, TRUE, ref);
+  return NULL;
+}
+
+/* An adaptor around ctf_str_purge_atom_refs.  */
+static void
+ctf_str_purge_one_atom_refs (void *key _libctf_unused_, void *value,
+			     void *arg _libctf_unused_)
+{
+  ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
+  ctf_str_purge_atom_refs (atom);
+}
+
+/* Remove all the recorded refs from the atoms table.  */
+void
+ctf_str_purge_refs (ctf_file_t *fp)
+{
+  if (fp->ctf_str_num_refs > 0)
+    ctf_dynhash_iter (fp->ctf_str_atoms, ctf_str_purge_one_atom_refs, NULL);
+  fp->ctf_str_num_refs = 0;
+}
+
+/* Update every ref recorded for the given atom to the specified value.  */
+static void
+ctf_str_update_refs (ctf_str_atom_t *refs, uint32_t value)
+{
+  ctf_str_atom_ref_t *ref;
+
+  for (ref = ctf_list_next (&refs->csa_refs); ref != NULL;
+       ref = ctf_list_next (ref))
+    *(ref->caf_ref) = value;
+}
+
+/* State shared across the strtab write process.  */
+typedef struct ctf_strtab_write_state
+{
+  /* Strtab we are writing, and the number of strings in it.  */
+  ctf_strs_writable_t *strtab;
+  size_t strtab_count;
+
+  /* Pointers to (existing) atoms in the atoms table, for qsorting.  */
+  ctf_str_atom_t **sorttab;
+
+  /* Loop counter for sorttab population.  */
+  size_t i;
+
+  /* The null-string atom (skipped during population).  */
+  ctf_str_atom_t *nullstr;
+} ctf_strtab_write_state_t;
+
+/* Count the number of entries in the strtab, and its length.  */
+static void
+ctf_str_count_strtab (void *key _libctf_unused_, void *value,
+	      void *arg)
+{
+  ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
+  ctf_strtab_write_state_t *s = (ctf_strtab_write_state_t *) arg;
+
+  s->strtab->cts_len += strlen (atom->csa_str) + 1;
+  s->strtab_count++;
+}
+
+/* Populate the sorttab with pointers to the strtab atoms.  */
+static void
+ctf_str_populate_sorttab (void *key _libctf_unused_, void *value,
+		  void *arg)
+{
+  ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
+  ctf_strtab_write_state_t *s = (ctf_strtab_write_state_t *) arg;
+
+  /* Skip the null string.  */
+  if (s->nullstr == atom)
+    return;
+
+  s->sorttab[s->i++] = atom;
+}
+
+/* Sort the strtab.  */
+static int
+ctf_str_sort_strtab (const void *a, const void *b)
+{
+  ctf_str_atom_t **one = (ctf_str_atom_t **) a;
+  ctf_str_atom_t **two = (ctf_str_atom_t **) b;
+
+  return (strcmp ((*one)->csa_str, (*two)->csa_str));
+}
+
+/* Write out and return a strtab containing every string in the atoms table,
+   adjusting the recorded refs to hold each string's offset.  On error, the
+   cts_strs member of the returned strtab is NULL.  */
+ctf_strs_writable_t
+ctf_str_write_strtab (ctf_file_t *fp)
+{
+  ctf_strs_writable_t strtab;
+  ctf_str_atom_t *nullstr;
+  uint32_t cur_stroff = 0;
+  ctf_strtab_write_state_t s;
+  ctf_str_atom_t **sorttab;
+  size_t i;
+
+  memset (&strtab, 0, sizeof (struct ctf_strs_writable));
+  memset (&s, 0, sizeof (struct ctf_strtab_write_state));
+  s.strtab = &strtab;
+
+  nullstr = ctf_dynhash_lookup (fp->ctf_str_atoms, "");
+  if (!nullstr)
+    {
+      ctf_dprintf ("Internal error: null string not found in strtab.\n");
+      strtab.cts_strs = NULL;
+      return strtab;
+    }
+
+  ctf_dynhash_iter (fp->ctf_str_atoms, ctf_str_count_strtab, &s);
+
+  ctf_dprintf ("%lu bytes of strings in strtab.\n",
+	       (unsigned long) strtab.cts_len);
+
+  /* Sort the strtab.  Force the null string to be first.  */
+  sorttab = calloc (s.strtab_count, sizeof (ctf_str_atom_t *));
+  if (!sorttab)
+    return strtab;
+
+  sorttab[0] = nullstr;
+  s.i = 1;
+  s.sorttab = sorttab;
+  s.nullstr = nullstr;
+  ctf_dynhash_iter (fp->ctf_str_atoms, ctf_str_populate_sorttab, &s);
+
+  qsort (&sorttab[1], s.strtab_count - 1, sizeof (ctf_str_atom_t *),
+	 ctf_str_sort_strtab);
+
+  if ((strtab.cts_strs = ctf_alloc (strtab.cts_len)) == NULL)
+    {
+      free (sorttab);
+      return strtab;
+    }
+
+  /* Update the strtab, and all refs.  */
+  for (i = 0; i < s.strtab_count; i++)
+    {
+      strcpy (&strtab.cts_strs[cur_stroff], sorttab[i]->csa_str);
+      ctf_str_update_refs (sorttab[i], cur_stroff);
+      cur_stroff += strlen (sorttab[i]->csa_str) + 1;
+    }
+  free (sorttab);
+
+  return strtab;
+}
diff --git a/libctf/ctf-util.c b/libctf/ctf-util.c
index 730f358a93..b813c0d414 100644
--- a/libctf/ctf-util.c
+++ b/libctf/ctf-util.c
@@ -95,28 +95,6 @@ ctf_sym_to_elf64 (const Elf32_Sym *src, Elf64_Sym *dst)
   return dst;
 }
 
-/* Convert an encoded CTF string name into a pointer to a C string by looking
-  up the appropriate string table buffer and then adding the offset.  */
-
-const char *
-ctf_strraw (ctf_file_t *fp, uint32_t name)
-{
-  ctf_strs_t *ctsp = &fp->ctf_str[CTF_NAME_STID (name)];
-
-  if (ctsp->cts_strs != NULL && CTF_NAME_OFFSET (name) < ctsp->cts_len)
-    return (ctsp->cts_strs + CTF_NAME_OFFSET (name));
-
-  /* String table not loaded or corrupt offset.  */
-  return NULL;
-}
-
-const char *
-ctf_strptr (ctf_file_t *fp, uint32_t name)
-{
-  const char *s = ctf_strraw (fp, name);
-  return (s != NULL ? s : "(?)");
-}
-
 /* Same as strdup(3C), but use ctf_alloc() to do the memory allocation. */
 
 _libctf_malloc_ char *
@@ -154,6 +132,19 @@ ctf_str_append (char *s, const char *append)
   return s;
 }
 
+/* A realloc() that fails noisily if called with nonzero ctf_str_num_refs.  */
+void *
+ctf_realloc (ctf_file_t *fp, void *ptr, size_t size)
+{
+  if (fp->ctf_str_num_refs > 0)
+    {
+      ctf_dprintf ("%p: attempt to realloc() string table with %lu active refs\n",
+		   (void *) fp, (unsigned long) fp->ctf_str_num_refs);
+      return NULL;
+    }
+  return realloc (ptr, size);
+}
+
 /* Store the specified error code into errp if it is non-NULL, and then
    return NULL for the benefit of the caller.  */
 
-- 
2.22.0.238.g049a27acdc
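
For readers who want to see the layout step of ctf_str_write_strtab() in
isolation, here is a minimal standalone sketch of the same idea: count the
total length, keep the null string at offset zero, sort the rest, then emit
the strings and patch every recorded ref with its string's offset.  It is an
illustration only and not libctf code: toy_atom_t, TOY_MAX_REFS,
toy_atom_cmp() and toy_write_strtab() are hypothetical names, a plain array
stands in for the dynhash and the ctf_list_t of refs, and it assumes the
caller has already deduplicated the strings and put "" in slot zero.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified atom: one unique string plus the locations that
   must receive its final offset.  Not libctf's ctf_str_atom_t.  */
#define TOY_MAX_REFS 8

typedef struct toy_atom
{
  const char *str;
  uint32_t *refs[TOY_MAX_REFS];
  size_t nrefs;
} toy_atom_t;

/* qsort() comparator: order atoms by string content.  */
static int
toy_atom_cmp (const void *a, const void *b)
{
  const toy_atom_t *one = (const toy_atom_t *) a;
  const toy_atom_t *two = (const toy_atom_t *) b;

  return strcmp (one->str, two->str);
}

/* Lay out NATOMS unique strings into a freshly malloc()ed buffer, with the
   empty string (which must be atoms[0]) at offset zero and the remaining
   strings sorted.  Every recorded ref is overwritten with its string's
   offset.  Returns NULL on allocation failure; *LEN receives the table
   length in bytes.  */
static char *
toy_write_strtab (toy_atom_t *atoms, size_t natoms, size_t *len)
{
  size_t i, j;
  uint32_t off = 0;
  char *buf;

  assert (natoms > 0 && atoms[0].str[0] == '\0');

  /* First pass: total length, including each terminating NUL.  */
  *len = 0;
  for (i = 0; i < natoms; i++)
    *len += strlen (atoms[i].str) + 1;

  /* Sort everything except the null string, which stays in slot zero.  */
  qsort (&atoms[1], natoms - 1, sizeof (toy_atom_t), toy_atom_cmp);

  if ((buf = malloc (*len)) == NULL)
    return NULL;

  /* Second pass: emit each string and patch its refs.  */
  for (i = 0; i < natoms; i++)
    {
      size_t n = strlen (atoms[i].str) + 1;

      memcpy (buf + off, atoms[i].str, n);
      for (j = 0; j < atoms[i].nrefs; j++)
	*atoms[i].refs[j] = off;
      off += (uint32_t) n;
    }
  return buf;
}
```

As in the patch proper, the refs are raw pointers, so each one must already
point at the final home of its offset when the table is written: a ref taken
on a temporary structure that is later copied or realloc()ed elsewhere would
be patched in the wrong place.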


