[patch] debuginfod metadata extension

Aaron Merey amerey@redhat.com
Thu May 30 00:51:55 GMT 2024


Hi Frank,

Thanks for the patch, some suggestions below.

On Thu, May 23, 2024 at 9:49 PM Frank Ch. Eigler <fche@elastic.org> wrote:
>
> Hi -
>
> The following patch brings in the other long-queued piece of work by
> Ryan and myself.  Because of the long divergence of the branch, it
> took some manual matching up of master branch patches, so it took some
> time.  The refactoring in debuginfod-client.c is the most complex;
> the server side is comparatively simple.
>
> Please send feedback!
>
>
> commit 97f10ba356b0184ebf83c242515563f8d4a21b87 (HEAD -> master)
> gpg: Signature made Thu 23 May 2024 07:14:54 PM EDT
> gpg:                using RSA key 4DD136490411C0A42B28844F258B6EFA0F209D24
> gpg: Good signature from "Frank Ch. Eigler <fche@elastic.org>" [ultimate]
> Author: Frank Ch. Eigler <fche@redhat.com>
> Date:   Mon Oct 31 17:40:01 2022 -0400
>
>     PR29472: debuginfod: add metadata query webapi, C api, client
>
>     This patch extends the debuginfod API with a "metadata query"
>     operation.  It allows clients to request an enumeration of file names
>     known to debuginfod servers, returning a JSON response including the
>     matching buildids.  This lets clients later download debuginfo for a
>     range of versions of the same named binaries, in case they need to do
>     prospective work (like systemtap-based live-patching).  It also lets
>     server operators implement prefetch triggering operations for popular
>     but slow debuginfo slivers like kernel vdso.debug files on fedora.
>
>     Implementation requires a modern enough json-c library, namely 0.11,
>     which dates from 2014.  Without that, debuginfod client/server bits
>     will refuse to build.
>
>     % debuginfod-find metadata file /bin/ls
>     % debuginfod-find metadata glob "/usr/local/bin/c*"
>
>     Refactored several functions in debuginfod-client.c, because the
>     metadata search logic is different for multiple servers (merge all
>     responses instead of first responder wins).
>
>     Documentation and testing are included.
>
>     Signed-off-by: Ryan Goldberg <rgoldber@redhat.com>
>     Signed-off-by: Frank Ch. Eigler <fche@redhat.com>
>
> diff --git a/NEWS b/NEWS
> index 6f931bb518cc..300db133526f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -3,6 +3,8 @@ Version 0.192 (one after 0.191)
>  debuginfod: Add per-file signature verification for integrity
>              checking, using RPM IMA scheme from Fedora/RHEL.
>
> +debuginfod: New API for metadata queries: file name -> buildid.
> +
>  Version 0.191 "Bug fixes in C major"
>
>  libdw: dwarf_addrdie now supports binaries lacking a .debug_aranges
> diff --git a/config/elfutils.spec.in b/config/elfutils.spec.in
> index 460729972420..eff045755730 100644
> --- a/config/elfutils.spec.in
> +++ b/config/elfutils.spec.in
> @@ -31,6 +31,8 @@ BuildRequires: pkgconfig(libmicrohttpd) >= 0.9.33
>  BuildRequires: pkgconfig(libcurl) >= 7.29.0
>  BuildRequires: pkgconfig(sqlite3) >= 3.7.17
>  BuildRequires: pkgconfig(libarchive) >= 3.1.2
> +# For debuginfod metadata query
> +BuildRequires: pkgconfig(json-c) >= 0.11
>
>  # For tests need to bunzip2 test files.
>  BuildRequires: bzip2
> @@ -42,6 +44,8 @@ BuildRequires: bsdtar
>  BuildRequires: curl
>  # For run-debuginfod-response-headers.sh test case
>  BuildRequires: socat
> +# For run-debuginfod-find-metadata.sh
> +BuildRequires: jq
>
>  # For debuginfod rpm IMA verification
>  BuildRequires: rpm-devel
> diff --git a/configure.ac b/configure.ac
> index 5adf766720e4..836d61ea6c0d 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -863,9 +863,6 @@ AS_IF([test "x$enable_libdebuginfod" != "xno"], [
>        enable_libdebuginfod=yes # presume success
>        PKG_PROG_PKG_CONFIG
>        PKG_CHECK_MODULES([libcurl],[libcurl >= 7.29.0],[],[enable_libdebuginfod=no])
> -      if test "x$enable_libdebuginfod" = "xno"; then
> -        AC_MSG_ERROR([dependencies not found, use --disable-libdebuginfod to disable or --enable-libdebuginfod=dummy to build a (bootstrap) dummy library.])
> -      fi
>      else
>        AC_MSG_NOTICE([building (bootstrap) dummy libdebuginfo library])
>      fi
> @@ -899,10 +896,8 @@ AS_IF([test "x$enable_debuginfod" != "xno"], [
>      PKG_CHECK_MODULES([libmicrohttpd],[libmicrohttpd >= 0.9.33],[],[enable_debuginfod=no])
>      PKG_CHECK_MODULES([oldlibmicrohttpd],[libmicrohttpd < 0.9.51],[old_libmicrohttpd=yes],[old_libmicrohttpd=no])
>      PKG_CHECK_MODULES([sqlite3],[sqlite3 >= 3.7.17],[],[enable_debuginfod=no])
> -    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no], AC_DEFINE([HAVE_LIBARCHIVE], [0], [Define to 0 if libarchive is not available]))
> -    if test "x$enable_debuginfod" = "xno"; then
> -      AC_MSG_ERROR([dependencies not found, use --disable-debuginfod to disable.])
> -    fi
> +    PKG_CHECK_MODULES([libarchive],[libarchive >= 3.1.2],[],[enable_debuginfod=no])
> +    PKG_CHECK_MODULES([jsonc],[json-c >= 0.11],[],[enable_debuginfod=no])
>  ])
>
>  AS_IF([test "x$enable_debuginfod" != "xno"],AC_DEFINE([ENABLE_DEBUGINFOD],[1],[Build debuginfod]))
> diff --git a/debuginfod/Makefile.am b/debuginfod/Makefile.am
> index 5e4f9669d7c1..b74e3673a97e 100644
> --- a/debuginfod/Makefile.am
> +++ b/debuginfod/Makefile.am
> @@ -33,7 +33,7 @@ include $(top_srcdir)/config/eu.am
>  AM_CPPFLAGS += -I$(srcdir) -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
>            -I$(srcdir)/../libdw -I$(srcdir)/../libdwelf \
>            $(libmicrohttpd_CFLAGS) $(libcurl_CFLAGS) $(sqlite3_CFLAGS) \
> -          $(libarchive_CFLAGS)
> +          $(libarchive_CFLAGS) $(jsonc_CFLAGS)
>
>  # Disable eu- prefixing for artifacts (binaries & man pages) in this
>  # directory, since they do not conflict with binutils tools.
> @@ -70,10 +70,10 @@ bin_PROGRAMS += debuginfod-find
>  endif
>
>  debuginfod_SOURCES = debuginfod.cxx
> -debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(rpm_LIBS) -lpthread -ldl
> +debuginfod_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(libmicrohttpd_LIBS) $(sqlite3_LIBS) $(libarchive_LIBS) $(rpm_LIBS) $(jsonc_LIBS) $(libcurl_LIBS) -lpthread -ldl
>
>  debuginfod_find_SOURCES = debuginfod-find.c
> -debuginfod_find_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS)
> +debuginfod_find_LDADD = $(libdw) $(libelf) $(libeu) $(libdebuginfod) $(argp_LDADD) $(fts_LIBS) $(jsonc_LIBS)
>
>  if LIBDEBUGINFOD
>  noinst_LIBRARIES = libdebuginfod.a
> @@ -97,7 +97,7 @@ libdebuginfod_so_LIBS = libdebuginfod_pic.a
>  if DUMMY_LIBDEBUGINFOD
>  libdebuginfod_so_LDLIBS =
>  else
> -libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS) $(libelf) $(crypto_LIBS)
> +libdebuginfod_so_LDLIBS = -lpthread $(libcurl_LIBS) $(fts_LIBS) $(libelf) $(crypto_LIBS) $(jsonc_LIBS)
>  endif
>  $(LIBDEBUGINFOD_SONAME): $(srcdir)/libdebuginfod.map $(libdebuginfod_so_LIBS)
>         $(AM_V_CCLD)$(LINK) $(dso_LDFLAGS) -o $@ \
> diff --git a/debuginfod/debuginfod-client.c b/debuginfod/debuginfod-client.c
> index f01d1f0e55fa..c75abadf7dce 100644
> --- a/debuginfod/debuginfod-client.c
> +++ b/debuginfod/debuginfod-client.c
> @@ -71,6 +71,8 @@ int debuginfod_find_source (debuginfod_client *c, const unsigned char *b,
>  int debuginfod_find_section (debuginfod_client *c, const unsigned char *b,
>                              int s, const char *scn, char **p)
>                               { return -ENOSYS; }
> +int debuginfod_find_metadata (debuginfod_client *c,
> +                              const char *k, char *v, char **p) { return -ENOSYS; }
>  void debuginfod_set_progressfn(debuginfod_client *c,
>                                debuginfod_progressfn_t fn) { }
>  void debuginfod_set_verbose_fd(debuginfod_client *c, int fd) { }
> @@ -104,6 +106,7 @@ void debuginfod_end (debuginfod_client *c) { }
>  #include <sys/utsname.h>
>  #include <curl/curl.h>
>  #include <fnmatch.h>
> +#include <json-c/json.h>
>
>  /* If fts.h is included before config.h, its indirect inclusions may not
>     give us the right LFS aliases of these functions, so map them manually.  */
> @@ -211,6 +214,11 @@ static const char *cache_miss_filename = "cache_miss_s";
>  static const char *cache_max_unused_age_filename = "max_unused_age_s";
>  static const long cache_default_max_unused_age_s = 604800; /* 1 week */
>
> +/* The metadata_retention_default_s file within the debuginfod cache
> +   specifies how long metadata query results should be cached. */
> +static const long metadata_retention_default_s = 3600; /* 1 hour */
> +static const char *metadata_retention_filename = "metadata_retention_s";
> +
>  /* Location of the cache of files downloaded from debuginfods.
>     The default parent directory is $HOME, or '/' if $HOME doesn't exist.  */
>  static const char *cache_default_name = ".debuginfod_client_cache";
> @@ -249,9 +257,14 @@ struct handle_data
>       to the cache. Used to ensure that a file is not downloaded from
>       multiple servers unnecessarily.  */
>    CURL **target_handle;
> +
>    /* Response http headers for this client handle, sent from the server */
>    char *response_data;
>    size_t response_data_size;
> +
> +  /* Response metadata values for this client handle, sent from the server */
> +  char *metadata;
> +  size_t metadata_size;
>  };
>
>
> @@ -556,7 +569,8 @@ debuginfod_clean_cache(debuginfod_client *c,
>      return -errno;
>
>    regex_t re;
> -  const char * pattern = ".*/[a-f0-9]+(/debuginfo|/executable|/source.*|)$"; /* include dirs */
> +  const char * pattern = ".*/(metadata.*|[a-f0-9]+(/debuginfo|/executable|/source.*|))$"; /* include dirs */
> +  /* NB: also matches .../section/ subdirs, so extracted section files also get cleaned. */
>    if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
>      return -ENOMEM;
>
> @@ -794,18 +808,9 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
>    }
>    /* Temporary buffer for realloc */
>    char *temp = NULL;
> -  if (data->response_data == NULL)
> -    {
> -      temp = malloc(numitems);
> -      if (temp == NULL)
> -        return 0;
> -    }
> -  else
> -    {
> -      temp = realloc(data->response_data, data->response_data_size + numitems);
> -      if (temp == NULL)
> -        return 0;
> -    }
> +  temp = realloc(data->response_data, data->response_data_size + numitems);
> +  if (temp == NULL)
> +    return 0;
>
>    memcpy(temp + data->response_data_size, buffer, numitems-1);
>    data->response_data = temp;
> @@ -815,6 +820,386 @@ header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
>    return numitems;
>  }
>
> +
> +static size_t
> +metadata_callback (char * buffer, size_t size, size_t numitems, void * userdata)
> +{
> +  if (size != 1)
> +    return 0;
> +  /* Temporary buffer for realloc */
> +  char *temp = NULL;
> +  struct handle_data *data = (struct handle_data *) userdata;
> +  temp = realloc(data->metadata, data->metadata_size + numitems + 1);
> +  if (temp == NULL)
> +    return 0;
> +
> +  memcpy(temp + data->metadata_size, buffer, numitems);
> +  data->metadata = temp;
> +  data->metadata_size += numitems;
> +  data->metadata[data->metadata_size] = '\0';
> +  return numitems;
> +}
> +
> +
> +/* This function takes a copy of DEBUGINFOD_URLS, server_urls, and
> + * separates it into an array of urls to query, each with a
> + * corresponding IMA policy. The url_subdir is either 'buildid' or
> + * 'metadata', corresponding to the query type. Returns 0 on success
> + * and -Posix error on failure.
> + */
> +int
> +init_server_urls(char* url_subdir, const char* type,
> +                 char *server_urls, char ***server_url_list, ima_policy_t **url_ima_policies,
> +                 int *num_urls, int vfd)
> +{
> +  /* Initialize the memory to zero */
> +  char *strtok_saveptr;
> +  ima_policy_t verification_mode = ignore; // The default mode
> +  char *server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
> +  /* Count number of URLs.  */
> +  int n = 0;
> +  assert (0 == strcmp(url_subdir, "buildid") || 0 == strcmp(url_subdir, "metadata"));

I prefer if we avoid asserts in library code.  ATM it doesn't look like
this assert can fail but we should return an error code to prevent a
future change from triggering the assert.

> +
> +  while (server_url != NULL)
> +    {
> +      int r;
> +      char *tmp_url;
> +      if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
> +        r = asprintf(&tmp_url, "%s%s", server_url, url_subdir);
> +      else
> +        r = asprintf(&tmp_url, "%s/%s", server_url, url_subdir);
> +
> +      if (r == -1)
> +        return -ENOMEM;
> +
> +      // When we encounted a (well-formed) token of the form ima:foo, we update the policy

Should be "encountered".

> +      // under which results from that server will be ima verified
> +      if (startswith(server_url, "ima:"))
> +        {
> +#ifdef ENABLE_IMA_VERIFICATION
> +          ima_policy_t m = ima_policy_str2enum(server_url + strlen("ima:"));
> +          if(m != undefined)
> +            verification_mode = m;
> +          else if (vfd >= 0)
> +            dprintf(vfd, "IMA mode not recognized, skipping %s\n", server_url);
> +#else
> +          if (vfd >= 0)
> +            dprintf(vfd, "IMA signature verification is not enabled, treating %s as ima:ignore\n", server_url);
> +#endif
> +          goto continue_next_url;
> +        }

This goto causes tmp_url to leak.  This can be fixed by placing
'if startswith(server_url, "ima:")) ...' before the calls to asprintf.

> +
> +      if (verification_mode==enforcing &&
> +          0==strcmp(url_subdir, "buildid") &&
> +          0==strcmp(type,"section")) // section queries are unsecurable
> +        {
> +          if (vfd >= 0)
> +            dprintf(vfd, "skipping server %s section query in IMA enforcing mode\n", server_url);
> +          goto continue_next_url;
> +        }
> +
> +      /* PR 27983: If the url is duplicate, skip it */
> +      int url_index;
> +      for (url_index = 0; url_index < n; ++url_index)
> +        {
> +          if(strcmp(tmp_url, (*server_url_list)[url_index]) == 0)
> +            {
> +              url_index = -1;
> +              break;
> +            }
> +        }
> +      if (url_index == -1)
> +        {
> +          if (vfd >= 0)
> +            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
> +          free(tmp_url);
> +        }
> +      else
> +        {
> +          /* Have unique URL, save it, along with its IMA verification tag. */
> +          n ++;
> +          if (NULL == (*server_url_list = reallocarray(*server_url_list, n, sizeof(char*)))
> +              || NULL == (*url_ima_policies = reallocarray(*url_ima_policies, n, sizeof(ima_policy_t))))
> +            {
> +              free (tmp_url);
> +              return -ENOMEM;
> +            }
> +          (*server_url_list)[n-1] = tmp_url;
> +          if(NULL != url_ima_policies) (*url_ima_policies)[n-1] = verification_mode;
> +        }
> +
> +    continue_next_url:
> +      server_url = strtok_r(NULL, url_delim, &strtok_saveptr);
> +    }
> +  *num_urls = n;
> +  return 0;
> +}
> +
> +/* Some boilerplate for checking curl_easy_setopt.  */
> +#define curl_easy_setopt_ck(H,O,P) do {                        \
> +      CURLcode curl_res = curl_easy_setopt (H,O,P);    \
> +      if (curl_res != CURLE_OK)                                \
> +           {                                           \
> +             if (vfd >= 0)                             \
> +               dprintf (vfd,                           \
> +                         "Bad curl_easy_setopt: %s\n", \
> +                         curl_easy_strerror(curl_res));        \
> +             return -EINVAL;                           \
> +           }                                           \
> +      } while (0)
> +
> +
> +/*
> + * This function initializes a CURL handle. It takes optional callbacks for the write
> + * function and the header function, which if defined will use userdata of type struct handle_data*.
> + * Specifically the data[i] within an array of struct handle_data's.
> + * Returns 0 on success and -Posix error on failure.
> + */
> +int
> +init_handle(debuginfod_client *client,
> +  size_t (*w_callback)(char *buffer, size_t size, size_t nitems, void *userdata),
> +  size_t (*h_callback)(char *buffer, size_t size, size_t nitems, void *userdata),
> +  struct handle_data *data, int i, long timeout,
> +  int vfd)
> +{
> +  data->handle = curl_easy_init();
> +  if (data->handle == NULL)
> +    return -ENETUNREACH;
> +
> +  if (vfd >= 0)
> +    dprintf (vfd, "url %d %s\n", i, data->url);
> +
> +  /* Only allow http:// + https:// + file:// so we aren't being
> +    redirected to some unsupported protocol.
> +    libcurl will fail if we request a single protocol that is not
> +    available. https missing is the most likely issue  */
> +#if CURL_AT_LEAST_VERSION(7, 85, 0)
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS_STR,
> +                      curl_has_https ? "https,http,file" : "http,file");
> +#else
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PROTOCOLS,
> +                      ((curl_has_https ? CURLPROTO_HTTPS : 0) | CURLPROTO_HTTP | CURLPROTO_FILE));
> +#endif
> +  curl_easy_setopt_ck(data->handle, CURLOPT_URL, data->url);
> +  if (vfd >= 0)
> +    curl_easy_setopt_ck(data->handle, CURLOPT_ERRORBUFFER,
> +      data->errbuf);
> +  if (w_callback)
> +    {
> +      curl_easy_setopt_ck(data->handle,
> +                          CURLOPT_WRITEFUNCTION, w_callback);
> +      curl_easy_setopt_ck(data->handle, CURLOPT_WRITEDATA, data);
> +    }
> +  if (timeout > 0)
> +    {
> +      /* Make sure there is at least some progress,
> +         try to get at least 100K per timeout seconds.  */
> +      curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_TIME,
> +                           timeout);
> +      curl_easy_setopt_ck (data->handle, CURLOPT_LOW_SPEED_LIMIT,
> +                           100 * 1024L);
> +    }
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FILETIME, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FOLLOWLOCATION, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_FAILONERROR, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_NOSIGNAL, (long) 1);
> +  if (h_callback)
> +    {
> +      curl_easy_setopt_ck(data->handle,
> +                          CURLOPT_HEADERFUNCTION, h_callback);
> +      curl_easy_setopt_ck(data->handle, CURLOPT_HEADERDATA, data);
> +    }
> +  #if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
> +  curl_easy_setopt_ck(data->handle, CURLOPT_PATH_AS_IS, (long) 1);
> +  #else
> +  /* On old curl; no big deal, canonicalization here is almost the
> +      same, except perhaps for ? # type decorations at the tail. */
> +  #endif
> +  curl_easy_setopt_ck(data->handle, CURLOPT_AUTOREFERER, (long) 1);
> +  curl_easy_setopt_ck(data->handle, CURLOPT_ACCEPT_ENCODING, "");
> +  curl_easy_setopt_ck(data->handle, CURLOPT_HTTPHEADER, client->headers);
> +
> +  return 0;
> +}
> +
> +
> +/*
> + * This function busy-waits on one or more curl queries to complete. This can
> + * be controled via only_one, which, if true, will find the first winner and exit

Should be "controlled".

> + * once found. If positive, maxtime and maxsize dictate the maximum allowed wait times
> + * and download sizes respectively. Returns 0 on success and -Posix error on failure.
> + */
> +int
> +perform_queries(CURLM *curlm, CURL **target_handle, struct handle_data *data, debuginfod_client *c,
> +                int num_urls, long maxtime, long maxsize, bool only_one, int vfd, int *committed_to)
> +{
> +  int still_running = -1;
> +  long loops = 0;
> +  *committed_to = -1;
> +  bool verbose_reported = false;
> +  struct timespec start_time, cur_time;
> +  if (c->winning_headers != NULL)
> +    {
> +      free (c->winning_headers);
> +      c->winning_headers = NULL;
> +    }
> +  if (maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
> +    return errno;

This should return -errno, otherwise the return value will be interpreted
by the caller of debuginfod_find_* as a valid fd.

> +  long delta = 0;
> +  do
> +    {
> +      /* Check to see how long querying is taking. */
> +      if (maxtime > 0)
> +        {
> +          if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur_time) == -1)
> +            return errno;

This should be -errno too.

> +          delta = cur_time.tv_sec - start_time.tv_sec;
> +          if ( delta >  maxtime)
> +            {
> +              dprintf(vfd, "Timeout with max time=%lds and transfer time=%lds\n", maxtime, delta );
> +              return -ETIME;
> +            }
> +        }
> +      /* Wait 1 second, the minimum DEBUGINFOD_TIMEOUT.  */
> +      curl_multi_wait(curlm, NULL, 0, 1000, NULL);
> +      CURLMcode curlm_res = curl_multi_perform(curlm, &still_running);
> +
> +      if (only_one)
> +        {
> +          /* If the target file has been found, abort the other queries.  */
> +          if (target_handle && *target_handle != NULL)
> +            {
> +              for (int i = 0; i < num_urls; i++)
> +                if (data[i].handle != *target_handle)
> +                  curl_multi_remove_handle(curlm, data[i].handle);
> +                else
> +                  {
> +                    *committed_to = i;
> +                    if (c->winning_headers == NULL)
> +                      {
> +                        c->winning_headers = data[*committed_to].response_data;
> +                        if (vfd >= 0 && c->winning_headers != NULL)
> +                          dprintf(vfd, "\n%s", c->winning_headers);
> +                        data[*committed_to].response_data = NULL;
> +                        data[*committed_to].response_data_size = 0;
> +                      }
> +                  }
> +            }
> +
> +          if (vfd >= 0 && !verbose_reported && *committed_to >= 0)
> +            {
> +              bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
> +              dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
> +                       *committed_to);
> +              if (pnl)
> +                c->default_progressfn_printed_p = 0;
> +              verbose_reported = true;
> +            }
> +        }
> +
> +      if (curlm_res != CURLM_OK)
> +        {
> +          switch (curlm_res)
> +            {
> +            case CURLM_CALL_MULTI_PERFORM: continue;
> +            case CURLM_OUT_OF_MEMORY: return -ENOMEM;
> +            default: return -ENETUNREACH;
> +            }
> +        }
> +
> +      long dl_size = -1;
> +      if (only_one && target_handle)
> +        { // Only bother with progress functions if we're retrieving exactly 1 file

Why do we avoid progress functions for metadata downloads?  Without a
progress function the user isn't able to cancel metadata downloads.

> +          if (*target_handle && (c->progressfn || maxsize > 0))
> +            {
> +              /* Get size of file being downloaded. NB: If going through
> +                 deflate-compressing proxies, this number is likely to be
> +                 unavailable, so -1 may show. */
> +              CURLcode curl_res;
> +#if CURL_AT_LEAST_VERSION(7, 55, 0)
> +              curl_off_t cl;
> +              curl_res = curl_easy_getinfo(*target_handle,
> +                                           CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
> +                                           &cl);
> +              if (curl_res == CURLE_OK && cl >= 0)
> +                dl_size = (cl > LONG_MAX ? LONG_MAX : (long)cl);
> +#else
> +              double cl;
> +              curl_res = curl_easy_getinfo(*target_handle,
> +                                           CURLINFO_CONTENT_LENGTH_DOWNLOAD,
> +                                           &cl);
> +              if (curl_res == CURLE_OK && cl >= 0)
> +                dl_size = (cl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)cl);
> +#endif
> +              /* If Content-Length is -1, try to get the size from
> +                 X-Debuginfod-Size */
> +              if (dl_size == -1 && c->winning_headers != NULL)
> +                {
> +                  long xdl;
> +                  char *hdr = strcasestr(c->winning_headers, "x-debuginfod-size");
> +                  size_t off = strlen("x-debuginfod-size:");
> +
> +                  if (hdr != NULL && sscanf(hdr + off, "%ld", &xdl) == 1)
> +                    dl_size = xdl;
> +                }
> +            }
> +
> +          if (c->progressfn) /* inform/check progress callback */
> +            {
> +              loops ++;
> +              long pa = loops; /* default param for progress callback */
> +              if (*target_handle) /* we've committed to a server; report its download progress */
> +                {
> +              /* PR30809: Check actual size of cached file.  This same
> +                 fd is shared by all the multi-curl handles (but only
> +                 one will end up writing to it).  Another way could be
> +                 to tabulate totals in debuginfod_write_callback(). */
> +              struct stat cached;
> +              int statrc = fstat(data[*committed_to].fd, &cached);
> +              if (statrc == 0)
> +                pa = (long) cached.st_size;
> +              else
> +                {
> +                  /* Otherwise, query libcurl for its tabulated total.
> +                     However, that counts http body length, not
> +                     decoded/decompressed content length, so does not
> +                     measure quite the same thing as dl. */
> +                  CURLcode curl_res;
> +#if CURL_AT_LEAST_VERSION(7, 55, 0)
> +                  curl_off_t dl;
> +                  curl_res = curl_easy_getinfo(target_handle,
> +                                               CURLINFO_SIZE_DOWNLOAD_T,
> +                                               &dl);
> +                  if (curl_res == 0 && dl >= 0)
> +                    pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
> +#else
> +                  double dl;
> +                  curl_res = curl_easy_getinfo(target_handle,
> +                                               CURLINFO_SIZE_DOWNLOAD,
> +                                               &dl);
> +                  if (curl_res == 0)
> +                    pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
> +#endif
> +                }
> +
> +              if ((*c->progressfn) (c, pa, dl_size == -1 ? 0 : dl_size))
> +                break;
> +                }
> +            }
> +        }
> +      /* Check to see if we are downloading something which exceeds maxsize, if set.*/
> +      if (target_handle && *target_handle && dl_size > maxsize && maxsize > 0)
> +        {
> +          if (vfd >=0)
> +            dprintf(vfd, "Content-Length too large.\n");
> +          return -EFBIG;
> +        }
> +    } while (still_running);
> +
> +  return 0;
> +}
> +
> +
>  /* Copy SRC to DEST, s,/,#,g */
>
>  static void
> @@ -1258,56 +1643,134 @@ debuginfod_validate_imasig (debuginfod_client *c, int fd)
>
>
>
> -/* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
> -   with the specified build-id and type (debuginfo, executable, source or
> -   section).  If type is source, then type_arg should be a filename.  If
> -   type is section, then type_arg should be the name of an ELF/DWARF
> -   section.  Otherwise type_arg may be NULL.  Return a file descriptor
> -   for the target if successful, otherwise return an error code.
> -*/
> -static int
> -debuginfod_query_server (debuginfod_client *c,
> -                        const unsigned char *build_id,
> -                         int build_id_len,
> -                         const char *type,
> -                         const char *type_arg,
> -                         char **path)
> -{
> -  char *server_urls;
> -  char *urls_envvar;
> -  const char *section = NULL;
> -  const char *filename = NULL;
> -  char *cache_path = NULL;
> -  char *maxage_path = NULL;
> -  char *interval_path = NULL;
> -  char *cache_miss_path = NULL;
> -  char *target_cache_dir = NULL;
> -  char *target_cache_path = NULL;
> -  char *target_cache_tmppath = NULL;
> -  char suffix[PATH_MAX + 1]; /* +1 for zero terminator.  */
> -  char build_id_bytes[MAX_BUILD_ID_BYTES * 2 + 1];
> -  int vfd = c->verbose_fd;
> -  int rc;
>
> -  c->progressfn_cancel = false;
> +/* Helper function to create client cache directory.
> +   $XDG_CACHE_HOME takes priority over $HOME/.cache.
> +   $DEBUGINFOD_CACHE_PATH takes priority over $HOME/.cache and $XDG_CACHE_HOME.
>
> -  if (strcmp (type, "source") == 0)
> -    filename = type_arg;
> -  else if (strcmp (type, "section") == 0)
> +   Return resulting path name or NULL on error.  Caller must free resulting string.
> + */
> +static char *
> +make_cache_path(void)
> +{
> +  char* cache_path = NULL;
> +  int rc = 0;
> +  /* Determine location of the cache. The path specified by the debuginfod
> +     cache environment variable takes priority.  */
> +  char *cache_var = getenv(DEBUGINFOD_CACHE_PATH_ENV_VAR);
> +  if (cache_var != NULL && strlen (cache_var) > 0)
> +    xalloc_str (cache_path, "%s", cache_var);
> +  else
>      {
> -      section = type_arg;
> -      if (section == NULL)
> -       return -EINVAL;
> -    }
> +      /* If a cache already exists in $HOME ('/' if $HOME isn't set), then use
> +         that. Otherwise use the XDG cache directory naming format.  */
> +      xalloc_str (cache_path, "%s/%s", getenv ("HOME") ?: "/", cache_default_name);
>
> -  if (vfd >= 0)
> -    {
> -      dprintf (vfd, "debuginfod_find_%s ", type);
> -      if (build_id_len == 0) /* expect clean hexadecimal */
> -       dprintf (vfd, "%s", (const char *) build_id);
> -      else
> -       for (int i = 0; i < build_id_len; i++)
> -         dprintf (vfd, "%02x", build_id[i]);
> +      struct stat st;
> +      if (stat (cache_path, &st) < 0)
> +        {
> +          char cachedir[PATH_MAX];
> +          char *xdg = getenv ("XDG_CACHE_HOME");
> +
> +          if (xdg != NULL && strlen (xdg) > 0)
> +            snprintf (cachedir, PATH_MAX, "%s", xdg);
> +          else
> +            snprintf (cachedir, PATH_MAX, "%s/.cache", getenv ("HOME") ?: "/");
> +
> +          /* Create XDG cache directory if it doesn't exist.  */
> +          if (stat (cachedir, &st) == 0)
> +            {
> +              if (! S_ISDIR (st.st_mode))
> +                {
> +                  rc = -EEXIST;
> +                  goto out1;
> +                }
> +            }
> +          else
> +            {
> +              rc = mkdir (cachedir, 0700);
> +
> +              /* Also check for EEXIST and S_ISDIR in case another client just
> +                 happened to create the cache.  */
> +              if (rc < 0
> +                  && (errno != EEXIST
> +                      || stat (cachedir, &st) != 0
> +                      || ! S_ISDIR (st.st_mode)))
> +                {
> +                  rc = -errno;
> +                  goto out1;
> +                }
> +            }
> +
> +          free (cache_path);
> +          xalloc_str (cache_path, "%s/%s", cachedir, cache_xdg_name);
> +        }
> +    }
> +
> +  goto out;
> +
> + out1:
> +  (void) rc;
> +  free (cache_path);
> +  cache_path = NULL;
> +
> + out:
> +  if (cache_path != NULL)
> +    (void) mkdir (cache_path, 0700); // failures with this mkdir would be caught later too
> +  return cache_path;
> +}

rc should be returned to the caller.  Otherwise if make_cache_path fails
then debuginfod_find_metadata will return a default rc of 0 to the caller,
who will interpret it as an fd of the query results.

> +
> +
> +/* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
> +   with the specified build-id and type (debuginfo, executable, source or
> +   section).  If type is source, then type_arg should be a filename.  If
> +   type is section, then type_arg should be the name of an ELF/DWARF
> +   section.  Otherwise type_arg may be NULL.  Return a file descriptor
> +   for the target if successful, otherwise return an error code.
> +*/
> +static int
> +debuginfod_query_server_by_buildid (debuginfod_client *c,
> +                        const unsigned char *build_id,
> +                         int build_id_len,
> +                         const char *type,
> +                         const char *type_arg,
> +                         char **path)
> +{
> +  char *server_urls;
> +  char *urls_envvar;
> +  const char *section = NULL;
> +  const char *filename = NULL;
> +  char *cache_path = NULL;
> +  char *maxage_path = NULL;
> +  char *interval_path = NULL;
> +  char *cache_miss_path = NULL;
> +  char *target_cache_dir = NULL;
> +  char *target_cache_path = NULL;
> +  char *target_cache_tmppath = NULL;
> +  char suffix[PATH_MAX + 1]; /* +1 for zero terminator.  */
> +  char build_id_bytes[MAX_BUILD_ID_BYTES * 2 + 1];
> +  int vfd = c->verbose_fd;
> +  int rc, r;
> +
> +  c->progressfn_cancel = false;
> +
> +  if (strcmp (type, "source") == 0)
> +    filename = type_arg;
> +  else if (strcmp (type, "section") == 0)
> +    {
> +      section = type_arg;
> +      if (section == NULL)
> +       return -EINVAL;
> +    }
> +
> +  if (vfd >= 0)
> +    {
> +      dprintf (vfd, "debuginfod_find_%s ", type);
> +      if (build_id_len == 0) /* expect clean hexadecimal */
> +       dprintf (vfd, "%s", (const char *) build_id);
> +      else
> +       for (int i = 0; i < build_id_len; i++)
> +         dprintf (vfd, "%02x", build_id[i]);
>        if (filename != NULL)
>         dprintf (vfd, " %s\n", filename);
>        dprintf (vfd, "\n");
> @@ -1412,70 +1875,22 @@ debuginfod_query_server (debuginfod_client *c,
>      dprintf (vfd, "suffix %s\n", suffix);
>
>    /* set paths needed to perform the query
> -
> -     example format
> +     example format:
>       cache_path:        $HOME/.cache
>       target_cache_dir:  $HOME/.cache/0123abcd
>       target_cache_path: $HOME/.cache/0123abcd/debuginfo
>       target_cache_path: $HOME/.cache/0123abcd/source#PATH#TO#SOURCE ?
> -
> -     $XDG_CACHE_HOME takes priority over $HOME/.cache.
> -     $DEBUGINFOD_CACHE_PATH takes priority over $HOME/.cache and $XDG_CACHE_HOME.
>    */
>
> -  /* Determine location of the cache. The path specified by the debuginfod
> -     cache environment variable takes priority.  */
> -  char *cache_var = getenv(DEBUGINFOD_CACHE_PATH_ENV_VAR);
> -  if (cache_var != NULL && strlen (cache_var) > 0)
> -    xalloc_str (cache_path, "%s", cache_var);
> -  else
> +  cache_path = make_cache_path();
> +  if (!cache_path)
>      {
> -      /* If a cache already exists in $HOME ('/' if $HOME isn't set), then use
> -         that. Otherwise use the XDG cache directory naming format.  */
> -      xalloc_str (cache_path, "%s/%s", getenv ("HOME") ?: "/", cache_default_name);
> -
> -      struct stat st;
> -      if (stat (cache_path, &st) < 0)
> -        {
> -          char cachedir[PATH_MAX];
> -          char *xdg = getenv ("XDG_CACHE_HOME");
> -
> -          if (xdg != NULL && strlen (xdg) > 0)
> -            snprintf (cachedir, PATH_MAX, "%s", xdg);
> -          else
> -            snprintf (cachedir, PATH_MAX, "%s/.cache", getenv ("HOME") ?: "/");
> -
> -          /* Create XDG cache directory if it doesn't exist.  */
> -          if (stat (cachedir, &st) == 0)
> -            {
> -              if (! S_ISDIR (st.st_mode))
> -                {
> -                  rc = -EEXIST;
> -                  goto out;
> -                }
> -            }
> -          else
> -            {
> -              rc = mkdir (cachedir, 0700);
> -
> -              /* Also check for EEXIST and S_ISDIR in case another client just
> -                 happened to create the cache.  */
> -              if (rc < 0
> -                  && (errno != EEXIST
> -                      || stat (cachedir, &st) != 0
> -                      || ! S_ISDIR (st.st_mode)))
> -                {
> -                  rc = -errno;
> -                  goto out;
> -                }
> -            }
> -
> -          free (cache_path);
> -          xalloc_str (cache_path, "%s/%s", cachedir, cache_xdg_name);
> -        }
> +      rc = -ENOMEM;
> +      goto out;
>      }
> -
>    xalloc_str (target_cache_dir, "%s/%s", cache_path, build_id_bytes);
> +  (void) mkdir (target_cache_dir, 0700); // failures with this mkdir would be caught later too
> +
>    if (section != NULL)
>      xalloc_str (target_cache_path, "%s/%s-%s", target_cache_dir, type, suffix);
>    else
> @@ -1594,102 +2009,32 @@ debuginfod_query_server (debuginfod_client *c,
>    /* thereafter, goto out0 on error*/
>
>    /* Because of a race with cache cleanup / rmdir, try to mkdir/mkstemp up to twice. */
> -  for(int i=0; i<2; i++) {
> -    /* (re)create target directory in cache */
> -    (void) mkdir(target_cache_dir, 0700); /* files will be 0400 later */
> -
> -    /* NB: write to a temporary file first, to avoid race condition of
> -       multiple clients checking the cache, while a partially-written or empty
> -       file is in there, being written from libcurl. */
> -    fd = mkstemp (target_cache_tmppath);
> -    if (fd >= 0) break;
> -  }
> +  for(int i=0; i<2; i++)
> +    {
> +      /* (re)create target directory in cache */
> +      (void) mkdir(target_cache_dir, 0700); /* files will be 0400 later */
> +
> +      /* NB: write to a temporary file first, to avoid race condition of
> +         multiple clients checking the cache, while a partially-written or empty
> +         file is in there, being written from libcurl. */
> +      fd = mkstemp (target_cache_tmppath);
> +      if (fd >= 0) break;
> +    }
>    if (fd < 0) /* Still failed after two iterations. */
>      {
>        rc = -errno;
>        goto out0;
>      }
>
> -  /* Initialize the memory to zero */
> -  char *strtok_saveptr;
>    char **server_url_list = NULL;
>    ima_policy_t* url_ima_policies = NULL;
> -  char* server_url;
> -  /* Count number of URLs.  */
> -  int num_urls = 0;
> -
> -  ima_policy_t verification_mode = ignore; // The default mode
> -  for(server_url = strtok_r(server_urls, url_delim, &strtok_saveptr);
> -      server_url != NULL; server_url = strtok_r(NULL, url_delim, &strtok_saveptr))
> +  char *server_url;
> +  int num_urls;
> +  r = init_server_urls("buildid", type, server_urls, &server_url_list, &url_ima_policies, &num_urls, vfd);
> +  if (0 != r)
>      {
> -      // When we encounted a (well-formed) token off the form ima:foo, we update the policy
> -      // under which results from that server will be ima verified
> -      if(startswith(server_url, "ima:"))
> -      {
> -#ifdef ENABLE_IMA_VERIFICATION
> -        ima_policy_t m = ima_policy_str2enum(server_url + strlen("ima:"));
> -        if(m != undefined)
> -          verification_mode = m;
> -        else if (vfd >= 0)
> -          dprintf(vfd, "IMA mode not recognized, skipping %s\n", server_url);
> -#else
> -        if (vfd >= 0)
> -            dprintf(vfd, "IMA signature verification is not enabled, skipping %s\n", server_url);
> -#endif
> -        continue; // Not a url, just a mode change so keep going
> -      }
> -
> -      if (verification_mode==enforcing && 0==strcmp(type,"section"))
> -        {
> -          if (vfd >= 0)
> -            dprintf(vfd, "skipping server %s section query in IMA enforcing mode\n", server_url);
> -          continue;
> -        }
> -
> -      /* PR 27983: If the url is already set to be used use, skip it */
> -      char *slashbuildid;
> -      if (strlen(server_url) > 1 && server_url[strlen(server_url)-1] == '/')
> -        slashbuildid = "buildid";
> -      else
> -        slashbuildid = "/buildid";
> -
> -      char *tmp_url;
> -      if (asprintf(&tmp_url, "%s%s", server_url, slashbuildid) == -1)
> -        {
> -          rc = -ENOMEM;
> -          goto out1;
> -        }
> -      int url_index;
> -      for (url_index = 0; url_index < num_urls; ++url_index)
> -        {
> -          if(strcmp(tmp_url, server_url_list[url_index]) == 0)
> -            {
> -              url_index = -1;
> -              break;
> -            }
> -        }
> -      if (url_index == -1)
> -        {
> -          if (vfd >= 0)
> -            dprintf(vfd, "duplicate url: %s, skipping\n", tmp_url);
> -          free(tmp_url);
> -        }
> -      else
> -        {
> -          num_urls++;
> -          if (NULL == (server_url_list  = reallocarray(server_url_list, num_urls, sizeof(char*)))
> -#ifdef ENABLE_IMA_VERIFICATION
> -          || NULL == (url_ima_policies = reallocarray(url_ima_policies, num_urls, sizeof(ima_policy_t)))
> -#endif
> -            )
> -            {
> -              free (tmp_url);
> -              rc = -ENOMEM;
> -              goto out1;
> -            }
> -          server_url_list[num_urls-1] = tmp_url;
> -          if(NULL != url_ima_policies) url_ima_policies[num_urls-1] = verification_mode;
> -        }
> +      rc = r;
> +      goto out1;
>      }
>
>    /* No URLs survived parsing / filtering?  Abort abort abort. */
> @@ -1773,262 +2118,43 @@ debuginfod_query_server (debuginfod_client *c,
>
>        data[i].fd = fd;
>        data[i].target_handle = &target_handle;
> -      data[i].handle = curl_easy_init();
> -      if (data[i].handle == NULL)
> -        {
> -          if (filename) curl_free (escaped_string);
> -          rc = -ENETUNREACH;
> -          goto out2;
> -        }
>        data[i].client = c;
>
> -      if (filename) /* must start with / */
> -        {
> -          /* PR28034 escape characters in completed url to %hh format. */
> -          snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
> -                   build_id_bytes, type, escaped_string);
> -        }
> -      else if (section)
> -       snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
> -                build_id_bytes, type, section);
> -      else
> -        snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
> -      if (vfd >= 0)
> -       dprintf (vfd, "url %d %s\n", i, data[i].url);
> -
> -      /* Some boilerplate for checking curl_easy_setopt.  */
> -#define curl_easy_setopt_ck(H,O,P) do {                        \
> -      CURLcode curl_res = curl_easy_setopt (H,O,P);    \
> -      if (curl_res != CURLE_OK)                                \
> -       {                                               \
> -         if (vfd >= 0)                                 \
> -           dprintf (vfd,                               \
> -                    "Bad curl_easy_setopt: %s\n",      \
> -                    curl_easy_strerror(curl_res));     \
> -         rc = -EINVAL;                                 \
> -         goto out2;                                    \
> -       }                                               \
> -      } while (0)
> -
> -      /* Only allow http:// + https:// + file:// so we aren't being
> -        redirected to some unsupported protocol.
> -         libcurl will fail if we request a single protocol that is not
> -         available. https missing is the most likely issue  */
> -#if CURL_AT_LEAST_VERSION(7, 85, 0)
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_PROTOCOLS_STR,
> -                         curl_has_https ? "https,http,file" : "http,file");
> -#else
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_PROTOCOLS,
> -                         ((curl_has_https ? CURLPROTO_HTTPS : 0) | CURLPROTO_HTTP | CURLPROTO_FILE));
> -#endif
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_URL, data[i].url);
> -      if (vfd >= 0)
> -       curl_easy_setopt_ck(data[i].handle, CURLOPT_ERRORBUFFER,
> -                           data[i].errbuf);
> -      curl_easy_setopt_ck(data[i].handle,
> -                         CURLOPT_WRITEFUNCTION,
> -                         debuginfod_write_callback);
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_WRITEDATA, (void*)&data[i]);
> -      if (timeout > 0)
> -       {
> -         /* Make sure there is at least some progress,
> -            try to get at least 100K per timeout seconds.  */
> -         curl_easy_setopt_ck (data[i].handle, CURLOPT_LOW_SPEED_TIME,
> -                              timeout);
> -         curl_easy_setopt_ck (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
> -                              100 * 1024L);
> -       }
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_FILETIME, (long) 1);
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_HEADERFUNCTION,
> -                         header_callback);
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_HEADERDATA,
> -                         (void *) &(data[i]));
> -#if LIBCURL_VERSION_NUM >= 0x072a00 /* 7.42.0 */
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_PATH_AS_IS, (long) 1);
> -#else
> -      /* On old curl; no big deal, canonicalization here is almost the
> -         same, except perhaps for ? # type decorations at the tail. */
> -#endif
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_AUTOREFERER, (long) 1);
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_ACCEPT_ENCODING, "");
> -      curl_easy_setopt_ck(data[i].handle, CURLOPT_HTTPHEADER, c->headers);
> -
> -      curl_multi_add_handle(curlm, data[i].handle);
> -    }
> -
> -  if (filename) curl_free(escaped_string);
> -  /* Query servers in parallel.  */
> -  if (vfd >= 0)
> -    dprintf (vfd, "query %d urls in parallel\n", num_urls);
> -  int still_running;
> -  long loops = 0;
> -  int committed_to = -1;
> -  bool verbose_reported = false;
> -  struct timespec start_time, cur_time;
> -
> -  free (c->winning_headers);
> -  c->winning_headers = NULL;
> -  if ( maxtime > 0 && clock_gettime(CLOCK_MONOTONIC_RAW, &start_time) == -1)
> -    {
> -      rc = -errno;
> -      goto out2;
> -    }
> -  long delta = 0;
> -  do
> -    {
> -      /* Check to see how long querying is taking. */
> -      if (maxtime > 0)
> -        {
> -          if (clock_gettime(CLOCK_MONOTONIC_RAW, &cur_time) == -1)
> -            {
> -              rc = -errno;
> -              goto out2;
> -            }
> -          delta = cur_time.tv_sec - start_time.tv_sec;
> -          if ( delta >  maxtime)
> -            {
> -              dprintf(vfd, "Timeout with max time=%lds and transfer time=%lds\n", maxtime, delta );
> -              rc = -ETIME;
> -              goto out2;
> -            }
> -        }
> -      /* Wait 1 second, the minimum DEBUGINFOD_TIMEOUT.  */
> -      curl_multi_wait(curlm, NULL, 0, 1000, NULL);
> -      CURLMcode curlm_res = curl_multi_perform(curlm, &still_running);
> -
> -      /* If the target file has been found, abort the other queries.  */
> -      if (target_handle != NULL)
> -       {
> -         for (int i = 0; i < num_urls; i++)
> -           if (data[i].handle != target_handle)
> -             curl_multi_remove_handle(curlm, data[i].handle);
> -           else
> -              {
> -               committed_to = i;
> -                if (c->winning_headers == NULL)
> -                  {
> -                    c->winning_headers = data[committed_to].response_data;
> -                    data[committed_to].response_data = NULL;
> -                    data[committed_to].response_data_size = 0;
> -                  }
> -
> -              }
> -       }
> -
> -      if (vfd >= 0 && !verbose_reported && committed_to >= 0)
> -       {
> -         bool pnl = (c->default_progressfn_printed_p && vfd == STDERR_FILENO);
> -         dprintf (vfd, "%scommitted to url %d\n", pnl ? "\n" : "",
> -                  committed_to);
> -         if (pnl)
> -           c->default_progressfn_printed_p = 0;
> -         verbose_reported = true;
> -       }
> -
> -      if (curlm_res != CURLM_OK)
> -        {
> -          switch (curlm_res)
> -            {
> -            case CURLM_CALL_MULTI_PERFORM: continue;
> -            case CURLM_OUT_OF_MEMORY: rc = -ENOMEM; break;
> -            default: rc = -ENETUNREACH; break;
> -            }
> -          goto out2;
> -        }
> -
> -      long dl_size = -1;
> -      if (target_handle && (c->progressfn || maxsize > 0))
> -        {
> -          /* Get size of file being downloaded. NB: If going through
> -             deflate-compressing proxies, this number is likely to be
> -             unavailable, so -1 may show. */
> -          CURLcode curl_res;
> -#if CURL_AT_LEAST_VERSION(7, 55, 0)
> -          curl_off_t cl;
> -          curl_res = curl_easy_getinfo(target_handle,
> -                                       CURLINFO_CONTENT_LENGTH_DOWNLOAD_T,
> -                                       &cl);
> -          if (curl_res == CURLE_OK && cl >= 0)
> -            dl_size = (cl > LONG_MAX ? LONG_MAX : (long)cl);
> -#else
> -          double cl;
> -          curl_res = curl_easy_getinfo(target_handle,
> -                                       CURLINFO_CONTENT_LENGTH_DOWNLOAD,
> -                                       &cl);
> -          if (curl_res == CURLE_OK && cl >= 0)
> -            dl_size = (cl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)cl);
> -#endif
> -          /* If Content-Length is -1, try to get the size from
> -             X-Debuginfod-Size */
> -          if (dl_size == -1 && c->winning_headers != NULL)
> -            {
> -              long xdl;
> -              char *hdr = strcasestr(c->winning_headers, "x-debuginfod-size");
> -              size_t off = strlen("x-debuginfod-size:");
> -
> -              if (hdr != NULL && sscanf(hdr + off, "%ld", &xdl) == 1)
> -                dl_size = xdl;
> -            }
> -        }
> -
> -      if (c->progressfn) /* inform/check progress callback */
> -        {
> -          loops ++;
> -          long pa = loops; /* default param for progress callback */
> -          if (target_handle) /* we've committed to a server; report its download progress */
> -            {
> -              /* PR30809: Check actual size of cached file.  This same
> -                 fd is shared by all the multi-curl handles (but only
> -                 one will end up writing to it).  Another way could be
> -                 to tabulate totals in debuginfod_write_callback(). */
> -              struct stat cached;
> -              int statrc = fstat(fd, &cached);
> -              if (statrc == 0)
> -                pa = (long) cached.st_size;
> -              else
> -                {
> -                  /* Otherwise, query libcurl for its tabulated total.
> -                     However, that counts http body length, not
> -                     decoded/decompressed content length, so does not
> -                     measure quite the same thing as dl. */
> -                  CURLcode curl_res;
> -#if CURL_AT_LEAST_VERSION(7, 55, 0)
> -                  curl_off_t dl;
> -                  curl_res = curl_easy_getinfo(target_handle,
> -                                               CURLINFO_SIZE_DOWNLOAD_T,
> -                                               &dl);
> -                  if (curl_res == 0 && dl >= 0)
> -                    pa = (dl > LONG_MAX ? LONG_MAX : (long)dl);
> -#else
> -                  double dl;
> -                  curl_res = curl_easy_getinfo(target_handle,
> -                                               CURLINFO_SIZE_DOWNLOAD,
> -                                               &dl);
> -                  if (curl_res == 0)
> -                    pa = (dl >= (double)(LONG_MAX+1UL) ? LONG_MAX : (long)dl);
> -#endif
> -                }
> -            }
> -
> -          if ((*c->progressfn) (c, pa, dl_size == -1 ? 0 : dl_size))
> -           {
> -             c->progressfn_cancel = true;
> -              break;
> -           }
> +      if (filename) /* must start with / */
> +        {
> +          /* PR28034 escape characters in completed url to %hh format. */
> +          snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
> +                   build_id_bytes, type, escaped_string);
>          }
> +      else if (section)
> +       snprintf(data[i].url, PATH_MAX, "%s/%s/%s/%s", server_url,
> +                build_id_bytes, type, section);
> +      else
> +        snprintf(data[i].url, PATH_MAX, "%s/%s/%s", server_url, build_id_bytes, type);
>
> -      /* Check to see if we are downloading something which exceeds maxsize, if set.*/
> -      if (target_handle && dl_size > maxsize && maxsize > 0)
> +      r = init_handle(c, debuginfod_write_callback, header_callback, &data[i], i, timeout, vfd);
> +      if (0 != r)
>          {
> -          if (vfd >=0)
> -            dprintf(vfd, "Content-Length too large.\n");
> -          rc = -EFBIG;
> +          rc = r;
> +          if (filename) curl_free (escaped_string);
>            goto out2;
>          }
> -    } while (still_running);
> +
> +      curl_multi_add_handle(curlm, data[i].handle);
> +    }
> +
> +  if (filename) curl_free(escaped_string);
> +
> +  /* Query servers in parallel.  */
> +  if (vfd >= 0)
> +    dprintf (vfd, "query %d urls in parallel\n", num_urls);
> +  int committed_to;
> +  r = perform_queries(curlm, &target_handle, data, c, num_urls, maxtime, maxsize, true,  vfd, &committed_to);
> +  if (0 != r)
> +    {
> +      rc = r;
> +      goto out2;
> +    }
>
>    /* Check whether a query was successful. If so, assign its handle
>       to verified_handle.  */
> @@ -2180,6 +2306,7 @@ debuginfod_query_server (debuginfod_client *c,
>                curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
>                curl_easy_cleanup (data[i].handle);
>                free(data[i].response_data);
> +              data[i].response_data = NULL;
>              }
>              free(c->winning_headers);
>              c->winning_headers = NULL;
> @@ -2427,7 +2554,7 @@ debuginfod_find_debuginfo (debuginfod_client *client,
>                            const unsigned char *build_id, int build_id_len,
>                             char **path)
>  {
> -  return debuginfod_query_server(client, build_id, build_id_len,
> +  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
>                                   "debuginfo", NULL, path);
>  }
>
> @@ -2438,7 +2565,7 @@ debuginfod_find_executable(debuginfod_client *client,
>                            const unsigned char *build_id, int build_id_len,
>                             char **path)
>  {
> -  return debuginfod_query_server(client, build_id, build_id_len,
> +  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
>                                   "executable", NULL, path);
>  }
>
> @@ -2447,7 +2574,7 @@ int debuginfod_find_source(debuginfod_client *client,
>                            const unsigned char *build_id, int build_id_len,
>                             const char *filename, char **path)
>  {
> -  return debuginfod_query_server(client, build_id, build_id_len,
> +  return debuginfod_query_server_by_buildid(client, build_id, build_id_len,
>                                   "source", filename, path);
>  }
>
> @@ -2456,8 +2583,8 @@ debuginfod_find_section (debuginfod_client *client,
>                          const unsigned char *build_id, int build_id_len,
>                          const char *section, char **path)
>  {
> -  int rc = debuginfod_query_server(client, build_id, build_id_len,
> -                                  "section", section, path);
> +  int rc = debuginfod_query_server_by_buildid(client, build_id, build_id_len,
> +                                              "section", section, path);
>    if (rc != -EINVAL && rc != -ENOSYS)
>      return rc;
>    /* NB: we fall through in case of ima:enforcing-filtered DEBUGINFOD_URLS servers,
> @@ -2508,6 +2635,383 @@ debuginfod_find_section (debuginfod_client *client,
>    return rc;
>  }
>
> +
> +int debuginfod_find_metadata (debuginfod_client *client,
> +                              const char* key, char* value, char **path)
> +{
> +  (void) client;
> +  (void) key;
> +  (void) value;
> +  (void) path;

These (void) casts can be removed; client, key, value and path are all
used later in the function.

> +
> +  char *server_urls = NULL;
> +  char *urls_envvar = NULL;
> +  char *cache_path = NULL;
> +  char *target_cache_dir = NULL;
> +  char *target_cache_path = NULL;
> +  char *target_cache_tmppath = NULL;
> +  char *target_file_name = NULL;
> +  char *key_and_value = NULL;
> +  int rc = 0, r;
> +  int vfd = client->verbose_fd;
> +  struct handle_data *data = NULL;
> +
> +  json_object *json_metadata = json_object_new_object();
> +  json_bool json_metadata_complete = true;
> +  json_object *json_metadata_arr = json_object_new_array();
> +  if (NULL == json_metadata)
> +    {
> +      rc = -ENOMEM;
> +      goto out;
> +    }
> +  json_object_object_add(json_metadata, "results",
> +                         json_metadata_arr ?: json_object_new_array() /* Empty array */);
> +
> +  if (NULL == value || NULL == key)
> +    {
> +      rc = -EINVAL;
> +      goto out;
> +    }
> +
> +  if (vfd >= 0)
> +    dprintf (vfd, "debuginfod_find_metadata %s %s\n", key, value);
> +
> +  /* Without query-able URL, we can stop here*/
> +  urls_envvar = getenv(DEBUGINFOD_URLS_ENV_VAR);
> +  if (vfd >= 0)
> +    dprintf (vfd, "server urls \"%s\"\n",
> +      urls_envvar != NULL ? urls_envvar : "");
> +  if (urls_envvar == NULL || urls_envvar[0] == '\0')
> +  {
> +    rc = -ENOSYS;
> +    goto out;
> +  }
> +
> +  /* set paths needed to perform the query
> +     example format:
> +     cache_path:        $HOME/.cache
> +     target_cache_dir:  $HOME/.cache/metadata
> +     target_cache_path: $HOME/.cache/metadata/KEYENCODED_VALUEENCODED
> +     target_cache_path: $HOME/.cache/metadata/KEYENCODED_VALUEENCODED.XXXXXX
> +  */
> +
> +  // libcurl > 7.62ish has curl_url_set()/etc. to construct these things more properly.
> +  // curl_easy_escape() is older
> +  {
> +    CURL *c = curl_easy_init();
> +    if (!c)
> +      {
> +        rc = -ENOMEM;
> +        goto out;
> +      }
> +    char *key_escaped = curl_easy_escape(c, key, 0);
> +    char *value_escaped = curl_easy_escape(c, value, 0);
> +
> +    // fallback to unescaped values in unlikely case of error
> +    xalloc_str (key_and_value, "key=%s&value=%s", key_escaped ?: key, value_escaped ?: value);
> +    xalloc_str (target_file_name, "%s_%s", key_escaped ?: key, value_escaped ?: value);
> +    curl_free(value_escaped);
> +    curl_free(key_escaped);
> +    curl_easy_cleanup(c);
> +  }
> +
> +  /* Check if we have a recent result already in the cache. */
> +  cache_path = make_cache_path();
> +  if (! cache_path)
> +    goto out;
> +  xalloc_str (target_cache_dir, "%s/metadata", cache_path);
> +  (void) mkdir (target_cache_dir, 0700);
> +  xalloc_str (target_cache_path, "%s/%s", target_cache_dir, target_file_name);
> +  xalloc_str (target_cache_tmppath, "%s/%s.XXXXXX", target_cache_dir, target_file_name);
> +
> +  int fd = open(target_cache_path, O_RDONLY);
> +  if (fd >= 0)
> +    {
> +      struct stat st;
> +      int metadata_retention = 0;
> +      time_t now = time(NULL);
> +      char *metadata_retention_path = 0;
> +
> +      xalloc_str (metadata_retention_path, "%s/%s", cache_path, metadata_retention_filename);
> +      if (metadata_retention_path)
> +        {
> +          rc = debuginfod_config_cache(client, metadata_retention_path,
> +                                       metadata_retention_default_s, &st);
> +          free (metadata_retention_path);
> +          if (rc < 0)
> +            rc = 0;
> +        }
> +      else
> +        rc = 0;
> +      metadata_retention = rc;
> +
> +      if (fstat(fd, &st) != 0)
> +        {
> +          rc = -errno;
> +          close (fd);
> +          goto out;
> +        }
> +
> +      if (metadata_retention > 0 && (now - st.st_mtime <= metadata_retention))
> +        {
> +          if (client && client->verbose_fd >= 0)
> +            dprintf (client->verbose_fd, "cached metadata %s", target_file_name);
> +
> +          if (path != NULL)
> +            {
> +              *path = target_cache_path; // pass over the pointer
> +              target_cache_path = NULL; // prevent free() in our own cleanup
> +            }
> +
> +          /* Success!!!! */
> +          rc = fd;
> +          goto out;
> +        }
> +
> +      /* We don't have to clear the likely-expired cached object here
> +         by unlinking.  We will shortly make a new request and save
> +         results right on top.  Erasing here could trigger a TOCTOU
> +         race with another thread just finishing a query and passing
> +         its results back.
> +      */
> +      // (void) unlink (target_cache_path);
> +
> +      close (fd);
> +    }
> +
> +  /* No valid cached metadata found: time to make the queries. */
> +
> +  free (client->url);
> +  client->url = NULL;
> +
> +  long maxtime = 0;
> +  const char *maxtime_envvar;
> +  maxtime_envvar = getenv(DEBUGINFOD_MAXTIME_ENV_VAR);
> +  if (maxtime_envvar != NULL)
> +    maxtime = atol (maxtime_envvar);
> +  if (maxtime && vfd >= 0)
> +    dprintf(vfd, "using max time %lds\n", maxtime);
> +
> +  long timeout = default_timeout;
> +  const char* timeout_envvar = getenv(DEBUGINFOD_TIMEOUT_ENV_VAR);
> +  if (timeout_envvar != NULL)
> +    timeout = atoi (timeout_envvar);
> +  if (vfd >= 0)
> +    dprintf (vfd, "using timeout %ld\n", timeout);
> +
> +  add_default_headers(client);
> +
> +  /* Make a copy of the envvar so it can be safely modified.  */
> +  server_urls = strdup(urls_envvar);
> +  if (server_urls == NULL)
> +  {
> +    rc = -ENOMEM;
> +    goto out;
> +  }
> +
> +  /* Thereafter, goto out1 on error*/
> +
> +  char **server_url_list = NULL;
> +  ima_policy_t* url_ima_policies = NULL;
> +  char *server_url;
> +  int num_urls = 0;
> +  r = init_server_urls("metadata", NULL, server_urls, &server_url_list, &url_ima_policies, &num_urls, vfd);
> +  if (0 != r)
> +    {
> +      rc = r;
> +      goto out1;
> +    }
> +
> +  CURLM *curlm = client->server_mhandle;
> +  assert (curlm != NULL);

We can replace this assert with returning an error code, so a NULL
server_mhandle doesn't abort the application linked against the library.
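Something along these lines (a sketch; the helper name and the -ENOSYS
choice are mine, not from the patch):

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative check: fail gracefully instead of assert()ing, so a
   NULL multi handle is reported to the caller rather than aborting.  */
static int
require_multi_handle (const void *curlm)
{
  if (curlm == NULL)
    return -ENOSYS;
  return 0;
}
```

In debuginfod_find_metadata itself that would just be: if (curlm == NULL)
{ rc = -ENOSYS; goto out1; }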

> +
> +  CURL *target_handle = NULL;
> +  data = malloc(sizeof(struct handle_data) * num_urls);
> +  if (data == NULL)
> +    {
> +      rc = -ENOMEM;
> +      goto out1;
> +    }
> +
> +  /* thereafter, goto out2 on error.  */
> +
> +  /* Initialize handle_data  */
> +  for (int i = 0; i < num_urls; i++)
> +    {
> +      if ((server_url = server_url_list[i]) == NULL)
> +        break;
> +      if (vfd >= 0)
> +        dprintf (vfd, "init server %d %s\n", i, server_url);
> +
> +      data[i].errbuf[0] = '\0';
> +      data[i].target_handle = &target_handle;
> +      data[i].client = client;
> +      data[i].metadata = NULL;
> +      data[i].metadata_size = 0;
> +      data[i].response_data = NULL;
> +      data[i].response_data_size = 0;
> +
> +      snprintf(data[i].url, PATH_MAX, "%s?%s", server_url, key_and_value);
> +
> +      r = init_handle(client, metadata_callback, header_callback, &data[i], i, timeout, vfd);
> +      if (0 != r)
> +        {
> +          rc = r;
> +          goto out2;
> +        }
> +      curl_multi_add_handle(curlm, data[i].handle);
> +    }
> +
> +  /* Query servers */
> +  if (vfd >= 0)
> +    dprintf (vfd, "Starting %d queries\n",num_urls);
> +  int committed_to;
> +  r = perform_queries(curlm, NULL, data, client, num_urls, maxtime, 0, false, vfd, &committed_to);
> +  if (0 != r)
> +    {
> +      rc = r;
> +      goto out2;
> +    }
> +
> +  /* NOTE: We don't check the return codes of the curl messages since
> +     a metadata query failing silently is just fine. We want to know what's
> +     available from servers which can be connected with no issues.
> +     If running with additional verbosity, the failure will be noted in stderr */
> +
> +  /* Building the new json array from all the upstream data and
> +     cleanup while at it.
> +   */
> +  for (int i = 0; i < num_urls; i++)
> +    {
> +      curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
> +      curl_easy_cleanup (data[i].handle);
> +      free (data[i].response_data);
> +
> +      if (NULL == data[i].metadata)
> +        {
> +          if (vfd >= 0)
> +            dprintf (vfd, "Query to %s failed with error message:\n\t\"%s\"\n",
> +                     data[i].url, data[i].errbuf);
> +          json_metadata_complete = false;
> +          continue;
> +        }
> +
> +      json_object *upstream_metadata = json_tokener_parse(data[i].metadata);
> +      json_object *upstream_complete;
> +      json_object *upstream_metadata_arr;
> +      if (NULL == upstream_metadata ||
> +          !json_object_object_get_ex(upstream_metadata, "results", &upstream_metadata_arr) ||
> +          !json_object_object_get_ex(upstream_metadata, "complete", &upstream_complete))
> +        continue;
> +      json_metadata_complete &= json_object_get_boolean(upstream_complete);
> +      // Combine the upstream metadata into the json array
> +      for (int j = 0, n = json_object_array_length(upstream_metadata_arr); j < n; j++)
> +        {
> +          json_object *entry = json_object_array_get_idx(upstream_metadata_arr, j);
> +          json_object_get(entry); // increment reference count
> +          json_object_array_add(json_metadata_arr, entry);
> +        }
> +      json_object_put(upstream_metadata);
> +
> +      free (data[i].metadata);
> +    }
> +
> +  /* Because of race with cache cleanup / rmdir, try to mkdir/mkstemp up to twice. */
> +  for (int i=0; i<2; i++)
> +    {
> +      /* (re)create target directory in cache */
> +      (void) mkdir(target_cache_dir, 0700); /* files will be 0400 later */
> +
> +      /* NB: write to a temporary file first, to avoid race condition of
> +         multiple clients checking the cache, while a partially-written or empty
> +         file is in there, being written from libcurl. */
> +      fd = mkstemp (target_cache_tmppath);
> +      if (fd >= 0) break;
> +    }
> +  if (fd < 0) /* Still failed after two iterations. */
> +    {
> +      rc = -errno;
> +      goto out1;
> +    }
> +
> +  /* Plop the complete json_metadata object into the cache. */
> +  json_object_object_add(json_metadata, "complete", json_object_new_boolean(json_metadata_complete));
> +  const char* json_string = json_object_to_json_string_ext(json_metadata, JSON_C_TO_STRING_PRETTY);
> +  if (json_string == NULL)
> +    {
> +      rc = -ENOMEM;
> +      goto out1;
> +    }
> +  ssize_t res = write_retry (fd, json_string, strlen(json_string));
> +  (void) lseek(fd, 0, SEEK_SET); // rewind file so client can read it from the top
> +
> +  /* NB: json_string is freed automatically when the json_metadata object is released */
> +  if (res < 0 || (size_t) res != strlen(json_string))
> +    {
> +      rc = -EIO;
> +      goto out1;
> +    }
> +  /* PR27571: make cache files casually unwriteable; dirs are already 0700 */
> +  (void) fchmod(fd, 0400);
> +
> +  /* rename tmp->real */
> +  rc = rename (target_cache_tmppath, target_cache_path);
> +  if (rc < 0)
> +    {
> +      rc = -errno;
> +      goto out1;
> +      /* Perhaps we need not give up right away; could retry or something ... */
> +    }
> +
> +  /* don't close fd - we're returning it */
> +  /* don't unlink the tmppath; it's already been renamed. */
> +  if (path != NULL)
> +   *path = strdup(target_cache_path);
> +
> +  rc = fd;
> +  goto out1;
> +
> +/* error exits */
> +out2:
> +  /* remove all handles from multi */
> +  for (int i = 0; i < num_urls; i++)
> +  {
> +    if (data[i].handle != NULL)
> +    {
> +      curl_multi_remove_handle(curlm, data[i].handle); /* ok to repeat */
> +      curl_easy_cleanup (data[i].handle);
> +      free (data[i].response_data);
> +      free (data[i].metadata);
> +    }
> +  }
> +
> +out1:
> +  free(data);
> +
> +  for (int i = 0; i < num_urls; ++i)
> +    free(server_url_list[i]);
> +  free(server_url_list);
> +  free(url_ima_policies);
> +
> +out:
> +  free (server_urls);
> +  json_object_put(json_metadata);
> +  /* Reset sent headers */
> +  curl_slist_free_all (client->headers);
> +  client->headers = NULL;
> +  client->user_agent_set_p = 0;
> +
> +  free (target_cache_dir);
> +  free (target_cache_path);
> +  free (target_cache_tmppath);
> +  free (key_and_value);
> +  free (target_file_name);
> +  free (cache_path);
> +
> +  return rc;
> +}
> +
> +
>  /* Add an outgoing HTTP header.  */
>  int debuginfod_add_http_header (debuginfod_client *client, const char* header)
>  {
> diff --git a/debuginfod/debuginfod-find.c b/debuginfod/debuginfod-find.c
> index 080dd8f2c6a3..b0a7c2360dd8 100644
> --- a/debuginfod/debuginfod-find.c
> +++ b/debuginfod/debuginfod-find.c
> @@ -1,6 +1,6 @@
>  /* Command-line frontend for retrieving ELF / DWARF / source files
>     from the debuginfod.
> -   Copyright (C) 2019-2020 Red Hat, Inc.
> +   Copyright (C) 2019-2023 Red Hat, Inc.
>     This file is part of elfutils.
>
>     This file is free software; you can redistribute it and/or modify
> @@ -30,7 +30,7 @@
>  #include <fcntl.h>
>  #include <gelf.h>
>  #include <libdwelf.h>
> -
> +#include <json-c/json.h>
>
>  /* Name and version of program.  */
>  ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
> @@ -49,9 +49,10 @@ static const char args_doc[] = N_("debuginfo BUILDID\n"
>                                    "executable PATH\n"
>                                    "source BUILDID /FILENAME\n"
>                                    "source PATH /FILENAME\n"
> -                                 "section BUILDID SECTION-NAME\n"
> -                                 "section PATH SECTION-NAME\n");
> -
> +                                  "section BUILDID SECTION-NAME\n"
> +                                  "section PATH SECTION-NAME\n"
> +                                  "metadata (glob|file|KEY) (GLOB|FILENAME|VALUE)\n"
> +                                  );
>
>  /* Definitions of arguments for argp functions.  */
>  static const struct argp_option options[] =
> @@ -145,49 +146,60 @@ main(int argc, char** argv)
>    /* If we were passed an ELF file name in the BUILDID slot, look in there. */
>    unsigned char* build_id = (unsigned char*) argv[remaining+1];
>    int build_id_len = 0; /* assume text */
> -
> -  int any_non_hex = 0;
> -  int i;
> -  for (i = 0; build_id[i] != '\0'; i++)
> -    if ((build_id[i] >= '0' && build_id[i] <= '9') ||
> -        (build_id[i] >= 'a' && build_id[i] <= 'f'))
> -      ;
> -    else
> -      any_non_hex = 1;
> -
> -  int fd = -1;
>    Elf* elf = NULL;
> -  if (any_non_hex) /* raw build-id */
> -    {
> -      fd = open ((char*) build_id, O_RDONLY);
> -      if (fd < 0)
> -        fprintf (stderr, "Cannot open %s: %s\n", build_id, strerror(errno));
> -    }
> -  if (fd >= 0)
> -    {
> -      elf = dwelf_elf_begin (fd);
> -      if (elf == NULL)
> -        fprintf (stderr, "Cannot open as ELF file %s: %s\n", build_id,
> -                elf_errmsg (-1));
> -    }
> -  if (elf != NULL)
> +
> +  /* Process optional buildid given via ELF file name, for some query types only. */
> +  if (strcmp(argv[remaining], "debuginfo") == 0
> +      || strcmp(argv[remaining], "executable") == 0
> +      || strcmp(argv[remaining], "source") == 0
> +      || strcmp(argv[remaining], "section") == 0)
>      {
> -      const void *extracted_build_id;
> -      ssize_t s = dwelf_elf_gnu_build_id(elf, &extracted_build_id);
> -      if (s > 0)
> +      int any_non_hex = 0;
> +      int i;
> +      for (i = 0; build_id[i] != '\0'; i++)
> +        if ((build_id[i] >= '0' && build_id[i] <= '9') ||
> +            (build_id[i] >= 'a' && build_id[i] <= 'f'))
> +          ;
> +        else
> +          any_non_hex = 1;
> +
> +      int fd = -1;
> +      if (any_non_hex) /* raw build-id */
>          {
> -          /* Success: replace the build_id pointer/len with the binary blob
> -             that elfutils is keeping for us.  It'll remain valid until elf_end(). */
> -          build_id = (unsigned char*) extracted_build_id;
> -          build_id_len = s;
> +          fd = open ((char*) build_id, O_RDONLY);
> +          if (fd < 0)
> +            fprintf (stderr, "Cannot open %s: %s\n", build_id, strerror(errno));
> +        }
> +      if (fd >= 0)
> +        {
> +          elf = dwelf_elf_begin (fd);
> +          if (elf == NULL)
> +            fprintf (stderr, "Cannot open as ELF file %s: %s\n", build_id,
> +                     elf_errmsg (-1));
> +        }
> +      if (elf != NULL)
> +        {
> +          const void *extracted_build_id;
> +          ssize_t s = dwelf_elf_gnu_build_id(elf, &extracted_build_id);
> +          if (s > 0)
> +            {
> +              /* Success: replace the build_id pointer/len with the binary blob
> +                 that elfutils is keeping for us.  It'll remain valid until elf_end(). */
> +              build_id = (unsigned char*) extracted_build_id;
> +              build_id_len = s;
> +            }
> +          else
> +            fprintf (stderr, "Cannot extract build-id from %s: %s\n", build_id, elf_errmsg(-1));
>          }
> -      else
> -        fprintf (stderr, "Cannot extract build-id from %s: %s\n", build_id, elf_errmsg(-1));
>      }
>
>    char *cache_name;
>    int rc = 0;
>
> +  /* By default the stdout output is the path of the cached file.
> +     Some requests (e.g. a metadata query) may instead choose a different
> +     output; in that case, a stringified JSON object.  */
> +  bool print_cached_file = true;
>    /* Check whether FILETYPE is valid and call the appropriate
>       debuginfod_find_* function. If FILETYPE is "source"
>       then ensure a FILENAME was also supplied as an argument.  */
> @@ -221,6 +233,35 @@ main(int argc, char** argv)
>        rc = debuginfod_find_section(client, build_id, build_id_len,
>                                    argv[remaining+2], &cache_name);
>      }
> +  else if (strcmp(argv[remaining], "metadata") == 0) /* no buildid! */
> +    {
> +      if (remaining+2 == argc)
> +        {
> +          fprintf(stderr, "Require KEY and VALUE for \"metadata\"\n");
> +          return 1;
> +        }
> +
> +      rc = debuginfod_find_metadata (client, argv[remaining+1], argv[remaining+2],
> +                                     &cache_name);
> +      /* We output a pretty-printed JSON object, not the regular debuginfod-find cached file path */
> +      print_cached_file = false;
> +      json_object *metadata = json_object_from_file(cache_name);
> +      if(metadata)
> +        {
> +          printf("%s\n", json_object_to_json_string_ext(metadata,
> +                                                        JSON_C_TO_STRING_PRETTY
> +#ifdef JSON_C_TO_STRING_NOSLASHESCAPE /* json-c 0.15 */
> +                                                        | JSON_C_TO_STRING_NOSLASHESCAPE
> +#endif
> +                                                        ));
> +          json_object_put(metadata);
> +        }
> +      else
> +        {
> +          fprintf(stderr, "%s does not contain a valid JSON format object\n", cache_name);

If DEBUGINFOD_URLS is unset then cache_name stays NULL and this prints
"(null) does not contain a valid JSON format object".  The rc returned by
debuginfod_find_metadata should be checked before cache_name is used.

debuginfod-find executable/debuginfo/source all print "Server query
failed: Function not implemented" when DEBUGINFOD_URLS isn't set;
debuginfod-find metadata should probably do the same when no URLs
are given.

> +          return 1;
> +        }
> +    }
>    else
>      {
>        argp_help (&argp, stderr, ARGP_HELP_USAGE, argv[0]);
> @@ -240,8 +281,6 @@ main(int argc, char** argv)
>    debuginfod_end (client);
>    if (elf)
>      elf_end(elf);
> -  if (fd >= 0)
> -    close (fd);
>
>    if (rc < 0)
>      {
> @@ -251,7 +290,7 @@ main(int argc, char** argv)
>    else
>      close (rc);
>
> -  printf("%s\n", cache_name);
> +  if(print_cached_file) printf("%s\n", cache_name);
>    free (cache_name);
>
>    return 0;
> diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
> index d9259ad26bb8..305edde81021 100644
> --- a/debuginfod/debuginfod.cxx
> +++ b/debuginfod/debuginfod.cxx
> @@ -76,6 +76,7 @@ extern "C" {
>  #include <netdb.h>
>  #include <math.h>
>  #include <float.h>
> +#include <fnmatch.h>
>
>
>  /* If fts.h is included before config.h, its indirect inclusions may not
> @@ -148,6 +149,7 @@ extern "C" {
>  #include "printversion.h"
>  #include "system.h"
>  }
> +#include <json-c/json.h>
>
>
>  inline bool
> @@ -220,7 +222,7 @@ static const char DEBUGINFOD_SQLITE_DDL[] =
>    "        foreign key (buildid) references " BUILDIDS "_buildids(id) on update cascade on delete cascade,\n"
>    "        primary key (buildid, file, mtime)\n"
>    "        ) " WITHOUT_ROWID ";\n"
> -  // Index for faster delete by file identifier
> +  // Index for faster delete by file identifier and metadata searches
>    "create index if not exists " BUILDIDS "_f_de_idx on " BUILDIDS "_f_de (file, mtime);\n"
>    "create table if not exists " BUILDIDS "_f_s (\n"
>    "        buildid integer not null,\n"
> @@ -246,6 +248,8 @@ static const char DEBUGINFOD_SQLITE_DDL[] =
>    "        ) " WITHOUT_ROWID ";\n"
>    // Index for faster delete by archive file identifier
>    "create index if not exists " BUILDIDS "_r_de_idx on " BUILDIDS "_r_de (file, mtime);\n"
> +  // Index for metadata searches
> +  "create index if not exists " BUILDIDS "_r_de_idx2 on " BUILDIDS "_r_de (content);\n"
>    "create table if not exists " BUILDIDS "_r_sref (\n" // outgoing dwarf sourcefile references from rpm
>    "        buildid integer not null,\n"
>    "        artifactsrc integer not null,\n"
> @@ -454,6 +458,9 @@ static const struct argp_option options[] =
>  #define ARGP_KEY_KOJI_SIGCACHE 0x100B
>     { "koji-sigcache", ARGP_KEY_KOJI_SIGCACHE, NULL, 0, "Do a koji specific mapping of rpm paths to get IMA signatures.", 0 },
>  #endif
> +#define ARGP_KEY_METADATA_MAXTIME 0x100C
> +   { "metadata-maxtime", ARGP_KEY_METADATA_MAXTIME, "SECONDS", 0,
> +     "Number of seconds to limit metadata query run time, 0=unlimited.", 0 },
>     { NULL, 0, NULL, 0, NULL, 0 },
>    };
>
> @@ -509,6 +516,7 @@ static long scan_checkpoint = 256;
>  #ifdef ENABLE_IMA_VERIFICATION
>  static bool requires_koji_sigcache_mapping = false;
>  #endif
> +static unsigned metadata_maxtime_s = 5;
>
>  static void set_metric(const string& key, double value);
>  static void inc_metric(const string& key);
> @@ -711,7 +719,10 @@ parse_opt (int key, char *arg,
>      case ARGP_SCAN_CHECKPOINT:
>        scan_checkpoint = atol (arg);
>        if (scan_checkpoint < 0)
> -        argp_failure(state, 1, EINVAL, "scan checkpoint");
> +        argp_failure(state, 1, EINVAL, "scan checkpoint");
> +      break;
> +    case ARGP_KEY_METADATA_MAXTIME:
> +      metadata_maxtime_s = (unsigned) atoi(arg);
>        break;
>  #ifdef ENABLE_IMA_VERIFICATION
>      case ARGP_KEY_KOJI_SIGCACHE:
> @@ -2382,6 +2393,58 @@ handle_buildid_r_match (bool internal_req_p,
>    return r;
>  }
>
> +void
> +add_client_federation_headers(debuginfod_client *client, MHD_Connection* conn){
> +  // Transcribe incoming User-Agent:
> +  string ua = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "User-Agent") ?: "";
> +  string ua_complete = string("User-Agent: ") + ua;
> +  debuginfod_add_http_header (client, ua_complete.c_str());
> +
> +  // Compute larger XFF:, for avoiding info loss during
> +  // federation, and for future cyclicity detection.
> +  string xff = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "X-Forwarded-For") ?: "";
> +  if (xff != "")
> +    xff += string(", "); // comma separated list
> +
> +  unsigned int xff_count = 0;
> +  for (auto&& i : xff){
> +    if (i == ',') xff_count++;
> +  }
> +
> +  // if X-Forwarded-For: exceeds N hops,
> +  // do not delegate a local lookup miss to upstream debuginfods.
> +  if (xff_count >= forwarded_ttl_limit)
> +    throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwared-ttl-limit reached \
> +and will not query the upstream servers");
> +
> +  // Compute the client's numeric IP address only - so can't merge with conninfo()
> +  const union MHD_ConnectionInfo *u = MHD_get_connection_info (conn,
> +                                                                MHD_CONNECTION_INFO_CLIENT_ADDRESS);
> +  struct sockaddr *so = u ? u->client_addr : 0;
> +  char hostname[256] = ""; // RFC1035
> +  if (so && so->sa_family == AF_INET) {
> +    (void) getnameinfo (so, sizeof (struct sockaddr_in), hostname, sizeof (hostname), NULL, 0,
> +                        NI_NUMERICHOST);
> +  } else if (so && so->sa_family == AF_INET6) {
> +    struct sockaddr_in6* addr6 = (struct sockaddr_in6*) so;
> +    if (IN6_IS_ADDR_V4MAPPED(&addr6->sin6_addr)) {
> +      struct sockaddr_in addr4;
> +      memset (&addr4, 0, sizeof(addr4));
> +      addr4.sin_family = AF_INET;
> +      addr4.sin_port = addr6->sin6_port;
> +      memcpy (&addr4.sin_addr.s_addr, addr6->sin6_addr.s6_addr+12, sizeof(addr4.sin_addr.s_addr));
> +      (void) getnameinfo ((struct sockaddr*) &addr4, sizeof (addr4),
> +                          hostname, sizeof (hostname), NULL, 0,
> +                          NI_NUMERICHOST);
> +    } else {
> +      (void) getnameinfo (so, sizeof (struct sockaddr_in6), hostname, sizeof (hostname), NULL, 0,
> +                          NI_NUMERICHOST);
> +    }
> +  }
> +
> +  string xff_complete = string("X-Forwarded-For: ")+xff+string(hostname);
> +  debuginfod_add_http_header (client, xff_complete.c_str());
> +}
>
>  static struct MHD_Response*
>  handle_buildid_match (bool internal_req_p,
> @@ -2615,58 +2678,8 @@ handle_buildid (MHD_Connection* conn,
>    debuginfod_set_progressfn (client, & debuginfod_find_progress);
>
>    if (conn)
> -    {
> -      // Transcribe incoming User-Agent:
> -      string ua = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "User-Agent") ?: "";
> -      string ua_complete = string("User-Agent: ") + ua;
> -      debuginfod_add_http_header (client, ua_complete.c_str());
> -
> -      // Compute larger XFF:, for avoiding info loss during
> -      // federation, and for future cyclicity detection.
> -      string xff = MHD_lookup_connection_value (conn, MHD_HEADER_KIND, "X-Forwarded-For") ?: "";
> -      if (xff != "")
> -        xff += string(", "); // comma separated list
> -
> -      unsigned int xff_count = 0;
> -      for (auto&& i : xff){
> -        if (i == ',') xff_count++;
> -      }
> +    add_client_federation_headers(client, conn);
>
> -      // if X-Forwarded-For: exceeds N hops,
> -      // do not delegate a local lookup miss to upstream debuginfods.
> -      if (xff_count >= forwarded_ttl_limit)
> -        throw reportable_exception(MHD_HTTP_NOT_FOUND, "not found, --forwared-ttl-limit reached \
> -and will not query the upstream servers");
> -
> -      // Compute the client's numeric IP address only - so can't merge with conninfo()
> -      const union MHD_ConnectionInfo *u = MHD_get_connection_info (conn,
> -                                                                   MHD_CONNECTION_INFO_CLIENT_ADDRESS);
> -      struct sockaddr *so = u ? u->client_addr : 0;
> -      char hostname[256] = ""; // RFC1035
> -      if (so && so->sa_family == AF_INET) {
> -        (void) getnameinfo (so, sizeof (struct sockaddr_in), hostname, sizeof (hostname), NULL, 0,
> -                            NI_NUMERICHOST);
> -      } else if (so && so->sa_family == AF_INET6) {
> -        struct sockaddr_in6* addr6 = (struct sockaddr_in6*) so;
> -        if (IN6_IS_ADDR_V4MAPPED(&addr6->sin6_addr)) {
> -          struct sockaddr_in addr4;
> -          memset (&addr4, 0, sizeof(addr4));
> -          addr4.sin_family = AF_INET;
> -          addr4.sin_port = addr6->sin6_port;
> -          memcpy (&addr4.sin_addr.s_addr, addr6->sin6_addr.s6_addr+12, sizeof(addr4.sin_addr.s_addr));
> -          (void) getnameinfo ((struct sockaddr*) &addr4, sizeof (addr4),
> -                              hostname, sizeof (hostname), NULL, 0,
> -                              NI_NUMERICHOST);
> -        } else {
> -          (void) getnameinfo (so, sizeof (struct sockaddr_in6), hostname, sizeof (hostname), NULL, 0,
> -                              NI_NUMERICHOST);
> -        }
> -      }
> -
> -      string xff_complete = string("X-Forwarded-For: ")+xff+string(hostname);
> -      debuginfod_add_http_header (client, xff_complete.c_str());
> -    }
> -
>    if (artifacttype == "debuginfo")
>      fd = debuginfod_find_debuginfo (client,
>                                      (const unsigned char*) buildid.c_str(),
> @@ -2873,6 +2886,225 @@ handle_metrics (off_t* size)
>    return r;
>  }
>
> +
> +static struct MHD_Response*
> +handle_metadata (MHD_Connection* conn,
> +                 string key, string value, off_t* size)
> +{
> +  MHD_Response* r;
> +  sqlite3 *thisdb = dbq;
> +
> +  // Query locally for matching e, d files
> +  string op;
> +  if (key == "glob")
> +    op = "glob";
> +  else if (key == "file")
> +    op = "=";
> +  else
> +    throw reportable_exception("/metadata webapi error, unsupported key");
> +
> +  // Since PR30378, the file names are segmented into two tables.  We
> +  // could do a glob/= search over the _files_v view that combines
> +  // them, but that means that the entire _files_v thing has to be
> +  // materialized & scanned to do the query.  Slow!  Instead, we can
> +  // segment the incoming file/glob pattern into dirname / basename
> +  // parts, and apply them to the corresponding table.  This is done
> +  // by splitting the value at the last "/".  If no "/" is present, we
> +  // follow the same convention as register_file_name().
> +
> +  string dirname, bname; // basename is a "poisoned" identifier on some distros
> +  size_t slash = value.rfind('/');
> +  if (slash == std::string::npos) {
> +    dirname = "";
> +    bname = value;
> +  } else {
> +    dirname = value.substr(0, slash);
> +    bname = value.substr(slash+1);
> +  }
> +
> +  // NB: further optimization is possible: replacing the 'glob' op
> +  // with simple equality, if the corresponding value segment lacks
> +  // metacharacters.  sqlite may or may not be smart enough to do so,
> +  // so we help out.
> +  string metacharacters = "[]*?";
> +  string dop = (op == "glob" && dirname.find_first_of(metacharacters) == string::npos) ? "=" : op;
> +  string bop = (op == "glob" && bname.find_first_of(metacharacters) == string::npos) ? "=" : op;
> +
> +  string sql = string(
> +                      // explicitly query r_de and f_de once here, rather than as separate
> +                      // query_d and query_e statements; they scan the same tables, so
> +                      // separate queries would double the work
> +                      "select d1.executable_p, d1.debuginfo_p, 0 as source_p, "
> +                      "       b1.hex, f1d.name || '/' || f1b.name as file, a1.name as archive "
> +                      "from " BUILDIDS "_r_de d1, " BUILDIDS "_files f1, " BUILDIDS "_fileparts f1b, " BUILDIDS "_fileparts f1d, "
> +                      BUILDIDS "_buildids b1, " BUILDIDS "_files_v a1 "
> +                      "where f1.id = d1.content and a1.id = d1.file and d1.buildid = b1.id "
> +                      "      and f1d.name " + dop + " ? and f1b.name " + bop + " ? and f1.dirname = f1d.id and f1.basename = f1b.id "
> +                      "union all \n"
> +                      "select d2.executable_p, d2.debuginfo_p, 0, "
> +                      "       b2.hex, f2d.name || '/' || f2b.name, NULL "
> +                      "from " BUILDIDS "_f_de d2, " BUILDIDS "_files f2, " BUILDIDS "_fileparts f2b, " BUILDIDS "_fileparts f2d, "
> +                      BUILDIDS "_buildids b2 "
> +                      "where f2.id = d2.file and d2.buildid = b2.id "
> +                      "      and f2d.name " + dop + " ? and f2b.name " + bop + " ? "
> +                      "      and f2.dirname = f2d.id and f2.basename = f2b.id");
> +
> +  // NB: we could query source file names too, thusly:
> +  //
> +  //    select * from " BUILDIDS "_buildids b, " BUILDIDS "_files_v f1, " BUILDIDS "_r_sref sr
> +  //    where b.id = sr.buildid and f1.id = sr.artifactsrc and f1.name " + op + "?"
> +  //    UNION ALL something with BUILDIDS "_f_s"
> +  //
> +  // But the first part of this query cannot run fast without the same index temp-created
> +  // during "maxigroom":
> +  //    create index " BUILDIDS "_r_sref_arc on " BUILDIDS "_r_sref(artifactsrc);
> +  // and unfortunately this index is HUGE.  It's similar to the size of the _r_sref
> +  // table, which is already the largest part of a debuginfod index.  Adding that index
> +  // would nearly double the .sqlite db size.
> +
> +  sqlite_ps *pp = new sqlite_ps (thisdb, "mhd-query-meta-glob", sql);
> +  pp->reset();
> +  pp->bind(1, dirname);
> +  pp->bind(2, bname);
> +  pp->bind(3, dirname);
> +  pp->bind(4, bname);
> +  unique_ptr<sqlite_ps> ps_closer(pp); // release pp if exception or return
> +
> +  json_object *metadata = json_object_new_object();
> +  if (!metadata) throw libc_exception(ENOMEM, "json allocation");
> +  defer_dtor<json_object*,int> metadata_d(metadata, json_object_put);
> +  json_object *metadata_arr = json_object_new_array();
> +  if (!metadata_arr) throw libc_exception(ENOMEM, "json allocation");
> +  json_object_object_add(metadata, "results", metadata_arr);
> +  // consume all the rows
> +  struct timespec ts_start;
> +  clock_gettime (CLOCK_MONOTONIC, &ts_start);
> +
> +  int rc;
> +  bool metadata_complete = true;
> +  while (SQLITE_DONE != (rc = pp->step()))
> +    {
> +      // break out of loop if we have searched too long
> +      struct timespec ts_end;
> +      clock_gettime (CLOCK_MONOTONIC, &ts_end);
> +      double deltas = (ts_end.tv_sec - ts_start.tv_sec) + (ts_end.tv_nsec - ts_start.tv_nsec)/1.e9;
> +      if (metadata_maxtime_s > 0 && deltas > metadata_maxtime_s)
> +        {
> +          metadata_complete = false;
> +          break;
> +        }
> +
> +      if (rc != SQLITE_ROW) throw sqlite_exception(rc, "step");
> +
> +      int m_executable_p = sqlite3_column_int (*pp, 0);
> +      int m_debuginfo_p  = sqlite3_column_int (*pp, 1);
> +      int m_source_p     = sqlite3_column_int (*pp, 2);
> +      string m_buildid   = (const char*) sqlite3_column_text (*pp, 3) ?: ""; // should always be non-null
> +      string m_file      = (const char*) sqlite3_column_text (*pp, 4) ?: "";
> +      string m_archive   = (const char*) sqlite3_column_text (*pp, 5) ?: "";
> +
> +      // Confirm that m_file matches in the fnmatch(FNM_PATHNAME)
> +      // sense, since sqlite's GLOB operator is a looser filter.
> +      if (key == "glob" && fnmatch(value.c_str(), m_file.c_str(), FNM_PATHNAME) != 0)
> +        continue;
> +
> +      auto add_metadata = [metadata_arr, m_buildid, m_file, m_archive](const string& type) {
> +        json_object* entry = json_object_new_object();
> +        if (NULL == entry) throw libc_exception (ENOMEM, "cannot allocate json");
> +        defer_dtor<json_object*,int> entry_d(entry, json_object_put);
> +
> +        auto add_entry_metadata = [entry](const char* k, string v) {
> +          json_object* s;
> +          if(v != "") {
> +            s = json_object_new_string(v.c_str());
> +            if (NULL == s) throw libc_exception (ENOMEM, "cannot allocate json");
> +            json_object_object_add(entry, k, s);
> +          }
> +        };
> +
> +        add_entry_metadata("type", type.c_str());
> +        add_entry_metadata("buildid", m_buildid);
> +        add_entry_metadata("file", m_file);
> +        if (m_archive != "") add_entry_metadata("archive", m_archive);
> +        if (verbose > 3)
> +          obatched(clog) << "metadata found local "
> +                         << json_object_to_json_string_ext(entry,
> +                                                           JSON_C_TO_STRING_PRETTY)
> +                         << endl;
> +
> +        // Increase ref count to switch its ownership
> +        json_object_array_add(metadata_arr, json_object_get(entry));
> +      };
> +
> +      if (m_executable_p) add_metadata("executable");
> +      if (m_debuginfo_p) add_metadata("debuginfo");
> +      if (m_source_p) add_metadata("source");
> +    }
> +  pp->reset();
> +
> +  unsigned num_local_results = json_object_array_length(metadata_arr);
> +
> +  // Query upstream as well
> +  debuginfod_client *client = debuginfod_pool_begin();
> +  if (client != NULL)
> +  {
> +    add_client_federation_headers(client, conn);
> +
> +    int upstream_metadata_fd;
> +    char *upstream_metadata_file = NULL;
> +    upstream_metadata_fd = debuginfod_find_metadata(client, key.c_str(), (char*)value.c_str(),
> +                                                    &upstream_metadata_file);
> +    if (upstream_metadata_fd >= 0) {
> +       /* json-c >= 0.13 has json_object_from_fd(). */
> +      json_object *upstream_metadata_json = json_object_from_file(upstream_metadata_file);
> +      free (upstream_metadata_file);
> +      json_object *upstream_metadata_json_arr;
> +      json_object *upstream_complete;
> +      if (NULL != upstream_metadata_json &&
> +          json_object_object_get_ex(upstream_metadata_json, "results", &upstream_metadata_json_arr) &&
> +          json_object_object_get_ex(upstream_metadata_json, "complete", &upstream_complete))
> +        {
> +          metadata_complete &= json_object_get_boolean(upstream_complete);
> +          for (int i = 0, n = json_object_array_length(upstream_metadata_json_arr); i < n; i++)
> +            {
> +              json_object *entry = json_object_array_get_idx(upstream_metadata_json_arr, i);
> +              if (verbose > 3)
> +                obatched(clog) << "metadata found remote "
> +                               << json_object_to_json_string_ext(entry,
> +                                                                 JSON_C_TO_STRING_PRETTY)
> +                               << endl;
> +
> +              json_object_get(entry); // increment reference count
> +              json_object_array_add(metadata_arr, entry);
> +            }
> +          json_object_put(upstream_metadata_json);
> +        }
> +      close(upstream_metadata_fd);
> +    }
> +    debuginfod_pool_end (client);
> +  }
> +
> +  unsigned num_total_results = json_object_array_length(metadata_arr);
> +
> +  if (verbose > 2)
> +    obatched(clog) << "metadata found local=" << num_local_results
> +                   << " remote=" << (num_total_results-num_local_results)
> +                   << " total=" << num_total_results
> +                   << endl;
> +
> +  json_object_object_add(metadata, "complete", json_object_new_boolean(metadata_complete));
> +  const char* metadata_str = json_object_to_json_string(metadata);
> +  if (!metadata_str)
> +    throw libc_exception (ENOMEM, "cannot allocate json");
> +  r = MHD_create_response_from_buffer (strlen(metadata_str),
> +                                       (void*) metadata_str,
> +                                       MHD_RESPMEM_MUST_COPY);
> +  *size = strlen(metadata_str);
> +  if (r)
> +    add_mhd_response_header(r, "Content-Type", "application/json");
> +  return r;
> +}
> +
> +
>  static struct MHD_Response*
>  handle_root (off_t* size)
>  {
> @@ -2939,6 +3171,7 @@ handler_cb (void * /*cls*/,
>    clock_gettime (CLOCK_MONOTONIC, &ts_start);
>    double afteryou = 0.0;
>    string artifacttype, suffix;
> +  string urlargs; // for logging
>
>    try
>      {
> @@ -3007,6 +3240,19 @@ handler_cb (void * /*cls*/,
>            inc_metric("http_requests_total", "type", artifacttype);
>            r = handle_metrics(& http_size);
>          }
> +      else if (url1 == "/metadata")
> +        {
> +          tmp_inc_metric m ("thread_busy", "role", "http-metadata");
> +          const char* key = MHD_lookup_connection_value(connection, MHD_GET_ARGUMENT_KIND, "key");
> +          const char* value = MHD_lookup_connection_value(connection, MHD_GET_ARGUMENT_KIND, "value");
> +          if (NULL == value || NULL == key)
> +            throw reportable_exception("/metadata webapi error, need key and value");
> +
> +          urlargs = string("?key=") + string(key) + string("&value=") + string(value); // apprx., for logging
> +          artifacttype = "metadata";
> +          inc_metric("http_requests_total", "type", artifacttype);
> +          r = handle_metadata(connection, key, value, &http_size);
> +        }
>        else if (url1 == "/")
>          {
>            artifacttype = "/";
> @@ -3043,7 +3289,7 @@ handler_cb (void * /*cls*/,
>    // afteryou: delay waiting for other client's identical query to complete
>    // deltas: total latency, including afteryou waiting
>    obatched(clog) << conninfo(connection)
> -                 << ' ' << method << ' ' << url
> +                 << ' ' << method << ' ' << url << urlargs
>                   << ' ' << http_code << ' ' << http_size
>                   << ' ' << (int)(afteryou*1000) << '+' << (int)((deltas-afteryou)*1000) << "ms"
>                   << endl;
> @@ -3396,6 +3642,7 @@ register_file_name(sqlite_ps& ps_upsert_fileparts,
>        dirname = name.substr(0, slash);
>        filename = name.substr(slash+1);
>      }
> +  // NB: see also handle_metadata()
>
>    // intern the two substrings
>    ps_upsert_fileparts
> @@ -4379,12 +4626,13 @@ void groom()
>    if (interrupted) return;
>
>    // NB: "vacuum" is too heavy for even daily runs: it rewrites the entire db, so is done as maxigroom -G
> -  sqlite_ps g1 (db, "incremental vacuum", "pragma incremental_vacuum");
> -  g1.reset().step_ok_done();
> -  sqlite_ps g2 (db, "optimize", "pragma optimize");
> -  g2.reset().step_ok_done();
> -  sqlite_ps g3 (db, "wal checkpoint", "pragma wal_checkpoint=truncate");
> -  g3.reset().step_ok_done();
> +  { sqlite_ps g (db, "incremental vacuum", "pragma incremental_vacuum"); g.reset().step_ok_done(); }
> +  // https://www.sqlite.org/lang_analyze.html#approx
> +  { sqlite_ps g (db, "analyze setup", "pragma analysis_limit = 1000;\n"); g.reset().step_ok_done(); }
> +  { sqlite_ps g (db, "analyze", "analyze"); g.reset().step_ok_done(); }
> +  { sqlite_ps g (db, "analyze reload", "analyze sqlite_schema"); g.reset().step_ok_done(); }
> +  { sqlite_ps g (db, "optimize", "pragma optimize"); g.reset().step_ok_done(); }
> +  { sqlite_ps g (db, "wal checkpoint", "pragma wal_checkpoint=truncate"); g.reset().step_ok_done(); }
>
>    database_stats_report();
>
> @@ -4769,6 +5017,8 @@ main (int argc, char *argv[])
>    if (maxigroom)
>      {
>        obatched(clog) << "maxigrooming database, please wait." << endl;
> +      // NB: this index alone can nearly double the database size!
> +      // NB: this index would be necessary to run source-file metadata searches fast
>        extra_ddl.push_back("create index if not exists " BUILDIDS "_r_sref_arc on " BUILDIDS "_r_sref(artifactsrc);");
>        extra_ddl.push_back("delete from " BUILDIDS "_r_sdef where not exists (select 1 from " BUILDIDS "_r_sref b where " BUILDIDS "_r_sdef.content = b.artifactsrc);");
>        extra_ddl.push_back("drop index if exists " BUILDIDS "_r_sref_arc;");
> diff --git a/debuginfod/debuginfod.h.in b/debuginfod/debuginfod.h.in
> index 73f633f0b8e9..3936b17b97cf 100644
> --- a/debuginfod/debuginfod.h.in
> +++ b/debuginfod/debuginfod.h.in
> @@ -63,9 +63,9 @@ debuginfod_client *debuginfod_begin (void);
>     it is a binary blob of given length.
>
>     If successful, return a file descriptor to the target, otherwise
> -   return a posix error code.  If successful, set *path to a
> -   strdup'd copy of the name of the same file in the cache.
> -   Caller must free() it later. */
> +   return a negative POSIX error code.  If successful, set *path to a
> +   strdup'd copy of the name of the same file in the cache.  Caller
> +   must free() it later. */
>
>  int debuginfod_find_debuginfo (debuginfod_client *client,
>                                const unsigned char *build_id,
> @@ -89,6 +89,27 @@ int debuginfod_find_section (debuginfod_client *client,
>                              const char *section,
>                              char **path);
>
> +/* Query the urls contained in $DEBUGINFOD_URLS for metadata
> +   with given query key/value.
> +
> +   If successful, return a file descriptor to the JSON document
> +   describing matches, otherwise return a negative POSIX error code.  If
> +   successful, set *path to a strdup'd copy of the name of the same
> +   file in the cache.  Caller must free() it later.
> +
> +   key can be one of 'file' or 'glob', corresponding to querying for
> +   value by exact name or by pattern matching, respectively.
> +
> +   The JSON document will be of the form {results: [{...}, ...], complete: <bool>},
> +   where the results are JSON objects containing metadata and complete is true iff
> +   all of the federation of servers responded with complete results (as opposed to 1+
> +   failing to return or having an issue)
> +   */

"Having an issue" is a bit imprecise.  I suggest replacing "(as opposed
to..." with "If complete is false, at least one server in the federation
may have failed to respond or responded with partial metadata results due
to a timeout".

> +int debuginfod_find_metadata (debuginfod_client *client,
> +                              const char *key,
> +                              char* value,
> +                              char **path);
> +
>  typedef int (*debuginfod_progressfn_t)(debuginfod_client *c, long a, long b);
>  void debuginfod_set_progressfn(debuginfod_client *c,
>                                debuginfod_progressfn_t fn);
> diff --git a/debuginfod/libdebuginfod.map b/debuginfod/libdebuginfod.map
> index 6334373f01b0..9cee91cd79aa 100644
> --- a/debuginfod/libdebuginfod.map
> +++ b/debuginfod/libdebuginfod.map
> @@ -22,3 +22,6 @@ ELFUTILS_0.188 {
>    debuginfod_get_headers;
>    debuginfod_find_section;
>  } ELFUTILS_0.183;
> +ELFUTILS_0.192 {
> +  debuginfod_find_metadata;
> +} ELFUTILS_0.188;
> diff --git a/doc/debuginfod-client-config.7 b/doc/debuginfod-client-config.7
> index f16612084e9b..bb33fb0b8b6e 100644
> --- a/doc/debuginfod-client-config.7
> +++ b/doc/debuginfod-client-config.7
> @@ -167,3 +167,11 @@ are short-circuited (returning an immediate failure instead of sending
>  a new query to servers).  This accelerates queries that probably would
>  still fail.  The default is 600, 10 minutes.  0 means "forget
>  immediately".
> +
> +.TP
> +.B metadata_retention_s
> +This control file sets how long to remember the results of a metadata
> +query.  New queries for the same artifacts within this time window are
> +short-circuited (repeating the same results).  This accelerates
> +queries that would probably have the same results.  The
> +default is 3600, 1 hour.  0 means "do not retain".
> diff --git a/doc/debuginfod-find.1 b/doc/debuginfod-find.1
> index d7db1bfdd838..8c63b2c5a5e0 100644
> --- a/doc/debuginfod-find.1
> +++ b/doc/debuginfod-find.1
> @@ -29,6 +29,8 @@ debuginfod-find \- request debuginfo-related data
>  .B debuginfod-find [\fIOPTION\fP]... source \fIBUILDID\fP \fI/FILENAME\fP
>  .br
>  .B debuginfod-find [\fIOPTION\fP]... source \fIPATH\fP \fI/FILENAME\fP
> +.br
> +.B debuginfod-find [\fIOPTION\fP]... metadata \fIKEY\fP \fIVALUE\fP
>
>  .SH DESCRIPTION
>  \fBdebuginfod-find\fP queries one or more \fBdebuginfod\fP servers for
> @@ -119,6 +121,63 @@ l l.
>  \../bar/foo.c AT_comp_dir=/zoo/        source BUILDID /zoo//../bar/foo.c
>  .TE
>
> +.SS metadata \fIKEY\fP \fIVALUE\fP
> +
> +All designated debuginfod servers are queried for metadata about files
> +in their index.  Different search keys may be supported by different
> +servers.
> +
> +.TS
> +l l l .
> +KEY    VALUE   DESCRIPTION
> +
> +\fBfile\fP     path    exact match \fIpath\fP, including in archives
> +\fBglob\fP     pattern glob match \fIpattern\fP, including in archives
> +.TE
> +
> +The resulting output will look something like the following:
> +{
> +  "results":[
> +    {
> +      "type":"executable",
> +      "buildid":"f0aa15b8aba4f3c28cac3c2a73801fefa644a9f2",
> +      "file":"/usr/local/bin/hello",
> +      "archive":"/opt/elfutils/tests/test-2290642/R/rhel7/hello2-1.0-2.x86_64.rpm"
> +    },
> +    {
> +      "type":"executable",
> +      "buildid":"bc1febfd03ca05e030f0d205f7659db29f8a4b30",
> +      "file":"hello2"
> +    }
> +  ],
> +  "complete":true
> +}
> +
> +The results of the search are output to \fBstdout\fP as a JSON object
> +containing an array of objects, supplying metadata about each match, as
> +well as a boolean value corresponding to the completeness of the result.
> +The result is considered complete if all of the queries to upstream servers
> +returned complete results and the local query succeeded. This metadata report
> +may be cached.  It may be incomplete and may contain duplicates.
> +Additional JSON object fields may be present.
> +
> +.TS
> +l l l .
> +NAME   TYPE    DESCRIPTION
> +
> +\fBbuildid\fP  string  hexadecimal buildid associated with the file
> +\fBtype\fP     string  one of \fBdebuginfo\fP or \fBexecutable\fP
> +\fBfile\fP     string  matched file name, outside or inside the archive
> +\fBarchive\fP  string  archive containing matched file name, if any
> +.TE
> +
> +Note that \fBtype\fP cannot be \fBsource\fP: performing such a search
> +quickly enough would require additional database indexing, which would
> +nearly double the database's size.
> +
> +The search always combines both files and archives in the results;
> +at this time further granularity is not available.
> +
>  .SH "OPTIONS"
>
>  .TP
> diff --git a/doc/debuginfod.8 b/doc/debuginfod.8
> index 577f58b6ee2e..f35ce6c1a9ca 100644
> --- a/doc/debuginfod.8
> +++ b/doc/debuginfod.8
> @@ -132,6 +132,14 @@ scanner/groomer server and multiple passive ones, thereby sharing
>  service load.  Archive pattern options must still be given, so
>  debuginfod can recognize file name extensions for unpacking.
>
> +.TP
> +.B "\-\-metadata\-maxtime=SECONDS"
> +Impose a limit on the runtime of metadata webapi queries.  These
> +queries, especially broad "glob" wildcards, can take a large amount of
> +time and produce large results.  Public-facing servers may need to
> +throttle them.  The default limit is 5 seconds.  Set 0 to disable this
> +limit.
> +
>  .TP
>  .B "\-D SQL" "\-\-ddl=SQL"
>  Execute given sqlite statement after the database is opened and
> @@ -421,6 +429,16 @@ variety of statistics about the operation of the debuginfod server.
>  The exact set of metrics and their meanings may change in future
>  versions.
>
> +.SS /metadata?key=\fIKEY\fP&value=\fIVALUE\fP
> +
> +This endpoint triggers a search of the files in the index plus any
> +upstream federated servers, based on given key and value.  If
> +successful, the result is an application/json textual array, listing
> +metadata for the matched files.  See \fIdebuginfod-find(1)\fP for
> +documentation of the common key/value search parameters, and the
> +resulting data schema.
> +
> +
>  .SH DATA MANAGEMENT
>
>  debuginfod stores its index in an sqlite database in a densely packed

We should add a man page for debuginfod_find_metadata as well.

> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 4547d95de76c..3cc9ded43b6a 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -266,12 +266,13 @@ TESTS += run-debuginfod-dlopen.sh \
>          run-debuginfod-federation-sqlite.sh \
>          run-debuginfod-federation-link.sh \
>           run-debuginfod-percent-escape.sh \
> -         run-debuginfod-x-forwarded-for.sh \
> -         run-debuginfod-response-headers.sh \
> -         run-debuginfod-extraction-passive.sh \
> +        run-debuginfod-x-forwarded-for.sh \
> +        run-debuginfod-response-headers.sh \
> +        run-debuginfod-extraction-passive.sh \
>          run-debuginfod-webapi-concurrency.sh \
>          run-debuginfod-section.sh \
> -        run-debuginfod-IXr.sh
> +        run-debuginfod-IXr.sh \
> +        run-debuginfod-find-metadata.sh
>  endif
>  if !OLD_LIBMICROHTTPD
>  # Will crash on too old libmicrohttpd
> @@ -603,7 +604,8 @@ EXTRA_DIST = run-arextract.sh run-arsymtest.sh run-ar.sh \
>               run-debuginfod-webapi-concurrency.sh \
>              run-debuginfod-section.sh \
>              run-debuginfod-IXr.sh \
> -                run-debuginfod-ima-verification.sh \
> +            run-debuginfod-ima-verification.sh \
> +            run-debuginfod-find-metadata.sh \
>              debuginfod-rpms/fedora30/hello2-1.0-2.src.rpm \
>              debuginfod-rpms/fedora30/hello2-1.0-2.x86_64.rpm \
>              debuginfod-rpms/fedora30/hello2-debuginfo-1.0-2.x86_64.rpm \
> diff --git a/tests/debuginfod-subr.sh b/tests/debuginfod-subr.sh
> index c3b0603ddb2e..000e27708192 100755
> --- a/tests/debuginfod-subr.sh
> +++ b/tests/debuginfod-subr.sh
> @@ -26,6 +26,7 @@ type curl 2>/dev/null || (echo "need curl"; exit 77)
>  type rpm2cpio 2>/dev/null || (echo "need rpm2cpio"; exit 77)
>  type cpio 2>/dev/null || (echo "need cpio"; exit 77)
>  type bzcat 2>/dev/null || (echo "need bzcat"; exit 77)
> +type ss 2>/dev/null || (echo "need ss"; exit 77)
>  bsdtar --version | grep -q zstd && zstd=true || zstd=false
>  echo "zstd=$zstd bsdtar=`bsdtar --version`"
>
> diff --git a/tests/run-debuginfod-find-metadata.sh b/tests/run-debuginfod-find-metadata.sh
> new file mode 100755
> index 000000000000..f19c5a6e6942
> --- /dev/null
> +++ b/tests/run-debuginfod-find-metadata.sh
> @@ -0,0 +1,113 @@
> +#!/usr/bin/env bash
> +#
> +# Copyright (C) 2022 Red Hat, Inc.

This should be 2024.

> +# This file is part of elfutils.
> +#
> +# This file is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# elfutils is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. $srcdir/debuginfod-subr.sh
> +
> +# for test case debugging, uncomment:
> +set -x
> +unset VALGRIND_CMD
> +# VALGRIND_CMD="valgrind --enable-debuginfod=no"
> +
> +type curl 2>/dev/null || { echo "need curl"; exit 77; }
> +type jq 2>/dev/null || { echo "need jq"; exit 77; }
> +
> +pkg-config json-c libcurl || { echo "one or more libraries are missing (libjson-c, libcurl)"; exit 77; }
> +
> +DB=${PWD}/.debuginfod_tmp.sqlite
> +export DEBUGINFOD_CACHE_PATH=${PWD}/.client_cache
> +tempfiles $DB ${DB}_2
> +
> +# This variable is essential and ensures no time-race for claiming ports occurs
> +# set base to a unique multiple of 100 not used in any other 'run-debuginfod-*' test
> +base=13100
> +get_ports
> +mkdir R D
> +cp -rvp ${abs_srcdir}/debuginfod-rpms/rhel7 R
> +cp -rvp ${abs_srcdir}/debuginfod-debs/*deb D
> +
> +env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS= ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -R \
> +    -d $DB -p $PORT1 -t0 -g0 R > vlog$PORT1 2>&1 &
> +PID1=$!
> +tempfiles vlog$PORT1
> +errfiles vlog$PORT1
> +
> +wait_ready $PORT1 'ready' 1
> +wait_ready $PORT1 'thread_work_total{role="traverse"}' 1
> +wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
> +wait_ready $PORT1 'thread_busy{role="scan"}' 0
> +
> +env LD_LIBRARY_PATH=$ldpath DEBUGINFOD_URLS="http://127.0.0.1:$PORT1 https://bad/url.web" ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debuginfod $VERBOSE -U \
> +    -d ${DB}_2 -p $PORT2 -t0 -g0 D > vlog$PORT2 2>&1 &
> +PID2=$!
> +tempfiles vlog$PORT2
> +errfiles vlog$PORT2
> +
> +wait_ready $PORT2 'ready' 1
> +wait_ready $PORT2 'thread_work_total{role="traverse"}' 1
> +wait_ready $PORT2 'thread_work_pending{role="scan"}' 0
> +wait_ready $PORT2 'thread_busy{role="scan"}' 0
> +
> +# have clients contact the new server
> +export DEBUGINFOD_URLS=http://127.0.0.1:$PORT2
> +
> +tempfiles json.txt
> +# Check that we find correct number of files, both via local and federated links
> +RESULTJ=`env LD_LIBRARY_PATH=$ldpath ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debuginfod-find metadata glob "/u?r/bin/*"`
> +echo $RESULTJ
> +N_FOUND=`echo $RESULTJ | jq '.results | length'`
> +test $N_FOUND -eq 1
> +RESULTJ=`env LD_LIBRARY_PATH=$ldpath ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debuginfod-find metadata glob "/usr/lo?al/bin/*"`
> +echo $RESULTJ
> +N_FOUND=`echo $RESULTJ | jq '.results | length'`
> +test $N_FOUND -eq 2
> +
> +
> +# Query via the webapi as well
> +curl http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*'
> +test `curl -s http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*' | jq '.results[0].buildid == "f17a29b5a25bd4960531d82aa6b07c8abe84fa66"'` = 'true'
> +test `curl -s http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*' | jq '.results[0].file == "/usr/bin/hithere"'` = 'true'
> +test `curl -s http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*' | jq '.results[0].archive | test(".*hithere.*deb")'` = 'true'
> +# Note we query the upstream server too, since the downstream will have an incomplete result due to the badurl
> +test `curl -s http://127.0.0.1:$PORT1'/metadata?key=glob&value=/usr/bin/*hi*' | jq '.complete == true'` = 'true'
> +test `curl -s http://127.0.0.1:$PORT2'/metadata?key=glob&value=/usr/bin/*hi*' | jq '.complete == false'` = 'true'
> +
> +# An empty array is returned on server error or if the file DNE
> +RESULTJ=`env LD_LIBRARY_PATH=$ldpath ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debuginfod-find metadata file "/this/isnt/there"`
> +echo $RESULTJ
> +test `echo $RESULTJ | jq ".results == [ ]" ` = 'true'
> +
> +kill $PID1
> +kill $PID2
> +wait $PID1
> +wait $PID2
> +PID1=0
> +PID2=0
> +
> +# check it's still in cache
> +RESULTJ=`env LD_LIBRARY_PATH=$ldpath ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debuginfod-find metadata file "/usr/bin/hithere"`
> +echo $RESULTJ
> +test `echo $RESULTJ | jq ".results == [ ]" ` = 'true'
> +
> +# invalidate cache, retry previously successful query to now-dead servers
> +echo 0 > $DEBUGINFOD_CACHE_PATH/metadata_retention_s
> +RESULTJ=`env LD_LIBRARY_PATH=$ldpath ${VALGRIND_CMD} ${abs_builddir}/../debuginfod/debuginfod-find metadata glob "/u?r/bin/*"`
> +echo $RESULTJ
> +test `echo $RESULTJ | jq ".results == [ ]" ` = 'true'
> +test `echo $RESULTJ | jq ".complete == false" ` = 'true'
> +
> +exit 0
>

I was experimenting with metadata queries to a local server that indexed
a single rpm (binutils-2.41-8.fc40.x86_64.rpm). The following commands
produced JSON with empty "results":

    debuginfod-find metadata glob '*'
    debuginfod-find metadata glob '/usr*'
    debuginfod-find metadata glob '/usr/bin*'

Using the glob '/usr/bin/*' did produce results with complete metadata.

I haven't looked into the cause but this seems like a bug.  I'd expect
the first 3 globs to return at least as many results as the 4th. If this is
intentional behaviour then I think it should be documented.

Aaron



More information about the Elfutils-devel mailing list