[Bug debuginfod/27277] Describe retrieved files when verbose

Wed Sep 8 15:01:38 GMT 2021

Hi Noah,

On Wed, 2021-08-25 at 14:08 -0400, Noah Sanci via Elfutils-devel wrote:
> There appear to exist use cases that intend to simply check for the
> existence of content in a debuginfod server, without actually
> downloading it.  In HTTP land, the HEAD operation is the natural
> expression of this.
> Instead of implementing a HEAD/describe option, allow users, with
> enough
> verbosity, to print the HTTP response headers upon retrieving a file.
> E.g output:
> 
> HTTP/1.1 200 OK
> Connection: Keep-Alive
> Content-Length: 2428240
> Cache-Control: public
> Last-Modified: Sat, 15 May 2021 20:49:51 GMT
> Content-Type: application/octet-stream
> Date: Tue, 03 Aug 2021 18:50:36 GMT
> 
> Note: the new test is mostly compatible with the nsanci/test-fix
> commit. When nsanci/test-fix merges with master I can resubmit the
> patch with the test properly rebased and adjusted for maximum
> efficiency.

Please do resubmit the test now that we have separate debuginfod tests.

> The HEAD functionality will (likely) be put into a new PR for
> existing checking.

Thanks. That is https://sourceware.org/bugzilla/show_bug.cgi?id=28284

> From 6351258d707337b69563d4be8effbb30fc42f784 Mon Sep 17 00:00:00
> 2001
> From: Noah Sanci <nsanci@redhat.com>
> Date: Wed, 28 Jul 2021 14:46:05 -0400
> Subject: [PATCH] debuginfod: PR27277 - Describe retrieved files when
> verbose
> 
> There appear to exist use cases that intend to simply check for the
> existence of content in a debuginfod server, without actually
> downloading it.  In HTTP land, the HEAD operation is the natural
> expression of this.
> Instead of implementing a HEAD/describe option,

Maybe drop this first part, it describes what is not implemented in
this patch.

>  allow users, with enough
> verbosity, to print the HTTP response headers upon retrieving a file.
> E.g output:
> 
> HTTP/1.1 200 OK
> Connection: Keep-Alive
> Content-Length: 2428240
> Cache-Control: public
> Last-Modified: Sat, 15 May 2021 20:49:51 GMT
> Content-Type: application/octet-stream
> Date: Tue, 03 Aug 2021 18:50:36 GMT
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=27277
> 
> Signed-off-by: Noah Sanci <nsanci@redhat.com>
> [...]
> diff --git a/debuginfod/ChangeLog b/debuginfod/ChangeLog
> index 530f7dc7..cb9e50c7 100644
> --- a/debuginfod/ChangeLog
> +++ b/debuginfod/ChangeLog
> @@ -4,6 +4,17 @@
>  	* debuginfod.cxx (handler_cb): Fix after_you unique_set key
>  	to the entire incoming URL.
>  
> +2021-08-02  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-client.c (struct debuginfod_client): New field 
> +	winning_headers.

It seems technically we don't need winning_headers at this point. But
it will be useful when you implement the other PR.

> +	(struct handle_data): New field response_data.
> +	(header_callback): Store received headers in response_data.
> +	(debuginfod_query_server): Activate CURLOPT_HEADERFUNCTION.
> +	Save winning response_data.
> +	(debuginfod_end): free client winning headers.
> +
> [...]
> +static size_t
> +header_callback (char * buffer, size_t size, size_t numitems, void * userdata)
> +{
> +  if (size != 1)
> +    return 0;
> +  /* Temporary buffer for realloc */
> +  char *temp = NULL;
> +  size_t userlen = 0;
> +  if (*(char**)userdata == NULL)
> +    {
> +      temp = malloc(numitems+1);
> +      if (temp == NULL)
> +        return 0;
> +      memset(temp, '\0', numitems+1);
> +    }
> +  else
> +    {
> +      userlen = strlen(*(char**)userdata);
> +      temp = realloc(*(char**)userdata, userlen + numitems + 1);
> +      if (temp == NULL)
> +       return 0;
> +    }
> +  strncat(temp, buffer, numitems);
> +  *(char**)userdata = temp;
> +  return numitems;
> +}

I am slightly confused about this function.
Is the idea that this is given (part of) the headers and allocates
memory for userdata to store the parts it wants/needs?

If so, do we need to filter the headers here for those that interest
us, or is that simply all headers?

Do we need to protect against using too much memory?
Currently we keep reallocing till we run out of memory, maybe set some
predetermined limit? (64K of headers should be enough for anybody...)

Why do you clear the temp memory the first time, but don't clear the
end of the realloc data? Is the clearing of the whole block necessary
given that the temp data is overwritten immediately with the buffer
contents?

>  /* Query each of the server URLs found in $DEBUGINFOD_URLS for the file
>     with the specified build-id, type (debuginfo, executable or source)
>     and filename. filename may be NULL. If found, return a file
> @@ -936,10 +966,13 @@ debuginfod_query_server (debuginfod_client *c,
>  	  curl_easy_setopt (data[i].handle, CURLOPT_LOW_SPEED_LIMIT,
>  			    100 * 1024L);
>  	}
> +      data[i].response_data = NULL;
>        curl_easy_setopt(data[i].handle, CURLOPT_FILETIME, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_FOLLOWLOCATION, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_FAILONERROR, (long) 1);
>        curl_easy_setopt(data[i].handle, CURLOPT_NOSIGNAL, (long) 1);
> +      curl_easy_setopt(data[i].handle, CURLOPT_HEADERFUNCTION, header_callback);
> +      curl_easy_setopt(data[i].handle, CURLOPT_HEADERDATA, (void *) &(data[i].response_data));

Aha, ok, so this is how the headers are gotten and finally assigned to
response_data..

> @@ -1161,10 +1205,10 @@ debuginfod_query_server (debuginfod_client *c,
>                  {
>                    char *effective_url = NULL;
>                    long resp_code = 500;
> -                  CURLcode ok1 = curl_easy_getinfo (target_handle,
> +                  CURLcode ok1 = curl_easy_getinfo (msg->easy_handle,
>  						    CURLINFO_EFFECTIVE_URL,
>  						    &effective_url);
> -                  CURLcode ok2 = curl_easy_getinfo (target_handle,
> +                  CURLcode ok2 = curl_easy_getinfo (msg->easy_handle,
>  						    CURLINFO_RESPONSE_CODE,
>  						    &resp_code);
>                    if(ok1 == CURLE_OK && ok2 == CURLE_OK && effective_url)

Why this change?

> +2021-08-04  Noah Sanci  <nsanci@redhat.com>
> +
> +	PR27277
> +	* debuginfod-find.1: Increasing verbosity describes the downloaded
> +	file.

Thanks for updating the docs.

I haven't reviewed the test case yet.

Cheers,

Mark