Bug 28034 - debuginfod server: preserve + etc. elements in incoming webapi url
Summary: debuginfod server: preserve + etc. elements in incoming webapi url
Status: RESOLVED FIXED
Alias: None
Product: elfutils
Classification: Unclassified
Component: debuginfod (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Noah Sanci
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-30 18:37 UTC by Frank Ch. Eigler
Modified: 2021-09-27 18:54 UTC (History)
3 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Ch. Eigler 2021-06-30 18:37:49 UTC
When requesting some source files, some URL-inconvenient chars sometimes pop up.  Example from f33 libstdc++:

   /buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/condition_variable.cc

As this URL is passed into debuginfod's handler_cb, it appears that the + signs are helpfully unescaped to spaces by libmicrohttpd, which 'course breaks everything.  We need to suppress this HTTP URL processing step.  Also worth checking that %HEX decoding is also suppressed.

(.... alternately, we could change the client to urlencode such identifiers .... and change the webapi to spell this out.  But I hope that's not necessary.)
Comment 1 Frank Ch. Eigler 2021-06-30 20:00:16 UTC
RFC3986 appears to mandate %-escaping many characters in order to pass through a textbook URI/URL path string.  So formally noting this in the webapi and calling curl_easy_escape() on the outgoing URLs may be the thing to do.

https://datatracker.ietf.org/doc/html/rfc3986/#section-2.2
https://curl.se/libcurl/c/curl_easy_escape.html
Comment 2 Mark Wielaard 2021-07-07 12:29:11 UTC
Could we do both?

Start properly %-escaping outgoing URLs from libdebuginfod client (and documenting that this is the correct/expected way to do GET requests).

And only do unescaping in libmicrohttpd if the URL contains a %.

This assumes that + -> ' ' escaping is normally unwanted (because to proper way is to encode a space as %20).

But maybe I am missing some subtlety or maybe we have no way of stopping libmicrohttpd of "unescaping"?
Comment 3 Frank Ch. Eigler 2021-07-07 13:45:32 UTC
> This assumes that + -> ' ' escaping is normally unwanted (because to proper way > is to encode a space as %20).

I can't find it in current RFCs but traditional use of '+' to encode ' ' is still very widespread, esp. in querystrings.  And unfortunately libmicrohttpd does the "+" -> " " transform practically unconditionally.

So I suspect we have no choice but to %HEX urlencode all RFC3986-"reserved" (or not "unreserved") characters.  I'd use %20 for space too.
Comment 4 Mark Wielaard 2021-08-25 12:33:11 UTC
commit 9ab0c139eebf4ba40ac721224a673e4b66d29cd9
Author: Noah Sanci <nsanci@redhat.com>
Date:   Fri Jul 16 15:16:20 2021 -0400

    debuginfod: PR28034 - client-side %-escape url characters
    
    When requesting some source files, some URL-inconvenient chars
    sometimes pop up.  Example from f33 libstdc++:
    /buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/
    gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/
    libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/
    condition_variable.cc
    As this URL is passed into debuginfod's handler_cb, it appears that the
    + signs are helpfully unescaped to spaces by libmicrohttpd, which
    'course breaks everything.
    In order to ensure the server properly parses urls such as this one,
    %-escape characters on the client side so that the correct url
    is preserved and properly processed on the server side.
    
    https://sourceware.org/bugzilla/show_bug.cgi?id=28034
    
    Signed-off-by: Noah Sanci <nsanci@redhat.com>
Comment 5 Frank Ch. Eigler 2021-08-25 12:34:41 UTC
We should probably tweak this to avoid %-escaping the '/' characters, which default apache httpd mod_proxy configurations treat as sus.
Comment 6 Mark Wielaard 2021-08-25 12:43:38 UTC
(In reply to Frank Ch. Eigler from comment #5)
> We should probably tweak this to avoid %-escaping the '/' characters, which
> default apache httpd mod_proxy configurations treat as sus.

Ah, OK, yes. Reopened for now to get that fixed.
Comment 7 Noah Sanci 2021-09-27 18:54:19 UTC
debuginfod: PR28034 - No longer escape '/', and loop efficiency
    
Previously, urls containing '/', so most urls, would escape '/' to %2F,
which is undesirable for use in other libraries which may escape
differently. This patch escapes the '/' and replaces all of them
ensuring there are no %2Fs sent.
Some inefficiencies within the code were fixed, such as changing constant
operations of a while loop within a for loop to a while loop outside of
a for loop. Also strlen is no longer used within the loop, simplifying
the interior operations to mere arithmetic.