When requesting some source files, some URL-inconvenient chars sometimes pop up. Example from f33 libstdc++: /buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/condition_variable.cc As this URL is passed into debuginfod's handler_cb, it appears that the + signs are helpfully unescaped to spaces by libmicrohttpd, which 'course breaks everything. We need to suppress this HTTP URL processing step. Also worth checking that %HEX decoding is also suppressed. (.... alternately, we could change the client to urlencode such identifiers .... and change the webapi to spell this out. But I hope that's not necessary.)
RFC3986 appears to mandate %-escaping many characters in order to pass through a textbook URI/URL path string. So formally noting this in the webapi and calling curl_easy_escape() on the outgoing URLs may be the thing to do. https://datatracker.ietf.org/doc/html/rfc3986/#section-2.2 https://curl.se/libcurl/c/curl_easy_escape.html
Could we do both? Start properly %-escaping outgoing URLs from libdebuginfod client (and documenting that this is the correct/expected way to do GET requests). And only do unescaping in libmicrohttpd if the URL contains a %. This assumes that + -> ' ' escaping is normally unwanted (because to proper way is to encode a space as %20). But maybe I am missing some subtlety or maybe we have no way of stopping libmicrohttpd of "unescaping"?
> This assumes that + -> ' ' escaping is normally unwanted (because to proper way > is to encode a space as %20). I can't find it in current RFCs but traditional use of '+' to encode ' ' is still very widespread, esp. in querystrings. And unfortunately libmicrohttpd does the "+" -> " " transform practically unconditionally. So I suspect we have no choice but to %HEX urlencode all RFC3986-"reserved" (or not "unreserved") characters. I'd use %20 for space too.
commit 9ab0c139eebf4ba40ac721224a673e4b66d29cd9 Author: Noah Sanci <nsanci@redhat.com> Date: Fri Jul 16 15:16:20 2021 -0400 debuginfod: PR28034 - client-side %-escape url characters When requesting some source files, some URL-inconvenient chars sometimes pop up. Example from f33 libstdc++: /buildid/44d8485cb75512c2ca5c8f70afbd475cae30af4f/source/usr/src/debug/ gcc-10.3.1-1.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/ libstdc++-v3/src/c++11/../../../../../libstdc++-v3/src/c++11/ condition_variable.cc As this URL is passed into debuginfod's handler_cb, it appears that the + signs are helpfully unescaped to spaces by libmicrohttpd, which 'course breaks everything. In order to ensure the server properly parses urls such as this one, %-escape characters on the client side so that the correct url is preserved and properly processed on the server side. https://sourceware.org/bugzilla/show_bug.cgi?id=28034 Signed-off-by: Noah Sanci <nsanci@redhat.com>
We should probably tweak this to avoid %-escaping the '/' characters, which default apache httpd mod_proxy configurations treat as sus.
(In reply to Frank Ch. Eigler from comment #5) > We should probably tweak this to avoid %-escaping the '/' characters, which > default apache httpd mod_proxy configurations treat as sus. Ah, OK, yes. Reopened for now to get that fixed.
debuginfod: PR28034 - No longer escape '/', and loop efficiency Previously, urls containing '/', so most urls, would escape '/' to %2F, which is undesirable for use in other libraries which may escape differently. This patch escapes the '/' and replaces all of them ensuring there are no %2Fs sent. Some inefficiencies within the code were fixed, such as changing constant operations of a while loop within a for loop to a while loop outside of a for loop. Also strlen is no longer used within the loop, simplifying the interior operations to mere arithmetic.