PATCH: PR27092 debuginfod improve low-memory operations
Mark Wielaard
mark@klomp.org
Fri Feb 5 17:24:47 GMT 2021
Hi Frank,
On Thu, 2021-02-04 at 20:45 -0500, Frank Ch. Eigler via Elfutils-devel wrote:
> PR27092: debuginfod low-memory handling
>
> A couple of closely related pieces of work allow more early warning
> about low storage/memory conditions:
>
> - New prometheus metrics to track filesystem freespace, and more
> details about some errors.
This part looks fine.
Maybe this is a general question about exposing metrics. But does this
help people who want to do denial of service attacks? I mean by seeing
what effect some queries might have on freespace/errors?
Not a showstopper, but maybe something to think about before
encouraging people to expose their metrics?
> - Frequent checking of $TMPDIR freespace, to trigger fdcache
> emergency flushes.
This too looks fine.
> - Switch to floating point prometheus metrics, to communicate
> fractions - and short time intervals - accurately.
You move all metrics to use doubles. Including the metrics that use
integral values using inc_metric (the threads, _total and _count ones).
I cannot tell how that looks on the Prometheus side. Isn't it better to
keep those as integrals? Aka maybe inc_metric shouldn't call through to
add_metric? Or maybe have two metric maps, one for counts and one for
values/durations?
> - Fix startup-time pthread-creation error handling.
OK, so these are now fatal errors, which makes sense. If some of the
threads cannot be created there is not much sense in trying to carry
on.
> Testing is smoke-test-level only as it is hard to create
> free-space-limited $TMPDIRs. Locally tested against tiny through
> medium tmpfs filesystems, with or without sqlite db also there. Shows
> a pleasant stream of diagnostics and metrics during shortage but
> generally does not fail outright. However, catching an actual
> libstdc++- or kernel-level OOM is beyond our ken.
[...]
> +2021-02-04 Frank Ch. Eigler <fche@redhat.com>
> +
> + PR27092 low-memory handling
> + * debuginfod.cxx (fdcache_mintmp): New parameter, with cmd-line option.
> + (parse_opt): Parse it.
> + (main): Default it.
> + (statfs_free_enough_p): New function.
> + (libarchive_fdcache::*): Call it to trigger emergency fdcache flush.
> + (thread_main_scanner): Call it to report filesystem fullness metrics.
> + (groom): Ditto.
> + (set/add_metric): Take double rather than int64_t values.
> + (archive_exception): Propagate suberror to metric label.
> + (main): Detect pthread creation fatal errors properly.
All code changes look fine. Please feel free to push if you think my
questions above are a little silly :) The actual changes seem to do
what you intended.
Thanks,
Mark
More information about the Elfutils-devel
mailing list