This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



[Bug debuginfod/25394] groom vs. scan race condition


https://sourceware.org/bugzilla/show_bug.cgi?id=25394

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mark at klomp dot org

--- Comment #2 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Frank Ch. Eigler from comment #1)
> commit 34e67018914cf9ebbef07065965755b6554fd66e
> let's try to put out of our minds the four subsequent cleanup patches

Thanks!
And the cleanups are important. No worries, that is what the buildbots are for.

In this case it was actually a couple of separate things:

- On some arches the size and signedness of time_t are different,
  which led to some fixes/casts when doing (long) time calculations:

commit 91b7beaef91b60fbde13dadf86091e57c8245008
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 14:44:15 2020 -0500

    PR25394 followup: debuginfod casting fixes

    Buildbot reports type warnings in time_t arithmetic.
    Explicit (long) cast pushed as obvious.

commit 09d76c1dd5e45c5512db997e52234dd2ddab8c2d
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 14:44:15 2020 -0500

    PR25394 followup#2: debuginfod casting fixes

    Buildbot still reports type warnings in time_t arithmetic.
    Explicit (long)er cast pushed as obvious ... or is it? :-)
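
As a minimal sketch of the kind of portability issue involved (not the actual debuginfod code; elapsed_seconds, print_elapsed and self_check are hypothetical names used only for illustration): converting both time_t operands to a fixed wide signed type before subtracting gives a result whose type, and matching printf format, does not depend on how the arch defines time_t.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>

// Hypothetical helper, not taken from debuginfod.
// Convert both operands to long long *before* subtracting, so the result
// is signed and equally wide no matter how the arch defines time_t.
static long long elapsed_seconds(time_t now, time_t then)
{
  return static_cast<long long>(now) - static_cast<long long>(then);
}

// %lld always matches long long, so there is no format/type warning
// on arches where time_t is not a plain long.
static void print_elapsed(time_t now, time_t then)
{
  std::printf("elapsed: %lld s\n", elapsed_seconds(now, then));
}

// Small usage example.
static void self_check()
{
  assert(elapsed_seconds(1042, 1000) == 42);
  assert(elapsed_seconds(1000, 1042) == -42);
  print_elapsed(1042, 1000);
}
```

Casting the operands rather than the difference keeps the subtraction itself in a known signed type.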

- On the buildbot workers /usr/sbin might not have been in the PATH
  (when the worker was (re)started through cron).
  This caused the ss binary not to be found.
  Fixed on the workers.

- last_rescan timestamp was lost:

commit c351734f4feff176b3e0ca8fbbc8353053c3ab6d
Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 15:37:33 2020 -0500

    PR25394 cont'd: debuginfod timing fix for fts-traversal thread

    The new code neglected to set the last_rescan timestamp, leading
    to overly frequent rescanning.
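
The effect of the missing timestamp can be sketched as follows. This is an illustration under assumptions, not the debuginfod source: RescanState, maybe_rescan, the traversals counter and the 300 second interval are hypothetical, and only the name last_rescan comes from the commit message.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical state for a periodic traversal thread.
struct RescanState {
  time_t last_rescan = 0;   // time of the most recent traversal
  long long rescan_s = 300; // assumed rescan interval in seconds
  unsigned traversals = 0;  // stand-in for the real fts traversal work
};

// Run a traversal if at least rescan_s seconds have passed since the
// last one.  Returns true if a traversal was performed.
static bool maybe_rescan(RescanState& st, time_t now)
{
  long long since = static_cast<long long>(now)
                    - static_cast<long long>(st.last_rescan);
  if (since < st.rescan_s)
    return false;

  ++st.traversals;      // the traversal itself would happen here
  st.last_rescan = now; // without this line every call would rescan
  return true;
}
```

With the default state, a call at now = 1000 runs a traversal, a call at 1100 is skipped, and a call at 1300 runs again. Dropping the last_rescan assignment makes all three calls traverse, which is the overly frequent rescanning described in the commit message.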

- As mentioned in the original commit, acting on the USR1 signal might now be
delayed a bit, but the testcase actually depended on the signal being acted on
immediately.

Author: Frank Ch. Eigler <fche@redhat.com>
Date:   Mon Jan 20 15:37:33 2020 -0500

    PR25394 cont'd: debuginfod testsuite fix for -USR1 timing

    If a SIGUSR1 is sent before the initial traversal, it no longer
    results in an extra traversal.  That's a sensible effect.  The
    test case just needs to wait before the kill -USR1.

Just adding a wait_ready $PORT1 'thread_work_total{role="traverse"}' 1 in the
testcase fixes that.

It is actually amazing that the full testsuite was GREEN on our local setups.
Thanks buildbot for having weird arches and timings :)

P.S. Please don't forget the signed-off-by line on your commits.
See
https://sourceware.org/git/?p=elfutils.git;a=blob_plain;f=CONTRIBUTING;hb=HEAD

