Indefinite hang in getaddrinfo / check_pf / make_request
Steven Schlansker
stevenschlansker@gmail.com
Tue Sep 29 22:05:00 GMT 2015
On Sep 24, 2015, at 11:36 AM, Steven Schlansker <stevenschlansker@gmail.com> wrote:
>
> On Sep 22, 2015, at 9:59 PM, Steven Schlansker <stevenschlansker@gmail.com> wrote:
>
>>
>>> On Sep 22, 2015, at 9:04 PM, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
>>>
>>> On Tue, Sep 22, 2015 at 8:53 PM, Steven Schlansker
>>> <stevenschlansker@gmail.com> wrote:
>>>
>>>> We found the following issue:
>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=15946
>>>
>>> You may be seeing https://sourceware.org/bugzilla/show_bug.cgi?id=12926 instead.
>>>
>>> See if that patch has been applied to your sources as well.
>>
>> Thanks for finding this. While that fix is not applied to our deployed version,
>> I think the symptoms are slightly different.
>
> Thanks Paul and Adhemerval for the advice. I believe I have evidence that this is
> not the same issue as either 15946 or 12926.
> ...
>
> I am going to spend some time trying to distill down a test case that just exercises the check_pf code and see if I can reproduce in isolation.
> In the meantime, does anyone have any ideas for further diagnostics that would be useful? I'm not sure how to check the kernel side of the netlink socket effectively,
> to see if it actually tried to reply or not...
Hello again, in case anyone stumbles across this in the future --
I got a test case, and narrowed it down further. It seems to be related
to incorrect kernel handling of the netlink sockets; under contention
they can get lost:
https://lkml.org/lkml/2015/9/24/712
Kernel 4.0.4 is known to be affected. We're testing out 4.0.9
in the hopes it is not. So this is in fact a new bug, albeit
not a glibc bug.
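
For anyone who wants to probe their own systems, a stress test along these
lines can be sketched as follows. This is only an illustration, not the test
case mentioned above: the function name, thread counts, and timeout are made
up, and it assumes (as I understand glibc's behaviour) that passing
AI_ADDRCONFIG makes getaddrinfo query the kernel over a netlink socket via
check_pf. It counts worker threads that are still blocked after a deadline.

```python
import socket
import threading
import time


def hammer_getaddrinfo(host="localhost", threads=8, calls_per_thread=200,
                       join_timeout=30.0):
    """Call getaddrinfo concurrently and return how many worker threads
    are still running once join_timeout seconds have elapsed.

    All names and defaults here are hypothetical, chosen for illustration.
    """

    def worker():
        for _ in range(calls_per_thread):
            try:
                # AI_ADDRCONFIG is assumed to be what makes glibc consult
                # the netlink socket (check_pf) on each call.
                socket.getaddrinfo(host, 80, flags=socket.AI_ADDRCONFIG)
            except OSError:
                # A resolution error is not interesting; only a hang is.
                pass

    # Daemon threads so a genuinely stuck worker cannot block interpreter exit.
    pool = [threading.Thread(target=worker, daemon=True)
            for _ in range(threads)]
    for t in pool:
        t.start()

    # Share one overall deadline across all joins.
    deadline = time.monotonic() + join_timeout
    for t in pool:
        t.join(max(0.0, deadline - time.monotonic()))

    return sum(1 for t in pool if t.is_alive())


if __name__ == "__main__":
    stuck = hammer_getaddrinfo()
    print(f"{stuck} thread(s) still blocked after the deadline")
```

A non-zero count only shows that some calls did not finish in time; it does
not by itself prove the netlink problem, so a stack trace of the stuck
threads is still needed to confirm where they are blocked.
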
Thank you for your time.
More information about the Libc-help mailing list