This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] resolv: Always set *resplen2 out parameter in send_dg [BZ #19791]


On 03/24/2016 04:52 AM, Carlos O'Donell wrote:
> On 03/23/2016 04:22 PM, Florian Weimer wrote:
>> This is the new patch.
>>
>> It sets *resplen2 strictly before each return (except for the success
>> case).  I have convinced myself that this does not alter any correct
>> cases.  If send_dg does not return 0
> 
> This patch looks good to me, though it took me a while to understand
> the code path that triggers the flaw and for the sake of correctness
> I'm going to write it down below.

We all agree that send_dg needs to be split up into smaller functions,
in a way that simplifies the program logic.

> My concerns are that this change adds complexity and I wonder if there
> isn't a simpler solution to just unconditionally clear *resplen2 if
> resplen2 is non-NULL?

It would reduce the number of a code changes (some of which come from
eliminating a couple of gotos and are not strictly necessary).  However,
I don't see how this approach reduces complexity.  If we move the
initialization under the wait: label, we'll still need to change some
late error returns.  I think resetting *resplen2 should really be part
of any error return, and it's not especially easy to see why particular
error returns do not need it with initialization only under wait: label.

>> I ran the external resolver test suite, apart from a tst-partial-fail
>> failure due to libresolv's newly found ability to find non-broken
>> responses, there are no failures.

(trying to match your answer to the right part of my message)

> I assume the case looks like this:
> 
> - send_dg(): sets *resplen2 in error and returns.
> - __libc_res_nsend(): tries an invalid server (ipv6 with ipv6 disabled)

No, this happens with a valid resolv.conf, too.  The packet that
triggers this is garbage, though.

> - send_dg(): calls reopen() which ignores the server as invalid
>              and returns without clearing *resplen2
> - __libc_res_nsend():
>  546                         if (n == 0 && (buf2 == NULL || *resplen2 == 0))
>  547                                 goto next_ns;
>  Line 546, particularly the *resplen2 == 0 prevents us from switching to the
>  next ns because we believe we have a valid answer in the second response
>  buffer.

Right, thanks for tracking this down.  In this particularly case, we
have n == 0 && *resplen2 == 12 (the 2/-1 SERVFAIL/garbage answer case in
tst-partial-fail).

> It isn't entirely clear to me how you would get an initial failure that
> would set *resplen2 to non-zero in a configuration as described in BZ
> #19791. It seems like you would always fail first and that would mean
> you'd be stuck looping over servers you can't contact.

The problem here is not that *resplen2 is not initialized, it's that we
have an error return (invalid data recognized as such), but we have
received the second response, overwritten *resplen2 with a positive
value, and still return an error (due to the garbage response).

> FYI, it annoys me that we set *resplen2 to 1 here as a `hack` to cause
> the code in __libc_res_nsend to avoid switching to the next nameserver.
> 
> 1185                     *resplen2 = 1;
> 1186                     return resplen;
> 
> We are overloading the meaning of *resplen2 here for our own purposes
> of avoiding switching to the next nameserver, but the only way to fix
> that is to add explicit information about the desire for send_dg to
> switch or not switch to the next nameserver.

Agreed, but that's something for the coming cleanup (along with the
next_ns removal).

> Again, this annoys me that we overload size 1 to prevent ns switching
> from happening, which is something we should be explicit about, but that
> would require better data structures in this code and that's a future change.
> 
> Notice the non-obiouvs trickery with things like "resplen > 1" to avoid the use
> of "1" to mean we stayed on the same nameserver (not to mention the use of
> saved

You mean save_gotsomewhere?

Thanks for your comments.

I'll commit the current patch after some additional tests.

Florian


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]