This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Ignoring failures and altering behavior



On 17/10/2018 11:03, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>>> But I think there is a larger question here: Should we keep running at
>>> all cost, possibly giving quite different results, or is it better to
>>> actually report the errors we encounter and stop?
> 
>> My wild guess is that these kinds of errors are usually contingency ones
>> that are taken for granted or handled as intermittent. Do you have any
>> bug reports with such issues?
> 
> I extracted this from a public Google bug report:
> 
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=22041>
> 
> Based on the information they provided, it matches the failure more
> closely than the bug they fixed with the patches they posted.
> 
> I need to think about it for a bit, but I don't immediately recall that
> I have encountered the issue myself.
>   
>> In any case, not reporting issues to the user is not a good policy, IMHO.
>> It implies an API contract that cannot fail, when in fact the API just
>> changes to different semantics in case of failure.
> 
> The downside could be that if you have an unreachable (NFS) directory on
> your ld.so search path, you can't launch any programs anymore.  Or if
> /etc/nsswitch.conf is corrupted (leading to EIO errors etc. when reading
> it), you can no longer log in over SSH.  So in some cases, the right
> choice could be tough.
> 
> But in general, my feeling is that we paper over far too many errors.

I think *silently* changing the API semantics should be avoided. Either
we document it as the default behaviour and log it (even if only in
debug mode), or we keep it as the default and add an option to
assert or return an error in the failure case.  Both examples you cited
show how difficult it can be for a system administrator to debug such
failures without direct error indications.
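
To make the two alternatives concrete, here is a minimal sketch in C. It
is not an existing glibc interface: the names fail_policy, load_config
and struct config are hypothetical, as is the "files" default. It only
shows a loader that either falls back and says so, or hands the error
back to the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy switch; not an existing glibc interface. */
enum fail_policy {
  FAIL_FALLBACK_LOGGED, /* keep running on defaults, but log it */
  FAIL_STRICT           /* return the error to the caller */
};

struct config {
  char services[64];
  int used_fallback; /* nonzero if defaults replaced the file contents */
};

/* Reads the first line of PATH into CFG.  Returns 0 on success or
   logged fallback, and a negative errno value in strict mode.  */
int load_config(const char *path, enum fail_policy policy,
                struct config *cfg)
{
  FILE *fp = fopen(path, "r");
  int err = (fp == NULL) ? errno : 0;

  memset(cfg, 0, sizeof *cfg);
  if (fp != NULL) {
    if (fgets(cfg->services, sizeof cfg->services, fp) == NULL)
      err = ferror(fp) ? EIO : EINVAL; /* read error vs. empty file */
    fclose(fp);
  }
  if (err == 0)
    return 0;

  if (policy == FAIL_STRICT)
    return -err;

  /* Fallback is still allowed, but it is no longer silent. */
  fprintf(stderr, "config: cannot load %s (%s); using defaults\n",
          path, strerror(err));
  strcpy(cfg->services, "files"); /* assumed default for illustration */
  cfg->used_fallback = 1;
  return 0;
}
```

With FAIL_STRICT a missing file yields -ENOENT; with
FAIL_FALLBACK_LOGGED the call succeeds, sets used_fallback, and leaves a
message that an administrator can find.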

> 
>> GNU guidelines usually do not set hard limits on APIs, so I think it is a
>> fair expectation that, depending on function usage, resource acquisition
>> may fail. I usually see giving the user an option to actually handle these
>> issues as better than silently ignoring them (we might still keep the
>> current policy of ignoring certain issues as the default semantics).
> 
> Sure, but if we are too picky, then the user might not see anything
> because the system does not boot. 8-)

Yes, but usually I think this is an indication of a fragile setup of system
defaults and/or organization. Taking the example of the unreachable NFS
directory, if a system administrator is relying on such a configuration, it
is expected that it might fail due to a myriad of issues. The best course of
action, IMHO, is to at least give an easier way to *debug* it.
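
As a rough illustration of "easier to debug", the sketch below keeps
skipping a bad search-path directory but separates "not there" from
"there but failing" and reports the latter. The function
probe_search_dir, the enum, and the debug_log parameter are assumptions
for this example and do not reflect how ld.so is implemented.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum dir_status { DIR_USABLE, DIR_ABSENT, DIR_ERROR };

/* Classifies one search-path entry.  DEBUG_LOG may be NULL to disable
   diagnostics (hypothetical stand-in for a debug mode).  */
enum dir_status probe_search_dir(const char *dir, FILE *debug_log)
{
  struct stat st;
  int err;

  if (stat(dir, &st) == 0) {
    if (S_ISDIR(st.st_mode))
      return DIR_USABLE;
    err = ENOTDIR; /* exists, but is not a directory */
  } else {
    err = errno;
  }

  /* A missing entry is a normal condition: skip it quietly. */
  if (err == ENOENT || err == ENOTDIR)
    return DIR_ABSENT;

  /* Anything else (EIO, ESTALE, EACCES, ...) is worth a diagnostic,
     even if the caller decides to carry on.  */
  if (debug_log != NULL)
    fprintf(debug_log, "search path: skipping %s: %s\n",
            dir, strerror(err));
  return DIR_ERROR;
}
```

The caller can still continue past DIR_ERROR by default, so the
behaviour does not change, but the failure is no longer invisible.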

> 
>> I also think each error case might require a different answer.
>> On NSS services load, for instance, one option might be to syslog failures
>> (as for NIS) and add a config option to always return failure. The
>> gconv/iconv case might be more tricky, since some uses on top of the cache
>> conf load define their semantics as 'no failure is expected'.
> 
> For gconv/iconv, these appear to be bugs.  We need to fix fwide to be
> able to return errors.  At least POSIX clearly describes how to
> communicate errors to the caller.  For everything else, we already have
> a clear way to report errors, I think.  Downwards from gconv/iconv, the
> issue is dlopen, of course, where we cannot tell a resource allocation
> failure from a missing DSO (as in the NSS bug mentioned above).
> 
> Thanks,
> Florian
> 

