This is the mail archive of the
mailing list for the glibc project.
Re: Ignoring failures and altering behavior
* Adhemerval Zanella:
>> But I think there is a larger question here: Should we keep running at
>> all cost, possibly giving quite different results, or is it better to
>> actually report the errors we encounter and stop?
> My wild guess is that these kinds of errors are usually contingencies that
> are taken for granted or handled as intermittent. Do you have any
> bug reports with such issues?
I extracted this from a public Google bug report:
Based on the information they provided, it matches the failure more
closely than the bug they fixed with the patches they posted.
I need to think about it for a bit, but I don't immediately recall that
I have encountered the issue myself.
> In any case, not reporting issues to the user is not a good policy, imho.
> It implies an API contract that cannot fail, when in fact the API just
> changes to different semantics in case of failure.
The downside could be that if you have an unreachable (NFS) directory on
your ld.so search path, you can't launch any programs anymore. Or if
/etc/nsswitch.conf is corrupted (leading to EIO errors etc. when reading
it), you can no longer log in over SSH. So in some cases, the right
choice could be tough.
But in general, my feeling is that we paper over far too many errors.
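To make the trade-off concrete, here is a minimal sketch of the distinction
I have in mind. It is not glibc code; probe_config and the enum names are
made up for illustration, and treating only ENOENT/ENOTDIR as "absent" is an
assumption about where to draw the line.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical classification, not an existing glibc interface.  */
enum config_status { CONFIG_OK, CONFIG_ABSENT, CONFIG_ERROR };

/* Probe a configuration file.  A file that is simply not there is
   reported as CONFIG_ABSENT, so the caller may fall back to defaults.
   Any other failure (EIO, EACCES, ...) is reported as CONFIG_ERROR
   instead of being folded into the "absent" case.  */
enum config_status
probe_config (const char *path)
{
  assert (path != NULL);

  int fd = open (path, O_RDONLY);
  if (fd < 0)
    {
      if (errno == ENOENT || errno == ENOTDIR)
        return CONFIG_ABSENT;
      return CONFIG_ERROR;
    }

  /* Opening can succeed while reading still fails, so try one byte.  */
  char buf[1];
  ssize_t n = read (fd, buf, sizeof buf);
  int saved_errno = errno;
  close (fd);
  if (n < 0)
    {
      errno = saved_errno;
      return CONFIG_ERROR;
    }
  return CONFIG_OK;
}
```

Whether CONFIG_ERROR should then be fatal is exactly the tough choice
described above; the sketch only keeps the two cases apart so that the
caller can decide.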
> GNU guidelines usually do not set hard limits on APIs, so I think it is
> a fair expectation that, depending on function usage, resource acquisition
> may fail. Usually I see giving the user an option to actually handle these
> issues as better than silently ignoring them (we might still keep the current
> policy of ignoring certain issues as the default semantics).
Sure, but if we are too picky, then the user might not see anything
because the system does not boot. 8-)
> I also think each error case might require a different answer.
> On NSS service load, for instance, one option might be to syslog failures
> (as for NIS) and add a config option to always return failure. The
> gconv/iconv case might be more tricky, since some uses on top of the cache
> conf load define their semantics as 'no failure is expected'.
For gconv/iconv, these appear to be bugs. We need to fix fwide to be
able to return errors. At least POSIX clearly describes how to
communicate errors to the caller. For everything else, we already have
a clear way to report errors, I think. Downwards from gconv/iconv, the
issue is dlopen, of course, where we cannot tell a resource allocation
failure from a missing DSO (as in the NSS bug mentioned above).
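
As a rough illustration of the dlopen problem, seen from the public API
(try_load is a hypothetical helper, not something in glibc): the caller
gets a null handle and a human-readable string from dlerror, but no error
code that would separate "file not found" from "out of memory".

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load a shared object.  Returns 0 on
   success and -1 on failure.  On failure, the dlerror text is copied
   into MSG; that text is the only failure detail the API provides.  */
int
try_load (const char *path, char *msg, size_t msg_len)
{
  assert (msg != NULL && msg_len > 0);

  dlerror ();			/* Clear any stale error state.  */
  void *handle = dlopen (path, RTLD_NOW | RTLD_LOCAL);
  if (handle != NULL)
    {
      dlclose (handle);
      msg[0] = '\0';
      return 0;
    }

  /* No errno-style code is available here, only a message string.  */
  const char *err = dlerror ();
  snprintf (msg, msg_len, "%s",
	    err != NULL ? err : "unknown dlopen failure");
  return -1;
}
```

A caller that wants to treat a missing DSO as "service not configured"
but a memory allocation failure as a hard error would have to parse the
message text, which is not a stable interface.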