This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Ignoring failures and altering behavior



On 17/10/2018 07:36, Florian Weimer wrote:
> In many cases, glibc currently papers over resource allocation failures
> by changing behavior.  Some examples:
> 
> * If we cannot load a NSS service module with dlopen for any reason
>   (including malloc/mmap failures), we pretend that the service module
>   does not exist.  NSS will possibly return different data as a result.
> 
> * If we cannot allocate memory for the gconv/iconv path, we will ignore
>   the user configuration and use a built-in path.
> 
> * If we cannot open /etc/host.conf for any reason, we do not report an
>   error and pretend that the file does not exist, possibly altering
>   name resolution results.
> 
> * If we get a read error indicating file system corruption or
>   unavailability for a directory in the dynamic linker, we ignore that
>   directory.
> 
> I'm sure there are many more examples.
> 
> File system errors can be quite tricky.  I used this code in one case to
> tell persistent, possibly intended errors from actual problems:
> 
>   FILE *fp = fopen (_PATH_RESCONF, "rce");
>   if (fp == NULL)
>     switch (errno)
>       {
>       case EACCES:
>       case EISDIR:
>       case ELOOP:
>       case ENOENT:
>       case ENOTDIR:
>       case EPERM:
>         /* Ignore these errors.  They are persistent errors caused
>            by file system contents.  */
>         break;
>       default:
>         /* Other errors refer to resource allocation problems and
>            need to be handled by the application.  */
>         return NULL;
>       }
> 
> But I think there is a larger question here: Should we keep running at
> all cost, possibly giving quite different results, or is it better to
> actually report the errors we encounter and stop?
> 
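For reference, the errno classification in the quoted snippet can be packaged as a self-contained helper. This is a minimal sketch, not glibc code: the name `open_optional_config` and the `absent` out-parameter are hypothetical, and the path is passed in instead of being hard-coded as `_PATH_RESCONF`. The point is that the caller can tell "file not configured" apart from a failure that should be reported:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: open an optional configuration file.
   Returns the stream on success.  On failure returns NULL and sets
   *ABSENT to true for persistent errors caused by file system
   contents (treat as "file not configured"), or to false for errors
   the caller should report.  errno is left as set by fopen.  */
static FILE *
open_optional_config (const char *path, bool *absent)
{
  /* "c" and "e" are glibc mode extensions: the open is not a
     cancellation point, and the descriptor gets O_CLOEXEC.  */
  FILE *fp = fopen (path, "rce");
  *absent = false;
  if (fp == NULL)
    switch (errno)
      {
      case EACCES:
      case EISDIR:
      case ELOOP:
      case ENOENT:
      case ENOTDIR:
      case EPERM:
        /* Persistent errors caused by file system contents:
           behave as if the file did not exist.  */
        *absent = true;
        break;
      default:
        /* Anything else (for example EMFILE, ENFILE, ENOMEM or EIO)
           is a real failure; errno still describes it.  */
        break;
      }
  return fp;
}
```

A caller would treat `fp == NULL && absent` as "use the defaults" and `fp == NULL && !absent` as an error to propagate, instead of silently falling back.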

My wild guess is that these kinds of errors are usually contingency ones
that are taken for granted or treated as intermittent. Do you have any
bug reports about such issues?

In any case, not reporting issues to the user is not a good policy, imho.
It implies an API contract that the call cannot fail, when in fact it
just switches to different semantics in case of failure.

GNU guidelines usually do not set hard limits on APIs, so I think it is
a fair expectation that, depending on how a function is used, resource
acquisition may fail. In general I see giving the user an option to
actually handle these issues as better than silently ignoring them (we
might still keep the current policy of ignoring certain issues as the
default semantics).

I also think each error case might require a different answer. For NSS
service loading, for instance, one option might be to syslog failures
(as for NIS) and add a config option to always return failure.
gconv/iconv might be trickier, since some uses built on top of the
cache/conf loading define their semantics as 'no failure is expected'.

