Bug 2679 - getopt and optind (when called with different arguments)
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.4
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
Reported: 2006-05-21 10:41 UTC by Lorenzo Bettini
Modified: 2019-04-10 08:16 UTC

fweimer: security-


Description Lorenzo Bettini 2006-05-21 10:41:28 UTC
I found a strange behavior in getopt that arises when getopt (or
getopt_long) is called with a different argv and argc from the ones
used in a previous invocation (in the same process).

Actually this might seem a strange situation, since you usually pass
the argc and argv passed to the main function.  However, I'm using
getopt_long to parse options that are not always the ones passed on
the command line: they might come from a configuration file, or might
be stored somewhere else.

I'm the maintainer of GNU gengetopt, which generates command line
parsers (and option parsers in general) that use getopt_long.  Thus a
program can parse the command line, then a configuration file, and so
getopt_long is called with different arguments (string vectors).
optind is set to 1 each time new arguments are used (as requested by
the documentation).

However, in such a context, strange behavior is sometimes observed,
and most of the time also illegal memory accesses (reported by
valgrind, or showing up as segfaults).

Taking a look at getopt.c, I see the following code (part of the
_getopt_internal_r function):

  if (d->optind == 0 || !d->__initialized)
    {
      if (d->optind == 0)
	d->optind = 1;	/* Don't scan ARGV[0], the program name.  */
      optstring = _getopt_initialize (argc, argv, optstring, d);
      d->__initialized = 1;
    }

where d is the _getopt_data struct, which also contains pointers such
as __nextchar and argv indexes such as __first_nonopt and
__last_nonopt.

Now, these elements are initialized only the first time or when optind
== 0.

That's basically the problem: when getopt_long is called with a new
argv, since optind is set to 1, the internal structure is not
initialized again, and so it still contains pointers into the previous
vector.  This results in strange behavior, or in illegal memory
accesses if the previous vector has already been deallocated or was
larger than the current one.

As far as I understand, the solution is to set optind to 0 before any
new use of getopt_long, but this is documented nowhere except in the
source:

"On entry to `getopt', zero means this is the first call; initialize."

and this does not seem to be standard, since optind should be 1
before any call, as also noted in getopt.c itself:

/* 1003.2 says this must be 1 before any call.  */

I think that the above check should actually be

  if (d->optind == 0 || d->optind == 1 || !d->__initialized)
    {
      if (d->optind == 0)
	d->optind = 1;	/* Don't scan ARGV[0], the program name.  */
      optstring = _getopt_initialize (argc, argv, optstring, d);
      d->__initialized = 1;
    }

i.e., the initialization should be performed even when optind == 1
since "optind must be 1 before any call".
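
One case that may be worth checking against this proposal: while a
cluster such as "-ab" in argv[1] is being scanned, optind is still 1
after the first letter has been returned.  The toy model below is a
sketch under that assumption; it is not the glibc source, and
struct toy_state, toy_getopt and toy_scan are invented names.  It
compares the existing condition with the proposed one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the scanner state; NOT the real struct _getopt_data.  */
struct toy_state {
    int optind;            /* index of the argv element being scanned */
    int initialized;       /* set once the state has been set up */
    const char *nextchar;  /* next letter inside a cluster such as "-ab" */
};

/* Return the next option letter, or -1 at the end of the options.
   init_on_one != 0 selects the proposed condition, which also
   re-initializes whenever optind == 1.  */
static int toy_getopt(struct toy_state *d, int argc, char *const argv[],
                      int init_on_one)
{
    if (d->optind == 0 || (init_on_one && d->optind == 1) || !d->initialized) {
        if (d->optind == 0)
            d->optind = 1;
        d->nextchar = NULL;        /* a re-initialization discards the position */
        d->initialized = 1;
    }
    if (d->nextchar == NULL || *d->nextchar == '\0') {
        if (d->optind >= argc || argv[d->optind][0] != '-'
            || argv[d->optind][1] == '\0')
            return -1;
        d->nextchar = argv[d->optind] + 1;   /* skip the leading '-' */
    }
    int c = (unsigned char)*d->nextchar++;
    if (*d->nextchar == '\0')
        d->optind++;               /* advance only after the cluster's last letter */
    return c;
}

/* Scan one vector, storing at most max letters in out (max + 1 bytes)
   so that a scan that never terminates stays bounded.  */
static size_t toy_scan(int argc, char *const argv[], int init_on_one,
                       char *out, size_t max)
{
    struct toy_state d = { 1, 0, NULL };
    size_t n = 0;
    int c;

    assert(out != NULL && max > 0);
    while (n < max && (c = toy_getopt(&d, argc, argv, init_on_one)) != -1)
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

In this model, scanning { "prog", "-ab" } with the existing condition
yields the two letters and stops, while the proposed condition keeps
re-initializing at optind == 1 and returns the first letter until the
bound is reached.  Whether the real function behaves the same way
would need to be confirmed against getopt.c.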

Users of gengetopt have reported to me that, with other
implementations of getopt_long, setting optind = 0 causes the program
name to be interpreted as an option as well (since it is at position 0
in argv).  Although odd, this is the more obvious behavior, since
optind "is the index of the next element of the ARGV array to be
processed".

Thus setting optind to 0 before the initial invocation makes GNU
gengetopt generate code that would work only with the GNU
implementation of getopt, due to a feature that, as far as I
understand, is neither standard nor documented (and is thus allowed to
change in the future, breaking existing code relying on it)...

or am I missing something?

Otherwise I guess the above proposed modification is correct.
Comment 1 Ulrich Drepper 2006-05-24 21:37:55 UTC
What "other implementations of getopt_long" do is irrelevant.  This is a GNU
extension and whatever others implemented is a derivative.  File bugs with those
implementations.

I see no problem with the existing code.  Just use optind the way it is
required.  If you think some documentation is missing, provide a patch.
Comment 2 Lorenzo Bettini 2006-05-25 07:12:27 UTC
(In reply to comment #1)
> What "other implementations of getopt_long" do is irrelevant.  This is a GNU
> extension and whatever others implemented is a derivative.  File bugs with those
> implementations.
> 
> I see no problem with the existing code.  Just use optind the way it is
> required.  If you think some documentation is missing, provide a patch.

All I'm saying is that, as I understand it, the standard requires optind to be 1
at the beginning; I was wondering why (optind == 1) is not taken as the
initialization condition...  Is there a reason for using (optind == 0)?