Bug 5786 - sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23
Summary: sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: unspecified
Importance: P2 minor
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-02-22 15:38 UTC by Michael Kerrisk
Modified: 2014-07-02 07:03 UTC
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments

Description Michael Kerrisk 2008-02-22 15:38:33 UTC
In Linux 2.6.23, the limit on the size of [argv + environ] became controllable
by the user, based on the RLIMIT_STACK resource limit.  (See the man page
excerpt below.)  Glibc doesn't seem to know this yet -- I'm not even sure
whether it can be made to know it.  Anyway, as things stand, the return value of
sysconf(_SC_ARG_MAX) is no longer accurate (unless I've missed something), and
this report is a heads up on that point.

From execve.2

   Limits on size of arguments and environment
       Most Unix implementations impose some limit on the  total
       size  of the command-line argument (argv) and environment
       (envp) strings that may  be  passed  to  a  new  program.
       POSIX.1  allows an implementation to advertise this limit
       using the ARG_MAX constant (either defined in  <limits.h>
       or    available    at    run    time   using   the   call
       sysconf(_SC_ARG_MAX)).

       On Linux prior to kernel 2.6.23, the memory used to store
       the  environment  and  argument strings was limited to 32
       pages (defined by the kernel constant MAX_ARG_PAGES).  On
       architectures  with a 4-kB page size, this yields a maxi-
       mum size of 128 kB.

       On kernel 2.6.23 and later, most architectures support  a
       size  limit  derived  from the soft RLIMIT_STACK resource
       limit (see getrlimit(2)) that is in force at the time  of
       the  execve()  call.   For these architectures, the total
       size is limited to 1/4 of the  allowed  stack  size,  the
       limit  per  string  is  32  pages  (the  kernel  constant
       MAX_ARG_STRLEN), and the maximum  number  of  strings  is
       0x7FFFFFFF.   (This change allows programs to have a much
       larger argument and/or environment  list.   Imposing  the
       1/4-limit  ensures  that  the new program always has some
       stack space.)  Architectures with  no  memory  management
       unit  are  excepted:  they maintain the limit that was in
       effect before kernel 2.6.23.
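
A minimal sketch of the two rules in the excerpt above, for a caller that wants to check an argument list before calling execve(). The function names are hypothetical, and the assumption that each string is counted with its terminating NUL is mine, not something the excerpt states; the kernel may account for more than the strings alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>

/* Sum the bytes (including each terminating NUL) of a NULL-terminated
   string vector, and track the longest single string in *longest. */
static size_t vector_bytes(char *const vec[], size_t *longest)
{
    size_t total = 0;

    assert(longest != NULL);
    for (size_t i = 0; vec != NULL && vec[i] != NULL; i++) {
        size_t len = strlen(vec[i]) + 1;
        if (len > *longest)
            *longest = len;
        total += len;
    }
    return total;
}

/* Hypothetical helper: returns 1 if argv + envp appear to fit under the
   rules quoted above, 0 if they do not, -1 if the limits are unreadable. */
int args_probably_fit(char *const argv[], char *const envp[])
{
    struct rlimit stack;
    long page = sysconf(_SC_PAGESIZE);
    size_t longest = 0;

    if (page <= 0 || getrlimit(RLIMIT_STACK, &stack) != 0)
        return -1;

    size_t total = vector_bytes(argv, &longest) + vector_bytes(envp, &longest);

    /* Per-string limit: 32 pages (MAX_ARG_STRLEN in the excerpt). */
    if (longest > 32 * (size_t)page)
        return 0;

    /* Total limit: one quarter of the soft RLIMIT_STACK, when finite. */
    if (stack.rlim_cur != RLIM_INFINITY && total > stack.rlim_cur / 4)
        return 0;

    return 1;
}
```

A result of 1 is only an estimate; execve() can still fail with E2BIG if the kernel counts more than this sketch does.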
Comment 1 Carlos O'Donell 2008-02-23 00:28:11 UTC
Interesting change, this is nice for programs that don't use libiberty's @file
support.

This change makes _SC_ARG_MAX variable over the life of the program, similar to
_SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will
need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return
different values before and after a call to..."

We can immediately add support for calling getrlimit to compute the result of
sysconf(_SC_ARG_MAX), conditional on `__LINUX_KERNEL_VERSION >= 0x020617' (>=
2.6.23) i.e. the minimum kernel version supported by this glibc is 2.6.23.
Otherwise sysconf(_SC_ARG_MAX) must continue to return ARG_MAX, which is less
accurate but still correct.
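
A rough sketch of that first option, assuming nothing about how glibc would really structure it. MIN_KERNEL_VERSION is a hypothetical stand-in for the build-time check on __LINUX_KERNEL_VERSION, LEGACY_ARG_MAX assumes the old 32-pages-of-4-kB value, and arg_max_estimate() is an illustrative name only.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/* Assumption: the pre-2.6.23 limit of 32 pages of 4 kB each. */
#define LEGACY_ARG_MAX 131072L

/* Hypothetical stand-in for the minimum kernel version glibc is built for. */
#ifndef MIN_KERNEL_VERSION
#define MIN_KERNEL_VERSION 0x020617 /* 2.6.23 */
#endif

static_assert(LEGACY_ARG_MAX >= 4096, "POSIX requires ARG_MAX of at least 4096");

/* Illustrative only: derive the value from the soft RLIMIT_STACK when the
   minimum supported kernel is 2.6.23 or later, else keep the old constant. */
long arg_max_estimate(void)
{
#if MIN_KERNEL_VERSION >= 0x020617
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) == 0 && lim.rlim_cur != RLIM_INFINITY) {
        rlim_t quarter = lim.rlim_cur / 4;
        /* Clamp so the result always fits in sysconf's long return type. */
        return quarter > (rlim_t)LONG_MAX ? LONG_MAX : (long)quarter;
    }
#endif
    return LEGACY_ARG_MAX;
}
```

Note that this sketch returns the bare quarter even when it is smaller than LEGACY_ARG_MAX, which is exactly the case the question below about a small RLIMIT_STACK is worried about.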

The alternative is to add a new RLIMIT_* resource. Glibc may call getrlimit to
see if that is set (the kernel would take care to compute the right value),
return that for sysconf(_SC_ARG_MAX), otherwise ARG_MAX. This code would be
enabled if you were building against headers that defined the new RLIMIT_* resource.

What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that 
RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it
doesn't honour providing ARG_MAX space.

Are you interested in helping implement this change in glibc?

Are you working with someone on the kernel side?
Comment 2 michael.kerrisk@googlemail.com 2008-02-23 07:52:08 UTC
Subject: Re:  sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

On 23 Feb 2008 00:28:12 -0000, carlos at codesourcery dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
>  ------- Additional Comments From carlos at codesourcery dot com  2008-02-23 00:28 -------
>  Interesting change, this is nice for programs that don't use libiberty's @file
>  support.
>
>  This change makes _SC_ARG_MAX variable over the life of the program, similar to
>  _SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will
>  need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return
>  different values before and after a call to..."

I don't think this is true.  Please read the text that I wrote for the
man page.  The limit is determined by the RLIMIT_STACK value that is
in force **at the time of the execve()**.  Thereafter, it is
invariant.

>  We can immediately add support for calling getrlimit to compute the result of
>  sysconf(_SC_ARG_MAX), conditional on `__LINUX_KERNEL_VERSION >= 0x020617' (>=
>  2.6.23) i.e. the minimum kernel version supported by this glibc is 2.6.23.
>  Otherwise sysconf(_SC_ARG_MAX) must continue to return ARG_MAX, less than
>  accurate, but still correct.
>
>  The alternative is to add a new RLIMIT_* resource. Glibc may call getrlimit to
>  see if that is set (the kernel would take care to compute the right value),
>  return that for sysconf(_SC_ARG_MAX), otherwise ARG_MAX. This code would be
>  enabled if you were building against headers that defined the new RLIMIT_* resource.
>
>  What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that
>  RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it
>  doesn't honour providing ARG_MAX space.

POSIX.1 says ARG_MAX must only be at least 4096.  That's all the
kernel must honour.  I haven't actually checked whether it does honour
that though.

>  Are you interested in helping implement this change in glibc?
>
>  Are you working with someone on the kernel side?

I'm the man-pages maintainer.  While I'd like to help, three weeks ago
I became a father, and will have very few available cycles for the
next 6 weeks or more.  What I do have will be entirely given over to
man pages.  From April or so, I'd have time to help -- but I'd guess
you want to do things faster.

Cheers,

Michael
Comment 3 Carlos O'Donell 2008-02-25 17:13:35 UTC
Subject: Re:  sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

michael dot kerrisk at googlemail dot com wrote:
>>  This change makes _SC_ARG_MAX variable over the life of the program, similar to
>>  _SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will
>>  need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return
>>  different values before and after a call to..."
> 
> I don't think this is true.  Please read the text that I wrote for the
> man page.  The limit is determined by the RLIMIT_STACK value that is
> in force **at the time of the execve()**.  Thereafter, it is
> invariant.

The standard requires that the return value of sysconf(_SC_ARG_MAX) 
remain invariant over the lifetime of the calling process, and execve 
doesn't make a new process, instead it overlays a new process image. 
Note that the pid and resource limits are inherited.

Consider the following scenario:

1. At startup the application calls sysconf(_SC_ARG_MAX) to compute how 
many arguments it may pass to execve.

2. The application, in the course of running, calls setrlimit with a 
lower RLIMIT_STACK.

3. The application calls execve.

Expected behaviour:
- Application has at least sysconf(_SC_ARG_MAX) space to pass argv and 
envp to the execve.

New behaviour:
- There may not be enough room to pass those parameters?

If we allow the value to change over the lifetime of a process then the 
wording of the standard should be updated.
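
The scenario can be sketched as follows, assuming the "one quarter of the soft RLIMIT_STACK" rule from the man page excerpt. The helper names are hypothetical, and the code only computes the rule-of-thumb value around a setrlimit() call; it does not call execve().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Quarter of the soft RLIMIT_STACK in bytes, or -1 if it is unlimited. */
static long long stack_quarter(const struct rlimit *lim)
{
    if (lim->rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)(lim->rlim_cur / 4);
}

/* Hypothetical helper: record the derived limit, lower the soft
   RLIMIT_STACK to new_soft bytes, then record the derived limit again.
   Returns 0 on success and -1 if a limit could not be read or set. */
int limit_before_and_after(rlim_t new_soft, long long *before, long long *after)
{
    struct rlimit lim;

    assert(before != NULL && after != NULL);

    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    *before = stack_quarter(&lim);          /* step 1: value seen at startup */

    lim.rlim_cur = new_soft;                /* step 2: lower the soft limit */
    if (setrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;

    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    *after = stack_quarter(&lim);           /* what step 3's execve would see */
    return 0;
}
```

An application that cached the value from step 1 and then sized its argument list to it would overshoot the value in force at step 3 whenever *after is smaller than *before.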

>>  What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that
>>  RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it
>>  doesn't honour providing ARG_MAX space.
> 
> POSIX.1 says ARG_MAX must only be at least 4096.  That's all the
> kernel must honour.  I haven't actually checked whether it does honour
> that though.

That is not all the kernel must honour. The value returned by 
sysconf(_SC_ARG_MAX) shall not be more restrictive than whatever value 
_ARG_MAX had at compile time.

Kernel implementation:

- The kernel does not provide an initial minimum of _ARG_MAX space, see 
fs/exec.c (__bprm_mm_init) where "vma->vm_start = vma->vm_end - 
PAGE_SIZE;" is set. The kernel provides an initial PAGE_SIZE block 
regardless of RLIMIT_STACK, unfortunately this is not enough space.

- The kernel does not maintain a minimum of _ARG_MAX space, see 
fs/exec.c (get_arg_page) where "size > rlim[RLIMIT_STACK].rlim_cur / 4" 
is checked. The kernel should maintain a minimum of _ARG_MAX space.

IMO these are kernel bugs in 2.6.23. Filed.
http://bugzilla.kernel.org/show_bug.cgi?id=10095

In summary:

The kernel should use the value of _ARG_MAX, as defined at kernel 
compile time, as the per-process minimum number of bytes allocated for 
argv and envp, regardless of the RLIMIT_STACK value.

The specification should be changed to indicate that calls to 
setrlimit(RLIMIT_STACK, ...) may change the returned value of 
sysconf(_SC_ARG_MAX).

Add a new resource for getrlimit called "RLIMIT_ARG_MAX" and implement 
this in the kernel to return the value used by the kernel. (This will 
likely return "current->signal->rlim[RLIMIT_STACK].rlim_cur / 4".)

Glibc will return getrlimit(RLIMIT_ARG_MAX,...) if it is available or 
_ARG_MAX as the return value for sysconf(_SC_ARG_MAX).
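
As a sketch of the glibc side of this proposal: RLIMIT_ARG_MAX is hypothetical (it is only proposed here and is not an existing resource), so the code guards it with #ifdef and falls back to a compile-time constant. FALLBACK_ARG_MAX and proposed_sysconf_arg_max() are illustrative names, and the 128 kB value is an assumption based on the old 32-page limit.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/* Assumption: the compile-time ARG_MAX, taken here as 32 pages of 4 kB. */
#define FALLBACK_ARG_MAX 131072L

static_assert(FALLBACK_ARG_MAX >= 4096, "POSIX requires ARG_MAX of at least 4096");

/* Illustrative only: ask the kernel through the proposed resource when the
   headers define it, otherwise return the compile-time constant. */
long proposed_sysconf_arg_max(void)
{
#ifdef RLIMIT_ARG_MAX /* hypothetical resource, not defined by current headers */
    struct rlimit lim;

    if (getrlimit(RLIMIT_ARG_MAX, &lim) == 0
        && lim.rlim_cur != RLIM_INFINITY
        && lim.rlim_cur <= (rlim_t)LONG_MAX)
        return (long)lim.rlim_cur;
#endif
    return FALLBACK_ARG_MAX;
}
```

With headers that lack the proposed resource, the function compiles down to returning the constant, which matches the "otherwise ARG_MAX" half of the proposal.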

Comments?
Comment 4 michael.kerrisk@googlemail.com 2008-02-26 09:53:37 UTC
Subject: Re:  sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

On 25 Feb 2008 17:13:36 -0000, carlos at codesourcery dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
>  ------- Additional Comments From carlos at codesourcery dot com  2008-02-25 17:13 -------
>
> Subject: Re:  sysconf(_SC_ARG_MAX) no longer accurate since
>   Linux kernel 2.6.23
>
>
> michael dot kerrisk at googlemail dot com wrote:
>  >>  This change makes _SC_ARG_MAX variable over the life of the program, similar to
>  >>  _SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will
>  >>  need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return
>  >>  different values before and after a call to..."
>  >
>  > I don't think this is true.  Please read the text that I wrote for the
>  > man page.  The limit is determined by the RLIMIT_STACK value that is
>  > in force **at the time of the execve()**.  Thereafter, it is
>  > invariant.
>
>  The standard requires that the return value of sysconf(_SC_ARG_MAX)
>  remain invariant over the lifetime of the calling process, and execve
>  doesn't make a new process, instead it overlays a new process image.

Doh!  Yes, of course you are right!

>  Note that the pid and resource limits are inherited.
>
>  Consider the following scenario:
>
>  1. At startup the application calls sysconf(_SC_ARG_MAX) to compute how
>  many arguments it may pass to execve.
>
>  2. The application, in the course of running, calls setrlimit with a
>  lower RLIMIT_STACK.
>
>  3. The application calls execve.
>
>  Expected behaviour:
>  - Application has at least sysconf(_SC_ARG_MAX) space to pass argv and
>  envp to the execve.
>
>  New behaviour:
>  - There may not be enough room to pass those parameters?

Agreed.

>  If we allow the value to change over the lifetime of a process then the
>  wording of the standard should be updated.

Well, I suppose it could be worth trying to see whether that change
would make it through the standards process.

>  >>  What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that
>  >>  RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it
>  >>  doesn't honour providing ARG_MAX space.
>  >
>  > POSIX.1 says ARG_MAX must only be at least 4096.  That's all the
>  > kernel must honour.  I haven't actually checked whether it does honour
>  > that though.
>
>  That is not all the kernel must honour. The value returned by
>  sysconf(_SC_ARG_MAX) shall not be more restrictive than whatever value
>  _ARG_MAX had at compile time.
>
>  Kernel implementation:
>
>  - The kernel does not provide an initial minimum of _ARG_MAX space, see
>  fs/exec.c (__bprm_mm_init) where "vma->vm_start = vma->vm_end -
>  PAGE_SIZE;" is set. The kernel provides an initial PAGE_SIZE block
>  regardless of RLIMIT_STACK, unfortunately this is not enough space.

Yes, but I'm not sure that we can say that the kernel is advertising a
particular value for ARG_MAX.  Yes, there is a definition in
include/linux/limits.h, but it was never used in the kernel sources as
far as I can see.  Being weaselly, I believe the header file could
equally be amended to say

#define ARG_MAX 4096

>  - The kernel does not maintain a minimum of _ARG_MAX space, see
>  fs/exec.c (get_arg_page) where "size > rlim[RLIMIT_STACK].rlim_cur / 4"
>  is checked. The kernel should maintain a minimum of _ARG_MAX space.
>
>  IMO these are kernel bugs in 2.6.23. Filed.
>  http://bugzilla.kernel.org/show_bug.cgi?id=10095

Ahh -- only just read that now.   I see Peter saying some of the same
things as me, but I don't know that I agree with all he says.

>  In summary:
>
>  The kernel should use the value of _ARG_MAX, as defined at kernel
>  compile time, as the per-process minimum number of bytes allocated for
>  argv and envp, regardless of the RLIMIT_STACK value.

As I say, the kernel folk could just redefine ARG_MAX as 4096.

>  The specification should be changed to indicate that calls to
>  setrlimit(RLIMIT_STACK, ...) may change the returned value of
>  sysconf(_SC_ARG_MAX).

As I think about this more, it seems ugly.  The real problem is that
RLIMIT_STACK should probably not have been overloaded to also be used
for controlling ARG_MAX.  That's a bit of a hack, and I'd suspect that
the POSIX folks would (rightly) reject it.

>  Add a new resource for getrlimit called "RLIMIT_ARG_MAX" and implement
>  this in the kernel to return the value used by the kernel. (This will
>  likely return "current->signal->rlim[RLIMIT_STACK].rlim_cur / 4".)

Is your meaning here, that the RLIMIT_ARG_MAX limit would be
read-only, returning a value based on RLIMIT_STACK?  That is not
consistent with the semantics of other rlimits.

Cheers,

Michael

>  Glibc will return getrlimit(RLIMIT_ARG_MAX,...) if it is available or
>  _ARG_MAX as the return value for sysconf(_SC_ARG_MAX).
>
>  Comments?
Comment 5 Carlos O'Donell 2008-02-26 13:57:15 UTC
I think we are in agreement here:

A. It is worthwhile to recommend a change to POSIX.1, making note that ARG_MAX
is now variable. The exact wording of the change is up for discussion.

Let me clarify the following issues:

1. The kernel must not lower the value of ARG_MAX in include/linux/limits.h.
This would break binary compatibility.

2. I would propose that RLIMIT_ARG_MAX be a read and write value. How the kernel
implements this does not have to be discussed here.

3. glibc would use getrlimit(RLIMIT_ARG_MAX, &lim); to determine if the
currently running kernel supports a variable size of argument and environ space. 

Notes:
- Without (2) and (3) userspace lacks a programmatic way to determine the [argv
+ environ] space limit. Userspace could still probe the size by repeatedly
calling execve and looking for E2BIG errors, but every probe costs a full
execve attempt, so this is slow.
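
A sketch of such a probe, to show why it is expensive: every step forks a child and attempts a full execve. It assumes /bin/sh exists and accepts "-c :", pads the argument list with 1 kB strings (CHUNK is an arbitrary choice), and uses hypothetical function names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHUNK 1024 /* bytes per padding argument, including its NUL */

/* Try to exec /bin/sh with about total_bytes of padding arguments.
   Returns 1 if the exec worked, 0 if it failed with E2BIG, -1 otherwise. */
int exec_accepts(size_t total_bytes)
{
    size_t n = total_bytes / CHUNK;
    char *pad = malloc(CHUNK);
    char **argv = calloc(n + 4, sizeof *argv);
    int result = -1;

    if (pad == NULL || argv == NULL) {
        free(pad);
        free(argv);
        return -1;
    }
    memset(pad, 'x', CHUNK - 1);
    pad[CHUNK - 1] = '\0';
    argv[0] = "sh";
    argv[1] = "-c";
    argv[2] = ":";              /* shell no-op, exits with status 0 */
    for (size_t i = 0; i < n; i++)
        argv[3 + i] = pad;      /* the kernel copies each string separately */
    argv[3 + n] = NULL;

    pid_t pid = fork();
    if (pid == 0) {
        char *envp[] = { NULL };
        execve("/bin/sh", argv, envp);
        _exit(errno == E2BIG ? 100 : 101); /* only reached if execve failed */
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            int code = WEXITSTATUS(status);
            result = code == 0 ? 1 : (code == 100 ? 0 : -1);
        }
    }
    free(argv);
    free(pad);
    return result;
}

/* Bisect between lo (assumed accepted) and hi (assumed rejected); the
   result is within CHUNK bytes of the boundary under those assumptions. */
size_t probe_limit(size_t lo, size_t hi)
{
    assert(lo <= hi);
    while (hi - lo > CHUNK) {
        size_t mid = lo + (hi - lo) / 2;
        if (exec_accepts(mid) == 1)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

Bisecting a range of a few megabytes this way takes on the order of a dozen fork-and-exec round trips, and the answer only holds for the RLIMIT_STACK in force while probing.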
Comment 6 michael.kerrisk@googlemail.com 2008-02-26 14:33:08 UTC
Subject: Re:  sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

On 26 Feb 2008 13:57:17 -0000, carlos at codesourcery dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
>  ------- Additional Comments From carlos at codesourcery dot com  2008-02-26 13:57 -------
>  I think we are in agreement here:
>
>  A. It is worthwhile to recommend a change to POSIX.1, making note that ARG_MAX
>  is now variable. The exact wording of the change is up for discussion.
>
>  Let me clarify the following issues:
>
>  1. The kernel must not lower the value of ARG_MAX in include/linux/limits.h.
>  This would break binary compatibility.

I'm inclined to agree.

>  2. I would propose that RLIMIT_ARG_MAX be a read and write value. How the kernel
>  implements this does not have to be discussed here.

Sounds fine.  The only possible objection would be that we are changing
the ABI that was put in place in 2.6.23.  But I'm not sure how much
that really matters.

>  3. glibc would use getrlimit(RLIMIT_ARG_MAX, &lim); to determine if the
>  currently running kernel supports a variable size of argument and environ space.

Sounds okay.

>  Notes:
>  - Without (2) and (3) userspace lacks a programmatic way to determine the [argv
>  + environ] space limit. Userspace could still probe the size by repeatedly
>  calling execve and looking for E2BIG errors, unfortunately there are performance
>  considerations.

Agreed.
Comment 7 Ulrich Drepper 2008-03-08 07:40:06 UTC
I've checked in a patch.