In Linux 2.6.23, the limit on the size of [argv + environ] became controllable by the user, based on the RLIMIT_STACK resource limit (see the man page excerpt below). Glibc doesn't seem to know this yet -- I'm not even sure whether it can be made to know it. Anyway, as things stand, the return value of sysconf(_SC_ARG_MAX) is no longer accurate (unless I've missed something), and this report is a heads-up on that point.

From execve.2:

   Limits on size of arguments and environment
       Most Unix implementations impose some limit on the total size of the command-line argument (argv) and environment (envp) strings that may be passed to a new program. POSIX.1 allows an implementation to advertise this limit using the ARG_MAX constant (either defined in <limits.h> or available at run time using the call sysconf(_SC_ARG_MAX)).

       On Linux prior to kernel 2.6.23, the memory used to store the environment and argument strings was limited to 32 pages (defined by the kernel constant MAX_ARG_PAGES). On architectures with a 4-kB page size, this yields a maximum size of 128 kB.

       On kernel 2.6.23 and later, most architectures support a size limit derived from the soft RLIMIT_STACK resource limit (see getrlimit(2)) that is in force at the time of the execve() call. For these architectures, the total size is limited to 1/4 of the allowed stack size, the limit per string is 32 pages (the kernel constant MAX_ARG_STRLEN), and the maximum number of strings is 0x7FFFFFFF. (This change allows programs to have a much larger argument and/or environment list. Imposing the 1/4-limit ensures that the new program always has some stack space.) Architectures with no memory management unit are excepted: they maintain the limit that was in effect before kernel 2.6.23.
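The three limits in the excerpt above (total size, per-string size, string count) can be expressed as a pre-flight check a program might run before calling execve(). This is a minimal sketch, not kernel code: `args_fit()` and `tally()` are hypothetical helpers, and they assume the kernel counts only the string bytes including each terminating NUL; the kernel's real accounting may differ in detail.

```c
#include <stddef.h>
#include <string.h>
#include <sys/resource.h>

/* Limits as described in the execve(2) excerpt for 2.6.23+ kernels with
   an MMU. Counting only string bytes (incl. NUL) is an assumption. */
#define MAX_ARG_STRLEN_PAGES 32UL
#define MAX_ARG_STRINGS      0x7FFFFFFFUL

/* Add the sizes of one NULL-terminated vector to *total and *count.
   Returns 0 as soon as a string exceeds per_string bytes, else 1. */
static int tally(char *const vec[], size_t per_string,
                 size_t *total, size_t *count)
{
    for (size_t i = 0; vec != NULL && vec[i] != NULL; i++) {
        size_t len = strlen(vec[i]) + 1;   /* include the terminating NUL */
        if (len > per_string)
            return 0;
        *total += len;
        *count += 1;
    }
    return 1;
}

/* Returns 1 if argv/envp look acceptable to execve() under the rules in
   the excerpt, given the soft RLIMIT_STACK (stack_cur) and page size. */
int args_fit(char *const argv[], char *const envp[],
             rlim_t stack_cur, size_t page_size)
{
    size_t per_string = MAX_ARG_STRLEN_PAGES * page_size;
    size_t total = 0, count = 0;

    if (!tally(argv, per_string, &total, &count) ||
        !tally(envp, per_string, &total, &count))
        return 0;
    if (count > MAX_ARG_STRINGS)
        return 0;
    if (stack_cur == RLIM_INFINITY)       /* no stack-derived cap in this sketch */
        return 1;
    return total <= stack_cur / 4;        /* "1/4 of the allowed stack size" */
}
```

For example, with a soft RLIMIT_STACK of 8 MB the rule permits up to 2 MB of strings, far above the old fixed 128 kB.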
Interesting change; this is nice for programs that don't use libiberty's @file support.

This change makes _SC_ARG_MAX variable over the life of the program, similar to _SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return different values before and after a call to..."

We can immediately add support for calling getrlimit to compute the result of sysconf(_SC_ARG_MAX), conditional on `__LINUX_KERNEL_VERSION >= 0x020617' (>= 2.6.23), i.e. when the minimum kernel version supported by this glibc is 2.6.23. Otherwise sysconf(_SC_ARG_MAX) must continue to return ARG_MAX: less than accurate, but still correct.

The alternative is to add a new RLIMIT_* resource. Glibc may call getrlimit to see if that is set (the kernel would take care to compute the right value) and return that for sysconf(_SC_ARG_MAX), otherwise ARG_MAX. This code would be enabled if you were building against headers that defined the new RLIMIT_* resource.

What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it doesn't honour providing ARG_MAX space.

Are you interested in helping implement this change in glibc? Are you working with someone on the kernel side?
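The first option can be sketched as follows. This is an illustration under stated assumptions, not glibc code: `sc_arg_max_for()`, `sc_arg_max_sketch()` and `LEGACY_ARG_MAX` are hypothetical names, the compile-time `__LINUX_KERNEL_VERSION` test is modelled as a parameter (same 0x020617 encoding) so the logic can be exercised, and 131072 is the 32 pages of 4 kB from the man page excerpt.

```c
#include <limits.h>
#include <sys/resource.h>

/* Pre-2.6.23 Linux limit: 32 pages of 4 KiB. Hypothetical name. */
#define LEGACY_ARG_MAX 131072L

/* Pure part of the proposal. kver uses the __LINUX_KERNEL_VERSION
   encoding (0x020617 == 2.6.23); stack_cur is the soft RLIMIT_STACK. */
long sc_arg_max_for(unsigned long kver, rlim_t stack_cur)
{
    if (kver < 0x020617UL)
        return LEGACY_ARG_MAX;            /* old kernels: fixed limit */
    if (stack_cur == RLIM_INFINITY || stack_cur / 4 > (rlim_t) LONG_MAX)
        return LONG_MAX;                  /* clamp to what a long can hold */
    return (long) (stack_cur / 4);        /* 1/4 of the soft stack limit */
}

/* Wrapper that reads the current soft RLIMIT_STACK. */
long sc_arg_max_sketch(unsigned long kver)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return LEGACY_ARG_MAX;            /* less accurate, but still correct */
    return sc_arg_max_for(kver, rl.rlim_cur);
}
```

The numbers make the 512 kB question concrete: a 512 kB soft limit gives exactly 131072 bytes, the break-even point, while 256 kB gives only 65536, below the old fixed value.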
Subject: Re: sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

On 23 Feb 2008 00:28:12 -0000, carlos at codesourcery dot com <sourceware-bugzilla@sourceware.org> wrote:
> Interesting change, this is nice for programs that don't use libiberty's @file
> support.
>
> This change makes _SC_ARG_MAX variable over the life of the program, similar to
> _SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will
> need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return
> different values before and after a call to..."

I don't think this is true. Please read the text that I wrote for the man page. The limit is determined by the RLIMIT_STACK value that is in force **at the time of the execve()**. Thereafter, it is invariant.

> We can immediately add support for calling getrlimit to compute the result of
> sysconf(_SC_ARG_MAX), conditional on `__LINUX_KERNEL_VERSION >= 0x020617' (>=
> 2.6.23) i.e. the minimum kernel version supported by this glibc is 2.6.23.
> Otherwise sysconf(_SC_ARG_MAX) must continue to return ARG_MAX, less than
> accurate, but still correct.
>
> The alternative is to add a new RLIMIT_* resource. Glibc may call getrlimit to
> see if that is set (the kernel would take care to compute the right value),
> return that for sysconf(_SC_ARG_MAX), otherwise ARG_MAX. This code would be
> enabled if you were building against headers that defined the new RLIMIT_* resource.
>
> What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that
> RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it
> doesn't honour providing ARG_MAX space.

POSIX.1 says ARG_MAX must only be at least 4096. That's all the kernel must honour. I haven't actually checked whether it does honour that, though.

> Are you interested in helping implement this change in glibc?
>
> Are you working with someone on the kernel side?

I'm the man-pages maintainer. While I'd like to help, three weeks ago I became a father, and I will have very few available cycles for the next 6 weeks or more. What I do have will be entirely given over to man pages. From April or so, I'd have time to help -- but I'd guess you want to move faster.

Cheers,

Michael
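The POSIX.1 floor mentioned above can be checked on a given system. This is a minimal sketch, assuming a POSIX environment: `_POSIX_ARG_MAX` from <limits.h> is the POSIX.1 minimum of 4096, and the `#ifdef ARG_MAX` branch applies the stricter condition that the run-time value not fall below a compile-time ARG_MAX, but only where the headers still define one (POSIX.1 allows <limits.h> to omit ARG_MAX when the value is not known at compile time). `arg_max_meets_floors()` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Returns 1 if sysconf(_SC_ARG_MAX) is at least the POSIX.1 minimum and,
   where the headers define ARG_MAX, at least that compile-time value. */
int arg_max_meets_floors(void)
{
    errno = 0;
    long rt = sysconf(_SC_ARG_MAX);
    if (rt == -1)
        return errno == 0;      /* -1 with errno unchanged: no definite limit */

    if (rt < _POSIX_ARG_MAX)    /* POSIX.1 minimum: 4096 */
        return 0;
#ifdef ARG_MAX
    if (rt < ARG_MAX)           /* stricter reading: not below compile-time value */
        return 0;
#endif
    return 1;
}
```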
Subject: Re: sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

michael dot kerrisk at googlemail dot com wrote:
>> This change makes _SC_ARG_MAX variable over the life of the program, similar to
>> _SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will
>> need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return
>> different values before and after a call to..."
>
> I don't think this is true. Please read the text that I wrote for the
> man page. The limit is determined by the RLIMIT_STACK value that is
> in force **at the time of the execve()**. Thereafter, it is
> invariant.

The standard requires that the return value of sysconf(_SC_ARG_MAX) remain invariant over the lifetime of the calling process, and execve doesn't create a new process; instead, it overlays a new process image. Note that the pid and resource limits are inherited.

Consider the following scenario:

1. At startup the application calls sysconf(_SC_ARG_MAX) to compute how many arguments it may pass to execve.
2. The application, in the course of running, calls setrlimit with a lower RLIMIT_STACK.
3. The application calls execve.

Expected behaviour:
- The application has at least sysconf(_SC_ARG_MAX) bytes of space to pass argv and envp to execve.

New behaviour:
- There may not be enough room to pass those parameters?

If we allow the value to change over the lifetime of a process, then the wording of the standard should be updated.

>> What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that
>> RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it
>> doesn't honour providing ARG_MAX space.
>
> POSIX.1 says ARG_MAX must only be at least 4096. That's all the
> kernel must honour. I haven't actually checked whether it does honour
> that though.

That is not all the kernel must honour. The value returned by sysconf(_SC_ARG_MAX) shall not be more restrictive than whatever value _ARG_MAX had at compile time.
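The scenario can be reproduced without an actual execve() by comparing the stack-derived limit before and after step 2. This is a sketch only: it assumes the 1/4-of-soft-RLIMIT_STACK rule from the execve(2) excerpt, `quarter()` and `stale_arg_limit_demo()` are hypothetical helpers, and unlike the application in the scenario it restores the original limit before returning.

```c
#include <sys/resource.h>

/* Stack-derived argument limit per the execve(2) excerpt: 1/4 of soft limit. */
static rlim_t quarter(rlim_t soft)
{
    return soft == RLIM_INFINITY ? RLIM_INFINITY : soft / 4;
}

/* Steps 1-3 of the scenario, minus the execve(): record the limit at
   "startup", lower the soft RLIMIT_STACK, then record the limit an
   execve() would now face. Returns 1 if it shrank, 0 if not, -1 on error. */
int stale_arg_limit_demo(rlim_t lowered_soft, rlim_t *before, rlim_t *after)
{
    struct rlimit orig, lowered, now;

    if (getrlimit(RLIMIT_STACK, &orig) != 0)
        return -1;
    *before = quarter(orig.rlim_cur);            /* step 1: the cached value */

    lowered = orig;
    lowered.rlim_cur = lowered_soft;             /* must not exceed rlim_max */
    if (setrlimit(RLIMIT_STACK, &lowered) != 0)  /* step 2 */
        return -1;

    int ok = getrlimit(RLIMIT_STACK, &now) == 0;
    if (ok)
        *after = quarter(now.rlim_cur);          /* step 3: what execve() sees */

    setrlimit(RLIMIT_STACK, &orig);              /* undo step 2 for this demo */
    if (!ok)
        return -1;
    return *after < *before;
}
```

With a typical 8 MB soft limit, lowering it to 256 kB takes the figure from 2097152 to 65536 bytes, so a value cached in step 1 overstates what execve() will accept in step 3.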
Kernel implementation:

- The kernel does not provide an initial minimum of _ARG_MAX space; see fs/exec.c (__bprm_mm_init), where "vma->vm_start = vma->vm_end - PAGE_SIZE;" is set. The kernel provides an initial PAGE_SIZE block regardless of RLIMIT_STACK; unfortunately, this is not enough space.

- The kernel does not maintain a minimum of _ARG_MAX space; see fs/exec.c (get_arg_page), where "size > rlim[RLIMIT_STACK].rlim_cur / 4" is checked. The kernel should maintain a minimum of _ARG_MAX space.

IMO these are kernel bugs in 2.6.23. Filed:
http://bugzilla.kernel.org/show_bug.cgi?id=10095

In summary:

The kernel should use the value of _ARG_MAX, as defined at kernel compile time, as the per-process minimum number of bytes allocated for argv and envp, regardless of the RLIMIT_STACK value.

The specification should be changed to indicate that calls to setrlimit(RLIMIT_STACK, ...) may change the returned value of sysconf(_SC_ARG_MAX).

Add a new resource for getrlimit called "RLIMIT_ARG_MAX" and implement it in the kernel to return the value used by the kernel (this will likely be "current->signal->rlim[RLIMIT_STACK].rlim_cur / 4"). Glibc will return getrlimit(RLIMIT_ARG_MAX, ...) if it is available, or _ARG_MAX otherwise, as the return value for sysconf(_SC_ARG_MAX).

Comments?
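The proposed sysconf(_SC_ARG_MAX) logic could look roughly like this. RLIMIT_ARG_MAX is only a proposal in this thread, not an existing resource, so with real headers the `#ifdef` branch is compiled out and the sketch always returns the fallback. `proposed_sc_arg_max()` and `FALLBACK_ARG_MAX` (standing in for _ARG_MAX, set here to the pre-2.6.23 value of 131072) are hypothetical names.

```c
#include <sys/resource.h>

/* Stand-in for _ARG_MAX in the proposal: the pre-2.6.23 Linux limit of
   32 pages x 4 KiB. Hypothetical name. */
#define FALLBACK_ARG_MAX 131072L

/* Sketch of the proposed sysconf(_SC_ARG_MAX). RLIMIT_ARG_MAX is the
   resource proposed in this thread and is NOT defined by real headers,
   so in practice only the fallback path is compiled. */
long proposed_sc_arg_max(void)
{
#ifdef RLIMIT_ARG_MAX
    struct rlimit lim;
    /* On a kernel without the resource, getrlimit() would fail (EINVAL). */
    if (getrlimit(RLIMIT_ARG_MAX, &lim) == 0) {
        if (lim.rlim_cur == RLIM_INFINITY)
            return -1;                  /* sysconf's "no definite limit" */
        return (long) lim.rlim_cur;     /* value computed by the kernel */
    }
#endif
    return FALLBACK_ARG_MAX;
}
```

The `#ifdef` covers building against headers that lack the resource; checking getrlimit()'s return value covers running on a kernel that lacks it, since Linux getrlimit() fails with EINVAL for an unknown resource.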
Subject: Re: sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

On 25 Feb 2008 17:13:36 -0000, carlos at codesourcery dot com <sourceware-bugzilla@sourceware.org> wrote:
> michael dot kerrisk at googlemail dot com wrote:
> >> This change makes _SC_ARG_MAX variable over the life of the program, similar to
> >> _SC_OPEN_MAX (after a call to setrlimit with RLIMIT_NOFILE). The standard will
> >> need to be changed, as it was changed for _SC_OPEN_MAX, to say "...may return
> >> different values before and after a call to..."
> >
> > I don't think this is true. Please read the text that I wrote for the
> > man page. The limit is determined by the RLIMIT_STACK value that is
> > in force **at the time of the execve()**. Thereafter, it is
> > invariant.
>
> The standard requires that the return value of sysconf(_SC_ARG_MAX)
> remain invariant over the lifetime of the calling process, and execve
> doesn't make a new process, instead it overlays a new process image.

Doh! Yes, of course you are right!

> Note that the pid and resource limits are inherited.
>
> Consider the following scenario:
>
> 1. At startup the application calls sysconf(_SC_ARG_MAX) to compute how
> many arguments it may pass to execve.
>
> 2. The application, in the course of running, calls setrlimit with a
> lower RLIMIT_STACK.
>
> 3. The application calls execve.
>
> Expected behaviour:
> - Application has at least sysconf(_SC_ARG_MAX) space to pass argv and
> envp to the execve.
>
> New behaviour:
> - There may not be enough room to pass those parameters?

Agreed.

> If we allow the value to change over the lifetime of a process then the
> wording of the standard should be updated.
Well, I suppose it could be worth trying to see whether that change would make it through the standards process.

> >> What happens if you have less than 512 kB of RLIMIT_STACK? A quarter of that
> >> RLIMIT_STACK could be less than ARG_MAX. I would think it a kernel bug if it
> >> doesn't honour providing ARG_MAX space.
> >
> > POSIX.1 says ARG_MAX must only be at least 4096. That's all the
> > kernel must honour. I haven't actually checked whether it does honour
> > that though.
>
> That is not all the kernel must honour. The value returned by
> sysconf(_SC_ARG_MAX) shall not be more restrictive than whatever value
> _ARG_MAX had at compile time.
>
> Kernel implementation:
>
> - The kernel does not provide an initial minimum of _ARG_MAX space, see
> fs/exec.c (__bprm_mm_init) where "vma->vm_start = vma->vm_end -
> PAGE_SIZE;" is set. The kernel provides an initial PAGE_SIZE block
> regardless of RLIMIT_STACK, unfortunately this is not enough space.

Yes, but I'm not sure that we can say that the kernel is advertising a particular value for ARG_MAX. Yes, there is a definition in include/linux/limits.h, but as far as I can see it was never used in the kernel sources. Being weaselly, I believe the header file could equally be amended to say

    #define ARG_MAX 4096

> - The kernel does not maintain a minimum of _ARG_MAX space, see
> fs/exec.c (get_arg_page) where "size > rlim[RLIMIT_STACK].rlim_cur / 4"
> is checked. The kernel should maintain a minimum of _ARG_MAX space.
>
> IMO these are kernel bugs in 2.6.23. Filed.
> http://bugzilla.kernel.org/show_bug.cgi?id=10095

Ahh -- only just read that now. I see Peter saying some of the same things as me, but I don't know that I agree with all he says.

> In summary:
>
> The kernel should use the value of _ARG_MAX, as defined at kernel
> compile time, as the per-process minimum number of bytes allocated for
> argv and envp, regardless of the RLIMIT_STACK value.

As I say, the kernel folk could just redefine ARG_MAX as 4096.
> The specification should be changed to indicate that calls to
> setrlimit(RLIMIT_STACK, ...) may change the returned value of
> sysconf(_SC_ARG_MAX).

As I think about this more, it seems ugly. The real problem is that RLIMIT_STACK should probably not have been overloaded to also be used for controlling ARG_MAX. That's a bit of a hack, and I'd suspect that the POSIX folks would (rightly) reject it.

> Add a new resource for getrlimit called "RLIMIT_ARG_MAX" and implement
> this in the kernel to return the value used by the kernel (This will
> likely return "current->signal->rlim[RLIMIT_STACK].rlim_cur / 4".

Is your meaning here that the RLIMIT_ARG_MAX limit would be read-only, returning a value based on RLIMIT_STACK? That is not consistent with the semantics of other rlimits.

Cheers,

Michael

http://sourceware.org/bugzilla/show_bug.cgi?id=5786
I think we are in agreement here:

A. It is worthwhile to recommend a change to POSIX.1, making note that ARG_MAX is now variable. The exact wording of the change is up for discussion.

Let me clarify the following issues:

1. The kernel must not lower the value of ARG_MAX in include/linux/limits.h. This would break binary compatibility.

2. I would propose that RLIMIT_ARG_MAX be a read-write value. How the kernel implements this does not have to be discussed here.

3. glibc would use getrlimit(RLIMIT_ARG_MAX, &lim); to determine if the currently running kernel supports a variable size of argument and environ space.

Notes:
- Without (2) and (3), userspace lacks a programmatic way to determine the [argv + environ] space limit. Userspace could still probe the size by repeatedly calling execve and looking for E2BIG errors, but that has a performance cost.
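The E2BIG probing approach from the notes can be sketched as a binary search, which also shows where the performance cost comes from: every probe is a full fork() plus execve(). Assumptions: `/bin/true` exists and exits 0, exit status 42 is reserved for the child to report E2BIG, and `exec_fits()`, `probe_arg_limit()` and `PROBE_CHUNK` are hypothetical names. The result is approximate (to within one chunk) and only describes the RLIMIT_STACK in force when the probe runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PROBE_CHUNK 4096   /* bytes per argument string, including the NUL */
#define E2BIG_EXIT  42     /* child exit status meaning "execve gave E2BIG" */

/* Try to execve() prog with about nbytes of argument strings and an empty
   environment. Returns 1 if it ran and exited 0, 0 on E2BIG, -1 otherwise. */
static int exec_fits(const char *prog, size_t nbytes)
{
    size_t nargs = nbytes / PROBE_CHUNK;
    char **argv = calloc(nargs + 2, sizeof *argv);
    char *chunk = malloc(PROBE_CHUNK);
    if (argv == NULL || chunk == NULL) {
        free(argv);
        free(chunk);
        return -1;
    }
    memset(chunk, 'x', PROBE_CHUNK - 1);
    chunk[PROBE_CHUNK - 1] = '\0';
    argv[0] = (char *) prog;
    for (size_t i = 1; i <= nargs; i++)
        argv[i] = chunk;               /* the same string, passed nargs times */
    argv[nargs + 1] = NULL;
    char *envp[] = { NULL };

    pid_t pid = fork();
    if (pid == 0) {
        execve(prog, argv, envp);      /* returns only on failure */
        _exit(errno == E2BIG ? E2BIG_EXIT : 127);
    }
    free(argv);
    free(chunk);
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == E2BIG_EXIT ? 0 : -1;
}

/* Binary search for the largest argument size (to within PROBE_CHUNK)
   that execve() accepts. lo must be known to fit; hi is an upper bound.
   Returns -1 if a probe fails for a reason other than E2BIG. */
long probe_arg_limit(const char *prog, size_t lo, size_t hi)
{
    while (hi - lo > PROBE_CHUNK) {
        size_t mid = lo + (hi - lo) / 2;
        int r = exec_fits(prog, mid);
        if (r < 0)
            return -1;
        if (r)
            lo = mid;
        else
            hi = mid;
    }
    return (long) lo;
}
```

For example, probe_arg_limit("/bin/true", 0, 16 << 20) needs twelve fork()/execve() round trips to narrow a 16 MB range down to one 4 kB chunk, and the answer goes stale as soon as RLIMIT_STACK changes.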
Subject: Re: sysconf(_SC_ARG_MAX) no longer accurate since Linux kernel 2.6.23

On 26 Feb 2008 13:57:17 -0000, carlos at codesourcery dot com <sourceware-bugzilla@sourceware.org> wrote:
> I think we are in agreement here:
>
> A. It is worthwhile to recommend a change to POSIX.1, making note that ARG_MAX
> is now variable. The exact wording of the change is up for discussion.
>
> Let me clarify the following issues:
>
> 1. The kernel must not lower the value of ARG_MAX in include/linux/limits.h.
> This would break binary compatibility.

I'm inclined to agree.

> 2. I would propose that RLIMIT_ARG_MAX be a read and write value. How the kernel
> implements this does not have to be discussed here.

Sounds fine. The only possible objection would be that we are changing the ABI that was put in place in 2.6.23. But I'm not sure how much that really matters.

> 3. glibc would use getrlimit(RLIMIT_ARG_MAX, &lim); to determine if the
> currently running kernel supports a variable size of argument and environ space.

Sounds okay.

> Notes:
> - Without (2) and (3) userspace lacks a programmatic way to determine the [argv
> + environ] space limit. Userspace could still probe the size by repeatedly
> calling execve and looking for E2BIG errors, unfortunately there are performance
> considerations.

Agreed.
I've checked in a patch.