Summary: | Fix use of cpu_set_t with sched_getaffinity when booted on a system with more than 1024 possible cpus. | ||
---|---|---|---|
Product: | glibc | Reporter: | Carlos O'Donell <carlos> |
Component: | libc | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | RESOLVED MOVED | ||
Severity: | normal | CC: | drepper.fsp, fweimer |
Priority: | P2 | Flags: | fweimer: security- |
Version: | 2.18 | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | Test case which reflects the truncation assumption (does not currently work) |
Description
Carlos O'Donell
2013-06-14 19:31:24 UTC
Also needing fixing are pthread_getaffinity_np and pthread_setaffinity_np.

Florian Weimer (comment #2):

Maybe addressing this should also fix this design issue with the cpu_set_t functions: CPU_ALLOC returns a set which is capable of storing more CPUs than the user requested, because the request is rounded up to the next multiple of sizeof (long) * CHAR_BIT CPUs. This is reflected in the return value of CPU_ALLOC_SIZE. As a result, the sched_getaffinity call in

    set = CPU_ALLOC (count);
    sched_getaffinity (0, CPU_ALLOC_SIZE (count), set);

can succeed even if count is smaller than the maximum number of relevant CPUs. This defeats the purpose of the EINVAL error return code in the case where the specified CPU set is not large enough.

Florian Weimer (comment #3):

Additional details: sadly, sysconf(_SC_NPROCESSORS_ONLN) is not the number of CPUs which is relevant to the sched_getaffinity system call. I have a system which reports _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF as 40 (and /proc/cpuinfo and /proc/stat match that), yet calling sched_getaffinity with small size arguments fails:

    [pid 3420] sched_getaffinity(0, 8, 0x146b010) = -1 EINVAL (Invalid argument)
    [pid 3420] sched_getaffinity(0, 16, 0x146b010) = -1 EINVAL (Invalid argument)
    [pid 3420] sched_getaffinity(0, 32, {ffffffffff, 0, 0, 0}) = 32

The kernel seems to operate with an nr_cpu_ids value of 240:

    kernel: setup_percpu: NR_CPUS:5120 nr_cpumask_bits:240 nr_cpu_ids:240 nr_node_ids:2
    kernel: RCU restricting CPUs from NR_CPUS=5120 to nr_cpu_ids=240.

nr_cpu_ids is not directly exposed to user space, I think, so the EINVAL behavior is pretty much required. If other systems have such inflated nr_cpu_ids values, this could become an issue well before systems with 1024 hardware threads become common. My laptop has nr_cpu_ids=8 (4 hardware threads); another server has nr_cpu_ids=144, with the same number of hardware threads. I don't think it is prudent to assume that the nr_cpu_ids value remains constant after boot, either.

So I wonder whether the premise of this bug report (that we can get rid of the EINVAL return value and truncate results) is correct. There is no other way to obtain the magic constant, and even if you grab the number from somewhere, it may be racy with respect to system reconfiguration.

A potential solution would accept smaller size arguments for sched_getaffinity, as long as the other bits on the kernel side are cleared. Tools such as taskset could then restrict the CPU set to the supported CPUs for legacy applications. This is what we do with RLIMIT_NOFILE: it is still 1024 on many systems to prevent issues with the select function and the default FD_SETSIZE value. Unfortunately, this approach is somewhat more restrictive.
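For context, the only portable way user space has today to discover the kernel's mask size is to probe: grow the buffer until sched_getaffinity stops failing with EINVAL. A minimal sketch (not from the bug report; the starting size and the doubling step are arbitrary choices):

    /* Sketch only: probe the kernel's affinity mask size by retrying
       sched_getaffinity with progressively larger buffers, relying on
       the EINVAL behavior described above.  */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <limits.h>
    #include <sched.h>
    #include <stdio.h>

    int
    main (void)
    {
      size_t bytes = sizeof (cpu_set_t);   /* 128 bytes, i.e. 1024 bits.  */

      for (;;)
        {
          cpu_set_t *set = CPU_ALLOC (bytes * CHAR_BIT);
          if (set == NULL)
            return 1;
          if (sched_getaffinity (0, bytes, set) == 0)
            {
              printf ("kernel accepted a %zu-byte mask\n", bytes);
              CPU_FREE (set);
              return 0;
            }
          CPU_FREE (set);
          if (errno != EINVAL)
            return 1;                      /* Unexpected failure.  */
          bytes *= 2;                      /* Too small; double and retry.  */
        }
    }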
Created attachment 8322 [details]
Test case which reflects the truncation assumption (does not currently work)
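The attachment itself is not reproduced here, but the pattern it exercises looks roughly like the sketch below: a fixed-size cpu_set_t passed to sched_getaffinity. Under the truncation proposal this would succeed on any machine; today it fails with EINVAL once the kernel's mask is wider than 1024 bits.

    /* Illustrative sketch, not the attached test case.  A fixed
       cpu_set_t holds 1024 bits, so on a kernel whose mask is wider
       this call currently fails with EINVAL instead of truncating.  */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int
    main (void)
    {
      cpu_set_t set;

      if (sched_getaffinity (0, sizeof (set), &set) != 0)
        {
          perror ("sched_getaffinity");
          return 1;
        }
      printf ("runnable on %d of the first 1024 CPUs\n", CPU_COUNT (&set));
      return 0;
    }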
Carlos O'Donell:

(In reply to Florian Weimer from comment #2)
> Maybe addressing this should also fix this design issue with the cpu_set_t
> functions:
>
> CPU_ALLOC returns a set which is capable of storing more CPUs than the user
> requested because the request is rounded up to the next multiple of
> sizeof (long) * CHAR_BIT CPUs. [...] This defeats the purpose of the EINVAL
> error return code in case the specified CPU set is not large enough.

Agreed. That is a flaw. The allocation should certainly be large enough to hold all possible CPUs ever (nr_cpumask_bits in kernel speak).

(In reply to Florian Weimer from comment #3)
> [...] nr_cpu_ids is not directly exposed to user space, I think, so the
> EINVAL behavior is pretty much required.

It is not. We need to get access to nr_cpumask_bits, and likely the only way to do that is to read it from /proc, as some of the code does today. See this thread: https://sourceware.org/ml/libc-alpha/2013-07/msg00288.html

Florian Weimer (comment #7):

What is the actual problem that we are trying to fix?

We could make sched_getaffinity succeed if the affinity mask contains only zero bits outside the mask buffer supplied by the application. But then the only way to run an application which specifies too small a set would require preventing the application from running on those extra CPUs. This could leave parts of the machine unused.

We cannot discard set bits, because some concurrent algorithms need to know on which CPUs they can run.

I do not want to introduce a CPU number offset in user space (which could be used to run multiple small-affinity-set applications on all CPUs). This is probably something for the kernel. Container/namespace support for affinity masks may even make sense for other reasons.
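To make the semantics proposed above concrete, a rough user-space approximation might look like the following sketch. It is illustrative only: getaffinity_truncating is a hypothetical helper, and KERNEL_MASK_BYTES stands in for the kernel's real mask size, which user space cannot query directly.

    /* Hypothetical helper sketching the proposed semantics: succeed
       when every set bit fits in the caller's buffer, and fail with
       EINVAL only when a set bit would actually be lost.
       KERNEL_MASK_BYTES is an assumed upper bound, not a real API.  */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <sched.h>
    #include <stdalign.h>
    #include <string.h>

    #define KERNEL_MASK_BYTES 1024

    int
    getaffinity_truncating (pid_t pid, size_t usersize, unsigned char *userbuf)
    {
      alignas (long) unsigned char full[KERNEL_MASK_BYTES];

      if (sched_getaffinity (pid, sizeof (full), (cpu_set_t *) full) != 0)
        return -1;

      /* Refuse to truncate if the thread may run on a CPU that the
         caller's buffer cannot represent.  */
      for (size_t i = usersize; i < sizeof (full); i++)
        if (full[i] != 0)
          {
            errno = EINVAL;
            return -1;
          }

      memcpy (userbuf, full, usersize);
      return 0;
    }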
Carlos O'Donell:

(In reply to Florian Weimer from comment #7)
> What is the actual problem that we are trying to fix?

Applications using a fixed cpu_set_t on a machine with more than 1024 CPUs should always just work, but they do not.

> We could make sched_getaffinity succeed if the affinity mask contains only
> zero bits outside the mask buffer supplied by the application. But then the
> only way to run an application which specifies too small a set would
> require preventing the application from running on those extra CPUs. This
> could leave parts of the machine unused.

This is a fine compromise. The application was not designed to run on those CPUs anyway and would need to be rewritten; the point is to allow such binaries to continue to run. Otherwise this is the equivalent of having broken compatibility.

> We cannot discard set bits, because some concurrent algorithms need to know
> on which CPUs they can run.

Agreed. We can continue to return EINVAL in this case.

> I do not want to introduce a CPU number offset in user space (which could be
> used to run multiple small-affinity-set applications on all CPUs). This is
> probably something for the kernel. Container/namespace support for affinity
> masks may even make sense for other reasons.

I think with the above changes we could make this work. CPU ID translation should be performed by the kernel. I filed: https://bugzilla.kernel.org/show_bug.cgi?id=151821
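On the "read it from the kernel" point above: one candidate source is /sys/devices/system/cpu/possible, which prints the possible-CPU range (for example "0-239"). Whether that range always matches nr_cpu_ids is an assumption here, not a documented guarantee. A sketch:

    /* Sketch: derive the possible-CPU count from sysfs.  Assumption
       (not a documented contract): the file contains either a single
       id ("0") or a range ("0-239") whose end is the highest possible
       CPU id.  */
    #include <stdio.h>

    static int
    possible_cpus (void)
    {
      FILE *f = fopen ("/sys/devices/system/cpu/possible", "r");
      int first, last, n = -1;

      if (f == NULL)
        return -1;
      switch (fscanf (f, "%d-%d", &first, &last))
        {
        case 2: n = last + 1; break;   /* Range form, e.g. "0-239".  */
        case 1: n = first + 1; break;  /* Single-CPU form, e.g. "0".  */
        }
      fclose (f);
      return n;
    }

    int
    main (void)
    {
      printf ("possible CPUs: %d\n", possible_cpus ());
      return 0;
    }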