This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: What *is* the API for sched_getaffinity? Should sched_getaffinity always succeed when using cpu_set_t?


(7/23/13 9:33 AM), Carlos O'Donell wrote:
On 07/22/2013 11:52 PM, KOSAKI Motohiro wrote:
* Used to create a minimally sized structure to track
   per-logical-CPU data.
   - As it is implemented, _SC_NPROCESSORS_CONF is a minimal
     value. Fixing it to match your expected semantics, e.g.
     making it the number of possible CPUs, is going
     to make this value potentially much larger.
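
(For concreteness, a minimal sketch of that sizing pattern; the struct and
helper below are illustrative, not taken from glibc or any real application:)

  #include <stdlib.h>
  #include <unistd.h>

  /* Illustrative per-CPU record.  */
  struct cpu_stats { unsigned long events; };

  struct cpu_stats *
  alloc_per_cpu_stats (long *ncpus)
  {
    long n = sysconf (_SC_NPROCESSORS_CONF);
    if (n < 1)
      return NULL;
    *ncpus = n;
    /* One slot per configured logical CPU.  */
    return calloc (n, sizeof (struct cpu_stats));
  }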

Hmm.

This didn't cross my mind. I think this case should use (c), because otherwise
your application may crash when CPU hotplug occurs.

It may crash only if you don't keep the data consistent, e.g. by using a CPU
count that doesn't match the allocated structure size.

Right.

However, I think I'm starting to agree with Roland; read my thoughts
below.

Practically, we have no way to learn about CPU hot-add _synchronously_, and
there are several race windows if you are polling for changes to the
_SC_NPROCESSORS_CONF value.

With any dynamic system, either notification-based or polling, you are going
to have some period of time where the userspace data structure is out of sync
with the real hardware state, and it takes time for the state to become
consistent. Userspace code would have to be written to adjust for this
(ignoring the POSIX requirement that these constants never change).

However, if userspace code has to adjust for changing _SC_NPROCESSORS_CONF,
assuming it isn't equal to possible CPUs, then we have exactly the same
problem we had with sched_getaffinity returning EINVAL. Userspace code must
be written in a loop to look at _SC_NPROCESSORS_CONF and increase storage
size if the number of configured CPUs grows larger.

So while we saved the user writing looping code for sched_getaffinity,
we still have the same looping code one level higher.
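
For reference, the sched_getaffinity looping code being discussed looks
roughly like this, using glibc's dynamic CPU_ALLOC interface (a sketch only;
the starting guess of 128 and the doubling policy are arbitrary choices, not
anything glibc prescribes):

  #define _GNU_SOURCE
  #include <errno.h>
  #include <sched.h>
  #include <stdlib.h>

  /* Probe the kernel's affinity mask size by growing the set until
     sched_getaffinity stops failing with EINVAL.  */
  cpu_set_t *
  get_affinity_set (size_t *setsize)
  {
    size_t count = 128;                 /* initial guess */
    for (;;)
      {
        cpu_set_t *set = CPU_ALLOC (count);
        if (set == NULL)
          return NULL;
        *setsize = CPU_ALLOC_SIZE (count);
        if (sched_getaffinity (0, *setsize, set) == 0)
          return set;                   /* kernel mask fits in this set */
        CPU_FREE (set);
        if (errno != EINVAL)
          return NULL;                  /* real failure */
        count *= 2;                     /* set too small: grow and retry */
      }
  }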

I think that perhaps Roland is right here and that _SC_NPROCESSORS_CONF
should just be changed to match "possible" CPUs for Linux.

Actually, in the kernel the possible-CPUs structure was mainly created to
represent per-CPU data efficiently.

OK, so the reality is that possible shouldn't be orders of magnitude
larger than configured or online CPUs?

It depends on the firmware. Sorry, we can't promise anything. There are two
worst cases. 1) The firmware is buggy and wrongly reports plenty of CPUs.
I can't say anything about that, but note that I have not seen such a
mistake reported on LKML. 2) A customer buys a hotplug-aware high-end machine
but only buys a few CPUs. In this case the firmware reports "hey, I have a
lot of room for expansion!". This is also unavoidable. But, as far as I have
observed, such high-end machines are very costly, so customers don't tend to
make that mistake.

From the application's point of view, we have a way to mitigate the above
problems. Using mmap() instead of malloc()+memset() helps reduce the wasted
memory, because Linux allocates the actual memory at first touch. So if some
per-CPU area is never accessed, no memory is allocated for it at all. Yes,
this is not perfect, because a page fault is a page-sized operation, so we
can still waste up to a page of memory. However, at least the Linux kernel
uses a similar technique, and that level of memory usage is considered
acceptable by a wide range of customers.
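
A sketch of that approach, assuming an anonymous private mapping (the struct
and function names are illustrative):

  #include <stddef.h>
  #include <sys/mman.h>

  /* Illustrative per-CPU record, as above.  */
  struct cpu_stats { unsigned long events; };

  /* Reserve one slot per possible CPU.  With an anonymous private mapping
     the pages are zero-filled and physical memory is only allocated when a
     slot is first written, so slots for CPUs that never come online cost at
     most the page they share.  */
  struct cpu_stats *
  alloc_per_cpu_lazy (long possible_cpus)
  {
    size_t len = (size_t) possible_cpus * sizeof (struct cpu_stats);
    void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
  }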

The remaining problem is mlockall(). If you use mlockall(), all of the data
is allocated at mmap() time instead of at first touch, so the wasted memory
increases. That is userland-specific, and there is nothing the kernel can do
about it. But I guess such applications prefer wasted memory over
unpredictable latency.
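
For completeness, the call in question (a sketch; whether an application uses
MCL_FUTURE at all is its own trade-off):

  #include <sys/mman.h>

  /* With MCL_FUTURE, pages of later mmap() calls are faulted in and pinned
     up front, so the first-touch saving above no longer applies: the whole
     per-CPU array is committed at mmap() time.  */
  int
  pin_all_memory (void)
  {
    return mlockall (MCL_CURRENT | MCL_FUTURE);
  }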

This is my concern. Had this crossed your mind?
