
Re: What *is* the API for sched_getaffinity? Should sched_getaffinity always succeed when using cpu_set_t?


On 07/23/2013 05:48 PM, Roland McGrath wrote:
>> On 07/22/2013 06:45 PM, Roland McGrath wrote:
>>> I have a hard time seeing why (b) would ever be useful.
>>> I think (c) was always the intended semantic of _SC_NPROCESSORS_CONF.
>>
>> That's different than what we have implemented in glibc.
> 
> Bug.

OK, I can agree with that, but only if I buy the rest of the
rationale for _SC_NPROCESSORS_CONF.
 
>> Why do you have a hard time seeing that (b) would be useful?
> 
> It doesn't give you information that you can actually use in any
> reliable way.  If it's not an upper bound for what _SC_NPROCESSORS_ONLN
> might report, then you don't know any such upper bound and the only way
> you can ever cope with _SC_NPROCESSORS_ONLN values increasing in the
> future is to use on-demand dynamic allocation when _SC_NPROCESSORS_ONLN
> does change.  At best, that's overly complicated to implement.

I agree that it is overly complicated.

I also agree that it is exactly the kind of complicated solution
I'm trying to avoid by making sched_getaffinity never return
EINVAL and thus never force dynamic allocation on the caller.

Therefore I am caught red-handed suggesting something I'm already
trying to avoid in another API.

Conclusion: I didn't think it through sufficiently.
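
For reference, this is the kind of retry loop that callers are forced
to write today when sched_getaffinity can fail with EINVAL for a
too-small set. It's an untested sketch of the pattern, not code from
glibc:

#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <errno.h>

/* Grow the cpu_set_t until the kernel accepts it.  Returns the set
   and the CPU count it was sized for, or NULL on allocation failure
   or any error other than EINVAL.  */
static cpu_set_t *
alloc_current_affinity (size_t *count_out)
{
  size_t count = 1024;	/* Initial guess; doubled until it fits.  */
  for (;;)
    {
      cpu_set_t *set = CPU_ALLOC (count);
      if (set == NULL)
        return NULL;
      if (sched_getaffinity (0, CPU_ALLOC_SIZE (count), set) == 0)
        {
          *count_out = count;
          return set;
        }
      CPU_FREE (set);
      if (errno != EINVAL)
        return NULL;
      count *= 2;
    }
}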
 
>> I see (b) being useful for:
>>
>> * Detection of number of logical cpus that are in the
>>   system vs. number that are online.
>>   - Ask your admin to bring the rest of them online?
> 
> Do applications really need a canonical interface for this?  That's a
> purely administrative issue well outside the scope of things that an
> application can ordinarily do anything useful with.  And how is (b) any
> more what you want than (c) is for this purpose?  Ask your admin to
> bring more online; ask your admin to plug more in and then bring them
> online.

Such applications include diagnostic and monitoring systems, though
I can buy that those systems may not need a portable or canonical
way to look at this information. It would still be handy if such
portable interfaces existed.

The distinction between (b) and (c) is that under (b) the hardware
is plugged in and the OS/hardware combination has configured the
processor. However, the processor might be in use for something
else, or offline because it's getting a firmware upgrade; who knows.
The point is that you have it, and that's a useful distinction for
a diagnostic application.

The whole point is moot, though, if we don't think a canonical
interface is useful and expect that a monitoring application would
just use sysfs directly on Linux.

If I were writing such applications I'd use sysconf before
having to go to sysfs.
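
For example, a minimal diagnostic along these lines (a sketch; note
that what "configured" reports here is exactly the (b) vs. (c)
question):

#include <stdio.h>
#include <unistd.h>

int
main (void)
{
  long conf = sysconf (_SC_NPROCESSORS_CONF);
  long onln = sysconf (_SC_NPROCESSORS_ONLN);

  printf ("processors configured: %ld, online: %ld\n", conf, onln);
  if (conf > onln)
    printf ("%ld processors are configured but not online\n",
            conf - onln);
  return 0;
}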
 
>> * Used to create a minimally sized structure to track 
>>   per-logical-CPU data.
>>   - As it is implemented _SC_NPROCESSORS_CONF is a minimal
>>     value. Fixing it to match your expected semantics, e.g.
>>     making it the number of possible CPUs, is going
>>     to make this value potentially much larger.
> 
> How is this a useful size for anything?  The only per-CPU data an
> application might maintain is about CPUs it can actually use.  If
> that's what it's doing, then _SC_NPROCESSORS_ONLN is what it wants.
> If it wants to prepare a data structure that will be able to hold all
> the per-CPU data for all CPUs that it might encounter during the life
> of the process, then (b) is insufficient and only (c) is useful.

I disagree that the only per-CPU data an application might maintain
is about CPUs it can actually use. A CPU might have gone offline
for any number of reasons, and you may want to keep the existing
data for that CPU around (what happens to a process that has
affinity for a CPU when that CPU goes offline?).

I agree with your claim that (c) is the most useful value for
allocating enough resources to handle the maximum number of CPUs
that might ever come online.

>> What use is there to knowing (c) except to choose to optimize
>> space vs. time and allocate sufficient resources to track all
>> possible CPUs the system could have (only a reboot can change
>> this in Linux right now)?
> 
> Your previous description said that CPU hotplug would change the value.

Yes, CPU hotplug can change the value of (b).

The value of (c) is constant.

> If that's not true, then I have no idea what distinguishes your (b) from
> your (c) in any way that is remotely meaningful to application software.

I think we have talked past one another.

If you want to reduce memory usage, you allocate `(b) * <per cpu data size>'
and reallocate as required (complicated).

Otherwise you allocate `(c) * <per cpu data size>' and never reallocate
(simple).
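
Concretely, the simple strategy is a one-time allocation like the
following sketch, where struct cpu_stats is a made-up placeholder
for whatever per-CPU data the application keeps:

#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-CPU record.  */
struct cpu_stats { unsigned long events; };

static struct cpu_stats *
alloc_per_cpu_stats (long *ncpus_out)
{
  /* (c): the OS maximum, assuming the clarified semantics of
     _SC_NPROCESSORS_CONF.  Sized once, never reallocated.  */
  long ncpus = sysconf (_SC_NPROCESSORS_CONF);
  if (ncpus < 1)
    return NULL;
  *ncpus_out = ncpus;
  return calloc (ncpus, sizeof (struct cpu_stats));
}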

The problem I have is that (c) might be quite big on certain systems,
e.g. 4096, while (b) is relatively small and mostly constant, e.g. 512.

Eventually (c) might not even exist if the kernel gets a dynamic
framework for adding CPUs, but that's another conversation.

> I have not seen anything to dissuade me from the position that the
> definitions I gave earlier are the only ones that make any kind of
> worthwhile sense to an application:
> 
> 1. Amount of hardware parallelism currently available (_SC_NPROCESSORS_ONLN)
> 2. Upper bound on values of #1 in the life of a process (_SC_NPROCESSORS_CONF)
 
I'm happy with such a definition.

My key points:

(1) It wasn't clear from the manual that _SC_NPROCESSORS_CONF was the
    OS maximum, i.e. (c). The manual needs clarification.

(2) The clarified definition of _SC_NPROCESSORS_CONF means there is
    a bug in the glibc Linux code for this constant. The Linux code
    currently exports (b) for _SC_NPROCESSORS_CONF, and that value
    is not constant.

(3) There are use cases for a third constant that is the soft limit
    of available processors, or some complement, e.g.
    _SC_NPROCESSORS_ONLN + _SC_NPROCESSORS_OFFLN < _SC_NPROCESSORS_CONF,
    with _SC_NPROCESSORS_OFFLN being an invented name for a value
    that meets the definition of (b). We need not add or discuss
    this now.
 
> Consider what it would look like if you proposed _SC_NPROCESSORS_*
> values with specified meanings for a future POSIX.1 revision (which is
> indeed something we should arrange to get done).  If it's not
> meaningful in terms of characteristics of the system that a conforming
> application can observe, then it doesn't belong in the standard.  The
> standard doesn't describe how its calls relate to abstract notions or
> to concrete hardware that happens to underlie the implementation.  It
> describes how its calls relate to the behavior of the system that can
> be observed by conforming applications.  How, other than the two
> definitions I gave above, would you define any parameter in this family?

Trick question? I wouldn't. 

I'd remove _SC_NPROCESSORS_* entirely, except that they provide
semi-useful information for use in conjunction with sched_getaffinity
and sched_setaffinity (both of which are not POSIX conforming, see
bug 15088 and bug 14829), and pthread_getaffinity_np and
pthread_setaffinity_np (none of which tells you anything about the
processor itself, and now with ARM big.LITTLE there is a huge
difference if you ask for affinity on one of the A15 cores vs. the
A7 cores).

What can a portable application hope to achieve by knowing how many
CPUs are online, offline, or possible?

It's a similar situation to what we had with elision vs. lock hinting.

In the case of increasing performance through increased parallelism,
the application has no way to know what the OS is going to do with
the threads it creates based on the number of processors. The
application's best hope is to test for parallelism, record the
optimal number, use that, and adjust periodically (even harder to
do with heterogeneous systems).

We have multiple runtimes, from GOMP, to Cilk, to C++, all of which
are going to have work-stealing thread pools, and if all of them use
sysconf (_SC_NPROCESSORS_ONLN) we will oversubscribe the system.

My gut feeling is that _SC_NPROCESSORS_* and the associated
functions e.g. get_nprocs/get_nprocs_conf shouldn't exist.

If I had to define something I would probably define: available to you, 
not available to you, and possible.
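
As a rough illustration, "available to you" is approximately what
you can compute today from the affinity mask (sketch only; it uses
the static cpu_set_t, whose adequacy is the very thing under debate
in this thread):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int
main (void)
{
  cpu_set_t set;

  /* The CPUs this process is currently allowed to run on.  */
  if (sched_getaffinity (0, sizeof set, &set) == 0)
    printf ("CPUs available to this process: %d\n", CPU_COUNT (&set));
  return 0;
}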

My initial interpretation was that _SC_NPROCESSORS_CONF was
`offline + online', but you say that the intent was `possible'.

What we really need is a much richer set of APIs for scheduling
(process/thread) and affinity, but we won't solve that here today.

Cheers,
Carlos.

