This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v4] Make bindresvport() function to multithread-safe


On Thu, Oct 18, 2012 at 3:58 PM, Rich Felker <dalias@aerifal.cx> wrote:
> On Thu, Oct 18, 2012 at 02:49:43PM -0400, Carlos O'Donell wrote:
>> On Tue, Oct 16, 2012 at 1:26 PM, Rich Felker <dalias@aerifal.cx> wrote:
>> > On Tue, Oct 16, 2012 at 05:33:51PM +0800, Peng Haitao wrote:
>> >>
>> >> On 09/28/2012 10:27 PM, Carlos O'Donell wrote:
>> >> >>  /*
>> >> >>   * Bind a socket to a privileged IP port
>> >> >>   */
>> >> >>  int
>> >> >>  bindresvport (int sd, struct sockaddr_in *sin)
>> >> >>  {
>> >> >> -  static short port;
>> >> >> +  static __thread short port;
>> >> >
>> >> > I'm curious how our relocation count changes with this?
>> >> >
>> >> > Net neutral or did we gain extra relocations by changing this from
>> >> > static to static __thread?
>> >> >
>> >> > Could you have a look into that?
>> >> >
>> >>
>> >> The static variable will cause bindresvport to unsafe.
>> >>
>> >> Use thread A and thread B to explain:
>> >>
>> >> thread A: when port is 1023, execute "if (port > endport)"
>> >> thread B: execute "sin->sin_port = htons (port++);", port is 1024
>> >> thread A: execute "res = __bind (sd, sin, sizeof (struct sockaddr_in));"
>> >>
>> >> So, static should be replaced with static __thread.
>> >
>> > Adding overhead to every thread in every program for the sake of a
>> > nonstandard function that will almost never be used is not a good
>> > design. Instead, it should simply be protected by a lock, or modified
>> > atomically.
>>
>> The increase in glibc's .tdata size by a short?
>>
>> I think that using __thread here increases concurrency in the library
>> without the cost of locking which can require a local CPU to take
>> ownership of an entire cache line. That cache line may contain more
>> than just the data we wanted to work with.
>>
>> The two scenarios are:
>>
>> (a) Mutex
>> - Decreased concurrency.
>> - Every program increase .data by sizeof(mutex) (maybe 40-50 bytes).
>> - Function is now thread safe.
>
> Isn't there some internal lock type that's only 4 or 8 bytes rather
> than a whole pthread_mutex_t?
>
>> (b) __thread
>> - Increased concurrency.
>> - Every thread increases .tdata by sizeof(short) (maybe 4 bytes with alignment).
>> - Function is now thread safe.
>>
>> I don't see that the answer is that clear cut.
>>
>> Given the future of lots and lots of cores I'd rather debug (b).
>>
>> Rich, Why do you lean towards (a)?
>
> A couple reasons...
>
> (1) Use of this function is extremely rare, so it should be optimized
> towards not adding cost in programs that don't use it.

I don't buy this without data to backup the assertion.

We need to collect user usage metrics to make this claim and we don't.

We can certainly wave our hands around, but it's not real engineering.

> (2) The runtime cost of this function is bounded by syscalls, so
> optimizing userspace code for performance is mostly useless. (Not
> entirely since the locking cost could invalidate cache in other
> threads, but I still suspect this will be very rare.)

Now this is a *great* counter-argument to using __thread variables.

However, I still don't buy that this is the right way forward for
the long-term.

> Of course for this one function, the cost (a few bytes of TLS) does
> not matter. I'm more thinking from a general policy perspective;
> adding TLS all over the place will eventually add up. It would be even
> worse if this were initialized TLS, since that adds runtime cost too.

I agree that adding TLS all over the place will eventually add up,
*but* the number of cores on devices is going up and quickly.
If we keep adding locks we're bound to get hosed for performance
since locks require complex coherency protocols between the cores.
If we could elide the locks and simply use local data it's a huge
concurrency win.

I can't condone the use of a lock when a small amount of TLS solves
the problem, even if the TLS is repeated for every process in the
system. Even then we could switch to some lazy TLS allocation
(which we have) for more kinds of TLS variables and it's a win-win.

I see Peng's recent patch, using __thread, with a minor performance
hit in userspace, as a significant win. It makes the function thread
safe for glibc with minimal memory footprint and retains excellent
concurrency characteristics.

Peng needs to sort out his employers FSF copyright assignment though
since this and other patches are large enough to require it.

I'm open to more conversation on this topic since it's a policy
issue that will continue to be relevant in the years to come.

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]