This is the mail archive of the pthreads-win32@sources.redhat.com
mailing list for the pthreads-win32 project.
RE: New pthread_once implementation
- From: Ross Johnson <ross dot johnson at homemail dot com dot au>
- To: Vladimir Kliatchko <vladimir at kliatchko dot com>
- Cc: 'Gottlob Frege' <gottlobfrege at gmail dot com>, Pthreads-Win32 list <pthreads-win32 at sources dot redhat dot com>
- Date: Sat, 28 May 2005 23:52:39 +1000
- Subject: RE: New pthread_once implementation
- References: <0IH70010T4SR3C@mta10.srv.hcvlny.cv.net>
On Sat, 2005-05-28 at 06:51 -0400, Vladimir Kliatchko wrote:
> > -----Original Message-----
> > From: firstname.lastname@example.org [mailto:pthreads-win32-
> > email@example.com] On Behalf Of Ross Johnson
> > Sent: Friday, May 27, 2005 11:48 PM
> > To: Vladimir Kliatchko
> > Cc: 'Gottlob Frege'; Pthreads-Win32 list
> > Subject: RE: New pthread_once implementation
> > On Fri, 2005-05-27 at 21:30 -0400, Vladimir Kliatchko wrote:
> > > Nice catch. Let me see if I can fix it.
> > >
> > > Note that the same problem exists in the currently released event-based
> > > implementation (cvs version 1.16):
> > >
> > > thread1 comes in, starts initing
> > > thread2 creates event, starts waiting
> > > thread3 comes in, starts waiting
> > > thread1 is cancelled, signals event
> > > thread2 wakes up, proceeds to the point right before the ResetEvent
> > > thread3 wakes up, closes event handle
> > > thread2 resets the closed handle
> > That scenario relies on HANDLE uniqueness and assumes that an error will
> > result. This is why the 2.6.0 version (and earlier) checks the return code
> > and restores the Win32 LastError if necessary - for GetLastError transparency.
> Does Windows guarantee that the handles are not reused? What happens if a
> thread closes a handle while another thread is blocked on it? Is any of this
> in Microsoft documentation? Consider the following scenario for the
> event-based implementation:
Well, apparently they're not unique when recycled, so there is a bug
here to fix in both versions:
[Under "Native Windows NT Objects"]
"Unlike the handles that are maintained by the Win32 USER and GDI
subsystem components, handles to native objects under Windows NT are not
unique; that is, upon destruction of an object, the corresponding handle
may be recycled and will look exactly like the handle to the destroyed
object."
But they are local to the process, rather than system wide, if that helps.
> > > Also, regarding my previous comment to Ross about very high cost of
> > using
> > > InterlockedExchangeAdd for MBR:
> > > I did some simple benchmarking. Running pthread_once 50,000,000 times on my
> > pretty
> > > slow single CPU machine takes about 2.1 seconds. Replacing
> > > InterlockedExchangeAdd with simple read brings it down to 0.6 seconds.
> > This
> > > looks significant.
> > Using the PTW32_INTERLOCKED_COMPARE_EXCHANGE macro as in your latest (in
> > CVS) version and building the library for inlined functions (nmake VC-
> > inlined) and x86 architecture causes customised versions of
> > InterlockedCompareExchange to be used, and this results in inlined asm.
> > Same for PTW32_INTERLOCKED_EXCHANGE.
> > Also, on single-CPU x86, the library dynamically switches to using
> > 'cmpxchg' rather than 'lock cmpxchg' to avoid locking the bus. This
> > appears to match what the kernel32.dll versions do. On non-x86
> > architectures the kernel32.dll versions are called, with call overhead.
> > PTW32_INTERLOCKED_EXCHANGE_ADD could be added, as could other
> > architectures. See ptw32_InterlockedCompareExchange.c
> I have rerun my benchmark with VC-inlined. The difference is now less
> significant (0.9 s vs 0.6 s) but still noticeable. I guess cmpxchg even without
> locking is quite expensive. On multi-CPU systems the difference should be
> much higher due to the time it takes to lock the bus and to the contention
> it may cause. It sounded as if you did not care much to try to optimize it.
> I did not mean to suggest that we have to do it right now either. I just
> wanted to get your opinion on whether we want to deal with this in the
> future.
By all means include any optimisation you think is worthwhile. I was
just pointing out that the difference isn't necessarily 2.1 v 0.6.