hyperthreading fix, try #1
Tue Feb 8 06:31:00 GMT 2005
On Mon, Feb 07, 2005 at 08:17:52AM +0100, Volker Bandke wrote:
>Which system configuration did you use to recreate the problem?
I got enough donations to purchase the following:
Motherboard: ASUS P4P800SE
CPU: CPU P4/3.0EGHz 800M 478P/1MB HT RT
HD: Samsung 120GB
Case: ASPIRE XINFINITY BL 350W RTL
I purchased this from Newegg. I love that company.
I put the system together in one night, turned it on, and it worked.
All of the lights came on correctly, the system booted with a CD, and
transferring data from my old system proceeded without a hitch, thanks
to my knoppix CD -- love that knoppix, too.
The one thing that took me forever to fix was getting XP running.
Somehow my XP CD got cracked with a big chunk taken out of it, so I had
to get a new CD, and I ended up transferring data from my old system
multiple times as I attempted to install the new CD without overwriting
all of my existing data.
The way I usually do this is to copy raw partitions over, since my
windows box is multi-boot and represents years of work. Sometimes the
OS figures out how to reconfigure itself, sometimes it needs a nudge.
In this case, it needed to be whacked with a large branch.
I couldn't get W2K working, but I've put off investigating that until
another time.
>also, can you describe (in _short_ terms) the cause of the error?
Cygwin has a problem because normal pipe I/O on Windows is not
interruptible (generically speaking - you could kludge it on NT).
So, to work around this problem, it starts up pipe I/O in a thread
and kills the thread when a signal comes in. It's a sledgehammer
approach to interrupting pipe I/O.
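As a rough illustration of that pattern (a Python toy, not the Cygwin
source), the blocking read can run in a helper thread while the caller
waits for whichever comes first: the data or a "signal". The names
start_read, deliver_signal and wait_for_outcome are made up for this
sketch, and since Python cannot forcibly kill a thread, the helper is
simply left running here rather than terminated.

```python
import os
import queue
import threading

def start_read(fd, size, outcomes):
    """Run the blocking read in a helper thread and post its result."""
    def helper():
        outcomes.put(("data", os.read(fd, size)))
    worker = threading.Thread(target=helper, daemon=True)
    worker.start()
    return worker

def deliver_signal(outcomes):
    """Stand-in for a signal arriving while the read is outstanding."""
    outcomes.put(("signal", None))

def wait_for_outcome(outcomes):
    """Block until either the read finishes or a 'signal' shows up."""
    return outcomes.get()
```

If the "signal" is posted while the pipe is still empty, the caller
wakes up with ("signal", None) even though the helper is still blocked
in os.read(); any data that arrives afterwards is still posted to the
queue, which is exactly the kind of leftover the caller has to account
for.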
The pipe thread uses a synchronization event to tell the initiating
reader when the pipe is all set, has grabbed its arguments and is ready
to go. This event was also used to tell the reader that a read had
completed.
Before my fix, cygwin did not reliably wait for both events to
happen so, after the first read on a pipe, the reader and the pipe
thread would get out of sync. This would present a problem on any kind
of SMP-like system but it wouldn't be as noticeable on a non-SMP system.
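The way one missed notification throws later reads off can be shown
with a small Python model. This is only a sketch under assumptions: a
counting semaphore stands in for the single shared event, and
ToyPipeReader, its interrupted flag and demo() are invented for the
illustration; none of it is taken from the Cygwin code.

```python
import threading

class ToyPipeReader:
    """One semaphore stands in for the single shared synchronization event."""

    def __init__(self, source):
        self.source = source                  # callable doing the blocking read
        self.notify = threading.Semaphore(0)  # carries BOTH notifications
        self.result = None
        self.worker = None

    def _helper(self):
        self.notify.release()        # notification 1: "ready to go"
        self.result = self.source()  # the blocking read itself
        self.notify.release()        # notification 2: "read complete"

    def read(self, interrupted=False):
        self.result = None
        self.worker = threading.Thread(target=self._helper, daemon=True)
        self.worker.start()
        self.notify.acquire()        # meant to consume "ready"
        if interrupted:
            return None              # bug: "read complete" is never consumed
        self.notify.acquire()        # meant to consume "read complete"
        return self.result

def demo():
    gate = threading.Event()

    def source():
        gate.wait()                  # block until data is allowed through
        return b"data"

    reader = ToyPipeReader(source)
    gate.set()
    reader.read(interrupted=True)    # first read: only one notification consumed
    reader.worker.join()             # helper finishes; a stale notification remains
    gate.clear()
    early = reader.read()            # second read returns before any data exists
    gate.set()
    reader.worker.join()
    return early, reader.result
```

In demo(), the second read() takes the stale "read complete" for
"ready" and the new "ready" for "read complete", so it returns with no
data while the helper is still blocked; the data only shows up in
reader.result afterwards.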
Once I ran the test case twenty times or so, I went back and looked at
the code I'd previously stared at for hours and saw a few
synchronization issues. For once, the backtrace from gdb showed that
something was clearly amiss.
So, the fix was to try much harder to ensure that we've correctly waited
for notification events, even in the case where cygwin thinks it has
to terminate a thread due to the arrival of a signal. It is possible
that the read has completed in that case, and cygwin should not throw the
data away since the read really *wasn't* terminated by a signal.
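The idea of not discarding a read that has already finished can be
sketched in a few lines of Python. Again this is a toy under stated
assumptions, not the actual change: PendingRead, resolve_on_signal and
the INTERRUPTED marker are hypothetical names used only here.

```python
import threading

INTERRUPTED = object()   # hypothetical marker for "read was interrupted"

class PendingRead:
    """State shared between the reader and its helper thread."""

    def __init__(self):
        self.done = threading.Event()
        self.data = None

    def complete(self, data):
        """Called by the helper thread once the read has finished."""
        self.data = data
        self.done.set()

def resolve_on_signal(pending):
    """Decide what the reader returns when a signal arrives."""
    if pending.done.is_set():
        return pending.data   # the read finished first: keep the data
    return INTERRUPTED        # nothing read yet: a genuine interruption
```

Even this toy leaves a window between the is_set() check and whatever
the caller does next to tear down the helper, so a check like this
narrows the problem rather than closing it.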
Unfortunately, there is still a race here. I have an idea about how to
fix the race but it would introduce a destabilizing change that I'd
rather not chance before 1.5.13 is released. Given that I can't
reproduce the problem with the test script anymore, I think I'll release
cygwin with this change plus any other potential fixes required to
handle the "make -j" problem.