This is the mail archive of the libc-alpha mailing list for the glibc project.
Re: Showstopper for 2.1.3
- To: "Kevin B. Hendricks" <khendricks at ivey dot uwo dot ca>
- Subject: Re: Showstopper for 2.1.3
- From: Kaz Kylheku <kaz at ashi dot footprints dot net>
- Date: Sun, 13 Feb 2000 18:03:49 -0800 (PST)
- cc: Ulrich Drepper <drepper at cygnus dot com>, libc-alpha Mailinglist <libc-alpha at sourceware dot cygnus dot com>
On Sun, 13 Feb 2000, Kevin B. Hendricks wrote:
> Date: Sun, 13 Feb 2000 20:19:43 -0500
> From: Kevin B. Hendricks <email@example.com>
> To: Kaz Kylheku <firstname.lastname@example.org>,
>     Ulrich Drepper <email@example.com>
> Cc: libc-alpha Mailinglist <firstname.lastname@example.org>
> Subject: Re: Showstopper for 2.1.3
> >I foresaw that the tight loop might not completely eliminate the ``time
> >dilation'' problem when the condition sleep receives many signal
> >interrupts, and have a ready solution for that.
> If you read the bug reports 1597 and 1598 closely you will see that the
> problem is not just time dilation but a real bug in the kernel's
> implementation of nanosleep. Your tight loop should have worked with
> possibly just a bit of extra time involved because of not counting from
That's what I expected.
> time when exiting and looping around to the next call to nanosleep.
Yes, I seem to understand that more or less fully now. ;)
> Unfortunately, it is not just lost time outside the nanosleep call but the
> remaining time is actually increasing slightly when time is converted to
> jiffies and back to time again in the kernel when nanosleep is interrupted.
> This really needs to be fixed in the kernel.
In fact while working on the change I did read through the kernel code and
was wondering about the jiffies conversion. I didn't generate signals
often enough to trigger unusual behavior.
Anyway, it obviously needs to be fixed in the kernel too; we can't have
the nanosleep library function return more time than it was given.
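
To make the round-trip problem concrete, here is a small model of it. This is
only a sketch, not the kernel code: it assumes HZ=100 (10 ms per jiffy), that
the request is rounded up to whole jiffies, and that one extra tick is added so
the sleep is never too short. The function names are made up for illustration.

```c
/* Simplified, assumed model of the timespec -> jiffies -> timespec round trip. */
#define HZ 100LL                               /* assumed tick rate */
#define NSEC_PER_SEC 1000000000LL
#define NSEC_PER_JIFFY (NSEC_PER_SEC / HZ)     /* 10 ms per jiffy */

/* Round a nanosecond interval up to whole jiffies, then add one tick
   (assumption: the extra tick guards against sleeping too briefly). */
static long long ns_to_jiffies(long long ns)
{
    if (ns <= 0)
        return 0;
    return (ns + NSEC_PER_JIFFY - 1) / NSEC_PER_JIFFY + 1;
}

static long long jiffies_to_ns(long long jiffies)
{
    return jiffies * NSEC_PER_JIFFY;
}

/* Remaining time reported if the sleep is interrupted before a single
   tick has elapsed: the request converted to jiffies and back. */
static long long remaining_after_interrupt(long long requested_ns)
{
    return jiffies_to_ns(ns_to_jiffies(requested_ns));
}

/* A caller that keeps re-sleeping for the reported remainder while being
   interrupted immediately each time. */
static long long remaining_after_retries(long long requested_ns, int interrupts)
{
    long long remaining = requested_ns;
    for (int i = 0; i < interrupts; i++)
        remaining = remaining_after_interrupt(remaining);
    return remaining;
}
```

In this model a 15 ms request that is interrupted right away comes back as
30 ms remaining, and every further immediate interrupt adds another jiffy, so a
retry loop fed with the reported remainder never converges under a signal storm.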
The new patch I just sent was tested aggressively. In the test program, I'm
calling pthread_kill() in a loop without any delay, in order to keep spamming
another thread with as many signals as possible. The handler is set up with
sigaction(). So many cycles are stolen from the thread that it slows down to a
crawl. Yet it does not hang forever in the pthread_cond_timedwait and I'm not
seeing time dilation.
However, I'm seeing other disturbing behavior. The signals are interfering with
printf, giving rise to a positive ferror(stdout). It must be that the
underlying write() calls to the tty are being aborted by the delivery of
this signal. So it's hard to get accurate tracing.
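
One possible way around this for tracing, assuming the failures really are
write() calls returning EINTR, is to bypass stdio and retry the write by hand.
The helper below is a sketch of that idea; write_all is a hypothetical name and
this is not necessarily the workaround that will end up in the test program.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write the whole buffer to fd, retrying when a signal interrupts write()
   and continuing after partial writes. Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: try again */
            return -1;              /* a real error */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Another option would be to set SA_RESTART in sa_flags when installing the
handler with sigaction(), which asks the kernel to restart interrupted calls
instead of failing them with EINTR.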
Anyway, I need to make the whole test program available. I'll do that after I
give it a bit of a haircut and a shave, and massage in a workaround for the
printf problem. I did promise Andreas some test code a while ago.