This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: futex and soft lockup


Actually, I realize you must be right to point toward the kernel. :-)
I was blinded by the fact that the older glibc didn't have the problem and forgot the basics. I also didn't realize the kernel is actually quite active in the futex area. I am trying some of the latest kernel futex fixes and they look promising.
The newer glibc might just be triggering a different kernel path, or simply being faster (and so making some race condition easier to hit).
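For reference, the kernel path I mean is the plain futex syscall that pthread locking falls back to when a lock is contended; a minimal sketch of that entry point (illustration only, not my actual code, assuming Linux headers and glibc's syscall() wrapper) looks like this:

/* Illustration only: the raw futex calls that end up in
 * sys_futex() -> futex_wait() in the trace I posted. */
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex_wait(int *uaddr, int expected)
{
    /* Sleep in the kernel while *uaddr still equals 'expected'. */
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

static long futex_wake(int *uaddr, int nwake)
{
    /* Wake up to 'nwake' threads blocked in futex_wait() on uaddr. */
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, nwake, NULL, NULL, 0);
}

Whether the newer glibc reaches that path more often or just with different timing, the lockup itself still happens on the kernel side.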
Thanks for the tip.

-gong


----- Original Message ----
From: Américo Wang <xiyou.wangcong@gmail.com>
To: Gong Cheng <chengg11@yahoo.com>
Cc: libc-help@sourceware.org
Sent: Tuesday, October 20, 2009 2:35:16 AM
Subject: Re: futex and soft lockup

On Tue, Oct 20, 2009 at 4:55 AM, Gong Cheng <chengg11@yahoo.com> wrote:
> Hi,
>    I am running glibc-2.5-34.x86_64.rpm (for CentOS) on top of a 2.6.31 kernel (I tried 2.6.30 too), and I am consistently seeing soft lockups like the following:
>
> BUG: soft lockup - CPU#0 stuck for 61s! [<my program>:3068]
> <snip>
> Call Trace:
>  [<ffffffff8130e8d6>] ? _spin_lock+0x16/0x40
>  [<ffffffff8105fe85>] ? futex_wait_setup+0x75/0x100
>  [<ffffffff81060109>] ? futex_wait+0xf9/0x270
>  [<ffffffff8108c80b>] ? zone_statistics+0x5b/0x90
>  [<ffffffff810619fb>] ? do_futex+0xbb/0xcb0
>  [<ffffffff81082f98>] ? ____pagevec_lru_add+0x138/0x150
>  [<ffffffff810317ac>] ? update_curr+0x6c/0xc0
>  [<ffffffff810831b1>] ? __lru_cache_add+0x71/0xb0
>  [<ffffffff81083204>] ? lru_cache_add_lru+0x14/0x30
>  [<ffffffff8130eda1>] ? _spin_unlock+0x11/0x40
>  [<ffffffff8108f0de>] ? do_wp_page+0x28e/0x7b0
>  [<ffffffff81090e3a>] ? handle_mm_fault+0x59a/0x7c0
>  [<ffffffff8130ea12>] ? _spin_lock_irqsave+0x22/0x50
>  [<ffffffff8130ee63>] ? _spin_unlock_irqrestore+0x13/0x40
>  [<ffffffff81062680>] ? sys_futex+0x90/0x150
>  [<ffffffff81029417>] ? do_page_fault+0x187/0x2d0
>  [<ffffffff8100bceb>] ? system_call_fastpath+0x16/0x1b
>
> Previously, when running glibc-2.5.18, I didn't have this problem. In fact, if I switch back to 2.5.18 while keeping everything else the same, the problem immediately stops.
>
> My program uses pthreads and futexes extensively. If I run it in single-threaded mode, I don't see the issue.
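> Something like the following shows the kind of usage I mean (a stripped-down sketch, not the real program): two threads contending on one pthread mutex, so the lock takes the contended futex path into the kernel.
>
> /* Stripped-down sketch, not the real program: two threads contending
>  * on one pthread mutex; the lock only enters the kernel via futex
>  * when it is contended. Build with: gcc -O2 -pthread sketch.c */
> #include <pthread.h>
> #include <stdio.h>
>
> static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
> static long counter;
>
> static void *worker(void *arg)
> {
>     long i;
>     (void)arg; /* unused */
>     for (i = 0; i < 1000000; i++) {
>         pthread_mutex_lock(&lock);   /* contended -> futex(FUTEX_WAIT) */
>         counter++;
>         pthread_mutex_unlock(&lock); /* waiters   -> futex(FUTEX_WAKE) */
>     }
>     return NULL;
> }
>
> int main(void)
> {
>     pthread_t t1, t2;
>     pthread_create(&t1, NULL, worker, NULL);
>     pthread_create(&t2, NULL, worker, NULL);
>     pthread_join(t1, NULL);
>     pthread_join(t2, NULL);
>     printf("counter = %ld\n", counter);
>     return 0;
> }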
>
> I am aware that I am not providing a lot of information here; I just wanted to quickly check whether this issue is known to anyone.
> Also, in general, is it a bad idea to combine glibc 2.5-34 with the latest kernel?
>
> I'd appreciate any tips on this issue!

This sounds more like a kernel problem. :)

The kernel is not supposed to hit a 'soft lockup' no matter how you use futex
in user space. Would you mind trying the latest git kernel with glibc-2.5-34?

Thanks.


