Bug 2355

Summary: FryskGui aborts with "Xlib: sequence lost in reply type 0x0"
Product: frysk Reporter: Wu Zhou <woodzltc>
Component: general Assignee: Andrew Cagney <cagney>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: powerpc64-unknown-linux-gnu Target: powerpc64-redhat-linux-gnu
Build: Last reconfirmed:
Bug Depends on:    
Bug Blocks: 2188, 2104    

Description Wu Zhou 2006-02-19 14:22:28 UTC
On a pure PPC64/FC5 installation (with SELinux disabled), I built frysk from
the latest CVS source. The build succeeds and FryskGui starts up fine, but it
aborts when I try to add an observer to a random process:

1. start up FryskGui
2. choose a random process in the processes view
3. right click on the process
4. FryskGui aborts and displays:
Xlib: sequence lost (0x10000 > 0x2080) in reply type 0x0!

Running with the "--sync" option leads to the same hang, but displays two lines
of a different error message:

Xlib: unexpected async reply (sequence 0x254f) 
Xlib: unexpected async reply (sequence 0x2551)

Attaching gdb to the process shows that a huge number of new threads are
being created:

(gdb) info threads
  9 Thread 4398113800112 (LWP 3118)  0x000000805ce3eafc in .__pthread_cond_wait
    () from /lib64/libpthread.so.0
  8 Thread 4398185234352 (LWP 3119)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
  7 Thread 4398185689008 (LWP 3124)  0x000000805dc5920c in .timer_helper_thread
    () from /lib64/librt.so.1
  6 Thread 4398212181936 (LWP 3126)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
  5 Thread 4398222667696 (LWP 3127)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
  4 Thread 4398233153456 (LWP 3128)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
  3 Thread 4398254124976 (LWP 3379)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
  2 Thread 4398201696176 (LWP 3380)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
  1 Thread 4398046677088 (LWP 3117)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
(gdb) c
Continuing.
[New Thread 4398243639216 (LWP 3385)]
[New Thread 4398264610736 (LWP 3386)]
[New Thread 4398275096496 (LWP 3387)]
[New Thread 4398285582256 (LWP 3388)]
[New Thread 4398299054000 (LWP 3389)]
[New Thread 4398309539760 (LWP 3390)]
[New Thread 4398320025520 (LWP 3392)]
[New Thread 4398330511280 (LWP 3393)]
[New Thread 4398340997040 (LWP 3394)]
[New Thread 4398351482800 (LWP 3395)]
[New Thread 4398361968560 (LWP 3396)]
[New Thread 4398372454320 (LWP 3397)]
[New Thread 4398382940080 (LWP 3398)]
[New Thread 4398393425840 (LWP 3399)]
[New Thread 4398403911600 (LWP 3400)]
[New Thread 4398414397360 (LWP 3401)]
[New Thread 4398424883120 (LWP 3402)]
[New Thread 4398435368880 (LWP 3403)]
[New Thread 4398445854640 (LWP 3404)]
[New Thread 4398456340400 (LWP 3405)]
[New Thread 4398466826160 (LWP 3406)]
[New Thread 4398477311920 (LWP 3407)]
[New Thread 4398487797680 (LWP 3408)]
[New Thread 4398498283440 (LWP 3409)]
[New Thread 4398508769200 (LWP 3410)]
[New Thread 4398519254960 (LWP 3411)]
[New Thread 4398529740720 (LWP 3412)]
[New Thread 4398540226480 (LWP 3413)]
[New Thread 4398550712240 (LWP 3414)]
[New Thread 4398561198000 (LWP 3415)]
[New Thread 4398571683760 (LWP 3416)]
[New Thread 4398582169520 (LWP 3417)]
[New Thread 4398592655280 (LWP 3418)]
[New Thread 4398603141040 (LWP 3419)]
[New Thread 4398613626800 (LWP 3421)]
[New Thread 4398624112560 (LWP 3422)]
[New Thread 4398634598320 (LWP 3423)]
[New Thread 4398645084080 (LWP 3424)]
[New Thread 4398655569840 (LWP 3425)]
[New Thread 4398666055600 (LWP 3426)]
[New Thread 4398676541360 (LWP 3427)]
[New Thread 4398687027120 (LWP 3428)]
[New Thread 4398697512880 (LWP 3429)]
[New Thread 4398707998640 (LWP 3430)]
[New Thread 4398718484400 (LWP 3431)]
[New Thread 4398728970160 (LWP 3432)]
[New Thread 4398739455920 (LWP 3433)]
[New Thread 4398749941680 (LWP 3434)]
[New Thread 4398760427440 (LWP 3435)]
[New Thread 4398770913200 (LWP 3436)]
[New Thread 4398781398960 (LWP 3437)]
[New Thread 4398791884720 (LWP 3438)]
[New Thread 4398802370480 (LWP 3439)]
[New Thread 4398812856240 (LWP 3440)]
[New Thread 4398823342000 (LWP 3441)]
[New Thread 4398833827760 (LWP 3442)]
[New Thread 4398844313520 (LWP 3443)]
[New Thread 4398854799280 (LWP 3444)]
[New Thread 4398865285040 (LWP 3445)]
[New Thread 4398875770800 (LWP 3446)]
[New Thread 4398886256560 (LWP 3448)]
[New Thread 4398896742320 (LWP 3449)]
[New Thread 4398907228080 (LWP 3450)]
[New Thread 4398917713840 (LWP 3451)]
[New Thread 4398928199600 (LWP 3452)]
[New Thread 4398938685360 (LWP 3453)]
[New Thread 4398949171120 (LWP 3454)]
[New Thread 4398959656880 (LWP 3455)]
[New Thread 4398970142640 (LWP 3456)]
[New Thread 4398980628400 (LWP 3457)]
[New Thread 4398991114160 (LWP 3458)]
[New Thread 4399001599920 (LWP 3459)]
[New Thread 4399012085680 (LWP 3460)]
[New Thread 4399022571440 (LWP 3461)]
[New Thread 4399033057200 (LWP 3462)]
[New Thread 4399043542960 (LWP 3463)]
[New Thread 4399054028720 (LWP 3464)]
[New Thread 4399064514480 (LWP 3465)]
[New Thread 4399075000240 (LWP 3466)]
[New Thread 4399085486000 (LWP 3467)]
[New Thread 4399095971760 (LWP 3468)]
[New Thread 4399106457520 (LWP 3469)]

Program received signal SIGINT, Interrupt.
[Switching to Thread 4398046677088 (LWP 3117)]
0x000000805c7a9048 in .__libc_poll () from /lib64/libc.so.6
(gdb) where
#0  0x000000805c7a9048 in .__libc_poll () from /lib64/libc.so.6
#1  0x000000805ca30798 in .XProcessInternalConnection ()
   from /usr/lib64/libX11.so.6
#2  0x000000805ca30bf4 in ._XReadPad () from /usr/lib64/libX11.so.6
#3  0x000000805ca04a40 in .XGetImage () from /usr/lib64/libX11.so.6
#4  0x000000805d8fbf70 in .cairo_xlib_surface_set_drawable ()
   from /usr/lib64/libcairo.so.2
#5  0x000000805d8fc2c4 in .cairo_xlib_surface_set_drawable ()
   from /usr/lib64/libcairo.so.2
#6  0x000000805d8eb408 in .cairo_surface_status ()
   from /usr/lib64/libcairo.so.2
#7  0x000000805d8eb490 in .cairo_surface_status ()
   from /usr/lib64/libcairo.so.2
#8  0x000000805d8ebf48 in .cairo_surface_reference ()
   from /usr/lib64/libcairo.so.2
#9  0x000000805d8ec27c in .cairo_surface_reference ()
   from /usr/lib64/libcairo.so.2
#10 0x000000805d8e19f0 in .cairo_font_options_create ()
   from /usr/lib64/libcairo.so.2
#11 0x000000805d8e1b38 in .cairo_font_options_create ()
   from /usr/lib64/libcairo.so.2
#12 0x000000805d8e1c44 in .cairo_font_options_create ()
   from /usr/lib64/libcairo.so.2
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) info threads
  91 Thread 4399106457520 (LWP 3469)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  90 Thread 4399095971760 (LWP 3468)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
  89 Thread 4399085486000 (LWP 3467)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  88 Thread 4399075000240 (LWP 3466)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  87 Thread 4399064514480 (LWP 3465)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  86 Thread 4399054028720 (LWP 3464)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  85 Thread 4399043542960 (LWP 3463)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  84 Thread 4399033057200 (LWP 3462)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  83 Thread 4399022571440 (LWP 3461)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  82 Thread 4399012085680 (LWP 3460)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  81 Thread 4399001599920 (LWP 3459)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  80 Thread 4398991114160 (LWP 3458)  0x000000805ce42d18 in .__lll_lock_wait ()
---Type <return> to continue, or q <return> to quit---
Comment 1 Wu Zhou 2006-02-21 05:04:25 UTC
I made some further debugging attempts.  The number of threads created is
almost the same each time, around 420.  The process then receives a SIGPWR.
The debugging session looks like this:

[New Thread 4402556125776 (LWP 1665)]
[New Thread 4402566611536 (LWP 1666)]
[New Thread 4402577097296 (LWP 1667)]
[New Thread 4402587583056 (LWP 1668)]
[New Thread 4402598068816 (LWP 1669)]

Program received signal SIGPWR, Power fail/restart.
[Switching to Thread 4402598068816 (LWP 1669)]
0x000000805ce42d18 in .__lll_lock_wait () from /lib64/libpthread.so.0
(gdb) info threads
* 424 Thread 4402598068816 (LWP 1669)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  423 Thread 4402587583056 (LWP 1668)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  422 Thread 4402577097296 (LWP 1667)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  421 Thread 4402566611536 (LWP 1666)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  420 Thread 4402556125776 (LWP 1665)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  419 Thread 4402545640016 (LWP 1664)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  418 Thread 4402535154256 (LWP 1663)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  417 Thread 4402524668496 (LWP 1662)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  416 Thread 4402514182736 (LWP 1661)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  415 Thread 4402503696976 (LWP 1660)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  414 Thread 4402493211216 (LWP 1659)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  413 Thread 4402482725456 (LWP 1658)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  412 Thread 4402472239696 (LWP 1657)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  411 Thread 4402461753936 (LWP 1656)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  410 Thread 4402451268176 (LWP 1655)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  409 Thread 4402440782416 (LWP 1654)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  408 Thread 4402430296656 (LWP 1653)  0x000000805ce42d18 in .__lll_lock_wait ()
---Type <return> to continue, or q <return> to quit---


It seems that too many threads are being created incorrectly; they issue
asynchronous Xlib requests, which then trigger the "sequence lost" error.

This seems to be a PPC64-specific problem in some Xlib-dependent library.
Any insight on this?
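
To summarize a long "info threads" listing like the ones in this report, a
small script can tally the threads by the function each one is stopped in.
This is only a minimal sketch and not part of frysk: the regular expression
assumes the gdb output format shown here, and tally_threads is a hypothetical
helper name.

```python
import re
from collections import Counter

# Matches the first line of each `info threads` entry as printed in this
# report, for example:
#   91 Thread 4399106457520 (LWP 3469)  0x000000805ce42d18 in .__lll_lock_wait ()
# A leading "*" marks the current thread. The leading "." that gdb prints
# on these PPC64 symbol names is dropped from the captured function name.
THREAD_LINE = re.compile(
    r"^\s*\*?\s*(\d+)\s+Thread\s+\d+\s+\(LWP\s+(\d+)\)\s+"
    r"0x[0-9a-f]+\s+in\s+\.?(\w+)"
)


def tally_threads(gdb_output):
    """Count threads by the function each one is stopped in."""
    counts = Counter()
    for line in gdb_output.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            counts[match.group(3)] += 1
    return counts


# A few entries copied from the listings in this report.
sample = """\
* 424 Thread 4402598068816 (LWP 1669)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  423 Thread 4402587583056 (LWP 1668)  0x000000805ce42d18 in .__lll_lock_wait ()
   from /lib64/libpthread.so.0
  90 Thread 4399095971760 (LWP 3468)  0x000000805c7a9048 in .__libc_poll ()
   from /lib64/libc.so.6
"""

for function, count in tally_threads(sample).most_common():
    print(f"{count:4d}  {function}")
```

Run against a complete listing, this makes it easy to see how many threads
are blocked in __lll_lock_wait versus __libc_poll.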
Comment 2 Wu Zhou 2006-02-21 05:10:24 UTC
Without the "--sync" option, it sometimes triggers a SIGSEGV instead.  The
backtrace looks like this:

(gdb) c
Continuing.
[New Thread 4398185484880 (LWP 2466)]
[New Thread 4398200185424 (LWP 2467)]
[New Thread 4398210671184 (LWP 2468)]
[Thread 4398200185424 (LWP 2467) exited]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 4398210671184 (LWP 2468)]
_cairo_surface_show_glyphs (scaled_font=0x10b2f3b0,
operator=CAIRO_OPERATOR_DEST_OUT,
    pattern=0x40009c8bec8, dst=0x0, source_x=10, source_y=2, dest_x=10, dest_y=2,
    width=103, height=10, glyphs=0x10d2e0a0, num_glyphs=13) at cairo-surface.c:1487
1487        if (dst->status)
(gdb) p dst
$1 = (cairo_surface_t *) 0x0
(gdb) bt
#0  _cairo_surface_show_glyphs (scaled_font=0x10b2f3b0,
    operator=CAIRO_OPERATOR_DEST_OUT, pattern=0x40009c8bec8, dst=0x0, source_x=10,
    source_y=2, dest_x=10, dest_y=2, width=103, height=10, glyphs=0x10d2e0a0,
    num_glyphs=13) at cairo-surface.c:1487
#1  0x000000805d8de354 in _cairo_scaled_font_show_glyphs (scaled_font=0x10b2f3b0,
    operator=CAIRO_OPERATOR_DEST_OUT, pattern=0x40009c8bec8, surface=0x0,
source_x=10,
    source_y=2, dest_x=10, dest_y=2, width=103, height=10, glyphs=0x10d2e0a0,
    num_glyphs=13) at cairo-font.c:930
#2  0x000000805d8e0048 in _cairo_gstate_show_glyphs_draw_func
(closure=0x40009c8c160,
    operator=CAIRO_OPERATOR_DEST_OUT, src=0x40009c8bec8, dst=0x0, dst_x=0, dst_y=0,
    extents=0x40009c8c188) at cairo-gstate.c:2053
#3  0x000000805d8e1274 in _cairo_gstate_clip_and_composite (clip=0x10d300a0,
    operator=CAIRO_OPERATOR_DEST_OUT, src=0x40009c8bec8,
    draw_func=@0x805d933900: 0x805d8dfef0 <_cairo_gstate_show_glyphs_draw_func>,
    draw_closure=0x40009c8c160, dst=0x0, extents=0x40009c8c188) at
cairo-gstate.c:1094
#4  0x000000805d8e15b0 in _cairo_gstate_show_glyphs (gstate=0x10d30000,
glyphs=Variable "glyphs" is not available.
)
    at cairo-gstate.c:2131
#5  0x000000805d8d8418 in cairo_show_glyphs (cr=0x10d29ff0, glyphs=Variable
"glyphs" is not available.
) at cairo.c:2158
#6  0x000000805d368efc in .pango_cairo_show_glyph_string ()
   from /usr/lib64/libpangocairo-1.0.so.0
#7  0x000000805d3171e0 in .pango_renderer_draw_glyphs ()
   from /usr/lib64/libpango-1.0.so.0
#8  0x000000805d367fd0 in .pango_cairo_show_glyph_string ()
   from /usr/lib64/libpangocairo-1.0.so.0
#9  0x000000805d98542c in .gdk_draw_layout_line () from
/usr/lib64/libgdk-x11-2.0.so.0
#10 0x000000805d3171e0 in .pango_renderer_draw_glyphs ()
   from /usr/lib64/libpango-1.0.so.0
#11 0x000000805d317794 in .pango_renderer_draw_layout_line ()
   from /usr/lib64/libpango-1.0.so.0
#12 0x000000805d317b90 in .pango_renderer_draw_layout ()
   from /usr/lib64/libpango-1.0.so.0
#13 0x000000805d983c2c in .gdk_draw_layout_with_colors ()
   from /usr/lib64/libgdk-x11-2.0.so.0
---Type <return> to continue, or q <return> to quit---

Comment 3 Andrew Cagney 2006-02-21 18:15:03 UTC
Wu,

BTW, several bugs should interest you here:

- use gtk timer mechanism
http://sourceware.org/bugzilla/show_bug.cgi?id=2352
the timeline widget creates too many threads

- frysk hangs for a bit
http://sourceware.org/bugzilla/show_bug.cgi?id=2299
I think the fix to gcj will get pushed in a few days

- diego's got a mega 64-bit Java-GNOME binding patch pending

- sigsegv in Java-GNOME
http://sourceware.org/bugzilla/show_bug.cgi?id=2333
Comment 4 Wu Zhou 2006-03-03 09:04:45 UTC
This error no longer occurs with the latest CVS.  I believe the following
patch fixed it:

  http://sources.redhat.com/ml/frysk-cvs/2006-q1/msg00355.html
  (Replaced librt timer_*() timers with glib g_timeout_add().)
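
The effect of that kind of change can be sketched with an analogy.  This is
not the frysk patch and uses neither librt nor glib.  Python's
threading.Timer stands in for a timer whose callback is delivered on a fresh
thread (which is what the timer_helper_thread frame in the first listing
suggests, assuming the librt timers used thread notification), and
sched.scheduler stands in for timeouts dispatched from a single main loop,
which is how glib runs g_timeout_add() callbacks.  All function names below
are hypothetical.

```python
import sched
import threading
import time


def callbacks_off_caller_with_threads(n, delay=0.05):
    """Fire n timers, each on its own thread; count callbacks that ran
    on a thread other than the caller's."""
    caller = threading.get_ident()
    results = []
    lock = threading.Lock()

    def callback():
        with lock:
            results.append(threading.get_ident() != caller)

    # threading.Timer is a Thread subclass: every timer is a new thread.
    timers = [threading.Timer(delay, callback) for _ in range(n)]
    for timer in timers:
        timer.start()
    for timer in timers:
        timer.join()
    return sum(results)


def callbacks_off_caller_with_main_loop(n, delay=0.05):
    """Fire n timeouts from one scheduler loop on the calling thread;
    count callbacks that ran on a thread other than the caller's."""
    caller = threading.get_ident()
    results = []

    def callback():
        results.append(threading.get_ident() != caller)

    loop = sched.scheduler(time.monotonic, time.sleep)
    for _ in range(n):
        loop.enter(delay, 1, callback)
    loop.run()  # blocks until every scheduled callback has run
    return sum(results)


print("thread-per-timer:", callbacks_off_caller_with_threads(5))
print("single main loop:", callbacks_off_caller_with_main_loop(5))
```

In the first variant every callback runs on a thread other than the caller's,
so any GUI call made from it would reach Xlib from a second thread.  In the
second, all callbacks run on the thread that drives the loop.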
Comment 5 Wu Zhou 2006-03-03 09:05:47 UTC
Marking this as fixed.