[PATCH 2/4] nptl: Handle EPIPE on tst-cancel2

Adhemerval Zanella adhemerval.zanella@linaro.org
Wed Aug 21 19:41:00 GMT 2019



On 20/08/2019 13:46, Adhemerval Zanella wrote:
> 
> 
> On 20/08/2019 12:30, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>> For tst-cancel2.c, if I add a sleep (1) between pthread_create and 
>>> pthread_cancel you can see this issue more clearly (dump with strace):
>>>
>>> [pid  2587] set_robust_list(0x7fffabccf290, 24) = 0
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000) = 100000
>>> [pid  2587] write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 100000 <unfinished ...>
>>> [pid  2586] <... nanosleep resumed>0x7ffff0c9e7f0) = 0
>>>
>>> ########### Cancellation starts to act here, by loading libgcc for unwinding
>>> [pid  2586] open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 5
>>> [pid  2586] fstat(5, {st_mode=S_IFREG|0644, st_size=63776, ...}) = 0
>>> [pid  2586] mmap(NULL, 63776, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7fffabf00000
>>> [pid  2586] close(5)                    = 0
>>> [pid  2586] open("/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 5
>>> [pid  2586] read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\25\0\1\0\0\0\340+\0\0\0\0\0\0"..., 832) = 832
>>> [pid  2586] fstat(5, {st_mode=S_IFREG|0755, st_size=133696, ...}) = 0
>>> [pid  2586] mmap(NULL, 197688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7fffab480000
>>> [pid  2586] mmap(0x7fffab4a0000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0x10000) = 0x7fffab4a0000
>>> [pid  2586] close(5)                    = 0
>>> [pid  2586] mprotect(0x7fffab4a0000, 65536, PROT_READ) = 0
>>> [pid  2586] munmap(0x7fffabf00000, 63776) = 0
>>> [pid  2586] tgkill(2586, 2587, SIGRTMIN) = 0
>>> [pid  2586] close(3)                    = 0
>>> [pid  2586] futex(0x7fffabccf280, FUTEX_WAIT, 2587, NULL <unfinished ...>
>>>
>>> ########### Write returns with broken PIPE and __pthread_disable_asynccancel is called
>>> [pid  2587] <... write resumed>)        = -1 EPIPE (Broken pipe)
>>> [pid  2587] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=2586, si_uid=61684} ---
>>> [pid  2587] --- SIGRTMIN {si_signo=SIGRTMIN, si_code=SI_TKILL, si_pid=2586, si_uid=61684} ---
>>> [pid  2587] futex(0x7fffab4b0224, FUTEX_WAKE_PRIVATE, 2147483647) = 0
>>>
>>> ########### No side-effects reported back to program
>>> [pid  2587] madvise(0x7fffab4c0000, 8257536, MADV_DONTNEED) = 0
>>> [pid  2587] exit(0)                     = ?
>>>
>>> With the BZ#12683 fix, the cancellation is not acted upon, and the testcase
>>> then fails depending on whether or not the write is interrupted by the
>>> cancellation signal.
>>
>> Hmm.  Which cancellation implementation is this?  At which point in the
>> trace do we start unwinding?  I'm surprised that strace reports the
>> EPIPE before the SIGPIPE, but maybe that's just a kernel race.  My
>> expectation is that the current code unwinds after the system call
>> returns with the EPIPE error, never returning it to the application.  I
>> think this is the right behavior for the write system call.
> 
> This is the current implementation, more specifically glibc 2.17, CentOS 7.6
> on powerpc64le.  From the trace:
> 
>>> [pid  2586] tgkill(2586, 2587, SIGRTMIN) = 0
> 
> This is where pthread_cancel sends the SIGCANCEL signal to the thread.
> 
>>> [pid  2586] close(3)                    = 0
> 
> This is the
> 
>     /* This will cause the write in the child to return.  */
>     close (fd[0]);
> 
> in tst-cancel2.c.
> 
> And finally:
> 
>>> [pid  2587] <... write resumed>)        = -1 EPIPE (Broken pipe)
>>> [pid  2587] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=2586, si_uid=61684} ---
> 
> SIGPIPE is received, making the write fail with EPIPE, and then
> 
>>> [pid  2587] --- SIGRTMIN {si_signo=SIGRTMIN, si_code=SI_TKILL, si_pid=2586, si_uid=61684} ---
> 
> sigcancel_handler is invoked.  The implementation *does* unwind after the
> syscall is done; the problem is that it ignores the -1/EPIPE result (which
> is BZ#12683).
> 

Florian, do you still disagree with this change?


