This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: consistent printf race conditions

From: Nicholas Murphy <halcyonic at gmail dot com>
To: Josh Stone <jistone at redhat dot com>
Cc: systemtap at sourceware dot org
Date: Wed, 24 Oct 2012 19:54:43 -0400
Subject: Re: consistent printf race conditions
References: <A2204139-1397-4602-B816-3B4DF79AC9E2@gmail.com> <50887957.7030105@redhat.com>

Scripts below.  Unfortunately, part of what I'm using this script for is figuring out what the workload is, so I can't provide you with a reliable way to replicate it.  I'll try to include some of the surrounding output on the offchance that might help.  FWIW, this is SystemTap 1.3.

-----
probe syscall.close.return {
  uid = uid()
  track_uid =   uid == 42 // normally does more checks like this
  if ((execname() != "stap" && execname() != "salt-minion" && execname() != "stap-io" && track_uid) && $return == 0) {
    // checks if pid and fd are in an associative array
    if (!is_fd_blacklisted(pid(), $fd)) {
        printf("{\"execname\":\"%s\",\"fd\":%d,\"op\":\"%s\",\"pid\":%d,\"ppid\":%d,\"return\":%d,\"timestamp\":%d,\"uid
\":%d}\n",clean_string(execname()),$fd,clean_string("CLOSE"),pid(),ppid(),$return,gettimeofday_ms(),uid())
    }
    unblacklist_fd(pid(), $fd)
    delete write_fds[pid(), $fd]
    delete read_fds[pid(), $fd]
  }   
}
-----
probe kernel.function("sys_wait4").call {
  uid = uid()
  track_uid =   uid == 0
  if ((execname() != "stap" && execname() != "salt-minion" && execname() != "stap-io" && track_uid)) {
    printf("{\"execname\":\"%s\",\"op\":\"%s\",\"options\":%d,\"pid\":%d,\"ppid\":%d,\"timestamp\":%d,\"uid\":%d}\n",cle
an_string(execname()),clean_string("WAITPID_CALL"),$options,$pid,ppid(),gettimeofday_ms(),uid())
  }
}
-----

Here's some output surrounding one of those errors (changed some values for security):
-----
{"execname":"bash","fd":3,"op":"CLOSE","pid":13562,"ppid":13561,"return":0,"timestamp":1347749678387,"uid":1}
{"execname":"bash","op":"PIPE","pid":13562,"pipe0":3,"pipe1":4,"ppid":13561,"return":0,"timestamp":1347749678387,       "uid":1}
{"execname":"bash","op":"FORK","pid":13562,"ppid":13561,"return":15121,"timestamp":1347749678387,"uid":1}
{"execname":"bash","fd":4,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678387,"uid":1}
{"execname":"bash","fd":3,"op":"CLOSE","pid":13562,"ppid":13561,"return":0,"timestamp":1347749678388,"uid":1}
{"execname":"bash","fd":4,"op":"CLOSE","pid":13562,"ppid":13561,"return":0,"timestamp":1347749678388,"uid":1}
{"execname":"bash","op":"WAITPID_CALL","options":10,"pid":-1,"ppid":13561,"timestamp":134{"execname":"bash","fd":       3,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":1}
{"args":"\"ls\" \"--color=tty\" \"-l\" \"t\"\"Output/\"","execname":"bash","filename":"<unknown>","op":"EXECVE_CALL","pid":15121,"ppid":13562,"pwd":"somepath","timestamp":1347749678388,"uid":1}
{"execname":"ls","fd":3,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":1}
{"execname":"ls","op":"EXECVE_RETURN","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":1}

On Oct 24, 2012, at 7:27 PM, Josh Stone <jistone@redhat.com> wrote:

> On 10/24/2012 01:40 PM, Nicholas Murphy wrote:
>> I've been consistently seeing the following race condition: a printf from a close() system call pre-empts printing of a sys_wait4 hook.  Here's the sys_wait4 printf code:
>> 
>> printf("{\"execname\":\"%s\",\"op\":\"%s\",\"options\":%d,\"pid\":%d,\"ppid\":%d,\"timestamp\":%d,\"uid\":%d}\n",clean_string(execname()),clean_string("WAITPID_CALL"),$options,$pid,ppid(),gettimeofday_ms(),uid())
>> 
>> ...and here's an example of what I end up seeing printed:
>> 
>> {"execname":"bash","op":"WAITPID_CALL","options":10,"pid":-1,"ppid":13561,"timestamp":134{"execname":"bash","fd":3,"op":"CLOSE","pid":15121,"ppid":13562,"return":0,"timestamp":1347749678388,"uid":56441}
>> 
>> I don't see anything in the printf that would cause it to be pre-empted by another printf.  Any idea why I'm seeing this and what I could do to avoid it?  It seems to always be waitpid conflicting with close, so I could perhaps add an access to a common associative array to force synchronization between the two?
> 
> Can you provide a more complete script which demonstrates the error,
> with at least the two close and wait probes?  And what sort of workload
> do you run where it consistently fails?
> 
> I don't know why it would do this, but it will help if I can reproduce
> the corruption locally.
> 
> It would also help if you could attach the output of stap-report.
> 
> Thanks,
> Josh

Follow-Ups:
- Re: consistent printf race conditions
  - From: Frank Ch. Eigler

References:
- consistent printf race conditions
  - From: Nicholas Murphy
- Re: consistent printf race conditions
  - From: Josh Stone

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]