Bug 18263

Summary: In tty tapset, driver_name can be null, causing a script to fail when probing tty.write or tty.read
Product: systemtap Reporter: Max Timchenko <maxvt>
Component: tapsetsAssignee: Unassigned <systemtap>
Status: RESOLVED FIXED    
Severity: normal CC: dsmith
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: kernel_string3 patch
kernel_string_quoted patch

Description Max Timchenko 2015-04-14 15:35:24 UTC
Seeing this error from my script:

ERROR: kernel string copy fault at 0x  (null) [man error::fault] near identifier 'kernel_string' at /usr/share/systemtap/tapset/linux/conversions.stp:18:10

Narrowed down to a tty.write tap. Investigating the parameters by patching the tapset like this: 

probe tty.write = kernel.function("n_tty_write") !,
		  kernel.function("write_chan")
{
        // [mt]
        printf("buf=%p tty=%p\n", $buf, $tty)
        printf("ttydr=%p ttydrn=%p\n", $tty->driver, $tty->driver->driver_name)

I saw this:
buf=0xcf36c800 tty=0xcd358200
ttydr=0xcf952b80 ttydrn=0x0

I'm not sure which driver(s) may be causing this, and if this might be Debian specific.

Script-termination-causing line:
        driver_name = kernel_string($tty->driver->driver_name)

Suggested fix:
	if ($tty->driver->driver_name)
		driver_name = kernel_string($tty->driver->driver_name)
	else
		driver_name = "(none)"

System details:
> uname -a
Linux woot 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) i686 GNU/Linux
> stap -V
Systemtap translator/driver (version 2.6/0.159, Debian version 2.6-0.2)
Comment 1 David Smith 2015-04-14 16:40:11 UTC
Hmm, I'm not quite sure this is the right fix. A better fix might be:

        driver_name = kernel_string2($tty->driver->driver_name, "NULL")

kernel_string2() returns either the string or the 2nd argument.

The only problem with the above fix is that let's say we've got a non-0 address in driver_name, but we still can't read that value (for instance the address isn't valid). Then we've lied and said the value was NULL when it wasn't.

Perhaps we need a kernel_string3() function, which would look like this:

====
function kernel_string3:string (addr:long) {
  try { return kernel_string(addr) } catch { return sprintf("%p", addr) }
}
====

This would either return the string or return the failing address.

Then of course the code in tty.stp would look like:

        driver_name = kernel_string3($tty->driver->driver_name)

(The name 'kernel_string3' could be improved.)
Comment 2 David Smith 2015-04-14 16:45:07 UTC
(In reply to David Smith from comment #1)
> Perhaps we need a kernel_string3() function, which would look like this:
> 
> ====
> function kernel_string3:string (addr:long) {
>   try { return kernel_string(addr) } catch { return sprintf("%p", addr) }
> }
> ====
> 
> This would either return the string or return the failing address.

Strangely enough, the above matches the behavior of kernel_string_quoted, which also either returns the quoted string or the failing address.
Comment 3 David Smith 2015-04-14 18:17:00 UTC
Created attachment 8248 [details]
kernel_string3 patch

Here's a patch that implements the kernel_string3() idea. Let me know if it works for you and what you think of it.
Comment 4 David Smith 2015-04-14 19:23:28 UTC
Created attachment 8249 [details]
kernel_string_quoted patch

Here's a better patch that replaces all use of kernel_string() with kernel_string_quoted() in tty.stp. The problem with kernel_string3() is that it if returned "0x0" you can't know if the string was NULL or if the string was actually "0x0".
Comment 5 Max Timchenko 2015-04-17 15:25:34 UTC
Thanks David, I have patched with attachment 8249 [details] and the error no longer occurs.

I can't see output differences (compared to other options in the thread or my original fix) since my probe does a target set check before printing anything and the troublesome tty does not belong to the target set, but as long as my probe set doesn't bail immediately on start I'm happy.
Comment 6 David Smith 2015-04-21 14:33:41 UTC
Fixed in commit d42a576. This version of the patch will revert to the old behavior if the '--compatible=2.7' option is used.

Max, the difference in output will be that the strings are quoted now.

Thanks for the bug report.