Bug 14396

Summary: Missing DW_ATE_UTF support (char16_t, char32_t)
Product: systemtap Reporter: Mark Wielaard <mark>
Component: translator Assignee: Unassigned <systemtap>
Status: RESOLVED FIXED    
Severity: normal CC: jistone
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: Add kernel_string_utf16/32

Description Mark Wielaard 2012-07-24 22:25:51 UTC
elfutils dwarf.h was missing the new DWARF4 DW_ATE_UTF define.
As a result, systemtap doesn't support this data encoding either.

Example usage:

#include <string.h>

const char *foo = "cow";
const char16_t *bar = u"bear";

int
main ()
{
  if (foo == "bear" && bar == u"cow")
    return 42;

  return 0;
}

$ g++ -g -std=c++0x -o utf utf.cxx

$ stap -e 'probe process.function("main") { log($foo$$); log($bar$$) }' -c ./utf
"cow"
4195852

It would be nice to see the $bar value decoded as well.
Comment 1 Josh Stone 2012-07-24 22:39:36 UTC
(In reply to comment #0)
> $ stap -e 'probe process.function("main") { log($foo$$); log($bar$$) }' -c
> ./utf
> "cow"
> 4195852
> 
> Would be nice to see the $bar value also decoded.

dwarf_pretty_print::print_chars() uses user_string2/kernel_string2 to read strings bytewise.  We could add similar functions for UTF-16 and UTF-32 which convert to UTF-8 to make stap strings.
Comment 2 Josh Stone 2012-08-10 00:56:21 UTC
Created attachment 6570
Add kernel_string_utf16/32

Here's a prototype of what those conversion functions might look like for kernel memory.  The user variants would be the same, just s/kread/uread/.

(And now I see that uread() doesn't exist, but it should...)
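
For readers without the attachment at hand, the conversion idea can be sketched in plain C. This is not the attached prototype: the names utf8_encode and utf16_to_utf8 are hypothetical, the input is assumed to be a NUL-terminated, host-endian UTF-16 buffer in ordinary memory (no kread/uread), and replacing unpaired surrogates with U+FFFD is an assumption made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written, or 0 if it does not fit in `room`. */
static size_t utf8_encode(uint32_t c, char *dst, size_t room)
{
    if (c < 0x80) {
        if (room < 1) return 0;
        dst[0] = (char)c;
        return 1;
    }
    if (c < 0x800) {
        if (room < 2) return 0;
        dst[0] = (char)(0xC0 | (c >> 6));
        dst[1] = (char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        if (room < 3) return 0;
        dst[0] = (char)(0xE0 | (c >> 12));
        dst[1] = (char)(0x80 | ((c >> 6) & 0x3F));
        dst[2] = (char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (room < 4) return 0;
    dst[0] = (char)(0xF0 | (c >> 18));
    dst[1] = (char)(0x80 | ((c >> 12) & 0x3F));
    dst[2] = (char)(0x80 | ((c >> 6) & 0x3F));
    dst[3] = (char)(0x80 | (c & 0x3F));
    return 4;
}

/* Hypothetical sketch: convert NUL-terminated, host-endian UTF-16 to UTF-8.
 * The output is always NUL-terminated and is truncated at a character
 * boundary when dst is too small. Returns bytes written, excluding the NUL. */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dstlen)
{
    size_t out = 0;

    if (dstlen == 0)
        return 0;

    while (*src) {
        uint32_t c = *src++;

        if (c >= 0xD800 && c <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF) {
            /* High surrogate followed by low surrogate: combine the pair. */
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            /* Unpaired surrogate: substitute U+FFFD (assumption of this sketch). */
            c = 0xFFFD;
        }

        size_t n = utf8_encode(c, dst + out, dstlen - 1 - out);
        if (n == 0)
            break;              /* no room left for this character */
        out += n;
    }

    dst[out] = '\0';
    return out;
}
```

A UTF-32 variant would skip the surrogate handling and pass each 32-bit unit straight to the encoder. A kernel or user-space probe version would presumably fetch each code unit with a fault-safe read rather than dereferencing src directly.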
Comment 3 Josh Stone 2012-08-11 01:04:22 UTC
15ceae2 loc2c-runtime.h: Add uread() and uwrite()
8987b30 PR14396: Add UTF-16/32 conversion functions
6561d8d PR14396: Add pretty-printing support for UTF

$ stap -e 'probe process.function("main") { log($foo$); log($bar$) }' -c ./utf
"cow"
"bear"