This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.
Token-level mapping of coverage information and generated code
- From: Simon Richter <Simon dot Richter at hogyros dot de>
- To: binutils at sourceware dot org
- Date: Tue, 3 Mar 2020 19:39:28 +0100
- Subject: Token-level mapping of coverage information and generated code
Hi,
I'd like to get finer-than-line-level information for code coverage and
optimized-out code.
Consider:
extern void foo(void); // 1
int test() // 2
{ // 3
int a = 0, b = 0, c = 1, d = 0; // 4
if( a == b && a == c && b == c) { d = a; } // 5
foo(); // 6
return d; // 7
} // 8
Compiling with gcc -c -O0 and mapping back to source, I get
int test()
{
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
int a = 0, b = 0, c = 1, d = 0;
8: c7 45 f8 00 00 00 00 movl $0x0,-0x8(%rbp)
f: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
16: c7 45 f0 01 00 00 00 movl $0x1,-0x10(%rbp)
1d: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
if( a == b && a == c && b == c) { d = a; }
24: 8b 45 f8 mov -0x8(%rbp),%eax
27: 3b 45 f4 cmp -0xc(%rbp),%eax
2a: 75 16 jne 42 <test+0x42>
2c: 8b 45 f8 mov -0x8(%rbp),%eax
2f: 3b 45 f0 cmp -0x10(%rbp),%eax
32: 75 0e jne 42 <test+0x42>
34: 8b 45 f4 mov -0xc(%rbp),%eax
37: 3b 45 f0 cmp -0x10(%rbp),%eax
3a: 75 06 jne 42 <test+0x42>
3c: 8b 45 f8 mov -0x8(%rbp),%eax
3f: 89 45 fc mov %eax,-0x4(%rbp)
foo();
42: e8 00 00 00 00 callq 47 <test+0x47>
43: R_X86_64_PLT32 foo-0x4
return d;
47: 8b 45 fc mov -0x4(%rbp),%eax
}
4a: c9 leaveq
4b: c3 retq
The finest resolution I can get here is a single source line; addr2line reports
exactly the same instruction-to-source-line mapping.
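addr2line only prints file and line, but the DWARF line table that GCC emits can also carry a column number per row (GCC's -gcolumn-info, on by default in recent releases). As a sketch, assuming the third-party pyelftools package and an object built with -g, those rows can be read directly; collapse_rows and line_table_rows are hypothetical helper names. A column is only a single position, so this narrows the mapping but does not by itself tie a basic block to a token.

```python
# Sketch: dump (address, line, column) rows from the DWARF line table.
# Assumes pyelftools is installed and the object was compiled with -g.
from typing import Iterable, List, Tuple

Row = Tuple[int, int, int]  # (address, line, column); column 0 = unknown


def collapse_rows(rows: Iterable[Row]) -> List[Row]:
    """Keep only the rows where the (line, column) position changes."""
    out: List[Row] = []
    for addr, line, col in rows:
        if out and (out[-1][1], out[-1][2]) == (line, col):
            continue
        out.append((addr, line, col))
    return out


def line_table_rows(path: str) -> List[Row]:
    """Read every row (except end-of-sequence markers) of every CU's line program."""
    # Imported here so collapse_rows() stays usable without pyelftools.
    from elftools.elf.elffile import ELFFile

    rows: List[Row] = []
    with open(path, "rb") as f:
        elf = ELFFile(f)
        if not elf.has_dwarf_info():
            return rows
        dwarf = elf.get_dwarf_info()
        for cu in dwarf.iter_CUs():
            program = dwarf.line_program_for_CU(cu)
            if program is None:
                continue
            for entry in program.get_entries():
                state = entry.state
                # Entries without a state are opcodes that emit no row.
                if state is None or state.end_sequence:
                    continue
                rows.append((state.address, state.line, state.column))
    return rows
```

For example, collapse_rows(line_table_rows("test.o")) lists each change of source position together with the first address at which it occurs.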
Instrumenting for code coverage and running the program, I get this gcov output:
1: 2:int test()
-: 3:{
1: 4: int a = 0, b = 0, c = 1, d = 0;
1*: 5: if( a == b && a == c && b == c) { d = a; }
1: 5-block 0
1: 5-block 1
%%%%%: 5-block 2
%%%%%: 5-block 3
1: 6: foo();
1: 6-block 0
1: 7: return d;
-: 8:}
As expected, the condition is resolved into four basic blocks,
corresponding to the three tests and the conditional body. Can I somehow
map these basic blocks back to the tokens in the source file?
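As a first step, the per-block counters can at least be pulled out of the annotated listing mechanically. A minimal sketch, assuming the text format shown above (the "-block" lines that gcov -a prints) with plain integer counts; parse_block_counts is a hypothetical name. It recovers which blocks ran on each line, but not which tokens those blocks correspond to.

```python
# Sketch: collect gcov's per-block counters ("N: L-block K" lines) per source line.
import re
from typing import Dict

# Matches e.g. "        1:    5-block  0" and "    %%%%%:    5-block  2".
BLOCK_RE = re.compile(r"^\s*(\S+):\s*(\d+)-block\s+(\d+)\s*$")


def parse_block_counts(gcov_text: str) -> Dict[int, Dict[int, int]]:
    """Map source line -> {block index -> execution count}.

    Non-numeric markers such as "%%%%%" or "#####" (never executed)
    are recorded as 0. Ordinary source lines do not match and are skipped.
    """
    result: Dict[int, Dict[int, int]] = {}
    for raw in gcov_text.splitlines():
        m = BLOCK_RE.match(raw)
        if m is None:
            continue
        count_text, line, block = m.groups()
        count_text = count_text.rstrip("*")
        count = int(count_text) if count_text.isdigit() else 0
        result.setdefault(int(line), {})[int(block)] = count
    return result
```

On the listing above this yields counts 1, 1, 0, 0 for blocks 0 to 3 of line 5, matching the three tests and the conditional body.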
Similarly, if I compile with optimization enabled, mapping back to source
code gives me
int test()
{
0: 48 83 ec 08 sub $0x8,%rsp
int a = 0, b = 0, c = 1, d = 0;
if( a == b && a == c && b == c) { d = a; }
foo();
4: e8 00 00 00 00 callq 9 <test+0x9>
5: R_X86_64_PLT32 foo-0x4
return d;
}
9: 31 c0 xor %eax,%eax
b: 48 83 c4 08 add $0x8,%rsp
f: c3 retq
I can get slightly better mapping information by querying addr2line for every
address, to see which source lines actually contributed to the output:
$ python3 -c 'for x in range(0, 16): print(hex(x))' | \
addr2line -e test.o | \
cut -d: -f2 | \
uniq
3
6
8
This omits the initialization of d, but I guess that can't be helped, since
it has been propagated into the return statement as a constant. That is
probably not much of a problem in the real world.
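The same query can be scripted without the shell pipeline. A sketch, assuming binutils' addr2line is on PATH; emitted_lines and query are hypothetical names. Splitting on the last colon rather than the first keeps paths that contain colons intact, and a trailing " (discriminator N)" suffix, which addr2line may print, is ignored.

```python
# Sketch: the addr2line | cut | uniq pipeline, done from Python.
# Assumes binutils' addr2line is on PATH.
import subprocess
from typing import List


def emitted_lines(addr2line_output: str) -> List[int]:
    """Extract line numbers from "file:line" output, dropping unknown
    locations and collapsing consecutive repeats like uniq."""
    lines: List[int] = []
    for entry in addr2line_output.splitlines():
        tail = entry.rpartition(":")[2]  # last field, so "C:/x.c:5" works
        field = tail.split(" ", 1)[0]    # drop " (discriminator N)"
        if not field.isdigit():
            continue  # "??:?" and similar: no line information
        n = int(field)
        if n == 0:
            continue  # "??:0": no line information either
        if not lines or lines[-1] != n:
            lines.append(n)
    return lines


def query(obj: str, size: int) -> List[int]:
    """Ask addr2line about every byte offset in [0, size)."""
    addrs = "\n".join(hex(a) for a in range(size)) + "\n"
    out = subprocess.run(["addr2line", "-e", obj], input=addrs,
                         capture_output=True, text=True, check=True).stdout
    return emitted_lines(out)
```

query("test.o", 16) should give the same list as the shell pipeline above.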
Again, I'd like a mapping finer-grained than lines here, so that I can
highlight in the source code which parts actually made it into the final
output.
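At line granularity, the highlighting itself is simple once the contributing lines are known (for example 3, 6 and 8 from the addr2line run). A minimal sketch with a hypothetical helper name; a line with no instructions is not proof of dead code, since the initialization of d was propagated into the return rather than removed.

```python
# Sketch: mark source lines that have at least one instruction mapped to them.
from typing import Iterable, List


def mark_emitted_lines(source: str, emitted: Iterable[int]) -> List[str]:
    """Prefix each source line with '+' if any instruction maps to it
    (1-based line numbers), otherwise with a blank."""
    hit = set(emitted)
    return [("+ " if n in hit else "  ") + text
            for n, text in enumerate(source.splitlines(), start=1)]
```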
As a nasty hack, I can run the source code through "tr ' ' '\n'" before
compiling. That gives me rather good resolution for the coverage test, but
the mapping to subexpressions is somewhat arbitrary, because the counters
are attached to control flow inside the expression:
1: 28:if(
-: 29:a
-: 30:==
-: 31:b
1: 32:&&
-: 33:a
-: 34:==
-: 35:c
#####: 36:&&
-: 37:b
-: 38:==
-: 39:c)
-: 40:{
-: 41:d
#####: 42:=
-: 43:a;
-: 44:}
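The hack can be made reversible by recording, for every line of the split file, where that piece started in the original. A sketch, with split_on_spaces as a hypothetical name: it splits exactly where tr ' ' '\n' would (runs of spaces give empty lines, tabs are left alone), and the returned table maps a line number in the split file back to a 1-based (line, column) in the original source. Like the tr hack itself, it breaks // comments and preprocessor directives.

```python
# Sketch: emulate `tr ' ' '\n'` but remember where each piece started, so a
# line number reported by gcov for the split file maps back to the original.
from typing import Dict, List, Tuple

Origin = Dict[int, Tuple[int, int]]  # split-file line -> (line, column), 1-based


def split_on_spaces(source: str) -> Tuple[str, Origin]:
    """Replace every space with a newline, exactly as tr would, and record
    the original (line, column) at which each resulting line begins."""
    pieces: List[str] = []
    origin: Origin = {}
    for line_no, text in enumerate(source.split("\n"), start=1):
        col = 1
        for piece in text.split(" "):
            pieces.append(piece)
            origin[len(pieces)] = (line_no, col)
            col += len(piece) + 1  # the piece itself plus the space after it
    return "\n".join(pieces), origin
```

Compiling the returned text instead of the tr output, origin[n] gives the original line and column of whatever gcov reports on line n of the split file.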
Is there some way I could accurately extract information from a run that
allows me to highlight which subexpressions have been evaluated?
From the run above, I could derive something like
if( a == b && a == c && b == c) { d = a; }
~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~~~~ ~~~~~~
1          1         -              -
which isn't bad, but it could probably be improved. The end goal is to
build reports such as
"this condition has not been touched by a testcase"
and
"this code is unused and the compiler can prove it".
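The annotation of the if line follows a simple rule: each counter covers the text from its own token up to the token that carries the next counter. A sketch of that rule and of the rendering, using 0-based columns into the source line; spans_from_counters and render are hypothetical names, and the spans inherit the arbitrariness of where gcov places its counters.

```python
# Sketch: turn counter positions on one source line into underlined spans.
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end, label); 0-based, end exclusive


def spans_from_counters(line: str, counters: Sequence[Tuple[int, str]]) -> List[Span]:
    """Each counter covers its column up to the next counter's column,
    minus trailing spaces; the last one runs to the end of the line.
    `counters` must be sorted by column."""
    spans: List[Span] = []
    for i, (start, label) in enumerate(counters):
        end = counters[i + 1][0] if i + 1 < len(counters) else len(line)
        while end > start and line[end - 1] == " ":
            end -= 1
        spans.append((start, end, label))
    return spans


def render(line: str, spans: Sequence[Span]) -> Tuple[str, str]:
    """Return an underline row of '~' and a label row aligned to span starts."""
    marks = [" "] * len(line)
    labels = [" "] * len(line)
    for start, end, label in spans:
        for i in range(start, end):
            marks[i] = "~"
        # Write the label at the span start, clipped to the line width.
        for j, ch in enumerate(label[: max(0, len(line) - start)]):
            labels[start + j] = ch
    return "".join(marks).rstrip(), "".join(labels).rstrip()
```

With the if line from the example and counters at columns 0, 11, 21 and 36 (the tokens "if(", "&&", "&&" and "=" that carry counters in the split run) labelled 1, 1, - and -, render reproduces the tilde line shown above.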
Simon