For all the changes I've made recently. And also various other changes
that occurred over the past 20 years that didn't previously make it into
the docs.
Also, this change de-emphasises the cache and branch simulation aspect,
because they're no longer that useful. Instead it emphasises the
precision and reproducibility of instruction count profiling.
Get rid of cache config warnings with `--cache-sim=no`.
By not configuring the caches in that case. This requires moving a few
assertions around, because they currently assume that the caches are
configured.
And deprecate the use of `cg_diff` and `cg_merge`.
Because `cg_annotate` can do a better job, even annotating source files
when doing diffs in some cases.
The user requests merging by passing multiple cgout files to
`cg_annotate`, and diffing by passing two cgout files to `cg_annotate`
along with `--diff`.
- one more comment at the top describing the three usages of vgdb.
- fixed up a few places where tabs were used for indentation (we are
not very consistent in that either, after the release we'll look
into adopting something like clang-format so you don't have to do
all this by hand).
- Add a missing newline in coregrind/m_main.c to make
none/tests/cmdline2 pass.
Mark Wielaard [Thu, 20 Apr 2023 10:59:02 +0000 (12:59 +0200)]
vgdb: Handle EAGAIN in read_buf
The file descriptor is on non-blocking mode and read_buf should only
be called when poll gave us an POLLIN event signaling the file
descriptor is ready for reading from. Still sometimes we do get an
occasional EAGAIN. Just do as told in that case and try to read again.
Also fix an ERROR errno in getpkt. This has never been observed, but
not getting the actual errno if the write fails in that case would be
really confusing.
Mark Wielaard [Wed, 19 Apr 2023 22:42:40 +0000 (00:42 +0200)]
Bug 439685 compiler warning in callgrind/main.c
main.c: In function 'vgCallgrind_post_syscalltime':
main.c:1779:25: warning: '*((void *)&ts_now+8)'
may be used uninitialized in this function [-Wmaybe-uninitialized]
struct vki_timespec ts_now;
main.c:1779:25: warning: 'ts_now'
may be used uninitialized in this function [-Wmaybe-uninitialized]
In function collect_time the conditional expression in the switch
statement has type int (after integral promotions). GCC assumes that
it may have values other than the ones listed in the enumerated type
it was promoted from. In that case the memory pointed to by its 1st
argument remains unintialised. Later on vki_timespec_diff will read
the contents of ts_now undoditionally. Hence the warning.
Using the default case for the tl_assert () removes the warning and
makes the code more robust should another enumerator ever be added to
Collect_Systime.
PowerPC:, Fix test test_isa_3_1_R1_RT.c, test_isa_3_1_R1_XT.c
Fixes an issue with the PAD_ORI used in the the tests by explicitly adding
SAVE_REGS and RESTORE_REGS macros. The macros ensure that the block of
immediate OR instructions don't inadvertently change the contents of the
registers.
John Reiser suggested that the PAD_ORI asm statements in the PAD_ORI
macro be updated to inform the compiler which register the ori instruction
is clobbering. The compiler will then generate the code to save and
restore the register automatically. This is a cleaner solution then
explicitly adding the macros to store and restore the registers. It is
functionally cleaner in that the value fetched by the instruction under
test is not modified by the PAD_ORI instructions.
This patch removes the SAVE_REG and RESTORE_REG macros and updates the
PAD_ORI macro.
Carl Love [Mon, 17 Apr 2023 21:12:25 +0000 (17:12 -0400)]
PowerPC:, Fix test test_isa_3_1_R1_RT.c, test_isa_3_1_R1_XT.c
Test adds a block of xori instructions for use with the PC relative tests.
The registers used by the xori instructions need to be saved and restored,
otherwise the register changes can impact the execution of the for loops
in the test as registers are randomly changed. The issue occcurs when
GCC is optimizing and inlining the test functions.
Executing vgdb --multi makes vgdb talk the gdb extended-remote
protocol. This means that the gdb run command is supported and
vgdb will start up the program under valgrind. Which means you
don't need to run gdb and valgrind from different terminals.
Also vgdb keeps being connected to gdb after valgrind exits. So
you can easily rerun the program with the same breakpoints in
place.
vgdb now implements a minimal gdbserver that just recognizes
a few extended-remote protocol packets. Once it starts up valgrind
it sets up noack and qsupported then it will forward packets
between gdb and valgrind gdbserver. After valgrind shutsdown it
resumes handling gdb packets itself.
Most notable, the "Function summary" section, which printed one CC for each
`file:function` combination, has been replaced by two sections, "File:function
summary" and "Function:file summary".
These new sections both feature "deep CCs", which have an "outer CC" for the
file (or function), and one or more "inner CCs" for the paired functions (or
files).
Here is a file:function example, which helps show which files have a lot of
events, even if those events are spread across a lot of functions.
```
> 12,427,830 (5.4%, 26.3%) /home/njn/moz/gecko-dev/js/src/ds/LifoAlloc.h:
6,107,862 (2.7%) js::frontend::ParseNodeVerifier::visit(js::frontend::ParseNode*)
3,685,203 (1.6%) js::detail::BumpChunk::setBump(unsigned char*)
1,640,591 (0.7%) js::LifoAlloc::alloc(unsigned long)
711,008 (0.3%) js::detail::BumpChunk::assertInvariants()
```
And here is a function:file example, which shows how heavy inlining can result
in a machine code function being derived from source code from multiple files:
```
> 1,343,736 (0.6%, 35.6%) js::gc::TenuredCell::isMarkedGray() const:
651,108 (0.3%) /home/njn/moz/gecko-dev/js/src/d64/dist/include/js/HeapAPI.h
292,672 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Cell.h
254,854 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Heap.h
```
Previously these patterns were very hard to find, and it was easy to overlook a
hot piece of code because its counts were spread across multiple non-adjacent
entries. I have already found these changes very useful for profiling Rust
code.
Also, cumulative percentages on the outer CCs (e.g. the 26.3% and 35.6% in the
example) tell you what fraction of all events are covered by the entries so
far, something I've wanted for a long time.
Some other, related changes:
- Column event headers are now padded with `_`, e.g. `Ir__________`. This makes
the column/event mapping clearer.
- The "Cachegrind profile" section is now called "Metadata", which is
shorter and clearer.
- A few minor test tweaks, beyond those required for the output changes.
- I converted some doc comments to normal comments. Not standard Python, but
nicer to read, and there are no public APIs here.
- Roughly 2x speedups to `cg_annotate` and smaller improvements for `cg_diff`
and `cg_merge`, due to the following.
- Change the `Cc` class to a type alias for `list[int]`, to avoid the class
overhead (sigh).
- Process event count lines in a single split, instead of a regex
match + split.
- Add the `add_cc_to_ccs` function, which does multiple CC additions in a
single function call.
- Better handling of dicts while reading input, minimizing lookups.
- Pre-computing the missing CC string for each CcPrinter, instead of
regenerating it each time.
Paul Floyd [Mon, 10 Apr 2023 08:28:58 +0000 (10:28 +0200)]
regtest: warning cleanup
All for clang and mostly Apple clang
There are still numerous deprecated warnings on macOS 10.13
(sem* functions, syscall, sbrk, i386, PIEi, OSSpinLocki, swapcontext, getcontext)
- Move it to `auxprogs/`, alongside `pybuild.sh`.
- Disable the annoying design lints, instead of just modifying the
values (which often requires modifying them again later).
Mark Wielaard [Sun, 22 Jan 2023 22:18:18 +0000 (23:18 +0100)]
Propagate memory allocation failure to out_of_memory_NORETURN
Provide the user with a hint of what caused an out of memory error.
And explain that some memory policies, like selinux deny_execmem
might cause Permission denied errors.
Add an err argument to out_of_memory_NORETURN. And change
am_shadow_alloc to return a SysRes (all three callers were already
checking for errors and calling out_of_memory_NORETURN).
cg_annotate: use `<unspecified>` for an unspecified filename.
Users shouldn't ever see this, but it's useful to distinguish this
malformed data file case from the missing symbol case (which is still
shown as `???`).
It's currently written in C, but `cg_annotate` and `cg_diff` are written in
Python. It's better to have them all in the same language.
The good news is that the Python code is 4.5x shorter than the C code.
The bad news is that the Python code is roughly 3x slower than the C
code. But `cg_merge` isn't used that often, so I think it's a reasonable
trade-off.
- Every section now has a heading with the long `----` lines above and
below.
- Event names are always shown below that heading, rather than within
it.
- Each Unreadable file now gets its own section, much like files that
lack any data.
Currently their width is mostly hard-wired in a quick and dirty fashion.
This commit does them properly, so:
- all columns are always the right width, even ones with really large
percentages
- things like `( 1.00%)` are now `(1.00%)`
- any percentages that would involve a division by zero now show as
`(n/a)` rather than `( 0.00%)`
Perl was a reasonable choice for `cg_annotate` in 2002, but not in 2023.
Also, the existing structure of the code is not good. These two things
make it hard to modify `cg_annotate` in any significant way.
Benefits of the change:
- Now written in a language that is (a) nice, and (b) not moribund.
- Easier to maintain, due to (a) abovementioned better language, (b)
better code structure, and (c) better language tooling, such as
formatters, type checkers, and linters.
- The new version is a little shorter.
- It runs about 2x faster.
- Argument handling is more standard. E.g. things like `--context 2`,
`--auto`, `--no-auto` are supported. (The old forms that require `=`
are still supported, though the `=yes`/`=no` forms are deprecated.)
The behaviour and output of the new version is identical for typical
uses, but there are some very minor changes for edge cases, which nobody
is likely to notice. For example:
- The file format is slightly changed: I removed support for '.'
counts, which had the same meaning as '0'. This was a feature that
Cachegrind never used, and the old script handled it inconsistently.
- The new version will abort on a malformed data line. The old version
would just print a warning and continue.
The commit also adds a new test `ann3` that tests many parts of
`cg_annotate` that weren't tested previously, and tweaks the existing
`ann2` test.