Mark Wielaard [Sun, 14 May 2023 21:34:05 +0000 (23:34 +0200)]
Add --with-gdbscripts-dir=PATH configure option
Currently the gdb valgrind scripts are installed under VG_LIBDIR
which is normally pkglibexecdir which is likely not in the default
gdb safe-path (a list of directories from which it is safe to
auto-load files). So users will have to add the directory to their
.gdbinit file.
This patch adds a --with-gdbscripts-dir=PATH configure option that
sets VG_GDBSCRIPTS_DIR to the given PATH (${libexecdir}/valgrind if
not given). A user can also configure --without-gdbscripts-dir to
disable adding a .debug_gdb_scripts section to the vgpreload library
and installing the valgrind-monitor python scripts completely.
Use VG_GDBSCRIPTS_DIR as gdbscriptsdir to install the valgrind-monitor
python files and pass it with CPPFLAGS when building vg_preloaded.c
and vgdb.c to use instead of VG_LIBDIR.
Andreas Arnez [Thu, 26 Jan 2023 16:41:18 +0000 (17:41 +0100)]
s390x: XC instruction: clear in 8-byte increments if possible
The XC instruction is frequently executed in many programs, mainly for
clearing memory. It can target from 1 to 256 bytes. If the size is
constant and XC is actually used for clearing memory, Valgrind implements
it as a byte-wise loop and rolls out the loop for <= 8 bytes.
Instead of clearing byte-wise, it is more efficient to clear in 64-bit
increments, so do this for sizes >= 8 bytes. Roll out the loop for up to
32 bytes. Overall, this reduces the number of insns by a few percent and
provides a slight performance improvement for some programs.
Andreas Arnez [Fri, 5 May 2023 15:48:31 +0000 (17:48 +0200)]
s390x: Optimize CLC for 1, 2, 4, and 8 bytes
The CLC instruction compares two memory areas with sizes from 1 up to 256
bytes. Currently Valgrind always implements it with a bytewise loop.
Add special handling for the sizes 1, 2, 4, and 8. Realize CLC with an
8-, 16-, 32-, and 64-bit integer comparison, respectively, in those cases.
Apart from a slight optimization this also improves the diagnostics for
uninitialized values since it avoids the manufactured conditional jump
that breaks out of the loop over the individual bytes.
Andreas Arnez [Wed, 15 Feb 2023 17:02:37 +0000 (18:02 +0100)]
Bug 465782 - s390x: Avoid __builtin_setjmp
Currently Clang doesn't support __builtin_setjmp() on s390x. Since
Valgrind already has an alternate implementation of setjmp/longjmp for
many other platforms, just add one for s390x as well, to get rid of this
dependency.
Andreas Arnez [Thu, 1 Sep 2022 13:03:01 +0000 (15:03 +0200)]
Bug 465782 - s390x: Drop -mzarch -march=z900 from assembler options
The -mzarch flag is unsupported by Clang, and it is redundant on 64-bit
build systems. Remove it.
Also remove '-march=z900', since it is unsupported by Clang as well. It
would only be needed on build systems with a default architecture lower
than z900. Such systems are out of service for some time now.
Paul Floyd [Fri, 5 May 2023 20:05:36 +0000 (22:05 +0200)]
Add Helgrind and DRD tests and suppressions for getaddrinfo on Linux
Bump version to 3.22.0.GIT
The testcase was posted on the freebsd-hackers mailing list.
I had time to get suppressions for FreeBSD into 3.21 but
ran out of time for the test and Linux suppressions.
I did take a look at how thread sanitizer handles this.
Basically it intercepts the call, turns off checking,
calls the resl function then turns checking back on.
I don't see many other similar examples. Might be worth
looking at dlopen and atexit.
Mark Wielaard [Fri, 28 Apr 2023 11:34:48 +0000 (13:34 +0200)]
Support SCV_FLAG also on VGP_ppc64be_linux
Running on a kernel that supports the SCV instruction (sets
PPC_FEATURE2_SCV in auxv AT_HWCAPS2) valgrind will assert: valgrind:
m_syswrap/syswrap-main.c:549 (getSyscallArgsFromGuestState): Assertion
'gst->guest_syscall_flag == SC_FLAG' failed.
Removing that assert makes most things work. But also filter out
PPC_FEATURE2_SCV from AT_HWCAPS2 for the client, so it shouldn't try
using the SCV instruction.
For all the changes I've made recently. And also various other changes
that occurred over the past 20 years that didn't previously make it into
the docs.
Also, this change de-emphasises the cache and branch simulation aspect,
because they're no longer that useful. Instead it emphasises the
precision and reproducibility of instruction count profiling.
Get rid of cache config warnings with `--cache-sim=no`.
By not configuring the caches in that case. This requires moving a few
assertions around, because they currently assume that the caches are
configured.
And deprecate the use of `cg_diff` and `cg_merge`.
Because `cg_annotate` can do a better job, even annotating source files
when doing diffs in some cases.
The user requests merging by passing multiple cgout files to
`cg_annotate`, and diffing by passing two cgout files to `cg_annotate`
along with `--diff`.
- one more comment at the top describing the three usages of vgdb.
- fixed up a few places where tabs were used for indentation (we are
not very consistent in that either, after the release we'll look
into adopting something like clang-format so you don't have to do
all this by hand).
- Add a missing newline in coregrind/m_main.c to make
none/tests/cmdline2 pass.
Mark Wielaard [Thu, 20 Apr 2023 10:59:02 +0000 (12:59 +0200)]
vgdb: Handle EAGAIN in read_buf
The file descriptor is on non-blocking mode and read_buf should only
be called when poll gave us an POLLIN event signaling the file
descriptor is ready for reading from. Still sometimes we do get an
occasional EAGAIN. Just do as told in that case and try to read again.
Also fix an ERROR errno in getpkt. This has never been observed, but
not getting the actual errno if the write fails in that case would be
really confusing.
Mark Wielaard [Wed, 19 Apr 2023 22:42:40 +0000 (00:42 +0200)]
Bug 439685 compiler warning in callgrind/main.c
main.c: In function 'vgCallgrind_post_syscalltime':
main.c:1779:25: warning: '*((void *)&ts_now+8)'
may be used uninitialized in this function [-Wmaybe-uninitialized]
struct vki_timespec ts_now;
main.c:1779:25: warning: 'ts_now'
may be used uninitialized in this function [-Wmaybe-uninitialized]
In function collect_time the conditional expression in the switch
statement has type int (after integral promotions). GCC assumes that
it may have values other than the ones listed in the enumerated type
it was promoted from. In that case the memory pointed to by its 1st
argument remains unintialised. Later on vki_timespec_diff will read
the contents of ts_now undoditionally. Hence the warning.
Using the default case for the tl_assert () removes the warning and
makes the code more robust should another enumerator ever be added to
Collect_Systime.
PowerPC:, Fix test test_isa_3_1_R1_RT.c, test_isa_3_1_R1_XT.c
Fixes an issue with the PAD_ORI used in the the tests by explicitly adding
SAVE_REGS and RESTORE_REGS macros. The macros ensure that the block of
immediate OR instructions don't inadvertently change the contents of the
registers.
John Reiser suggested that the PAD_ORI asm statements in the PAD_ORI
macro be updated to inform the compiler which register the ori instruction
is clobbering. The compiler will then generate the code to save and
restore the register automatically. This is a cleaner solution then
explicitly adding the macros to store and restore the registers. It is
functionally cleaner in that the value fetched by the instruction under
test is not modified by the PAD_ORI instructions.
This patch removes the SAVE_REG and RESTORE_REG macros and updates the
PAD_ORI macro.
Carl Love [Mon, 17 Apr 2023 21:12:25 +0000 (17:12 -0400)]
PowerPC:, Fix test test_isa_3_1_R1_RT.c, test_isa_3_1_R1_XT.c
Test adds a block of xori instructions for use with the PC relative tests.
The registers used by the xori instructions need to be saved and restored,
otherwise the register changes can impact the execution of the for loops
in the test as registers are randomly changed. The issue occcurs when
GCC is optimizing and inlining the test functions.
Executing vgdb --multi makes vgdb talk the gdb extended-remote
protocol. This means that the gdb run command is supported and
vgdb will start up the program under valgrind. Which means you
don't need to run gdb and valgrind from different terminals.
Also vgdb keeps being connected to gdb after valgrind exits. So
you can easily rerun the program with the same breakpoints in
place.
vgdb now implements a minimal gdbserver that just recognizes
a few extended-remote protocol packets. Once it starts up valgrind
it sets up noack and qsupported then it will forward packets
between gdb and valgrind gdbserver. After valgrind shutsdown it
resumes handling gdb packets itself.
Most notable, the "Function summary" section, which printed one CC for each
`file:function` combination, has been replaced by two sections, "File:function
summary" and "Function:file summary".
These new sections both feature "deep CCs", which have an "outer CC" for the
file (or function), and one or more "inner CCs" for the paired functions (or
files).
Here is a file:function example, which helps show which files have a lot of
events, even if those events are spread across a lot of functions.
```
> 12,427,830 (5.4%, 26.3%) /home/njn/moz/gecko-dev/js/src/ds/LifoAlloc.h:
6,107,862 (2.7%) js::frontend::ParseNodeVerifier::visit(js::frontend::ParseNode*)
3,685,203 (1.6%) js::detail::BumpChunk::setBump(unsigned char*)
1,640,591 (0.7%) js::LifoAlloc::alloc(unsigned long)
711,008 (0.3%) js::detail::BumpChunk::assertInvariants()
```
And here is a function:file example, which shows how heavy inlining can result
in a machine code function being derived from source code from multiple files:
```
> 1,343,736 (0.6%, 35.6%) js::gc::TenuredCell::isMarkedGray() const:
651,108 (0.3%) /home/njn/moz/gecko-dev/js/src/d64/dist/include/js/HeapAPI.h
292,672 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Cell.h
254,854 (0.1%) /home/njn/moz/gecko-dev/js/src/gc/Heap.h
```
Previously these patterns were very hard to find, and it was easy to overlook a
hot piece of code because its counts were spread across multiple non-adjacent
entries. I have already found these changes very useful for profiling Rust
code.
Also, cumulative percentages on the outer CCs (e.g. the 26.3% and 35.6% in the
example) tell you what fraction of all events are covered by the entries so
far, something I've wanted for a long time.
Some other, related changes:
- Column event headers are now padded with `_`, e.g. `Ir__________`. This makes
the column/event mapping clearer.
- The "Cachegrind profile" section is now called "Metadata", which is
shorter and clearer.
- A few minor test tweaks, beyond those required for the output changes.
- I converted some doc comments to normal comments. Not standard Python, but
nicer to read, and there are no public APIs here.
- Roughly 2x speedups to `cg_annotate` and smaller improvements for `cg_diff`
and `cg_merge`, due to the following.
- Change the `Cc` class to a type alias for `list[int]`, to avoid the class
overhead (sigh).
- Process event count lines in a single split, instead of a regex
match + split.
- Add the `add_cc_to_ccs` function, which does multiple CC additions in a
single function call.
- Better handling of dicts while reading input, minimizing lookups.
- Pre-computing the missing CC string for each CcPrinter, instead of
regenerating it each time.
Paul Floyd [Mon, 10 Apr 2023 08:28:58 +0000 (10:28 +0200)]
regtest: warning cleanup
All for clang and mostly Apple clang
There are still numerous deprecated warnings on macOS 10.13
(sem* functions, syscall, sbrk, i386, PIEi, OSSpinLocki, swapcontext, getcontext)
- Move it to `auxprogs/`, alongside `pybuild.sh`.
- Disable the annoying design lints, instead of just modifying the
values (which often requires modifying them again later).