[PATCH 00/16] RFC: Embedding as and ld inside gcc driver and into libgccjit
- From: David Malcolm <dmalcolm at redhat dot com>
- To: gcc-patches at gcc dot gnu dot org, binutils at sourceware dot org
- Cc: David Malcolm <dmalcolm at redhat dot com>
- Date: Mon, 1 Jun 2015 17:04:08 -0400
[Crossposting to both gcc-patches and binutils lists, since this
patch kit touches both source trees].
Binutils devs: GCC 5 gained a way to build GCC as a shared library,
libgccjit.so.
I've been experimenting with ways of optimizing libgccjit, and the
following patch kit (touching both gcc and binutils) achieves a 5x
speedup of
gcc/testsuite/jit.dg/test-benchmark.c
on this x86_64 box (Fedora 20).
The benchmark constructs IR for a simple function in memory, compiles
it, and runs it, 100 times in a row, in the hope of simulating the
workload of an interpreter/VM/language runtime, where bytecode
functions gradually become "hot" (e.g. interpretation count exceeds
a threshold) and are compiled to machine code, all within one
process.
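To make the simulated workload concrete, here is a minimal sketch of the
"interpret until hot, then compile" pattern described above. It is not taken
from test-benchmark.c and does not use libgccjit: the names HOT_THRESHOLD,
Function, interpret, compile_body and call are all hypothetical, and the
"compile" step is only a stand-in that builds a Python callable.

```python
# Hypothetical sketch of a tiered runtime; none of these names come from
# libgccjit or from test-benchmark.c.
HOT_THRESHOLD = 3  # assumed value; a real runtime would tune this


class Function:
    def __init__(self, name, body):
        self.name = name
        self.body = body      # toy "bytecode": list of (op, operand) pairs
        self.calls = 0        # interpretation count
        self.compiled = None  # set once the function becomes hot


def interpret(body, x):
    """Slow path: walk the toy bytecode one instruction at a time."""
    for op, operand in body:
        if op == "add":
            x += operand
        elif op == "mul":
            x *= operand
        else:
            raise ValueError("unknown op: %r" % (op,))
    return x


def compile_body(body):
    """Stand-in for the compile step (a JIT would emit machine code here)."""
    expr = "x"
    for op, operand in body:
        symbol = {"add": "+", "mul": "*"}[op]
        expr = "(%s %s %d)" % (expr, symbol, operand)
    return eval("lambda x: " + expr)


def call(func, x):
    """Run func, compiling it once its interpretation count exceeds the threshold."""
    if func.compiled is not None:
        return func.compiled(x)
    result = interpret(func.body, x)
    func.calls += 1
    if func.calls > HOT_THRESHOLD:
        func.compiled = compile_body(func.body)
    return result


f = Function("f", [("add", 2), ("mul", 3)])
for i in range(6):
    print(i, call(f, i), "compiled" if f.compiled else "interpreted")
```

In a runtime of this shape the compile step runs many times inside one
process, so any fixed per-compile overhead is paid repeatedly.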
gcc's backend code emits .s files, and libgccjit currently uses pex to
invoke the gcc driver to turn them from .s into a .so file (the driver
in turn invokes "as" and "ld").
These invocations dominate the time taken by libgccjit, so the patch
series attempts to time them, and to move them in-process; doing
so largely eliminates their cost.
Here are the performance gains:
jit.dg/test-benchmark.c, 100 iterations at optlevel 0:
  Without embedded driver:     wallclock of 5.300s (0.053s per iteration)
  With embedded driver:        wallclock of 4.630s (0.046s per iteration)
  With embedded driver & gas:  wallclock of 3.510s (0.035s per iteration)
  With embedded driver&as&ld:  wallclock of 2.130s (0.021s per iteration)
  As above, hacking up ld args: wallclock of 1.030s (0.010s per iteration)
i.e. about a 5x speedup overall (5.300s down to 1.030s).
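The per-iteration figures and the overall speedup follow directly from the
wallclock totals. A small sketch of the arithmetic, using only the numbers
quoted above (the names WALLCLOCK, ITERATIONS and summarize are illustrative):

```python
# Wallclock totals copied from the results above; 100 iterations each.
ITERATIONS = 100
WALLCLOCK = [
    ("without embedded driver", 5.300),
    ("with embedded driver", 4.630),
    ("with embedded driver & gas", 3.510),
    ("with embedded driver, as and ld", 2.130),
    ("as above, hacking up ld args", 1.030),
]


def summarize(rows, iterations):
    """Return (label, seconds per iteration, speedup vs. the first row)."""
    baseline = rows[0][1]
    return [(label, total / iterations, baseline / total)
            for label, total in rows]


for label, per_iter, speedup in summarize(WALLCLOCK, ITERATIONS):
    print("%-32s %.3fs per iteration, %.2fx" % (label, per_iter, speedup))
```

The last row works out to 5.300 / 1.030, a little over 5.1x, which is where
the "about 5x" figure comes from.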
There are some memory leaks, FIXMEs, etc, and it hasn't been fully
tested yet, but I thought it was time to post this for discussion.
The patch kit also generalizes gcc's timevar mechanism in such a way
that it can be used both by jit client code, and by "as" and "ld". An
example of a combined report on the accumulated timings of 100
iterations of jit.dg/test-benchmark.c at optlevel 0:
Execution times (seconds)
Client items:
test_jit : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
create_code : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
compile : 0.21 (30%) usr 0.13 (45%) sys 0.25 (25%) wall 14939 kB (74%) ggc
verify_code : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
GCC items:
phase setup : 0.15 (22%) usr 0.02 ( 7%) sys 0.15 (15%) wall 10661 kB (53%) ggc
phase parsing : 0.02 ( 3%) usr 0.00 ( 0%) sys 0.02 ( 2%) wall 653 kB ( 3%) ggc
callgraph construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 2%) wall 242 kB ( 1%) ggc
callgraph optimization : 0.01 ( 1%) usr 0.01 ( 3%) sys 0.01 ( 1%) wall 142 kB ( 1%) ggc
cfg construction : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 17 kB ( 0%) ggc
cfg cleanup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
df live regs : 0.02 ( 3%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 23 kB ( 0%) ggc
register information : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
parser (global) : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 199 kB ( 1%) ggc
tree eh : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
tree CFG construction : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 196 kB ( 1%) ggc
tree operand scan : 0.00 ( 0%) usr 0.01 ( 3%) sys 0.00 ( 0%) wall 100 kB ( 0%) ggc
out of ssa : 0.00 ( 0%) usr 0.02 ( 7%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
expand : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 398 kB ( 2%) ggc
loop init : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 67 kB ( 0%) ggc
integrated RA : 0.07 (10%) usr 0.02 ( 7%) sys 0.02 ( 2%) wall 2468 kB (12%) ggc
LRA virtuals elimination: 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 56 kB ( 0%) ggc
machine dep reorg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
shorten branches : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 2%) wall 0 kB ( 0%) ggc
final : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 216 kB ( 1%) ggc
initialize rtl : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 12 kB ( 0%) ggc
rest of compilation : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall 232 kB ( 1%) ggc
unaccounted todo : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 2%) wall 0 kB ( 0%) ggc
replay of JIT client activity: 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 309 kB ( 2%) ggc
driver : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
driver: setup : 0.04 ( 6%) usr 0.00 ( 0%) sys 0.06 ( 6%) wall 0 kB ( 0%) ggc
driver: do spec on infiles: 0.01 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 2%) wall 0 kB ( 0%) ggc
driver: run linker : 0.00 ( 0%) usr 0.01 ( 3%) sys 0.02 ( 2%) wall 0 kB ( 0%) ggc
driver: embedded assembler: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
driver: embedded linker : 0.04 ( 6%) usr 0.02 ( 7%) sys 0.04 ( 4%) wall 0 kB ( 0%) ggc
load JIT result : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
Embedded 'as':
gas_main : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
before pass : 0.03 ( 4%) usr 0.02 ( 7%) sys 0.13 (13%) wall 0 kB ( 0%) ggc
perform_an_assembly_pass: 0.06 ( 9%) usr 0.01 ( 3%) sys 0.06 ( 6%) wall 0 kB ( 0%) ggc
after pass : 0.04 ( 6%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall 0 kB ( 0%) ggc
cleanup : 0.02 ( 3%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall 0 kB ( 0%) ggc
Embedded 'ld':
ld_internal_main: init : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
ldmain.c: lang_final : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
ldmain.c: lang_process : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
lang_process: 1st half : 0.00 ( 0%) usr 0.02 ( 7%) sys 0.02 ( 2%) wall 0 kB ( 0%) ggc
open_output : 0.01 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
open_input_bfds : 0.01 ( 1%) usr 0.02 ( 7%) sys 0.01 ( 1%) wall 0 kB ( 0%) ggc
lang_input_statement_enum: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
open_input_bfds:load_symbols: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
load_symbols: ldfile_open_file: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
ldlang_add_file : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
load_symbols: bfd_link_add_symbols: 0.02 ( 3%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
lang_process: 2nd half : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 4%) wall 0 kB ( 0%) ggc
ldmain.c: ldwrite : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 3%) wall 0 kB ( 0%) ggc
ld_main cleanup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
TOTAL : 0.69 0.29 0.99 20298 kB
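For readers unfamiliar with this style of report, the following is a rough
sketch of how named timing items can be accumulated across many iterations
and then printed once. It is not GCC's timevar implementation or API: the
Timer class and its item/report methods are hypothetical, and unlike timevar
it has no push/pop stack, so nested items would be counted twice.

```python
import time
from collections import OrderedDict
from contextlib import contextmanager


class Timer:
    """Hypothetical accumulator of CPU and wall time per named item."""

    def __init__(self):
        self.items = OrderedDict()  # name -> [cpu seconds, wall seconds]

    @contextmanager
    def item(self, name):
        cpu0, wall0 = time.process_time(), time.perf_counter()
        try:
            yield
        finally:
            acc = self.items.setdefault(name, [0.0, 0.0])
            acc[0] += time.process_time() - cpu0
            acc[1] += time.perf_counter() - wall0

    def report(self):
        # Avoid dividing by zero when nothing measurable was timed.
        total_wall = sum(wall for _, wall in self.items.values()) or 1.0
        lines = []
        for name, (cpu, wall) in self.items.items():
            lines.append("%-24s: %6.2f cpu %6.2f (%2.0f%%) wall"
                         % (name, cpu, wall, 100.0 * wall / total_wall))
        return "\n".join(lines)


timer = Timer()
for _ in range(100):
    with timer.item("create_code"):
        data = list(range(1000))
    with timer.item("compile"):
        total = sum(v * v for v in data)
print(timer.report())
```

Because the same Timer object is reused across all iterations, each item's
line reflects the accumulated cost rather than a single run.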
Thoughts?