This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.



[PATCH 00/16] RFC: Embedding as and ld inside gcc driver and into libgccjit


[Crossposting to both gcc-patches and binutils lists, since this
patch kit touches both source trees].

Binutils devs: GCC 5 gained a way to build GCC as a shared library,
libgccjit.so.

I've been experimenting with ways of optimizing libgccjit, and the
following patch kit (touching both gcc and binutils) achieves a 5x
speedup of
  gcc/testsuite/jit.dg/test-benchmark.c
on this x86_64 box (Fedora 20).

The benchmark constructs IR for a simple function in memory, compiles
it, and runs it, 100 times in a row, in the hope of simulating the
workload of an interpreter/VM/language runtime, where bytecode
functions gradually become "hot" (e.g. interpretation count exceeds
a threshold) and are compiled to machine code, all within one
process.

gcc's backend code emits .s files, and libgccjit currently uses pex to
invoke the gcc driver, which in turn invokes "as" and "ld" to turn
the .s file into a .so file.

These invocations dominate the time taken by libgccjit, so the patch
series attempts to time them, and to move them in-process; doing
so largely eliminates their cost.
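
As a rough illustration of why out-of-process invocations can dominate,
the sketch below times a trivial piece of work done in-process against
the same work done in a freshly spawned child process.  It is not taken
from the patch kit: it spawns a Python interpreter rather than "as" or
"ld", and the helper names (time_call, in_process, out_of_process) are
hypothetical.

```python
import subprocess
import sys
import time


def time_call(fn, repeats=5):
    """Return the mean wallclock seconds per call of fn over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def in_process():
    # The "work" done directly in the current process.
    sum(range(1000))


def out_of_process():
    # The same work, but paying for process creation, program startup
    # and teardown each time (here a Python child, standing in for a tool
    # such as an assembler or linker).
    subprocess.run([sys.executable, "-c", "sum(range(1000))"], check=True)


if __name__ == "__main__":
    print(f"in-process:     {time_call(in_process):.6f}s per call")
    print(f"out-of-process: {time_call(out_of_process):.6f}s per call")
```

The absolute numbers depend heavily on the machine and on the program
being spawned; the point is only that the per-invocation overhead is
paid on every compile when the tools run as separate processes.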

Here are the performance gains:

jit.dg/test-benchmark.c, 100 iterations at optlevel 0:
 Without embedded driver:      wallclock of 5.300s (0.053s per iteration)
 With embedded driver:         wallclock of 4.630s (0.046s per iteration)
 With embedded driver & gas:   wallclock of 3.510s (0.035s per iteration)
 With embedded driver&as&ld:   wallclock of 2.130s (0.021s per iteration)
 As above, hacking up ld args: wallclock of 1.030s (0.010s per iteration)

i.e. about a 5x speedup (5.300s / 1.030s).
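
The per-iteration figures and the overall speedup follow directly from
the wallclock totals in the table above.  A minimal sketch of that
arithmetic, with the totals copied from the table (the dictionary keys
and function names are just labels chosen for this example):

```python
ITERATIONS = 100

# Wallclock totals in seconds for 100 iterations, copied from the table.
wallclock = {
    "without embedded driver": 5.300,
    "embedded driver": 4.630,
    "embedded driver & gas": 3.510,
    "embedded driver & as & ld": 2.130,
    "as above, hacking up ld args": 1.030,
}

baseline = wallclock["without embedded driver"]


def per_iteration(seconds):
    """Mean wallclock seconds for one compile-and-run iteration."""
    return seconds / ITERATIONS


def speedup(seconds):
    """Speedup relative to the run without the embedded driver."""
    return baseline / seconds


if __name__ == "__main__":
    for label, seconds in wallclock.items():
        print(f"{label:30s} {per_iteration(seconds):.4f}s/iter "
              f"{speedup(seconds):.2f}x")
```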

There are some memory leaks, FIXMEs, etc, and it hasn't been fully
tested yet, but I thought it was time to post this for discussion.

The patch kit also generalizes gcc's timevar mechanism in such a way
that it can be used both by jit client code, and by "as" and "ld".  An
example of a combined report on the accumulated timings of 100
iterations of jit.dg/test-benchmark.c at optlevel 0:

Execution times (seconds)
Client items:
 test_jit                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 create_code             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 compile                 :   0.21 (30%) usr   0.13 (45%) sys   0.25 (25%) wall   14939 kB (74%) ggc
 verify_code             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
GCC items:
 phase setup             :   0.15 (22%) usr   0.02 ( 7%) sys   0.15 (15%) wall   10661 kB (53%) ggc
 phase parsing           :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall     653 kB ( 3%) ggc
 callgraph construction  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall     242 kB ( 1%) ggc
 callgraph optimization  :   0.01 ( 1%) usr   0.01 ( 3%) sys   0.01 ( 1%) wall     142 kB ( 1%) ggc
 cfg construction        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      17 kB ( 0%) ggc
 cfg cleanup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 df live regs            :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      23 kB ( 0%) ggc
 register information    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     199 kB ( 1%) ggc
 tree eh                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 tree CFG construction   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     196 kB ( 1%) ggc
 tree operand scan       :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.00 ( 0%) wall     100 kB ( 0%) ggc
 out of ssa              :   0.00 ( 0%) usr   0.02 ( 7%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 expand                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     398 kB ( 2%) ggc
 loop init               :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      67 kB ( 0%) ggc
 integrated RA           :   0.07 (10%) usr   0.02 ( 7%) sys   0.02 ( 2%) wall    2468 kB (12%) ggc
 LRA virtuals elimination:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      56 kB ( 0%) ggc
 machine dep reorg       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 shorten branches        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 final                   :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     216 kB ( 1%) ggc
 initialize rtl          :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall      12 kB ( 0%) ggc
 rest of compilation     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall     232 kB ( 1%) ggc
 unaccounted todo        :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 replay of JIT client activity:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall     309 kB ( 2%) ggc
 driver                  :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 driver: setup           :   0.04 ( 6%) usr   0.00 ( 0%) sys   0.06 ( 6%) wall       0 kB ( 0%) ggc
 driver: do spec on infiles:   0.01 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 driver: run linker      :   0.00 ( 0%) usr   0.01 ( 3%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 driver: embedded assembler:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 driver: embedded linker :   0.04 ( 6%) usr   0.02 ( 7%) sys   0.04 ( 4%) wall       0 kB ( 0%) ggc
 load JIT result         :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
Embedded 'as':
 gas_main                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 before pass             :   0.03 ( 4%) usr   0.02 ( 7%) sys   0.13 (13%) wall       0 kB ( 0%) ggc
 perform_an_assembly_pass:   0.06 ( 9%) usr   0.01 ( 3%) sys   0.06 ( 6%) wall       0 kB ( 0%) ggc
 after pass              :   0.04 ( 6%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
 cleanup                 :   0.02 ( 3%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
Embedded 'ld':
 ld_internal_main: init  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldmain.c: lang_final    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldmain.c: lang_process  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 lang_process: 1st half  :   0.00 ( 0%) usr   0.02 ( 7%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 open_output             :   0.01 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 open_input_bfds         :   0.01 ( 1%) usr   0.02 ( 7%) sys   0.01 ( 1%) wall       0 kB ( 0%) ggc
 lang_input_statement_enum:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 open_input_bfds:load_symbols:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 load_symbols: ldfile_open_file:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 ldlang_add_file         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 load_symbols: bfd_link_add_symbols:   0.02 ( 3%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 lang_process: 2nd half  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 4%) wall       0 kB ( 0%) ggc
 ldmain.c: ldwrite       :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 3%) wall       0 kB ( 0%) ggc
 ld_main cleanup         :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.69             0.29             0.99              20298 kB
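
To illustrate the general idea of named timers that accumulate across
iterations and are reported per section, here is a minimal sketch.  It
is not gcc's timevar implementation or the API added by this patch kit:
the Timers class and its methods are hypothetical, it records wallclock
time only, and it does not handle nested timers (nesting would count
the inner time twice in the total).

```python
import time
from contextlib import contextmanager


class Timers:
    """Hypothetical accumulating wallclock timers, grouped by section."""

    def __init__(self):
        # section name -> {timer name: accumulated seconds}
        self.sections = {}

    @contextmanager
    def timed(self, section, name):
        """Add the wallclock time spent in the `with` body to (section, name)."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            items = self.sections.setdefault(section, {})
            items[name] = items.get(name, 0.0) + elapsed

    def total(self):
        return sum(sum(items.values()) for items in self.sections.values())

    def report(self):
        """Return a text report: one heading per section, one line per timer."""
        total = self.total()
        lines = []
        for section, items in self.sections.items():
            lines.append(f"{section}:")
            for name, secs in items.items():
                pct = 100 * secs / total if total else 0
                lines.append(f" {name:24s}: {secs:6.2f} ({pct:2.0f}%) wall")
        lines.append(f" {'TOTAL':24s}: {total:6.2f}")
        return "\n".join(lines)


if __name__ == "__main__":
    timers = Timers()
    for _ in range(100):
        with timers.timed("Client items", "create_code"):
            sum(range(1000))
        with timers.timed("Client items", "compile"):
            sum(range(100000))
    print(timers.report())
```

Because each timer adds to an existing entry, 100 iterations produce a
single line per item, in the same spirit as the combined report above.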

Thoughts?

-- 
1.8.5.3

