A lean way for getting the size of the instruction at a given address
Zied Guermazi
zied.guermazi@trande.de
Mon Apr 5 22:12:06 GMT 2021
Hi Luis,
Yes, I guess it was intended for processing the disassemble command. It was
not intended to be used in performance-critical use cases. Once it was
removed, the next bottleneck is the printf in
get_all_disassembler_options (a string is used as a means of passing
options). It consumes 20% of the time.
Shall we put the changes needed to increase the performance in the "etm
for branch tracing" patch set, or in a dedicated one (a performance
improvement one)? Please advise.
/Zied
On 06.04.21 00:04, Luis Machado wrote:
> Hi Zied,
>
> On 4/5/21 6:47 PM, Zied Guermazi wrote:
>> hi Luis,
>>
>> thanks for your support. To measure the impact of removing the
>> printing of the instruction on the overall performance, I commented
>> out setting and using the print function pointer in print_insn
>> (bfd_vma pc, struct disassemble_info *info, bfd_boolean little) in
>> opcodes/arm-dis.c, and the result was very interesting: The time
>> needed to process the traces dropped down from 12 minutes to 34
>> seconds for 64 MB of traces.
>
> That is quite a bottleneck! I think this code path isn't exercised often.
>
>>
>> Now that we have proof that the bottleneck was printing, we can
>> think about a way to provide a clean implementation.
>
> I agree. A faster implementation of this particular function would be
> nice to have. It may even improve some other code paths that use this
> information.
>
>>
>> Kind Regards
>>
>> Zied Guermazi
>>
>>
>> On 05.04.21 18:40, Luis Machado wrote:
>>> On 4/5/21 1:17 PM, Zied Guermazi wrote:
>>>> hi Luis
>>>>
>>>> A new member function in "class gdb_disassembler" to calculate the
>>>> instruction length only will be a good solution. In fact a big
>>>> overhead is added by the printing of instruction disassembly, which
>>>> is not needed at all. On aarch64, the decoder is optimized to issue
>>>> many instructions in one trace element, and here calculating the
>>>> size consumes more than 80% of the time. On arm, the decoder issues
>>>> one instruction after another, and here getting the size consumes
>>>> 50% of the time. Considering the amount of traces, this can add up
>>>> to a dozen minutes in some cases (64 MB of traces).
>>>
>>> Indeed, that doesn't sound good.
>>>
>>>>
>>>> Calculating the instruction size per se on arm is a "rapid"
>>>> operation and consists of checking a few bits in the opcode. So the
>>>> time can be drastically decreased by having a function to calculate
>>>> the size only.
>>>>
>>>>
>>>> gdb_print_insn can then be changed as follows (pseudo code):
>>>>
>>>> int
>>>> gdb_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
>>>>                 struct ui_file *stream, int *branch_delay_insns)
>>>> {
>>>>   gdb_disassembler di (gdbarch, stream);
>>>>
>>>>   if (di.get_insn_size != 0)
>>>>     return di.get_insn_size (memaddr);
>>>>   else
>>>>     return di.print_insn (memaddr, branch_delay_insns);
>>>> }
>>>>
>>>> Is there a function in aarch64-tdep or arm-tdep doing the job of
>>>> disassembly (the lower layer handling the opcode)? Are we relying
>>>> on the bfd library for it? Can someone give me a hint of where to
>>>> find those functions?
>>>
>>> The gdbarch hooks in arm-tdep.c (gdb_print_insn_arm) and
>>> aarch64-tdep.c (aarch64_gdb_print_insn) are more like helper
>>> functions and do some initial setup, but the code to disassemble
>>> lies in opcodes/arm-dis.c (print_insn) and opcodes/aarch64-dis.c
>>> (print_insn_aarch64).
>>>
>>> If you go with the route of changing "class gdb_disassembler", then
>>> you'll probably need to touch binutils/opcodes.
>>>
>>> If you decide to have a gdbarch hook (in arm-tdep/aarch64-tdep),
>>> then you only need to change GDB.
>>>>
>>>>
>>>> Kind Regards
>>>>
>>>> Zied Guermazi
>>>>
>>>>
>>>> On 05.04.21 15:01, Luis Machado wrote:
>>>>> Hi Zied,
>>>>>
>>>>> On 4/4/21 4:59 AM, Zied Guermazi wrote:
>>>>>> hi
>>>>>>
>>>>>> I need to get the size of the instruction at a given address. I
>>>>>> am currently using gdb_insn_length (struct gdbarch *gdbarch,
>>>>>> CORE_ADDR addr), which calls gdb_print_insn (struct gdbarch
>>>>>> *gdbarch, CORE_ADDR memaddr, struct ui_file *stream, int
>>>>>> *branch_delay_insns). This consumes a huge amount of time,
>>>>>> considering that it is used in branch tracing and gets
>>>>>> repeated up to a few million times.
>>>>>>
>>>>>>
>>>>>> Is there a lean way of getting the size of the instruction at a
>>>>>> given address? I am using it for aarch64 and arm targets.
>>>>>
>>>>> At the moment I don't think there is an optimal solution for this.
>>>>> The instruction length is calculated as part of the disassemble
>>>>> process, and is tied to the function that prints instructions.
>>>>>
>>>>> One way to speed things up is to have a new member function in
>>>>> "class gdb_disassembler" to calculate the instruction length only.
>>>>>
>>>>> Another way is to have a new gdbarch hook that calculates the size
>>>>> of an instruction based on the current PC, mapping symbols etc.
>>>>>
>>>>>>
>>>>>> Kind Regards
>>>>>>
>>>>>> Zied Guermazi
>>>>>>
>>>>>>
>>>>
>>>>
>>
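
As an illustration of the point made in the thread that the size on arm
can be derived from a few bits of the opcode, here is a minimal C++ sketch.
It is not GDB code: insn_set, insn_size_from_opcode, size_hook and
insn_length are hypothetical names, and the caller is assumed to already
know the instruction set state (for example from mapping symbols, as
mentioned in the thread). It relies on the architectural rule that A32 and
A64 instructions are always 4 bytes, while a Thumb instruction is 4 bytes
when the top five bits of its first halfword are 0b11101, 0b11110 or
0b11111, and 2 bytes otherwise. insn_length mirrors the fallback idea from
the pseudo code: use a size-only hook when one exists, otherwise take the
slower disassembly path.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical instruction set states.  This sketch assumes the caller
// already knows which one applies at the address being inspected.
enum class insn_set { arm, thumb, aarch64 };

// Return the size in bytes of the instruction whose first halfword is
// FIRST_HALFWORD.  Only Thumb needs to look at the encoding.
static int
insn_size_from_opcode (insn_set set, std::uint16_t first_halfword)
{
  // A32 and A64 instructions are always 4 bytes long.
  if (set != insn_set::thumb)
    return 4;

  // Thumb: top five bits 0b11101, 0b11110 or 0b11111 mark the first
  // halfword of a 32-bit encoding; anything else is a 16-bit instruction.
  unsigned top5 = first_halfword >> 11;
  return (top5 == 0x1d || top5 == 0x1e || top5 == 0x1f) ? 4 : 2;
}

// Hypothetical stand-in for a size-only hook.  An empty std::function
// means the architecture does not provide one.
using size_hook = std::function<int (std::uint64_t addr)>;

// Prefer the fast size-only hook; fall back to the slow path (standing in
// for full disassembly) when no fast hook is available.
static int
insn_length (const size_hook &fast, const size_hook &slow,
             std::uint64_t addr)
{
  return fast ? fast (addr) : slow (addr);
}
```

Whether such logic would live in a gdbarch hook or in a new member of
"class gdb_disassembler" is the open design question discussed above; the
sketch takes no position on that.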