Disassembler/Decompiler using libbfd

Jakub Zawadzki darkjames@darkjames.ath.cx
Wed Mar 28 02:19:00 GMT 2007


Hello,
Recently I thought about writing a simple asm-to-C decompiler (or just a
simple disassembler which can guess functions and their arguments,
display imports from other shared libraries, etc.).
I don't like the idea of implementing all the opcodes of Intel x86 and
other archs inside my program, so I thought about libbfd. I looked at the
objdump sources, pulled out the interesting parts, and now I have a
272-line disassembler. libbfd is great for this kind of thing, but now I'm stuck.

I can print the decoded opcodes + args to the screen or to a file. I can also
display all decoded opcodes from one call up to the next leave/retn/call (using
sscanf()). OK, but I have no idea how I should parse the decoded opcodes + args.
I need to make the output more parsable. [Not with sscanf()!]

Yeah, I know I can do:
	init_disassemble_info(&info, my_own_data, (fprintf_ftype) my_own_function);
And I do. When I just print the format string passed to my_own_function and
pipe the output through sort and uniq, I get only two formats: "%s" and ",".
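
The experiment above can be tried without libbfd. This is only a minimal
sketch: record_format() has the same shape as fprintf_ftype (stream, format,
varargs), but struct format_log, record_format(), print_fn and
fake_disassemble() are names I made up for illustration. fake_disassemble()
is a stand-in for the real disassemble_fn and merely imitates the "%s" / ","
pattern I saw; it proves nothing about what the real decoders do.

```c
#include <assert.h>
#include <string.h>

#define MAX_FORMATS 16

/* Hypothetical log of the distinct format strings seen by the callback. */
struct format_log {
    const char *formats[MAX_FORMATS];
    int count;
};

/* Same shape as fprintf_ftype: (void *stream, const char *fmt, ...).
   Records each distinct format string (the "sort | uniq" step) and
   ignores the arguments. Assumes the format strings outlive the log,
   which holds for string literals. */
static int record_format(void *stream, const char *fmt, ...)
{
    struct format_log *log = stream;
    int i;

    assert(log != NULL && fmt != NULL);
    for (i = 0; i < log->count; i++)
        if (strcmp(log->formats[i], fmt) == 0)
            return 0;                   /* already seen */
    if (log->count < MAX_FORMATS)
        log->formats[log->count++] = fmt;
    return 0;
}

typedef int (*print_fn)(void *stream, const char *fmt, ...);

/* Stand-in for (*disassemble_fn)(): emits "mov eax,ebx" using only
   the "%s" and "," formats. Not real libbfd code. */
static int fake_disassemble(void *stream, print_fn print)
{
    print(stream, "%s", "mov");
    print(stream, "%s", "eax");
    print(stream, ",");
    print(stream, "%s", "ebx");
    return 2;                           /* pretend instruction length */
}
```

Calling fake_disassemble(&log, record_format) on a zeroed struct format_log
leaves exactly two entries in log.formats. With the real library the same
callback would be passed to init_disassemble_info() in place of
my_own_function.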

So we would have:
/* lock_opcode global variable */

lock_opcode = 1;		/* lock -> new opcode */
(*disassemble_fn)(section->vma + start_offset, &info); /* call disassemble_fn */
lock_opcode = 0;		/* unlock */

and then in my_own_function()

we can check whether lock_opcode == 1; if so, the first string we get is the
decoded name of the opcode.

Then we'd get the first param [if available], and if the opcode has more params
we'd get: ',' next_param_in_%s, ',' next_param_in_%s, etc...
until lock_opcode is set back to 0.
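
The scheme above can be sketched as a callback that sorts the printed tokens
into a struct. This is a variation under my own assumptions, not tested against
libbfd: instead of a global lock_opcode, the flag lives in the struct passed
as the stream and is called expect_opcode, and the callback clears it itself
once it has seen the opcode name. struct decoded_insn, collect_token(),
decode_one() and fake_disassemble() are hypothetical names, and
fake_disassemble() only imitates the output pattern described above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPERANDS 4
#define TOKEN_LEN 64

/* Hypothetical record for one decoded instruction. */
struct decoded_insn {
    char opcode[TOKEN_LEN];
    char operands[MAX_OPERANDS][TOKEN_LEN];
    int n_operands;
    int expect_opcode;      /* 1 -> the next token is the opcode name */
};

/* fprintf-like callback: formats the token, then files it as the
   opcode name, an operand, or a "," separator (which is dropped). */
static int collect_token(void *stream, const char *fmt, ...)
{
    struct decoded_insn *insn = stream;
    char text[TOKEN_LEN];
    va_list ap;

    assert(insn != NULL && fmt != NULL);
    va_start(ap, fmt);
    vsnprintf(text, sizeof text, fmt, ap);
    va_end(ap);

    if (strcmp(text, ",") == 0)
        return 1;                       /* separator between operands */

    if (insn->expect_opcode) {
        snprintf(insn->opcode, sizeof insn->opcode, "%s", text);
        insn->expect_opcode = 0;        /* following tokens are operands */
    } else if (insn->n_operands < MAX_OPERANDS) {
        snprintf(insn->operands[insn->n_operands++], TOKEN_LEN, "%s", text);
    }
    return (int) strlen(text);
}

typedef int (*print_fn)(void *stream, const char *fmt, ...);

/* Stand-in for (*disassemble_fn)(): emits "mov eax,ebx". */
static void fake_disassemble(void *stream, print_fn print)
{
    print(stream, "%s", "mov");
    print(stream, "%s", "eax");
    print(stream, ",");
    print(stream, "%s", "ebx");
}

/* Decode one (fake) instruction into *insn. */
static void decode_one(struct decoded_insn *insn)
{
    memset(insn, 0, sizeof *insn);
    insn->expect_opcode = 1;            /* a new opcode starts here */
    fake_disassemble(insn, collect_token);
}
```

After decode_one(&insn), insn.opcode holds the opcode name and insn.operands
holds the params in order. Keeping the flag in the stream struct avoids the
global variable, but it still depends on the decoder printing the opcode
first, which is exactly what I am not sure about.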

At first I thought this would be a hack. Now I think it's quite a good idea,
but I don't know whether it would work this way for every file and every arch.
[For now I'm trying to disassemble a win32 PE file, a good file, not a `broken` one.
`Broken` -> obfuscated, compressed, etc. Those are not my concern at
all. I only want to disassemble/decompile good files.]

So I have some questions:
 - Is this method acceptable, i.e. is the opcode always decoded first and
   then the args? Do the instruction decoders for all archs/systems work
   this way?
 - Can/should libbfd be used this way [for writing a
   decompiler/disassembler]?
 - Is there another way to do what I want? I don't know, maybe there is
   something in the disassemble_info struct that holds the *results of the
   instruction decoders*.

I don't really like the idea of copying code from bfd, or of implementing my
own instruction translators. [SPOT rule, reinventing the wheel,
etc.. really, really bad idea :(]

I would be grateful for any reply, even: `it's senseless/stupid/whatever
to write a decompiler`.

My English is not the best, so if you don't understand some part, a big
part, or even the whole thing, sorry. I'll try to explain again.


