objdump contribution/development

J Grant 0013499@tay.ac.uk
Tue Apr 10 07:25:00 GMT 2001


Hi

I will try to answer all your questions in order.

Corey Brenner wrote:
> 
> Hi,
> 
> I'm a Unix admin, and do a bit of research and whatnot
> on the side at home, and am interested in a related
> topic to what you're proposing.
> 
> I may be wrong, but it seems that you're proposing:
> 
> To automatically reverse-engineer program semantics
> from assembled machine code, so that a workable set
> of sources may be generated in a target language.
> 
> To do this in a multiplatform environment.

Yes. Initially I would choose one platform, but I would
write it in an abstract manner so that other 'hardware'
layers could be added later to support other CPUs.

> 
> To be able to decode semantics from optimized machine
> instructions.
> 
> While these are laudable goals, there is a better
> way to do it, and I've been investigating that.  Have
> you encountered SDE (Semantic Dictionary Encoding)
> yet?  Michael Franz wrote a doctoral thesis on the
> subject in 1994 that basically says, "Here's this
> thing Wirth is hacking on called Oberon - it's cool."
>
No, I have not seen that one. Do you have an address for it?
I have about nine good papers on this topic, several of them
by Cristina Cifuentes.

> I'm considering authoring a BSD-licensed Slim Binary
> environment using ideas gleaned from that thesis, and
> from the work of N. Wirth, but in C instead of in

Do you have a reference for a paper by N. Wirth on this?
I could not find one at http://www.oberon.ethz.ch/

> Oberon, so that it is more directly useful for the
> average guy.  I've only just begun contemplating this,
> but I can see some parallels between these ideas and
> what you'd like to do.
> 
> For starters, it would be nearly impossible to
> reverse-engineer program semantics in any truly
> meaningful way from already-produced code.  Try to
> compile optimized code on an Intel and an Alpha, then
> reverse the process, and you'll likely end up with
> very different pseudo-C.
> 
I have been reading various papers and relish this challenge.
I think moving from asm to a 'middle common state', from which
program semantics can be recovered, is the best way forward,
with C as the first output layer.
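The 'middle common state' idea can be sketched as two separate stages: a per-CPU lifter that turns machine instructions into a CPU-independent statement form, and a back end that prints that form as pseudo-C. This is a minimal sketch under stated assumptions: the instruction set (`M_LOADI`, `M_MOV`, `M_ADD`, `M_RET`), the statement form, and the functions `lift` and `emit_c` are all hypothetical, and nothing here uses binutils or BFD APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy "machine code" for one imaginary CPU (hypothetical, not a real ISA). */
enum mop { M_LOADI, M_MOV, M_ADD, M_RET };
struct minsn { enum mop op; int rd, rs; long imm; };

/* CPU-independent "middle common state": each statement is
   dst = a,  dst = a <binop> b,  or  return a. */
enum ikind { I_ASSIGN, I_BINOP, I_RETURN };
struct operand { int is_const; long value; }; /* constant, or register number */
struct istmt { enum ikind kind; int dst; char binop; struct operand a, b; };

/* Front end for the toy CPU: lift n instructions into out[0..n-1].
   Supporting another CPU would mean adding another lifter like this one. */
static size_t lift(const struct minsn *in, size_t n, struct istmt *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        struct istmt s;
        memset(&s, 0, sizeof s);
        switch (in[i].op) {
        case M_LOADI:                 /* rd <- imm */
            s.kind = I_ASSIGN; s.dst = in[i].rd;
            s.a = (struct operand){ 1, in[i].imm };
            break;
        case M_MOV:                   /* rd <- rs */
            s.kind = I_ASSIGN; s.dst = in[i].rd;
            s.a = (struct operand){ 0, in[i].rs };
            break;
        case M_ADD:                   /* rd <- rd + rs */
            s.kind = I_BINOP; s.dst = in[i].rd; s.binop = '+';
            s.a = (struct operand){ 0, in[i].rd };
            s.b = (struct operand){ 0, in[i].rs };
            break;
        case M_RET:                   /* toy convention: result is in r0 */
            s.kind = I_RETURN;
            s.a = (struct operand){ 0, 0 };
            break;
        }
        out[i] = s;
    }
    return n;
}

/* Print an operand either as a constant or as a register name. */
static void format_operand(char *buf, size_t len, struct operand o)
{
    if (o.is_const)
        snprintf(buf, len, "%ld", o.value);
    else
        snprintf(buf, len, "r%ld", o.value);
}

/* Back end: render one common-form statement as a line of pseudo-C. */
static void emit_c(const struct istmt *s, char *buf, size_t len)
{
    char a[32], b[32];
    format_operand(a, sizeof a, s->a);
    format_operand(b, sizeof b, s->b);
    buf[0] = '\0';
    switch (s->kind) {
    case I_ASSIGN: snprintf(buf, len, "r%d = %s;", s->dst, a); break;
    case I_BINOP:  snprintf(buf, len, "r%d = %s %c %s;", s->dst, a, s->binop, b); break;
    case I_RETURN: snprintf(buf, len, "return %s;", a); break;
    }
}
```

For the four toy instructions r0 <- 2, r1 <- 40, r0 <- r0 + r1, ret, calling `lift` and then `emit_c` on each statement gives `r0 = 2;`, `r1 = 40;`, `r0 = r0 + r1;` and `return r0;`. Keeping the printer separate from the lifter is what would let additional CPU 'hardware' layers reuse the same C output stage.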

> Semantic Dictionary Encoding involves establishing a
> growing Huffman dictionary, then encoding the program's
> semantics as dictionary indices.  Such a scheme should
> allow at least basic loop structure and variable
> assignments, index increments, function calls, and
> the like to be preserved (i.e., the program's
> semantics).  Expanded slightly, the actual real
> semantics of the program could be preserved by
> preserving variable and constant names, rather than
> constant bit patterns.  This would allow for
> portability (where, say, MAP_PRIVATE is presented in
> program semantics, and a program adapter is merged
> into the binary as it is being produced, which gives
> a value for that constant, rather than encoding
> 0x00100200, and hoping that that bit pattern means
> MAP_PRIVATE on all platforms).  To return to my
> point, the produced binaries are extremely small, and
> could be tucked away in a segment of the object file.
> The semantic dictionary could be merged with the
> symbol table of the object files, further reducing
> resource usage.
> 
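The SDE description above (a dictionary that grows as encoding proceeds, with the program emitted as dictionary indices) can be illustrated with a toy encoder. This is a simplified sketch under stated assumptions, not Michael Franz's actual algorithm: there is no Huffman coding of the indices, only binary operators are handled, the primitive set is fixed up front (names such as `sum` are assumed to come from the symbol table), and `encode`, `lookup` and `dict_add` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 256

/* A node of a (hypothetical) expression tree recovered from a program. */
struct node {
    const char *sym;          /* operator ("=", "+") or name ("sum", "i") */
    const struct node *l, *r; /* children; NULL for leaves */
};

/* Dictionary entry: a primitive symbol (l == r == -1), or a whole subtree
   already transmitted once, identified by its symbol and child entries. */
struct entry { const char *sym; int l, r; };
struct dict { struct entry e[MAX_ENTRIES]; int n; };

/* Output stream of dictionary indices. */
struct stream { int idx[1024]; size_t n; };

static int dict_find(const struct dict *d, const char *sym, int l, int r)
{
    for (int i = 0; i < d->n; i++)
        if (strcmp(d->e[i].sym, sym) == 0 && d->e[i].l == l && d->e[i].r == r)
            return i;
    return -1;
}

static int dict_add(struct dict *d, const char *sym, int l, int r)
{
    assert(d->n < MAX_ENTRIES);
    d->e[d->n] = (struct entry){ sym, l, r };
    return d->n++;
}

static void emit(struct stream *s, int id)
{
    assert(s->n < sizeof s->idx / sizeof s->idx[0]);
    s->idx[s->n++] = id;
}

/* Dictionary id of a whole subtree, or -1 if it has not been seen yet. */
static int lookup(const struct dict *d, const struct node *t)
{
    if (t->l == NULL)
        return dict_find(d, t->sym, -1, -1);
    int l = lookup(d, t->l), r = lookup(d, t->r);
    if (l < 0 || r < 0)
        return -1;
    return dict_find(d, t->sym, l, r);
}

/* Encode tree t in prefix order and return its dictionary id.  A known
   subtree costs one index; otherwise the operator primitive is sent, then
   the children, and the new subtree is added so a repeat becomes cheap. */
static int encode(struct dict *d, const struct node *t, struct stream *s)
{
    int id = lookup(d, t);
    if (id >= 0) {
        emit(s, id);
        return id;
    }
    assert(t->l != NULL && t->r != NULL); /* unknown leaf names unsupported */
    int op = dict_find(d, t->sym, -1, -1);
    assert(op >= 0);                      /* operators must be primitives */
    emit(s, op);
    int l = encode(d, t->l, s);
    int r = encode(d, t->r, s);
    return dict_add(d, t->sym, l, r);
}
```

Starting from the primitives `=`, `+`, `sum`, `i` (indices 0 to 3), encoding `sum = sum + i` the first time emits 0 2 1 2 3 and adds entries 4 (`sum + i`) and 5 (the whole statement); encoding the same statement again emits just 5. A decoder that knows each primitive's arity can rebuild the identical dictionary, because it adds the same entries in the same order.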
Using the symbol tables and other information in the ELF file,
provided it has not been stripped (and ideally was compiled
with -g), should provide plenty to work with.
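Whether a binary still carries its symbol table can be checked by walking the ELF section header table for a section of type SHT_SYMTAB (normally `.symtab`, which `strip` removes). A minimal sketch, assuming a 64-bit ELF file in the host's byte order, a glibc/musl-style `<elf.h>`, and no extended section numbering; `has_symtab` is a hypothetical helper, and inside binutils the BFD library would be the natural route rather than hand parsing. Code compiled with -g additionally gets DWARF `.debug_*` sections, which could be found the same way by resolving section names through `e_shstrndx`.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF64 file at `path` has a
   SHT_SYMTAB section, 0 if it has none, and -1 on I/O error or if the
   file is not a 64-bit ELF file in the host's byte order. */
static int has_symtab(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Work out the host byte order so it can be compared with EI_DATA. */
    const unsigned one = 1;
    int host_data = (*(const unsigned char *)&one == 1) ? ELFDATA2LSB : ELFDATA2MSB;

    int result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_ident[EI_DATA] != host_data ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        goto done;

    /* Walk the section header table looking for a static symbol table.
       Extended section numbering (e_shnum == 0 with many sections) is ignored. */
    result = 0;
    for (Elf64_Half i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        long off = (long)(eh.e_shoff + (Elf64_Off)i * sizeof sh);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&sh, sizeof sh, 1, f) != 1) {
            result = -1;
            break;
        }
        if (sh.sh_type == SHT_SYMTAB) {
            result = 1;
            break;
        }
    }
done:
    fclose(f);
    return result;
}
```

Running it on the same program before and after `strip` should change the result from 1 to 0, since stripping removes `.symtab` while the dynamic symbol table (SHT_DYNSYM) needed at run time remains.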

> This would, of course, possibly require a different
> binary standard.  And, it would only be available on
> platforms which produced this encoding, which is why
> I'd like to BSD-license it, to promote adoption by
> vendors.
> 
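The portability point in the MAP_PRIVATE example above (carry the constant's name, and let a per-platform adapter supply its value when the native binary is produced) can be sketched as a lookup table. The table `host_adapter` and the functions `resolve_constant` and `resolve_flags` are hypothetical; here the adapter is simply filled from the host's POSIX `<sys/mman.h>`, so every platform that compiles it gets its own values and no bit pattern such as 0x00100200 is hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical "program adapter": maps symbolic constant names carried in
   the portable encoding to this platform's values. */
struct constant { const char *name; long value; };

static const struct constant host_adapter[] = {
    { "PROT_READ",   PROT_READ },
    { "PROT_WRITE",  PROT_WRITE },
    { "MAP_SHARED",  MAP_SHARED },
    { "MAP_PRIVATE", MAP_PRIVATE },
};

/* Resolve one symbolic constant.  Returns 0 and stores the value on
   success, or -1 if this platform's adapter does not know the name. */
static int resolve_constant(const char *name, long *value)
{
    assert(name != NULL && value != NULL);
    for (size_t i = 0; i < sizeof host_adapter / sizeof host_adapter[0]; i++) {
        if (strcmp(host_adapter[i].name, name) == 0) {
            *value = host_adapter[i].value;
            return 0;
        }
    }
    return -1;
}

/* OR together a list of symbolic flags into a native bitmask. */
static int resolve_flags(const char *const *names, size_t n, long *mask)
{
    long m = 0;
    for (size_t i = 0; i < n; i++) {
        long v;
        if (resolve_constant(names[i], &v) != 0)
            return -1;
        m |= v;
    }
    *mask = m;
    return 0;
}
```

Resolving {"PROT_READ", "PROT_WRITE"} yields PROT_READ | PROT_WRITE with whatever values the host defines, and an unknown name is reported as an error instead of being given a guessed value.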

OK, so do any of the key developers on this list have a link
describing what to do to sign up? How do I get access to CVS?
Where do I get the form for my University to sign so that the
program can be released under the GPL?

Regards

JG

> <end of rant>
> 
> --Corey
> 
> --- J Grant <0013499@tay.ac.uk> wrote:
> > Hello
> > I am an MSc research student and also a keen
> > supporter/developer of Linux programs.
> >
> > I have an idea for something I would like to
> > contribute to the objdump program, part of binutils.
> >
> > This is part of my planned research area.
> >
> > ---------------------
> > Developing Intelligent Techniques to Facilitate Source Code Recovery
> >
> > There are several situations where this research
> > would be the key to saving substantial time:
> >
> > · Recovery of lost program source code.
> > · Translation of a program from an obsolete language into usable source.
> > · Algorithm recovery.
> > ---------------------
> >
> > I was envisaging something along the lines of
> > --disassemble-pseudo-c on the option list.  I could
> > build upon the --disassemble option and work from there.
> >
> > I had a look on the www.gnu.org site for the form I
> > would need to fill in to contribute to the project,
> > but could not find it.  I have read all the
> > information on code format, documentation, etc.
> >
> > I have read many technical journal articles on this
> > subject and believe I have enough ideas/techniques
> > to write a very good extension/program.
> >
> > If people could advise me on what direction to take
> > now, it would be appreciated.  I could do my
> > research/programming as a separate program, but then
> > I would have to duplicate things like the BFD library.
> > I would much rather my work was free for all to use
> > in a GNU package.
> >
> > Regards
> >
> > Jonathan Grant
> 


