This is the mail archive of the cgen@sources.redhat.com mailing list for the CGEN project.



Types and other issues with cgen


Michael Meissner writes:
 > I've been looking at the internal types used within cgen, and I wanted to get
 > some comments before I start making wholesale changes.  Sorry for the length,
 > but I thought it is important to talk about the issues (#1, #4, and #8 are
 > minor issues).
 > 
 > 1) Cgen uses the PARAMS macro to selectively hide prototypes.  Given that both GCC
 >    and BINUTILS now require a C90 compiler with prototypes, would patches that go
 >    through and completely prototype things be accepted?

Yep.

 > 2) Cgen has a type mechanism (DI/SI/etc.), but it doesn't seem to be used in the
 >    actual code for at least the assembler and disassembler.

Using the modes in the assembler/disassembler isn't the right way to go.
These modes are for semantic operations, not assembly/disassembly.
Imagine an instruction with an immediate operand drawn from a fixed
set of constants, each encoded in the instruction as a special magic number.
Register indices are another example.
There's a disconnect between the representation in the instruction
and the value used during semantic evaluation.
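To make the disconnect concrete, here is a sketch with a hypothetical operand (not from any real .cpu file): the assembly-level value and the encoded field share no arithmetic relationship, so the semantic mode of the value tells you nothing about how to assemble or disassemble it.

```c
/* Hypothetical operand: its assembly-level values are the constants
   1, 2, 4 and 8, but the instruction encodes them as a 2-bit "magic
   number" 0..3.  The semantic mode describes the value; the field in
   the instruction holds only the code.  */

static const int magic_to_value[4] = { 1, 2, 4, 8 };

/* Assembly side: map an assembly value to its 2-bit encoding,
   returning -1 if the value is not one of the legal constants.  */
int
encode_magic (int value)
{
  int i;

  for (i = 0; i < 4; i++)
    if (magic_to_value[i] == value)
      return i;
  return -1;
}

/* Disassembly side: map the 2-bit field back to the semantic value.  */
int
decode_magic (int field)
{
  return magic_to_value[field & 3];
}
```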

 > (I haven't gotten to sim/sid yet).

Modes are definitely used in simulation.

 >    All fields in the cgen_fields structure are signed long, no
 >    matter what the type that I declare in the .cpu file is.  In part this seems
 >    to be because extract_normal and friends take an address of the field to
 >    fill, and return 0/1 for error and success.  Wouldn't a better approach be
 >    to size & type the fields as the user specified, and make the extract
 >    functions return the extracted value and return error/success via a
 >    pointer.  I could see either separate extractor functions for each type, or
 >    signed/unsigned extractor functions of the widest type, or just a single
 >    extract function being used.

Either way one has to have multiple functions per type
(unless of course one used a union or some such),
regardless of whether the pass/fail indicator is the result
or returned via a pointer to it.

Having multiple variants of the internal extract_normal routine
is an increment in complication I haven't needed yet, so I've been
deferring it.

Note that there are already functions that have multiple variants
dependent on type.  See for example m32r_cgen_[gs]et_{int,vma}_operand
in opcodes/m32r-ibld.c.
These functions aren't currently used by any binutils program.
They're services offered to programs outside of binutils.
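For concreteness, here is a sketch of the two calling conventions under discussion. The names and the simple unsigned bitfield extraction are illustrative only, not cgen's actual extract_normal interface.

```c
/* Current style (sketch): fill the field through a pointer and
   return 0/1 for failure/success.  */
int
extract_status_style (unsigned long insn, int start, int length,
                      long *valuep)
{
  if (length <= 0 || length > 31)
    return 0;
  *valuep = (long) ((insn >> start) & (((unsigned long) 1 << length) - 1));
  return 1;
}

/* Proposed style (sketch): return the extracted value and report
   success/failure through a pointer.  One such function per result
   type (or per widest signed/unsigned type) would be needed.  */
long
extract_value_style (unsigned long insn, int start, int length,
                     int *okp)
{
  if (length <= 0 || length > 31)
    {
      *okp = 0;
      return 0;
    }
  *okp = 1;
  return (long) ((insn >> start) & (((unsigned long) 1 << length) - 1));
}
```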

 > 3) Signed long is another problem in that the machine I'm targeting is a 64-bit
 >    machine, but I am doing development on an x86 machine.  If we keep to a
 >    single type, it should be at least bfd_signed_vma which will be the
 >    appropriate size to hold addresses in the target machine.  This will mean
 >    having to rewrite the places that just call printf or the print functions,
 >    but that is not too difficult.  Another possibility is to use a cgen
 >    specific type (or two types for signed/unsigned) that is sized to be as
 >    large as the largest type used in the .cpu file.  Ideally for 32-bit ports
 >    on 32-bit hosts, you would not slow things down by using 64 bit types
 >    blindly, but it would allow those of us developing for larger hosts to
 >    use cgen.

For assembly/disassembly purposes the issue is what is the maximum
size of a "word" in the instruction's representation?
And for the sake of [V]LIW machines let's keep separate the notion
of individual instructions inside one collection of instructions
(or in Transmeta parlance: atoms and molecules (whoop dee doo)).

I'm assuming/hoping you can pack each instruction separately
and then combine them at the end, and for now do the final packing
(or initial unpacking for disassembly) outside of cgen.
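A sketch of that split, with a made-up bundle layout (atom i occupying bits [i*43, i*43+43) of a little-endian 128-bit molecule); the per-atom packing is cgen's job, the combining step lives outside it:

```c
#include <stdint.h>
#include <string.h>

/* Each 43-bit "atom" has already been packed individually into the low
   bits of a uint64_t; this final step, done outside cgen for now,
   combines three of them into one 16-byte little-endian "molecule".  */
void
pack_molecule (const uint64_t atoms[3], unsigned char out[16])
{
  int i, bit;

  memset (out, 0, 16);
  for (i = 0; i < 3; i++)
    for (bit = 0; bit < 43; bit++)
      if ((atoms[i] >> bit) & 1)
        {
          int pos = i * 43 + bit;   /* bit position within the bundle */

          out[pos / 8] |= (unsigned char) (1u << (pos % 8));
        }
}
```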

 >    There are machines out there with 128-bit registers, such as the MIPS chip
 >    at the heart of the Sony PlayStation, the SSE2 registers on the
 >    Pentium IV, and the AltiVec registers on the newer PowerPCs.  However, C
 >    compilers often don't provide 128-bit types.  We might want to think
 >    about how to handle these machines as well.  In terms of instruction size, I
 >    do have an 86-bit instruction, which pushes the problem as well.  This may
 >    require using GMP.  Too bad we aren't coding in C++, where we
 >    could just define a class type to get the extra precision.

cgen based simulators (written in C) can already handle simulating
architectures with 64 bit values on hosts where the compiler doesn't
have long long (with C++ there's less of an issue).
Dunno how often it is used, so no claim is made that there isn't bitrot
or that it's complete, but it was tested way back when.
Grep for HAVE_LONGLONG in sim/common/cgen-types.h.

Semantic modes are to some extent black boxes.
As new modes become needed we can add them.
A simulator on a host with a compiler that can't represent them
can represent them as a struct and provide the necessary
manipulators of that struct.  (For C++, s/struct/class/ if you prefer.)
No claim is made that the addition will be a walk in the park,
but that's the plan-of-record.
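The shape of that fallback, as a minimal sketch (illustrative names, not the actual definitions in sim/common/cgen-types.h): a 64-bit DI mode carried as two 32-bit halves, with operations becoming function calls instead of built-in operators.

```c
#include <stdint.h>

/* 64-bit DI mode represented as two 32-bit halves for a host compiler
   with no 64-bit integer type.  */
typedef struct { uint32_t hi, lo; } DI;

DI
make_di (uint32_t hi, uint32_t lo)
{
  DI x;

  x.hi = hi;
  x.lo = lo;
  return x;
}

/* 64-bit addition built from 32-bit operations.  */
DI
add_di (DI a, DI b)
{
  DI r;

  r.lo = a.lo + b.lo;
  r.hi = a.hi + b.hi + (r.lo < a.lo ? 1 : 0);  /* carry from low half */
  return r;
}
```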

 > 4) As a nit, we use unsigned int for the hash type, and I suspect it might be
 >    cleaner if we had a cgen specific type for holding hash values (ie,
 >    cgen_hash_t).

Sure.  An increment in complication I was deferring.
One might want to add to the name the context in which it is used.
Cgen might want to use different kinds of hashes in different contexts.

 > 5) As an experiment, I compiled cgen with -Wconversion, and it showed a lot of
 >    places where implicit signed<->unsigned conversions were going on.  A lot of
 >    the places were using int to hold sizes like buffer lengths, passing
 >    sizeof(...) as the value, where size_t would be more useful.  Unfortunately it
 >    also shows other places where having a single type for the fields (such as
 >    long currently, or bfd_signed_vma/cgen_int_t possibly in the future) causes
 >    conversions.  One of my thoughts is to have a union of appropriate unsigned
 >    and signed types of the same size, and use the appropriate element in the
 >    expansion.

Removing the warnings would certainly be a good idea, though this
particular warning doesn't always have a high signal/noise ratio.
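One way the union idea could look, as a sketch (the type name and the sign-extension example are hypothetical): each use site picks the signed or unsigned view explicitly instead of converting implicitly.

```c
#include <stdint.h>

/* A single storage type for fields, with signed and unsigned views.  */
typedef union
{
  int64_t s;    /* signed view: sign-extended immediates */
  uint64_t u;   /* unsigned view: register numbers, masks */
} cgen_field_t;

/* Extract a 12-bit signed immediate from the low bits of an insn.  */
cgen_field_t
extract_simm12 (uint32_t insn)
{
  cgen_field_t f;

  f.u = insn & 0xfff;
  if (f.u & 0x800)              /* sign-extend from bit 11 */
    f.s = (int64_t) f.u - 0x1000;
  return f;
}
```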

 > 6) Using bfd_put_bits and bfd_get_bits to convert the bits into proper endian
 >    format only works for bit sizes of 8, 16, 32, and 64.  In all other cases,
 >    bfd aborts (my machine has mostly 43-bit instructions, and one 86-bit
 >    instruction before the encoding mentioned in #7).  It might be better to
 >    open code this, rather than falling back to the bfd functions.
 > 
 >    Another idea is to always encode instructions, expressed as a series of bytes,
 >    in big-endian (or little-endian) format, and then expect the final assembler
 >    encoding to do the appropriate copying.  Otherwise, I see a lot of code that
 >    checks the endianness to get the correct byte.

A final assembly pass to do the appropriate copying isn't necessarily
a slam-dunk.

The asm/disasm side of cgen currently has two modes of representing
instructions: as an "int" in host byte order, or as a string of bytes
in target byte order.
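An open-coded variant for the odd widths might look like this sketch (big-endian bit order, most significant bit of the value first; this is illustrative, not bfd's interface):

```c
#include <stdint.h>

/* Write the low NBITS of VALUE into BUF, most significant bit first,
   for any width up to 64 (bfd_put_bits itself aborts for widths other
   than 8, 16, 32 and 64).  */
void
put_bits_be (uint64_t value, int nbits, unsigned char *buf)
{
  int i;

  for (i = 0; i < nbits; i++)
    {
      unsigned char mask = (unsigned char) (0x80 >> (i % 8));

      if ((value >> (nbits - 1 - i)) & 1)
        buf[i / 8] |= mask;
      else
        buf[i / 8] &= (unsigned char) ~mask;
    }
}
```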

 > 7) As I have mentioned in the past, my machine uses 3 43-bit instructions that
 >    are encoded into a 128 bit super instruction.  Any ideas for the syntax for
 >    specifying the encode/decode operations?

I'm not sure I understand.  In what context?

 > 8) The @arch@_cgen_hw_table uses (PTR) in initializing the asm_data field.
 >    This makes debugging harder.  Would it be possible to have 2 fields so that
 >    each member is correctly typed, and you can print out pointers in the
 >    debugger?

2 fields?  How would it look?
(it's certainly a useful thing to do, and I'd say go for it,
but I'm not clear what the result would look like)

Note that things are currently not totally hopeless.
One could print the value and then say "info sym <value>",
and then print the variable gdb gives.

 > So, suggestions on how you would like me to extend cgen to handle the problems
 > my machine exposes?

For assembly/disassembly, I need to think about it for a bit.
I think what we need to do is be able to handle each insn
individually and handle packing/unpacking outside of cgen (for now).
That reduces the problem to handling how "words" are laid out
in each individual insn (atom).  Since we're dealing with 43-bit
entities (or 2*43 bits), I'm wondering if treating them as
64-bit entities for packing/unpacking will work.
(How the 2*43 bits case would be handled would depend on the details,
I guess; maybe 2*64 bits or maybe 32+64 bits.)

I'm guessing studying how to handle ia64 would suffice.
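The unpacking half of that idea, as a sketch with a made-up layout (atom i in bits [i*43, i*43+43) of a 16-byte little-endian bundle): each 43-bit atom is lifted into a uint64_t that can then be handled as an individual insn.

```c
#include <stdint.h>

/* Pull one 43-bit atom out of a 128-bit bundle into a uint64_t that
   can then be disassembled individually.  */
uint64_t
unpack_atom (const unsigned char bundle[16], int i)
{
  uint64_t atom = 0;
  int bit;

  for (bit = 0; bit < 43; bit++)
    {
      int pos = i * 43 + bit;   /* bit position within the bundle */

      if (bundle[pos / 8] & (1u << (pos % 8)))
        atom |= (uint64_t) 1 << bit;
    }
  return atom;
}
```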

 > My initial thoughts are to use a cgen specific type for the types.  The first
 > round would use bfd_vma/bfd_signed_vma, but eventually size the type based on
 > the maximum size used in the .cpu file.  I'm thinking of using the union with
 > signed and unsigned fields, to deal with many of the conversion issues.

If after reading the above you still think this is the way to go,
let's discuss it further.

