1. Introduction to CGEN

1.1 Overview

CGEN is a project to provide a framework and toolkit for writing cpu tools.

1.1.1 Goal

The goal of CGEN (pronounced seejen, and short for "Cpu tools GENerator") is to provide a uniform framework and toolkit for writing programs like assemblers, disassemblers, and simulators without explicitly closing any doors on future things one might wish to do. In the end, its scope is the things the software developer cares about when writing software for the cpu (compilation, assembly, linking, simulation, profiling, debugging, ???).

Achieving the goal is centered around having an application independent description of a CPU (plus environment, like ABI) that applications can then make use of. In the end that's a lot to ask for from one language. What applications can or should be able to use CGEN is left to evolve over time. The description language itself is thus also left to evolve over time!

Achieving the goal also involves having a toolkit, libcgen, that contains a compiled form of the cpu description plus a suite of routines for working with the data. (1)

CGEN is not a new idea. Some GNU ports have done something like this – for example, the SH port in its early days. However, the idea never really “caught on”. CGEN was started because I think it should.

Since CGEN is a very ambitious project, there are currently lots of things that aren't written down, let alone implemented. It will take some time to flesh out all the details, but in and of itself that doesn't necessarily mean they can't be fleshed out, or that they haven't been considered.

1.1.2 Why do it?

I think it is important that GNU assembler/disassembler/simulator ports be done from a common framework. On some level it's fun doing things from scratch, which was and still is to a large extent current practice, but this is not the place for that.

1.1.3 Maybe it should not be done?

However, no one has yet succeeded in pushing for such an extensive common framework.(2)

1.1.4 How ambitious is CGEN?

CGEN is a very ambitious project, as the future projects described below show:

More complicated simulators

Current CGEN-based simulators achieve their speed by using GCC's "computed goto" facility to implement a threaded interpreter. The "main loop" of the cpu engine is contained within one function and the administrivia of running the program is reduced to about three host instructions per target instruction (one to increment a "virtual pc", one to fetch the address of code that implements that next target instruction, and one to branch to it). Target instructions can be simulated with as few as seven(3) instructions for an "add" (load address of src1, load src1, load address of src2, load src2, add, load address of result, store result). So ignoring overhead (which is minimal for frequently executed code) that's ten host instructions per "typical" target instruction. Pretty good.(4)
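The dispatch scheme described above can be sketched in Python. This is only an illustration: Python has no computed goto, so a pre-decoded list of handler references stands in for the table of code addresses, and every name here (`run`, `op_add`, `op_halt`, `Halt`, the register layout) is hypothetical rather than anything CGEN generates.

```python
# Illustrative threaded-interpreter dispatch; not CGEN-generated code.
# Each pre-decoded entry holds a direct reference to its handler, which plays
# the role of the code address fetched by the "computed goto" version.

class Halt(Exception):
    """Raised by the (hypothetical) halt instruction to leave the main loop."""

def op_add(regs, dst, src1, src2):
    regs[dst] = regs[src1] + regs[src2]   # load src1, load src2, add, store

def op_halt(regs):
    raise Halt

def run(program, regs):
    vpc = 0                               # the "virtual pc"
    try:
        while True:
            handler, args = program[vpc]  # fetch the next handler
            vpc += 1                      # increment the virtual pc
            handler(regs, *args)          # branch to it
    except Halt:
        return regs

# Pre-decoded form of: add r2,r0,r1 ; add r3,r2,r2 ; halt
program = [(op_add, (2, 0, 1)), (op_add, (3, 2, 2)), (op_halt, ())]
regs = run(program, [5, 7, 0, 0])
```

The three statements in the loop body correspond to the three host instructions of administrivia counted in the text; the body of `op_add` corresponds to the instruction's own work.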

However, things can still be better. There is still some implementation related overhead that can be removed. The two instructions to branch to the next instruction would be unnecessary if instruction executors were concatenated together. The fetching and storing of target registers can be reduced if target registers were kept in host registers across instruction boundaries (and the longer one can keep them in host registers the better). A consequence of both of these improvements is the number of memory operations is drastically reduced. There isn't a lot of ILP in the simulation of target instructions to hide memory latencies. Another consequence of these improvements is the opportunity to perform inter-target-instruction scheduling of the host instructions and other optimizations.

There are two ways to achieve these improvements. Both involve converting basic blocks (or superblocks) in the target application into the host instruction set and compiling that. The first way involves doing this "offline". The target program is analyzed and each instruction is converted into, for example, C code that implements the instruction. The result is compiled and then the new version of the target program is run.
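As a rough sketch of the "offline" approach, the following Python fragment turns a basic block into the text of a C function. The instruction templates, the `translate_block` name, and the register array `r` are assumptions made for the example; a real translator would derive the C code from the CPU description.

```python
# Hypothetical offline translator: emits one C statement per target instruction.
TEMPLATES = {
    "add": "r[{0}] = r[{1}] + r[{2}];",
    "sub": "r[{0}] = r[{1}] - r[{2}];",
}

def translate_block(name, insns):
    """Return C source for a function implementing one basic block."""
    body = [TEMPLATES[op].format(*operands) for op, *operands in insns]
    lines = ["void %s (int *r)" % name, "{"]
    lines += ["  " + stmt for stmt in body]
    lines.append("}")
    return "\n".join(lines)

c_source = translate_block("block_0", [("add", 2, 0, 1), ("sub", 3, 2, 0)])
```

The resulting text would then be handed to a C compiler, which is where the inter-instruction scheduling and register allocation mentioned above would take place.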

The second way is to do the translation from target instruction set to host instruction set while the target program is running. This is often referred to as JIT (Just In Time) simulation (FIXME: proper phrasing here?). One way to implement this is to simulate instructions the way existing CGEN simulators do, but keep track of how frequently a basic block is executed. If a block gets executed often enough, then compile a translation of it to the host instruction set and switch to using that. This avoids the overhead of doing the compilation on code that is rarely executed. Note that here is one place where a dual cpu system can be put to good use. One cpu handles the simulation and the other handles compilation (translating target instructions to host instructions). CGEN can(5) handle a large part of building the JIT compiler because both host and target architectures are recorded in a way that is amenable to program manipulation.
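The "count, then translate" idea can be sketched as follows. This is a toy under stated assumptions: the target has only an add instruction, the "host instruction set" is Python source compiled with `exec`, and `HOT_THRESHOLD`, `Simulator`, and the other names are invented for the example.

```python
# Sketch of hot-block detection. HOT_THRESHOLD and all names are assumptions.
HOT_THRESHOLD = 3

def interpret(block, regs):
    for dst, a, b in block:               # every instruction is an "add"
        regs[dst] = regs[a] + regs[b]

def compile_block(block):
    """Translate a block to host (here: Python) code once, and return it."""
    src = "def translated(regs):\n"
    for dst, a, b in block:
        src += "    regs[%d] = regs[%d] + regs[%d]\n" % (dst, a, b)
    src += "    return regs\n"            # keeps the body non-empty
    namespace = {}
    exec(src, namespace)
    return namespace["translated"]

class Simulator:
    def __init__(self):
        self.counts = {}                  # block address -> execution count
        self.compiled = {}                # block address -> translated code

    def execute(self, addr, block, regs):
        if addr in self.compiled:
            self.compiled[addr](regs)     # fast path: run the translation
            return
        interpret(block, regs)            # slow path: ordinary simulation
        self.counts[addr] = self.counts.get(addr, 0) + 1
        if self.counts[addr] >= HOT_THRESHOLD:
            self.compiled[addr] = compile_block(block)

sim = Simulator()
regs = [1, 0]
block = [(1, 1, 0)]                       # r1 = r1 + r0
for _ in range(5):
    sim.execute(0x100, block, regs)
```

On a dual cpu system the call to `compile_block` is the part that could be handed to the second cpu while the first keeps interpreting.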

A hybrid of these two ways is to translate target basic blocks to C code, compile it, and dynamically load the result into the running simulation. Problems with this are that one must invoke an external program (though one could dynamically load a special form of C compiler I suppose) and there's a lot of overhead parsing and optimizing the C code. On the other hand one gets to take full advantage of the compiler's optimization technology. And if the application takes a long time to simulate, the extra cost may be worthwhile. A dual cpu system is of benefit here too.

Profiling tools

It is useful to know how well an architecture is being utilized. For one, this helps build better architectures. It also helps determine how well a compilation system is using an architecture.

CGEN-based simulators already compute instruction frequency counts. It's straightforward to add register frequency counts. Monitoring other aspects of the ISA is also possible. The description file provides all the necessary data; all that's needed is to write a generator for an application that then performs the desired analysis.
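To make the two kinds of counts concrete, here is a small Python sketch. The trace format (a mnemonic plus the registers it touches) is an assumption for the example and is not the form in which CGEN-based simulators record their data.

```python
from collections import Counter

# Hypothetical trace of executed instructions: (mnemonic, registers touched).
trace = [("add", ("r1", "r2")), ("ld", ("r1",)), ("add", ("r3", "r1"))]

# Instruction frequency: how often each mnemonic was executed.
insn_counts = Counter(mnemonic for mnemonic, _ in trace)

# Register frequency: how often each register was referenced.
reg_counts = Counter(reg for _, regs in trace for reg in regs)
```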

Function units, pipelines, and other architecture implementation related items require a lot more effort, but it is doable. The guideline for this effort is again coming up with an application-independent specification of these things.

CGEN does not currently support memory or cache profiling. Obviously they're important, and support may be added in the future. One thing that would be straightforward to add is the building of trace data for usage by cache and memory analysis tools. The point though is that these tools won't benefit much from CGEN's existence.

Another kind of profiling tool is one that takes the program to be profiled as input, inserts profiling code into it, and then generates a new version of the program which is then run.(6) Recorded in CGEN's description files should be all the necessary ISA related data to do this. One thing that's missing is code to handle the file format and relocations. See section ABI description.

Program analysis tools

Related to profiling tools are static program analysis tools. By this I mean taking machine code as input and analyzing it in some way. Except for symbolic information (which could come from BFD or elsewhere), CGEN provides enough information to analyze machine code, both the raw instructions *and* their semantics. Libcgen should contain all the basic tools for doing this. (7)

ABI description

Several tools need knowledge of not only a cpu's ISA but also of the ABI in use. I think(!) it makes sense to apply the same goals that went into CGEN's architecture description language to an ABI description language: specify the ABI in an application independent way and then have a basic toolkit/library that provides ways of using that data. It might be useful to also allow the writing of program generators for applications that want more than what the toolkit/library provides. Perhaps not, but the basic toolkit/library should, again I think, be useful.

Part of what an ABI defines is the file format and relocations. This is something that BFD is built for. I think a BFD rewrite should happen and should be based, at least in part, on a CGEN-style ABI description. This rewrite would be one user of the ABI description, but certainly not the only user. One problem with this approach is that BFD requires a lot of file format specific C code. I doubt all of this code is amenable to being described in an application independent way. Careful separation of such things will be necessary. It may even be useful to ignore old file formats and limit such a BFD rewrite to ELF (not that ELF is free from such warts, of course).

Machine generated architecture reference material

Engineers often need to refer to architecture documentation. One problem is that there are often only so many hardcopy manuals to go around. Since the CPU description contains a lot of the information engineers need to find, it makes sense to convert that information back into a readable form. The manual can then be available online to everyone. Furthermore, each architecture will be documented using the same style, making it easier to move from architecture to architecture.

Tools like what NJMCT provides

NJMCT is the New Jersey Machine Code Toolkit. It focuses exclusively on the encoding and decoding of instructions. [FIXME: wip, need to say more].

Input to a compiler backend

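One can define a GCC port to include these four things: a cpu architecture description, a cpu implementation description, an ABI description, and miscellaneous items.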

The CGEN description provides all of the cpu architecture description that the compiler needs. However, the current design of the CPU description language is geared towards going from machine instructions to semantic content, whereas what a compiler wants to do is go from semantic content to machine instructions, so in the end this might not be a reasonable thing to pursue. On the other hand, that problem can be solved in part by specifying two sets of semantics for each instruction: one for the compiler side of things, and one for the simulator side of things. Frequently they will be the same thing and thus need only be specified once. Though specifying them twice, for the two different contexts, is reasonable I think. If the two versions of the semantics are used by multiple applications this makes even more sense.

The planned rewrite of model support in CGEN will support whatever the compiler needs for the implementation description.

Compilers also need to know the target's ABI, which isn't relevant for an architecture description. On the other hand, more than just the compiler needs knowledge of the ABI. Thus it makes sense to think about how many tools there are that need this knowledge and whether one can come up with a unifying description of the ABI. Hence one future project is to add the ABI description to CGEN. This would encompass in essence most of what is contained in the System V ABI documentation.

That leaves the "miscellaneous" part. Essentially this is a catchall for whatever else is needed. This would include things like include file directory locations, port-specific language features, ???. There's not much need to include this info in CGEN; it's pretty esoteric and generally useful to only a few applications.

One can even envision a day when GCC emits object files directly. The instruction description contains enough information to build the instructions and the ABI support would provide enough information on relocations and object file formats.

Debugging information should be treated as an orthogonal concept. At present it is outside the scope of CGEN, though clearly the same reasoning behind CGEN applies to debugging support as well.

Hardware/software codesign

This section isn't very well thought out – not much time has been put into it. The thought is that some interface with VHDL/Verilog could be created that would assist hw/sw codesign.

Another related application is to have a feedback mechanism from the compilation system that helps improve the architecture description (both CGEN and HDL). CGEN descriptions for experimental instructions could be added, and a new set of compilation tools quickly regenerated. Then experiments could be run analyzing the effectiveness of the new instructions.

1.1.5 What's missing that should be there someday?

1.2 CPU description language

The goal of CGEN is to provide a uniform and extensible framework for doing assemblers/disassemblers and simulators, as well as allowing further tools to be developed as necessary.

With that in mind I think the place to start is in defining a CPU description language that is sufficiently powerful for all the current and perceived future needs: an application independent description of the CPU. From the CPU description, tables and code can be generated that an application framework can then use (e.g. opcode table for assembly/disassembly, decoder/executor for simulation).

By "application independence" I mean the data is recorded in a way that doesn't intentionally close any doors on uses of the data. One example of this is using RTL to describe instruction semantics rather than, say, C. The assembler can also make use of the instruction semantics. It doesn't make use of the semantics, per se, but what it does use is the input and output operand information that is machine generated from the semantics. Grokking operand usage from C is possible, but harder. (8) So by writing the semantics in RTL multiple applications can make use of it. One can also generate from the RTL code in languages other than C.
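The point about machine generating operand information can be illustrated with a short Python sketch. It assumes the semantics are available as nested tuples in the spirit of `(set dr (add sr1 sr2))`; the tuple encoding, the operand names, and the `operands` function are hypothetical, and destinations that are themselves expressions (e.g. memory references) are not handled.

```python
# RTL-like semantics as nested tuples; the operand names are hypothetical.
def operands(expr):
    """Return (inputs, outputs) referenced by a tiny RTL-like expression."""
    if isinstance(expr, str):             # a bare operand name is read
        return {expr}, set()
    if not isinstance(expr, tuple):       # constants reference no operands
        return set(), set()
    op, *args = expr
    inputs, outputs = set(), set()
    if op == "set":                       # (set dest src): dest is written
        outputs.add(args[0])
        args = args[1:]
    for arg in args:
        sub_in, sub_out = operands(arg)
        inputs |= sub_in
        outputs |= sub_out
    return inputs, outputs

ins, outs = operands(("set", "dr", ("add", "sr1", "sr2")))
```

Doing the same from C source would require parsing C, which is the sense in which the text calls it "possible, but harder".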

1.2.1 Language requirements

The CPU description file needs to provide at least the following:

In addition to this, elements of the particular ABI in use are also needed. These things will need to be defined separately from the cpu, for obvious reasons.

Some architectures require knowledge of the pipeline in order to do accurate simulation (because, for example, some registers don't have interlocks) so that will be required as well, as opposed to being solely for performance measurement. Pipeline knowledge is also needed in order to achieve accurate profiling information. However, I haven't spent much time on this yet. The current design/implementation is a first pass in order to get something reasonable, and will be revisited as necessary.

Support for generating test files is not complete. Currently the GAS test suite generator gets by (barely) without them. The simulator test suite generator just generates templates and leaves the programmer to fill in the details. But I think this information should be present, meaning that for situations where test vectors can't be derived from the existing specs, new specs should be added as part of the description language. This would make writing testcases an integral part of writing the .cpu file. Clearly there is a risk in having machine generated testcases, but there are ways to eliminate or control the risk.

The syntax of a suitable description language needs to have these properties:

It would also help to not start over completely from scratch. GCC's RTL satisfies all these goals, and is used as the basis for the description language used by CGEN.

Extensibility is achieved by specifying everything as name/value pairs. This allows new elements to be added and even CPU specific elements to be added without complicating the language or requiring a new element in a define_insn-like entry to be added to each existing port. Macros can be used to eliminate the verbosity of repetitively specifying the “name” part, so one can have it both ways. Imagine GCC's ‘.md’ file elements specified as name/value pairs with macros called define_expand, define_insn, etc. that handle the common cases and expand the entry to the full (define_full_expand (name addsi3) (template ...) (condition ...) ...).

Scheme also uses (foo :keyword1 value1 :keyword2 value2 ...), though that isn't implemented yet (or maybe #:keyword depending upon what is enabled in Guile).
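The following Python sketch illustrates the "have it both ways" idea: a shorthand that takes positional arguments for the common elements and expands to the full name/value form, while still accepting extra elements. The functions and the `delay_slot` element are invented for the example; they are not part of GCC or CGEN.

```python
# Hypothetical shorthand macro expanding to a full name/value entry.
def define_expand(name, template, condition, **extra):
    entry = [("name", name), ("template", template), ("condition", condition)]
    entry += sorted(extra.items())        # cpu-specific elements, if any
    return ("define_full_expand", entry)

def to_sexp(form):
    """Render the expanded entry in parenthesized name/value form."""
    head, pairs = form
    return "(%s %s)" % (head, " ".join("(%s %s)" % pair for pair in pairs))

text = to_sexp(define_expand("addsi3", "...", "...", delay_slot="yes"))
```

Because every element is named in the expanded form, adding `delay_slot` for one port does not require touching the entries of any other port.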

1.2.2 Layout

Here is a graphical layout of the hierarchy of elements of a ‘.cpu’ file.

                           architecture
                           /          \
                      cpu-family1   cpu-family2  ...
                      /         \
                  machine1    machine2  ...
                   /   \
              model1  model2  ...

Each of these elements is explained in more detail in CGEN's Register Transfer Language. The architecture is one of ‘sparc’, ‘m32r’, etc. Within the ‘sparc’ architecture, the cpu-family might be ‘sparc32’ or ‘sparc64’. Within the ‘sparc32’ CPU family, the machine might be ‘sparc-v8’, ‘sparclite’, etc. Within the ‘sparc-v8’ machine classification, the model might be ‘hypersparc’ or ‘supersparc’.
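For readers who prefer a data structure to a diagram, the example names above can be written as nested Python dictionaries. This is only a sketch of the shape of the hierarchy; entries the text does not enumerate are left empty, and a real ‘.cpu’ file records far more for each element.

```python
# The example hierarchy from the text as nested dictionaries (a sketch only).
arch = {
    "sparc": {                                         # architecture
        "sparc32": {                                   # cpu-family
            "sparc-v8": ["hypersparc", "supersparc"],  # machine: its models
            "sparclite": [],                           # models not listed
        },
        "sparc64": {},                                 # machines not listed
    },
}

def paths(tree, prefix=()):
    """Yield every architecture/cpu-family/machine/model path in the tree."""
    if isinstance(tree, dict):
        for key, sub in tree.items():
            yield from paths(sub, prefix + (key,))
    else:
        for model in tree:
            yield prefix + (model,)

all_paths = list(paths(arch))
```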

Instructions form their own hierarchy as each instruction may be supported by more than one machine. Also, some architectures can handle more than one instruction set on one chip (e.g. ARM).

                 instruction
                    /   \
             operand1  operand2  ...
                |         |
         hw1+ifield1   hw2+ifield2  ...

Each of these elements is explained in more detail in CGEN's Register Transfer Language.

1.2.3 Language problems

There are at least two potential problem areas in the language's design.

The first problem is variation in assembly language syntax. Examples of this are Intel vs AT&T i386 syntax, and Motorola vs MIT m68k syntax. I think there isn't a sufficient number of important cases to warrant handling this efficiently. One could either ignore the issue for situations where divergence is sufficient to dissuade one from handling it in the existing design, or one could provide a front end or use/extend the existing macro mechanism.

One can certainly argue that description of assembler syntax should be separated from the hardware description. Doing so would prevent complications in supporting multiple or even difficult assembler syntaxes from complicating the hardware description. On the other hand, there is a lot of duplication, and in the end for the intended uses of CGEN I think the benefits of combining assembler support with hardware description outweigh the disadvantages. Note that the assembler portions of the description aren't used by the simulator (9), so if one wanted to implement the disassembler/assembler via other means one can.

The second problem area is relocations. Clearly part of processing assembly code is dealing with the relocations involved (e.g. GOT table specification). Relocation support necessarily requires BFD and GAS support, both of which need cleanup in this area. Rewriting BFD to provide a better interface, so that reloc handling in GAS can be cleaned up, is believed to be something this project can and should take advantage of, and any attempt at adding relocation support should begin by cleaning up GAS/BFD. That can be left for another day though. :-)

One can certainly argue trying to combine an ABI description with a hardware description is problematic as there can be more than one ABI. However, there often isn't and in the cases where there isn't the simplified porting and maintenance is worth it, in the author's opinion. Furthermore, the current language doesn't embed ABI elements with hardware description elements. Careful segregation of such things might ameliorate any problems.

1.3 Opcodes support

Opcodes support comes in the form of machine generated opcode tables as well as supporting routines.

1.4 Simulator support

Simulator support comes in the form of a machine generated decoder/executor as well as the structure that records CPU state information (i.e., registers).

CGEN comes with support for both the simulator in the GDB tree (see section GDB Simulator), and the SID simulator (see section SID Simulator).

1.5 Testing support

Inherent in the design is the ability to machine generate test cases both for the assembler/disassembler and for the simulator. Furthermore, it is not unreasonable to add to the description file data specifically intended to assist or guide the testing process. What kinds of additions will be needed is unknown at present.

1.5.1 Assembler/disassembler testing

The description of instructions and their fields contains to some extent not only the syntax but the possible values for each field. For example, in the specification of an immediate field, it is known what the allowable range of values is. Thus it is possible to machine generate test cases for such instructions. Obviously one wouldn't want to test for each number that a number field can contain; however, one can generate a representative set of any size. Likewise with register fields, mnemonic fields, etc. A good starting point would be the edge cases, the values at either end of the range of allowable values.
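A minimal Python sketch of edge-case selection for an immediate field follows. The `edge_values` helper, the choice of neighbouring values, and the `addi` instruction are assumptions for illustration; the GAS test suite generator may choose its values differently.

```python
def edge_values(bits, signed):
    """Representative values for an immediate field: range ends and neighbours."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    candidates = {lo, lo + 1, -1, 0, 1, hi - 1, hi}
    return sorted(v for v in candidates if lo <= v <= hi)

# Hypothetical instruction with a signed 8-bit immediate.
tests = ["addi r0,%d" % v for v in edge_values(8, signed=True)]
```

The same function could be called with a larger candidate set to produce a representative set of any desired size.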

When I first raised the possibility of machine generated test cases the first response I got was that this wouldn't be useful because the same data was being used to generate both the program and the test cases. An error might be propagated to both and thus nullify the test. For example if an opcode field was supposed to have the value 1 and the description file had the value 2, then this error wouldn't be caught. However, this assumes test cases are always generated during the testing run! And it ignores the profound amount of typing that is saved by machine generating test cases! (I discount the argument that this kind of exhaustive testing is unnecessary).

One solution to the above problem is to not generate the test cases during the testing run (which was implicit in the proposal, but perhaps should have been explicit). Another solution is to generate the test cases during the test run but first verify them by some external means before actually using them in any test. Another solution is to have some trust in the generated tests. Yes, some bugs may be missed, but given the quantity of testing that can be done, some bugs may still be caught that would otherwise have been missed. Plus it's all machine-driven; minimal human interaction is required.

So how are machine generated test cases verified? By machine, by hand, and by time. The test cases are checked into CVS and are not regenerated without care. Every time the test cases are regenerated, the diffs are examined to ensure the bug triggering the regeneration has been fixed and that no new bugs have been introduced. In all likelihood once a port is more or less done, regeneration of test cases would stop anyway, and all further changes would be done manually.

“By machine” means that for example in the case of ports with a native assembler one can run the test case through the native assembler and use that as a good first pass.

“By hand” means one can go through each test case and verify it manually. This is what is done in the case of non-machine generated test cases; the only difference is the perceived difference in quantity. And in the case of machine generated test cases comments can be added to each test to help with the manual verification (e.g. a comment can be added that splits the instruction into its fields and shows their names and values).

“By time” means that this process needn't be done instantaneously. This is no different than the non-machine generated case again except in the perceived difference in quantity of test cases.
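The per-test comment mentioned under “By hand” could be produced along these lines. The 16-bit format, the field names, and the `field_comment` function are hypothetical; in practice the field layout would come from the description file.

```python
# Hypothetical 16-bit format: (field name, most significant bit, width).
FIELDS = [("op", 15, 4), ("dr", 11, 4), ("imm8", 7, 8)]

def field_comment(insn):
    """Return a comment that splits an instruction word into named fields."""
    parts = []
    for name, msb, width in FIELDS:
        value = (insn >> (msb - width + 1)) & ((1 << width) - 1)
        parts.append("%s=0x%x" % (name, value))
    return "; " + " ".join(parts)

comment = field_comment(0x4a7f)
```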

Note that no claim is made that manually generated test cases aren't useful or needed. The goal here is to enhance existing forms of testing, not replace them.

1.5.2 Simulator testing

Machine generation of simulator test cases is possible because the semantics of each instruction is written in a way that is understandable to the generator. At the very least, knowledge of what the instructions are is present! Obviously there will be some instructions that can't be adequately expressed in RTL and are thus not amenable to having a test case machine generated. There may even be some RTL'd semantics that fall into this category. It is believed, however, that there will still be a large percentage of instructions amenable to having test cases machine generated for them. Such test cases can certainly be hand generated, but it is believed that this is a large amount of unnecessary typing that typically won't be done due to the amount.

An example is the simple arithmetic instructions. These take zero, one, or more arguments and produce a result. The description file contains sufficient data to generate such an instruction; the hard part is in providing the environment to set up the required inputs (e.g. loading values into registers) and retrieve the output (e.g. retrieve a value from a register).

Certainly at the very least all the administrivia for each test case can be machine generated (i.e. a template file can be generated for each instruction, leaving the programmer to fill in the details).

The strategies mentioned for assembler/disassembler machine-generated test cases also apply here.

This document was generated by Doug Evans on January 28, 2010 using texi2html 1.78.