Libgcc.a, Machine descriptions and Linker scripts

Geir Frode Raanes
Wed Jun 23 18:48:00 GMT 1999

I am currently investigating what the GCC internals look like.
The intention is to update the FAQ. Also, I am currently
looking through the R. Stallman documentation and replacing
every instance of '...on some systems...' with the more
specific information 'every target description that defines
xxxxx in'

Anyway, I have some questions. By elaborating on the questions, I
hope to get a reality check from those with the know-how at the same time.

1) Apparently, 'libgcc.a' is built up of object files that are
   built at compile time by extracting, function by function, the
   single precision arithmetic (IEEE 754) in 'libgcc1.c' -
   which in turn is overridden by target specific assembly
   code if present. Also, double precision arithmetic (IEEE ???)
   is extracted from 'libgcc2.c'. Obviously(?) double precision
   is implemented through multiple single precision operations.
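To make the idea of a library arithmetic routine concrete, here is a minimal sketch of a shift-and-subtract unsigned 32-bit division, the kind of helper a target without a hardware divide instruction could fall back on. The name 'soft_udiv32' is hypothetical and this is not the libgcc source; it only illustrates why such a routine can be written in portable C and later overridden by target assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the libgcc implementation:
   restoring (shift-and-subtract) division of two unsigned
   32-bit values. The caller must make sure den is not zero. */
uint32_t soft_udiv32(uint32_t num, uint32_t den)
{
    uint32_t quot = 0;
    uint32_t rem = 0;
    int bit;

    assert(den != 0);

    for (bit = 31; bit >= 0; bit--) {
        /* Bring down the next bit of the numerator. */
        rem = (rem << 1) | ((num >> bit) & 1u);
        if (rem >= den) {
            /* The divisor fits: subtract it and set this quotient bit. */
            rem -= den;
            quot |= (uint32_t)1 << bit;
        }
    }
    return quot;
}
```

The loop only uses shifts, compares and subtraction, so it needs nothing from libc beyond the optional assert.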

   In addition, there is some C++ exception handling in 'libgcc2.c'
   which relies on the libc functions 'setjmp/longjmp'. There is an option
   to use 'internal' 'setjmp/longjmp' if no libc is present. How does
   it do this when 'setjmp/longjmp' are as target machine dependent as
   they are? There is also some code here to handle the C calling convention
   on "certain" CPUs with "sliding windows" of register banks, most
   notably SPARC and i860. True?

   On the same note - the Stallman docs mention that "certain" systems
   have APIs and libraries that do not conform to the C calling convention
   and that GCC needs isolation code for these systems. I have not been
   able to locate the code for this. Where is it?

   Also - C++ needs to run constructors and destructors for global and/or
   static objects before and after the 'main()' function, respectively.
   There are four ways to guarantee this. Regardless of method, cc1plus
   will assemble two lists, '__CTOR_LIST__' and '__DTOR_LIST__', of
   function pointers to all global/static constructors/destructors.
   Depending on whether the object format supports freely named sections 
   - or more precisely if the *system* description in e.g. './gcc/config
   respectively - these two lists go with no ceremony into the sections

   If .ctors/.dtors sections are not available, then the lists are
   delivered to the GNU linker through specially adapted debug entries.
   Even if .ctors/.dtors sections are available, one must still arrange
   for the lists to be parsed and each constructor/destructor run in turn.
   How to proceed then depends on the capabilities of the program loader
   of the operating system in question. If the OS loader is new
   enough to know about C++ - as with svr4.h - then it also knows to look
   for separate .init/.fini sections with code that parses the lists
   contained in .ctors/.dtors. The code in the .init section will go into
   the startup file crtbegin.o, while the .fini section goes into the
   crtend.o file. Both versions of the parser code are defined in
   'crtstuff.c'. The two crt files will be linked in front of and after
   the user code, respectively. It is up to the OS loader to run them at
   the appropriate time. Now, ELF is in use on embedded targets too - how
   then to ensure that the code contained in .init/.fini is run at the
   appropriate time?
   Last case - freely named .ctors/.dtors sections, but no smart OS
   loader. The 'svr3.h', or COFF, case. Then GCC arranges for some
   equivalent of the .init code in 'crtstuff.c' to be run just
   inside 'main()' through a call to the function '__main'.
   '__main' is not defined in 'crtstuff.c' but rather in 'libgcc2.c'.
   Then '__main' takes care of parsing the '__CTOR_LIST__' in '.ctors'.
   The '__DTOR_LIST__' in '.dtors' is parsed by using existing libc
   functionality. Every destructor in the list is set up to run on
   'exit()' by registering it with 'atexit()'. Hmmm - libc again.
   GCC can be compiled w/o libc - where does 'exit()/atexit()' come
   from then?
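The list walking described above can be sketched in plain C. This is a minimal model, not the code from 'crtstuff.c' or 'libgcc2.c': the NULL-terminated layout, the function names 'run_ctor_list' and 'register_dtor_list', and the demo tables are all assumptions made for illustration. The real list layout is target specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*func_ptr)(void);

/* Counters so the effect of the demo entries can be observed. */
int demo_ctor_calls = 0;
int demo_dtor_calls = 0;

static void demo_ctor_a(void) { demo_ctor_calls++; }
static void demo_ctor_b(void) { demo_ctor_calls++; }
static void demo_dtor_a(void) { demo_dtor_calls++; }
static void demo_dtor_b(void) { demo_dtor_calls++; }

/* Assumed layout for this sketch only: a NULL-terminated array. */
const func_ptr demo_ctor_list[] = { demo_ctor_a, demo_ctor_b, NULL };
const func_ptr demo_dtor_list[] = { demo_dtor_a, demo_dtor_b, NULL };

/* Call every constructor in list order. Returns how many were run. */
int run_ctor_list(const func_ptr *list)
{
    int n = 0;

    assert(list != NULL);
    for (; *list != NULL; list++, n++)
        (*list)();
    return n;
}

/* Register every destructor with atexit() so that exit() runs it later.
   Returns how many registrations succeeded. */
int register_dtor_list(const func_ptr *list)
{
    int n = 0;

    assert(list != NULL);
    for (; *list != NULL; list++, n++)
        if (atexit(*list) != 0)
            break;              /* registration table is full */
    return n;
}
```

Note that atexit() handlers run in the reverse order of registration, so the registration order decides the destructor order. On a target without libc, something else would have to provide that bookkeeping.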

   The last (?) functionality contained within 'libgcc2.c' is support for
   the GCC extension of local or automatic *functions*. These are functions
   with scope local to another containing function. The automatic function
   body is obviously not placed within the stack frame of the containing
   function as automatic variables are, but the function pointer to the
   automatic function is. The automatic function is called through a
   trampoline that bounces into and back out of the stack frame. This is
   in part done by 'libgcc2.c' code. The rest is in 'function.c'.
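What a trampoline has to achieve can be modelled in portable C. The sketch below is a conceptual model only, under the assumption that the local function reaches the variables of its containing function through a pointer to the enclosing frame (often called the static chain). A real trampoline is a small piece of machine code built in the stack frame; here a plain struct stands in for it, and all names ('outer_frame', 'closure', 'outer_fn') are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the stack frame of the containing function. */
struct outer_frame {
    int base;
};

/* The "local" function, written as an ordinary function that receives
   the enclosing frame explicitly. */
static int add_base(struct outer_frame *chain, int x)
{
    return chain->base + x;
}

/* What a trampoline bundles together: the code address of the local
   function and the frame it must run against. */
struct closure {
    int (*code)(struct outer_frame *, int);
    struct outer_frame *chain;
};

/* Calling "through the trampoline": supply the frame, then jump. */
static int call_closure(const struct closure *c, int x)
{
    assert(c != NULL && c->code != NULL);
    return c->code(c->chain, x);
}

/* The containing function: the closure lives in its stack frame, while
   the body of add_base lives with the rest of the program code. */
int outer_fn(int base, int x)
{
    struct outer_frame frame = { base };
    struct closure c = { add_base, &frame };

    return call_closure(&c, x);
}
```

Calling outer_fn(10, 5) adds 5 to the 'base' value stored in the containing frame. The point of the model is that only the small closure needs to live on the stack, not the function body.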

2) There is no such thing as a 'libgcc1.a' regardless of what Mr. Stallman
   says. (?)  Or is this where the 'isolation code' from above would go?
   AFAIK all source code from 'libgcc1/2.c' ends up in 'libgcc.a'.

3) Cygnus configure - does it interface with 'make' only through
   environment/configuration variables? I mean, in the gcc setup file
   template 'egcs-1.1.2/gcc/' there are lots of sections like
   the following to pick the correct target description files for the
   given machine-vendor-system alias, overriding the usual gcc behaviour:
   (Does this file list every single known alias? If so, then a
    reference to this file is far superior to listing them all in the docs.
    Such a list will invariably lag behind the actual list herein. This
    goes for all aspects of the docs - help the user help himself.)

                tm_file="m68k/m68k-coff.h dbx.h libgloss.h"

   Here 'extra_headers' is defined - how does this relate to fixproto?
   I mean, gcc comes with four of the most heavily target dependent libc
   header files, namely 'float.h', 'limits.h', 'stdarg.h' and 'stddef.h'.
   These are not front ends to any libc functions but rather define the
   target system characteristics. Hence, they do not come with libc but
   rather with gcc for the target machine. As far as I can determine,
   fixproto will not look for any specific header file(s) but rather take
   a look at _any_ header files it happens to be supplied with, like the
   POSIX header file 'unistd.h', to see if they are all ANSI C conforming.
   In addition, at the bottom of the same file, there is a long list of
   such environment variables of which the following are only some:

   If this is all configure can set then that explains some curious
   limitations.  BTW, it appears that configure in the root directory
   knows about every other GNU program there is. That would explain why
   simply linking the Newlib/Libgloss directories into the gcc root
   directory also builds Newlib/Libgloss. What does the configuration
   option '--with-newlib' actually do? How does it relate to the libc
   dependencies of 'libgcc2.c'? And how does the 'inhibit-libc' option
   in 'libgcc2.c' work - rather, why does it not work according to the
   surrounding comments?
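The point made in question 3 about 'float.h', 'limits.h' and 'stddef.h' describing the target rather than wrapping libc functions can be seen in a small sketch. The function name 'print_target_traits' is hypothetical. Only the printing itself needs libc ('stdio.h'); the constants are plain compile-time values.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Print a few target characteristics taken from the compiler-supplied
   headers. Returns the number of lines successfully written. */
int print_target_traits(FILE *out)
{
    int lines = 0;

    assert(out != NULL);

    lines += (fprintf(out, "CHAR_BIT       = %d\n", CHAR_BIT) > 0);
    lines += (fprintf(out, "INT_MAX        = %d\n", INT_MAX) > 0);
    lines += (fprintf(out, "sizeof(size_t) = %lu\n",
                      (unsigned long)sizeof(size_t)) > 0);
    lines += (fprintf(out, "FLT_MANT_DIG   = %d\n", FLT_MANT_DIG) > 0);
    lines += (fprintf(out, "DBL_MANT_DIG   = %d\n", DBL_MANT_DIG) > 0);
    return lines;
}
```

Building this once with the native compiler and once with a cross compiler should show different values where the two targets differ, even if both use the same libc sources.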

4) Then there is the '--program-prefix=FOO' option. It works as intended
   for Binutils. There it sets the program prefix of every prefixed file.
   For GCC it is correctly interpreted as the program prefix used for the
   Binutils. Unfortunately, GCC "forgets" to change the gcc prefixed files
   accordingly - they remain prefixed with 'm68k-coff-....' regardless.
   I have not tried the GCC only '--program-transform-name=P' option.

5) Speaking of Binutils - it has some BFD libraries that are neither
   prefixed nor placed in target dependent directories. Am I to assume
   that every new target configuration will add to these libraries?

6) Also - why does gcc differ in which target dependent
   directories, './lib/gcc-lib/[target[/version]]program', it will
   search for libraries/startup files, header files and executables?
   A simple 'gcc --print-search-dirs' reveals this, making it hard to
   use the 'GCC_EXEC_PREFIX' environment variable.

7) I figure it is a bit dumb to supply the '--enable-multilib' option
   as long as configure can not change the default CPU family member
   which, for m68k, is the first one listed in the target definition
   macro 'TARGET_SWITCHES' in the file './gcc/config/m68k/m68k.h' - the
   68020, that is. What I mean is, what use is this option if I still
   have to patch the target description macros? Then I could just as well
   patch the 'MULTILIB_OPTIONS/DIRNAMES/MATCHES' variables in the target
   makefile stub specified above, './gcc/config/m68k/t-m68kbare'.

8) Machine descriptions.
   The file 'rtl.def' lists all 'expression codes'. I have tried dumping
   the first stage of RTL code by 'gcc -dr' for some files and it appears
   that all RTXes dumped are in fact defined in 'rtl.def'.

   BUT then again we have the target description patterns. Mr. Stallman
   lists in chapter 15.7 of the GCC doc the names of all machine
   instructions, 'insn', that the target description files should
   define. Now here we have the same problem as with the
   machine-vendor-system aliases - how do I know this is still the
   complete list? The answer is I don't. But I can not find these names
   listed anywhere but in the individual target description files. And
   not all patterns are defined for all targets. Bad starting point for
   a port of GCC.

   Anyhow, what I wonder is how do these patterns work? If I understand
   the documentation correctly, then the gcc syntax tree translators do
   know about these patterns and will emit RTL code for them. What RTL
   code? The (define_insn) "insn_name" itself, like 'udivsi3' which,
   incidentally, is also one of the single precision arithmetic routines
   in 'libgcc1.c', or the 'RTL template' the define_insn contains? Or maybe
   it is just the other way around - gcc emits generic RTL code based on
   the RTXes defined in 'rtl.def' and then an attempt is made to match this
   code against the RTL templates and, if a match is found, the generic
   code is replaced with the "insn_name". That would be the logical solution.
   In the last stage, when substituting assembly code, the assembly
   code contained in the 'define_insn' is cut and pasted. True? Then how does
   GCC deal with remaining RTL code that has no matching machine pattern?
   How to emit assembly code for this? I am seriously confused as to the
   place of machine descriptions in this.
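One way to picture the relationship between named patterns and the library routines is a lookup with a fallback: if the target description provides a pattern with the wanted name, its assembly template is used, otherwise a call to a library routine of the same name is emitted. The sketch below is a toy model of that idea only. The table 'toy_target', the assembly templates, the "call" spelling and the function names are all made up for illustration and do not reflect GCC's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A named pattern with its (hypothetical) assembly output template. */
struct insn_pattern {
    const char *name;
    const char *asm_template;
};

/* Hypothetical target: has add and multiply, but no divide pattern. */
static const struct insn_pattern toy_target[] = {
    { "addsi3", "add %1,%0" },
    { "mulsi3", "mul %1,%0" },
};

/* Look a pattern up by name; NULL means the target does not define it. */
const struct insn_pattern *find_pattern(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof toy_target / sizeof toy_target[0]; i++)
        if (strcmp(toy_target[i].name, name) == 0)
            return &toy_target[i];
    return NULL;
}

/* Write the chosen assembly text into buf. Returns 1 if a target
   pattern was used and 0 if the library call fallback was taken. */
int emit_operation(const char *name, char *buf, size_t len)
{
    const struct insn_pattern *p = find_pattern(name);

    assert(buf != NULL && len > 0);
    if (p != NULL) {
        snprintf(buf, len, "%s", p->asm_template);
        return 1;
    }
    snprintf(buf, len, "call __%s", name);   /* library routine instead */
    return 0;
}
```

In this model 'addsi3' produces the template text, while 'udivsi3' falls through to a call, which is where a library routine would come in.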

Now I am going to give away the fact that I am a HW engineer who
has not got a clue and has hardly touched a C compiler before.

9) How do I pick output sections to go into the ROM image?
   I.e. how do I stop the BSS section from coming along?
   Also, I suppose I do not need to relocate (AT) the Flash
   output section to 0x0000 to have the rom image file start
   at address zero rather than zero padded up to 0x700000?
   What criteria specify what address the ROM image file
   will start at?

  SRAM  (rwx) : ORIGIN = 0x00000000, LENGTH = 1M
  Flash	 (rx) : ORIGIN = 0x00700000, LENGTH = 1M
  DRAM        : ORIGIN = 0x00800000, LENGTH = 8M

/* Flash is initially located at 0x0000. Moved to 0x700000 by means
   of programmable address decoders in 'crt0.s'. All C code '.text'
   will run at 0x700000. Before the switch, the vector table at the
   beginning of Flash needs to be copied into the start of SRAM. */

SECTIONS /* OutSect@Addr += { ObjFile(InSect); } */
	.text 0x700400 : /* just above the vector table */ 
		{ _text = . ; *(.text); _etext = . ; } > Flash
	.data 0x000400 : /* just above the vector table copy */
		AT ( ADDR(.text) + SIZEOF ( .text ) ) /* SRAM */
		{ _data = . ; 	*(.data); _edata = . ;  } > Flash
	.bss  0x010000 :
		{ _bstart = . ; *(.bss) *(COMMON); _bend = . ;}
		> SRAM
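The AT() clause in the script above implies that startup code copies the initial values of '.data' from their load address in flash to their run address in SRAM, and clears '.bss'. A minimal sketch of that step follows, assuming the usual crt0 approach. The function name 'init_data_and_bss' is hypothetical; on the real target the arguments would be the addresses of the script's symbols, i.e. '_etext' (where AT() places the '.data' image), '_data', '_edata', '_bstart' and '_bend'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the initialised data from its load address (ROM) to its run
   address (RAM), then zero the .bss range.
     load_addr             : start of the .data image in ROM
     data_start, data_end  : run-time bounds of .data in RAM
     bss_start, bss_end    : run-time bounds of .bss in RAM        */
void init_data_and_bss(const unsigned char *load_addr,
                       unsigned char *data_start, unsigned char *data_end,
                       unsigned char *bss_start, unsigned char *bss_end)
{
    assert(data_end >= data_start);
    assert(bss_end >= bss_start);

    memcpy(data_start, load_addr, (size_t)(data_end - data_start));
    memset(bss_start, 0, (size_t)(bss_end - bss_start));
}
```

On a bare target without libc the memcpy/memset calls would be replaced by two simple loops, typically written directly in 'crt0.s'.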

Now, how does the 'ADDR(.text)' respond to the fact that '.text' is
allocated to 0x700000? Will it count the image file location or the
address in the CPU address space? When I link a simple 'main()' with
output format 'coff-m68k' and the above linker script I get an image
file of 16 KByte no matter what.  Where am I mistaken?  I simply want
a file I can burn into my flash.
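For the question of where a flat ROM image starts, here is a small sketch of the bookkeeping involved. It assumes the image is a raw binary produced the way GNU objcopy documents its binary output: a memory dump that begins at the load address of the lowest section copied, with sections that have no contents (like '.bss') left out. The struct, the function name 'image_extent' and the section sizes in 'demo_sections' are hypothetical; only the addresses are taken from the script above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct out_section {
    const char *name;
    uint32_t lma;        /* load address: where the bytes sit in ROM */
    uint32_t size;       /* size in bytes */
    int has_contents;    /* 0 for .bss-like sections */
};

/* Sizes are invented for the example. */
const struct out_section demo_sections[3] = {
    { ".text", 0x700400u, 0x1000u, 1 },
    { ".data", 0x701400u, 0x0100u, 1 },   /* LMA right after .text */
    { ".bss",  0x010000u, 0x0200u, 0 },   /* occupies RAM, no bytes */
};

/* Compute the first address and total length of the flat image.
   Returns 1 on success, 0 if no section has any contents. */
int image_extent(const struct out_section *s, size_t n,
                 uint32_t *base, uint32_t *size)
{
    uint32_t lo = UINT32_MAX;
    uint32_t hi = 0;
    int found = 0;
    size_t i;

    assert(s != NULL && base != NULL && size != NULL);

    for (i = 0; i < n; i++) {
        if (!s[i].has_contents || s[i].size == 0)
            continue;                     /* nothing to write */
        if (s[i].lma < lo)
            lo = s[i].lma;
        if (s[i].lma + s[i].size > hi)
            hi = s[i].lma + s[i].size;
        found = 1;
    }
    if (!found)
        return 0;
    *base = lo;
    *size = hi - lo;
    return 1;
}
```

With the demo table the image starts at 0x700400 and is 0x1100 bytes long. If the '.bss' entry were treated as having contents, the image would instead have to span everything from 0x010000 upward, which is one way a file can end up padded.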

10) Where are the possible output formats (srec, iHEX, m68k-coff, etc.) listed?

11) Suppose I used C++ and the 'svr4.h' system definition.
    How do I construct an embedded program loader that runs the
    .init/.fini sections?


I am also documenting object formats and I hope to write a
'Case study' on how to program a barebone m68k system - i.e.
a system with _no_ other code on it. There are no examples
of this special case in the Newlib Libgloss section, neither
crt0.s, linker scripts nor vector table manipulation.
I intend to write, and am already writing, an introductory text,
*not* documentation. A beginner like me can not comprehend
documentation if it is not presented chronologically and without
being bombarded with '#ifdef machine' statements. One single
example system all the way, then I can port it to my own
system at will. The example will be my own TMP68303 based
microcontroller card with DMA, DRAM, DAC/ADC, SCSI and
EPLD(FPGA) that needs configuration downloading.

The example code will be a message passing (event queue)
ROM monitor as this is where Newlib LibGloss takes off.
This, needless to say, will take time.


  Never ever underestimate the power of human stupidity.
  -Robert Anson Heinlein

