[RFC] Allow linker scripts to specify multiple output regions for an output section?

Wed Feb 22 15:28:00 GMT 2017

[Sending on behalf of Tejas Belagod, please reply to both him (in Cc) and me]

Hi,

There has been some interest in the past in syntactic support for mapping an 
output section to multiple memory regions in the GNU LD scripting language 
(e.g. https://sourceware.org/bugzilla/show_bug.cgi?id=14299). I would like to 
propose a scheme here and welcome any feedback.

The section command in the LD Script language is structured thus:

section [address] [(type)] :
	[AT(lma)]
	[ALIGN(section_align)]
	[SUBALIGN(subsection_align)]
	[constraint]
	{
	  output-section-command
	  output-section-command
	  ...
	} [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp]

As I understand it, this means: place the output section at ‘address’ with the 
attributes specified above (type, alignment etc.). If an LMA is specified, the 
image (startup code etc.) most likely handles the copying from the load address 
to the output section's VMA. Listing multiple segments (:phdr ...) means the 
output section can be part of more than one segment, and ‘fillexp’ fills any 
otherwise unspecified gaps within the output section with the fill value.

Currently there is no way to specify an output section that spans multiple 
memory regions. For example, if there are two RAM regions RAML and RAMU and the 
user wants an output section to fill RAML first and then, once RAML is full 
(i.e. once the remaining space in RAML cannot accommodate a complete input 
section), continue in RAMU, the user has to split the section into multiple 
output sections. If we extend this syntax to allow multiple output regions, the 
linker can map the output section to multiple regions: it fills the first 
region with input sections in the order given by the ‘output-section-command’s, 
and when that region is full (meaning the remaining gap in it cannot accommodate 
one complete input section), it continues in the next region. For example:

MEMORY
{
   RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
   RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
   RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000
}

SECTIONS
{
   .text 0x1000 : { *(.text) _etext = . ; }
   .mdata  :
   AT ( ADDR (.text) + SIZEOF (.text) )
   { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
}

The statement:

   .mdata :
    AT ( ADDR (.text) + SIZEOF (.text) )
    { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ

will have roughly the following meaning:

  for_each_output_section {
    curr_mem_region = get_next_mem_region ();
    location_counter = get_vma_mem_region (curr_mem_region);

    while (input sections remain) {
      current_input_section = get_next_input_section ();

      if (location_counter > end_vma_of_mem_region_in_list)
        break;

      /* Space left in the current region; sizeof (region) is its LENGTH.  */
      mem_avail_in_curr_region = get_vma_mem_region (curr_mem_region)
                                 + sizeof (curr_mem_region) - location_counter;

      /* The whole input section must fit, otherwise move to the next region.  */
      if (sizeof (current_input_section) > mem_avail_in_curr_region)
        {
          curr_mem_region = get_next_mem_region ();
          location_counter = get_vma_mem_region (curr_mem_region);
        }

      process_section (current_input_section, location_counter);
      location_counter += sizeof (current_input_section);
    }
  }
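To check these semantics against the illustration that follows, here is a 
minimal, self-contained C sketch of the same placement loop. The types and 
place_sections () are hypothetical, not ld internals, and the sketch ignores 
input-section alignment. Unlike the pseudocode, it keeps advancing while the 
next region is also too small, and it reports a section that fits in no listed 
region instead of silently dropping it.

```c
#include <stddef.h>
#include <stdint.h>

/* A MEMORY region, as given by ORIGIN and LENGTH (hypothetical type).  */
struct mem_region {
  const char *name;
  uint64_t origin;
  uint64_t length;
};

/* An input section; vma and region are filled in by place_sections ().  */
struct input_section {
  const char *name;
  uint64_t size;
  uint64_t vma;
  int region;          /* index into the region list, -1 if unplaced */
};

/* Place input sections, in order, into the listed regions.  A section that
   does not fit in the space left in the current region starts at the origin
   of the next region with enough room; the leftover space becomes a gap.
   Returns the number of sections placed; stops at the first section that
   fits in no remaining region.  */
static size_t place_sections (const struct mem_region *regions,
                              size_t n_regions,
                              struct input_section *secs, size_t n_secs)
{
  size_t r = 0;
  uint64_t dot = n_regions ? regions[0].origin : 0;   /* location counter */

  for (size_t i = 0; i < n_secs; i++)
    secs[i].region = -1;

  for (size_t i = 0; i < n_secs; i++)
    {
      /* dot always lies inside [origin, origin + length], so no underflow.  */
      while (r < n_regions
             && secs[i].size > regions[r].origin + regions[r].length - dot)
        {
          r++;
          if (r < n_regions)
            dot = regions[r].origin;
        }
      if (r == n_regions)
        return i;      /* out of regions: sections i..n_secs-1 stay unplaced */

      secs[i].vma = dot;
      secs[i].region = (int) r;
      dot += secs[i].size;
    }
  return n_secs;
}
```

With the RAML/RAMU/RAMZ definitions and the four .data input sections from the 
illustration, this places .data at 0x1FFF0000 and .data.a, .data.b and .data.c 
at 0x20000000, 0x200000F0 and 0x200030F0, leaving RAMZ unused.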

Illustration:

Consider an example where we have the following input .data sections:

.data   : size 0x0000FFF0
.data.a : size 0x000000F0
.data.b : size 0x00003000
.data.c : size 0x00000200

With the above scheme, these will be mapped to RAML, RAMU and RAMZ as follows 
(end addresses are exclusive):

RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
       (0x1FFFFFF0 - 0x20000000): *** GAP *** (0x10 bytes, too small for .data.a)

RAMU : (0x20000000 - 0x200000F0): .data.a
       (0x200000F0 - 0x200030F0): .data.b
       (0x200030F0 - 0x200032F0): .data.c

RAMZ : unused

The proposal does not change the meaning of the other attributes, with one 
exception (LMA):

* Output section VMA: No change - this just specifies where the output section 
will start.

* type: No change - this applies to the output section as a whole; mapping it to 
multiple memory regions does not change it.

* LMA: The output section can still be loaded from one LMA and mapped to its 
VMA. The only change is that the loader will need to copy the output section to 
its VMA following the same split as the multiple-region placement algorithm 
above. Can a loader do that? Can ad-hoc loaders do this? Or do all loaders 
assume that an output section occupies a contiguous VMA range?
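To make the LMA question concrete: startup code typically copies an initialized 
data section with a single copy from its LMA to its VMA. With a split section it 
would have to copy piecewise. The sketch below is hypothetical throughout: it 
assumes the linker emits a table of chunks (struct vma_chunk) describing the 
split, and that the load image is packed, i.e. the gaps are not stored at the 
LMA. Neither the table nor copy_split_section () exists in GNU ld today.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one piece of a split output section.  */
struct vma_chunk {
  size_t lma_offset;   /* offset of this piece in the packed load image */
  uint8_t *vma;        /* run-time destination in its memory region */
  size_t size;         /* number of bytes in this piece */
};

/* Startup-style copy: one memcpy per chunk instead of one for the whole
   section, so each piece lands in its own memory region.  */
static void copy_split_section (const uint8_t *lma_image,
                                const struct vma_chunk *chunks, size_t n)
{
  for (size_t i = 0; i < n; i++)
    memcpy (chunks[i].vma, lma_image + chunks[i].lma_offset, chunks[i].size);
}
```

In the illustration, such a table would have one entry for .data in RAML and 
one covering .data.a through .data.c in RAMU; a loader that only knows a single 
(LMA, VMA, size) triple per section could not express this.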

* phdr: No change - Multiple values can still be specified here. One can have an 
output section map to multiple segments irrespective of their output memory 
region mapping.

* Fillexp: No change. We might possibly want to introduce a fillexp for the gaps 
left behind when filling multiple output memory regions.
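Such a gap fill would presumably be written into the unused tail of each region 
that a section spilled out of (0x10 bytes at the end of RAML in the 
illustration). A minimal sketch, assuming a hypothetical 32-bit pattern repeated 
most significant byte first; fill_gap () is illustrative only and does not 
model the exact rules of ld fill expressions.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill gap_len bytes with a repeating 32-bit pattern, high byte first
   (hypothetical helper; byte order is an assumption of this sketch).  */
static void fill_gap (uint8_t *gap, size_t gap_len, uint32_t pattern)
{
  for (size_t i = 0; i < gap_len; i++)
    gap[i] = (uint8_t) (pattern >> (8 * (3 - i % 4)));
}
```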

Caveats:

A comma-separated list of regions will not guarantee contiguous placement of 
input sections; the only way to get contiguous placement will be to assign the 
output section to a single monolithic memory region.

For orthogonality and consistency, we would want to apply the multiple-region 
feature to overlays too. The semantics will not differ from the algorithm 
described above. The only caveat is that the overlay manager/loader will need to 
swap sections that run from the VMA in and out consistently with that mapping 
algorithm. Do we want this for overlays too?