This is the mail archive of the binutils@sourceware.org mailing list for the binutils project.


Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?

From: Tejas Belagod <tejas dot belagod at foss dot arm dot com>
To: dvalin at internode dot on dot net
Cc: Thomas Preudhomme <thomas dot preudhomme at foss dot arm dot com>, binutils at sourceware dot org
Date: Tue, 28 Feb 2017 12:11:12 +0000
Subject: Re: [RFC] Allow linker scripts to specify multiple output regions for an output section?
Authentication-results: sourceware.org; auth=none
References: <1652b6ce-5b0d-45d4-44bf-55dc36be1e2a@foss.arm.com> <20170228055141.GA4400@ratatosk>

Hi Erik,

Thanks for your comments. My comments inline below.

On 28/02/17 05:51, Erik Christiansen wrote:

On 22.02.17 15:28, Thomas Preudhomme wrote:

There has been some interest in the past in having syntactic support for
specifying mapping of an output section to multiple memory regions in the
GNU LD scripting language (eg.
https://sourceware.org/bugzilla/show_bug.cgi?id=14299). I would like to
propose a scheme here and welcome any feedback.


TL;DR: Detailed response begins after 6 paragraphs.

OK, in the absence of prior discussion, I'll just think aloud as I
correlate the proposal with my experience in three decades developing
embedded systems. Unfortunately, the one time an MMU was involved, that
was done by the time I became involved, but memory holes are all black.

The closest scenario I recall is where there were disparate physical
memories, both on and off chip. I simply added a MEMORY region for each
such block, e.g. Flash, 16bit SRAM, 8bit SRAM, a couple of small ones
for specific memory mapped system chips with bunches of config
registers, and maybe an FPGA in the mix. Add comments for device names
and the waitstate generator values, and the script serves as central
documentation too.

With that one-to-one region mapping, there was never any conflict over
where stuff should be located, and none were interchangeable. It is as
described by "some on-chip memory and some off-chip memory, but at
non-contiguous addresses" in the above link. And where we had both 8 and
16 bit SRAMS, it was most definitely consistent with "a region of
on-chip SRAM which performs better for code, and the remainder performs
better for data", except that using the wrong one was fatal rather than
merely inferior.

One issue I've encountered is detecting region overflow when multiple
output sections contribute to its content, but existing syntax supports
that, e.g.:

MEMORY
{
   flash   (rx)  : ORIGIN = 0, LENGTH = 32K
   ram    (rw!x) : ORIGIN = 0x800060, LENGTH = 2K
   eeprom (rw!x) : ORIGIN = 0x810000, LENGTH = 1K
}

. = ASSERT (_etext + SIZEOF (.data) <= LENGTH(flash), "Error: .text + .data collectively overflow the flash memory.") ;

But the need to flow across memory holes never eventuated in practice,
as a modest chunk of on-chip RAM could always be used for e.g. sdata,
leaving no need for flowing. All other regions were always incompatible,
making flowing impossible.

...

If LMA is specified, the image(startup code etc.) most likely handles
the copying from load address to output section VMA.


Yes, it does. And in the generic init code I've encountered, it has just
been a single copy loop for e.g. bss, performing a contiguous block copy.
(And when I've written it, that was true too.)

Multiple segment spec means the output section can be part of more
than one segment and ‘fillexp’ simply fills the output section loaded
with the fill value.


Trans-hole flowing would also require a runtime copy loop for each
non-contiguous block, or a table-driven multi-block copier, with the
run-time table somehow initialised from the linker script. (I can
imagine using variables defined in the linker script, and the .RPT
assembler directive - maybe.)
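
A minimal model of the table-driven multi-block copier mentioned above,
written in Python purely for illustration. Real startup code would be C or
assembler, and the table would have to come from linker-script symbols; the
(lma, vma, size) entry layout and the name scatter_copy are assumptions made
for this sketch, not something ld provides.

```python
def scatter_copy(memory, table):
    """Copy each (lma, vma, size) block from its load address to its run address.

    `memory` is a flat bytearray standing in for the address space.
    One table entry is needed per non-contiguous destination block.
    """
    for lma, vma, size in table:
        memory[vma:vma + size] = memory[lma:lma + size]


# Hypothetical layout: 12 bytes stored contiguously at LMA 0, delivered to
# two VMA blocks at 32..40 and 48..52, with a hole at 40..48.
MEMORY = bytearray(64)
MEMORY[0:12] = bytes(range(1, 13))
COPY_TABLE = [(0, 32, 8), (8, 48, 4)]

scatter_copy(MEMORY, COPY_TABLE)
```

Because every entry carries its own lma, the same table shape would also
cover a load image that is itself gapped, at the cost of one entry per
contiguous run.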

Now, this does not have a method to specify output section spanning multiple
memory regions. For example, if there are 2 RAM regions RAML and RAMU and
the user wants an output section to first fill RAML and then when RAML is
full, i.e. when the remaining space in RAML cannot accommodate a full input
section, start filling RAMU, the user has to split the sections into
multiple output sections. If we extend this syntax to specify multiple
output regions, we can make the linker map the output section to multiple
regions by filling the output region with input sections in the order
specified in the ‘output-section-command’ and when it is full (meaning when
the remaining gap in a region cannot accommodate one full input section), it
starts from the next output region.


This seems to be the alternate view of the problem of asking ld to flow
code around holes in a region, something it still can't do, IIRC. I
state it that way, because two non-contiguous memory regions over which
code (or data) may be interchangeably flowed, are identical to a single
region with a hole.

The proposal does seem to be a way to think about addressing that issue:

Eg.

MEMORY

{
   RAML (rwx) : ORIGIN = 0x1FFF0000, LENGTH = 0x00010000
   RAMU (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00040000
   RAMZ (rwx) : ORIGIN = 0x20040000, LENGTH = 0x00040000	
}

SECTIONS
{
   .text 0x1000 : { *(.text) _etext = . ; }
   .mdata  :
   AT ( ADDR (.text) + SIZEOF (.text) )
   { _data = . ; *(.data) *(.data.*); _edata = . ; } > RAML, RAMU, RAMZ
}


Without the need for new syntax or complex init code generators,
having gcc flow code across up to 5 pages of flash plus .lowtext and a
floating .hightext was compatible with the linker script and tests shown
here:

http://lists.nongnu.org/archive/html/avr-gcc-list/2012-12/msg00044.html

While details have faded from wet RAM, ISTR that holes were
manufacturable by not populating any of the 5 pages, which gcc sees as
named spaces. The gcc stuff was done in the AVR back end, IIRC, while an
implementation in ld would be generic.

Illustration:

Consider an example where we have the following input .data sections:

.data: size 0x0000FFF0
.data.a : size 0x000000F0
.data.b : size 0x00003000
.data.c : size 0x00000200

With the above scheme, this will be mapped in the following way to RAML,RAMU
and RAMZ:

RAML : (0x1FFF0000 - 0x1FFFFFF0): .data
        (0x1FFFFFF0 - 0x1FFFFFFF): *** GAP ***


Would GAP use ALIGNMENT, or introduce a new parameter?

I wouldn't want to overload ALIGNMENT here - what if it's needed simultaneously
with ALIGNMENT? Can we not leave this space unassigned? More often than not, if
one's filling a memory region automatically, would they really care what goes
into the gaps (if security is not a concern)? OTOH, if security is a concern, we
can explore introducing a new syntax with a default behavior of zero-filling the
gaps.

How would the target-specific relocations required to break code across
the hole be handled by ld? E.g. break a small AVR code loop (with 6-bit
relative addressing range) and you'll need a LJMP to bridge the hole,
and another with reversed loop conditionality to close the loop.
Multiply that task by all the possible relocs, and again by all the
possible CPU targets, and it's never-ending work for a software team for
life.


As I understand, compilers generate references to objects within a section with a

 .<input_section_name> + offset_within_section

Now when a section that spans 2 or more regions inserts holes/padding to prevent
an object from straddling 2 regions, the offsets within the section to other
objects will change. This means all the compiler-generated "section + offset"
references to objects that come after the padding will need to be fixed up. It's
really difficult to know which ones to fix up - the relocations are only on the
section label, not the object in the section. So, what I'm proposing here will
not split the input sections - input sections will move as a block.
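
To make the offset problem concrete, here is a small sketch in Python. The
object names, sizes and the helpers insert_padding and owner are all
hypothetical; the point is only that a reference recorded as "section + 16"
silently points at a different object once padding is inserted in front of it.

```python
def insert_padding(layout, at, pad):
    """Return a new layout with `pad` bytes inserted at section offset `at`."""
    return {name: (off + pad if off >= at else off, size)
            for name, (off, size) in layout.items()}


def owner(layout, offset):
    """Name of the object whose bytes cover `offset`, or None for padding."""
    for name, (off, size) in layout.items():
        if off <= offset < off + size:
            return name
    return None


# Hypothetical input section: name -> (offset within section, size).
LAYOUT = {"a": (0, 8), "b": (8, 8), "c": (16, 8)}

# The compiler records a reference to `c` as "section + 16".
REF_TO_C = LAYOUT["c"][0]

# Splitting the section by inserting 4 bytes of padding before `b`
# moves `b` and `c`, but the recorded offset stays at 16.
PADDED = insert_padding(LAYOUT, at=8, pad=4)
```

Before padding, owner(LAYOUT, REF_TO_C) is "c"; afterwards,
owner(PADDED, REF_TO_C) is "b", and nothing in the relocation says the
reference was meant for c.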

It seems more

RAMU : (0x20000000 - 0x200000F0): .data.a
        (0x200000F0 - 0x200030F0): .data.b
        (0x200030F0 - 0x200032F0): .data.c
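
The mapping in this illustration can be reproduced with a short simulation.
This is a sketch in Python of the fill-in-order rule as described in the
proposal, not of anything ld currently implements. It ignores alignment,
treats end addresses as exclusive, and the names Region and place are made
up for the example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    origin: int
    length: int


def place(sections, regions):
    """Place (name, size) input sections into regions, in the order given.

    Each input section moves as a whole block. When the next section does
    not fit in what is left of the current region, the remainder is recorded
    as a gap and placement continues at the next region's origin.
    """
    placements = []   # (region, section, start, end), end exclusive
    gaps = []         # (region, start, end), end exclusive
    idx = 0
    cursor = regions[0].origin
    for name, size in sections:
        while True:
            if idx >= len(regions):
                raise MemoryError(f"{name} does not fit in any remaining region")
            region = regions[idx]
            limit = region.origin + region.length
            if cursor + size <= limit:
                break
            if cursor < limit:
                gaps.append((region.name, cursor, limit))
            idx += 1
            if idx < len(regions):
                cursor = regions[idx].origin
        placements.append((region.name, name, cursor, cursor + size))
        cursor += size
    return placements, gaps


REGIONS = [
    Region("RAML", 0x1FFF0000, 0x00010000),
    Region("RAMU", 0x20000000, 0x00040000),
    Region("RAMZ", 0x20040000, 0x00040000),
]
SECTIONS = [
    (".data", 0x0000FFF0),
    (".data.a", 0x000000F0),
    (".data.b", 0x00003000),
    (".data.c", 0x00000200),
]

PLACEMENTS, GAPS = place(SECTIONS, REGIONS)
for region, section, start, end in PLACEMENTS:
    print(f"{region}: (0x{start:08X} - 0x{end:08X}): {section}")
```

With these inputs, .data fills RAML up to 0x1FFFFFF0, the last 0x10 bytes of
RAML become the gap, and the three remaining sections land back to back at
the start of RAMU; RAMZ is never reached.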


It will not affect the specification in terms of the other attributes, but
one (LMA):

* Output section VMA: No change - this just specifies where the output
section will start.

* type: No change - this is for the output section as a whole - output
memory regions will not change it.

* LMA: The output section can still be loaded from one LMA and mapped to
output VMA - the only change here is that the loader will need to map the
output sections to VMA with the same pattern as the multiple output region
matching code above. Can a loader do that? Can ad-hoc loaders do this? Or do
all loaders assume that regions are contiguous when output section is
mapped to VMAs?


Contiguous. Hole-flowing is what you're proposing to implement, both the
linker internal component (target-specific relocations), and the generic
(e.g. table driven) multi-block copy loop synthesiser for custom init
code generation. How that would integrate with existing init code in
various implementations, I have no idea.

If LMA can also be flowed around a hole, then runtime init code must be
able to handle not only non-contiguous delivery, but gapped pick-up. Has
the complexity of simultaneously handling different gaps in both been
considered?

I haven't thought about that. Can it be worked on the principle that when one
specifies an LMA and there is user-written init code to copy blocks, the init
code programmer knows the LMA gap layout and can handle the gaps accordingly? It
could be the case currently where code from different non-contiguous ROMs is
copied into a RAM during startup. This, IMHO, is always specific to the
particular embedded system being deployed.

...

For orthogonality and consistency, we would want to apply the multiple
region feature to overlays too. The semantics will not be different from the
algorithm mentioned above. The only caveat is that the overlay
manager/loader will need to handle the swapping in and out of sections that
run from the VMA consistently with the mapping algo described above. Do we
want this for overlays too?


Expanding the complexity of a single-problem solution to cover other
situations seems courageous, unless it naturally falls out of the
narrower solution. As overlays are used e.g. when RAM size or CPU
instruction addressing range is constrained, but there's ample flash,
then the likelihood of holes in either is limited, I suspect.


Makes sense.

Thanks,
Tejas.

Specifying discrete output sections with VMAs placed around the physical
holes is another way to dodge them. They can all be allocated to a
global encompassing memory region. Flowing is performed manually by
assigning suitable code chunks to preferred input sections. Automating
that, as intimated above, is non-trivial.

Caveat: Above thoughts have flowed without aid of caffeine, and are
         recollections from old battles.

Erik

