Gas bug: Broken code generated for inter-section arithmetic.

Jeff Prothero jprother@altera.com
Thu Apr 23 17:50:00 GMT 2015


Summary
=======

Assembly code like (Nios2 example):

    .section  version_header,"a"
    anchor:
              .4byte  helloworld-anchor
    .section  version_header_strings,"a"
              .asciz  "aaaaaaaa"
    helloworld:

currently silently produces bad code.



Details
=======

Section 6.2.4 ("Infix Operators") of the GNU assembler manual forbids
inter-section arithmetic:

        + Addition. If either argument is absolute, the result has the section of the
          other argument. You may not add together arguments from different sections.
                          ==========================================================

        - Subtraction. If the right argument is absolute, the result has the section
          of the left argument. If both arguments are in the same section, the result
          is absolute. You may not subtract arguments from different sections.
                       ==========================================================


This restriction is enforced in resolve_symbol_value() in symbols.c by the clause

	  /* Equality and non-equality tests are permitted on anything.
	     Subtraction, and other comparison operators are permitted if
	     both operands are in the same section.  Otherwise, both
	     operands must be absolute.  We already handled the case of
	     addition or subtraction of a constant above.  This will
	     probably need to be changed for an object file format which
	     supports arbitrary expressions, such as IEEE-695.  */
	  if (!(seg_left == absolute_section
		&& seg_right == absolute_section)
	      && !(op == O_eq || op == O_ne)
	      && !((op == O_subtract
		    || op == O_lt || op == O_le || op == O_ge || op == O_gt)
		   && seg_left == seg_right
		   && (seg_left != undefined_section
		       || add_symbol == op_symbol)))
	    {
	      /* Don't emit messages unless we're finalizing the symbol value,
		 otherwise we may get the same message multiple times.  */
	      if (finalize_syms)
		report_op_error (symp, add_symbol, op, op_symbol);

This catches all forbidden inter-section arithmetic expressions which
are assigned to symbols.
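
To make the shape of that test easier to see, here is a minimal
standalone sketch of the same predicate.  The enum values and the
name op_is_permitted are stand-ins invented for illustration, not gas
definitions (the real code works on segT and operatorT), and the
add-or-subtract-a-constant cases that gas handles before reaching the
clause are not modelled.

```c
#include <stdbool.h>

/* Stand-ins for gas's segT values; illustrative only.  */
enum seg { SEG_ABSOLUTE, SEG_UNDEFINED, SEG_A, SEG_B };

/* Stand-ins for a few of gas's operatorT values.  */
enum op { OP_ADD, OP_SUBTRACT, OP_EQ, OP_NE, OP_LT, OP_LE, OP_GE, OP_GT };

/* Return true if the clause quoted above would NOT report an error.
   same_symbol models the add_symbol == op_symbol test.  */
bool
op_is_permitted (enum op op, enum seg left, enum seg right, bool same_symbol)
{
  /* Anything goes when both operands are absolute.  */
  if (left == SEG_ABSOLUTE && right == SEG_ABSOLUTE)
    return true;

  /* Equality and non-equality tests are permitted on anything.  */
  if (op == OP_EQ || op == OP_NE)
    return true;

  /* Subtraction and ordering comparisons need both operands in the
     same section; two undefined operands only pass if they are the
     very same symbol.  */
  if ((op == OP_SUBTRACT || op == OP_LT || op == OP_LE
       || op == OP_GE || op == OP_GT)
      && left == right
      && (left != SEG_UNDEFINED || same_symbol))
    return true;

  return false;
}
```

In this model op_is_permitted (OP_SUBTRACT, SEG_A, SEG_B, false) is
false, which corresponds to the helloworld-anchor case being reported
when the expression belongs to a symbol.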

Unfortunately, not all expressions are assigned to symbols.  Common
constructs (here quoting tc-nios2.c, but ARM and others are similar) like

    const pseudo_typeS md_pseudo_table[] = {
        ...
      {"dword", cons, 8},
      {"half", cons, 2},
      {"word", cons, 4},
      {"2byte", s_nios2_ucons, 2},
      {"4byte", s_nios2_ucons, 4},
      {"8byte", s_nios2_ucons, 8},
      {"16byte", s_nios2_ucons, 16},
      ...

    /* Explicitly unaligned cons.  */
    static void
    s_nios2_ucons (int nbytes)
    {
      int hold;
      hold = nios2_auto_align_on;
      nios2_auto_align_on = 0;
      cons (nbytes);
      nios2_auto_align_on = hold;
    }


invoke cons() directly or indirectly, so a forbidden inter-section
arithmetic expression can be processed by the assembler without ever
tripping the resolve_symbol_value() check, and bad code is silently
generated.

This is not good, and clearly not intended behavior.

The expression eventually gets processed by fixup_segment() in write.c
in the main

    for (; fixP; fixP = fixP->fx_next)

loop, whose author(s) thoughtfully included a

    #ifdef TC_VALIDATE_FIX
          TC_VALIDATE_FIX (fixP, this_segment, skip);
    #endif

hook, so it is actually possible to trap these problems and issue
fatal diagnostics for them by (in the nios2 case) adding

    #define TC_VALIDATE_FIX(FIXP,SEGMENT,SKIP)  if (!nios2_validate_fix (FIXP)) { goto SKIP; }
    extern int nios2_validate_fix(fixS *fixP);

to tc-nios2.h and then in tc-nios2.c adding

    /* Implement TC_VALIDATE_FIX.  */
    int
    nios2_validate_fix (fixS *fixP)
    {   symbolS* add_symbol  = fixP->fx_addsy;
        symbolS* sub_symbol  = fixP->fx_subsy;

        /* Only a difference of two symbols can span sections.  */
        if (add_symbol && sub_symbol)
          {
            segT add_section = S_GET_SEGMENT (add_symbol);
            segT sub_section = S_GET_SEGMENT (sub_symbol);

            /* Reject the fix if both symbols are non-absolute and live
               in different sections.  */
            if (add_section != absolute_section
            &&  sub_section != absolute_section
            &&  add_section != sub_section)
              {
                as_bad_where (fixP->fx_file, fixP->fx_line,
                              _("Inter-segment arithmetic not supported: `%s' {%s section} - `%s' {%s section}"),
                              S_GET_NAME (add_symbol),
                              segment_name (add_section),
                              S_GET_NAME (sub_symbol),
                              segment_name (sub_section));

                return 0;
              }
          }
        return 1;
    }
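
The decision logic of that hook can be exercised outside the
assembler with mock types.  The sketch below is only a model under
stated assumptions: struct mock_symbol, struct mock_fix and
fix_is_cross_section are hypothetical names, and sections are compared
by name string rather than by segT.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for symbolS: a name plus the name of the
   section the symbol lives in.  */
struct mock_symbol
{
  const char *name;
  const char *section;
};

/* Hypothetical stand-in for fixS: only the two symbol fields matter.  */
struct mock_fix
{
  const struct mock_symbol *addsy;   /* models fx_addsy */
  const struct mock_symbol *subsy;   /* models fx_subsy */
};

/* Return true if the fix subtracts symbols that live in two different
   non-absolute sections, i.e. the case the hook rejects.  The string
   "*ABS*" is used here as this model's label for the absolute
   section.  */
bool
fix_is_cross_section (const struct mock_fix *fix)
{
  /* A fix without both symbols is not a symbol difference.  */
  if (fix->addsy == NULL || fix->subsy == NULL)
    return false;

  /* An absolute operand on either side is always acceptable.  */
  if (strcmp (fix->addsy->section, "*ABS*") == 0
      || strcmp (fix->subsy->section, "*ABS*") == 0)
    return false;

  return strcmp (fix->addsy->section, fix->subsy->section) != 0;
}
```

A fix modelling helloworld-anchor, with the two symbols placed in
version_header_strings and version_header, makes the function return
true; a difference of two symbols in version_header returns false.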


This is however at best an interim fix, for two reasons:

First, this problem is really cross-platform and should be fixed in
platform-independent code.

Second, this solution does not work all that well, and I think it is
not in the spirit of the codebase: for the example given above it
prints

    Error: Inter-segment arithmetic not supported: `version_header_strings' {version_header_strings section} - `anchor' {version_header section}

rather than

    Error: Inter-segment arithmetic not supported: `helloworld' {version_header_strings section} - `anchor' {version_header section}

as expected, because adjust_reloc_syms() in write.c has done

    fixp->fx_addsy = section_symbol (S_GET_SEGMENT (sym));

replacing the original 'helloworld' symbol with a related one
containing a different name field.
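
A toy model of that effect, for anyone who wants to see it in
isolation.  Everything here (struct toy_symbol, toy_section_symbol,
toy_describe) is made up for illustration and only assumes what the
output above shows, namely that the replacement symbol carries the
section's name.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy symbol: a name and the name of its section.  */
struct toy_symbol
{
  const char *name;
  const char *section;
};

/* Models the replacement described above: return a symbol that stands
   for the whole section, so its name is the section's name.  */
struct toy_symbol
toy_section_symbol (const struct toy_symbol *sym)
{
  struct toy_symbol sec;

  sec.name = sym->section;
  sec.section = sym->section;
  return sec;
}

/* Format one operand the way the diagnostic above does.  */
int
toy_describe (char *buf, size_t len, const struct toy_symbol *sym)
{
  return snprintf (buf, len, "`%s' {%s section}", sym->name, sym->section);
}
```

Describing the helloworld symbol before the replacement yields the
text I expected; describing the result of toy_section_symbol() yields
the version_header_strings text that actually gets printed.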

My impression is that the intent of the codebase is that once
write_object_file() in write.c sets finalize_syms to 1, all
diagnostics have been issued and the code is free to clobber
values that diagnostics would need.

Also, as a matter of codebase cleanliness, it would be nice
to check for inter-segment arithmetic just once (presumably
in resolve_symbol_value()) rather than in multiple places.

Given the above two considerations, I'm wondering whether pseudo-ops
like .4byte (or the cons() function they invoke?) should somehow be
constructing anonymous symbols instead of directly entering
expressions into the program state without an associated symbol.

If all expressions were associated with a (possibly anonymous)
symbol, then resolve_symbol_value() would automatically catch
all inter-segment arithmetic expressions and we'd have a nice
clean centralized solution (on this front at least :-).
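
As a rough sketch of the idea, not of gas itself: the toy below has
one central check that only runs when a symbol is resolved, a direct
path that bypasses it, and a wrapped path that goes through it.  All
names (toy_expr, toy_expr_symbol, toy_resolve_symbol, toy_emit_direct,
toy_emit_via_symbol) are hypothetical, and sections are plain
integers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression: left - right, each operand tagged with a section id.  */
struct toy_expr
{
  int left_section;
  int right_section;
};

/* Toy symbol whose value is an expression; name is NULL for an
   anonymous symbol.  */
struct toy_expr_symbol
{
  const char *name;
  struct toy_expr value;
};

/* Number of diagnostics issued so far.  */
int toy_error_count = 0;

/* The single central check; it plays the role resolve_symbol_value()
   has in the proposal.  Counts an error and returns false for a
   cross-section subtraction.  */
bool
toy_resolve_symbol (const struct toy_expr_symbol *sym)
{
  if (sym->value.left_section != sym->value.right_section)
    {
      toy_error_count++;
      return false;
    }
  return true;
}

/* Behaviour as described above: the data directive takes the
   expression directly, so the central check never sees it.  */
bool
toy_emit_direct (const struct toy_expr *expr)
{
  (void) expr;
  return true;
}

/* Proposed behaviour: wrap the expression in an anonymous symbol and
   let the central check decide.  */
bool
toy_emit_via_symbol (const struct toy_expr *expr)
{
  struct toy_expr_symbol anon;

  anon.name = NULL;
  anon.value = *expr;
  return toy_resolve_symbol (&anon);
}
```

With a cross-section expression, toy_emit_direct() accepts it
silently, while toy_emit_via_symbol() rejects it and bumps
toy_error_count, with no second copy of the check anywhere.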

Thoughts?
-Jeff


