Bug 5900

Summary: ELF files with more than 65536 sections not handled correctly.
Product: binutils Reporter: Ian Lance Taylor <ian>
Component: gas Assignee: Alan Modra <amodra>
Status: RESOLVED FIXED    
Severity: normal CC: amodra, bug-binutils, hjl.tools
Priority: P2    
Version: 2.16   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed: 2008-03-11 21:21:20

Description Ian Lance Taylor 2008-03-08 01:32:21 UTC
Use this script to create a C/C++ file which will have more than 65536 sections:

for i in `seq 1 70000`; do
  echo "int var_$i __attribute__((section(\"section_$i\"))) = $i;"
done

Compile it.  Run readelf -S on the resulting object file.  Note section 0:

  [ 0]                   NULL             0000000000000000  00000000
       000000000001117a  0000000000000000          70262     0     0

When the number of sections reaches SHN_LORESERVE (0xff00), the ELF spec says
that the sh_size field of section header 0 should hold the number of sections.
When the section string table index is 0xff00 or more, it should be in the
sh_link field of that header.  Here we see that sh_size is 0x1117a == 70010.
sh_link is 70262.  This does not make sense as 70262 > 70010.
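
The extended-numbering rule described above can be sketched in C as follows.
This is only an illustration under stated assumptions: the two structs are
hypothetical, trimmed-down stand-ins for the ELF header and section header 0,
and the function names are invented here; none of this is BFD or readelf code.
It assumes the gABI convention that e_shnum is 0 and e_shstrndx is SHN_XINDEX
when the real values live in section header 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHN_XINDEX 0xffffu

/* Hypothetical, trimmed-down views of the only fields involved here. */
struct ehdr_fields  { uint16_t e_shnum; uint16_t e_shstrndx; };
struct shdr0_fields { uint64_t sh_size; uint32_t sh_link; };

/* Actual number of section headers: e_shnum, or sh_size of section
   header 0 when e_shnum is 0. */
uint64_t real_shnum(const struct ehdr_fields *eh, const struct shdr0_fields *s0)
{
    assert(eh != NULL && s0 != NULL);
    return eh->e_shnum != 0 ? eh->e_shnum : s0->sh_size;
}

/* Actual section string table index: e_shstrndx, or sh_link of section
   header 0 when e_shstrndx holds the escape value SHN_XINDEX. */
uint32_t real_shstrndx(const struct ehdr_fields *eh, const struct shdr0_fields *s0)
{
    assert(eh != NULL && s0 != NULL);
    return eh->e_shstrndx != SHN_XINDEX ? eh->e_shstrndx : s0->sh_link;
}

/* A plain section header index must be smaller than the section count. */
int shstrndx_in_range(const struct ehdr_fields *eh, const struct shdr0_fields *s0)
{
    return real_shstrndx(eh, s0) < real_shnum(eh, s0);
}
```

With sh_size 70010 and sh_link 70262 as in the dump above, the range check
fails, which is the inconsistency being reported.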

What is happening is that whenever BFD needs to store a section index larger
than 0xff00, it stores the section number plus 256.  Thus in this case the
section string table index is really 70262 - 256 == 70006.

BFD is self-consistent, and readelf is consistent with what BFD generates.  But
the output does not follow the ELF spec.
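
As a rough model of the behaviour described above (not the actual BFD source;
the function names are made up for this sketch), the "plus 256" encoding
amounts to skipping the reserved range 0xff00..0xffff whenever an index is
stored:

```c
#include <assert.h>
#include <stdint.h>

#define SHN_LORESERVE 0xff00u
#define SHN_HIRESERVE 0xffffu
/* Size of the reserved range: 0xffff + 1 - 0xff00 == 256. */
#define RESERVED_SPAN (SHN_HIRESERVE + 1u - SHN_LORESERVE)

/* Hypothetical model of what BFD stored before the fix: real section
   indexes at or above SHN_LORESERVE are shifted past the reserved range. */
uint32_t biased_from_real(uint32_t real_index)
{
    return real_index >= SHN_LORESERVE ? real_index + RESERVED_SPAN : real_index;
}

/* Inverse mapping, as a reader written for that output would apply it. */
uint32_t real_from_biased(uint32_t stored)
{
    /* A biased value never falls inside the reserved range itself. */
    assert(stored < SHN_LORESERVE || stored > SHN_HIRESERVE);
    return stored > SHN_HIRESERVE ? stored - RESERVED_SPAN : stored;
}
```

Under this model real_from_biased(70262) gives 70006, matching the arithmetic
in the description, whereas the ELF spec expects 70006 to be stored directly.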
Comment 1 H.J. Lu 2008-03-08 23:40:28 UTC
From gABI:

sh_size Unspecified If non-zero, the actual number of section header entries
sh_link Unspecified If non-zero, the index of the section header string table
section

sh_link has a section index, which can be > number of sections. Consider


bash-3.2$ cat y.c
int conststaticvariable;
bash-3.2$ gcc -c y.c -m32
bash-3.2$ readelf -Ss y.o
There are 9 section headers, starting at offset 0xa8:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000000 00  AX  0   0  4
  [ 2] .data             PROGBITS        00000000 000034 000000 00  WA  0   0  4
  [ 3] .bss              NOBITS          00000000 000034 000000 00  WA  0   0  4
  [ 4] .comment          PROGBITS        00000000 000034 00002e 00      0   0  1
  [ 5] .note.GNU-stack   PROGBITS        00000000 000062 000000 00      0   0  1
  [ 6] .shstrtab         STRTAB          00000000 000062 000045 00      0   0  1
  [ 7] .symtab           SYMTAB          00000000 000210 000080 10      8   7  4
  [ 8] .strtab           STRTAB          00000000 000290 000019 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Symbol table '.symtab' contains 8 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS y.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    2
     4: 00000000     0 SECTION LOCAL  DEFAULT    3
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    4
     7: 00000004     4 OBJECT  GLOBAL DEFAULT  COM conststaticvariable
bash-3.2$

y.o only has 9 sections. However, the section index of conststaticvariable
is 0xfff2, which is > 9. That is because the section indexes from 0xff00
to 0xffff don't have entries in the section header table. st_shndx is a section
index, which isn't the same as the index into the section header table.
Comment 2 Ian Lance Taylor 2008-03-10 01:24:36 UTC
You seem to be trying to say that section indexes between SHN_LORESERVE and
SHN_HIRESERVE are not to be used.  However, there is no support for that in the
ELF spec.  And it is not required to make everything work.

Also, providing an example with st_shndx proves little, since I was making a
point about the sh_link section in section number zero.  The ELF spec does not
say "add 256 to section indexes."  It just says to use the section index.

What does icc produce with the sample source code?
Comment 3 H.J. Lu 2008-03-10 02:24:08 UTC
(In reply to comment #2)
> You seem to be trying to say that section indexes between SHN_LORESERVE and
> SHN_HIRESERVE are not to be used.  However, there is no support for that in the
> ELF spec.  And it is not required to make everything work.

I said quite the opposite. See comment #1. Section index 0xfff2 is between
SHN_LORESERVE and SHN_HIRESERVE. Here "section index" may not be the
index into section header table.

> 
> Also, providing an example with st_shndx proves little, since I was making a
> point about the sh_link section in section number zero.  The ELF spec does not
> say "add 256 to section indexes."  It just says to use the section index.
> 
> What does icc produce with the sample source code?

There are 70005 section headers, starting at offset 0x484bdc:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 011175 00      0   0  0
  [ 1] .strtab           STRTAB          00000000 000034 194bef 00      0   0  1
Comment 4 Ian Lance Taylor 2008-03-10 03:00:17 UTC
I think you need to qualify what you say.  It is clearly true that the st_shndx
field of a symbol is not a pure section index.  Any value above LORESERVE is
indeed reserved.  The ELF ABI defines what to do for a symbol whose section
index is larger than LORESERVE: put SHN_XINDEX in the st_shndx field, and put
the real section index in the corresponding entry in the SHT_SYMTAB_SHNDX
section.  Note that the ABI does not say to store the section index plus 256; it
says to store the section index.
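
A minimal sketch of the SHN_XINDEX rule described above, assuming the gABI
wording that the SHT_SYMTAB_SHNDX entry holds the plain section header index
and is SHN_UNDEF (0) for symbols that do not use the escape. The function
names and the use of -1 as a "no section" marker are choices made for this
example only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHN_LORESERVE 0xff00u
#define SHN_XINDEX    0xffffu

/* Writer side: pick st_shndx and the SHT_SYMTAB_SHNDX entry for a symbol
   defined in section real_index.  Nothing is added to the index. */
void encode_symbol_section(uint32_t real_index,
                           uint16_t *st_shndx, uint32_t *xindex_entry)
{
    assert(st_shndx != NULL && xindex_entry != NULL);
    if (real_index >= SHN_LORESERVE) {
        *st_shndx = SHN_XINDEX;       /* escape value */
        *xindex_entry = real_index;   /* the real index, unmodified */
    } else {
        *st_shndx = (uint16_t)real_index;
        *xindex_entry = 0;            /* SHN_UNDEF when no escape is used */
    }
}

/* Reader side: section header index of symbol sym_idx, or -1 when
   st_shndx is a reserved code (SHN_ABS, SHN_COMMON, ...) naming no section. */
int64_t symbol_section_index(uint16_t st_shndx,
                             const uint32_t *symtab_shndx, size_t sym_idx)
{
    if (st_shndx == SHN_XINDEX) {
        assert(symtab_shndx != NULL);
        return (int64_t)symtab_shndx[sym_idx];
    }
    if (st_shndx >= SHN_LORESERVE)
        return -1;
    return st_shndx;
}
```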

None of this has anything to do with the sh_link field in section header 0 when
the section string table is larger than LORESERVE.  In that case, I think the
ELF ABI says to put the section index in the sh_link field.  It does not say to
put the section index plus 256.  Currently BFD is putting the section index plus
256.  I think that is wrong.

For the original test case, for a symbol defined in a section whose index is
larger than LORESERVE, what does icc put in the SHT_SYMTAB_SHNDX section?  Does
it put the section index, or the section index plus 256?  I believe that the ELF
ABI says that it should store the former.  BFD stores the latter.  What does the
BFD readelf -s report for those symbols in the object compiled by icc?
Comment 5 H.J. Lu 2008-03-10 03:43:06 UTC
(In reply to comment #4)
> I think you need to qualify what you say.  It is clearly true that the st_shndx
> field of a symbol is not a pure section index.  Any value above LORESERVE is
> indeed reserved.  The ELF ABI defines what to do for a symbol whose section
> index is larger than LORESERVE: put SHN_XINDEX in the st_shndx field, and put
> the real section index in the corresponding entry in the SHT_SYMTAB_SHNDX
> section.  Note that the ABI does not say to store the section index plus 256; it
> says to store the section index.
> 
> None of this has anything to do with the sh_link field in section header 0 when
> the section string table is larger than LORESERVE.  In that case, I think the
> ELF ABI says to put the section index in the sh_link field.  It does not say to
> put the section index plus 256.  Currently BFD is putting the section index plus
> 256.  I think that is wrong.

I think it is up for debate. I can see the point for the current BFD
behavior. That is each section index is unique, including special
ones. When I say section index 0xfff2, there is no ambiguity about
which section it refers to. Would you mind raising your concern at

http://groups.google.com/group/generic-abi

> For the original test case, for a symbol defined in a section whose index is
> larger than LORESERVE, what does icc put in the SHT_SYMTAB_SHNDX section?  Does
> it put the section index, or the section index plus 256?  I believe that the ELF
> ABI says that it should store the former.  BFD stores the latter.  What does the
> BFD readelf -s report for those symbols in the object compiled by icc?

Would you mind downloading icc to check it out? I believe icc is free for
non-commercial use.
Comment 6 Ian Lance Taylor 2008-03-10 17:09:48 UTC
I compiled the original test case with icc 8.1.  I ran readelf -s.  Here are
some excerpts:

 65279: 00000000     0 SECTION LOCAL  DEFAULT 65278 section_65270
 65280: 00000000     0 SECTION LOCAL  DEFAULT 65279 section_65271
 65281: 00000000     0 SECTION LOCAL  DEFAULT PRC[0xff00] section_65272
 65282: 00000000     0 SECTION LOCAL  DEFAULT PRC[0xff01] section_65273
 65283: 00000000     0 SECTION LOCAL  DEFAULT PRC[0xff02] section_65274

 65521: 00000000     0 SECTION LOCAL  DEFAULT RSV[0xfff0] section_65512
 65522: 00000000     0 SECTION LOCAL  DEFAULT  ABS section_65513
 65523: 00000000     0 SECTION LOCAL  DEFAULT  COM section_65514
 65524: 00000000     0 SECTION LOCAL  DEFAULT RSV[0xfff3] section_65515

 65536: 00000000     0 SECTION LOCAL  DEFAULT RSV[0xffff] section_65527
 65537: 00000000     0 SECTION LOCAL  DEFAULT  UND section_65528
 65538: 00000000     0 SECTION LOCAL  DEFAULT    1 section_65529
 65539: 00000000     0 SECTION LOCAL  DEFAULT    2 section_65530

135280: 00000000     4 OBJECT  GLOBAL DEFAULT 65278 var_65270
135281: 00000000     4 OBJECT  GLOBAL DEFAULT 65279 var_65271
135282: 00000000     4 OBJECT  GLOBAL DEFAULT PRC[0xff00] var_65272
135283: 00000000     4 OBJECT  GLOBAL DEFAULT PRC[0xff01] var_65273
135284: 00000000     4 OBJECT  GLOBAL DEFAULT PRC[0xff02] var_65274

135522: 00000000     4 OBJECT  GLOBAL DEFAULT RSV[0xfff0] var_65512
135523: 00000000     4 OBJECT  GLOBAL DEFAULT  ABS var_65513
135524: 00000000     4 OBJECT  GLOBAL DEFAULT  COM var_65514
135525: 00000000     4 OBJECT  GLOBAL DEFAULT RSV[0xfff3] var_65515

135537: 00000000     4 OBJECT  GLOBAL DEFAULT RSV[0xffff] var_65527
135538: 00000000     4 OBJECT  GLOBAL DEFAULT  UND var_65528
135539: 00000000     4 OBJECT  GLOBAL DEFAULT    1 var_65529


Then I compiled the same test case with gcc 4.2.1 and GNU binutils 2.17.50 (a
snapshot).  Here are some excerpts from readelf -s:

 65279: 00000000     0 SECTION LOCAL  DEFAULT 65278 
 65280: 00000000     0 SECTION LOCAL  DEFAULT 65279 
 65281: 00000000     0 SECTION LOCAL  DEFAULT 65536 
 65282: 00000000     0 SECTION LOCAL  DEFAULT 65537 
 65283: 00000000     0 SECTION LOCAL  DEFAULT 65538 

135281: 00000000     4 OBJECT  GLOBAL DEFAULT 65278 var_65275
135282: 00000000     4 OBJECT  GLOBAL DEFAULT 65279 var_65276
135283: 00000000     4 OBJECT  GLOBAL DEFAULT 65536 var_65277
135284: 00000000     4 OBJECT  GLOBAL DEFAULT 65537 var_65278
135285: 00000000     4 OBJECT  GLOBAL DEFAULT 65538 var_65279
135286: 00000000     4 OBJECT  GLOBAL DEFAULT 65539 var_65280

135518: 00000000     4 OBJECT  GLOBAL DEFAULT 65771 var_65512
135519: 00000000     4 OBJECT  GLOBAL DEFAULT 65772 var_65513
135520: 00000000     4 OBJECT  GLOBAL DEFAULT 65773 var_65514
135521: 00000000     4 OBJECT  GLOBAL DEFAULT 65774 var_65515
135522: 00000000     4 OBJECT  GLOBAL DEFAULT 65775 var_65516

Note that readelf is written to expect the output of gas.  Note that it does not
work correctly for the output of icc.  icc follows the ELF ABI.  gas does not.

For another test, I tried the elfutils.  Running eu-readelf -s on the object
file generated by gcc gives me this:

eu-readelf: invalid sh_link value in section 70007

In other words, the elfutils agree with icc and with the ELF ABI.  They do not
agree with the GNU binutils.


> I think it is up for debate. I can see the point for the current BFD
> behavior. That is each section index is unique, including special
> ones. When I say section index 0xfff2, there is no ambiguity about
> which section it refers to.

If we follow the ELF ABI, each section index is still unique.  The section
indexes are simply the consecutive non-negative numbers up to but not including
the number of sections.  The values from SHN_LORESERVE to SHN_HIRESERVE are not
section indexes.  They are special codes which must be interpreted specially.
Comment 7 Ian Lance Taylor 2008-03-10 17:39:12 UTC
I raised the issue on generic-abi here:

http://groups.google.com/group/generic-abi/browse_frm/thread/e8bb63714b072e67
Comment 8 H.J. Lu 2008-03-10 18:09:25 UTC
gABI says:

sh_link Unspecified If non-zero, the index of the section header string table
section

So sh_link isn't "section index", it is the section header index. What
else did binutils get wrong?
Comment 9 H.J. Lu 2008-03-10 18:28:43 UTC
(In reply to comment #8)
> gABI says:
> 
> sh_link Unspecified If non-zero, the index of the section header string table
> section
> 
> So sh_link isn't "section index", it is the section header index. What
> else did binutils get wrong?

Wait a second, the index of the section header string table section is a
"section index".
Comment 10 Ian Lance Taylor 2008-03-10 19:04:54 UTC
Don't confuse  the notion of section index with the special codes between
SHN_LORESERVE and SHN_HIRESERVE.  Those special codes are used in the st_shndx
field of a Sym structure.  They are not meaningful elsewhere--in particular they
are not meaningful in the sh_link field of section header zero.  The GNU
binutils get that link field wrong when it needs to be larger than 0xffff.

When the special code SHN_XINDEX is used, the ABI is explicit that the
SHT_SYMTAB_SHNDX section holds the section header index.  The GNU binutils store
the wrong values in the SHT_SYMTAB_SHNDX section.

I don't know of anything else that the GNU binutils get wrong.
Comment 11 H.J. Lu 2008-03-10 19:14:41 UTC
gABI says:

Some section header table indexes are reserved in contexts where index size is
restricted, for example, the st_shndx member of a symbol table entry and the
e_shnum and e_shstrndx members of the ELF header. In such contexts, the reserved
values do not represent actual sections in the object file. Also in such
contexts, an escape value indicates that the actual section index is to be found
elsewhere, in a larger field.

That means we can't use from SHN_UNDEF and SHN_LORESERVE to SHN_HIRESERVE
anywhere else.
Comment 12 Ian Lance Taylor 2008-03-10 20:03:47 UTC
> That means we can't use from SHN_UNDEF and SHN_LORESERVE to SHN_HIRESERVE
> anywhere else.

No, it doesn't.  It only means that you can't use them in contexts "where index
size is restricted."
Comment 13 H.J. Lu 2008-03-10 20:43:18 UTC
(In reply to comment #12)
> > That means we can't use from SHN_UNDEF and SHN_LORESERVE to SHN_HIRESERVE
> > anywhere else.
> 
> No, it doesn't.  It only means that you can't use them in contexts "where index
> size is restricted."
> 

It isn't clear to me whether those special values have special meanings
where the index size isn't restricted.
Comment 14 Ian Lance Taylor 2008-03-10 21:25:25 UTC
I don't see anything in the ABI which says they have special meanings.  And I
certainly don't see anything in the ABI which says that code should add 256 to
section indexes when stored.  I think it would be important to not omit a
statement like that, as otherwise anybody could invent whatever procedure they
liked to avoid the special values.  And, again, the ABI works without adding 256.
Comment 15 Alan Modra 2008-03-11 00:49:00 UTC
I am to blame, and can't find anything to defend the current binutils behaviour.
 i.e. I agree with Ian that this is a bug.
Comment 16 H.J. Lu 2008-03-11 19:09:07 UTC
(In reply to comment #15)
> I am to blame, and can't find anything to defend the current binutils behaviour.
>  i.e. I agree with Ian that this is a bug.

Alan, we need to update the whole ELF backend for all ELF targets. Are
you working on that?
Comment 17 H.J. Lu 2008-03-11 19:14:56 UTC
I think we can add a st_xshndx field to internal ELF symbol entry:

---
--- ./internal.h.64k    2007-11-28 23:35:52.000000000 -0800
+++ ./internal.h        2008-03-11 12:14:06.000000000 -0700
@@ -100,7 +100,8 @@ struct elf_internal_sym {
   unsigned long        st_name;                /* Symbol name, index in string tbl */
   unsigned char        st_info;                /* Type and binding attributes */
   unsigned char        st_other;               /* Visibilty, and target specific */
-  unsigned int  st_shndx;              /* Associated section index */
+  unsigned short st_shndx;             /* Associated section index */
+  unsigned int  st_xshndx;             /* Extended section index */
 };
 
 typedef struct elf_internal_sym Elf_Internal_Sym;
---

to tell if st_shndx is a special value or not.
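
One possible reading of this proposal, sketched with a hypothetical struct
that keeps only the two fields from the patch. The helper names and the exact
rule for "special" are assumptions made for illustration and are not taken
from any posted patch:

```c
#include <assert.h>
#include <stddef.h>

#define SHN_LORESERVE 0xff00u
#define SHN_XINDEX    0xffffu

/* Hypothetical cut-down symbol carrying only the fields from the patch. */
struct sketch_sym {
    unsigned short st_shndx;   /* 16-bit value as found in the file */
    unsigned int   st_xshndx;  /* extended index, used with SHN_XINDEX */
};

/* Non-zero when st_shndx is a reserved code such as SHN_ABS or SHN_COMMON
   rather than a reference to a real section. */
int sketch_sym_is_special(const struct sketch_sym *s)
{
    assert(s != NULL);
    return s->st_shndx >= SHN_LORESERVE && s->st_shndx != SHN_XINDEX;
}

/* Real section header index of a non-special symbol. */
unsigned int sketch_sym_section(const struct sketch_sym *s)
{
    assert(s != NULL && !sketch_sym_is_special(s));
    return s->st_shndx == SHN_XINDEX ? s->st_xshndx : s->st_shndx;
}
```

Keeping the 16-bit field separate means a reserved code can never be confused
with a large real index, at the cost of touching every user of st_shndx.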
Comment 18 Alan Modra 2008-03-11 21:21:20 UTC
Yes, I'll work on it.  My approach will be to redefine SHN_LORESERVE thru
SHN_HIRESERVE to FFFFFF00 thru FFFFFFFF, sign extending the existing values to
an unsigned int. All internal BFD uses of section indices will use these values,
so the reserved range is mapped out of the way of "real" section numbers.  We
won't need any of the code that adds SHN_HIRESERVE + 1 - SHN_LORESERVE to skip
over the reserved range.  Also, I think most backend use of SHN_* will not need
changing.
Comment 19 H.J. Lu 2008-03-11 21:31:55 UTC
(In reply to comment #18)
> Yes, I'll work on it.  My approach will be to redefine SHN_LORESERVE thru
> SHN_HIRESERVE to FFFFFF00 thru FFFFFFFF, sign extending the existing values to
> an unsigned int. All internal BFD uses of section indices will use these values,
> so the reserved range is mapped out of the way of "real" section numbers.  We
> won't need any of the code that adds SHN_HIRESERVE + 1 - SHN_LORESERVE to skip
> over the reserved range.  Also, I think most backend use of SHN_* will not need
> changing.

That should work. I doubt we will have 0xFFFFFF00 sections. It would be nice
to assert it just in case.
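
The approach from comment #18, together with the assertion suggested here,
might look roughly like the following. This is a sketch under assumptions, not
the committed fix: the FILE_ and INT_ prefixed names and both functions are
invented for this example, and it only models the symbol st_shndx case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit values as they appear in the file. */
#define FILE_SHN_LORESERVE 0xff00u
#define FILE_SHN_XINDEX    0xffffu
/* Internal reserved range: the file values sign-extended to 32 bits. */
#define INT_SHN_LORESERVE  0xffffff00u

/* File representation -> internal 32-bit section index. */
uint32_t internal_from_file(uint16_t st_shndx, uint32_t xindex_entry)
{
    if (st_shndx == FILE_SHN_XINDEX) {
        /* A real index must stay below the internal reserved range. */
        assert(xindex_entry < INT_SHN_LORESERVE);
        return xindex_entry;
    }
    if (st_shndx >= FILE_SHN_LORESERVE)
        return 0xffff0000u | st_shndx;   /* e.g. 0xfff2 -> 0xfffffff2 */
    return st_shndx;
}

/* Internal 32-bit section index -> file representation. */
void file_from_internal(uint32_t internal,
                        uint16_t *st_shndx, uint32_t *xindex_entry)
{
    assert(st_shndx != NULL && xindex_entry != NULL);
    if (internal >= INT_SHN_LORESERVE) {
        *st_shndx = (uint16_t)(internal & 0xffffu);  /* reserved code */
        *xindex_entry = 0;
    } else if (internal >= FILE_SHN_LORESERVE) {
        *st_shndx = FILE_SHN_XINDEX;                 /* escape value */
        *xindex_entry = internal;                    /* real index, no bias */
    } else {
        *st_shndx = (uint16_t)internal;
        *xindex_entry = 0;
    }
}
```

Because real indexes and reserved codes occupy disjoint 32-bit ranges, no code
has to add or subtract SHN_HIRESERVE + 1 - SHN_LORESERVE to step over the
reserved values.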
Comment 20 H.J. Lu 2008-03-24 13:38:00 UTC
Fixed by

http://sourceware.org/ml/binutils/2008-03/msg00070.html