Bug 32271 - strip leaves unused PT_LOAD segments
Summary: strip leaves unused PT_LOAD segments
Status: UNCONFIRMED
Alias: None
Product: binutils
Classification: Unclassified
Component: binutils
Version: 2.41
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2024-10-14 15:47 UTC by Stas Sergeev
Modified: 2024-10-17 14:35 UTC

Attachments
test-case (696 bytes, application/gzip)
2024-10-14 15:47 UTC, Stas Sergeev

Description Stas Sergeev 2024-10-14 15:47:49 UTC
Created attachment 15745
test-case
Comment 1 Stas Sergeev 2024-10-14 15:54:53 UTC
The attached test file is needed to
reproduce the problem:

$ readelf -l tmp.elf
 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.property 
   01     .text 
   02     .bss 
   03     .note.gnu.property 
   04     .note.gnu.property 
   05     

$ strip --strip-debug -R '.note.*' tmp.elf
$ readelf -l tmp.elf
 Section to Segment mapping:
  Segment Sections...
   00     
   01     .text 
   02     .bss 
   03     
   04     
   05     

(I have omitted some parts of readelf
output for brevity).
Now we see that segment 0, which
previously covered the .note sections,
no longer maps any section but still has
a non-zero size. Segments 3, 4 and 5 are
also not removed, but they are not PT_LOAD
and have zero size, so they do no harm.
Segment 0 should be removed IMO.

Also, I wonder why the segment that contains
the .note sections is PT_LOAD? Is this really
needed? I have my own ELF loader, and it
gets confused by all this.
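
For readers who want to check a file for the condition described above, here is a minimal Python sketch. It assumes a 32-bit little-endian ELF (as produced with -melf_i386 later in this thread) and uses a simplified, address-based containment test, not the exact rule readelf applies. The function names `read_elf32` and `empty_loads` are made up for this illustration.

```python
import struct

PT_LOAD = 1
SHT_NULL = 0
SHF_ALLOC = 0x2


def read_elf32(data):
    """Return (segments, sections) parsed from a 32-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF file")
    (_, _, _, _, _, phoff, shoff, _, _, phentsize, phnum,
     shentsize, shnum, _) = struct.unpack_from("<16sHHIIIIIHHHHHH", data, 0)

    segments = []
    for i in range(phnum):
        p_type, p_offset, p_vaddr, _, p_filesz, p_memsz, _, _ = \
            struct.unpack_from("<8I", data, phoff + i * phentsize)
        segments.append({"index": i, "type": p_type, "offset": p_offset,
                         "vaddr": p_vaddr, "filesz": p_filesz,
                         "memsz": p_memsz})

    sections = []
    for i in range(shnum):
        _, sh_type, sh_flags, sh_addr, _, sh_size, _, _, _, _ = \
            struct.unpack_from("<10I", data, shoff + i * shentsize)
        sections.append({"type": sh_type, "flags": sh_flags,
                         "addr": sh_addr, "size": sh_size})
    return segments, sections


def empty_loads(segments, sections):
    """Indices of non-empty PT_LOAD segments that contain no allocated section."""
    result = []
    for seg in segments:
        if seg["type"] != PT_LOAD or seg["memsz"] == 0:
            continue
        lo, hi = seg["vaddr"], seg["vaddr"] + seg["memsz"]
        covered = any(
            sec["type"] != SHT_NULL and sec["flags"] & SHF_ALLOC
            and sec["size"] > 0
            and lo <= sec["addr"] and sec["addr"] + sec["size"] <= hi
            for sec in sections)
        if not covered:
            result.append(seg["index"])
    return result
```

Usage would be along the lines of `empty_loads(*read_elf32(open("tmp.elf", "rb").read()))`, which returns the indices of the PT_LOAD segments that no allocated section falls into.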
Comment 2 Nick Clifton 2024-10-17 10:30:56 UTC
(In reply to Stas Sergeev from comment #1)

Hi Stas,
  
> Now we see that the segment 0, which
> was previously covering .note sections,
> is unmapped, but still has non-zero size.
> Segments 3, 4 and 5 are also not removed,
> but they are not PT_LOAD, and have zero
> size, so do not harm. But segment 0 should
> be removed IMO.

Agreed, although this is probably an enhancement rather than a bug.
Have you tried using eu-strip (from the elfutils project) or llvm-strip ?
They may produce the results that you expect.

> Also I wonder why the segment that contains
> .note sections is PT_LOAD? Is this really
> needed?

Because they are needed and used.  At least by ld.so on Linux boxes.
The notes contain information about the architecture extensions needed
by the application that is being loaded and the loader makes sure that
these are available before starting to run the code.
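
As a rough illustration of what such a note holds, the following Python sketch walks the raw bytes of a .note.gnu.property section. It assumes little-endian data and 4-byte property alignment (the ELF32 case; ELF64 uses 8). The constants are the ones I know from the x86 psABI; the function names and the sample bytes are invented for this example and are not taken from the attached test case.

```python
import struct

NT_GNU_PROPERTY_TYPE_0 = 5
GNU_PROPERTY_X86_FEATURE_1_AND = 0xC0000002
X86_FEATURE_1_BITS = {0x1: "IBT", 0x2: "SHSTK"}


def parse_gnu_properties(note, align=4):
    """Return [(pr_type, pr_data), ...] from one GNU property note."""
    namesz, descsz, ntype = struct.unpack_from("<III", note, 0)
    name_end = 12 + namesz
    if note[12:name_end] != b"GNU\0" or ntype != NT_GNU_PROPERTY_TYPE_0:
        raise ValueError("not a GNU property note")
    pos = (name_end + align - 1) & ~(align - 1)   # start of descriptor
    end = pos + descsz
    props = []
    while pos + 8 <= end:
        pr_type, pr_datasz = struct.unpack_from("<II", note, pos)
        props.append((pr_type, note[pos + 8:pos + 8 + pr_datasz]))
        # each property is padded up to the alignment
        pos += (8 + pr_datasz + align - 1) & ~(align - 1)
    return props


def x86_features(props):
    """Names of the x86 feature bits found in the properties, if any."""
    names = []
    for pr_type, data in props:
        if pr_type == GNU_PROPERTY_X86_FEATURE_1_AND and len(data) >= 4:
            (bits,) = struct.unpack_from("<I", data, 0)
            names += [n for b, n in X86_FEATURE_1_BITS.items() if bits & b]
    return names


# Hypothetical note carrying one property with both feature bits set.
sample = (struct.pack("<III", 4, 12, NT_GNU_PROPERTY_TYPE_0) + b"GNU\0"
          + struct.pack("<III", GNU_PROPERTY_X86_FEATURE_1_AND, 4, 0x3))
print(x86_features(parse_gnu_properties(sample)))
```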

> I have my own elf loader, and it
> gets confused by this all.

It would probably be worth your while enhancing your loader so that it
can cope with these segments.  Even if it just ignores them.  It will
make things simpler in the long run.

Cheers
  Nick
Comment 3 Stas Sergeev 2024-10-17 10:52:36 UTC
Thanks for such a detailed reply!
It's really helpful.

(In reply to Nick Clifton from comment #2)
> Agreed, although this is probably an enhancement rather than a bug.

Having a stale PT_LOAD segment
is most likely a bug. It probably
refers to wrong offsets, or even
past EOF? Or am I missing something
that would make it still "somewhat"
valid?

> Have you tried using eu-strip (from the elfutils project)

Tried just now:
$ eu-strip --strip-debug -R '.note.*' tmp.elf 
eu-strip: Cannot remove allocated section '.note.gnu.property'

So... it refuses to remove, leaving
the file at least in a sane state.
I believe binutils `strip` leaves a
corrupted state.

> or llvm-strip ?

Same as binutils: unmapped PT_LOADs.


> > Also I wonder why the segment that contains
> > .note sections is PT_LOAD? Is this really
> > needed?
> 
> Because they are needed and used.  At least by ld.so on Linux boxes.
> The notes contain information about the architecture extensions needed
> by the application that is being loaded and the loader makes sure that
> these are available before starting to run the code.

I guess it's used by glibc, not by the application?
Or is there really an API to access that
info from an application?

> > I have my own elf loader, and it
> > gets confused by this all.
> 
> It would probably be worth your while enhancing your loader so that it
> can cope with these segments.  Even if it just ignores them.  It will
> make things simpler in the long run.

Here's the full problem.
Those notes are added by build systems.
For example, the Debian build system adds
.note.package.
I am using --section-start switches for
all "allocated" sections to move them to
a non-standard location, but sections that
are added behind my back keep the default
location. When some sections are moved and
some are not, you end up with an unloadable
ELF because the total VA span becomes too large.
All loaders (even the one from glibc) estimate
the total VA space by subtracting the minimal
address from the maximal address, and in this
case that estimate fails.
So it's not like I can deal with such ELFs,
at least not unless I want my loader to be
smarter than the one in glibc. :)
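
The min/max estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from any real loader; the function name `load_span` and both layouts are hypothetical (the second one pretends that two segments were moved with --section-start while a note-carrying segment stayed at the default address).

```python
PAGE = 0x1000


def load_span(load_segments, page=PAGE):
    """Size of one reservation covering all (vaddr, memsz) PT_LOAD pairs."""
    lo = min(vaddr for vaddr, _ in load_segments) & ~(page - 1)
    hi = max(vaddr + memsz for vaddr, memsz in load_segments)
    hi = (hi + page - 1) & ~(page - 1)
    return hi - lo


# Hypothetical layout with everything near the default base address.
default_layout = [(0x08048000, 0x11c), (0x08049000, 0xc5), (0x0804a000, 0x18)]
# Hypothetical layout where only the last two segments were moved.
mixed_layout = [(0x08048000, 0x11c), (0x48049000, 0xc5), (0x4804a000, 0x18)]

print(hex(load_span(default_layout)))  # 0x3000
print(hex(load_span(mixed_layout)))    # 0x40003000
```

One segment left behind at the default address is enough to inflate the span from three pages to about 1 GiB in this made-up case.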

A possible workaround would be a magic
option that just changes the default VA
address, which seems to be 0x8048000 right
now. Then I could drop those horrible
--section-start tricks.

Thanks!
Comment 4 Nick Clifton 2024-10-17 11:19:34 UTC
(In reply to Stas Sergeev from comment #3)
Hi Stas,

>> Agreed, although this is probably an enhancement rather than a bug.
> 
> Having a stale PT_LOAD segment
> is most likely a bug. It probably
> refers to wrong offsets, or even
> past EOF? Or am I missing something
> that would make it still "somewhat"
> valid?

Sure - if the segment is referencing beyond the end of the file then it is a bug.  But if not then it is more of an unexpected behaviour than a real fault.
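
The "beyond the end of the file" condition is easy to test mechanically. Below is a minimal Python sketch, assuming the (p_offset, p_filesz) pairs have already been read from the program headers; the function name `past_eof` and the file sizes are made up for the example.

```python
def past_eof(segments, file_size):
    """Indices of segments whose file range ends after file_size.

    segments is a list of (p_offset, p_filesz) pairs; segments with a
    zero file size occupy no bytes in the file and are skipped.
    """
    return [i for i, (offset, filesz) in enumerate(segments)
            if filesz and offset + filesz > file_size]


# Hypothetical program headers: two file-backed segments and one empty one.
segments = [(0x0, 0x11c), (0x1000, 0xc5), (0x0, 0x0)]
print(past_eof(segments, 0x2000))  # []
print(past_eof(segments, 0x1000))  # [1]
```

A segment that passes this check can still cover bytes that no longer belong to any section, so an in-bounds result does not settle the question on its own.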
 

>> Have you tried using eu-strip (from the elfutils project)
> 
> Tried just now:
> $ eu-strip --strip-debug -R '.note.*' tmp.elf 
> eu-strip: Cannot remove allocated section '.note.gnu.property'

Heh!  Well I guess that this is a fair response.  Removing an allocated section is quite likely to cause problems for the executable.


>> or llvm-strip ?
> 
> Same as binutils: unmapped PT_LOADs.

Well at least the two tools are consistent.

>> Because they are needed and used.  At least by ld.so on Linux boxes.
>> The notes contain information about the architecture extensions needed
>> by the application that is being loaded and the loader makes sure that
>> these are available before starting to run the code.
> 
> I guess it's used by glibc, not an application?

Correct.

> Or is there really an API to access that
> info from application?

No.  In the normal case applications are never expected to be able to access this information.  There are methods that they could use, but it would be a hack rather than using a supported API.

 
> Here's the full problem.
> Those notes are added by build systems.
> For example debian build system adds
> .note.package.

You could always strip these note sections from the object files *before* you link them together.

> I am using --section-start switches for
> all "allocated" sections to move them to
> non-standard location, but if they are
> added behind my back, then they have the
> default location. When some sections are
> moved and some not, you end up with unloadable
> ELF because total VA space became too large.
> All loaders (even the one from glibc) estimate
> the total VA space by subtracting minimal
> address from maximal address, but in this case
> such estimation fails.
> So it's not like I can deal with such ELFs,
> at least until I want my loader to be smarter
> than the one in glibc. :)

Heresy!  The glibc loader is perfect!  (Well no, it is not, but it is quite good :-).

> The possible work-around can be if you tell
> me a magic option with which I can just change
> the default VA address, which seems to be
> 0x8048000 right now. Then I can drop those
> horrible --section-start tricks.

Have you tried linking with -Ttext=0xNNNNNNNN ?  (And/or --text-segment=X --rodata-segment=X --ldata-segment=X).

Another possibility is to use your own linker script.  Not only could this script ensure that sections are loaded into the VA region you want, but you could also have it discard all of those unwanted debug and note sections too.

Cheers
  Nick
Comment 5 Stas Sergeev 2024-10-17 11:42:11 UTC
(In reply to Nick Clifton from comment #4)
> Sure - if the segment is referencing beyond the of the file then it is a
> bug.  But if not then it is more of an unexpected behaviour than a real
> fault.

Even if it covers some "random"
data in the file? IMHO that's still
a bug. If it were zero-sized,
then fine. But it's not.

> You could always strip these note sections from the object files *before*
> you link them together.

Hmm, that's an interesting trick I guess.
It's slightly more difficult to try out,
but worth a try eventually.

> Heresy!  The glibc loader is perfect!

Yeah, I know what you are talking about. :(
Been there.

> Have you tried linking with -Ttext=0xNNNNNNNN ?  (And/or --text-segment=X
> --rodata-segment=X --ldata-segment=X).

Just tried, and wtf?
$ x86_64-linux-gnu-ld int23.o int0.o asm.o ms.o plt.o -melf_i386 -static /usr/local/i386-pc-dj64/lib/uplt.o --text-segment=0x08148000 -o tmp.elf
x86_64-linux-gnu-ld: unrecognized option '--text-segment=0x08148000'
x86_64-linux-gnu-ld: use the --help option for usage information

Wow, so let's try this then:
$ x86_64-linux-gnu-ld int23.o int0.o asm.o ms.o plt.o -melf_i386 -static /usr/local/i386-pc-dj64/lib/uplt.o -text-segment=0x08148000 -o tmp.elf
x86_64-linux-gnu-ld: Error: unable to disambiguate: -text-segment=0x08148000 (did you mean --text-segment=0x08148000 ?)

Now it hints that I should use --text-segment=0x08148000,
only to declare it an "unrecognized option"?
Very funny. :)

> Another possibility is to use your own linker script.  Not only could this
> script ensure that sections are loaded into the VA region you want, but you
> could also have it discard all of those unwanted debug and note sections too.

Another option that could actually work,
but is even more difficult to try, when
all I need is to change the default load
address. :)
Maybe fixing --text-segment, or adding
some option to set the default load address,
is a possibility?
Comment 6 Nick Clifton 2024-10-17 11:51:01 UTC
(In reply to Stas Sergeev from comment #5)
 
> Even if it covers some "random"
> data in a file? IMHO that's still
> a bug. If it would be zero-sized
> then fine. But it's not.

Can you provide a small example that reproduces this please ?  It looks like something that needs to be investigated.


> > Have you tried linking with -Ttext=0xNNNNNNNN ?  (And/or --text-segment=X
> > --rodata-segment=X --ldata-segment=X).
> 
> Just tried, and wtf?
> $ x86_64-linux-gnu-ld int23.o int0.o asm.o ms.o plt.o -melf_i386 -static
> /usr/local/i386-pc-dj64/lib/uplt.o --text-segment=0x08148000 -o tmp.elf
> x86_64-linux-gnu-ld: unrecognized option '--text-segment=0x08148000'
> x86_64-linux-gnu-ld: use the --help option for usage information

My bad.  The option is -Ttext-segment=... rather than --text-segment=...  Sorry.


> Wow, so let's try this then:
> $ x86_64-linux-gnu-ld int23.o int0.o asm.o ms.o plt.o -melf_i386 -static
> /usr/local/i386-pc-dj64/lib/uplt.o -text-segment=0x08148000 -o tmp.elf
> x86_64-linux-gnu-ld: Error: unable to disambiguate: -text-segment=0x08148000
> (did you mean --text-segment=0x08148000 ?)
> 
> Now it hints me to use --text-segment=0x08148000
> only to declare it "unrecognized option"?
> Very funny. :)

Yeah - the algorithm for hinting at spelling mistake corrections is not that smart...
Comment 7 Stas Sergeev 2024-10-17 12:04:06 UTC
(In reply to Nick Clifton from comment #6)
> (In reply to Stas Sergeev from comment #5)
>  
> > Even if it covers some "random"
> > data in a file? IMHO that's still
> > a bug. If it would be zero-sized
> > then fine. But it's not.
> 
> Can you provide a small example that reproduces this please ?  It looks like
> something that needs to be investigated.

I think you just need to run
`strip --strip-debug -R '.note.*' tmp.elf`
on the attached example.
I can't claim anything, because I
didn't check where the stale segment
points. But I assume it points to
invalid locations because its sections
were removed? Or am I wrong?

> My bad.  The option is -Ttext-segment=... rather than --text-segment=... 
> Sorry.

Wow!
This actually works.
So is it the same as just specifying
the new load address for all segments?
For example if I use -Trodata-segment
then not all segments are moved, but
-Ttext-segment seems to move them all,
including notes.
Could you please explain why it is so,
or just assure me it will always move all
segments? This seems to be all I need.

> > Now it hints me to use --text-segment=0x08148000
> > only to declare it "unrecognized option"?
> > Very funny. :)
> 
> Yeah - the algorithm for hinting at spelling mistake corrections is not that
> smart...

Well, surely it could at least check
whether such an option actually exists
before suggesting it.
ld.lld (aka llvm ld) gives the correct hint.
Comment 8 Nick Clifton 2024-10-17 13:34:11 UTC
(In reply to Stas Sergeev from comment #7)

> > My bad.  The option is -Ttext-segment=... rather than --text-segment=... 
> > Sorry.
> 
> Wow!
> This actually works.
> So is it the same as just specifying
> the new load address for all segments?

Yes.  ish...

> For example if I use -Trodata-segment
> then not all segments are moved, but
> -Ttext-segment seems to move them all,
> including notes.
> Could you please explain why it is so,
> or just assure me it will always move all
> segments? This seems to be all I need.

OK, so the -Ttext-segment sets the start address for the text segment
and by default the other segments (rodata & data) are mapped to start
after the end of the text segment.  So just using -Ttext-segment
effectively moves all (loadable) segments, not just the code segment.

Of course if you combine -Ttext-segment and -Trodata-segment then the
read only segment will be set to where you specify and not after the
text segment.  (Assuming that there is read only data and that a
separate read only segment is being created.  It is possible to have
the linker put code and read only data into the same segment. In which
case only the -Ttext-segment option would be effective).

Notes are considered to be read only data so they will normally be
put into the read only data segment, if one is being created, or the
text segment otherwise.
Comment 9 Stas Sergeev 2024-10-17 14:14:19 UTC
(In reply to Nick Clifton from comment #8)
> OK, so the -Ttext-segment sets the start address for the text segment
> and by default the other segments (rodata & data) are mapped to start
> after the end of the text segment.

Yes, that's quite obvious. :)
But:

> Notes are considered to be read only data so they will normally be
> put into the read only data segment, if one is being created, or the
> text segment otherwise.

And this is exactly not the case
here, which is why I asked for the
additional clarification:

   00     .note.gnu.property 
   01     .text 
   02     .bss 
   03     .note.gnu.property 
   04     .note.gnu.property 
   05

rodata is segment 03 here.
What makes me wonder is why -Ttext-segment
relocates segment 00, which is before .text?
Will this always be the case with
future versions of binutils?

Also, do you agree with my assumption
that the unmapped segment may refer to
invalid data?
Comment 10 Stas Sergeev 2024-10-17 14:35:21 UTC
Let me clarify.
So with -Trodata-segment=0x08148000 I get this:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x0011c 0x0011c R   0x1000
  LOAD           0x001000 0x08049000 0x08049000 0x000c5 0x000c5 R E 0x1000
  LOAD           0x000000 0x08148000 0x08148000 0x00000 0x00018 RW  0x1000
  NOTE           0x0000f4 0x080480f4 0x080480f4 0x00028 0x00028 R   0x4
  GNU_PROPERTY   0x0000f4 0x080480f4 0x080480f4 0x00028 0x00028 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

As you can see, only segment 03
has moved. So I assume 03 is rodata.

From here we see:
   00     .note.gnu.property 
   01     .text 
   02     .bss 
   03     .note.gnu.property 
   04     .note.gnu.property 
   05

... that .text is segment 01.
This suggests that 00 should not be moved by
-Ttext-segment, but it actually is.