[PATCH v2] Mark more functions as __COLD

Sergey Bugaev bugaevc@gmail.com
Fri May 19 10:35:29 GMT 2023


On Thu, May 18, 2023 at 10:43 PM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
> The rationale seems ok, some comments below.

Thanks. Any thoughts on the .text.{startup,exit} part?

> > -void
> > +void __COLD
> >  __libc_fatal (const char *message)
> >  {
> >    _dl_fatal_printf ("%s", message);
> >  }
> >  rtld_hidden_def (__libc_fatal)
> >
>
> Can't you just add on the function prototype at include/stdio.h? Same
> question for the __assert_fail and __assert_perror_fail below.

But I did just that (added __COLD to the prototypes in include/stdio.h
and include/assert.h), didn't I?

If you're saying that it's not worth repeating __COLD on the
definition, then sure, I could remove that if you prefer.
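For illustration, here is a minimal sketch of the prototype-only approach. It assumes __COLD boils down to GCC's __cold__ attribute; MY_COLD, my_fatal and checked_get are made-up names, not glibc code. Attributes on an earlier declaration carry over to the definition, so repeating them there changes nothing. Callers only see what is on the prototype, which is why that is the place that matters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for glibc's __COLD; assumed here to expand to
   GCC's cold attribute.  */
#define MY_COLD __attribute__ ((__cold__))

/* Attributes go on the prototype, as with the declarations in
   include/stdio.h and include/assert.h.  */
extern void my_fatal (const char *message)
  MY_COLD __attribute__ ((__noreturn__));

/* The definition does not repeat the attributes: GCC merges them from
   the earlier declaration, so this is still cold and noreturn.  */
void
my_fatal (const char *message)
{
  fputs (message, stderr);
  abort ();
}

/* Because the prototype says my_fatal is cold, GCC treats the branch
   leading to the call as unlikely.  */
static int
checked_get (const int *array, size_t len, size_t index)
{
  if (index >= len)
    my_fatal ("checked_get: index out of range\n");
  return array[index];
}
```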

> > +/* Intentionally not marked __COLD in the header, since this only causes GCC
> > +   to create a bunch of useless __foo_chk.cold symbols containing only a call
> > +   to this function; better just keep calling it directly.  */
> >  extern void __chk_fail (void) __attribute__ ((__noreturn__));
> >  libc_hidden_proto (__chk_fail)
> >  rtld_hidden_proto (__chk_fail)
>
> Why exactly gcc generates the useless __foo_chk.cold for this case? Is this a
> bug or a limitation?

I don't know; your guess is as good as mine (actually, yours would be
better than mine). My guess would be that they just didn't think to
add a check that the code size saved by moving the cold path into a
separate section actually outweighs the cost of the jump instruction
needed to get there.
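For context, the C side of such a wrapper is tiny. Below is a rough sketch of what __ppoll_chk presumably looks like, reconstructed from the disassembly that follows: fdslen / 8 is compared against nfds (sizeof (struct pollfd) is 8 on i686), then ppoll is tail-called. my_ppoll_chk and my_chk_fail are hypothetical stand-ins, not the real glibc symbols. As with __chk_fail in the patch, the failure function is noreturn but deliberately not cold.

```c
#define _GNU_SOURCE /* for ppoll */
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for __chk_fail: noreturn, not marked cold.  */
static void __attribute__ ((__noreturn__))
my_chk_fail (void)
{
  fputs ("buffer overflow detected\n", stderr);
  abort ();
}

/* Fortified wrapper in the style of __ppoll_chk: fail if the caller's
   buffer (fdslen bytes) holds fewer than nfds entries, otherwise
   forward to the real ppoll.  */
int
my_ppoll_chk (struct pollfd *fds, nfds_t nfds,
              const struct timespec *timeout, const sigset_t *ss,
              size_t fdslen)
{
  if (fdslen / sizeof (*fds) < nfds)
    my_chk_fail ();
  return ppoll (fds, nfds, timeout, ss);
}
```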

Here's what I'm getting specifically, on i686-gnu:

Dump of assembler code for function __ppoll_chk:
Address range 0x198760 to 0x19879e:
   0x00198760 <+0>: 56                 push   %esi
   0x00198761 <+1>: 53                 push   %ebx
   0x00198762 <+2>: 83 ec 04           sub    $0x4,%esp
   0x00198765 <+5>: 8b 44 24 20         mov    0x20(%esp),%eax
   0x00198769 <+9>: 8b 54 24 14         mov    0x14(%esp),%edx
   0x0019876d <+13>: 8b 4c 24 10         mov    0x10(%esp),%ecx
   0x00198771 <+17>: 8b 5c 24 18         mov    0x18(%esp),%ebx
   0x00198775 <+21>: c1 e8 03           shr    $0x3,%eax
   0x00198778 <+24>: 8b 74 24 1c         mov    0x1c(%esp),%esi
   0x0019877c <+28>: 39 d0               cmp    %edx,%eax
   0x0019877e <+30>: 0f 82 9d bb e8 ff   jb     0x24321 <__ppoll_chk.cold>
   0x00198784 <+36>: 89 74 24 1c         mov    %esi,0x1c(%esp)
   0x00198788 <+40>: 89 5c 24 18         mov    %ebx,0x18(%esp)
   0x0019878c <+44>: 89 54 24 14         mov    %edx,0x14(%esp)
   0x00198790 <+48>: 89 4c 24 10         mov    %ecx,0x10(%esp)
   0x00198794 <+52>: 83 c4 04           add    $0x4,%esp
   0x00198797 <+55>: 5b                 pop    %ebx
   0x00198798 <+56>: 5e                 pop    %esi
   0x00198799 <+57>: e9 b2 b9 fb ff     jmp    0x154150 <__GI_ppoll>
Address range 0x24321 to 0x24326:
   0x00024321 <-1524799>: e8 5c ff ff ff     call   0x24282 <__GI___chk_fail>
End of assembler dump.

It's spending 6 bytes for the 'jb __ppoll_chk.cold', only to jump to
'call __GI___chk_fail' which takes 5 bytes. That's negative space
savings, both overall and inside .text.
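To make the byte counts concrete: the out-of-line branch has to use the near form of jb (0f 82 plus a 4-byte displacement, 6 bytes). The 2-byte short form (72 plus a 1-byte displacement) only reaches about 128 bytes in either direction, and __ppoll_chk.cold sits roughly 1.5 MB away. The call is e8 plus a 4-byte displacement, 5 bytes. The small decoder below is only an illustration: rel32_target is a made-up helper, not anything from glibc or binutils. It recomputes the branch targets from the raw bytes in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Target of an x86 near jmp/jcc/call with a 32-bit relative
   displacement: address of the next instruction plus the signed
   displacement, stored little-endian in the instruction's last 4 bytes.  */
static uint32_t
rel32_target (uint32_t insn_addr, unsigned int insn_len,
              const unsigned char *insn)
{
  const unsigned char *disp = insn + insn_len - 4;
  uint32_t raw = (uint32_t) disp[0]
                 | ((uint32_t) disp[1] << 8)
                 | ((uint32_t) disp[2] << 16)
                 | ((uint32_t) disp[3] << 24);
  /* Unsigned 32-bit wrap-around makes adding raw equivalent to adding
     the sign-extended displacement on a 32-bit target such as i686.  */
  return insn_addr + insn_len + raw;
}
```

Fed the jb at 0x0019877e, the call at 0x00024321 and the jmp at 0x00198799, it yields 0x24321, 0x24282 and 0x154150, matching the targets gdb prints.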

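And frankly, that's bad codegen altogether, unless I'm missing
something: it loads four arguments into registers only to store them
straight back into the same stack slots before the tail call. Why not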

mov 20(%esp), %eax
shr $3, %eax
cmp 8(%esp), %eax
jnb __GI_ppoll
push %ebp
mov %esp, %ebp
call __GI___chk_fail

Then maybe it'd make sense to move the "push, mov, call" into
.text.unlikely, adding a jmp.

Sergey

