This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][SH] CFI directives patch
- From: Chung-Lin Tang <chunglin dot tang at gmail dot com>
- To: Thomas Schwinge <thomas at codesourcery dot com>
- Cc: Kaz Kojima <kkojima at rr dot iij4u dot or dot jp>, libc-alpha at sourceware dot org
- Date: Fri, 11 May 2012 17:15:52 +0800
- Subject: Re: [PATCH][SH] CFI directives patch
- References: <4FA37F28.6040004@codesourcery.com> <87obpxaccw.fsf@schwinge.name>
Hi Thomas,
> I don't really feel strongly about this, but would it make sense to move
> the cfi_remember_state instructions to the place just before the control
> flow (potentially) branches? It does not make any functional difference,
> but might make things more clear conceptually. Or, instead push it even
> further forward, right to the place where the last CFI update is done,
> like I had done in
> <http://thread.gmane.org/gmane.comp.lib.glibc.ports/514/focus=518>, for
> example?
IMHO, I just thought it more clear to mark the epilogue-in-the-middle
like a push/pop. In that sense, the cfi_remember_state should be kept
right before where the later thrown away CFI state starts...
>> --- a/nptl/sysdeps/unix/sysv/linux/sh/sysdep-cancel.h
>> +++ b/nptl/sysdeps/unix/sysv/linux/sh/sysdep-cancel.h
>> @@ -71,6 +71,7 @@
>> CDISABLE; \
>> mov.l @r15+,r0; \
>> cfi_adjust_cfa_offset (-4); \
>> + cfi_restore (r0); \
>> lds.l @r15+,pr; \
>> cfi_adjust_cfa_offset (-4); \
>> cfi_restore (pr); \
>> @@ -97,10 +98,10 @@
>> # define SAVE_ARGS_6 SAVE_ARGS_5
>>
>> # define LOAD_ARGS_0 /* Nothing. */
>> -# define LOAD_ARGS_1 LOAD_ARGS_0; mov.l @(0,r15),r4
>> -# define LOAD_ARGS_2 LOAD_ARGS_1; mov.l @(4,r15),r5
>> -# define LOAD_ARGS_3 LOAD_ARGS_2; mov.l @(8,r15),r6
>> -# define LOAD_ARGS_4 LOAD_ARGS_3; mov.l @(12,r15),r7
>> +# define LOAD_ARGS_1 LOAD_ARGS_0; mov.l @(0,r15),r4; cfi_restore (r4)
>> +# define LOAD_ARGS_2 LOAD_ARGS_1; mov.l @(4,r15),r5; cfi_restore (r5)
>> +# define LOAD_ARGS_3 LOAD_ARGS_2; mov.l @(8,r15),r6; cfi_restore (r6)
>> +# define LOAD_ARGS_4 LOAD_ARGS_3; mov.l @(12,r15),r7; cfi_restore (r7)
>> # define LOAD_ARGS_5 LOAD_ARGS_4
>> # define LOAD_ARGS_6 LOAD_ARGS_5
>
> Why do we need to care about these caller-save registers here? You are
> right that the current code is assymetric in that, for example,
> Âcfi_rel_offset (r0, 0)Â is used but not Âcfi_restore (r0)Â -- but what
> is the reason for doing the former in the first place (and likewise for
> the others occurences/registers)? And if it's not needed to preserve
> these registers' content, should rather the Âcfi_rel_offsetÂs be removed?
I think it has something to do with unwinding across cancellable system
calls, similar to what other architectures have (I see MIPS with similar
code).
I cannot seem to google up a really definitive explanation on why this
point of caller-saves has to be restored though.
>> diff --git a/sysdeps/unix/sh/sysdep.S b/sysdeps/unix/sh/sysdep.S
>> index e816575..939eb9a 100644
>> --- a/sysdeps/unix/sh/sysdep.S
>> +++ b/sysdeps/unix/sh/sysdep.S
>> @@ -41,13 +41,15 @@ skip:
>> sts.l pr, @-r15
>> cfi_adjust_cfa_offset (4)
>> cfi_rel_offset (pr, 0)
>> + cfi_adjust_cfa_offset (4)
>> jsr @r1
>> mov.l r0, @-r15
>> - cfi_adjust_cfa_offset (4)
>
> Hmm?
I think this was to fix the case of a frame-insn in a call delay slot
kind of problem. Imagine unwinding from inside the @r1 function (here
it's errno); it will get the stack position one word off.
But reviewing it, I think it's better to move the frame-insn (and CFI)
above the call and just fill nop.
>> diff --git a/sysdeps/unix/sysv/linux/sh/sysdep.h b/sysdeps/unix/sysv/linux/sh/sysdep.h
>> index 5215a84..f5e7524 100644
>> --- a/sysdeps/unix/sysv/linux/sh/sysdep.h
>> +++ b/sysdeps/unix/sysv/linux/sh/sysdep.h
>> @@ -120,14 +120,13 @@
>> # endif
>> # define SYSCALL_ERROR_HANDLER \
>> neg r0,r1; \
>> - mov r12,r2; \
>> - mov.l 0f,r12; \
>> + mov.l 0f,r2; \
>> mova 0f,r0; \
>> - add r0,r12; \
>> + add r0,r2; \
>> mov.l 1f,r0; \
>> stc gbr, r4; \
>> - mov.l @(r0,r12),r0; \
>> - mov r2,r12; \
>> + add r0,r2; \
>> + mov.l @r2,r0; \
>> add r4,r0; \
>> mov.l r1,@r0; \
>> bra .Lpseudo_end; \
>> @@ -139,13 +138,12 @@
>> /* Store (-r0) into errno through the GOT. */
>> # define SYSCALL_ERROR_HANDLER \
>> neg r0,r1; \
>> - mov r12,r2; \
>> - mov.l 0f,r12; \
>> + mov.l 0f,r2; \
>> mova 0f,r0; \
>> - add r0,r12; \
>> + add r0,r2; \
>> mov.l 1f,r0; \
>> - mov.l @(r0,r12),r0; \
>> - mov r2,r12; \
>> + add r0,r2; \
>> + mov.l @r2,r0; \
>> mov.l r1,@r0; \
>> bra .Lpseudo_end; \
>> mov _IMM1,r0; \
>
> Hmm. In my understanding, r12 is the standard register to use for
> accessing the GOT. I think the linker is only able to optimize this
> exact sequences. Also, isn't there another case for RTLD_PRIVATE_ERRNO?
> What about using Âcfi_register (r12, r2)Â, and later Âcfi_restore (r12)Â.
I think I simply thought that the modified sequence could avoid
clobbering r12, while also looks like to be a little bit faster; I
overlooked the point on linker optimizations.
I'll send some revised parts later.
Thanks,
Chung-Lin