This is the mail archive of the mailing list for the binutils project.
Re: [PATCH] enable fdpic targets/emulations for sh*-*-linux*
- From: Rich Felker <dalias at libc dot org>
- To: Oleg Endo <oleg dot endo at t-online dot de>
- Cc: binutils at sourceware dot org
- Date: Sun, 4 Oct 2015 22:26:55 -0400
- References: <20150930143555 dot GD8645 at brightrain dot aerifal dot cx> <1443627005 dot 2509 dot 189 dot camel at t-online dot de> <20150930183810 dot GE8645 at brightrain dot aerifal dot cx> <1443715139 dot 2031 dot 134 dot camel at t-online dot de> <20151001164630 dot GI8645 at brightrain dot aerifal dot cx> <1443804962 dot 2031 dot 290 dot camel at t-online dot de> <20151002175223 dot GU8645 at brightrain dot aerifal dot cx> <1443863059 dot 2031 dot 433 dot camel at t-online dot de> <20151003185947 dot GC8645 at brightrain dot aerifal dot cx> <1443929574 dot 2031 dot 506 dot camel at t-online dot de>
On Sun, Oct 04, 2015 at 12:32:54PM +0900, Oleg Endo wrote:
> On Sat, 2015-10-03 at 14:59 -0400, Rich Felker wrote:
> > >
> > > Sure, that can be done, too. Actually, you can have the function
> > > pointer table in the TLS, which makes it reachable via GBR:
> > > mov.l @(disp, gbr), r0
> > > jsr @r0
> > > nop
> > Again, that's unfortunately not possible because positive offsets from
> > GBR belong to the application's initial-exec TLS. The TLS ABI really
> > should have defined GBR to point 1024 bytes below the start of TLS
> > rather that at the start of TLS, so that up to 1k of TCB space could
> > be accessed via the short/fast GBR-based addressing. This would not
> > require reserving that much actual space (which would be a horrible
> > idea -- huge waste of memory per thread) but would just allow it it to
> > be assigned from the end downwards as needed. This is what most other
> > risc archs with limited-range immediates did.
> So fix the TLS ABI? Anyway you're building a new system...
> The same @(disp,gbr) loads/stores can be used to get/set errno. Not
> that a lot of apps out there actually use errno, but the standard
> requires it.
The ABI is a contract between multiple components, possibly with
diverse maintainers and users. From my perspective, treating it as
something you can just change on a whim is irresponsible -- especially
without any evidence that doing so is even going to have measurable
benefits -- and not the way to establish a platform as something
mature and attractive to developers/users.
Note that changing this would require changes in at least libc and
binutils, and probably also gcc, and you would need matching versions
of all of them. And I have no idea if there are other third-party
tools that would also be affected. LLVM does not yet support SH but we
want it to. NetBSD probably has an old GCC 4.2 fork doing SH, and
Aboriginal Linux likewise. Etc. Of course they can just stick with an
old ABI but then you have a much more fragmented ABI-scape and it
makes it much harder to mix tools.
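For reference, the reach of the short GBR form discussed in the quoted text can be checked mechanically. On SH, "mov.l @(disp,GBR),R0" encodes an 8-bit unsigned displacement scaled by 4, so only byte offsets 0..1020 (multiples of 4) above GBR are reachable. The following is a minimal C model of that limit and of the hypothetical layout from the quote, where GBR would point 1024 bytes below the start of TLS. The names GBR_DISP_MAX, TCB_BIAS, gbr_disp_ok, tcb_field_off and encode_gbr_disp are illustrative only and do not come from binutils, gcc or any libc.

```c
#include <assert.h>
#include <stdbool.h>

/* SH "mov.l @(disp,GBR),R0": 8-bit unsigned disp, scaled by 4,
   so the largest reachable byte offset is 255 * 4 = 1020. */
#define GBR_DISP_MAX 1020L

/* Hypothetical layout (not the current ABI): GBR points TCB_BIAS
   bytes below the start of the TLS area. */
#define TCB_BIAS 1024L

/* True if a byte offset from GBR fits the short mov.l encoding. */
static bool gbr_disp_ok(long off)
{
    return off >= 0 && off <= GBR_DISP_MAX && off % 4 == 0;
}

/* GBR-relative offset of a TCB field stored `below` bytes under the
   start of TLS, under the hypothetical TCB_BIAS layout. */
static long tcb_field_off(long below)
{
    return TCB_BIAS - below;
}

/* Value of the 8-bit disp field for a reachable byte offset. */
static unsigned encode_gbr_disp(long off)
{
    assert(gbr_disp_ok(off));
    return (unsigned)(off / 4);
}
```

Under that hypothetical bias, a TCB field 4 bytes below the TLS start lands at GBR+1020 (disp 255), and fields down to 1024 bytes below stay reachable. The TLS block itself would start at GBR+1024, just outside the short form's range, which is why such a layout could only be allocated "from the end downwards" as the quote describes.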
> > The only hope for the code running without knowledge and
> > conditional use of the newer ISA extensions is that the OS can
> > reliably notice and trap whatever old simulated atomics were used and
> > convert them to something that synchronizes memory. I advised the
> > OpenRISC developers on this issue early in their porting of musl to
> > or1k and quickly got real atomics added to the ISA so that they
> > wouldn't run into a nasty issue like this in the future. OTOH
> > Linux/MIPS handled the issue just by pretending all MIPS ISA levels
> > have the ll/sc instructions and requiring the kernel to trap and
> > emulate them on ancient hardware. That would have worked for J2 as
> > well but would have given really really bad performance.
> I'm severely confused. First you say performance of atomics doesn't
> matter (you're OK to add a 2x-3x overhead for the runtime switched
> version compared to compiler inlined). But now you are concerned about
> atomics performance. So which?
The MIPS trap probably takes 1000+ cycles (actually I'm looking for
someone with the real hardware to test it so we can make reasonable
cost assessments based on this in musl). So we're looking at
completely different scales of "bad performance".
For micro-optimizing calling conventions, you're at best going to make
a difference of several percent (e.g. spending 3% of time instead of
6% of time in that code). For making something that should take tens
of cycles take 1000+, you're likely going to go from 5% time spent in
that code to 95% time spent in that code. (All these numbers are very
rough order of magnitude; if anyone cares we can work out some actual
math for them.)
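A back-of-the-envelope version of that math, as a minimal sketch: it assumes an Amdahl-style model in which only the affected code changes cost and the rest of the runtime stays fixed. If a fraction f of the runtime is spent in some code and that code becomes k times as expensive, its new share is fk / ((1 - f) + fk). Solving for k gives the factor needed to move the share from f to g: k = g(1 - f) / (f(1 - g)). The function names new_fraction and slowdown_for are hypothetical.

```c
#include <assert.h>

/* New share of total runtime for code that used to take fraction f,
   after its cost is multiplied by `slowdown` (everything else fixed). */
static double new_fraction(double f, double slowdown)
{
    assert(f >= 0.0 && f <= 1.0 && slowdown > 0.0);
    double hot = f * slowdown;        /* rescaled time in that code */
    return hot / ((1.0 - f) + hot);   /* divided by new total time  */
}

/* Cost factor needed to move that code's share from f to g. */
static double slowdown_for(double f, double g)
{
    assert(f > 0.0 && f < 1.0 && g > 0.0 && g < 1.0);
    return (g / (1.0 - g)) * ((1.0 - f) / f);
}
```

In this model, halving the cost of code that takes 6% of the runtime leaves it at about 3.1%, and moving a 5% share to 95% corresponds to a cost factor of about 361 (19 x 19). Actual cycle counts for the trap path would still have to be measured on real hardware, as noted above.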
> > Modulo the sigcontext ABI issue and the gratuitously different syscall
> > trap numbers (the latter of which I have a pending kernel patch to
> > fix, but it's not getting any attention because there's no maintainer
> > for SH and without a maintainer nobody can really touch design/policy
> > type issues like this...).
> Maybe for now it's more productive to create an SH linux branch (which
> is updated from mainline periodically of course) and send a pull request
> to some global maintainer after things have settled and have been
> working for a while.
I think that's a big risk for both users and for us. You've commented
yourself on how stuff (other than Linux, but same issue) has a history
of getting developed in a fork that eventually bitrots and gets
abandoned. I don't want to be in a position of asking users to trust
that forks will be properly maintained, and I also don't want to risk
that upstream will keep rejecting the changes we want and leave us
with incompatible kernel APIs/ABIs. This principle of avoiding
unilateral decisions that lead to API/ABI forks and working for
consensus with the various parties that have an interest is also
something I've stressed with musl and which I want to follow here.
> > Of course if you're running on actual sh2 hardware all the libs need
> > to refrain from using instructions from sh3/sh4/sh4a. But the same
> > dynamic binaries (built for sh2 ISA) can run just fine on sh4 (modulo
> > the sigcontext issue) with sh4-nofpu versions of the libraries
> > installed for better performance (and even with hard-float used
> > internally, like on ARM softfp).
> Yes, that's what I was saying. Except that using hard-float
> "internally" and soft-float "externally" is not supported by the
> compiler at the moment.
This would be nice to add. After finishing FDPIC I'll look into it.
ABI and instruction use should be separate options, not conflated.
Presumably this should not involve much work beyond identifying which
conditionals need to be on "has fpu" and which need to be on "using
> AFAIK, there is also no mechanism for the dynamic linker to pick the
> right libraries. E.g. when loading an SH2-nofpu ELF on an SH4-fpu
> system, it should pick the SH2-nofpu compatible libraries.
For musl there already is. Each ABI has its own dynamic linker/libc,
which in turn has its own library path configuration file. This same
approach allows us even to have multiple archs on the same filesystem