Porting glibc to Coldfire

Tue Aug 15 16:27:00 GMT 2006

This patch ports glibc to Coldfire.  It was tested with the kernel
from Freescale's CWF-MCF547X-548X-2-6-KL BSP:

    http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=CWB-MCF547X-548X-2-6-KL&srch=1

with this additional patch from Freescale applied:

    http://www.codesourcery.com/archives/coldfire-gnu-discuss/msg00041.html

Everything was compiled with the csl/coldfire-4_1 branch of gcc:

    svn://gcc.gnu.org/svn/gcc/branches/csl/coldfire-4_1

which we hope to merge into mainline after 4.2 has branched.

Coldfire vs. m680x0
===================

I suppose one fundamental question is: should Coldfire be treated as
a separate port from m68k, or as a subport?  Although there are several
differences between Coldfire and m680x0, I think the architectures are
similar enough to justify treating them as variations of the same base
port.  gcc, linux and uClinux have done the same thing.

The patch therefore adopts the following directory structure:

    sysdeps/.../m68k/{,fpu/}
    sysdeps/.../m68k/m680x0/{,fpu/}
    sysdeps/.../m68k/m680x0/m68020/{,fpu/}
    sysdeps/.../m68k/coldfire/{,fpu/}

so that files can be classified as m680x0-only, Coldfire-only, or suitable
for both.  This involves moving a lot of files from m68k/ to m68k/m680x0/,
and because a patch to do that would be almost unreadable, I've attached
a shell script to do it instead.  I've made the main patch relative to
the moved files.

If a file only needs small changes for Coldfire, I've kept it in
sysdeps/.../m68k/ and used __mcoldfire__ or __mcffpu__ to select
the Coldfire parts.  Obviously it's a judgement call as to how much
variation can be treated as "small", but I hope the balance seems OK.
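
As a rough illustration of that approach, a shared file might select its
Coldfire parts along the following lines.  This is only a sketch: the
function name and the strings are hypothetical, and only the __mcoldfire__
and __mcffpu__ macros come from the compiler.

```c
#include <stddef.h>

/* Hypothetical helper: reports which variant this file was compiled for.
   On a non-m68k host neither macro is defined, so the last branch is used.  */
const char *
m68k_variant (void)
{
#if defined __mcoldfire__ && defined __mcffpu__
  return "Coldfire with FPU";
#elif defined __mcoldfire__
  return "Coldfire without FPU";
#else
  return "m680x0 (or a non-m68k host)";
#endif
}
```

Files that need more than a few blocks like this are the ones that would
be split between m68k/m680x0/ and m68k/coldfire/ instead.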

If no processor is specifically selected by the target triplet, the patch
to sysdeps/m68k/preconfigure will use the compiler to choose between m680x0
and Coldfire as appropriate.

Generic m68k fixes
==================

I came across a few problems with the existing m68k port.  Because I've
got no way of testing the port without the Coldfire support, and because
one or two of the fixes are in code that is sensitive to the Coldfire/m680x0
distinction, I'm afraid everything's lumped together.  The fixes are fairly
simple though.  Specifically:

- sysdeps.h didn't guard against multiple inclusion.

- The definitions of feholdexcept and fesetround were missing
  a libm_hidden_def().

- setjmp.c used hidden_def() rather than libc_hidden_def(), which led to:

    error: '__EI___sigsetjmp' aliased to undefined symbol '__GI___sigsetjmp'

  I've changed it to use libc_hidden_def() instead.

- In dl-trampoline.S:

  - The code that rounds the frame size used "lsr" (implicitly "lsr.w")
    rather than lsr.l, causing it to mishandle large frames:

        | Round framesize up to even
        addq.l #1, %d1
        lsr #1, %d1
        sub.l %d1, %a0
        sub.l %d1, %a0

  - The code that calls _dl_call_pltexit() failed to initialize the
    lrv_a0 field of the outregs parameter, which in turn meant that
    the contents of lrv_fp0 were at the wrong offset.  Also, the inregs
    parameter pointed 4 bytes below the structure it was supposed to
    point at.

    I've fixed these problems and adjusted the stack offsets of other
    data to account for the extra field.

- The port was missing ldsodefs.h and tst-audit.h.  These files are
  needed because upstream sources no longer provide the m68k definitions.

- struct fpregset was out of sync with linux.  linux puts the
  data registers after the control registers, but glibc had them
  the other way round.

- The layout of struct ucontext was also out of sync with linux.
  uc_sigmask should come after uc_filler, and uc_filler should
  have 80 rather than 174 elements.

- m68k glibc was using the standard linux layout of struct siginfo, but
  m68k linux uses a different layout.  It appears that the uid fields
  were once 16-bit fields on m68k linux, and that, to avoid breaking
  backward compatibility, 32-bit versions were later tacked on to the
  end of each substructure.  I've therefore added an m68k linux-specific
  siginfo.h file.

- The generic implementation of wcpcpy.c accesses the source string
  using an offset from the destination string:

    wchar_t *
    __wcpcpy (dest, src)
	 wchar_t *dest;
	 const wchar_t *src;
    {
      wchar_t *wcp = (wchar_t *) dest - 1;
      wint_t c;
      const ptrdiff_t off = src - dest + 1;

      do
	{
	  c = wcp[off];
	  *++wcp = c;
	}
      while (c != L'\0');

      return wcp;
    }

  which only works if the byte distance between src and dest is a
  multiple of sizeof (wchar_t), i.e. if sizeof (wchar_t) equals
  __alignof__ (wchar_t).  On m68k, the values are 4 and 2 respectively,
  so the routine won't work if
  ((intptr_t) dest % 4) != ((intptr_t) src % 4).

  wcscpy.c (which was written a year earlier) does check the alignment,
  and so works out of the box on m68k.  I don't think there's any chance
  of getting the upstream version of wcpcpy.c changed in the same way,
  so I've added a port-local version.  I've also done the same for
  wcpcpy-chk.c, which has the same problem.

- m68k/sysdep-cancel.h wrongly treated __librt_multiple_threads as
  hidden, and the assembler version of SINGLE_THREAD_P used PC-relative
  addressing to access it.  I've removed the hidden attribute and made
  librt's SINGLE_THREAD_P load the symbol from the GOT instead.  The new
  implementation of SINGLE_THREAD_P needs a temporary address register,
  which is passed as an argument to the macro.
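
For the wcpcpy problem described in the list above, a port-local version
only needs to avoid the "src - dest" pointer arithmetic.  The following is
a minimal sketch of one way to do that, not the code in the patch; the
name sketch_wcpcpy is hypothetical.

```c
#include <wchar.h>

/* Copy SRC to DEST and return a pointer to the terminating null in DEST.
   Each pointer is advanced on its own, so nothing depends on the byte
   distance between SRC and DEST being a multiple of sizeof (wchar_t).  */
wchar_t *
sketch_wcpcpy (wchar_t *dest, const wchar_t *src)
{
  while ((*dest = *src++) != L'\0')
    ++dest;
  return dest;
}
```

The same structure would carry over to a checking variant, with the
destination length tested before each store.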

Optimizations
=============

I had to change the implementation of the string and memory functions
for Coldfire, and noticed that some of them could be optimized slightly.
When trying to reach an alignment boundary, the current code moves the
address into a data register and "and"s it with 3 to see if it is
already aligned.  If it isn't aligned, the code repeats the check
one byte later, and again for the byte after that.  It would be simpler
to use subq and addq on the first "and" result instead.  (We can use
addq.w and subq.w on m680x0.)  From what I remember of 68000, I think
this is better for 680x0 targets too.
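
In C terms, the idea is roughly as follows.  This is only a sketch of the
control flow, not the assembly in the patch; sketch_memset and its exact
layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *
sketch_memset (void *s, int c, size_t n)
{
  unsigned char *p = s;

  /* A single "and" with 3 gives the misalignment.  The number of leading
     bytes is derived from that one result instead of re-testing the
     address after every byte.  */
  size_t head = (4 - ((uintptr_t) p & 3)) & 3;
  if (head > n)
    head = n;
  n -= head;
  for (; head > 0; --head)
    *p++ = (unsigned char) c;

  /* P is now longword-aligned (or N is 0): store four bytes at a time.
     memcpy is used so that the sketch stays within C's aliasing rules.  */
  uint32_t fill = UINT32_C (0x01010101) * (unsigned char) c;
  for (; n >= 4; n -= 4, p += 4)
    memcpy (p, &fill, sizeof fill);

  /* Trailing bytes.  */
  for (; n > 0; --n)
    *p++ = (unsigned char) c;

  return s;
}
```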

Coldfire changes
================

The main differences between the Coldfire and m680x0 code are as follows:

- FPU differences:

  - FP registers are 64 bits rather than 96 bits wide.

  - Coldfire does not have the 68881's fmovem.l; we must save and restore
    individual control registers.

  - Long doubles are the same as doubles.

  - The canonical NaN has all significand bits set.  Some files in
    ieee754/dbl-64 use hard-coded hex constants, so I've overridden
    them (e_pow.c, s_sin.c and u_remainder.c).

  - Unlike the 68881, the Coldfire FPU lets you raise exceptions by
    setting the appropriate EXC bits of the FPSR and then executing
    an arithmetic instruction.  This makes the implementation of
    fraiseexcpt.c easier.

  - The Coldfire FPU has a much smaller set of instructions than the 68881.
    The functions it does support directly are: fabs(), sqrt(), lrint(),
    rint(), and their float and long double equivalents.

- ISA differences:

  - 32-bit PC-relative offsets must be loaded into a register and then
    applied using offset(%pc,reg).  I've added a PCREL_OP macro to wrap up
    this difference.

  - Coldfire does not have jmp (%dN) and jsr (%dN).  Those instructions are
    used in dl-trampoline.S in cases where every address register is live,
    so I've simulated them using push and rts instructions.

  - Coldfire strongly prefers a 32-bit aligned stack pointer, so I've
    rounded frame sizes up to longword rather than word alignment.

  - Coldfire does not have dbra, exg or word-sized register operations.

- Kernel differences:

  - FPU-related fields are often laid out differently.

  - FP registers have different ptrace() numbers.

  - sigcontext has fields for all registers, avoiding the need for the
    real_catch_segfault hack in register-dump.h.

- Coldfire has no atomic compare-and-swap instruction and the kernel
  does not yet have any userspace atomicity support.  I've therefore
  used the generic bits/atomic.h implementation, but with the addition
  of the now-required atomic*_t types.  (I don't think upstream would
  allow these types to be added to the generic bits/atomic.h as none of
  the core targets use that file.)
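
The two frame-size rounding rules mentioned in the list above (word
alignment on m680x0, longword alignment on Coldfire) amount to the
following arithmetic.  The patch does this in assembly; the C function
names here are hypothetical and are only meant to show the difference.

```c
#include <stdint.h>

/* m680x0: round a frame size up to a multiple of 2 bytes.  */
uint32_t
round_up_word (uint32_t size)
{
  return (size + 1) & ~UINT32_C (1);
}

/* Coldfire: round a frame size up to a multiple of 4 bytes, so that the
   stack pointer stays 32-bit aligned.  */
uint32_t
round_up_longword (uint32_t size)
{
  return (size + 3) & ~UINT32_C (3);
}
```

For a 6-byte frame, for example, the first function leaves the size at 6
while the second rounds it up to 8.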

Compatibility
=============

Because Coldfire is a new port, we don't need to be compatible with
versions before 2.4.  So:

- I've set the default version to GLIBC_2.4 in
  sysdeps/m68k/coldfire/shlib-versions.

- I've moved oldgetrlimit and oldsetrlimit from
  sysdeps/unix/sysv/linux/m68k/syscalls.list to the new
  sysdeps/unix/sysv/linux/m68k/m680x0/syscalls.list.

Expected test failures
======================

As far as the testsuite goes, some tests failed for me because of the
usual environmental limitations.  For example, the board had only 64MB
of RAM, which isn't enough for some tests, and the root fs was
NFS-mounted, which causes tests like tst-utmp and tst-utmpx to fail.

There are some expected non-environment failures too:

math/test-misc.out
misc/tst-efgcvt.out
stdio-common/tst-printf.out

  - These tests require correct subnormal handling.  The kernel does
    not yet emulate subnormal operations.

build rt/tst-aio2.o
build rt/tst-aio3.o

  - These tests should (but don't) include <pthread.h>, as they refer
    to PTHREAD_BARRIER_SERIAL_THREAD.  Changes are unlikely to be
    accepted upstream because NPTL ports presumably work as-is.

rt/tst-aio10.out
rt/tst-aio9.out

  - A LinuxThreads limitation.  We implement lio_listio using
    pthread_cond_wait, which does not stop and return EINTR when
    a signal is raised.  NPTL avoids this using aio_misc.h.

math/test-double.out
math/test-float.out
math/test-idouble.out
math/test-ifloat.out

  - All four tests fail some llrint_upward and llrint_downward checks
    because of a bug in the generic llrint.c code; see bug #2592.
    test-float.out and test-double.out also fail because the Coldfire
    FPU does not distinguish between quiet and signalling NaNs;
    all NaN inputs raise an Invalid Operation exception.

As a sanity check, I've also built a 68020 glibc.  I used
csl/coldfire-4_1 again, but with the attached mainline backports applied,
and with gcc configured using --with-cpu=68020 and --with-float=hard.

The patch is in three pieces: the initial move-only script, the main
ports patch, and a linuxthreads patch.  There is talk of supporting
NPTL in future, but nothing definite yet.

Please install if OK.

Richard

Attachments:

    glibc-move.clog
    http://sourceware.org/pipermail/libc-ports/attachments/20060815/02569106/attachment.ksh

    glibc-move.sh
    http://sourceware.org/pipermail/libc-ports/attachments/20060815/02569106/attachment-0001.ksh

    glibc.clog
    http://sourceware.org/pipermail/libc-ports/attachments/20060815/02569106/attachment-0002.ksh

    glibc.diff
    http://sourceware.org/pipermail/libc-ports/attachments/20060815/02569106/attachment-0003.ksh

    glibc-linuxthreads.clog
    http://sourceware.org/pipermail/libc-ports/attachments/20060815/02569106/attachment-0004.ksh

    glibc-linuxthreads.diff
    http://sourceware.org/pipermail/libc-ports/attachments/20060815/02569106/attachment-0005.ksh

    gcc-backports.diff
    http://sourceware.org/pipermail/libc-ports/attachments/20060815/02569106/attachment-0006.ksh