[RFC] ABI bump for building with gcc4 ?

Charles Wilson cygwin@cwilson.fastmail.fm
Fri Mar 13 16:44:00 GMT 2009


Yaakov (Cygwin/X) wrote:
> Some maintainers 

That would be me.

> have mentioned that they plan to ABI-number-bump their
> libraries when they rebuild them with gcc-4.3.  Frankly, I think this is
> a bad idea, and I'll try my best to explain why.  In no particular order:

Everybody is entitled to their own opinion.

> 1) If we do this distro-wide, we will essentially double the number of
> DLLs in the distro, which will take up several times as much disk- and
> ImageBase space.  (C++ libraries in particular are *much* smaller with
> shared libstdc++6.)

True.  Until all -- or almost all -- of the distro is *slowly* rebuilt
using gcc4 -shared-libgcc.  The difference is, it CAN be slow, and
needn't happen all at once on some "flag day".

> 2) Is there precedent for this?  I certainly see none in Debian, which
> has gone through several major GCC version transitions.

It's not about the gcc version, per se. It's about the other changes to
*cygwin's* gcc ABI that accompany the switch from gcc-3 to gcc-4. To wit:

1) Obviously any C++ libraries must version bump. No choice there,
because the C++ ABI has changed.  Among other things: dwarf-2 exception
handling, the PR24196 issue (gcc3 used a specific patch to fix passing
empty strings between modules; Dave's gcc4 build uses
--enable-fully-dynamic-string. That's an ABI change). And besides, I
don't think FSF claims ABI compatibility between major releases of C++,
but I could be wrong there.

2) sjlj to dwarf-2. This also affects C indirectly, not just C++ --
having to do with unwinding exceptions thrown in Java or C++ code thru C
libraries.  For instance, suppose I'm using gtkmm, a C++ wrapper around
the gtk C libraries.  In my client code, I pass a callback to gtkmm
(which in turn hands it off to the event loop in gtk).  Something
happens, and it throws.  Presumably, because of point #1 above, both
gtkmm and my code are compiled with gcc4 and, if they are DLLs, have
been "version bumped".  BUT, the gtk DLLs have not been.  I have no way
of knowing whether the particular cyggtk-x11-2.0-0.dll (i.e. DLL number
"0") is dwarf2 or sjlj.

If we do NOT bump the DLL numbers with gcc-4, then IF the gtk DLL I have
installed alongside my new gtkmm DLLs and my client code HAS been
compiled with gcc4, all is well. But if not, then my C++ callback throws a
dwarf2 exception, which unwinds thru the sjlj gtk code...and can never
be caught, even when you eventually unwind back to gtkmm's C++ (dwarf2)
code.  "Unhandled Exception" popup.
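
To make the call chain concrete, here is a minimal sketch of the shape of
the problem. The names c_event_loop, throwing_callback and
run_through_c_layer are hypothetical stand-ins (for the gtk event loop, my
client callback, and gtkmm respectively), not real gtk or gtkmm API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Callback type accepted by the (hypothetical) C library.
typedef void (*callback_t)(void *user_data);

// Hypothetical stand-in for a C library entry point such as the gtk
// event loop.  It knows nothing about C++ exceptions; it just calls back.
extern "C" void c_event_loop(callback_t cb, void *user_data)
{
    cb(user_data);  // if cb throws, unwinding has to pass through this frame
}

// Client-side C++ callback that throws.
static void throwing_callback(void *)
{
    throw std::runtime_error("thrown from callback");
}

// Stand-in for the C++ wrapper layer (gtkmm in the scenario above): it
// hands the callback to the C layer and expects to catch what comes back.
std::string run_through_c_layer()
{
    try {
        c_event_loop(throwing_callback, NULL);
    } catch (const std::runtime_error &e) {
        return e.what();
    }
    return "not caught";
}
```

Built as a single program by a single compiler, the catch succeeds. The
failure described above needs the middle frame (c_event_loop here) to live
in a DLL built with a different unwinding scheme than the frames on either
side of it, which this single-file sketch cannot reproduce.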

What do we tell our users? "Well, with (new version) gtkmm, you have to
have gtk2-x11-runtime-2.6.10-17 or newer even though the internal DLLs
have the same name, because, well, we deliberately broke the ABI on C
libraries without updating the DLL number because it was too hard. Just
upgrade."

And then what happens if they have a different C++ app that hasn't yet
been recompiled to use the new (version bumped) C++ gtkmm library.  Once
they in-place upgrade cyggtk2, now THAT app has sjlj exceptions that
unwind thru the dwarf2 cyggtk2, before going back to the sjlj gtkmm and
(ultimately) the sjlj client code.  I guess that'd still work okay
(except...see [*] below).

But it's really ugly, and very surprising to end users.  Sure, WJM and
all that. But basically you're back to a "flag day" recompile everything
all at once issue. :-(

3) -shared-libgcc vs. -static-libgcc.  I was ALSO assuming that
"recompile with gcc4" was semantically equivalent to "and link with the
shared libgcc".  I considered this to be a significant -- possibly ABI
breaking [*] -- change, that in itself would force an ABI version bump.
Do we really want to mix DLLs and apps where some use the shared libgcc
and others directly contain pieces of the gcc3 static one (with possibly
different internals)?

[*] cygwin-gcc3 modified something fairly major in the ABI having to do
with throwing/catching exceptions thrown "across the DLL boundary". gcc4
(except for TDM's mingw-experimental builds) does not do that; instead
you just *can't* throw exceptions across the DLL boundary if you link
with the static libgcc, IIRC.  You *must* use -shared-libgcc if you want
to do that.

Now, this ABI difference (between cygwin-gcc3 and
stock-gcc3/all-but-tdm-gcc4) was way down at the libgcc level, NOT the
libsupc++ level AFAIK.  So I *think* even gcc4 -static-libgcc is an
"ABI" breakage from gcc3 -- EVEN IF our gcc4 were compiled sjlj!

All in all, *cygwin's* gcc4 is a major change from *cygwin's* gcc3, even
if upstream gcc3/gcc4 was not, and linux distros did not need to
soname-bump their libraries when making the transition.

I bet those linux distros were using dwarf2 gcc3, and --enable-shared gcc3.

Because this is a major change *for us* -- fraught with the possibility
of incompat probs -- we have two choices:

A) version bump all shared libs; newly compiled code (with dw2, gcc4,
-shared-libgcc) will use new, compatible (dw2, gcc4, -shared-libgcc)
libraries. Old code (gcc3) will continue to use old libs (gcc3).

B) FLAG day. Hey all you maintainers, you have 2 weeks to rebuild
everything you maintain.  Or bad things will happen to users and it will
be your fault, you sluggard.

Guess which way I lean? Hell, it took me a week just to work out the
issues with ncurses, and that was *without* switching compilers.

> 3) Reversioning a library requires a patch for each package.  These
> patches will never be accepted upstream, and for those libraries which
> are ABI stable, will need to be maintained ad nauseam.

Yep. We break the ABI, we pay the cost. However, the benefits are (1)
shared libgcc, (2) modern gcc, and (3) faster execution thanks to dw2. My
argument is that we ARE breaking the ABI, and we shouldn't lie about it
just because it is easier.

(Side note: Dave, what are your plans for the gcc4 mingw cross compiler?
sjlj or dw2?  Or maybe just take the mingw.org src tarball and build it
using "their" cross scripts
   http://www.mingw.org/wiki/LinuxCrossMinGW
adapted for our packaging system? etc)

> 4) Changing the ABI number of a library doesn't always make sense.  Take
>  libjpeg62 for example, 62 = 6b (the package version).  What are you
> going to call it, libjpeg63?  Then what happens when jpeg-6c is
> (finally?) released?

Well, jpeg is just dumb.  It's not my fault they chose a silly system
for their SONAMEs.  The libtool documentation (and a little thought)
will tell you, do NOT try to make your library SONAME match your package
version.  They did.  Oops.

My plan for libjpeg was to bump the DLL vernum to 100.  And then just
increment by one as needed. Then *cygwin's* dll number would, in fact,
be unrelated to the package version, as God and libtool intended.

> 5) Different ABI versions imply parallel-installability.  But what if
> the library uses dlopen()ed modules, then they would also have to be
> parallel-installable.  Changing their location would require YA patch,
> not only for that package, but also for other packages that install
> modules for that library.

Yes. I realize that -- and it's a big issue for large distributions like
your GNOME and KDE ports.

> Instead, I suggest that the entire distro be rebuilt, bottom-up, with
> gcc-4.3, and simply inform users that they may need to rebuild their own
> home-built packages for compatibility. 

Without version bumping even C++ libs?

> I'm prepared to do that with all
> my packages, and between Cygwin/X and Ports, I think I can safely say
> that I don't say that lightly.  But doing this any other way is, IMHO,
> simply not sustainable.

So you agree that the gcc3-->gcc4 transition presents possible
compatibility issues?  That new stuff WILL be incompatible with old
stuff (e.g. ABI bump vs FLAG day)?  Then why did you bring up linux:
"They didn't need to bump their ABIs" -- of course not.  They had no
compatibility issues to worry about. We do --

and it *looks* like you mostly agree with all of the compat issues I
raised.  You just come down on the other side of the (A) ABI bump vs. (B)
FLAG day argument.  (With the added fillip that "and we tell all of our
users to recompile all of their stuff, too".)

Uhm, remind me again why the next release of cygwin is called cygwin-1.7
and contains cygwin1.dll, and not cygwin-2.0 with cygwin2.dll?

I have a hunch that your plan will lead to a lot more people sticking
with cygwin-1.5 for a lot longer, even on NT+.  There will be pushback
about our plan to have a single setup.exe which will install
cygwin-1.5 only if you're running Win9x.

If we DO decide to go the FLAG day route (we've done it before, back
during the 1.3.x to 1.5.x transition), then I'd argue that we should
continue to support 1.5.x on NT+ for some transition period (6 mos? 1
yr?) AFTER cygwin-1.7 is considered stable for production use, to give
our users time to recompile all of their clients for cygwin-1.7.  Which
is supposedly backwards compatible.

--
Chuck


