An article about the Cygnus tree

Michael Sokolov msokolov@ivan.Harhan.ORG
Mon Sep 4 14:21:00 GMT 2000


Hi there,

In light of the question of merging the src and gcc repos recently raised again
in a thread on the GCC list about adding zlib to GCC, I've written an article
about the Cygnus tree. It appears below. I hope it will start a better
discussion about merging the repos and breaking the code of silence around the
Cygnus tree.

This article also appears on my FTP site in /pub/embedded/cygnus-tree-intro on
ivan.Harhan.ORG. Whenever I release a toolchain for one of the embedded systems
I'm working with based on the Cygnus tree, I have to explain to my users what
it is, given its obscurity. Now I can just refer them to my article.

Enjoy!

--
Michael Sokolov		Harhan Engineering Laboratory
Public Service Agent	International Free Computing Task Force
			International Engineering and Science Task Force
			615 N GOOD LATIMER EXPY STE #4
			DALLAS TX 75204-5852 USA

Phone: +1-214-824-7693 (Harhan Eng Lab office)
E-mail: msokolov@ivan.Harhan.ORG (ARPA TCP/SMTP) (UUCP coming soon)

Here is the article:

			An Introduction to the Cygnus Tree

				By Michael Sokolov
			International Free Computing Task Force

			@(#)cygnus-tree-intro	1.1	00/09/04


			1. What is the Cygnus tree?

"The GNU configure and build system" by Ian Lance Taylor gives the following
answer:

	The Cygnus tree is used for various packages including gdb, the GNU
	binutils, and egcs.  It is also, of course, used for Cygnus releases.
	It is the build system which was developed at Cygnus, using the Cygnus
	configure script.  It permits building many different packages with a
	single configure and make.  The configure scripts in the tree are being
	converted to autoconf, but the general build structure remains intact.

During the 1990s Cygnus Solutions (now part of Red Hat, Inc.) created a
remarkable system that is usually called the Cygnus tree. They took many GNU
programs (all of which were designed as completely self-contained stand-alone
packages), most importantly the ones used for software development (gcc, gas,
binutils, and gdb), and unified them into one source tree with a single
top-level configure script and a single top-level Makefile. This allows all
these packages to be configured, built, and installed together in one fell
swoop.

Initially, the Cygnus tree existed inside Cygnus only and was distributed only
to their customers. At that point, most of the GNU programs were simply grafted
into the tree with no significant changes. These programs existed and were
completely public in their original GNU form, and the Cygnus tree was just a
fancy packaging option of not much interest to people outside of Cygnus.
Therefore, there was no pressure for a public Cygnus tree.

However, as time went on, it turned out that Cygnus was the best and most
active contributor to some of the GNU programs involved. As a result, they took
over the maintenance of those programs and changed and enhanced them
significantly. These GNU programs were gcc, gas, binutils, and gdb. Cygnus took
over the maintenance of gas, binutils, and gdb back in the early to mid 1990s,
doing development behind closed doors and making occasional public releases.
Development of these three was finally opened to the public in May 1999. gcc is
bigger and has many more people interested in its development, so Cygnus didn't
take over its development and do it behind closed doors. First they
synchronised their work on it (in their internal Cygnus tree) with the FSF
maintainers. Then in late 1997 they created an open development project for it
which they named EGCS. It was run by Cygnus and competed with the FSF's gcc
project. Finally, in spring 1999 the FSF closed their gcc project and EGCS was
renamed to GCC.

All these programs have been integrated into the Cygnus tree so completely that
they no longer exist separately from it. Moreover, some of these programs are
not even single modules any more. The Cygnus tree consists of many
subdirectories called modules and some top-level glue. Initially, there was one
module for each GNU program grafted into the tree. Then, however, Cygnus added
some new modules, split some existing ones, and made some existing modules
dependent on some new ones. As a result, some of the modules that initially
(long ago) were grafted into the tree from stand-alone GNU packages can no
longer be pulled back out of the tree and used separately.

As Cygnus took over the maintenance of several GNU programs, they started
making new releases of them. However, these programs had already been converted
to work in the Cygnus tree only. How could they release them separately then?
The answer is that these releases are NOT like typical GNU packages that have
the program in the top-level directory of the distribution tarball. Instead,
these releases are actually pruned checkouts of the Cygnus tree, made to look
indistinguishable to the untrained eye from typical GNU packages. An important
property of the Cygnus tree is that it doesn't have to be complete. The top-
level configure script and Makefile check whether each directory they are about
to descend into is actually present, and if it isn't, it's silently skipped. As
a result, you can prune the Cygnus tree down to contain only the modules you
need, and then just those modules will be configured and built. Cygnus-made GNU
releases are all Cygnus tree checkouts pruned down to contain only the modules
needed by the GNU release in question.
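
The skipping behaviour described above can be sketched in a few lines of shell.
This is only an illustration of the idea, not code taken from the actual
top-level configure script or Makefile; the function name present_modules is an
assumption made for the example.

```shell
#!/bin/sh
# Illustration only: the real top-level configure script is far more
# elaborate. present_modules is a hypothetical helper, not part of the tree.
# Usage: present_modules TOPDIR module...
present_modules() {
    topdir=$1
    shift
    for module in "$@"; do
        # A module counts as present only if its directory exists;
        # anything missing is skipped without complaint.
        if [ -d "$topdir/$module" ]; then
            printf '%s\n' "$module"
        fi
    done
}
```

In a tree pruned down to libiberty and gcc, calling
present_modules . libiberty bfd gcc gdb prints only libiberty and gcc, which
is the set a build of that pruned tree would descend into.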

GCC is still a single module in the Cygnus tree, but it now depends on the top
level and on libiberty, the miscellaneous support code module that virtually
every other module depends on. Current GCC releases consist of the top level,
libiberty, the actual gcc module, and several modules with target libraries for
different high-level languages that GCC now supports. Current Binutils releases
consist of a lot of modules. There is still a binutils module that loosely
corresponds to the old GNU binutils package, but GNU ld is now a separate
module. (gas has always been a separate module.) The binutils, gas, and ld
modules depend extensively on bfd and opcodes, two major host library modules
invented by Cygnus. Current GDB releases also consist of a lot of modules. gdb
itself is still a single module in the Cygnus tree, but it now depends
extensively on bfd and opcodes, same as Binutils, as well as on some other
Cygnus-added modules that Binutils doesn't use, and of course on libiberty,
which is included in all current GCC, Binutils, and GDB releases.

In addition to the above GNU programs, the Cygnus tree contains many
interesting non-GNU modules developed by Cygnus, most of which have been opened
to the public. These include Newlib, Cygwin, a number of Tcl tools and GUI
libraries, an extensive testsuite framework, and CGEN.

	2. So where is this thing in terms of public CVS and mailing lists?

This is where we currently have a problem. When Cygnus tree-based EGCS/GCC,
Binutils, and GDB were first opened to the public, they were in the form of
pruned Cygnus tree checkouts. We ended up with three GNU projects each having
its own (stale, corresponding to a snapshot from the Cygnus tree at some point)
copy of the top-level files, libiberty, and most of the headers. With Binutils
and GDB it was even worse, as they had their own copies of bfd and opcodes,
both of which are actively maintained and rapidly changing modules.

This is OK for releases, but it's a problem for development. After all, the
whole point of release branches and release engineering is to produce stability
in a single software component, regardless of the staleness and deviation from
the mainline this almost always causes. Before Cygnus opened the development to
the public, they internally had a very sensible model: one master source tree
with one master copy of each module where all development is done, so all
developers are always on the same page, and specific bits of the tree are sent
off on release branches as releases are made. Each release will inevitably have
some oddities introduced into it by the release engineering process, and some
things may become less generic than they could be (for example, the top-level
configure and Makefile will still remind curious code readers of the other
modules in the tree, but because of the potentially incompatible changes on the
release branch, there may not be perfect interoperability with them). However,
developers always have one single master tree to work with, to fix major bugs
in, and to make major improvements to. It is perfectly synergistic and self-
consistent, at the price of less stability because it isn't release-engineered.

This means that releases have the property that some bits in them may be stale
or duplicated elsewhere, both of which are highly undesirable for developers.
The way releases should really work is by letting end users who are not
expected to do their own development and bugfixing have a stable release that
doesn't need any bugfixing. But as soon as a user does find a bug he/she wants
to fix him/herself, he/she should really put the release aside, get the master
copies of all components involved, fix any bugs there, and submit the fixes to
the respective maintainers. This is the Free Software way, and this is what
makes Free Software thrive.

However, this arrangement was hampered by EGCS/GCC, Binutils, and GDB starting
life as public GNU projects in the same branched form in which they exist in
releases. What they should have done was to create a public Cygnus tree, fully
explain to everyone what the Cygnus tree is, and have all development proceed
in it like it did inside Cygnus before. But instead each of EGCS, Binutils, and
GDB began life in its own CVS repository containing the same thing the FSF
release tarballs have in them. The renaming of EGCS back to GCC didn't change
anything.

Unfortunately, the GCC (former EGCS) maintainers seem not to have grasped yet
that this arrangement is troublesome and just plain wrong. (And they have been
living with it since the start of the EGCS project!) Fortunately, the Binutils
and GDB maintainers were much quicker to realise this. Almost immediately after
the opening of these projects it became clear that a single master copy of each
component is needed, instead of two teams working divergently on two branches
made off of a once unified Cygnus tree. Also at the same time parts of the
Cygnus tree other than GCC, Binutils, and GDB (i.e., the Cygnus-developed non-
GNU modules) were being opened to the public. Those aren't GNU projects and
virtually everyone doing significant work on them is from Cygnus. Those folks
are used to the Cygnus tree and know that doing it any other way is just plain
wrong. Therefore, it became clear that the Cygnus tree needed to be brought
back.

In February 2000, less than a year after the opening of Binutils and GDB, their
separate CVS repositories were liquidated. Instead, a new CVS repository was
created which was to be the public Cygnus tree. It is /cvs/src on
sources.redhat.com (formerly sourceware.cygnus.com), and from the start it was
designed as a real full Cygnus tree repository, rather than a repository for
just one project, which is what all their earlier public CVS repositories had
been. (You can check the CVS log on its modules file to convince
yourself.) The modules that make up Binutils and GDB were moved into it,
eliminating the separate Binutils and GDB repositories that existed before.
Immediately after that Newlib and Cygwin were imported from Cygnus' internal
tree (these are Cygnus-developed non-GNU tree modules), confirming without any
doubt that finally the public Cygnus tree was born.

After being born in February 2000, the public Cygnus tree in /cvs/src on the
sourceware machine matured quickly. It is now almost complete. Unfortunately,
there is still one omission. This omission and the need for users/developers to
compensate for it manually is the reason why I'm boring you here with history
lessons instead of just telling you where to get the public Cygnus tree and
what to do with it.

This omission is GCC. Ever since the start of EGCS in late 1997 it has lived in
its own CVS repository. Currently it is /cvs/gcc on the sourceware machine
(sources.redhat.com). Officially it's on gcc.gnu.org, but the dirty little
secret is that gcc.gnu.org is just a DNS record that points to the very same
sourceware machine as sources.redhat.com. Everything else has now been
integrated into the /cvs/src repository, bringing back the Cygnus tree in all
its glory. However, there is also the /cvs/gcc repository duplicating a lot of
it. In effect, instead of one unified public Cygnus tree we, the non-Cygnus
folks who don't have access to their original internal tree, have two trees to
deal with: /cvs/gcc and /cvs/src. The former contains the gcc module and all
language library modules, the latter contains everything else. The two
duplicate all the top-level files and the libiberty module.

Most people working on this code have by now realised that it is really
designed to be in one Cygnus tree, and that's how they work on it. Our unspoken
convention is now to locally construct this tree from the two repositories,
work on it, and check the changes into the right repo(s). The parts that are in
only one of the repositories are simply taken from it and combined into one
tree. The trickier parts are the ones that are duplicated in the two repos.
These are the top-level files, the headers, and libiberty. /cvs/gcc's libiberty
is considered the master one, so that one is usually taken. However, most
commits to it are also simultaneously made to /cvs/src's copy, and the latter
is also periodically replaced with the former, so it usually works just as
well. The include directory in the Cygnus tree contains the public headers for
all modules. It currently exists in both repos. /cvs/src's copy contains the
headers for all modules and /cvs/gcc's copy contains only the headers for
libiberty. The latter follow the same rules as libiberty itself. Finally, there
are the top-level files. These must know about all the modules in the tree.
Most people changing these files now check each change into both repos.
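
One way to see how far the duplicated top-level files have drifted apart is to
compare the two checkouts directly. The following is a minimal sketch, assuming
both repos are already checked out locally; the function name and its two
directory arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only. differing_top_level_files is a hypothetical helper.
# Usage: differing_top_level_files SRC_TREE GCC_TREE
# Prints the name of every regular file that exists at the top level of
# both trees but whose contents differ. Files present in only one tree
# and all subdirectories are ignored.
differing_top_level_files() {
    src_tree=$1
    gcc_tree=$2
    for f in "$src_tree"/*; do
        [ -f "$f" ] || continue                 # skip directories
        name=$(basename "$f")
        [ -f "$gcc_tree/$name" ] || continue    # not duplicated, skip
        # cmp -s is silent and only reports through its exit status.
        cmp -s "$f" "$gcc_tree/$name" || printf '%s\n' "$name"
    done
}
```

Any file it prints is a candidate for a change that was checked into one repo
but not the other.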

The /cvs/src repo has one top-level directory in it also named src, and that
directory has the Cygnus tree in it (sans GCC). You can check it out in its
entirety with:

cvs -d :pserver:anoncvs@sources.redhat.com:/cvs/src co src

This will take a lot of time and disk space. The /cvs/src repo also has a
modules file. Remember, the Cygnus tree has lots of modules in it, and most
people work only on those modules that interest them. The modules file in the
/cvs/src repo allows checking out partial Cygnus trees, and the comments in its
CVS log indicate that it is modeled after the modules file of Cygnus' internal
repo, meaning that this is how the modules file of a real Cygnus tree should
look. It has CVS checkout modules defined for the most common Cygnus tree
module combinations that are normally checked out together. Since CVS checkout
module names exist in the same namespace as the top-level directories in the
repo, of which there is only one (src), there are no conflicts. (In particular,
there is no conflict between CVS checkout modules and the Cygnus tree modules,
the latter being one level below in the src directory.)

The /cvs/gcc repo has one top-level directory in it named egcs, and it has this
repo's version of the Cygnus tree in it. You can check it out in its entirety
with:

cvs -d :pserver:anoncvs@sources.redhat.com:/cvs/gcc co egcs

The /cvs/gcc repo's modules file doesn't do much. It has an egcs-core CVS
checkout module defined that checks out the tree without the language front
ends and target libraries, but other than that, this repo is normally checked
out in its entirety.

As for mailing lists, currently most of the public projects in the Cygnus tree
have their own project-specific mailing lists, but there are no mailing lists
for the Cygnus tree overall, leaving the top-level files and many less popular
modules homeless.

				3. Our Current Solution

So what do we do about it? As I've just explained, we really want and need the
Cygnus tree, but there currently isn't a single public CVS repository for it.
The current solution is for people to construct full Cygnus trees on their
local machines from the two CVS repos and to keep track of these two repos in
development. Here is the procedure I use for constructing the full Cygnus tree
from the two repos:

1. Check out both repos (either in their entirety or only the parts you want).
You'll have two partial Cygnus trees in different directories.

2. Create a directory for the combined Cygnus tree.

3. Populate it with everything from the src repo except the include and
libiberty subdirectories.

4. Add gcc, libiberty, and the language target libraries from the gcc repo.

5. Create the include subdirectory and populate it by merging the include
subdirectories of the two repos. For files that are present in both, use the
gcc repo's version.

Explanation. This procedure is designed with the following two points in mind:

1. libiberty and its headers are taken from the gcc repo, which is considered
the master copy.

2. The procedure I just outlined uses the top-level files from the src repo.
The ones from the gcc repo could have been used instead. Most of the time both
will work equally well. I personally prefer to take them from the src repo
because it was specifically designed as the real full Cygnus tree repo.
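
The five steps above can be sketched as a shell function. This is a hedged
illustration, not a tested tool: the name combine_trees and its calling
convention are assumptions, the language target library directories must be
named by the caller because they depend on the checkout, and the CVS
administrative directories are simply copied along without any attempt to sort
out later check-ins.

```shell
#!/bin/sh
# Sketch of the five-step procedure. combine_trees is a hypothetical helper.
# Usage: combine_trees SRC_TREE GCC_TREE DEST [LIBDIR...]
#   SRC_TREE  checked-out "src" directory from the src repo   (step 1)
#   GCC_TREE  checked-out "egcs" directory from the gcc repo  (step 1)
#   LIBDIR    language target library directories to take from GCC_TREE
combine_trees() {
    if [ $# -lt 3 ]; then
        echo "usage: combine_trees SRC_TREE GCC_TREE DEST [LIBDIR...]" >&2
        return 2
    fi
    src_tree=$1
    gcc_tree=$2
    dest=$3
    shift 3

    # Step 2: create the directory for the combined tree.
    mkdir -p "$dest" || return 1

    # Step 3: everything from the src repo except include and libiberty.
    # The second pattern picks up dotfiles, which "*" does not match.
    for entry in "$src_tree"/* "$src_tree"/.[!.]*; do
        [ -e "$entry" ] || continue    # unmatched pattern, nothing to copy
        case $(basename "$entry") in
            include|libiberty) continue ;;
        esac
        cp -Rp "$entry" "$dest/" || return 1
    done

    # Step 4: gcc, libiberty, and the named language libraries
    # from the gcc repo.
    for module in gcc libiberty "$@"; do
        if [ ! -d "$gcc_tree/$module" ]; then
            echo "combine_trees: $gcc_tree/$module not found" >&2
            return 1
        fi
        cp -Rp "$gcc_tree/$module" "$dest/" || return 1
    done

    # Step 5: merge the include directories. The gcc repo's copy is
    # copied last, so its version wins for files present in both.
    mkdir -p "$dest/include" || return 1
    for tree in "$src_tree" "$gcc_tree"; do
        if [ -d "$tree/include" ]; then
            cp -Rp "$tree/include/." "$dest/include/" || return 1
        fi
    done
}
```

With the two checkouts sitting side by side, a call might look like
combine_trees src egcs combined followed by the names of whatever language
library directories the egcs checkout contains. The top-level files come from
the src repo here, matching the preference stated above; taking them from the
gcc repo instead would mean excluding them in step 3 and copying them in
step 4.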

				4. The Real Solution

The above procedure, with a few variations, is generally followed by most
developers working in the Cygnus tree. However, this doesn't make it any less
painful. It is just a nuisance for everyone to keep piecing the tree together
every time, then working out which repo each patch belongs in, and remembering
to keep the two repos in sync. There is absolutely no benefit to gain from the
current arrangement. It doesn't give GCC any more independence. GCC is now
critically dependent on the Cygnus tree (and has been so ever since the start
of EGCS), and by keeping their own copy of it the GCC maintainers are simply
closing their eyes to this. But the reality is that everyone still builds and
tests it together with the rest of the Cygnus tree (using the procedure from
the previous section), and the top-level files in the gcc repo are still kept
in sync with the ones in the src repo, manually and painfully. There is nothing
to lose by doing away with this and merging the repos, only a lot of
convenience and sanity to gain.

This is only one of the problems with the current arrangement. The other
problem is that there is no "home" for the Cygnus tree. There is no place where
people can learn what the Cygnus tree is and read all about it. There is no
mailing list to discuss it overall (as opposed to some particular module in
it). There is no clearly designated group responsible for the maintenance of
the top-level files.

In fact, it's even more than just not having a mailing list for the Cygnus tree
overall. There appears to be some sort of a code of silence around it. Many
people know what it is and do their development with it in mind, but there is
virtually no public mention of it, as if everyone is pretending that it doesn't
exist. This is actually why I had to write this article: to tell the people
what the Cygnus tree is. I have a number of projects that involve the Cygnus
tree, and when presenting them to the public, I found myself facing a strange
problem. I need to refer people to the Cygnus tree, but there is nothing to
refer them to! There is no home (WWW page, FTP site, mailing list, or anything
at all) for the Cygnus tree that I can point people to. In fact, there is
nothing to even tell people what it is, aside from a very terse remark in Ian
Lance Taylor's "The GNU configure and build system", which, while describing
how its configure scripts and Makefiles work, really fails to answer the
question of *what it is*.

As a developer highly interested in the Cygnus tree, I'm trying to do whatever
I can to help solve these problems. This article, which explains what the
Cygnus tree is and where it currently stands, is my first step. I will end it
with my proposal for what I believe is the right solution.

For the problem of two repos, the solution is to merge them. Given how the src
repo was intended from the start as the repo for the full Cygnus tree and how
it successfully does this now for everything except GCC, there is no need to do
anything with it. It is already exactly the way it should be. All that needs to
be done is to move the gcc and language library modules from the gcc repo to
the src one, replace libiberty and its headers in the src repo with the ones in
the gcc one, and put the gcc repo to rest.

Given how new modules have recently been added to the src repo with no fuss and
no problems, I don't think that the current inhabitants of the src repo would
object to welcoming a new member, or even that their consent would be required.
After all, they don't have to check out any modules other than the ones they
need, and the top-level configure script and Makefile already list everything
anyway. In fact, many src repo inhabitants will certainly like having the
master copy of libiberty in their repo, rather than a mirror that sometimes
gets stale. Thus the only ones who will have to be persuaded are the GCC
maintainers.

The question of merging the repos has come up more than once on the GCC mailing
lists. Some of the GCC maintainers have said that they liked the idea. However,
there seems to be some politics playing against it, apparently coming from FSF.
Thus it appears that the next battle is going to be between us developers and
the politicians. In order to fight and win this battle, we must actively push
our cause. This brings us to the second problem of not having a real home for
the Cygnus tree.

The solution for this problem is obvious: create one. This article is a start,
explaining publicly, apparently for the first time, what the Cygnus tree is and
trying to break the code of silence around it. Now we just need to make all
this better known to more people so that we can start a mailing list for the
Cygnus tree and decide what else we need for a "home" for it. This
awareness-raising cause will probably have to be pursued on the GCC, Binutils,
and GDB mailing lists. This is because these are the parts of the Cygnus tree
where some people still live in the sandbox of separate GNU projects. Everyone
else, i.e., people working on the Cygnus-developed non-GNU modules like Newlib,
already comes from a Cygnus tree background and would certainly be all for
bringing it back.

So, let's all try to do our best to enlighten the public about the Cygnus tree,
create a real home for it, and persuade the GCC maintainers to move to the src
repository!

				Appendix. References

"The GNU configure and build system" by Ian Lance Taylor is file configure.texi
in the etc subdirectory of the Cygnus tree. For those of you who prefer WWW,
the author has a WWW version on his page:

http://www.airs.com/ian/configure/

Once you know what the Cygnus tree is thanks to this article, Ian's superb
tutorial will tell you everything you need to know about its configure scripts
and Makefiles to master development in this tree.

libgloss/doc/porting.texi in the Cygnus tree gives a very good overview of how
the different pieces of the tree come together to support embedded systems and
how to port them to a new one.

