This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: onwards to git
- From: Jim Meyering <jim at meyering dot net>
- To: Thomas Schwinge <tschwinge at gnu dot org>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>
- Date: Tue, 09 Jun 2009 13:46:41 +0200
- Subject: Re: onwards to git
- References: <874ovzw929.fsf@meyering.net> <m2hbzzf9v6.fsf@igel.home> <87ws8utv6c.fsf@meyering.net> <m2eiv2jl7y.fsf@igel.home> <87y6t9qht9.fsf@meyering.net> <871vqzpvkh.fsf@meyering.net> <871vqvjrfn.fsf@meyering.net> <20090512105524.GA23671@fencepost.gnu.org> <87ab5ih0ga.fsf@meyering.net> <20090522102709.GE23671@fencepost.gnu.org> <20090609111722.GH4695@fencepost.gnu.org>
Thomas Schwinge wrote:
> Hello Jim!
>
> On Fri, May 22, 2009 at 12:27:09PM +0200, I wrote:
>> On Tue, May 12, 2009 at 04:15:17PM +0200, Jim Meyering wrote:
>> > Thomas Schwinge wrote:
>> > > On Mon, May 11, 2009 at 10:49:32PM +0200, Jim Meyering wrote:
>> > >> I've converted the trunk and all branches, filtering
>> > >> to aggregate commits, and cleaning up by removing empty commits
>> > >> and applying heuristics to use reasonable commit messages
>> > >> derived from ChangeLog entries.
>> > >
>> > > This indeed looks very nice in the vast majority of cases! I'm sure
>> > > there are a number of people who are interested in seeing the scripts and
>> > > techniques you used. I am, for sure. :-)
>> >
>> > Thanks for the feedback!
>> >
>> > I'll post the scripts, of course ;-)
>> > If I don't do it this week it's because I forgot or didn't
>> > find the time, so a ping would be welcome.
>>
>> Ping. :-)
>
> I don't want to trouble you too much, but I would be thankful already if
> you could simply hand me over the script you used for the post-conversion
> cleanup and commit accumulation.
Hi Thomas,
Thanks for the prod.
Here are three scripts:
[definitely not production quality.
I wanted to clean them up before publishing, but if I wait
to find time for that, it may never happen, so... ]
git-log-munge: a helper script invoked by glibc-reconstruct-commits
glibc-reconstruct-commits: based on a script by Paolo Bonzini. I used this
  to aggregate commits on the "master" branch of a freshly converted
  (CVS-to-git) glibc.git repository.
  However, doing that unhooked all branches and tags from master...
tag-restore: reconnect those branches and tags
  (caveat: contains hard-coded paths)
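For readers who want the aggregation rule without running the scripts: as read from filter_commit in glibc-reconstruct-commits below, a non-empty commit is deferred (folded into a later commit) when it leaves every tracked ChangeLog blob unchanged, has the same author as the preceding commit, and is not the branch head; a change of author flushes any deferred group as a commit of its own. The bash sketch below simulates only that decision on a hand-written list, so it can be checked without a repository. The function name aggregate and its input format are hypothetical; empty-commit elision (done earlier in the main loop), log messages and committer data are not modeled.

```bash
#!/bin/bash
# Hypothetical simulation of the folding rule in filter_commit.
# stdin: one "<id> <author> <changelog-changed>" line per non-empty commit,
#        oldest first; <changelog-changed> is 1 if any tracked ChangeLog
#        blob differs from the parent's, else 0.  The last line plays $head.
# stdout: one line per commit in the new history, listing the folded ids.
aggregate ()
{
  local -a ids authors cls
  local id author cl i last pending= prev_author=
  while read -r id author cl; do
    ids+=("$id"); authors+=("$author"); cls+=("$cl")
  done
  last=$(( ${#ids[@]} - 1 ))
  for (( i = 0; i <= last; i++ )); do
    id=${ids[i]} author=${authors[i]} cl=${cls[i]}
    # Author changed: the pending (suspended) group becomes its own commit.
    if test "$author" != "$prev_author" && test -n "$pending"; then
      echo "$pending"
      pending=
    fi
    if test "$cl" = 0 && test $i -ne $last && test "$author" = "$prev_author"
    then
      # Defer: this commit will be folded into a later one.
      pending="${pending:+$pending }$id"
    else
      # Commit now, absorbing anything deferred so far.
      echo "${pending:+$pending }$id"
      pending=
    fi
    prev_author=$author
  done
}
```

For example, feeding it the lines "c1 alice 1", "c2 alice 0", "c3 alice 0", "c4 alice 1", "c5 bob 0", "c6 bob 0", "c7 carol 0" yields five commits: c1; c2 c3 c4 (folded into the commit that touches a ChangeLog); c5; c6 (flushed when the author changes); c7. Note that c6 is not folded into c5: folding only ever goes forward, into a later commit.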
#!/usr/bin/perl -T
# massage a log as pre-filtered by glibc-reconstruct-commits.
use strict;
use warnings;

sub find_bz ($)
{
  my ($lines) = @_;
  my @bzs;
  foreach my $line (grep (/\bBZ \#\d/, @$lines))
    {
      $line =~ s/BZ #2423, #2749/BZ #2423, BZ #2749/; # sole fix-up
      push @bzs, $line =~ /BZ #(\d+)/mg;
    }
  return @bzs;
}

{
  my @line = <>;
  while (@line && $line[$#line] eq "\n") { pop @line; };
  while (1 < @line && $line[0] eq ".\n") { shift @line; };

  # If the first line starts with TABs, remove them.
  @line
    and $line[0] =~ s/^\t+//;

  # If the first line contains any other TABs, split on them,
  # on the assumption that it is a ChangeLog entry that has been
  # concatenated by git.
  if (@line && $line[0] =~ /\t+/)
    {
      my $l = $line[0];
      chomp $l;
      my @spl = split ("\t", $l);
      splice @line, 0, 1, (map {"$_\n"} @spl);
    }

  # If there's a BZ number on the first line, use that as the subject.
  if (@line && $line[0] =~ /BZ #\d+/)
    {
      # BZ is already on the first line; do nothing more.
    }
  else
    {
      # If there are more than 1000 lines, presume it's due to a
      # ChangeLog->ChangeLog.N rotation and keep only the first.
      1000 < @line
        and @line = ($line[0]);

      my @bz = find_bz \@line;
      if (@bz)
        {
          # Filter out duplicates and sort numerically.
          my %unique = map { $_ => 1 } @bz;
          @bz = sort { $a <=> $b } keys %unique;

          # We're about to prepend subject+blank-line,
          # so if the preexisting 2nd line is blank, remove it.
          2 <= @line && $line[1] eq "\n"
            and splice @line, 1, 1;

          @bz = map { "BZ #$_" } @bz;
          my $subject = "[" . join (', ', @bz) . "]\n";
          unshift @line, $subject, "\n";
        }
    }

  # If there are 3 or more lines, the first looks like date+name+email
  # of a ChangeLog entry, the 2nd is blank, and the third starts with a TAB,
  # then use the third (minus its leading TAB).
  if (3 <= @line && $line[1] eq "\n"
      && $line[0] =~ /^2\d\d\d-\d\d-\d\d \S.*? <.*>$/
      && $line[2] =~ /^\t[^\t]/)
    {
      shift @line;
      shift @line;
      $line[0] =~ s/^\t//;
    }

  # If there are two or more lines, ensure the 2nd is blank.
  2 <= @line && $line[1] ne "\n"
    and splice @line, 1, 0, ("\n");

  print @line;
}
# FIXME:
my $junk = <<'EOF';
This commit log message is messed up:
Note how the subject was precisely the body of the ChangeLog entry.
* elf/dl-open.c (_dl_open): Bump GL(dl_nns) to 1 if no libraries
are dlopened in statically linked program even for __LM_ID_CALLER.
2009-04-16 Jakub Jelinek <jakub@redhat.com>
* elf/dl-open.c (_dl_open): Bump GL(dl_nns) to 1 if no libraries
are dlopened in statically linked program even for __LM_ID_CALLER.
EOF
# Local Variables:
# indent-tabs-mode: nil
# End:
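git-log-munge's subject heuristic (collect every "BZ #N" reference, drop duplicates, sort numerically, and prepend "[BZ #a, BZ #b]" plus a blank line) is easy to check in isolation. The bash function below is a hypothetical re-expression of just the subject-building step, not part of the posted scripts; it assumes GNU grep and sed, applies the same one-off "BZ #2423, #2749" fix-up, and, unlike find_bz, does not insist on a word boundary before "BZ".

```bash
#!/bin/bash
# Hypothetical helper: read a commit message on stdin and print a
# "[BZ #a, BZ #b]" subject built from every "BZ #N" reference,
# de-duplicated and sorted numerically (as find_bz + %unique + sort do).
# Prints nothing and returns 1 when no BZ number is mentioned.
bz_subject ()
{
  local nums n out=
  # Same one-off fix-up as git-log-munge, then extract the bare numbers.
  nums=$(sed 's/BZ #2423, #2749/BZ #2423, BZ #2749/' \
           | grep -o 'BZ #[0-9][0-9]*' \
           | sed 's/^BZ #//' \
           | sort -n -u)
  test -n "$nums" || return 1
  for n in $nums; do
    out="${out:+$out, }BZ #$n"
  done
  printf '[%s]\n' "$out"
}
```

A caller would then prepend this line and a blank line to the message, as the Perl script does with unshift.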

#!/bin/bash
# Based on the script from Paolo Bonzini:
# http://sourceware.org/ml/libc-alpha/2009-05/msg00005.html

debug=0
g_prev_commit=

warn () { echo "$*" >&2; }
debug () { test "$debug" = 1 && echo "$*" >&2 || :; }

map()
{
  # if it was not rewritten, take the original
  if test -r "map/$1"
  then
    cat "map/$1"
  else
    echo "$1"
  fi
}

m2()
{
  cat m2/"$1" 2>/dev/null ||
    cat map/"$1" 2>/dev/null ||
    echo "$1"
}

# override die(): this version puts in an extra line break, so that
# the progress is still visible
die()
{
  echo >&2
  echo "$*" >&2
  exit 1
}

# When piped a commit, output a script to set the ident of either
# "author" or "committer"
set_ident () {
  lid="$(echo "$1" | tr "[A-Z]" "[a-z]")"
  uid="$(echo "$1" | tr "[a-z]" "[A-Z]")"
  pick_id_script='
/^'$lid' /{
s/'\''/'\''\\'\'\''/g
h
s/^'$lid' \([^<]*\) <[^>]*> .*$/\1/
s/'\''/'\''\'\'\''/g
s/.*/GIT_'$uid'_NAME='\''&'\''; export GIT_'$uid'_NAME/p
g
s/^'$lid' [^<]* <\([^>]*\)> .*$/\1/
s/'\''/'\''\'\'\''/g
s/.*/GIT_'$uid'_EMAIL='\''&'\''; export GIT_'$uid'_EMAIL/p
g
s/^'$lid' [^<]* <[^>]*> \(.*\)$/\1/
s/'\''/'\''\'\'\''/g
s/.*/GIT_'$uid'_DATE='\''&'\''; export GIT_'$uid'_DATE/p
q
}
'
  LANG=C LC_ALL=C sed -ne "$pick_id_script"
  # Ensure non-empty id name.
  echo "case \"\$GIT_${uid}_NAME\" in \"\") GIT_${uid}_NAME=\"\${GIT_${uid}_EMAIL%%@*}\" && export GIT_${uid}_NAME;; esac"
}

final_rcs_log_msg='Previously uncontrolled files put into CVS.'
: ${skip_up_to=$(git log -1 --pretty=format:%H ":/$final_rcs_log_msg")}
: ${suspended_tree=}
: ${suspended_commit=}

t()
{
  # printf %3.3s "$1"
  test -n "$1" &&
    git log -1 --pretty='format:[%h:%s]' "$1"
  # git log -1 --pretty=format:%h[%s] "$1"
}

# Commit the tree of $1 (with parent $2, if given), reading the log
# message from stdin, and record the new commit's id in map/$1.
# (An if/else, so a failing commit-tree is not silently retried.)
do_commit ()
{
  if test -z "$2"; then
    git commit-tree "$1^{tree}" > map/$1
  else
    git commit-tree "$1^{tree}" -p $2 > map/$1
  fi
  #debug " do_ci > $(t $1):$(t $(cat map/$1)) p:$(t $2)"
}

skip_commit()
{
  echo "$2" > map/$1
  #debug "skip_ci > $(t $1):$(t $2)"
}

new_msg()
{
  local c=$1
  local orig_log=$(git log -1 --pretty=$'format:%s\n%b' $c)

  # If any of the three variables is empty, use orig commit.
  local t=":$old_changelog:$new_changelog:$c:"
  case "$t" in
    *::*) printf %s "$orig_log"; return;;
  esac

  # Find the first pair of differing SHA1s
  local i=1
  for old_sha1 in $(echo $old_changelog); do
    new_sha1=$(echo "$new_changelog"|sed -n ${i}p)
    test $old_sha1 != $new_sha1 && break
    i=$(expr $i + 1)
  done

  { printf '%s\n' "$orig_log"
    git diff $old_sha1 $new_sha1 \
      | sed -n \
          -e '1,/^+/ s/^ //p' \
          -e '1,/^@@/d' \
          -e 's/^+//p'; \
  } \
    | git-log-munge
}

get_changelog_hashes ()
{
  local c=$1
  local p=$2
  test -z "$p" && return
  new_changelog=$(git ls-tree $c^{tree} \
      ChangeLog \
      posix/glob/ChangeLog \
      nptl_db/ChangeLog \
      nptl/ChangeLog \
      localedata/ChangeLog \
      libidn/ChangeLog \
      linuxthreads/ChangeLog \
    | awk '{print $3}')
  old_changelog=$(git ls-tree $p \
      ChangeLog \
      posix/glob/ChangeLog \
      nptl_db/ChangeLog \
      nptl/ChangeLog \
      localedata/ChangeLog \
      libidn/ChangeLog \
      linuxthreads/ChangeLog \
    | awk '{print $3}')
}

# Ugly, since it uses and updates a global.
# Map deferred commits to the one they're aggregated to in the new tree.
# When aggregating, $g_prev_commit is the most recent commit (in the orig tree)
# that we've added to the new tree. Map the commits after $g_prev_commit and
# before $COMMIT to the image of $COMMIT in the new tree.
# $g_prev_commit is empty initially, and in that case, we map all commits
# before $COMMIT.
map_suspended_commits ()
{
  local commit="$1"
  local c
  for c in $(git rev-list $g_prev_commit..$commit^); do
    #debug map-susp: $(t $c) $(t $(map $commit))
    skip_commit $c $(map $commit)
  done
  g_prev_commit=$commit
}

filter_commit ()
{
  local commit="$1"
  local parent="$2"
  if test "$parent" = "$skip_up_to"; then
    echo 'initial import' | do_commit $commit
    map_suspended_commits $commit
  else
    if [ "$GIT_AUTHOR_NAME" != "$PREV_AUTHOR_NAME" -a -n "$suspended_commit" ];
    then
      #debug committing suspended "$(t $suspended_commit)" "w/parent $(t $parent)"
      get_changelog_hashes $suspended_commit $suspended_parent
      new_msg $suspended_commit \
        | \
        GIT_COMMITTER_NAME="$PREV_COMMITTER_NAME" \
        GIT_COMMITTER_EMAIL="$PREV_COMMITTER_EMAIL" \
        GIT_COMMITTER_DATE="$PREV_COMMITTER_DATE" \
        GIT_AUTHOR_NAME="$PREV_AUTHOR_NAME" \
        GIT_AUTHOR_EMAIL="$PREV_AUTHOR_EMAIL" \
        GIT_AUTHOR_DATE="$PREV_AUTHOR_DATE" \
        do_commit "$suspended_commit" "$parent"
      parent=$(map "$suspended_commit")
      map_suspended_commits $suspended_commit
    fi

    get_changelog_hashes $commit $parent
    if test "$old_changelog" = "$new_changelog" -a $commit != $head \
        -a "$GIT_AUTHOR_NAME" = "$PREV_AUTHOR_NAME"; then
      #debug deferring "$(t $commit)" "($GIT_AUTHOR_NAME)"
      suspended_commit="$commit"
      suspended_parent="$parent"
      skip_commit "$commit" "$parent"
    else
      #debug commit "$(t $commit)" "($GIT_AUTHOR_NAME)"
      suspended_commit=
      new_msg $commit \
        | do_commit $commit "$parent"
      g_prev_commit=$commit
    fi
  fi

  PREV_COMMITTER_NAME="$GIT_COMMITTER_NAME"
  PREV_COMMITTER_EMAIL="$GIT_COMMITTER_EMAIL"
  PREV_COMMITTER_DATE="$GIT_COMMITTER_DATE"
  PREV_AUTHOR_NAME="$GIT_AUTHOR_NAME"
  PREV_AUTHOR_EMAIL="$GIT_AUTHOR_EMAIL"
  PREV_AUTHOR_DATE="$GIT_AUTHOR_DATE"
}

USAGE="[--original <namespace>] [-d <directory>] [-f | --force] \
[<rev-list options>...]"

OPTIONS_SPEC=
. "$(git --exec-path)/git-sh-setup"

git diff-files --quiet &&
  git diff-index --cached --quiet HEAD -- ||
  die "Cannot rewrite branch(es) with a dirty working directory."

tempdir=.git-rewrite
orig_namespace=refs/original/
force=
while :
do
  case "$1" in
  --)
    shift
    break
    ;;
  --force|-f)
    shift
    force=t
    continue
    ;;
  -*)
    ;;
  *)
    break;
  esac

  # all switches take one argument
  ARG="$1"
  case "$#" in 1) usage ;; esac
  shift
  OPTARG="$1"
  shift

  case "$ARG" in
  -d)
    tempdir="$OPTARG"
    ;;
  --original)
    orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
    ;;
  *)
    usage
    ;;
  esac
done

case "$force" in
t)
  rm -rf "$tempdir"
  ;;
'')
  test -d "$tempdir" &&
    die "$tempdir already exists, please remove it"
esac
mkdir -p "$tempdir/t" || die ""
rmdir "$tempdir/t" || die ""
cd "$tempdir"
tempdir=$(pwd)

# Remove tempdir on exit
trap 'cd ..; rm -rf "$tempdir"' 0

# Make sure refs/original is empty
git for-each-ref > "$tempdir"/backup-refs
while read sha1 type name
do
  case "$force,$name" in
  ,$orig_namespace*)
    die "Namespace $orig_namespace not empty"
    ;;
  t,$orig_namespace*)
    git update-ref -d "$name" $sha1
    ;;
  esac
done < "$tempdir"/backup-refs

ORIG_GIT_DIR="$GIT_DIR"
ORIG_GIT_WORK_TREE="$GIT_WORK_TREE"
ORIG_GIT_INDEX_FILE="$GIT_INDEX_FILE"
GIT_WORK_TREE=.
export GIT_DIR GIT_WORK_TREE

# The refs should be updated if their heads were rewritten
if test "$#" != 0; then die "usage: $0"; fi

# Update only the master branch, and all tags.
set master $(git tag -l)
git rev-parse --no-flags --revs-only --symbolic-full-name "$@" |
  sed -e '/^^/d' >"$tempdir"/heads

test -s "$tempdir"/heads ||
  die "Which ref do you want to rewrite?"

ret=0

# map old->new commit ids for rewriting parents
mkdir map || die "Could not create map/ directory"
mkdir m2 || die "Could not create m2/ directory"

git rev-list --reverse --topo-order --parents "$@" ^$skip_up_to > revs ||
  die "Could not get the commits"
commits=$(wc -l <revs | tr -d " ")

test $commits -eq 0 && die "Found nothing to rewrite"

# Rewrite the commits
head=$(git log -1 --pretty=format:%H)
i=0
elided=0
non_elided_commit=
elided_commits=
parent_tree=
while read commit parent blah; do
  test -n "$blah" && die unexpected merge
  i=$(($i+1))
  # Keep the subject out of printf's format string (it may contain '%').
  printf 'Rewrite (%d/%d) %s\n' "$i" "$commits" "$(t $commit)"
  commit_tree=$(git rev-parse "$commit^{tree}")

  # Elide each empty commit.
  if test "$commit_tree" = "$parent_tree"; then
    elided_commits="$elided_commits $commit"
    debug "eliding empty $(t $commit) -> p:$(t $parent) $(t $(map $parent))"
    elided=1
    continue
  fi

  # After a run of elided commits, reparent onto the most recent
  # non-empty commit; then remember this commit as the latest one.
  # (The former "A && B || C" form skipped that update after elisions.)
  if test $elided = 1; then
    parent=$non_elided_commit
  fi
  non_elided_commit=$commit
  elided=0

  git cat-file commit "$commit" >commit ||
    die "Cannot read commit $commit"

  eval "$(set_ident AUTHOR <commit)" ||
    die "setting author failed for commit $commit"
  eval "$(set_ident COMMITTER <commit)" ||
    die "setting committer failed for commit $commit"

  mapped_parent=$(map "$parent")
  #debug FC: "$(t $commit) $(t $parent) [$(t $mapped_parent)]"
  filter_commit $commit $mapped_parent
  mapped_parent=$(map "$parent")
  for e in $elided_commits; do
    #debug "eliding empty $(t $e) -> p:$(t $parent) $(t $mapped_parent)"
    echo $(map $parent) > m2/$e
  done
  elided_commits=
  parent_tree=$commit_tree
done <revs

# Finally update the refs

_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"
echo
while read ref
do
  # avoid rewriting a ref twice
  test -f "$orig_namespace$ref" && continue

  sha1=$(git rev-parse "$ref"^0)
  rewritten=$(m2 $sha1)

  test $sha1 = "$rewritten" &&
    warn "WARNING: Ref '$ref' is unchanged" &&
    continue

  case "$rewritten" in
  '')
    echo "Ref '$ref' was deleted"
    git update-ref -m "filter-branch: delete" -d "$ref" $sha1 ||
      { warn "Could not delete $ref"; ret=1; }
    ;;
  $_x40)
    echo "Ref '$ref' was rewritten $(t $rewritten)"
    git update-ref -m "filter-branch: rewrite" \
        "$ref" $rewritten ||
      { warn "Could not rewrite $ref"; ret=1; }
    ;;
  *)
    # NEEDSWORK: possibly add -Werror, making this an error
    warn "WARNING: '$ref' was rewritten into multiple commits:"
    warn "$rewritten"
    warn "WARNING: Ref '$ref' points to the first one now."
    rewritten=$(echo "$rewritten" | head -n 1)
    git update-ref -m "filter-branch: rewrite to first" \
        "$ref" $rewritten $sha1 ||
      { warn "Could not rewrite $ref"; ret=1; }
    ;;
  esac
  git update-ref -m "filter-branch: backup" "$orig_namespace$ref" $sha1
done < "$tempdir"/heads

# Save copies of important pieces, in case we want to redo the graft.
cp -a ../.git ../.git-pre-graft-backup
b=$(basename $tempdir)
cp -a $tempdir ../$b-backup

set -e
set -x
git reset --hard master
# Use "git branch" (not a personal alias such as "git br").
branch_heads=$(git branch|sed s/..//|grep -v '^master$')

# Every non-master branch, $b, is grafted back onto the rewritten master
# in the loop further down.  But first record a merge base for each branch,
# since with two or more branches, original/refs/heads/master disappears
# after the first due to our use of git filter-branch's -f option.
mkdir mp
for b in $(echo "$branch_heads"); do
  git merge-base original/refs/heads/master "$b" > "mp/$b"
  debug "merge-base: $b: $(cat "mp/$b")"
done

# For when a branch has no commits or when it is identical to another.
# In that case, we can't use .git/info/grafts (filter-branch would fail).
# Instead, simply update the ref.
rewrite_ref()
{
  local branch_name=$1
  local commit=$2
  b_full=$(git rev-parse --symbolic-full-name "$branch_name")
  git update-ref -m "graft-empty-branch: rewrite" "$b_full" $commit ||
    { warn "Could not rewrite $b_full to $commit"; ret=1; }
}

graft_file=../.git/info/grafts
touch $graft_file
for b in $(echo "$branch_heads"); do
  debug "grafting branch: $b"
  branch_point=$(cat "mp/$b")
  # This is the commit on "master" that will be the parent.
  mapped_branch_point=$(map $branch_point)
  test -z "$mapped_branch_point" && die "no branch point for $b"

  # Get first commit on the branch.
  first_commit_on_branch=$(git rev-list $branch_point.."$b"|tail -1)
  test -z "$first_commit_on_branch" &&
    { debug "skipping $b; it has no commit of its own"
      rewrite_ref $b $mapped_branch_point; continue; }

  # If this is a duplicate graft <commit,parent> pair, skip it.
  grep -B1 "^$first_commit_on_branch $mapped_branch_point$" $graft_file &&
    { debug "skipping $b: it is identical to a preceding one"
      rewrite_ref $b $mapped_branch_point; continue; }

  # Set graft point.
  printf "# %s\n%s %s\n" "$b" \
    $first_commit_on_branch $mapped_branch_point >> $graft_file

  # Filter the branch to make the graft permanent.
  git filter-branch -f $mapped_branch_point..$b
done

cd ..
rm -rf "$tempdir"

trap - 0

unset GIT_DIR GIT_WORK_TREE GIT_INDEX_FILE
test -z "$ORIG_GIT_DIR" || {
  GIT_DIR="$ORIG_GIT_DIR" && export GIT_DIR
}
test -z "$ORIG_GIT_WORK_TREE" || {
  GIT_WORK_TREE="$ORIG_GIT_WORK_TREE" &&
    export GIT_WORK_TREE
}
test -z "$ORIG_GIT_INDEX_FILE" || {
  GIT_INDEX_FILE="$ORIG_GIT_INDEX_FILE" &&
    export GIT_INDEX_FILE
}
git read-tree -u -m HEAD

exit $ret
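set_ident (inherited from git filter-branch) turns the "author" or "committer" header of a raw commit object into shell assignments that the caller evals, which is why its sed script needs the nested quote-escaping. In bash the same effect can be had with a regex match and printf %q. The function below is a hypothetical sketch (parse_ident is an invented name, not git's), assumes bash 4+ for ${var,,}, and, unlike the sed version, stops scanning at the blank line that ends the commit header so that a log message line starting with "author " cannot match.

```bash
#!/bin/bash
# Hypothetical bash-only alternative to set_ident: read a raw commit object
# (as printed by "git cat-file commit") on stdin and print eval-able
# assignments for GIT_<WHO>_NAME, GIT_<WHO>_EMAIL and GIT_<WHO>_DATE.
# $1 is AUTHOR or COMMITTER.  Returns 1 if no such header line is found.
parse_ident ()
{
  local who=$1 line
  local re="^${who,,} ([^<]*) <([^>]*)> (.*)\$"
  while IFS= read -r line; do
    test -z "$line" && break        # a blank line ends the commit header
    if [[ $line =~ $re ]]; then
      local name=${BASH_REMATCH[1]} email=${BASH_REMATCH[2]}
      # Like set_ident: an empty name falls back to the address's local part.
      test -n "$name" || name=${email%%@*}
      # %q quotes each value so that eval reproduces it exactly.
      printf 'GIT_%s_NAME=%q; export GIT_%s_NAME\n' "$who" "$name" "$who"
      printf 'GIT_%s_EMAIL=%q; export GIT_%s_EMAIL\n' "$who" "$email" "$who"
      printf 'GIT_%s_DATE=%q; export GIT_%s_DATE\n' \
        "$who" "${BASH_REMATCH[3]}" "$who"
      return 0
    fi
  done
  return 1
}
```

It would be used the same way as set_ident, e.g. eval "$(parse_ident AUTHOR <commit)".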

#!/bin/bash
test $# = 0 || exit 1

orig=/var/tmp/glibc-pristine/.git
public=$HOME/w/co/glibc/.git

trap 'st=$?; rm -rf $mapdir && exit $st' 0
trap 'exit $?' 1 2 13 15
mapdir=$(mktemp -d) || exit 1

# Build a table mapping each tagged SHA1 to its list of tag names:
# Note the leading "*" to get the referent of each tag object.
printf 'building SHA1-to-tag-name map using orig repo...\n'
git --git-dir=$orig for-each-ref --shell \
    --format='r=%(refname) tag=${r#refs/tags/} o=%(*objectname)' refs/tags |\
  while read entry; do
    eval "$entry"
    echo "cvs/$tag" >> $mapdir/"$o"
  done

# Match "master" exactly, as glibc-reconstruct-commits does, so that
# other branches whose names merely contain "master" are not skipped.
branch_heads=$(git --git-dir=$orig branch|sed s/..//|grep -v '^master$')
for branch in $(echo "$branch_heads"); do
  # Propagate tags on $branch in a pristine, just-converted-from-CVS git
  # repository to the cset-aggregated and grafted public glibc.git.

  # Use an array to map indices 0..N to the corresponding commit-on-orig-branch:
  i=0
  for c in $(git --git-dir=$orig rev-list master..$branch); do
    old_c[$i]=$c
    i=$((i+1))
  done

  # Apply those tags to the $public tree
  export GIT_DIR=$public
  i=0
  n_tagged=0
  n=$(git rev-list master..origin/cvs/$branch|wc -l)
  printf ' propagating tags to the %d-commit branch, %s...\n' $n "$branch"
  for c in $(git rev-list master..origin/cvs/$branch); do
    f=$mapdir/${old_c[$i]}
    tag_list=$(test -r $f && cat $f) &&
      for t in $(echo "$tag_list"); do
        git tag -f "$t" $c
        n_tagged=$((n_tagged+1))
      done
    i=$((i+1))
    printf ' %03d/%03d (#t=%d)\r' $i $n $n_tagged
  done
  printf '\n%s: applied/moved %d tags\n' "$branch" $n_tagged
done
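tag-restore pairs the i-th commit of master..$branch in the pristine repository with the i-th commit of master..origin/cvs/$branch in the public one (the old_c[$i] lookup), so it could mis-tag if the two lists ever differed in length. That pairing step can be isolated and made to refuse such a mismatch. The bash function below is a hypothetical sketch (map_tags and its three input files are invented names), assumes bash 4+ for associative arrays, and assumes tag names contain no whitespace or glob characters.

```bash
#!/bin/bash
# Hypothetical helper illustrating tag-restore's index-based pairing.
# $1: file of commit ids on the pristine branch (one per line, rev-list order)
# $2: file of commit ids on the rewritten branch (same order)
# $3: file of "<pristine-commit> <tag-name>" lines
# Prints "<tag-name> <rewritten-commit>" for each tag that should move,
# or fails if the two commit lists differ in length.
map_tags ()
{
  local old_list=$1 new_list=$2 tag_map=$3
  local -A tags
  local sha name old new
  if test "$(wc -l < "$old_list")" -ne "$(wc -l < "$new_list")"; then
    echo "map_tags: commit lists differ in length" >&2
    return 1
  fi
  # Collect the (possibly several) tag names attached to each old commit.
  while read -r sha name; do
    tags[$sha]+="$name "
  done < "$tag_map"
  # Walk both lists in lockstep: line i of one pairs with line i of the other.
  while IFS= read -r old <&3 && IFS= read -r new <&4; do
    for name in ${tags[$old]-}; do
      printf '%s %s\n' "$name" "$new"
    done
  done 3< "$old_list" 4< "$new_list"
}
```

Its output could then be applied with something like: map_tags old new map | while read -r tag c; do git tag -f "$tag" "$c"; done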