librelist archives

merge.c addition to hacking.asc

From:
Tom Enterline
Date:
2014-12-21 @ 01:25
Based on my understanding, here is a suggested addition to hacking.asc, for
discussion.
Point out anything you think is wrong.

--------------------------------------
merge_to_changesets is conceptually simple - it just finds the CVS
commits that "match", and creates a git changeset for each set (clique)
of matching CVS commits.
First it finds all the unique branch heads in the CVS masters,
creates corresponding git branch heads, and sorts the git branch
heads in tree order, trunk first.
Then for each git branch head, it finds all the CVS masters
that have commits for that git branch, and calls merge_branches
to create the git changesets. Tags are then assigned to the changesets.
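To make "tree order, trunk first" concrete, here is a minimal C sketch. It assumes that tree order means every branch head sorts after the branch it forks from. The struct branch_head type and the compute_depths/tree_order functions are hypothetical illustrations, not the data structures merge.c actually uses.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a git branch head (not merge.c's real struct). */
struct branch_head {
    const char *name;
    int parent;   /* index of the branch this one forks from, -1 for trunk */
    int depth;    /* filled in by compute_depths(): 0 for trunk */
};

/* Depth = number of parent links between a branch and trunk.
 * Assumes the parent links form a tree (no cycles). */
static void compute_depths(struct branch_head *heads, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int d = 0;
        for (int p = heads[i].parent; p != -1; p = heads[p].parent)
            d++;
        heads[i].depth = d;
    }
}

/* Fill order[] with branch indices so that trunk comes first and every
 * branch appears after its parent. A stable insertion sort on depth is
 * used so heads[] itself (and its parent indices) is never reordered. */
void tree_order(struct branch_head *heads, size_t n, size_t *order)
{
    compute_depths(heads, n);
    for (size_t i = 0; i < n; i++) {
        size_t j = i;
        while (j > 0 && heads[order[j - 1]].depth > heads[i].depth) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = i;
    }
}
```

For heads {branch-a forking from trunk, branch-b forking from branch-a, trunk}, tree_order yields trunk, branch-a, branch-b. Because the sort is stable, branches at the same depth keep their original relative order.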

--------------------------------------

The job of merge_branches seems simple - find cliques of matching
CVS commits for one branch, and create corresponding git changesets.
Reasons the code is hard to understand:
1. The criteria to "match", as mentioned above, are complex.
2. You would expect the created git changesets to only include the
matching CVS commits. In fact, non-matching CVS commits are included,
and it is up to export_commit to drop the non-matching commits.
3. Related to 2., masters are dropped out of the git changesets
being created only after all of the CVS changes in that master have been
added to git changesets.
4. Even deciding if all of a master's commits have been added to
git changesets is complex.

The technique used by merge_branches is to put the masters (revisions)
in order by commit date and step along that list to find the clique,
i.e., the commits that are "close enough" (within the cvs-fast-export
window).
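The date-ordering step can be sketched as follows. This is a simplified, hypothetical model: struct pending and find_clique are invented for illustration, and only the time-window criterion is checked, whereas the real matching criteria (point 1 above) are more complex.

```c
#include <stdlib.h>

/* Hypothetical: one entry per master, holding the date of its newest
 * not-yet-consumed commit. Not cvs-fast-export's real data structure. */
struct pending {
    const char *master;
    long date;      /* seconds since the epoch */
};

/* qsort comparator: newest date first. */
static int newest_first(const void *a, const void *b)
{
    long da = ((const struct pending *)a)->date;
    long db = ((const struct pending *)b)->date;
    return (da < db) - (da > db);
}

/* Sort the pending commits newest first, then step along the list and
 * return how many leading entries lie within `window` seconds of the
 * newest one. Those entries form the clique (date criterion only). */
size_t find_clique(struct pending *p, size_t n, long window)
{
    if (n == 0)
        return 0;
    qsort(p, n, sizeof *p, newest_first);
    size_t k = 1;
    while (k < n && p[0].date - p[k].date <= window)
        k++;
    return k;
}
```

With a 300-second window and newest-commit dates of 1500, 1450, 1000 and 100, the first two masters form the clique.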

The revisions array does not contain a static list of revisions;
each revisions array element points to a master's latest (newest) commit.
As the CVS commits are used to create git commits, the revisions array
is updated to point to an earlier (older) commit of the same master.

Another way of understanding the process is as a set of "flows".
Each revisions array element is a window into the set of updates (flow)
for the corresponding CVS master. Or, in more traditional CS
terminology, each revisions array element is a pointer to an element
of the CVS revisions linked list.
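The "flows" picture maps directly onto an array of cursors into per-master linked lists. The sketch below assumes each master's revisions are linked newest to oldest through a parent pointer and that dates are non-negative; struct rev, newest_date and advance_clique are hypothetical names, and only the date window decides clique membership.

```c
#include <stddef.h>

/* Hypothetical revision node: each CVS master is a singly linked list of
 * revisions running from newest to oldest via `parent`. */
struct rev {
    long date;            /* assumed non-negative */
    struct rev *parent;   /* next older revision of the same master */
};

/* revisions[i] is the cursor into master i's flow; NULL once that master
 * is exhausted. Returns the newest date among live cursors, or -1 if
 * every flow is exhausted. */
static long newest_date(struct rev **revisions, size_t n)
{
    long best = -1;
    for (size_t i = 0; i < n; i++)
        if (revisions[i] && revisions[i]->date > best)
            best = revisions[i]->date;
    return best;
}

/* Consume one clique: every cursor whose revision lies within `window`
 * seconds of the newest pending date is stepped back to its parent.
 * Returns the number of masters that contributed, 0 when all flows are
 * exhausted. */
size_t advance_clique(struct rev **revisions, size_t n, long window)
{
    long newest = newest_date(revisions, n);
    size_t used = 0;
    if (newest < 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (revisions[i] && newest - revisions[i]->date <= window) {
            revisions[i] = revisions[i]->parent;
            used++;
        }
    }
    return used;
}
```

Calling advance_clique repeatedly until it returns 0 walks every flow back to its oldest revision, one clique per call.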

Re: [cvsfastexport] merge.c addition to hacking.asc

From:
Eric S. Raymond
Date:
2014-12-21 @ 03:56
Tom Enterline <tenterline@gmail.com>:
> Based on my understanding, here is a suggested addition to hacking.asc, for
> discussion.
> Point out anything you think is wrong.

Most of this seems correct.  I am, however, puzzled by these points:

> 2. You would expect the created git changesets to only include the
> matching CVS commits. In fact, non-matching CVS commits are included,
> and it is up to export_commit to drop the non-matching commits.

Huh?  To the best of my knowledge, neither of the claims in the 
second sentence are true.  Can you explain why you think they are?

> 3. Related to 2., masters are only dropped out of the git changesets
> being created, after all of the CVS changes in that master have been
> added to git changesets.

I can't make much sense of this.

> 4. Even deciding if all of a master's commits have been added to
> git changesets is complex.

Nor of this.

The 'graph about "flows" is good, though.
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Re: [cvsfastexport] merge.c addition to hacking.asc

From:
Laurence Hygate
Date:
2014-12-21 @ 14:56
On 21/12/2014 03:56, Eric S. Raymond wrote:
> Tom Enterline <tenterline@gmail.com>:
>> 2. You would expect the created git changesets to only include the
>> matching CVS commits. In fact, non-matching CVS commits are included,
>> and it is up to export_commit to drop the non-matching commits.
> Huh?  To the best of my knowledge, neither of the claims in the
> second sentence are true.  Can you explain why you think they are?
It doesn't help that merge_to_changesets doesn't create changesets; it
creates repository states.

Re: [cvsfastexport] merge.c addition to hacking.asc

From:
Tom Enterline
Date:
2014-12-23 @ 04:12
For the branchy repo, here are the files and revisions included in the 12
git commits:
1. .cvsignore, v1.2, README, v1.5, superfluous, v1.1
2. .cvsignore, v1.2, README, v1.4, superfluous, v1.1
3. .cvsignore, v1.2, README, v1.4
4. .cvsignore, v1.2, README, v1.3
5. .cvsignore, v1.2, README, v1.3, doomed, v1.2
6. .cvsignore, v1.2, README, v1.2, doomed, v1.2
7. .cvsignore, v1.1, README, v1.2, doomed, v1.2
8. README, v1.2, doomed, v1.2
9. README, v1.2, doomed, v1.1
10. README, v1.2
11. README, v1.1
12. .cvsignore, v1.2, README, v1.4.2.1, superfluous, v1.1

Perhaps we have different interpretations of "matching"; I wouldn't
expect to see the same version of the same file in 5 different commits.

I did trace it through export_commit, and each of the 12 git commits
results in one M or D being output.
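This observation can be checked by diffing adjacent repository states. The C sketch below is not export_commit's actual logic: it treats a state as a path-sorted list of (file, revision) pairs and prints the fast-import-style operations needed to go from a parent state to a child state. Real fast-import M commands carry a file mode and a blob reference rather than a CVS revision number, so printing the revision is a simplification; struct file_state and diff_states are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical repository state entry: a file path and its CVS revision.
 * A state is an array of these, sorted by path (strcmp order). */
struct file_state {
    const char *path;
    const char *rev;
};

/* Print the M/D operations that turn `parent` into `child` (both sorted
 * by path) and return how many were printed. A simplified "M rev path"
 * stands in for fast-import's "M mode dataref path". */
int diff_states(const struct file_state *parent, size_t np,
                const struct file_state *child, size_t nc)
{
    size_t i = 0, j = 0;
    int ops = 0;
    while (i < np || j < nc) {
        int cmp = (i == np) ? 1 : (j == nc) ? -1
                : strcmp(parent[i].path, child[j].path);
        if (cmp < 0) {            /* only in parent: file deleted */
            printf("D %s\n", parent[i].path);
            i++;
            ops++;
        } else if (cmp > 0) {     /* only in child: file added */
            printf("M %s %s\n", child[j].rev, child[j].path);
            j++;
            ops++;
        } else {                  /* in both: modified if revision changed */
            if (strcmp(parent[i].rev, child[j].rev) != 0) {
                printf("M %s %s\n", child[j].rev, child[j].path);
                ops++;
            }
            i++;
            j++;
        }
    }
    return ops;
}
```

Assuming the list runs newest first, so that item 5 is the parent of item 4, diff_states on those two states prints the single line `D doomed`; items 3 and 2 likewise differ by a single M for superfluous.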

How are we disagreeing?

I do think it would help to have clearer definitions; Laurence's
comment about repository states vs. changesets is good.

Re: [cvsfastexport] merge.c addition to hacking.asc

From:
Eric S. Raymond
Date:
2014-12-24 @ 03:48
Tom Enterline <tenterline@gmail.com>:
> For the branchy repo, here are the files and revisions included in the 12
> git commits:
> 1. .cvsignore, v1.2, README, v1.5, superfluous, v1.1
> 2. .cvsignore, v1.2, README, v1.4, superfluous, v1.1
> 3. .cvsignore, v1.2, README, v1.4
> 4. .cvsignore, v1.2, README, v1.3
> 5. .cvsignore, v1.2, README, v1.3, doomed, v1.2
> 6. .cvsignore, v1.2, README, v1.2, doomed, v1.2
> 7. .cvsignore, v1.1, README, v1.2, doomed, v1.2
> 8. README, v1.2, doomed, v1.2
> 9. README, v1.2, doomed, v1.1
> 10. README, v1.2
> 11. README, v1.1
> 12. .cvsignore, v1.2, README, v1.4.2.1, superfluous, v1.1
> 
> Perhaps we have different interpretations of "matching"; I wouldn't
> expect to see the same version of the same file in 5 different commits.
> 
> I did trace it through export_commit, and each of the 12 git commits
> results in one M or D being output.
> 
> How are we disagreeing?

Terminologically.  Because I work so much with fast-import streams, I
tend to think of a commit as a *delta* - that is, a set of M and D
ops, or a patch in CVS terms. I think Laurence has the same tendency.

You, on the other hand, have clearly been thinking of a commit as what
Laurence calls a "repository state", which is the integral of the set
of deltas up to the point of (your) "commit".
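Running the arrow the other way, a repository state is what you get by applying every delta in order, starting from an empty tree. The sketch below assumes Tom's trunk list runs newest first, so its deltas are read off from item 11 upward; struct op, struct state, apply_op, integrate and rev_of are hypothetical, and the fixed MAX_FILES capacity is an arbitrary simplification.

```c
#include <string.h>

#define MAX_FILES 16   /* arbitrary capacity for this sketch */

/* Hypothetical changeset operation: a fast-import-style M (set a file to
 * a revision) or D (delete a file). Any kind other than 'D' acts as M. */
struct op {
    char kind;          /* 'M' or 'D' */
    const char *path;
    const char *rev;    /* ignored for 'D' */
};

/* Hypothetical repository state: parallel arrays of paths and revisions. */
struct state {
    const char *path[MAX_FILES];
    const char *rev[MAX_FILES];
    size_t n;
};

static size_t lookup(const struct state *s, const char *path)
{
    size_t i;
    for (i = 0; i < s->n; i++)
        if (strcmp(s->path[i], path) == 0)
            break;
    return i;   /* equals s->n when the path is absent */
}

/* Apply one delta. Returns 0 on success, -1 if the state is full. */
int apply_op(struct state *s, const struct op *o)
{
    size_t i = lookup(s, o->path);
    if (o->kind == 'D') {
        if (i < s->n) {              /* move the last entry into the hole */
            s->n--;
            s->path[i] = s->path[s->n];
            s->rev[i] = s->rev[s->n];
        }
        return 0;
    }
    if (i == s->n) {                 /* new file: append a slot */
        if (s->n == MAX_FILES)
            return -1;
        s->path[s->n++] = o->path;
    }
    s->rev[i] = o->rev;
    return 0;
}

/* The state after `count` changesets is the "integral" of those deltas:
 * start empty and apply each one in order. */
int integrate(struct state *s, const struct op *ops, size_t count)
{
    s->n = 0;
    for (size_t k = 0; k < count; k++)
        if (apply_op(s, &ops[k]) != 0)
            return -1;
    return 0;
}

/* Revision of `path` in state `s`, or NULL if the file is absent. */
const char *rev_of(const struct state *s, const char *path)
{
    size_t i = lookup(s, path);
    return i < s->n ? s->rev[i] : NULL;
}
```

Integrating the first seven trunk deltas (M README 1.1, M README 1.2, M doomed 1.1, M doomed 1.2, M .cvsignore 1.1, M .cvsignore 1.2, M README 1.3) reproduces item 5 in Tom's list; integrating all eleven, ending with D doomed, M README 1.4, M superfluous 1.1 and M README 1.5, reproduces item 1.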

Now that we have that cleared up, we can communicate. :-)
 
> I do think it would help to have clearer definitions; Laurence's
> comment about repository states vs. changesets is good.

Yes, it is.

OK. Since I have to maintain the documentation, I'm going to issue a ukase.

I was going to say we should avoid the ambiguous term "commit" in
favor of "changeset", but looking at the code I see that is
impractical.  Keith's code, comments, and data structures are full of
commit == delta assumptions.

So, um, assume that others will hear "delta" when you say "commit".  If
you want to emphasize that it's a delta, use "changeset".  The set of
files at some CVS tag or gitspace revision is a "repository state"; a CVS
master contains deltas which integrate to "file states".

I will do a pass over the docs to enforce consistency and try to reduce
ambiguous uses of "commit".

> > On 21/12/2014 03:56, Eric S. Raymond wrote:
> > > Tom Enterline <tenterline@gmail.com>:
> > >> 2. You would expect the created git changesets to only include the
> > >> matching CVS commits. In fact, non-matching CVS commits are included,
> > >> and it is up to export_commit to drop the non-matching commits.
> > > Huh?  To the best of my knowledge, neither of the claims in the
> > > second sentence are true.  Can you explain why you think they are?
> > It doesn't help that merge_to_changesets doesn't create changesets; it
> > creates repository states.

OK, now that we have the terminology cleared up: export_commit doesn't
drop anything unless you give it a date restriction for incremental
dumping.