librelist archives

« back to archive

Significant progress on correctness

Significant progress on correctness

From:
Eric S. Raymond
Date:
2014-12-17 @ 15:03
It was an epic struggle, but I finally managed to pry apart keithp's
vendor-branch code and make it work better.  It's a lot simpler than
it was, too.

The vendor-branch code now handles all our previous regression-test
cases *and* the pathological "trunk older than vendor branch tip" case
that Robert de Bath found in CVS CVS.  A reduced version of that case
is now part of our normal regression tests.

Since I think vendor branches remain a potential future trouble spot,
I transcribed the section of Brian Berliner's paper that describes
their semantics (and the diagrams) into hacking.asc with commentary
on how it relates to the code, so the assumptions are all documented.

Remaining correctness issues:

1. Loz contributed some code, currently conditioned out, that makes 
branches out of commit cliques defined by tags in file subsets.  I haven't
unconditioned this for the following reasons:

a) I don't fully understand the feature.  That means, Loz, that what
it's doing needs to be documented on the manual page. And one sentence
won't be enough; the reader needs to understand *why this
transformation is a good idea*. What its use cases are.  Ideally,
how the repository malformation that makes it necessary arises.  (Is
it a malformation, or can it be an unexpected result of good CVS practice?)

b) We have no test cases for this feature.  I'm reluctant to put it
into production without at least one.

2. We have a mess around CVS file deletions that sometimes produces
manifest and content mismatches.  Part of this seems to be related to
older versions not marking deletions with an actual commit.  I'm going
to tackle this next.

3. One of our reduced test cases, QED.testrepo, triggers a fatal
internal error in the third-phase toposort.  I wouldn't care about
this a lot, but the same error has been seen in the wild on one
of the Mozilla repositories.

4. Tag assignment sometimes still coughs up hairballs (e.g. manifest and
content mismatches).  This can be seen by running cvconvert on groff CVS.
Whatever's causing this does *not* create problems at master or the
other branch tips.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Re: [cvsfastexport] Significant progress on correctness

From:
Loz
Date:
2014-12-18 @ 00:31
IT Developer
FCOS | Helpdesk | Room 23 | Building 38 | Hanslope Park | Milton Keynes 
| MK19 7BH
+44 (0) 1908 745357 | FTN: 8007 1357 | M: 07515 561591
loz@lozsoft.co.uk <mailto:loz@lozsoft.co.uk> |
On 17/12/2014 15:03, Eric S. Raymond wrote:
> It was an epic struggle, but I finally managed to pry apart keithp's
> vendor-branch code and make it work better.  It's a lot simpler than
> it was, too.
>
> The vendor-branch code now handles all our previous regression-test
> cases *and* the pathological "trunk older than vendor branch tip" case
> that Robert de Bath found in CVS CVS.  A reduced version of that case
> is now part of our normal regression tests.
>
> Since I think vendor branches remain a potential future trouble spot,
> I transcribed the section of Brian Berliner's paper that describes
> their semantics (and the diagrams) into hacking.asc with commentary
> on how it relates to the code, so the assumptions are all documented.
>
> Remaining correctness issues:
>
> 1. Loz contributed some code, currently conditioned out, that makes
> branches out of commit cliques defined by tags in file subsets.  I haven't
> unconditioned this for the following reasons:
>
> a) I don't fully understand the feature.  That means, Loz, that what
> it's doing needs to be documented on the manual page. And one sentence
> won't be enough; the reader needs to understand *why this
> transformation is a good idea*. What its use cases are.  Ideally,
> how the repository malformation that makes it necessary arises.  (Is
> it a malformation, or can it be an unexpected result of good CVS practice?)
I'll try and convince you and the other list members that it's a good 
idea. If I succeed I should have some good collateral for the manual page.

The basic problem is that CVS tags are more general than git tags.
A git tag is a pointer to a commit, and hence a complete repository state.
A CVS tag is a list of file revisions (at most one revision per file).
A common use case for CVS tags is to tag a complete repository state. 
This case naturally fits git tags, and we currently deal with.
However CVS tags can be used to tag a subset of a repository state (or 
even a random set of revisions). I'm not sure if it is good practice to 
use this feature, but taking groff as an example, 12 out of 33 tags 
don't map onto complete repository states. At an extreme, the wl tag 
only has 3 files in it.

Subset tags can't easily be modelled as git tags, because there is no 
commit that contains the right set of file states. Instead I've made the 
CVS tag a git branch. The first and only commit in the branch has the 
set of files defined by the tag.

The algorithm for both types of tag is

1. Find newest CVS revision in the tag T
2. Use the gitspace backlink to find the git commit (G) this revision 
first appeared in.
3. Check whether G has the same set of CVS revisions as T
4. If so, then we can create a git tag with the same name as T pointing at G
5 If not, we create a git branch with the same name as T with G as the 
branch parent.
6 Add a commit to the branch with the exact set of revisions from T

With this in place, if you checkout a tag into an empty repository in 
CVS you should get the same files if you checkout the equivalent 
tag/branch in git.
That said, I'm not sure the wl tag was used to be checked out in this way.

>
> b) We have no test cases for this feature.  I'm reluctant to put it
> into production without at least one.
Absolutely. we should decide whether the idea is worth keeping at all, 
whether it should be a command line option, or whether it should be the 
default. After that I can create a test case.

> 2. We have a mess around CVS file deletions that sometimes produces
> manifest and content mismatches.  Part of this seems to be related to
> older versions not marking deletions with an actual commit.  I'm going
> to tackle this next.
>
> 3. One of our reduced test cases, QED.testrepo, triggers a fatal
> internal error in the third-phase toposort.  I wouldn't care about
> this a lot, but the same error has been seen in the wild on one
> of the Mozilla repositories.
>
> 4. Tag assignment sometimes still coughs up hairballs (e.g. manifest and
> content mismatches).  This can be seen by running cvconvert on groff CVS.
> Whatever's causing this does *not* create problems at master or the
> other branch tips.
Rerun this cvsconvert with subset tag, and I think you'll understand the 
motivation.

Re: [cvsfastexport] Significant progress on correctness

From:
Loz
Date:
2014-12-18 @ 09:10
On 18/12/2014 00:31, Loz wrote:
>
> On 17/12/2014 15:03, Eric S. Raymond wrote:
>> Remaining correctness issues:
>>
>> 1. Loz contributed some code, currently conditioned out, that makes
>> branches out of commit cliques defined by tags in file subsets.  I haven't
>> unconditioned this for the following reasons:
>>
>> a) I don't fully understand the feature.  That means, Loz, that what
>> it's doing needs to be documented on the manual page. And one sentence
>> won't be enough; the reader needs to understand *why this
>> transformation is a good idea*. What its use cases are.  Ideally,
>> how the repository malformation that makes it necessary arises.  (Is
>> it a malformation, or can it be an unexpected result of good CVS practice?)
> I'll try and convince you and the other list members that it's a good 
> idea. If I succeed I should have some good collateral for the manual page.
>
> The basic problem is that CVS tags are more general than git tags.
> A git tag is a pointer to a commit, and hence a complete repository state.
> A CVS tag is a list of file revisions (at most one revision per file).
> A common use case for CVS tags is to tag a complete repository state. 
> This case naturally fits git tags, and we currently deal with.
> However CVS tags can be used to tag a subset of a repository state (or 
> even a random set of revisions). I'm not sure if it is good practice 
> to use this feature, but taking groff as an example, 12 out of 33 tags 
> don't map onto complete repository states. At an extreme, the wl tag 
> only has 3 files in it.
>
> Subset tags can't easily be modelled as git tags, because there is no 
> commit that contains the right set of file states. Instead I've made 
> the CVS tag a git branch. The first and only commit in the branch has 
> the set of files defined by the tag.
>
> The algorithm for both types of tag is
>
> 1. Find newest CVS revision in the tag T
> 2. Use the gitspace backlink to find the git commit (G) this revision 
> first appeared in.
> 3. Check whether G has the same set of CVS revisions as T
> 4. If so, then we can create a git tag with the same name as T 
> pointing at G
> 5 If not, we create a git branch with the same name as T with G as the 
> branch parent.
> 6 Add a commit to the branch with the exact set of revisions from T
>
> With this in place, if you checkout a tag into an empty repository in 
> CVS you should get the same files if you checkout the equivalent 
> tag/branch in git.
> That said, I'm not sure the wl tag was used to be checked out in this way.
Call the CVS revision in step 1 C.

I tried to prove the correctness of steps 1 and 2, to find the tag 
commit in gitspace, but hit a problem. The gist of the proof is that if 
a later commit than G contains a new file, then this file would have 
been C. However, this doesn't account for git commits that don't add 
files. So, if there are future commits that only delete files (including 
sets of commits who's net effect is to delete files), then these might 
be the correct place to point the tag.

I don't see an efficient way to fix this. I think we need to follow all 
branch heads and check whether the git revisions match the tag 
revisions. We can skip if we don't see C and stop if we see an earlier 
revision of C. I'm probably not going to have a chance to code this up 
until the weekend.

Re: [cvsfastexport] Significant progress on correctness

From:
Loz
Date:
2014-12-18 @ 15:59
On 17/12/2014 15:03, Eric S. Raymond wrote:
> 3. One of our reduced test cases, QED.testrepo, triggers a fatal
> internal error in the third-phase toposort.  I wouldn't care about
> this a lot, but the same error has been seen in the wild on one
> of the Mozilla repositories.
>
>
Start with an empty CVS repository, and create your own branch cycle 
with the following:

touch FILE1
cvs add FILE1
cvs commit -m c1
cvs tag -b BRANCH1 FILE1
cvs up -r BRANCH1 FILE1
echo 1 > FILE1
cvs commit -m c2
cvs tag -b BRANCH2 FILE1
touch FILE2
cvs add FILE2
cvs commit -m c3 FILE2
cvs tag -b BRANCH2 FILE2
cvs up -r BRANCH2 FILE2
echo 1 > FILE2
cvs commit -m c4 FILE2
cvs tag -b BRANCH1 FILE2

This is pretty much what the QED example boils down to.

I'm not sure why you'd want to do this, but it breaks some pretty deeply 
held assumptions in cvs-fast-export about how you can move from CVS 
branches to git branches.