Patchwork RFC cvsps: use commitids (when present) to detect changesets

login
register
mail settings
Submitter Frank Kingswood
Date Jan. 10, 2013, 9:40 p.m.
Message ID <b7b916c22e980c483d98.1357854013@andromeda.kingswood-consulting.co.uk>
Download mbox | patch
Permalink /patch/557/
State Accepted
Headers show

Comments

Frank Kingswood - Jan. 10, 2013, 9:40 p.m.
# HG changeset patch
# User Frank Kingswood <frank@kingswood-consulting.co.uk>
# Date 1357853777 0
# Node ID b7b916c22e980c483d989ea3470cf9442708201e
# Parent  a32255dee8599db1e1fdcca1a4d09f0ed25b6d60
cvsps: use commitids (when present) to detect changesets
Simplify core logic by no longer attempting to work around missing
class attributes. Instead always generate the attributes and ignore
the cache if the attributes are missing
Bryan O'Sullivan - Jan. 14, 2013, 7:25 p.m.
On Thu, Jan 10, 2013 at 1:40 PM, Frank Kingswood <
frank@kingswood-consulting.co.uk> wrote:

> cvsps: use commitids (when present) to detect changesets
>

This patch looks okay, but it needs to apply on top of the patch I already
accepted. I tried applying it and then rebasing, but I got 12 conflicts.
Frank Kingswood - Jan. 14, 2013, 9:11 p.m.
On 14/01/13 19:25, Bryan O'Sullivan wrote:
> On Thu, Jan 10, 2013 at 1:40 PM, Frank Kingswood
> <frank@kingswood-consulting.co.uk
> <mailto:frank@kingswood-consulting.co.uk>> wrote:
>
>     cvsps: use commitids (when present) to detect changesets
>
>
> This patch looks okay, but it needs to apply on top of the patch I
> already accepted. I tried applying it and then rebasing, but I got 12
> conflicts.

Thanks, I did not realize you committed the earlier patch.
I've reapplied the changes which are now a much smaller patch.

I have also re-run the tests and the results look sane to me.

To make regression tests pass I have duplicated test-convert-cvs.t, one 
running with cvs 1.12 and the other only on older ones. The other tests 
that differed are now run for 1.12 only.

I'll post this shortly.

Regards,

Frank
Greg Ward - Jan. 15, 2013, 7:40 p.m.
On 14 January 2013, Frank A. Kingswood said:
> To make regression tests pass I have duplicated test-convert-cvs.t,
> one running with cvs 1.12 and the other only on older ones. The
> other tests that differed are now run for 1.12 only.

That shouldn't be necessary now that Mercurial's test framework
supports fuzzy output matching. E.g. here's an example from
test-convert-bzr-directories.t that accomodates different output from
different versions of bzr:

  $ bzr add a
  add(ed|ing) a (re)

That trailing " (re)" is the required magic; it means the rest of the
line is a regex, not a literal string to match exactly.

       Greg
Bryan O'Sullivan - Jan. 15, 2013, 7:43 p.m.
On Tue, Jan 15, 2013 at 11:40 AM, Greg Ward <greg@gerg.ca> wrote:
> That shouldn't be necessary now that Mercurial's test framework
> supports fuzzy output matching.

The changes to the CVS tests make the output wildly different over
many lines, not something that can be handled with a regexp.
Frank Kingswood - Jan. 16, 2013, 9:28 a.m.
On 15/01/13 19:43, Bryan O'Sullivan wrote:
> On Tue, Jan 15, 2013 at 11:40 AM, Greg Ward<greg@gerg.ca>  wrote:
>> That shouldn't be necessary now that Mercurial's test framework
>> supports fuzzy output matching.
> The changes to the CVS tests make the output wildly different over
> many lines, not something that can be handled with a regexp.
Agreed, the output differences in one test that Bryan points out has the 
order of parents in two branches reversed, so glog is very different. 
But read on....

On 14/01/13 21:46, Bryan O'Sullivan wrote:
> This looks cleaner, and the test suite passes under CVS 1.12.
> Unfortunately, there are new test failures with 1.11:
>
> Failed test-convert-cvs.t: output changed
>
> --- /data/users/bryano/hg/hg-crew/tests/test-convert-cvs.t
> +++ /data/users/bryano/hg/hg-crew/tests/test-convert-cvs.t.err
> @@ -1,6 +1,8 @@
>
>     $ "$TESTDIR/hghave" cvs || exit 80
>     $ "$TESTDIR/hghave" cvs112&&  exit 80
> +  skipped: missing feature: cvs client/server>= 1.12
> +  [1]
>     $ cvscall()
>     >  {
>     >      cvs -f "$@"
Sorry, could you patch this up please? I have no cvs 1.11 to run against.
> Failed test-convert-cvs-branch.t: output changed
Now that is interesting, and relevant with Greg's suggestion of using 
re's on the output.
I think the commitid is likely to include the date in the hash, so I 
fear that the output of this test may not be stable between runs. If 
this is true, how could we cope with two valid outputs of the glog test?

Frank
Bryan O'Sullivan - Jan. 17, 2013, 6:24 p.m.
On Wed, Jan 16, 2013 at 1:28 AM, Frank A. Kingswood
<frank@kingswood-consulting.co.uk> wrote:
> I think the commitid is likely to include the date in the hash, so I fear
> that the output of this test may not be stable between runs. If this is
> true, how could we cope with two valid outputs of the glog test?

I'm definitely seeing that test-convert-cvs-synthetic.t is not stable.
It doesn't seem to be possible to synthesize dates with "cvs commit",
so I'm not sure what to do about this. Maybe just not run "hg glog"?

Patch

diff --git a/hgext/convert/cvsps.py b/hgext/convert/cvsps.py
--- a/hgext/convert/cvsps.py
+++ b/hgext/convert/cvsps.py
@@ -19,6 +19,7 @@ 
         .branch    - name of branch this revision is on
         .branches  - revision tuple of branches starting at this revision
         .comment   - commit message
+        .commitid  - CVS commitid or None
         .date      - the commit date as a (time, tz) tuple
         .dead      - true if file revision is dead
         .file      - Name of file
@@ -28,19 +29,22 @@ 
         .revision  - revision number as tuple
         .tags      - list of tags on the file
         .synthetic - is this a synthetic "file ... added on ..." revision?
-        .mergepoint- the branch that has been merged from
-                     (if present in rlog output)
-        .branchpoints- the branches that start at the current entry
+        .mergepoint - the branch that has been merged from (if present in
+                      rlog output) or None
+        .branchpoints - the branches that start at the current entry or empty
     '''
-    def __init__(self, **entries):
+    def __init__(self, **values):
+        self.branches=[]
+        self.branchpoints=set()
+        self.commitid=None
+        self.mergepoint=None
+        self.parent=None
         self.synthetic = False
-        self.__dict__.update(entries)
+        self.__dict__.update(values)
 
     def __repr__(self):
-        return "<%s at 0x%x: %s %s>" % (self.__class__.__name__,
-                                        id(self),
-                                        self.file,
-                                        ".".join(map(str, self.revision)))
+        items = ("%s=%r"%(k, self.__dict__[k]) for k in sorted(self.__dict__))
+        return "%s %x(%s)"%(type(self).__name__, id(self), ", ".join(items))
 
 class logerror(Exception):
     pass
@@ -113,6 +117,7 @@ 
     re_50 = re.compile('revision ([\\d.]+)(\s+locked by:\s+.+;)?$')
     re_60 = re.compile(r'date:\s+(.+);\s+author:\s+(.+);\s+state:\s+(.+?);'
                        r'(\s+lines:\s+(\+\d+)?\s+(-\d+)?;)?'
+                       r'(\s+commitid:\s+([^;]+);)?'
                        r'(.*mergepoint:\s+([^;]+);)?')
     re_70 = re.compile('branches: (.+);$')
 
@@ -171,6 +176,14 @@ 
         try:
             ui.note(_('reading cvs log cache %s\n') % cachefile)
             oldlog = pickle.load(open(cachefile))
+            for e in oldlog:
+               if not (util.safehasattr(e, 'branchpoints') and
+                       util.safehasattr(e, 'commitid') and
+                       util.safehasattr(e, 'mergepoint')):
+                  ui.status(_('ignoring old cache\n'))
+                  oldlog = []
+                  break
+
             ui.note(_('cache has %d log entries\n') % len(oldlog))
         except Exception, e:
             ui.note(_('error reading cache: %r\n') % e)
@@ -297,8 +310,8 @@ 
             match = re_50.match(line)
             assert match, _('expected revision number')
             e = logentry(rcs=scache(rcs), file=scache(filename),
-                    revision=tuple([int(x) for x in match.group(1).split('.')]),
-                    branches=[], parent=None)
+                    revision=tuple([int(x) for x in match.group(1).split('.')]))
+
             state = 6
 
         elif state == 6:
@@ -329,8 +342,11 @@ 
             else:
                 e.lines = None
 
-            if match.group(7): # cvsnt mergepoint
-                myrev = match.group(8).split('.')
+            if match.group(7): # cvs 1.12 commitid
+                e.commitid = match.group(8)
+
+            if match.group(9): # cvsnt mergepoint
+                myrev = match.group(10).split('.')
                 if len(myrev) == 2: # head
                     e.mergepoint = 'HEAD'
                 else:
@@ -339,8 +355,7 @@ 
                     assert len(branches) == 1, ('unknown branch: %s'
                                                 % e.mergepoint)
                     e.mergepoint = branches[0]
-            else:
-                e.mergepoint = None
+
             e.comment = []
             state = 7
 
@@ -469,36 +484,39 @@ 
         .author    - author name as CVS knows it
         .branch    - name of branch this changeset is on, or None
         .comment   - commit message
+        .commitid  - CVS commitid or None
         .date      - the commit date as a (time,tz) tuple
         .entries   - list of logentry objects in this changeset
         .parents   - list of one or two parent changesets
         .tags      - list of tags on this changeset
         .synthetic - from synthetic revision "file ... added on branch ..."
-        .mergepoint- the branch that has been merged from
-                     (if present in rlog output)
-        .branchpoints- the branches that start at the current entry
+        .mergepoint- the branch that has been merged from or None
+        .branchpoints- the branches that start at the current entry or empty
     '''
-    def __init__(self, **entries):
+    def __init__(self, **values):
+        self.branchpoints = set()
+        self.entries=[]
         self.synthetic = False
-        self.__dict__.update(entries)
+        self.__dict__.update(values)
 
     def __repr__(self):
-        return "<%s at 0x%x: %s>" % (self.__class__.__name__,
-                                     id(self),
-                                     getattr(self, 'id', "(no id)"))
+        items = ("%s=%r"%(k, self.__dict__[k]) for k in sorted(self.__dict__) if k!="entries")
+        return "%s %x(%s)"%(type(self).__name__, id(self), ", ".join(items))
 
 def createchangeset(ui, log, fuzz=60, mergefrom=None, mergeto=None):
     '''Convert log into changesets.'''
 
     ui.status(_('creating changesets\n'))
 
-    # Merge changesets
+    # Sort before merging changesets.
 
-    log.sort(key=lambda x: (x.comment, x.author, x.branch, x.date))
+    # The sort key must match the fuzzy expression logic below, date last
+    log.sort(key=lambda x: (x.commitid, x.branchpoints, x.comment, x.author, x.branch, x.date))
 
     changesets = []
     files = set()
     c = None
+    cdate = 0
     for i, e in enumerate(log):
 
         # Check if log entry belongs to the current changeset or not.
@@ -517,30 +535,40 @@ 
         # first changeset and bar the next and MYBRANCH and MYBRANCH2
         # should both start off of the bar changeset. No provisions are
         # made to ensure that this is, in fact, what happens.
-        if not (c and
-                  e.comment == c.comment and
-                  e.author == c.author and
-                  e.branch == c.branch and
-                  (not util.safehasattr(e, 'branchpoints') or
-                    not util.safehasattr (c, 'branchpoints') or
-                    e.branchpoints == c.branchpoints) and
-                  ((c.date[0] + c.date[1]) <=
-                   (e.date[0] + e.date[1]) <=
-                   (c.date[0] + c.date[1]) + fuzz) and
-                  e.file not in files):
+
+        if c is None:
+            same = False
+        elif e.commitid is not None and c.commitid is not None:
+            # CVS commitids
+            same = (e.commitid == c.commitid and
+                    e.file not in files)
+        else:
+            # No CVS commitids, use fuzzy commit detection
+            same = (e.branchpoints == c.branchpoints and
+                    e.comment == c.comment and
+                    e.author == c.author and
+                    e.branch == c.branch and
+                    cdate <= (e.date[0] + e.date[1]) <= (cdate + fuzz) and
+                    e.file not in files)
+
+        if not same:
             c = changeset(comment=e.comment, author=e.author,
-                          branch=e.branch, date=e.date, entries=[],
-                          mergepoint=getattr(e, 'mergepoint', None),
-                          branchpoints=getattr(e, 'branchpoints', set()))
+                          branch=e.branch, date=e.date,
+                          mergepoint=e.mergepoint,
+                          commitid=e.commitid)
             changesets.append(c)
             files = set()
+
             if len(changesets) % 100 == 0:
                 t = '%d %s' % (len(changesets), repr(e.comment)[1:-1])
                 ui.status(util.ellipsis(t, 80) + '\n')
 
         c.entries.append(e)
+        c.branchpoints.update(e.branchpoints)
+
         files.add(e.file)
         c.date = e.date       # changeset date is date of latest commit in it
+        cdate = c.date[0] + c.date[1]
 
     # Mark synthetic changesets
 
@@ -808,9 +836,8 @@ 
             ui.write(('Branch: %s\n' % (cs.branch or 'HEAD')))
             ui.write(('Tag%s: %s \n' % (['', 's'][len(cs.tags) > 1],
                                   ','.join(cs.tags) or '(none)')))
-            branchpoints = getattr(cs, 'branchpoints', None)
-            if branchpoints:
-                ui.write(('Branchpoints: %s \n' % ', '.join(branchpoints)))
+            if cs.branchpoints:
+                ui.write('Branchpoints: %s \n' % ', '.join(cs.branchpoints))
             if opts["parents"] and cs.parents:
                 if len(cs.parents) > 1:
                     ui.write(('Parents: %s\n' %