Patchwork D2068: revlog: do not use delta for lfs revisions

login
register
mail settings
Submitter phabricator
Date Feb. 7, 2018, 1:14 a.m.
Message ID <differential-rev-PHID-DREV-hoaab4o2avr2yuahzdqz-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/27391/
State Superseded
Headers show

Comments

phabricator - Feb. 7, 2018, 1:14 a.m.
quark created this revision.
Herald added a reviewer: indygreg.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This is similar to what we have done for changegroups. It is needed to make
  sure the delta application code path can assume deltas are always against
  vanilla (ex. non-LFS) rawtext so the next fix becomes possible.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

AFFECTED FILES
  mercurial/revlog.py
  tests/test-lfs-bundle.t
  tests/test-revlog-raw.py

CHANGE DETAILS




To: quark, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Feb. 7, 2018, 10:35 p.m.
indygreg requested changes to this revision.
indygreg added inline comments.
This revision now requires changes to proceed.

INLINE COMMENTS

> revlog.py:408
> +                # do not use flags != 0 (ex. LFS) revision as delta base
> +                if revlog.flags(candidaterev) != REVIDX_DEFAULT_FLAGS:
> +                    continue

Same comment as previous review: this leaves a foot gun if we ever introduce revision flags that aren't related to content presence. We need something to help prevent this footgun.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

To: quark, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - March 3, 2018, 7:48 p.m.
ryanmce added inline comments.

INLINE COMMENTS

> revlog.py:2110
> +        # no delta for flag processor revision (see "candelta" for why)
> +        if flags & REVIDX_KNOWN_FLAGS:
> +            deltainfo = None

Why is this not `candelta()`?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

To: quark, indygreg, #hg-reviewers, ryanmce
Cc: mercurial-devel
phabricator - March 3, 2018, 7:52 p.m.
quark added inline comments.

INLINE COMMENTS

> ryanmce wrote in revlog.py:2110
> Why is this not `candelta()`?

`candelta` takes two revisions. Here there is only one revision. It's possible to pass a useless revision but that has unnecessary overhead.

`candelta` also fetches `flags`, here we already know the values `flags` so it's faster to avoid fetching it again.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

To: quark, indygreg, #hg-reviewers, ryanmce
Cc: mercurial-devel
phabricator - March 3, 2018, 8:21 p.m.
ryanmce added inline comments.

INLINE COMMENTS

> quark wrote in revlog.py:2110
> `candelta` takes two revisions. Here there is only one revision. It's possible to pass a useless revision but that has unnecessary overhead.
> 
> `candelta` also fetches `flags`, here we already know the values `flags` so it's faster to avoid fetching it again.

Then why is this `REVIDX_KNOWN_FLAGS` and not `REVIDX_RAWTEXT_CHANGING_FLAGS`?

Given the number of questions here from me and others, I think the comment could use extension at least so future readers understand why this is the way it is.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

To: quark, indygreg, #hg-reviewers, ryanmce
Cc: mercurial-devel
phabricator - March 3, 2018, 11:43 p.m.
quark added inline comments.

INLINE COMMENTS

> ryanmce wrote in revlog.py:2110
> Then why is this `REVIDX_KNOWN_FLAGS` and not `REVIDX_RAWTEXT_CHANGING_FLAGS`?
> 
> Given the number of questions here from me and others, I think the comment could use extension at least so future readers understand why this is the way it is.

Should be `REVIDX_RAWTEXT_CHANGING_FLAGS`. I'll make the change.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

To: quark, indygreg, #hg-reviewers, ryanmce
Cc: mercurial-devel
phabricator - March 6, 2018, 6:03 p.m.
indygreg accepted this revision.
indygreg added inline comments.
This revision is now accepted and ready to land.

INLINE COMMENTS

> test-revlog-raw.py:156-161
> +        if r == 0 or rlog.flags(r):
>              text = rlog.revision(r, raw=True)
>              cachedelta = None
>          else:
> -            # deltaparent is more interesting if it has the EXTSTORED flag.
> -            deltaparent = max([0] + [p for p in range(r - 2) if rlog.flags(p)])
> +            # deltaparent cannot have EXTSTORED flag.
> +            deltaparent = max([-1] + [p for p in range(r) if not rlog.flags(p)])

I feel like this should be checking against specific flags. But since this is a test, I'm fine accepting this. We can fix in a follow-up if the logic is wrong.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

To: quark, indygreg, #hg-reviewers, ryanmce
Cc: mercurial-devel
phabricator - March 6, 2018, 6:09 p.m.
quark added inline comments.

INLINE COMMENTS

> indygreg wrote in test-revlog-raw.py:156-161
> I feel like this should be checking against specific flags. But since this is a test, I'm fine accepting this. We can fix in a follow-up if the logic is wrong.

Seems I missed this line somehow. Will change in a minute.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2068

To: quark, indygreg, #hg-reviewers, ryanmce
Cc: mercurial-devel

Patch

diff --git a/tests/test-revlog-raw.py b/tests/test-revlog-raw.py
--- a/tests/test-revlog-raw.py
+++ b/tests/test-revlog-raw.py
@@ -114,6 +114,8 @@ 
             else:
                 # suboptimal deltaparent
                 deltaparent = min(0, parentrev)
+            if not rlog.candelta(deltaparent, r):
+                deltaparent = -1
             return {'node': rlog.node(r), 'p1': pnode, 'p2': node.nullid,
                     'cs': rlog.node(rlog.linkrev(r)), 'flags': rlog.flags(r),
                     'deltabase': rlog.node(deltaparent),
@@ -151,12 +153,12 @@ 
     for r in rlog:
         p1 = rlog.node(r - 1)
         p2 = node.nullid
-        if r == 0:
+        if r == 0 or rlog.flags(r):
             text = rlog.revision(r, raw=True)
             cachedelta = None
         else:
-            # deltaparent is more interesting if it has the EXTSTORED flag.
-            deltaparent = max([0] + [p for p in range(r - 2) if rlog.flags(p)])
+            # deltaparent cannot have EXTSTORED flag.
+            deltaparent = max([-1] + [p for p in range(r) if not rlog.flags(p)])
             text = None
             cachedelta = (deltaparent, rlog.revdiff(deltaparent, r))
         flags = rlog.flags(r)
@@ -262,8 +264,9 @@ 
         result.append((text, rawtext))
 
         # Verify flags like isdelta, isext work as expected
-        if bool(rlog.deltaparent(rev) > -1) != isdelta:
-            abort('rev %d: isdelta is ineffective' % rev)
+        # isdelta can be overridden to False if this or p1 has isext set
+        if bool(rlog.deltaparent(rev) > -1) and not isdelta:
+            abort('rev %d: isdelta is unexpected' % rev)
         if bool(rlog.flags(rev)) != isext:
             abort('rev %d: isext is ineffective' % rev)
     return result
diff --git a/tests/test-lfs-bundle.t b/tests/test-lfs-bundle.t
--- a/tests/test-lfs-bundle.t
+++ b/tests/test-lfs-bundle.t
@@ -91,10 +91,7 @@ 
   ---- Applying src-normal.bundle to dst-normal ----
   OK
   ---- Applying src-normal.bundle to dst-lfs ----
-   X@2: unpacking 0609652b7877: integrity check failed on data/X.i:2
-   Y@2: unpacking e384812cdeb9: integrity check failed on data/Y.i:2
-  2 integrity errors encountered!
-  (first damaged changeset appears to be 2)
+  CRASHED
   ---- Applying src-lfs.bundle to dst-normal ----
   OK
   ---- Applying src-lfs.bundle to dst-lfs ----
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -404,6 +404,9 @@ 
         for candidaterevs in self._getcandidaterevs(p1, p2, cachedelta):
             nominateddeltas = []
             for candidaterev in candidaterevs:
+                # do not use flags != 0 (ex. LFS) revision as delta base
+                if revlog.flags(candidaterev) != REVIDX_DEFAULT_FLAGS:
+                    continue
                 candidatedelta = self._builddeltainfo(revinfo, candidaterev, fh)
                 if revlog._isgooddeltainfo(candidatedelta, revinfo.textlen):
                     nominateddeltas.append(candidatedelta)
@@ -2082,7 +2085,12 @@ 
             deltacomputer = _deltacomputer(self)
 
         revinfo = _revisioninfo(node, p1, p2, btext, textlen, cachedelta, flags)
-        deltainfo = deltacomputer.finddeltainfo(revinfo, fh)
+
+        # do not use delta for flags != 0 (ex. LFS) revisions
+        if flags == REVIDX_DEFAULT_FLAGS:
+            deltainfo = deltacomputer.finddeltainfo(revinfo, fh)
+        else:
+            deltainfo = None
 
         if deltainfo is not None:
             base = deltainfo.base