Patchwork [STABLE] filelog: don't crash on invalid copy metadata (issue5748)

login
register
mail settings
Submitter Gregory Szorc
Date Dec. 1, 2017, 2:22 a.m.
Message ID <5f86d95b1bd8aae2c555.1512094930@gps-mbp.local>
Download mbox | patch
Permalink /patch/25841/
State Deferred, archived
Headers show

Comments

Gregory Szorc - Dec. 1, 2017, 2:22 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1512094786 18000
#      Thu Nov 30 21:19:46 2017 -0500
# Branch stable
# Node ID 5f86d95b1bd8aae2c555d2480aa4ac0d2893d35f
# Parent  27196b7fc1acfc78711bc4b721fcc68754354c2c
filelog: don't crash on invalid copy metadata (issue5748)

"copy" and "copyrev" are both supposed to appear next to each other.
However, a user report demonstrated a crash that indicates that
something in the wild is producing "copy" without "copyrev"
(probably `hg convert`).

While we should definitely fix the source of the bad metadata,
the bad code causing the crash is already in the wild and who knows
how many repositories are impacted. So let's be more defensive
when accessing the file revision metadata.
Anton Shestakov - Dec. 1, 2017, 4:19 a.m.
On Thu, 30 Nov 2017 21:22:10 -0500
Gregory Szorc <gregory.szorc@gmail.com> wrote:

> -        if m and "copy" in m:
> -            return (m["copy"], revlog.bin(m["copyrev"]))
> +        if m:
> +            # copy and copyrev occur in pairs. In rare cases due to bugs,
> +            # one can occur without the other.
> +            try:
> +                copy, copyrev = m['copy'], m['copyrev']
> +                return copy, revlog.bin(copyrev)
> +            except KeyError:
> +                pass
> +

Can't help but ask: why not just

  if m and "copy" in m and "copyrev" in m:

?
Gregory Szorc - Dec. 1, 2017, 5:34 a.m.
On Thu, Nov 30, 2017 at 11:19 PM, Anton Shestakov <av6@dwimlabs.net> wrote:

> On Thu, 30 Nov 2017 21:22:10 -0500
> Gregory Szorc <gregory.szorc@gmail.com> wrote:
>
> > -        if m and "copy" in m:
> > -            return (m["copy"], revlog.bin(m["copyrev"]))
> > +        if m:
> > +            # copy and copyrev occur in pairs. In rare cases due to
> bugs,
> > +            # one can occur without the other.
> > +            try:
> > +                copy, copyrev = m['copy'], m['copyrev']
> > +                return copy, revlog.bin(copyrev)
> > +            except KeyError:
> > +                pass
> > +
>
> Can't help but ask: why not just
>
>   if m and "copy" in m and "copyrev" in m:
>
> ?
>

I have a personal aversion to redundant key lookups. Not that it matters
much here: we're using a builtin dict and the lookup should be very fast.
Yuya Nishihara - Dec. 1, 2017, 12:51 p.m.
On Thu, 30 Nov 2017 21:22:10 -0500, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1512094786 18000
> #      Thu Nov 30 21:19:46 2017 -0500
> # Branch stable
> # Node ID 5f86d95b1bd8aae2c555d2480aa4ac0d2893d35f
> # Parent  27196b7fc1acfc78711bc4b721fcc68754354c2c
> filelog: don't crash on invalid copy metadata (issue5748)
> 
> "copy" and "copyrev" are both supposed to appear next to each other.
> However, a user report demonstrated a crash that indicates that
> something in the wild is producing "copy" without "copyrev"
> (probably `hg convert`).
> 
> While we should definitely fix the source of the bad metadata,
> the bad code causing the crash is already in the wild and who knows
> how many repositories are impacted. So let's be more defensive
> when accessing the file revision metadata.

I'm not pretty sure if ignoring data corruption is better than the crash.
Are there any integrity concerns?

> --- a/mercurial/filelog.py
> +++ b/mercurial/filelog.py
> @@ -60,10 +60,17 @@ class filelog(revlog.revlog):
>          if self.parents(node)[0] != revlog.nullid:
>              return False
>          t = self.revision(node)
>          m = parsemeta(t)[0]
> -        if m and "copy" in m:
> -            return (m["copy"], revlog.bin(m["copyrev"]))
> +        if m:
> +            # copy and copyrev occur in pairs. In rare cases due to bugs,
> +            # one can occur without the other.
> +            try:
> +                copy, copyrev = m['copy'], m['copyrev']
> +                return copy, revlog.bin(copyrev)
> +            except KeyError:

Nit: this except block might be too broad if revlog.bin() weren't a trivial
function.

Patch

diff --git a/mercurial/filelog.py b/mercurial/filelog.py
--- a/mercurial/filelog.py
+++ b/mercurial/filelog.py
@@ -60,10 +60,17 @@  class filelog(revlog.revlog):
         if self.parents(node)[0] != revlog.nullid:
             return False
         t = self.revision(node)
         m = parsemeta(t)[0]
-        if m and "copy" in m:
-            return (m["copy"], revlog.bin(m["copyrev"]))
+        if m:
+            # copy and copyrev occur in pairs. In rare cases due to bugs,
+            # one can occur without the other.
+            try:
+                copy, copyrev = m['copy'], m['copyrev']
+                return copy, revlog.bin(copyrev)
+            except KeyError:
+                pass
+
         return False
 
     def size(self, rev):
         """return the size of a given revision"""