Patchwork [3,of,4] revlog: raise an exception earlier if an entry is too large

login
register
mail settings
Submitter Jordi Gutiérrez Hermoso
Date May 21, 2015, 8:30 p.m.
Message ID <94b79351d9569b65c3c1.1432240221@Iris>
Download mbox | patch
Permalink /patch/9219/
State Accepted
Delegated to: Augie Fackler
Headers show

Comments

Jordi Gutiérrez Hermoso - May 21, 2015, 8:30 p.m.
# HG changeset patch
# User Jordi Gutiérrez Hermoso <jordigh@octave.org>
# Date 1432239712 14400
#      Thu May 21 16:21:52 2015 -0400
# Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9
# Parent  88b99c48761cc7b982b84294aa679b63f5edf967
revlog: raise an exception earlier if an entry is too large

Before we were relying on _pack to error out when trying to pass an
integer that was too large for the "i" format specifier. Now we check
this earlier so we can form a better error message.
Matt Mackall - May 22, 2015, 5:05 p.m.
On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote:
> # HG changeset patch
> # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> # Date 1432239712 14400
> #      Thu May 21 16:21:52 2015 -0400
> # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9
> # Parent  88b99c48761cc7b982b84294aa679b63f5edf967
> revlog: raise an exception earlier if an entry is too large

Issue number?

> Before we were relying on _pack to error out when trying to pass an
> integer that was too large for the "i" format specifier. Now we check
> this earlier so we can form a better error message.

> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x"
>  ngshaoffset = 32
>  versionformat = ">I"
>  
> +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed
> +# integer)
> +maxentrysize = 2147483648
> +

You're off by one. The limit is 0x7fffffff. Let's use hex, add a
comment, and prefix with _.

>  class revlogio(object):
>      def __init__(self):
>          self.size = struct.calcsize(indexformatng)
> @@ -162,6 +166,11 @@ class revlogio(object):
>          return index, getattr(index, 'nodemap', None), cache
>  
>      def packentry(self, entry, node, version, rev):
> +        # uncompressed length
> +        if entry[2] > maxentrysize:
> +            raise RevlogError(_("too large for revlog storage"),

Should mention 2G. And probably the filename.

> +                              hint=_("consider using the largefiles extension"))

Eek. I'm not sure about that. Can we get just the fix and then bikeshed
the hint part?
Jordi Gutiérrez Hermoso - May 22, 2015, 5:40 p.m.
On Fri, 2015-05-22 at 12:05 -0500, Matt Mackall wrote:
> On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote:
> > # HG changeset patch
> > # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> > # Date 1432239712 14400
> > #      Thu May 21 16:21:52 2015 -0400
> > # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9
> > # Parent  88b99c48761cc7b982b84294aa679b63f5edf967
> > revlog: raise an exception earlier if an entry is too large
> 
> Issue number?

I considered the 4th patch in this series to address the issue, but I
suppose this one does too. Is it ok to have two patches with the same
issue number?

> 
> > Before we were relying on _pack to error out when trying to pass an
> > integer that was too large for the "i" format specifier. Now we check
> > this earlier so we can form a better error message.
> 
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x"
> >  ngshaoffset = 32
> >  versionformat = ">I"
> >  
> > +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed
> > +# integer)
> > +maxentrysize = 2147483648
> > +
> 
> You're off by one. The limit is 0x7fffffff. Let's use hex, add a
> comment, and prefix with _.

Okay.

> 
> >  class revlogio(object):
> >      def __init__(self):
> >          self.size = struct.calcsize(indexformatng)
> > @@ -162,6 +166,11 @@ class revlogio(object):
> >          return index, getattr(index, 'nodemap', None), cache
> >  
> >      def packentry(self, entry, node, version, rev):
> > +        # uncompressed length
> > +        if entry[2] > maxentrysize:
> > +            raise RevlogError(_("too large for revlog storage"),
> 
> Should mention 2G. And probably the filename.

The filename isn't available at this point. And there may not even be
a file at all: you could also hit this limit if you try to create a
giant manifest entry, I think. That's why I decided to handle the
filename higher up in the call stack.

> 
> > +                              hint=_("consider using the largefiles extension"))
> 
> Eek. I'm not sure about that. Can we get just the fix and then bikeshed
> the hint part?

Sure, but the hint is not shown for a RevlogError. You have to throw
an Abort for the hint to show, so that's why I also I thought that
this could just be added here and a caller could decide to show the
hint or not.
Adrian Buehlmann - May 22, 2015, 8 p.m.
On 2015-05-22 19:40, Jordi Gutiérrez Hermoso wrote:
> On Fri, 2015-05-22 at 12:05 -0500, Matt Mackall wrote:
>> On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote:
>>> # HG changeset patch
>>> # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
>>> # Date 1432239712 14400
>>> #      Thu May 21 16:21:52 2015 -0400
>>> # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9
>>> # Parent  88b99c48761cc7b982b84294aa679b63f5edf967
>>> revlog: raise an exception earlier if an entry is too large
>>
>> Issue number?
> 
> I considered the 4th patch in this series to address the issue, but I
> suppose this one does too. Is it ok to have two patches with the same
> issue number?

I'd say yes. Splitting of patches should not be dictated by issue
fixing. If it makes sense to have multiple patches by the usual rules,
do it.

Precedent: I've done that (two patches with the same issue number) in
e1002cf9fe54 and e98581d44f0b, fixing

    http://bz.selenic.com/show_bug.cgi?id=2513

This was in 2010 and "HG Bot" was able to handle it.
Matt Mackall - May 23, 2015, 8:55 p.m.
On Fri, 2015-05-22 at 13:40 -0400, Jordi Gutiérrez Hermoso wrote:
> On Fri, 2015-05-22 at 12:05 -0500, Matt Mackall wrote:
> > On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote:
> > > # HG changeset patch
> > > # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> > > # Date 1432239712 14400
> > > #      Thu May 21 16:21:52 2015 -0400
> > > # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9
> > > # Parent  88b99c48761cc7b982b84294aa679b63f5edf967
> > > revlog: raise an exception earlier if an entry is too large
> > 
> > Issue number?
> 
> I considered the 4th patch in this series to address the issue, but I
> suppose this one does too. Is it ok to have two patches with the same
> issue number?
> 
> > 
> > > Before we were relying on _pack to error out when trying to pass an
> > > integer that was too large for the "i" format specifier. Now we check
> > > this earlier so we can form a better error message.
> > 
> > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > > --- a/mercurial/revlog.py
> > > +++ b/mercurial/revlog.py
> > > @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x"
> > >  ngshaoffset = 32
> > >  versionformat = ">I"
> > >  
> > > +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed
> > > +# integer)
> > > +maxentrysize = 2147483648
> > > +
> > 
> > You're off by one. The limit is 0x7fffffff. Let's use hex, add a
> > comment, and prefix with _.
> 
> Okay.
> 
> > 
> > >  class revlogio(object):
> > >      def __init__(self):
> > >          self.size = struct.calcsize(indexformatng)
> > > @@ -162,6 +166,11 @@ class revlogio(object):
> > >          return index, getattr(index, 'nodemap', None), cache
> > >  
> > >      def packentry(self, entry, node, version, rev):
> > > +        # uncompressed length
> > > +        if entry[2] > maxentrysize:
> > > +            raise RevlogError(_("too large for revlog storage"),
> > 
> > Should mention 2G. And probably the filename.
>
> The filename isn't available at this point. And there may not even be
> a file at all: you could also hit this limit if you try to create a
> giant manifest entry, I think. That's why I decided to handle the
> filename higher up in the call stack.

Yep. Let's use self.indexfile for now. It's not perfect, but it's better
than nothing for the generic revlog message. I hadn't seen 4of4.

> > 
> > > +                              hint=_("consider using the largefiles extension"))
> > 
> > Eek. I'm not sure about that. Can we get just the fix and then bikeshed
> > the hint part?
> 
> Sure, but the hint is not shown for a RevlogError. You have to throw
> an Abort for the hint to show, so that's why I also I thought that
> this could just be added here and a caller could decide to show the
> hint or not.

Right, my request is to send a simpler version of the fix without the
hint. That way the kernel of the fix isn't needlessly delayed on tuning
less important details.

So ideally a series like this would be:

- report "too big" error
- refactor of hints
- addition of hint about feature-of-last-resort

..which maximizes the odds of making forward progress on the core fix.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -152,6 +152,10 @@  indexformatng = ">Qiiiiii20s12x"
 ngshaoffset = 32
 versionformat = ">I"
 
+# matches uncompressed length of indexformatng (2 gigs, 4-byte signed
+# integer)
+maxentrysize = 2147483648
+
 class revlogio(object):
     def __init__(self):
         self.size = struct.calcsize(indexformatng)
@@ -162,6 +166,11 @@  class revlogio(object):
         return index, getattr(index, 'nodemap', None), cache
 
     def packentry(self, entry, node, version, rev):
+        # uncompressed length
+        if entry[2] > maxentrysize:
+            raise RevlogError(_("too large for revlog storage"),
+                              hint=_("consider using the largefiles extension"))
+
         p = _pack(indexformatng, *entry)
         if rev == 0:
             p = _pack(versionformat, version) + p[4:]