Patchwork [1,of,3,V2] revlog: raise an exception earlier if an entry is too large

login
register
mail settings
Submitter Jordi Gutiérrez Hermoso
Date May 28, 2015, 8:48 p.m.
Message ID <1f8879ee5b6e98b832ca.1432846138@Iris>
Download mbox | patch
Permalink /patch/9344/
State Accepted
Delegated to: Augie Fackler
Headers show

Comments

Jordi Gutiérrez Hermoso - May 28, 2015, 8:48 p.m.
# HG changeset patch
# User Jordi Gutiérrez Hermoso <jordigh@octave.org>
# Date 1432819730 14400
#      Thu May 28 09:28:50 2015 -0400
# Node ID 1f8879ee5b6e98b832caefc9ceb1e506b0b23ec4
# Parent  bcb17d7dbec25088eaec5e4d34dedbd7057c5d68
revlog: raise an exception earlier if an entry is too large

Before we were relying on _pack to error out when trying to pass an
integer that was too large for the "i" format specifier. Now we check
this earlier so we can form a better error message.

The error message unfortunately must exclude the filename at this
level of the call stack. The problem is that this name is not
available here, and the error can be triggered by a large manifest or
by a large file itself. Although perhaps we could provide the name of
a revlog index file (from the revlog object, instead of the revlogio
object), this seems like too much leakage of internal data structures.
It's not ideal already that an error message even mentions revlogs,
but this does seem unavoidable here.
Yuya Nishihara - May 29, 2015, 2:16 p.m.
On Thu, 28 May 2015 16:48:58 -0400, Jordi Gutiérrez Hermoso wrote:
> # HG changeset patch
> # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> # Date 1432819730 14400
> #      Thu May 28 09:28:50 2015 -0400
> # Node ID 1f8879ee5b6e98b832caefc9ceb1e506b0b23ec4
> # Parent  bcb17d7dbec25088eaec5e4d34dedbd7057c5d68
> revlog: raise an exception earlier if an entry is too large
> 
> Before we were relying on _pack to error out when trying to pass an
> integer that was too large for the "i" format specifier. Now we check
> this earlier so we can form a better error message.
> 
> The error message unfortunately must exclude the filename at this
> level of the call stack. The problem is that this name is not
> available here, and the error can be triggered by a large manifest or
> by a large file itself. Although perhaps we could provide the name of
> a revlog index file (from the revlog object, instead of the revlogio
> object), this seems like too much leakage of internal data structures.
> It's not ideal already that an error message even mentions revlogs,
> but this does seem unavoidable here.
> 
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -153,6 +153,10 @@ indexformatng = ">Qiiiiii20s12x"
>  ngshaoffset = 32
>  versionformat = ">I"
>  
> +# corresponds to uncompressed length of indexformatng (2 gigs, 4-byte
> +# signed integer)
> +_maxentrysize = 0x7fffffff
> +
>  class revlogio(object):
>      def __init__(self):
>          self.size = struct.calcsize(indexformatng)
> @@ -163,6 +167,11 @@ class revlogio(object):
>          return index, getattr(index, 'nodemap', None), cache
>  
>      def packentry(self, entry, node, version, rev):
> +        # uncompressed length
> +        if entry[2] > _maxentrysize:
> +            raise RevlogError(
> +                _("maximum revlog storage (%d) exceeded" % _maxentrysize))

It should be _("... %d ...") % _maxentrysize.

BTW, I found a similar message that says "32 bits" with the actual value.

    if abs(when) > 0x7fffffff:
        raise Abort(_('date exceeds 32 bits: %d') % when)
Augie Fackler - May 29, 2015, 2:30 p.m.
On Fri, May 29, 2015 at 11:16:41PM +0900, Yuya Nishihara wrote:
> On Thu, 28 May 2015 16:48:58 -0400, Jordi Gutiérrez Hermoso wrote:
> > # HG changeset patch
> > # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> > # Date 1432819730 14400
> > #      Thu May 28 09:28:50 2015 -0400
> > # Node ID 1f8879ee5b6e98b832caefc9ceb1e506b0b23ec4
> > # Parent  bcb17d7dbec25088eaec5e4d34dedbd7057c5d68
> > revlog: raise an exception earlier if an entry is too large
> >
> > Before we were relying on _pack to error out when trying to pass an
> > integer that was too large for the "i" format specifier. Now we check
> > this earlier so we can form a better error message.
> >
> > The error message unfortunately must exclude the filename at this
> > level of the call stack. The problem is that this name is not
> > available here, and the error can be triggered by a large manifest or
> > by a large file itself. Although perhaps we could provide the name of
> > a revlog index file (from the revlog object, instead of the revlogio
> > object), this seems like too much leakage of internal data structures.
> > It's not ideal already that an error message even mentions revlogs,
> > but this does seem unavoidable here.
> >
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -153,6 +153,10 @@ indexformatng = ">Qiiiiii20s12x"
> >  ngshaoffset = 32
> >  versionformat = ">I"
> >
> > +# corresponds to uncompressed length of indexformatng (2 gigs, 4-byte
> > +# signed integer)
> > +_maxentrysize = 0x7fffffff
> > +
> >  class revlogio(object):
> >      def __init__(self):
> >          self.size = struct.calcsize(indexformatng)
> > @@ -163,6 +167,11 @@ class revlogio(object):
> >          return index, getattr(index, 'nodemap', None), cache
> >
> >      def packentry(self, entry, node, version, rev):
> > +        # uncompressed length
> > +        if entry[2] > _maxentrysize:
> > +            raise RevlogError(
> > +                _("maximum revlog storage (%d) exceeded" % _maxentrysize))
>
> It should be _("... %d ...") % _maxentrysize.

Oh, yeah, you're absolutely right for i18n reasons. I'm going to go
ahead and drop the patches now and Jordi can do a resend once you two
have consensus.

>
> BTW, I found a similar message that says "32 bits" with the actual value.
>
>     if abs(when) > 0x7fffffff:
>         raise Abort(_('date exceeds 32 bits: %d') % when)
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -153,6 +153,10 @@  indexformatng = ">Qiiiiii20s12x"
 ngshaoffset = 32
 versionformat = ">I"
 
+# corresponds to uncompressed length of indexformatng (2 gigs, 4-byte
+# signed integer)
+_maxentrysize = 0x7fffffff
+
 class revlogio(object):
     def __init__(self):
         self.size = struct.calcsize(indexformatng)
@@ -163,6 +167,11 @@  class revlogio(object):
         return index, getattr(index, 'nodemap', None), cache
 
     def packentry(self, entry, node, version, rev):
+        # uncompressed length
+        if entry[2] > _maxentrysize:
+            raise RevlogError(
+                _("maximum revlog storage (%d) exceeded" % _maxentrysize))
+
         p = _pack(indexformatng, *entry)
         if rev == 0:
             p = _pack(versionformat, version) + p[4:]