Patchwork [1,of,3,V3] revlog: raise an exception earlier if an entry is too large (issue4675)

login
register
mail settings
Submitter Jordi Gutiérrez Hermoso
Date June 2, 2015, 7:07 p.m.
Message ID <b7f66f13bd44581a107f.1433272028@Iris>
Download mbox | patch
Permalink /patch/9446/
State Accepted
Commit eee88912db0a73d3f38789caecfe2d23c27a415f
Headers show

Comments

Jordi Gutiérrez Hermoso - June 2, 2015, 7:07 p.m.
# HG changeset patch
# User Jordi Gutiérrez Hermoso <jordigh@octave.org>
# Date 1433271879 14400
#      Tue Jun 02 15:04:39 2015 -0400
# Node ID b7f66f13bd44581a107f2cb8f5655ea05aa06245
# Parent  eb52de500d2a308761b65bc9efaf85272c27eca5
revlog: raise an exception earlier if an entry is too large (issue4675)

Before we were relying on _pack to error out when trying to pass an
integer that was too large for the "i" format specifier. Now we check
this earlier so we can form a better error message.

The error message unfortunately must exclude the filename at this
level of the call stack. The problem is that this name is not
available here, and the error can be triggered by a large manifest or
by a large file itself. Although perhaps we could provide the name of
a revlog index file (from the revlog object, instead of the revlogio
object), this seems like too much leakage of internal data structures.
It's not ideal already that an error message even mentions revlogs,
but this does seem unavoidable here.
Matt Mackall - June 3, 2015, 8:11 p.m.
On Tue, 2015-06-02 at 15:07 -0400, Jordi Gutiérrez Hermoso wrote:
> # HG changeset patch
> # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> # Date 1433271879 14400
> #      Tue Jun 02 15:04:39 2015 -0400
> # Node ID b7f66f13bd44581a107f2cb8f5655ea05aa06245
> # Parent  eb52de500d2a308761b65bc9efaf85272c27eca5
> revlog: raise an exception earlier if an entry is too large (issue4675)

Queued, but tweaked in flight to explicitly reference "2GiB" in the
error message rather than an arcane 9-digit number.
Jordi Gutiérrez Hermoso - June 3, 2015, 9:10 p.m.
On Wed, 2015-06-03 at 15:11 -0500, Matt Mackall wrote:
> On Tue, 2015-06-02 at 15:07 -0400, Jordi Gutiérrez Hermoso wrote:
> > # HG changeset patch
> > # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> > # Date 1433271879 14400
> > #      Tue Jun 02 15:04:39 2015 -0400
> > # Node ID b7f66f13bd44581a107f2cb8f5655ea05aa06245
> > # Parent  eb52de500d2a308761b65bc9efaf85272c27eca5
> > revlog: raise an exception earlier if an entry is too large (issue4675)
> 
> Queued, but tweaked in flight to explicitly reference "2GiB" in the
> error message rather than an arcane 9-digit number.

My concern with that is that unless you are computing "2GiB" from
_maxentrysize, this adds a maintenance burden for the future should
this variable ever change.

I suppose it's not a big deal, though.
Matt Mackall - June 3, 2015, 10:12 p.m.
On Wed, 2015-06-03 at 17:10 -0400, Jordi Gutiérrez Hermoso wrote:
> On Wed, 2015-06-03 at 15:11 -0500, Matt Mackall wrote:
> > On Tue, 2015-06-02 at 15:07 -0400, Jordi Gutiérrez Hermoso wrote:
> > > # HG changeset patch
> > > # User Jordi Gutiérrez Hermoso <jordigh@octave.org>
> > > # Date 1433271879 14400
> > > #      Tue Jun 02 15:04:39 2015 -0400
> > > # Node ID b7f66f13bd44581a107f2cb8f5655ea05aa06245
> > > # Parent  eb52de500d2a308761b65bc9efaf85272c27eca5
> > > revlog: raise an exception earlier if an entry is too large (issue4675)
> > 
> > Queued, but tweaked in flight to explicitly reference "2GiB" in the
> > error message rather than an arcane 9-digit number.
> 
> My concern with that is that unless you are computing "2GiB" from
> _maxentrysize, this adds a maintenance burden for the future should
> this variable ever change.

It's been 2G for 10 years, if we try to change it, this message will be
the least of our problems.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -153,6 +153,10 @@  indexformatng = ">Qiiiiii20s12x"
 ngshaoffset = 32
 versionformat = ">I"
 
+# corresponds to uncompressed length of indexformatng (2 gigs, 4-byte
+# signed integer)
+_maxentrysize = 0x7fffffff
+
 class revlogio(object):
     def __init__(self):
         self.size = struct.calcsize(indexformatng)
@@ -163,6 +167,12 @@  class revlogio(object):
         return index, getattr(index, 'nodemap', None), cache
 
     def packentry(self, entry, node, version, rev):
+        uncompressedlength = entry[2]
+        if uncompressedlength > _maxentrysize:
+            raise RevlogError(
+                _("size of %d bytes exceeds maximum revlog storage of %d")
+                % (uncompressedlength, _maxentrysize))
+
         p = _pack(indexformatng, *entry)
         if rev == 0:
             p = _pack(versionformat, version) + p[4:]