Submitter | Jordi Gutiérrez Hermoso |
---|---|
Date | May 21, 2015, 8:30 p.m. |
Message ID | <94b79351d9569b65c3c1.1432240221@Iris> |
Download | mbox | patch |
Permalink | /patch/9219/ |
State | Accepted |
Delegated to: | Augie Fackler |
Headers | show |
Comments
On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote: > # HG changeset patch > # User Jordi Gutiérrez Hermoso <jordigh@octave.org> > # Date 1432239712 14400 > # Thu May 21 16:21:52 2015 -0400 > # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9 > # Parent 88b99c48761cc7b982b84294aa679b63f5edf967 > revlog: raise an exception earlier if an entry is too large Issue number? > Before we were relying on _pack to error out when trying to pass an > integer that was too large for the "i" format specifier. Now we check > this earlier so we can form a better error message. > diff --git a/mercurial/revlog.py b/mercurial/revlog.py > --- a/mercurial/revlog.py > +++ b/mercurial/revlog.py > @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x" > ngshaoffset = 32 > versionformat = ">I" > > +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed > +# integer) > +maxentrysize = 2147483648 > + You're off by one. The limit is 0x7fffffff. Let's use hex, add a comment, and prefix with _. > class revlogio(object): > def __init__(self): > self.size = struct.calcsize(indexformatng) > @@ -162,6 +166,11 @@ class revlogio(object): > return index, getattr(index, 'nodemap', None), cache > > def packentry(self, entry, node, version, rev): > + # uncompressed length > + if entry[2] > maxentrysize: > + raise RevlogError(_("too large for revlog storage"), Should mention 2G. And probably the filename. > + hint=_("consider using the largefiles extension")) Eek. I'm not sure about that. Can we get just the fix and then bikeshed the hint part?
On Fri, 2015-05-22 at 12:05 -0500, Matt Mackall wrote: > On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote: > > # HG changeset patch > > # User Jordi Gutiérrez Hermoso <jordigh@octave.org> > > # Date 1432239712 14400 > > # Thu May 21 16:21:52 2015 -0400 > > # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9 > > # Parent 88b99c48761cc7b982b84294aa679b63f5edf967 > > revlog: raise an exception earlier if an entry is too large > > Issue number? I considered the 4th patch in this series to address the issue, but I suppose this one does too. Is it ok to have two patches with the same issue number? > > > Before we were relying on _pack to error out when trying to pass an > > integer that was too large for the "i" format specifier. Now we check > > this earlier so we can form a better error message. > > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py > > --- a/mercurial/revlog.py > > +++ b/mercurial/revlog.py > > @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x" > > ngshaoffset = 32 > > versionformat = ">I" > > > > +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed > > +# integer) > > +maxentrysize = 2147483648 > > + > > You're off by one. The limit is 0x7fffffff. Let's use hex, add a > comment, and prefix with _. Okay. > > > class revlogio(object): > > def __init__(self): > > self.size = struct.calcsize(indexformatng) > > @@ -162,6 +166,11 @@ class revlogio(object): > > return index, getattr(index, 'nodemap', None), cache > > > > def packentry(self, entry, node, version, rev): > > + # uncompressed length > > + if entry[2] > maxentrysize: > > + raise RevlogError(_("too large for revlog storage"), > > Should mention 2G. And probably the filename. The filename isn't available at this point. And there may not even be a file at all: you could also hit this limit if you try to create a giant manifest entry, I think. That's why I decided to handle the filename higher up in the call stack. > > > + hint=_("consider using the largefiles extension")) > > Eek. I'm not sure about that. Can we get just the fix and then bikeshed > the hint part? Sure, but the hint is not shown for a RevlogError. You have to throw an Abort for the hint to show, so that's why I also I thought that this could just be added here and a caller could decide to show the hint or not.
On 2015-05-22 19:40, Jordi Gutiérrez Hermoso wrote: > On Fri, 2015-05-22 at 12:05 -0500, Matt Mackall wrote: >> On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote: >>> # HG changeset patch >>> # User Jordi Gutiérrez Hermoso <jordigh@octave.org> >>> # Date 1432239712 14400 >>> # Thu May 21 16:21:52 2015 -0400 >>> # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9 >>> # Parent 88b99c48761cc7b982b84294aa679b63f5edf967 >>> revlog: raise an exception earlier if an entry is too large >> >> Issue number? > > I considered the 4th patch in this series to address the issue, but I > suppose this one does too. Is it ok to have two patches with the same > issue number? I'd say yes. Splitting of patches should not be dictated by issue fixing. If it makes sense to have multiple patches by the usual rules, do it. Precedent: I've done that (two patches with the same issue number) in e1002cf9fe54 and e98581d44f0b, fixing http://bz.selenic.com/show_bug.cgi?id=2513 This was in 2010 and "HG Bot" was able to handle it.
On Fri, 2015-05-22 at 13:40 -0400, Jordi Gutiérrez Hermoso wrote: > On Fri, 2015-05-22 at 12:05 -0500, Matt Mackall wrote: > > On Thu, 2015-05-21 at 16:30 -0400, Jordi Gutiérrez Hermoso wrote: > > > # HG changeset patch > > > # User Jordi Gutiérrez Hermoso <jordigh@octave.org> > > > # Date 1432239712 14400 > > > # Thu May 21 16:21:52 2015 -0400 > > > # Node ID 94b79351d9569b65c3c111cbfe88a03112d617a9 > > > # Parent 88b99c48761cc7b982b84294aa679b63f5edf967 > > > revlog: raise an exception earlier if an entry is too large > > > > Issue number? > > I considered the 4th patch in this series to address the issue, but I > suppose this one does too. Is it ok to have two patches with the same > issue number? > > > > > > Before we were relying on _pack to error out when trying to pass an > > > integer that was too large for the "i" format specifier. Now we check > > > this earlier so we can form a better error message. > > > > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py > > > --- a/mercurial/revlog.py > > > +++ b/mercurial/revlog.py > > > @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x" > > > ngshaoffset = 32 > > > versionformat = ">I" > > > > > > +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed > > > +# integer) > > > +maxentrysize = 2147483648 > > > + > > > > You're off by one. The limit is 0x7fffffff. Let's use hex, add a > > comment, and prefix with _. > > Okay. > > > > > > class revlogio(object): > > > def __init__(self): > > > self.size = struct.calcsize(indexformatng) > > > @@ -162,6 +166,11 @@ class revlogio(object): > > > return index, getattr(index, 'nodemap', None), cache > > > > > > def packentry(self, entry, node, version, rev): > > > + # uncompressed length > > > + if entry[2] > maxentrysize: > > > + raise RevlogError(_("too large for revlog storage"), > > > > Should mention 2G. And probably the filename. > > The filename isn't available at this point. And there may not even be > a file at all: you could also hit this limit if you try to create a > giant manifest entry, I think. That's why I decided to handle the > filename higher up in the call stack. Yep. Let's use self.indexfile for now. It's not perfect, but it's better than nothing for the generic revlog message. I hadn't seen 4of4. > > > > > + hint=_("consider using the largefiles extension")) > > > > Eek. I'm not sure about that. Can we get just the fix and then bikeshed > > the hint part? > > Sure, but the hint is not shown for a RevlogError. You have to throw > an Abort for the hint to show, so that's why I also I thought that > this could just be added here and a caller could decide to show the > hint or not. Right, my request is to send a simpler version of the fix without the hint. That way the kernel of the fix isn't needlessly delayed on tuning less important details. So ideally a series like this would be: - report "too big" error - refactor of hints - addition of hint about feature-of-last-resort ..which maximizes the odds of making forward progress on the core fix.
Patch
diff --git a/mercurial/revlog.py b/mercurial/revlog.py --- a/mercurial/revlog.py +++ b/mercurial/revlog.py @@ -152,6 +152,10 @@ indexformatng = ">Qiiiiii20s12x" ngshaoffset = 32 versionformat = ">I" +# matches uncompressed length of indexformatng (2 gigs, 4-byte signed +# integer) +maxentrysize = 2147483648 + class revlogio(object): def __init__(self): self.size = struct.calcsize(indexformatng) @@ -162,6 +166,11 @@ class revlogio(object): return index, getattr(index, 'nodemap', None), cache def packentry(self, entry, node, version, rev): + # uncompressed length + if entry[2] > maxentrysize: + raise RevlogError(_("too large for revlog storage"), + hint=_("consider using the largefiles extension")) + p = _pack(indexformatng, *entry) if rev == 0: p = _pack(versionformat, version) + p[4:]