Patchwork revlog: ensure that flags do not overflow 2 bytes

login
register
mail settings
Submitter Cotizo Sima
Date Nov. 25, 2016, 6:09 p.m.
Message ID <9925C822-B82A-48A1-9239-C016C73C1AAC@fb.com>
Download mbox | patch
Permalink /patch/17759/
State Changes Requested
Headers show

Comments

Cotizo Sima - Nov. 25, 2016, 6:09 p.m.
Gregory, I agree, but in this particular case I chose the “deepest” call which insert flags into the revlog such that I cover all the possible call paths. Although to be fair, another good alternative would be revlog._addrevision. I’m happy to move the operation there, if you guys find it more appropriate.

From: Gregory Szorc <gregory.szorc@gmail.com>

Date: Friday, November 25, 2016 at 5:27 PM
To: Cotizo Sima <cotizo@fb.com>
Cc: mercurial-devel <mercurial-devel@mercurial-scm.org>
Subject: Re: [PATCH] revlog: ensure that flags do not overflow 2 bytes

On Fri, Nov 25, 2016 at 5:24 AM, Cotizo Sima <cotizo@fb.com<mailto:cotizo@fb.com>> wrote:
# HG changeset patch
# User Cotizo Sima <cotizo@fb.com<mailto:cotizo@fb.com>>
# Date 1480079985 28800
#      Fri Nov 25 05:19:45 2016 -0800
# Node ID 4b311b730614941db6409f560ab9451bec74be75
# Parent  a3163433647108b7bec8fc45896db1c20b18ab21
revlog: ensure that flags do not overflow 2 bytes

This patch adds a line that ensures we are not setting by mistake a set of flags
overlfowing the 2 bytes they are allocated. Given the way the data is packed in
the revlog header, overflowing 2 bytes will result in setting a wrong offset.


I'm a believer in failing fast. If this new code comes into play, we've already lost to a bug. I think instead we should raise ValueError if type > 65535.
Gregory Szorc - Nov. 25, 2016, 6:59 p.m.
I think having the check in offset_type() to catch all consumers is the
right place. (Another source of the flag is changegroup data via
revlog.addgroup() and cg.deltachunk().)

To clarify, I'm suggesting that instead of truncating "type" like this
patch is doing, we should raise ValueError in offset_type(). We should
never pass in a too large number and IMO this warrants an exception. This
will actively prevent bad data from buggy code being written to a revlog.


On Fri, Nov 25, 2016 at 10:09 AM, Cotizo Sima <cotizo@fb.com> wrote:

> Gregory, I agree, but in this particular case I chose the “deepest” call
> which insert flags into the revlog such that I cover all the possible call
> paths. Although to be fair, another good alternative would be
> revlog._addrevision. I’m happy to move the operation there, if you guys
> find it more appropriate.
>
>
>
> *From: *Gregory Szorc <gregory.szorc@gmail.com>
> *Date: *Friday, November 25, 2016 at 5:27 PM
> *To: *Cotizo Sima <cotizo@fb.com>
> *Cc: *mercurial-devel <mercurial-devel@mercurial-scm.org>
> *Subject: *Re: [PATCH] revlog: ensure that flags do not overflow 2 bytes
>
>
>
> On Fri, Nov 25, 2016 at 5:24 AM, Cotizo Sima <cotizo@fb.com> wrote:
>
> # HG changeset patch
> # User Cotizo Sima <cotizo@fb.com>
> # Date 1480079985 28800
> #      Fri Nov 25 05:19:45 2016 -0800
> # Node ID 4b311b730614941db6409f560ab9451bec74be75
> # Parent  a3163433647108b7bec8fc45896db1c20b18ab21
> revlog: ensure that flags do not overflow 2 bytes
>
> This patch adds a line that ensures we are not setting by mistake a set of
> flags
> overlfowing the 2 bytes they are allocated. Given the way the data is
> packed in
> the revlog header, overflowing 2 bytes will result in setting a wrong
> offset.
>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -72,6 +72,7 @@
>      return int(q & 0xFFFF)
>
>  def offset_type(offset, type):
> +    type &= 0xFFFF
>      return long(long(offset) << 16 | type)
>
>  _nullhash = hashlib.sha1(nullid)
>
>
>
> I'm a believer in failing fast. If this new code comes into play, we've
> already lost to a bug. I think instead we should raise ValueError if type >
> 65535.
>
>
>
Jun Wu - Nov. 25, 2016, 7:16 p.m.
Excerpts from Gregory Szorc's message of 2016-11-25 10:59:42 -0800:
> I think having the check in offset_type() to catch all consumers is the
> right place. (Another source of the flag is changegroup data via
> revlog.addgroup() and cg.deltachunk().)
> 
> To clarify, I'm suggesting that instead of truncating "type" like this
> patch is doing, we should raise ValueError in offset_type(). We should
> never pass in a too large number and IMO this warrants an exception. This
> will actively prevent bad data from buggy code being written to a revlog.

+1. I think raising an Exception is better than dropping the bits silently.
It does not require moving the check elsewhere, but just change "&=" to an
if condition and raise an RuntimeError or so.

> On Fri, Nov 25, 2016 at 10:09 AM, Cotizo Sima <cotizo@fb.com> wrote:
> 
> > Gregory, I agree, but in this particular case I chose the “deepest” call
> > which insert flags into the revlog such that I cover all the possible call
> > paths. Although to be fair, another good alternative would be
> > revlog._addrevision. I’m happy to move the operation there, if you guys
> > find it more appropriate.
> >
> >
> >
> > *From: *Gregory Szorc <gregory.szorc@gmail.com>
> > *Date: *Friday, November 25, 2016 at 5:27 PM
> > *To: *Cotizo Sima <cotizo@fb.com>
> > *Subject: *Re: [PATCH] revlog: ensure that flags do not overflow 2 bytes
> >
> >
> >
> > On Fri, Nov 25, 2016 at 5:24 AM, Cotizo Sima <cotizo@fb.com> wrote:
> >
> > # HG changeset patch
> > # User Cotizo Sima <cotizo@fb.com>
> > # Date 1480079985 28800
> > #      Fri Nov 25 05:19:45 2016 -0800
> > # Node ID 4b311b730614941db6409f560ab9451bec74be75
> > # Parent  a3163433647108b7bec8fc45896db1c20b18ab21
> > revlog: ensure that flags do not overflow 2 bytes
> >
> > This patch adds a line that ensures we are not setting by mistake a set of
> > flags
> > overlfowing the 2 bytes they are allocated. Given the way the data is
> > packed in
> > the revlog header, overflowing 2 bytes will result in setting a wrong
> > offset.
> >
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -72,6 +72,7 @@
> >      return int(q & 0xFFFF)
> >
> >  def offset_type(offset, type):
> > +    type &= 0xFFFF
> >      return long(long(offset) << 16 | type)
> >
> >  _nullhash = hashlib.sha1(nullid)
> >
> >
> >
> > I'm a believer in failing fast. If this new code comes into play, we've
> > already lost to a bug. I think instead we should raise ValueError if type >
> > 65535.
> >
> >
> >
Yuya Nishihara - Nov. 26, 2016, 11:14 a.m.
On Fri, 25 Nov 2016 19:16:23 +0000, Jun Wu wrote:
> Excerpts from Gregory Szorc's message of 2016-11-25 10:59:42 -0800:
> > I think having the check in offset_type() to catch all consumers is the
> > right place. (Another source of the flag is changegroup data via
> > revlog.addgroup() and cg.deltachunk().)
> > 
> > To clarify, I'm suggesting that instead of truncating "type" like this
> > patch is doing, we should raise ValueError in offset_type(). We should
> > never pass in a too large number and IMO this warrants an exception. This
> > will actively prevent bad data from buggy code being written to a revlog.
> 
> +1. I think raising an Exception is better than dropping the bits silently.
> It does not require moving the check elsewhere, but just change "&=" to an
> if condition and raise an RuntimeError or so.

ValueError or assert sounds better. Dropped from patchwork.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py

--- a/mercurial/revlog.py

+++ b/mercurial/revlog.py

@@ -72,6 +72,7 @@ 

     return int(q & 0xFFFF)

 def offset_type(offset, type):
+    type &= 0xFFFF

     return long(long(offset) << 16 | type)

 _nullhash = hashlib.sha1(nullid)