Patchwork [07,of,11] revlog: fix many file handle opens to explicitly use bytes mode

login
register
mail settings
Submitter Augie Fackler
Date March 26, 2017, 10:36 p.m.
Message ID <e7fe3ab60132647a9c3b.1490567801@imladris.local>
Download mbox | patch
Permalink /patch/19706/
State Accepted
Headers show

Comments

Augie Fackler - March 26, 2017, 10:36 p.m.
# HG changeset patch
# User Augie Fackler <raf@durin42.com>
# Date 1490567324 14400
#      Sun Mar 26 18:28:44 2017 -0400
# Node ID e7fe3ab60132647a9c3b86b2960b385e61b9dcf0
# Parent  48144fe2d912b7d9fc300955d0c881aceead6930
revlog: fix many file handle opens to explicitly use bytes mode

This broke on Python 3, but we've probably been getting lucky that
this hasn't caused EOL corruption on Windows or something for
years. :(
Yuya Nishihara - March 27, 2017, 1:54 p.m.
On Sun, 26 Mar 2017 18:36:41 -0400, Augie Fackler wrote:
> # HG changeset patch
> # User Augie Fackler <raf@durin42.com>
> # Date 1490567324 14400
> #      Sun Mar 26 18:28:44 2017 -0400
> # Node ID e7fe3ab60132647a9c3b86b2960b385e61b9dcf0
> # Parent  48144fe2d912b7d9fc300955d0c881aceead6930
> revlog: fix many file handle opens to explicitly use bytes mode
> 
> This broke on Python 3, but we've probably been getting lucky that
> this hasn't caused EOL corruption on Windows or something for
> years. :(
> 
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1467,8 +1467,8 @@ class revlog(object):
>  
>          dfh = None
>          if not self._inline:
> -            dfh = self.opener(self.datafile, "a+")
> -        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
> +            dfh = self.opener(self.datafile, "a+b")
> +        ifh = self.opener(self.indexfile, "a+b", checkambig=self._checkambig)

'b' shouldn't be needed since vfs adds 'b' unless text=True is explicitly
specified.
Augie Fackler - March 27, 2017, 2:09 p.m.
On Mon, Mar 27, 2017 at 10:54:08PM +0900, Yuya Nishihara wrote:
> On Sun, 26 Mar 2017 18:36:41 -0400, Augie Fackler wrote:
> > # HG changeset patch
> > # User Augie Fackler <raf@durin42.com>
> > # Date 1490567324 14400
> > #      Sun Mar 26 18:28:44 2017 -0400
> > # Node ID e7fe3ab60132647a9c3b86b2960b385e61b9dcf0
> > # Parent  48144fe2d912b7d9fc300955d0c881aceead6930
> > revlog: fix many file handle opens to explicitly use bytes mode
> >
> > This broke on Python 3, but we've probably been getting lucky that
> > this hasn't caused EOL corruption on Windows or something for
> > years. :(
> >
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -1467,8 +1467,8 @@ class revlog(object):
> >
> >          dfh = None
> >          if not self._inline:
> > -            dfh = self.opener(self.datafile, "a+")
> > -        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
> > +            dfh = self.opener(self.datafile, "a+b")
> > +        ifh = self.opener(self.indexfile, "a+b", checkambig=self._checkambig)
>
> 'b' shouldn't be needed since vfs adds 'b' unless text=True is explicitly
> specified.

Huh. I swear this fixed some problems for me. I'll keep it around in
reserve in case the problem crops up again.
Gregory Szorc - March 30, 2017, 2:24 a.m.
On Mon, Mar 27, 2017 at 6:54 AM, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 26 Mar 2017 18:36:41 -0400, Augie Fackler wrote:
> > # HG changeset patch
> > # User Augie Fackler <raf@durin42.com>
> > # Date 1490567324 14400
> > #      Sun Mar 26 18:28:44 2017 -0400
> > # Node ID e7fe3ab60132647a9c3b86b2960b385e61b9dcf0
> > # Parent  48144fe2d912b7d9fc300955d0c881aceead6930
> > revlog: fix many file handle opens to explicitly use bytes mode
> >
> > This broke on Python 3, but we've probably been getting lucky that
> > this hasn't caused EOL corruption on Windows or something for
> > years. :(
> >
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -1467,8 +1467,8 @@ class revlog(object):
> >
> >          dfh = None
> >          if not self._inline:
> > -            dfh = self.opener(self.datafile, "a+")
> > -        ifh = self.opener(self.indexfile, "a+",
> checkambig=self._checkambig)
> > +            dfh = self.opener(self.datafile, "a+b")
> > +        ifh = self.opener(self.indexfile, "a+b",
> checkambig=self._checkambig)
>
> 'b' shouldn't be needed since vfs adds 'b' unless text=True is explicitly
> specified.
>

That "text" argument always seemed weird to me because all it does is
reinvent the wheel for the "mode" argument to open(). I'd prefer we
deprecate the argument and use explicit mode values so the I/O code more
closely aligns with what Python does. Principal of least surprise.

That being said, it is a big BC and I'm not sure we can change it easily.

I do like Augie's patch to add "b" because it results in explicit code,
even if it is redundant. Although, I can see how some call sites specifying
"b" and others not with both having the same behavior (without "text=True")
could be confusing. That's a nice corner we painted ourselves into :/
Yuya Nishihara - March 30, 2017, 3:08 p.m.
On Wed, 29 Mar 2017 19:24:11 -0700, Gregory Szorc wrote:
> On Mon, Mar 27, 2017 at 6:54 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Sun, 26 Mar 2017 18:36:41 -0400, Augie Fackler wrote:
> > > # HG changeset patch
> > > # User Augie Fackler <raf@durin42.com>
> > > # Date 1490567324 14400
> > > #      Sun Mar 26 18:28:44 2017 -0400
> > > # Node ID e7fe3ab60132647a9c3b86b2960b385e61b9dcf0
> > > # Parent  48144fe2d912b7d9fc300955d0c881aceead6930
> > > revlog: fix many file handle opens to explicitly use bytes mode
> > >
> > > This broke on Python 3, but we've probably been getting lucky that
> > > this hasn't caused EOL corruption on Windows or something for
> > > years. :(
> > >
> > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > > --- a/mercurial/revlog.py
> > > +++ b/mercurial/revlog.py
> > > @@ -1467,8 +1467,8 @@ class revlog(object):
> > >
> > >          dfh = None
> > >          if not self._inline:
> > > -            dfh = self.opener(self.datafile, "a+")
> > > -        ifh = self.opener(self.indexfile, "a+",
> > checkambig=self._checkambig)
> > > +            dfh = self.opener(self.datafile, "a+b")
> > > +        ifh = self.opener(self.indexfile, "a+b",
> > checkambig=self._checkambig)
> >
> > 'b' shouldn't be needed since vfs adds 'b' unless text=True is explicitly
> > specified.
> >
> 
> That "text" argument always seemed weird to me because all it does is
> reinvent the wheel for the "mode" argument to open(). I'd prefer we
> deprecate the argument and use explicit mode values so the I/O code more
> closely aligns with what Python does. Principal of least surprise.

Perhaps it exists because 'b' is too subtle and unix programmers would easily
miss it. I'd rather get rid of the text mode at all for Py3 compatibility.
Optionally we can make vfs raise exception if 'b' isn't explicitly specified.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1467,8 +1467,8 @@  class revlog(object):
 
         dfh = None
         if not self._inline:
-            dfh = self.opener(self.datafile, "a+")
-        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
+            dfh = self.opener(self.datafile, "a+b")
+        ifh = self.opener(self.indexfile, "a+b", checkambig=self._checkambig)
         try:
             return self._addrevision(node, text, transaction, link, p1, p2,
                                      flags, cachedelta, ifh, dfh)
@@ -1769,7 +1769,7 @@  class revlog(object):
         end = 0
         if r:
             end = self.end(r - 1)
-        ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
+        ifh = self.opener(self.indexfile, "a+b", checkambig=self._checkambig)
         isize = r * self._io.size
         if self._inline:
             transaction.add(self.indexfile, end + isize, r)
@@ -1777,7 +1777,7 @@  class revlog(object):
         else:
             transaction.add(self.indexfile, isize, r)
             transaction.add(self.datafile, end)
-            dfh = self.opener(self.datafile, "a+")
+            dfh = self.opener(self.datafile, "a+b")
         def flush():
             if dfh:
                 dfh.flush()
@@ -1846,8 +1846,8 @@  class revlog(object):
                     # addrevision switched from inline to conventional
                     # reopen the index
                     ifh.close()
-                    dfh = self.opener(self.datafile, "a+")
-                    ifh = self.opener(self.indexfile, "a+",
+                    dfh = self.opener(self.datafile, "a+b")
+                    ifh = self.opener(self.indexfile, "a+b",
                                       checkambig=self._checkambig)
         finally:
             if dfh:
@@ -2082,11 +2082,11 @@  class revlog(object):
                 if not cachedelta:
                     text = self.revision(rev)
 
-                ifh = destrevlog.opener(destrevlog.indexfile, 'a+',
+                ifh = destrevlog.opener(destrevlog.indexfile, 'a+b',
                                         checkambig=False)
                 dfh = None
                 if not destrevlog._inline:
-                    dfh = destrevlog.opener(destrevlog.datafile, 'a+')
+                    dfh = destrevlog.opener(destrevlog.datafile, 'a+b')
                 try:
                     destrevlog._addrevision(node, text, tr, linkrev, p1, p2,
                                             flags, cachedelta, ifh, dfh)