Patchwork [STABLE] revlog: seek to end of file before writing (issue4943)

login
register
mail settings
Submitter Gregory Szorc
Date Dec. 18, 2015, 1:17 a.m.
Message ID <5237adcc38525cc18434.1450401430@7.1.168.192.in-addr.arpa>
Download mbox | patch
Permalink /patch/12135/
State Superseded
Commit e240e914d2261788c3ba700401659c11ed820fe3
Headers show

Comments

Gregory Szorc - Dec. 18, 2015, 1:17 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1450401362 28800
#      Thu Dec 17 17:16:02 2015 -0800
# Branch stable
# Node ID 5237adcc38525cc184345ef7ae6464526e6651fb
# Parent  00aa37c65e0a365230938aed51d1b38b7cace135
revlog: seek to end of file before writing (issue4943)

Revlogs were recently refactored to open file handles in "a+" and use a
persistent file handle for reading and writing. This drastically
reduced the number of file handles being opened.

Unfortunately, it appears that some versions of Solaris lose the file
offset when performing a write after the handle has been seeked.

The simplest workaround is to seek to EOF on files opened in a+ mode
before writing to them, which is what this patch does.

Ideally, this code would exist in the vfs layer. However, this would
require creating a proxy class for file objects in order to provide a
custom implementation of write(). This would add overhead. Since
revlogs are the only files we open in a+ mode, the one-off workaround
in revlog.py should be sufficient.

This patch appears to have little to no impact on performance on my
Linux machine.
Matt Mackall - Dec. 18, 2015, 8:33 p.m.
On Thu, 2015-12-17 at 17:17 -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1450401362 28800
> #      Thu Dec 17 17:16:02 2015 -0800
> # Branch stable
> # Node ID 5237adcc38525cc184345ef7ae6464526e6651fb
> # Parent  00aa37c65e0a365230938aed51d1b38b7cace135
> revlog: seek to end of file before writing (issue4943)

Looks good, queued for stable with Adrian's tweak.
Adrian Buehlmann - Dec. 19, 2015, 8:43 a.m.
On 2015-12-18 21:33, Matt Mackall wrote:
> On Thu, 2015-12-17 at 17:17 -0800, Gregory Szorc wrote:
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1450401362 28800
>> #      Thu Dec 17 17:16:02 2015 -0800
>> # Branch stable
>> # Node ID 5237adcc38525cc184345ef7ae6464526e6651fb
>> # Parent  00aa37c65e0a365230938aed51d1b38b7cace135
>> revlog: seek to end of file before writing (issue4943)
> 
> Looks good, queued for stable with Adrian's tweak.

Looks like a tab character sneaked in on in-flight tweaking, on line
1441 in revlog.py, which has been reported as new issue 5019.
Matt Mackall - Dec. 20, 2015, 10:12 p.m.
On Sat, 2015-12-19 at 09:43 +0100, Adrian Buehlmann wrote:
> On 2015-12-18 21:33, Matt Mackall wrote:
> > On Thu, 2015-12-17 at 17:17 -0800, Gregory Szorc wrote:
> > > # HG changeset patch
> > > # User Gregory Szorc <gregory.szorc@gmail.com>
> > > # Date 1450401362 28800
> > > #      Thu Dec 17 17:16:02 2015 -0800
> > > # Branch stable
> > > # Node ID 5237adcc38525cc184345ef7ae6464526e6651fb
> > > # Parent  00aa37c65e0a365230938aed51d1b38b7cace135
> > > revlog: seek to end of file before writing (issue4943)
> > 
> > Looks good, queued for stable with Adrian's tweak.
> 
> Looks like a tab character sneaked in on in-flight tweaking, on line
> 1441 in revlog.py, which has been reported as new issue 5019.

Yeah, this was introduced by editing the patch in Emacs with electric
tabs. I caught this in the merge, then went and tried to find the tab
in stable with hg export | cat -v. Note to self: you need cat -T.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -12,8 +12,9 @@  and O(changes) merge between branches.
 """
 
 # import stuff from node for others to import from revlog
 import collections
+import os
 from node import bin, hex, nullid, nullrev
 from i18n import _
 import ancestor, mdiff, parsers, error, util, templatefilters
 import struct, zlib, errno
@@ -1425,10 +1426,23 @@  class revlog(object):
         self._basecache = (curr, chainbase)
         return node
 
     def _writeentry(self, transaction, ifh, dfh, entry, data, link, offset):
+        # Files opened in a+ mode have inconsistent behavior on various
+        # platforms. Windows requires that a file positioning call be made
+        # when the file handle transitions between reads and writes. See
+        # 3686fa2b8eee and the mixedfilemodewrapper in windows.py. On other
+        # platforms, Python or the platform itself can be buggy. Some versions
+        # of Solaris have been observed to not append at the end of the file
+        # if the file was seeked to before the end. See issue4943 for more.
+        #
+        # We work around this issue by inserting a seek() before writing.
+        # Note: This is likely not necessary on Python 3.
+        ifh.seek(0, os.SEEK_END)
+
         curr = len(self) - 1
         if not self._inline:
+            dfh.seek(0, os.SEEK_END)
             transaction.add(self.datafile, offset)
             transaction.add(self.indexfile, curr * len(entry))
             if data[0]:
                 dfh.write(data[0])