Submitter | Matt Mackall |
---|---|
Date | Nov. 30, 2015, 10:23 p.m. |
Message ID | <1448922205.6589.56.camel@selenic.com> |
Download | mbox | patch |
Permalink | /patch/11686/ |
State | Not Applicable |
Headers | show |
Comments
On Mon, Nov 30, 2015 at 04:23:25PM -0600, Matt Mackall wrote: > On Thu, 2015-11-26 at 18:10 -0800, Gregory Szorc wrote: > > # HG changeset patch > > # User Gregory Szorc <gregory.szorc@gmail.com> > > # Date 1447625312 28800 > > # Sun Nov 15 14:08:32 2015 -0800 > > # Branch stable > > # Node ID a7d5be598b347ae2c26566aa2bcdcf5cb1acb30b > > # Parent 6979fe2a6d75105affcacd9e298262a92641cb98 > > posix: insert seek between reads and writes on Solaris (issue4943) > > > > At least some versions of Solaris fail to properly write to file > > descriptors opened for both reading and writing. When the descriptor > > is seeked and read, subsequent writes effectively disappear into a > > black hole. > > A quick check of all the "a+" users suggests this is restricted to > revlog writing. So I think it'd be much simpler to do: > > diff -r 61fbf5dc12b2 mercurial/scmutil.py > --- a/mercurial/scmutil.py Tue Nov 24 21:41:12 2015 +0000 > +++ b/mercurial/scmutil.py Mon Nov 30 16:21:33 2015 -0600 > @@ -516,6 +516,12 @@ > fp = util.posixfile(f, mode) > if nlink == 0: > self._fixfilemode(f) > + > + # The position when opening in append mode is implementation defined, so > + # make it consistent across platforms by positioning at EOF > + if mode in ('a', 'a+'): > + fp.seek(0, os.SEEK_END) > + > return fp > > def symlink(self, src, dst): > diff -r 61fbf5dc12b2 mercurial/windows.py > --- a/mercurial/windows.py Tue Nov 24 21:41:12 2015 +0000 > +++ b/mercurial/windows.py Mon Nov 30 16:21:33 2015 -0600 > @@ -99,12 +99,6 @@ > '''Open a file with even more POSIX-like semantics''' > try: > fp = osutil.posixfile(name, mode, buffering) # may raise WindowsError > - > - # The position when opening in append mode is implementation defined, so > - # make it consistent with other platforms, which position at EOF. > - if 'a' in mode: > - fp.seek(0, os.SEEK_END) > - > if '+' in mode: > return mixedfilemodewrapper(fp) > > Thoughts? Scares me less than the alternatives. Send a patch for stable? > -- > Mathematics is the supreme nostalgia of our time. > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel
On Mon, Nov 30, 2015 at 2:23 PM, Matt Mackall <mpm@selenic.com> wrote: > On Thu, 2015-11-26 at 18:10 -0800, Gregory Szorc wrote: > > # HG changeset patch > > # User Gregory Szorc <gregory.szorc@gmail.com> > > # Date 1447625312 28800 > > # Sun Nov 15 14:08:32 2015 -0800 > > # Branch stable > > # Node ID a7d5be598b347ae2c26566aa2bcdcf5cb1acb30b > > # Parent 6979fe2a6d75105affcacd9e298262a92641cb98 > > posix: insert seek between reads and writes on Solaris (issue4943) > > > > At least some versions of Solaris fail to properly write to file > > descriptors opened for both reading and writing. When the descriptor > > is seeked and read, subsequent writes effectively disappear into a > > black hole. > > A quick check of all the "a+" users suggests this is restricted to > revlog writing. So I think it'd be much simpler to do: > > diff -r 61fbf5dc12b2 mercurial/scmutil.py > --- a/mercurial/scmutil.py Tue Nov 24 21:41:12 2015 +0000 > +++ b/mercurial/scmutil.py Mon Nov 30 16:21:33 2015 -0600 > @@ -516,6 +516,12 @@ > fp = util.posixfile(f, mode) > if nlink == 0: > self._fixfilemode(f) > + > + # The position when opening in append mode is implementation > defined, so > + # make it consistent across platforms by positioning at EOF > + if mode in ('a', 'a+'): > + fp.seek(0, os.SEEK_END) > + > return fp > > def symlink(self, src, dst): > diff -r 61fbf5dc12b2 mercurial/windows.py > --- a/mercurial/windows.py Tue Nov 24 21:41:12 2015 +0000 > +++ b/mercurial/windows.py Mon Nov 30 16:21:33 2015 -0600 > @@ -99,12 +99,6 @@ > '''Open a file with even more POSIX-like semantics''' > try: > fp = osutil.posixfile(name, mode, buffering) # may raise > WindowsError > - > - # The position when opening in append mode is implementation > defined, so > - # make it consistent with other platforms, which position at EOF. > - if 'a' in mode: > - fp.seek(0, os.SEEK_END) > - > if '+' in mode: > return mixedfilemodewrapper(fp) > > Thoughts? My patch is about inserting extra seeks between reads and writes: yours is about seeking when opening in append mode. Apples and oranges. Did you intend to suggest inserting the seek to EOF on every write() to a file opened in append mode? That will blow up the number of system calls during writing. It probably doesn't matter. But it feels wasteful on the platforms where it isn't necessary.
On Mon, 2015-11-30 at 15:28 -0800, Gregory Szorc wrote: > On Mon, Nov 30, 2015 at 2:23 PM, Matt Mackall <mpm@selenic.com> > wrote: > > > On Thu, 2015-11-26 at 18:10 -0800, Gregory Szorc wrote: > > > # HG changeset patch > > > # User Gregory Szorc <gregory.szorc@gmail.com> > > > # Date 1447625312 28800 > > > # Sun Nov 15 14:08:32 2015 -0800 > > > # Branch stable > > > # Node ID a7d5be598b347ae2c26566aa2bcdcf5cb1acb30b > > > # Parent 6979fe2a6d75105affcacd9e298262a92641cb98 > > > posix: insert seek between reads and writes on Solaris > > > (issue4943) > > > > > > At least some versions of Solaris fail to properly write to file > > > descriptors opened for both reading and writing. When the > > > descriptor > > > is seeked and read, subsequent writes effectively disappear into > > > a > > > black hole. > > > > A quick check of all the "a+" users suggests this is restricted to > > revlog writing. So I think it'd be much simpler to do: > > > > diff -r 61fbf5dc12b2 mercurial/scmutil.py > > --- a/mercurial/scmutil.py Tue Nov 24 21:41:12 2015 +0000 > > +++ b/mercurial/scmutil.py Mon Nov 30 16:21:33 2015 -0600 > > @@ -516,6 +516,12 @@ > > fp = util.posixfile(f, mode) > > if nlink == 0: > > self._fixfilemode(f) > > + > > + # The position when opening in append mode is > > implementation > > defined, so > > + # make it consistent across platforms by positioning at > > EOF > > + if mode in ('a', 'a+'): > > + fp.seek(0, os.SEEK_END) > > + > > return fp > > > > def symlink(self, src, dst): > > diff -r 61fbf5dc12b2 mercurial/windows.py > > --- a/mercurial/windows.py Tue Nov 24 21:41:12 2015 +0000 > > +++ b/mercurial/windows.py Mon Nov 30 16:21:33 2015 -0600 > > @@ -99,12 +99,6 @@ > > '''Open a file with even more POSIX-like semantics''' > > try: > > fp = osutil.posixfile(name, mode, buffering) # may raise > > WindowsError > > - > > - # The position when opening in append mode is > > implementation > > defined, so > > - # make it consistent with other platforms, which position > > at EOF. > > - if 'a' in mode: > > - fp.seek(0, os.SEEK_END) > > - > > if '+' in mode: > > return mixedfilemodewrapper(fp) > > > > Thoughts? > > > My patch is about inserting extra seeks between reads and writes: > yours is > about seeking when opening in append mode. Apples and oranges. Indeed! > Did you intend to suggest inserting the seek to EOF on every write() > to a > file opened in append mode? That will blow up the number of system > calls > during writing. It probably doesn't matter. But it feels wasteful on > the > platforms where it isn't necessary. Problem is: we don't know where it isn't necessary. In fact, I'm now pretty unclear on what the problem is. The standard problem here is that in POSIX: fh = open("foo", O_APPEND) ..does not specify where the file pointer is (and it's known to vary from system to system), but Posix DOES specify that all writes first reposition the pointer to the end of file. If Solaris isn't doing that, we've got big problems. But I suspect what's actually happening here is not related to the kernel's handling of the file offset at all, but is in fact connected to Python's own internal I/O buffering. In particular, Python has a single read and write buffer that it tries to manage around seeking: (bufferedio.c) /* Implementation notes: * BufferedReader, BufferedWriter and BufferedRandom try to share most methods (this is helped by the members `readable` and `writable`, which are initialized in the respective constructors) * They also share a single buffer for reading and writing. This enables interleaved reads and writes without flushing. It also makes the logic a bit trickier to get right. * The absolute position of the raw stream is cached, if possible, in the `abs_pos` member. It must be updated every time an operation is done on the raw stream. If not sure, it can be reinitialized by calling _buffered_raw_tell(), which queries the raw stream (_buffered_raw_seek() also does it). To read it, use RAW_TELL(). It looks like there are at least 2 bugs here that affect versions we care about: changeset: 71984:cf2010e9f941 branch: 2.7 parent: 71979:c816479f6aaf user: Antoine Pitrou <solipsis@pitrou.net> date: Sat Aug 20 15:40:58 2011 +0200 summary: Issue #12213: Fix a buffering bug with interleaved reads and writes that changeset: 70060:0d24d4c537a6 branch: 2.7 parent: 70051:534a9e274d88 user: Antoine Pitrou <solipsis@pitrou.net> date: Fri May 13 00:31:52 2011 +0200 summary: Issue #12062: In the `io` module, fix a flushing bug when doing a certain Looks like both of these can trigger if reads and writes occur at different positions inside the same 8k buffer window. Antoine is attached to both of these, so maybe he knows what's up.
On Mon, Nov 30, 2015 at 4:35 PM, Matt Mackall <mpm@selenic.com> wrote: > On Mon, 2015-11-30 at 15:28 -0800, Gregory Szorc wrote: > > On Mon, Nov 30, 2015 at 2:23 PM, Matt Mackall <mpm@selenic.com> > > wrote: > > > > > On Thu, 2015-11-26 at 18:10 -0800, Gregory Szorc wrote: > > > > # HG changeset patch > > > > # User Gregory Szorc <gregory.szorc@gmail.com> > > > > # Date 1447625312 28800 > > > > # Sun Nov 15 14:08:32 2015 -0800 > > > > # Branch stable > > > > # Node ID a7d5be598b347ae2c26566aa2bcdcf5cb1acb30b > > > > # Parent 6979fe2a6d75105affcacd9e298262a92641cb98 > > > > posix: insert seek between reads and writes on Solaris > > > > (issue4943) > > > > > > > > At least some versions of Solaris fail to properly write to file > > > > descriptors opened for both reading and writing. When the > > > > descriptor > > > > is seeked and read, subsequent writes effectively disappear into > > > > a > > > > black hole. > > > > > > A quick check of all the "a+" users suggests this is restricted to > > > revlog writing. So I think it'd be much simpler to do: > > > > > > diff -r 61fbf5dc12b2 mercurial/scmutil.py > > > --- a/mercurial/scmutil.py Tue Nov 24 21:41:12 2015 +0000 > > > +++ b/mercurial/scmutil.py Mon Nov 30 16:21:33 2015 -0600 > > > @@ -516,6 +516,12 @@ > > > fp = util.posixfile(f, mode) > > > if nlink == 0: > > > self._fixfilemode(f) > > > + > > > + # The position when opening in append mode is > > > implementation > > > defined, so > > > + # make it consistent across platforms by positioning at > > > EOF > > > + if mode in ('a', 'a+'): > > > + fp.seek(0, os.SEEK_END) > > > + > > > return fp > > > > > > def symlink(self, src, dst): > > > diff -r 61fbf5dc12b2 mercurial/windows.py > > > --- a/mercurial/windows.py Tue Nov 24 21:41:12 2015 +0000 > > > +++ b/mercurial/windows.py Mon Nov 30 16:21:33 2015 -0600 > > > @@ -99,12 +99,6 @@ > > > '''Open a file with even more POSIX-like semantics''' > > > try: > > > fp = osutil.posixfile(name, mode, buffering) # may raise > > > WindowsError > > > - > > > - # The position when opening in append mode is > > > implementation > > > defined, so > > > - # make it consistent with other platforms, which position > > > at EOF. > > > - if 'a' in mode: > > > - fp.seek(0, os.SEEK_END) > > > - > > > if '+' in mode: > > > return mixedfilemodewrapper(fp) > > > > > > Thoughts? > > > > > > My patch is about inserting extra seeks between reads and writes: > > yours is > > about seeking when opening in append mode. Apples and oranges. > > Indeed! > > > Did you intend to suggest inserting the seek to EOF on every write() > > to a > > file opened in append mode? That will blow up the number of system > > calls > > during writing. It probably doesn't matter. But it feels wasteful on > > the > > platforms where it isn't necessary. > > Problem is: we don't know where it isn't necessary. > > In fact, I'm now pretty unclear on what the problem is. The standard > problem here is that in POSIX: > > fh = open("foo", O_APPEND) > > ..does not specify where the file pointer is (and it's known to vary > from system to system), but Posix DOES specify that all writes first > reposition the pointer to the end of file. > > If Solaris isn't doing that, we've got big problems. But I suspect > what's actually happening here is not related to the kernel's handling > of the file offset at all, but is in fact connected to Python's own > internal I/O buffering. In particular, Python has a single read and > write buffer that it tries to manage around seeking: > > (bufferedio.c) > > /* > Implementation > notes: > > > * BufferedReader, BufferedWriter and BufferedRandom try to share > most > methods (this is helped by the members `readable` and `writable`, > which > are initialized in the respective > constructors) > * They also share a single buffer for reading and writing. This > enables > interleaved reads and writes without flushing. It also makes the > logic > a bit trickier to get > right. > * The absolute position of the raw stream is cached, if possible, in > the > `abs_pos` member. It must be updated every time an operation is > done > on the raw stream. If not sure, it can be reinitialized by > calling > _buffered_raw_tell(), which queries the raw stream > (_buffered_raw_seek() > also does it). To read it, use RAW_TELL(). > > > It looks like there are at least 2 bugs here that affect versions we care > about: > > changeset: 71984:cf2010e9f941 > branch: 2.7 > parent: 71979:c816479f6aaf > user: Antoine Pitrou <solipsis@pitrou.net> > date: Sat Aug 20 15:40:58 2011 +0200 > summary: Issue #12213: Fix a buffering bug with interleaved reads and > writes that > > changeset: 70060:0d24d4c537a6 > branch: 2.7 > parent: 70051:534a9e274d88 > user: Antoine Pitrou <solipsis@pitrou.net> > date: Fri May 13 00:31:52 2011 +0200 > summary: Issue #12062: In the `io` module, fix a flushing bug when > doing a certain > > Looks like both of these can trigger if reads and writes occur at > different positions inside the same 8k buffer window. > > Antoine is attached to both of these, so maybe he knows what's up. > See also (from hg's repo): changeset: 36484:3686fa2b8eee user: Gregory Szorc <gregory.szorc@gmail.com> date: Sun Sep 27 18:46:53 2015 -0700 summary: windows: insert file positioning call between reads and writes Also worth repeating what's in the comments in Mercurial's bug tracker ( https://bz.mercurial-scm.org/show_bug.cgi?id=4943) that this issue seems to go away on Python 3.0. I'm pretty sure CPython is automatically inserting the extra seek() call on that Python version to work around platform behavior differences.
On Mon, 30 Nov 2015 17:18:12 -0800, Gregory Szorc wrote: > On Mon, Nov 30, 2015 at 4:35 PM, Matt Mackall <mpm@selenic.com> wrote: > > On Mon, 2015-11-30 at 15:28 -0800, Gregory Szorc wrote: > > > Did you intend to suggest inserting the seek to EOF on every write() > > > to a > > > file opened in append mode? That will blow up the number of system > > > calls > > > during writing. It probably doesn't matter. But it feels wasteful on > > > the > > > platforms where it isn't necessary. > > > > Problem is: we don't know where it isn't necessary. > > > > > In fact, I'm now pretty unclear on what the problem is. The standard > > problem here is that in POSIX: > > > > fh = open("foo", O_APPEND) > > > > ..does not specify where the file pointer is (and it's known to vary > > from system to system), but Posix DOES specify that all writes first > > reposition the pointer to the end of file. > > > > If Solaris isn't doing that, we've got big problems. But I suspect > > what's actually happening here is not related to the kernel's handling > > of the file offset at all, but is in fact connected to Python's own > > internal I/O buffering. In particular, Python has a single read and > > write buffer that it tries to manage around seeking: > > > > (bufferedio.c) > > > > /* > > Implementation > > notes: > > > > > > * BufferedReader, BufferedWriter and BufferedRandom try to share > > most > > methods (this is helped by the members `readable` and `writable`, > > which > > are initialized in the respective > > constructors) > > * They also share a single buffer for reading and writing. This > > enables > > interleaved reads and writes without flushing. It also makes the > > logic > > a bit trickier to get > > right. > > * The absolute position of the raw stream is cached, if possible, in > > the > > `abs_pos` member. It must be updated every time an operation is > > done > > on the raw stream. If not sure, it can be reinitialized by > > calling > > _buffered_raw_tell(), which queries the raw stream > > (_buffered_raw_seek() > > also does it). To read it, use RAW_TELL(). > > > > > > It looks like there are at least 2 bugs here that affect versions we care > > about: > > > > changeset: 71984:cf2010e9f941 > > branch: 2.7 > > parent: 71979:c816479f6aaf > > user: Antoine Pitrou <solipsis@pitrou.net> > > date: Sat Aug 20 15:40:58 2011 +0200 > > summary: Issue #12213: Fix a buffering bug with interleaved reads and > > writes that > > > > changeset: 70060:0d24d4c537a6 > > branch: 2.7 > > parent: 70051:534a9e274d88 > > user: Antoine Pitrou <solipsis@pitrou.net> > > date: Fri May 13 00:31:52 2011 +0200 > > summary: Issue #12062: In the `io` module, fix a flushing bug when > > doing a certain > > > > Looks like both of these can trigger if reads and writes occur at > > different positions inside the same 8k buffer window. > > > > Antoine is attached to both of these, so maybe he knows what's up. > > > > See also (from hg's repo): > > changeset: 36484:3686fa2b8eee > user: Gregory Szorc <gregory.szorc@gmail.com> > date: Sun Sep 27 18:46:53 2015 -0700 > summary: windows: insert file positioning call between reads and writes > > Also worth repeating what's in the comments in Mercurial's bug tracker ( > https://bz.mercurial-scm.org/show_bug.cgi?id=4943) that this issue seems to > go away on Python 3.0. I'm pretty sure CPython is automatically inserting > the extra seek() call on that Python version to work around platform > behavior differences. IIRC, Python2 uses a plain old stdio, but Python3 has own buffering mechanism called "New I/O". I think the bugs Matt pointed out affect only on the New I/O.
On Tue, 1 Dec 2015 21:39:44 +0900 Yuya Nishihara <yuya@tcha.org> wrote: > > > > See also (from hg's repo): > > > > changeset: 36484:3686fa2b8eee > > user: Gregory Szorc <gregory.szorc@gmail.com> > > date: Sun Sep 27 18:46:53 2015 -0700 > > summary: windows: insert file positioning call between reads and writes > > > > Also worth repeating what's in the comments in Mercurial's bug tracker ( > > https://bz.mercurial-scm.org/show_bug.cgi?id=4943) that this issue seems to > > go away on Python 3.0. I'm pretty sure CPython is automatically inserting > > the extra seek() call on that Python version to work around platform > > behavior differences. > > IIRC, Python2 uses a plain old stdio, but Python3 has own buffering mechanism > called "New I/O". I think the bugs Matt pointed out affect only on the New I/O. Yes, exactly. "bufferedio.c" is Python 3's new I/O stack, available in Python 2 in the "io" module, but standard open() in Python 2 is still the old fopen()-based implementation. Python 3 tries to workaround platform differences by providing consistent semantics -- which is also why it doesn't rely on the libc's buffered I/O layer anymore. (I haven't looked at the Mercurial source code lately, but I suppose it's unlikely to use the new "io" module) Note there's one other well-known issue with writes in appending mode: https://bugs.python.org/issue15723 Regards Antoine.
So what is the state of this patch. Is ther any chance that we eventually take it as is or are we going to use a modified approach anyway? On 12/01/2015 04:54 AM, Antoine Pitrou wrote: > On Tue, 1 Dec 2015 21:39:44 +0900 > Yuya Nishihara <yuya@tcha.org> wrote: >>> >>> See also (from hg's repo): >>> >>> changeset: 36484:3686fa2b8eee >>> user: Gregory Szorc <gregory.szorc@gmail.com> >>> date: Sun Sep 27 18:46:53 2015 -0700 >>> summary: windows: insert file positioning call between reads and writes >>> >>> Also worth repeating what's in the comments in Mercurial's bug tracker ( >>> https://bz.mercurial-scm.org/show_bug.cgi?id=4943) that this issue seems to >>> go away on Python 3.0. I'm pretty sure CPython is automatically inserting >>> the extra seek() call on that Python version to work around platform >>> behavior differences. >> >> IIRC, Python2 uses a plain old stdio, but Python3 has own buffering mechanism >> called "New I/O". I think the bugs Matt pointed out affect only on the New I/O. > > Yes, exactly. "bufferedio.c" is Python 3's new I/O stack, available in > Python 2 in the "io" module, but standard open() in Python 2 is still > the old fopen()-based implementation. Python 3 tries to workaround > platform differences by providing consistent semantics -- which is > also why it doesn't rely on the libc's buffered I/O layer anymore. > > (I haven't looked at the Mercurial source code lately, but I suppose > it's unlikely to use the new "io" module) > > Note there's one other well-known issue with writes in appending mode: > https://bugs.python.org/issue15723 > > Regards > > Antoine. > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel >
On Fri, Dec 4, 2015 at 6:51 PM, Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote: > So what is the state of this patch. > > Is ther any chance that we eventually take it as is or are we going to use > a modified approach anyway? Some Solaris distributions are definitely busted. I believe mpm was investigating the scope of the bustage (OS-level POSIX semantics/bug versus Python behavior/bug). > > > On 12/01/2015 04:54 AM, Antoine Pitrou wrote: > >> On Tue, 1 Dec 2015 21:39:44 +0900 >> Yuya Nishihara <yuya@tcha.org> wrote: >> >>> >>>> See also (from hg's repo): >>>> >>>> changeset: 36484:3686fa2b8eee >>>> user: Gregory Szorc <gregory.szorc@gmail.com> >>>> date: Sun Sep 27 18:46:53 2015 -0700 >>>> summary: windows: insert file positioning call between reads and >>>> writes >>>> >>>> Also worth repeating what's in the comments in Mercurial's bug tracker ( >>>> https://bz.mercurial-scm.org/show_bug.cgi?id=4943) that this issue >>>> seems to >>>> go away on Python 3.0. I'm pretty sure CPython is automatically >>>> inserting >>>> the extra seek() call on that Python version to work around platform >>>> behavior differences. >>>> >>> >>> IIRC, Python2 uses a plain old stdio, but Python3 has own buffering >>> mechanism >>> called "New I/O". I think the bugs Matt pointed out affect only on the >>> New I/O. >>> >> >> Yes, exactly. "bufferedio.c" is Python 3's new I/O stack, available in >> Python 2 in the "io" module, but standard open() in Python 2 is still >> the old fopen()-based implementation. Python 3 tries to workaround >> platform differences by providing consistent semantics -- which is >> also why it doesn't rely on the libc's buffered I/O layer anymore. >> >> (I haven't looked at the Mercurial source code lately, but I suppose >> it's unlikely to use the new "io" module) >> >> Note there's one other well-known issue with writes in appending mode: >> https://bugs.python.org/issue15723 >> >> Regards >> >> Antoine. >> _______________________________________________ >> Mercurial-devel mailing list >> Mercurial-devel@selenic.com >> https://selenic.com/mailman/listinfo/mercurial-devel >> >> > -- > Pierre-Yves David > > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel >
On Sat, 2015-12-05 at 21:26 -0800, Gregory Szorc wrote: > On Fri, Dec 4, 2015 at 6:51 PM, Pierre-Yves David < > pierre-yves.david@ens-lyon.org> wrote: > > > So what is the state of this patch. > > > > Is ther any chance that we eventually take it as is or are we going > > to use > > a modified approach anyway? > > > Some Solaris distributions are definitely busted. I believe mpm was > investigating the scope of the bustage (OS-level POSIX semantics/bug > versus > Python behavior/bug). As best I can tell from poring over the reports, it's a bug in the stdio buffering layer in the C library and not in Python or the kernel. To confirm that, we'd really need to write a pure C test case to mirror Greg's minimal Python test. The fact that we have the same bug in both Windows and (some) Solaris makes me pretty anxious that it's more widespread and we're opening ourselves up to corruption more widely. Because the buffer is a complicated read/write cache that emulates the file offset and the rules around writes in "a+" mode are a bit subtle, it seems likely to me that bugs in this code could be somewhat widespread. If we look at Py3's rewrite of this code in the io module, it has indeed had a few bugs in this area of its own. So I'm frankly leaning towards just inserting a seek in the revlog code to be safe. -- Mathematics is the supreme nostalgia of our time.
On 12/07/2015 12:24 PM, Matt Mackall wrote: > On Sat, 2015-12-05 at 21:26 -0800, Gregory Szorc wrote: >> On Fri, Dec 4, 2015 at 6:51 PM, Pierre-Yves David < >> pierre-yves.david@ens-lyon.org> wrote: >> >>> So what is the state of this patch. >>> >>> Is ther any chance that we eventually take it as is or are we going >>> to use >>> a modified approach anyway? >> >> >> Some Solaris distributions are definitely busted. I believe mpm was >> investigating the scope of the bustage (OS-level POSIX semantics/bug >> versus >> Python behavior/bug). > > As best I can tell from poring over the reports, it's a bug in the > stdio buffering layer in the C library and not in Python or the kernel. > To confirm that, we'd really need to write a pure C test case to mirror > Greg's minimal Python test. > > The fact that we have the same bug in both Windows and (some) Solaris > makes me pretty anxious that it's more widespread and we're opening > ourselves up to corruption more widely. > > Because the buffer is a complicated read/write cache that emulates the > file offset and the rules around writes in "a+" mode are a bit subtle, > it seems likely to me that bugs in this code could be somewhat > widespread. If we look at Py3's rewrite of this code in the io module, > it has indeed had a few bugs in this area of its own. > > So I'm frankly leaning towards just inserting a seek in the revlog code > to be safe. You mean doing the seek in all cases/OSes, right. This seems reasonable (modulo proper documentation) What would be the downside of it? Is there an expected performance hit?
On Mon, 2015-12-07 at 13:27 -0800, Pierre-Yves David wrote: > > On 12/07/2015 12:24 PM, Matt Mackall wrote: > > On Sat, 2015-12-05 at 21:26 -0800, Gregory Szorc wrote: > > > On Fri, Dec 4, 2015 at 6:51 PM, Pierre-Yves David < > > > pierre-yves.david@ens-lyon.org> wrote: > > > > > > > So what is the state of this patch. > > > > > > > > Is ther any chance that we eventually take it as is or are we > > > > going > > > > to use > > > > a modified approach anyway? > > > > > > > > > Some Solaris distributions are definitely busted. I believe mpm > > > was > > > investigating the scope of the bustage (OS-level POSIX > > > semantics/bug > > > versus > > > Python behavior/bug). > > > > As best I can tell from poring over the reports, it's a bug in the > > stdio buffering layer in the C library and not in Python or the > > kernel. > > To confirm that, we'd really need to write a pure C test case to > > mirror > > Greg's minimal Python test. > > > > The fact that we have the same bug in both Windows and (some) > > Solaris > > makes me pretty anxious that it's more widespread and we're opening > > ourselves up to corruption more widely. > > > > Because the buffer is a complicated read/write cache that emulates > > the > > file offset and the rules around writes in "a+" mode are a bit > > subtle, > > it seems likely to me that bugs in this code could be somewhat > > widespread. If we look at Py3's rewrite of this code in the io > > module, > > it has indeed had a few bugs in this area of its own. > > > > So I'm frankly leaning towards just inserting a seek in the revlog > > code > > to be safe. > > You mean doing the seek in all cases/OSes, right. This seems > reasonable > (modulo proper documentation) > > What would be the downside of it? Is there an expected performance > hit? Unknown, but probably minimal. The seek still probably goes through the stdio buffer code.. and just injects some sanity. Correctly-written buffer code can potentially still elide sending the seek onto the kernel. And even if it doesn't.. a simple syscall like this on Linux is actually pretty similar in speed to a Python function call.
On 12/07/2015 02:01 PM, Matt Mackall wrote: > On Mon, 2015-12-07 at 13:27 -0800, Pierre-Yves David wrote: >> >> On 12/07/2015 12:24 PM, Matt Mackall wrote: >>> On Sat, 2015-12-05 at 21:26 -0800, Gregory Szorc wrote: >>>> On Fri, Dec 4, 2015 at 6:51 PM, Pierre-Yves David < >>>> pierre-yves.david@ens-lyon.org> wrote: >>>> >>>>> So what is the state of this patch. >>>>> >>>>> Is ther any chance that we eventually take it as is or are we >>>>> going >>>>> to use >>>>> a modified approach anyway? >>>> >>>> >>>> Some Solaris distributions are definitely busted. I believe mpm >>>> was >>>> investigating the scope of the bustage (OS-level POSIX >>>> semantics/bug >>>> versus >>>> Python behavior/bug). >>> >>> As best I can tell from poring over the reports, it's a bug in the >>> stdio buffering layer in the C library and not in Python or the >>> kernel. >>> To confirm that, we'd really need to write a pure C test case to >>> mirror >>> Greg's minimal Python test. >>> >>> The fact that we have the same bug in both Windows and (some) >>> Solaris >>> makes me pretty anxious that it's more widespread and we're opening >>> ourselves up to corruption more widely. >>> >>> Because the buffer is a complicated read/write cache that emulates >>> the >>> file offset and the rules around writes in "a+" mode are a bit >>> subtle, >>> it seems likely to me that bugs in this code could be somewhat >>> widespread. If we look at Py3's rewrite of this code in the io >>> module, >>> it has indeed had a few bugs in this area of its own. >>> >>> So I'm frankly leaning towards just inserting a seek in the revlog >>> code >>> to be safe. >> >> You mean doing the seek in all cases/OSes, right. This seems >> reasonable >> (modulo proper documentation) >> >> What would be the downside of it? Is there an expected performance >> hit? > > Unknown, but probably minimal. The seek still probably goes through the > stdio buffer code.. and just injects some sanity. Correctly-written > buffer code can potentially still elide sending the seek onto the > kernel. And even if it doesn't.. a simple syscall like this on Linux is > actually pretty similar in speed to a Python function call. So any object to just doing it?
On Mon, Dec 07, 2015 at 02:06:56PM -0800, Pierre-Yves David wrote: > On 12/07/2015 02:01 PM, Matt Mackall wrote: > >On Mon, 2015-12-07 at 13:27 -0800, Pierre-Yves David wrote: > >> > >>On 12/07/2015 12:24 PM, Matt Mackall wrote: > >>>On Sat, 2015-12-05 at 21:26 -0800, Gregory Szorc wrote: > >>>>On Fri, Dec 4, 2015 at 6:51 PM, Pierre-Yves David < > >>>>pierre-yves.david@ens-lyon.org> wrote: > >>>So I'm frankly leaning towards just inserting a seek in the revlog > >>>code > >>>to be safe. > >> > >>You mean doing the seek in all cases/OSes, right. This seems > >>reasonable > >>(modulo proper documentation) > >> > >>What would be the downside of it? Is there an expected performance > >>hit? > > > >Unknown, but probably minimal. The seek still probably goes through the > >stdio buffer code.. and just injects some sanity. Correctly-written > >buffer code can potentially still elide sending the seek onto the > >kernel. And even if it doesn't.. a simple syscall like this on Linux is > >actually pretty similar in speed to a Python function call. > > So any object to just doing it? I'm in favor of just inserting the seek in the revlog code and doing an out-of-schedule release to fix solaris folks. > > -- > Pierre-Yves David > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@selenic.com > https://selenic.com/mailman/listinfo/mercurial-devel
Patch
diff -r 61fbf5dc12b2 mercurial/scmutil.py --- a/mercurial/scmutil.py Tue Nov 24 21:41:12 2015 +0000 +++ b/mercurial/scmutil.py Mon Nov 30 16:21:33 2015 -0600 @@ -516,6 +516,12 @@ fp = util.posixfile(f, mode) if nlink == 0: self._fixfilemode(f) + + # The position when opening in append mode is implementation defined, so + # make it consistent across platforms by positioning at EOF + if mode in ('a', 'a+'): + fp.seek(0, os.SEEK_END) + return fp def symlink(self, src, dst): diff -r 61fbf5dc12b2 mercurial/windows.py --- a/mercurial/windows.py Tue Nov 24 21:41:12 2015 +0000 +++ b/mercurial/windows.py Mon Nov 30 16:21:33 2015 -0600 @@ -99,12 +99,6 @@ '''Open a file with even more POSIX-like semantics''' try: fp = osutil.posixfile(name, mode, buffering) # may raise WindowsError - - # The position when opening in append mode is implementation defined, so - # make it consistent with other platforms, which position at EOF. - if 'a' in mode: - fp.seek(0, os.SEEK_END) - if '+' in mode: return mixedfilemodewrapper(fp)