Submitter | Jun Wu |
---|---|
Date | Nov. 14, 2016, 11:35 p.m. |
Message ID | <d1637df5ffd91e95da25.1479166555@x1c> |
Download | mbox | patch |
Permalink | /patch/17571/ |
State | Accepted |
Headers | show |
Comments
On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1479165246 0 > # Mon Nov 14 23:14:06 2016 +0000 > # Node ID d1637df5ffd91e95da25213f06f346adb3269ace > # Parent 8097b8c66952003addd5b683a14265c1d3cc201f > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d1637df5ffd9 > patch: migrate to util.iterfile Not all read()s are interruptible by signals. I don't remember the detail, but signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a (local) disk is not a slow device according to this definition; I/O operations on disk devices are not interrupted by signals." Since readline() doesn't have cache in CPython layer, it would be slightly slower than fp.__iter__().
Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900: > On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1479165246 0 > > # Mon Nov 14 23:14:06 2016 +0000 > > # Node ID d1637df5ffd91e95da25213f06f346adb3269ace > > # Parent 8097b8c66952003addd5b683a14265c1d3cc201f > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d1637df5ffd9 > > patch: migrate to util.iterfile > > Not all read()s are interruptible by signals. I don't remember the detail, but > signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a > (local) disk is not a slow device according to this definition; I/O operations > on disk devices are not interrupted by signals." Good to know! I guess modern OS have similar decisions, although POSIX seems to be not helpful here. > Since readline() doesn't have cache in CPython layer, it would be slightly > slower than fp.__iter__(). A quick benchmark shows it's ~4x slower. Maybe make the function smarter to only use the workaround for a pipe? if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode): return iter(fp.readline, '') else: return fp
On 11/15/2016 01:38 PM, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900: >> On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote: >>> # HG changeset patch >>> # User Jun Wu <quark@fb.com> >>> # Date 1479165246 0 >>> # Mon Nov 14 23:14:06 2016 +0000 >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace >>> # Parent 8097b8c66952003addd5b683a14265c1d3cc201f >>> # Available At https://bitbucket.org/quark-zju/hg-draft >>> # hg pull https://bitbucket.org/quark-zju/hg-draft -r d1637df5ffd9 >>> patch: migrate to util.iterfile >> >> Not all read()s are interruptible by signals. I don't remember the detail, but >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a >> (local) disk is not a slow device according to this definition; I/O operations >> on disk devices are not interrupted by signals." > > Good to know! I guess modern OS have similar decisions, although POSIX seems > to be not helpful here. > >> Since readline() doesn't have cache in CPython layer, it would be slightly >> slower than fp.__iter__(). > > A quick benchmark shows it's ~4x slower. > > Maybe make the function smarter to only use the workaround for a pipe? > > if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode): > return iter(fp.readline, '') > else: > return fp Should we drop this series from hg-committed or a followup is enough? Cheers,
On 15 November 2016 at 12:14, Yuya Nishihara <yuya@tcha.org> wrote: > Not all read()s are interruptible by signals. I don't remember the detail, > but > signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note > that a > (local) disk is not a slow device according to this definition; I/O > operations > on disk devices are not interrupted by signals." > > Since readline() doesn't have cache in CPython layer, it would be slightly > slower than fp.__iter__(). > It looks like this isn't the case for Windows; I found http://bugs.python.org/issue1633941, where you can provoke the bug by using ^Z while reading from stdin. Also, before Python 2.7.4 (including 2.6) there is no EINTR handling *at all*, all reading operations that can be interrupted are affected. See http://bugs.python.org/issue12268, which aimed at fixing that issue but managed to miss readahead.
Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +0000: > > On 11/15/2016 01:38 PM, Jun Wu wrote: > > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900: > >> On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote: > >>> # HG changeset patch > >>> # User Jun Wu <quark@fb.com> > >>> # Date 1479165246 0 > >>> # Mon Nov 14 23:14:06 2016 +0000 > >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace > >>> # Parent 8097b8c66952003addd5b683a14265c1d3cc201f > >>> # Available At https://bitbucket.org/quark-zju/hg-draft > >>> # hg pull https://bitbucket.org/quark-zju/hg-draft -r d1637df5ffd9 > >>> patch: migrate to util.iterfile > >> > >> Not all read()s are interruptible by signals. I don't remember the detail, but > >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a > >> (local) disk is not a slow device according to this definition; I/O operations > >> on disk devices are not interrupted by signals." > > > > Good to know! I guess modern OS have similar decisions, although POSIX seems > > to be not helpful here. > > > >> Since readline() doesn't have cache in CPython layer, it would be slightly > >> slower than fp.__iter__(). > > > > A quick benchmark shows it's ~4x slower. > > > > Maybe make the function smarter to only use the workaround for a pipe? > > > > if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode): > > return iter(fp.readline, '') > > else: > > return fp > > Should we drop this series from hg-committed or a followup is enough? +1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we support Python 2.6, we are at risk that any fileobj.read() is broken. > > Cheers, >
Excerpts from Jun Wu's message of 2016-11-15 15:11:52 +0000: > Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +0000: > > > > On 11/15/2016 01:38 PM, Jun Wu wrote: > > > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900: > > >> On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote: > > >>> # HG changeset patch > > >>> # User Jun Wu <quark@fb.com> > > >>> # Date 1479165246 0 > > >>> # Mon Nov 14 23:14:06 2016 +0000 > > >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace > > >>> # Parent 8097b8c66952003addd5b683a14265c1d3cc201f > > >>> # Available At https://bitbucket.org/quark-zju/hg-draft > > >>> # hg pull https://bitbucket.org/quark-zju/hg-draft -r d1637df5ffd9 > > >>> patch: migrate to util.iterfile > > >> > > >> Not all read()s are interruptible by signals. I don't remember the detail, but > > >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a > > >> (local) disk is not a slow device according to this definition; I/O operations > > >> on disk devices are not interrupted by signals." > > > > > > Good to know! I guess modern OS have similar decisions, although POSIX seems > > > to be not helpful here. > > > > > >> Since readline() doesn't have cache in CPython layer, it would be slightly > > >> slower than fp.__iter__(). > > > > > > A quick benchmark shows it's ~4x slower. > > > > > > Maybe make the function smarter to only use the workaround for a pipe? > > > > > > if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode): > > > return iter(fp.readline, '') > > > else: > > > return fp > > > > Should we drop this series from hg-committed or a followup is enough? > > +1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we > support Python 2.6, we are at risk that any fileobj.read() is broken. Actually, I think the worker issue is still worthwhile to fix (the current situation is better than none anyway). So I will send follow-ups: - Fix util.iterfile perf - Only use SIGCHILD handler in worker, if Python >= 2.7.4 > > > > > Cheers, > >
On Tue, 15 Nov 2016 15:14:09 +0000, Jun Wu wrote: > Excerpts from Jun Wu's message of 2016-11-15 15:11:52 +0000: > > Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +0000: > > > > > > On 11/15/2016 01:38 PM, Jun Wu wrote: > > > > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900: > > > >> On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote: > > > >>> # HG changeset patch > > > >>> # User Jun Wu <quark@fb.com> > > > >>> # Date 1479165246 0 > > > >>> # Mon Nov 14 23:14:06 2016 +0000 > > > >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace > > > >>> # Parent 8097b8c66952003addd5b683a14265c1d3cc201f > > > >>> # Available At https://bitbucket.org/quark-zju/hg-draft > > > >>> # hg pull https://bitbucket.org/quark-zju/hg-draft -r d1637df5ffd9 > > > >>> patch: migrate to util.iterfile > > > >> > > > >> Not all read()s are interruptible by signals. I don't remember the detail, but > > > >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a > > > >> (local) disk is not a slow device according to this definition; I/O operations > > > >> on disk devices are not interrupted by signals." > > > > > > > > Good to know! I guess modern OS have similar decisions, although POSIX seems > > > > to be not helpful here. > > > > > > > >> Since readline() doesn't have cache in CPython layer, it would be slightly > > > >> slower than fp.__iter__(). > > > > > > > > A quick benchmark shows it's ~4x slower. > > > > > > > > Maybe make the function smarter to only use the workaround for a pipe? > > > > > > > > if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode): > > > > return iter(fp.readline, '') No idea if it can cover all cases. (e.g. i don't know if tty is the "slow" device.) > > > Should we drop this series from hg-committed or a followup is enough? > > > > +1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we > > support Python 2.6, we are at risk that any fileobj.read() is broken. > > Actually, I think the worker issue is still worthwhile to fix (the current > situation is better than none anyway). So I will send follow-ups: > > - Fix util.iterfile perf > - Only use SIGCHILD handler in worker, if Python >= 2.7.4 It sounds good to patch worker.py which is known to have the problem, and we wouldn't care if the read of progress messages gets 4x slower.
Excerpts from Yuya Nishihara's message of 2016-11-16 00:39:46 +0900: > On Tue, 15 Nov 2016 15:14:09 +0000, Jun Wu wrote: > > Excerpts from Jun Wu's message of 2016-11-15 15:11:52 +0000: > > > Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +0000: > > > > > > > > On 11/15/2016 01:38 PM, Jun Wu wrote: > > > > > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900: > > > > >> On Mon, 14 Nov 2016 23:35:55 +0000, Jun Wu wrote: > > > > >>> # HG changeset patch > > > > >>> # User Jun Wu <quark@fb.com> > > > > >>> # Date 1479165246 0 > > > > >>> # Mon Nov 14 23:14:06 2016 +0000 > > > > >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace > > > > >>> # Parent 8097b8c66952003addd5b683a14265c1d3cc201f > > > > >>> # Available At https://bitbucket.org/quark-zju/hg-draft > > > > >>> # hg pull https://bitbucket.org/quark-zju/hg-draft -r d1637df5ffd9 > > > > >>> patch: migrate to util.iterfile > > > > >> > > > > >> Not all read()s are interruptible by signals. I don't remember the detail, but > > > > >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a > > > > >> (local) disk is not a slow device according to this definition; I/O operations > > > > >> on disk devices are not interrupted by signals." > > > > > > > > > > Good to know! I guess modern OS have similar decisions, although POSIX seems > > > > > to be not helpful here. > > > > > > > > > >> Since readline() doesn't have cache in CPython layer, it would be slightly > > > > >> slower than fp.__iter__(). > > > > > > > > > > A quick benchmark shows it's ~4x slower. > > > > > > > > > > Maybe make the function smarter to only use the workaround for a pipe? > > > > > > > > > > if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode): > > > > > return iter(fp.readline, '') > > No idea if it can cover all cases. (e.g. i don't know if tty is the "slow" > device.) > > > > > Should we drop this series from hg-committed or a followup is enough? > > > > > > +1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we > > > support Python 2.6, we are at risk that any fileobj.read() is broken. > > > > Actually, I think the worker issue is still worthwhile to fix (the current > > situation is better than none anyway). So I will send follow-ups: > > > > - Fix util.iterfile perf > > - Only use SIGCHILD handler in worker, if Python >= 2.7.4 > > It sounds good to patch worker.py which is known to have the problem, and we > wouldn't care if the read of progress messages gets 4x slower. Given the fact that you can do "hg import - < file". I think it's worthwhile to patch all places in the code base. The patch I'm currently writing will special case S_ISREG files so it should have little perf impact.
Patch
diff --git a/mercurial/patch.py b/mercurial/patch.py --- a/mercurial/patch.py +++ b/mercurial/patch.py @@ -1070,5 +1070,5 @@ the hunk is left unchanged. patchfp = open(patchfn) ncpatchfp = stringio() - for line in patchfp: + for line in util.iterfile(patchfp): if not line.startswith('#'): ncpatchfp.write(line) @@ -2013,5 +2013,5 @@ def _externalpatch(ui, repo, patcher, pa util.shellquote(patchname))) try: - for line in fp: + for line in util.iterfile(fp): line = line.rstrip() ui.note(line + '\n')