Patchwork D3845: worker: support more return types in posix worker

login
register
mail settings
Submitter phabricator
Date June 27, 2018, 1:07 a.m.
Message ID <differential-rev-PHID-DREV-2mfrxruf2pgfpijg2kzx-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/32453/
State Superseded
Headers show

Comments

phabricator - June 27, 2018, 1:07 a.m.
hooper created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This allows us to return things that aren't tuple(int, str) from worker
  functions. I wanted to use marshal instead of pickle, but it seems to read from
  the pipe in non-blocking mode, which means it stops before it sees the results.
  
  The windows worker already supports arbitrary return values without
  serialization, because it uses threads instead of subprocesses.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

AFFECTED FILES
  mercurial/worker.py

CHANGE DETAILS




To: hooper, #hg-reviewers
Cc: mercurial-devel
phabricator - June 30, 2018, 1:09 a.m.
durin42 added a comment.


  I'm not in love with pickle. Could we use json or cbor instead?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: durin42, mercurial-devel
Yuya Nishihara - June 30, 2018, 2:15 a.m.
>   I'm not in love with pickle. Could we use json or cbor instead?

Perhaps cbor is better since it can be streamed and the overhead is pretty
low. We have to keep the message size small since rfd/wfd is a multi-writer
pipe.
phabricator - June 30, 2018, 2:15 a.m.
yuja added a comment.


  >   I'm not in love with pickle. Could we use json or cbor instead?
  
  Perhaps cbor is better since it can be streamed and the overhead is pretty
  low. We have to keep the message size small since rfd/wfd is a multi-writer
  pipe.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
phabricator - June 30, 2018, 2:38 a.m.
durin42 added a comment.


  >   >   I'm not in love with pickle. Could we use json or cbor instead?
  >   
  >   Perhaps cbor is better since it can be streamed and the overhead is pretty
  >   low. We have to keep the message size small since rfd/wfd is a multi-writer
  >   pipe.
  
  It's been recommended to me that we avoid the streaming flavor of
  cbor, so we'd probably just do one-shot messages.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
Yuya Nishihara - June 30, 2018, 2:50 a.m.
>   >   >   I'm not in love with pickle. Could we use json or cbor instead?
>   >   
>   >   Perhaps cbor is better since it can be streamed and the overhead is pretty
>   >   low. We have to keep the message size small since rfd/wfd is a multi-writer
>   >   pipe.
>   
>   It's been recommended to me that we avoid the streaming flavor of
>   cbor, so we'd probably just do one-shot messages.

I meant multiple one-shot messages can be serialized over the pipe. JSON parser
doesn't work in that way. Each message must be written atomically.
phabricator - June 30, 2018, 2:53 a.m.
yuja added a comment.


  >   >   >   I'm not in love with pickle. Could we use json or cbor instead?
  >   >   
  >   >   Perhaps cbor is better since it can be streamed and the overhead is pretty
  >   >   low. We have to keep the message size small since rfd/wfd is a multi-writer
  >   >   pipe.
  >   
  >   It's been recommended to me that we avoid the streaming flavor of
  >   cbor, so we'd probably just do one-shot messages.
  
  I meant multiple one-shot messages can be serialized over the pipe. JSON parser
  doesn't work in that way. Each message must be written atomically.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
phabricator - July 3, 2018, 2:07 p.m.
durin42 added a comment.


  >   This makes me feel that pickle is "okay" tool. @durin42, any idea?
  
  I can live with pickle in that case, I guess. Sigh.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
phabricator - July 3, 2018, 7:46 p.m.
hooper added a comment.


  Yuya, it had passed tests for me with cbor, so is that a portability issue?
  
  One of the benefits of pickle/marshal is that we don't lose information, like when tuples become lists. That would be an insidious problem for callers.
  
  Also, in case it wasn't obvious, we'll need another patch to add some handling of len(dumps(result)) > PIPE_BUF (which was an existing issue).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
Yuya Nishihara - July 4, 2018, 12:30 p.m.
>   Yuya, it had passed tests for me with cbor, so is that a portability issue?

I don't think it's a portability issue. `cbor.load(StringIO(''))` doesn't
raise EOFError.

>   One of the benefits of pickle/marshal is that we don't lose information,
> like when tuples become lists. That would be an insidious problem for callers.

Good point.

>   Also, in case it wasn't obvious, we'll need another patch to add some
> handling of len(dumps(result)) > PIPE_BUF (which was an existing issue).

Yup. A payload was much smaller than PIPE_BUF, but that's no longer true.

The most straightforward fix will be to create pipe per worker and select()
them. I don't know if there's a simpler mechanism. DGRAM socket can ensure
message boundaries, but it's still has a size limit.
phabricator - July 4, 2018, 12:30 p.m.
yuja added a comment.


  >   Yuya, it had passed tests for me with cbor, so is that a portability issue?
  
  I don't think it's a portability issue. `cbor.load(StringIO(''))` doesn't
  raise EOFError.
  
  >   One of the benefits of pickle/marshal is that we don't lose information,
  > 
  > like when tuples become lists. That would be an insidious problem for callers.
  
  Good point.
  
  >   Also, in case it wasn't obvious, we'll need another patch to add some
  > 
  > handling of len(dumps(result)) > PIPE_BUF (which was an existing issue).
  
  Yup. A payload was much smaller than PIPE_BUF, but that's no longer true.
  
  The most straightforward fix will be to create pipe per worker and select()
  them. I don't know if there's a simpler mechanism. DGRAM socket can ensure
  message boundaries, but it's still has a size limit.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
phabricator - July 4, 2018, 6:16 p.m.
indygreg added a comment.


  In https://phab.mercurial-scm.org/D3845#60316, @durin42 wrote:
  
  > It's been recommended to me that we avoid the streaming flavor of
  >  cbor, so we'd probably just do one-shot messages.
  
  
  Out of curiosity, could you elaborate?
  
  One of the critiques against CBOR is that naive consumption of streaming data types can lead to resource exhaustion. e.g. by streaming a very large byte string. Of course, resource exhaustion can occur without streaming as well if the sender sends a very large document. Parsers need to deal with resource exhaustion regardless.
  
  Anyway, I don't believe ``cbor2`` prevents the use of the streaming types. Nor does it have support for limiting bytes read. For the latter, we have ``util.cappedreader`` which can expose a minimal wrap of a file object. But it needs work to be used in the context of limiting resource consumption (e.g. it should throw a reasonable error if an overrun is encountered).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: indygreg, yuja, durin42, mercurial-devel

Patch

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -155,8 +155,8 @@ 
 
                 def workerfunc():
                     os.close(rfd)
-                    for i, item in func(*(staticargs + (pargs,))):
-                        os.write(wfd, '%d %s\n' % (i, item))
+                    for result in func(*(staticargs + (pargs,))):
+                        os.write(wfd, util.pickle.dumps(result))
                     return 0
 
                 ret = scmutil.callcatch(ui, workerfunc)
@@ -187,9 +187,15 @@ 
                 os.kill(os.getpid(), -status)
             sys.exit(status)
     try:
-        for line in util.iterfile(fp):
-            l = line.split(' ', 1)
-            yield int(l[0]), l[1][:-1]
+        while True:
+            try:
+                yield util.pickle.load(fp)
+            except EOFError:
+                break
+            except IOError as e:
+                if e.errno == errno.EINTR:
+                    continue
+                raise
     except: # re-raises
         killworkers()
         cleanup()