Patchwork chg: handle EOF reading data block

login
register
mail settings
Submitter Jun Wu
Date July 18, 2016, 6:34 p.m.
Message ID <c063823d664a978415bd.1468866850@x1c>
Download mbox | patch
Permalink /patch/15949/
State Accepted
Headers show

Comments

Jun Wu - July 18, 2016, 6:34 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1468864506 -3600
#      Mon Jul 18 18:55:06 2016 +0100
# Node ID c063823d664a978415bde11e0e13a6c3983d1dd9
# Parent  953839de96ab574caa40557c542c262286c6287c
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r c063823d664a
chg: handle EOF reading data block

We recently discovered a case in production that chg uses 100% CPU and is
trying to read data forever:

  recvfrom(4, "", 1814012019, 0, NULL, NULL) = 0

Using gdb, apparently readchannel() got wrong data. It was reading in an
infinite loop because rsize == 0 does not exit the loop, while the server
process had ended.

  (gdb) bt
  #0 ... in recv () at /lib64/libc.so.6
  #1 ... in readchannel (...) at /usr/include/bits/socket2.h:45
  #2 ... in readchannel (hgc=...) at hgclient.c:129
  #3 ... in handleresponse (hgc=...) at hgclient.c:255
  #4 ... in hgc_runcommand (hgc=..., args=<optimized>, argsize=<optimized>)
  #5 ... in main (argc=...486922636, argv=..., envp=...) at chg.c:661
  (gdb) frame 2
  (gdb) p *hgc
  $1 = {sockfd = 4, pid = 381152, ctx = {ch = 108 'l',
        data = 0x7fb05164f010 "st):\nTraceback (most recent call last):\n"
        "Traceback (most recent call last):\ne", maxdatasize = 1814065152,"
        " datasize = 1814064225}, capflags = 16131}

This patch addresses the infinite loop issue by detecting continuously empty
responses and abort in that case.

Note that datasize can be translated to ['l', ' ', 'l', 'a']. Concatenate
datasize and data, it forms part of "Traceback (most recent call last):".

This may indicate a server-side channeledoutput issue. If it is a race
condition, we may want to use flock to protect the channels.
Matt Mackall - July 19, 2016, 4:17 a.m.
On Mon, 2016-07-18 at 19:34 +0100, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1468864506 -3600
> #      Mon Jul 18 18:55:06 2016 +0100
> # Node ID c063823d664a978415bde11e0e13a6c3983d1dd9
> # Parent  953839de96ab574caa40557c542c262286c6287c
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> c063823d664a
> chg: handle EOF reading data block

Queued for default, thanks.
-- 
Mathematics is the supreme nostalgia of our time.
Yuya Nishihara - July 19, 2016, 12:42 p.m.
On Mon, 18 Jul 2016 19:34:10 +0100, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1468864506 -3600
> #      Mon Jul 18 18:55:06 2016 +0100
> # Node ID c063823d664a978415bde11e0e13a6c3983d1dd9
> # Parent  953839de96ab574caa40557c542c262286c6287c
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r c063823d664a
> chg: handle EOF reading data block

> --- a/contrib/chg/hgclient.c
> +++ b/contrib/chg/hgclient.c
> @@ -126,10 +126,15 @@ static void readchannel(hgclient_t *hgc)
>  		return;  /* assumes input request */
>  
>  	size_t cursize = 0;
> +	int emptycount = 0;
>  	while (cursize < hgc->ctx.datasize) {
>  		rsize = recv(hgc->sockfd, hgc->ctx.data + cursize,
>  			     hgc->ctx.datasize - cursize, 0);
> -		if (rsize < 0)
> +		/* rsize == 0 normally indicates EOF, while it's also a valid
> +		 * packet size for unix socket. treat it as EOF and abort if
> +		 * we get many empty responses in a row. */
> +		emptycount = (rsize == 0 ? emptycount + 1 : 0);
> +		if (rsize < 0 || emptycount > 20)

I'm curious when blocking recv() may return 0 other than EOF and zero-size
request. I suspect that the original condition "rsize < 0" was totally wrong.
Jun Wu - July 19, 2016, 1:07 p.m.
Excerpts from Yuya Nishihara's message of 2016-07-19 21:42:12 +0900:
> I'm curious when blocking recv() may return 0 other than EOF and zero-size
> request. I suspect that the original condition "rsize < 0" was totally wrong.

Manpage says it could return 0 for zero-length datagrams:

  RETURN VALUE
   ...
   Datagram sockets in various domains (e.g., the UNIX and Internet domains)
   permit zero-length datagrams.  When such a datagram is received, the
   return value is 0.
   ...

This is ugly but I don't think we have too many choices.

I'm more concerned about the root cause. It seems like nested channedoutput
writing. We probably want to make use of stdio locking by merging the two
write calls:

  diff --git a/mercurial/commandserver.py b/mercurial/commandserver.py
  --- a/mercurial/commandserver.py
  +++ b/mercurial/commandserver.py
  @@ -54,8 +54,8 @@ class channeledoutput(object):
       def write(self, data):
           if not data:
               return
  -        self.out.write(struct.pack('>cI', self.channel, len(data)))
  -        self.out.write(data)
  +        buf = struct.pack('>cI', self.channel, len(data)) + data
  +        self.out.write(buf)
           self.out.flush()
Yuya Nishihara - July 19, 2016, 1:29 p.m.
On Tue, 19 Jul 2016 14:07:23 +0100, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-07-19 21:42:12 +0900:
> > I'm curious when blocking recv() may return 0 other than EOF and zero-size
> > request. I suspect that the original condition "rsize < 0" was totally wrong.
> 
> Manpage says it could return 0 for zero-length datagrams:
> 
>   RETURN VALUE
>    ...
>    Datagram sockets in various domains (e.g., the UNIX and Internet domains)
>    permit zero-length datagrams.  When such a datagram is received, the
>    return value is 0.
>    ...
> 
> This is ugly but I don't think we have too many choices.

It's datagram socket (e.g. UDP).

> I'm more concerned about the root cause. It seems like nested channedoutput
> writing. We probably want to make use of stdio locking by merging the two
> write calls:
> 
>   diff --git a/mercurial/commandserver.py b/mercurial/commandserver.py
>   --- a/mercurial/commandserver.py
>   +++ b/mercurial/commandserver.py
>   @@ -54,8 +54,8 @@ class channeledoutput(object):
>        def write(self, data):
>            if not data:
>                return
>   -        self.out.write(struct.pack('>cI', self.channel, len(data)))
>   -        self.out.write(data)
>   +        buf = struct.pack('>cI', self.channel, len(data)) + data
>   +        self.out.write(buf)
>            self.out.flush()

Yeah, I have that patch to make commandserver channels thread safe. Do you
have any extensions that can share ui across threads?
Jun Wu - July 19, 2016, 2:56 p.m.
Excerpts from Yuya Nishihara's message of 2016-07-19 22:29:55 +0900:
> It's datagram socket (e.g. UDP).

You are right. I should have checked commandserver is using SOCK_STREAM and
unaffected.  Now we can treat an empty response as an error.

> Yeah, I have that patch to make commandserver channels thread safe. Do you
> have any extensions that can share ui across threads?

We have some "daemonize" code that use processes to build or rearrange cache
in the background. I think we may want some standard "daemonize" code in
util to make sure stdios are closed.
Yuya Nishihara - July 20, 2016, 3:18 p.m.
On Tue, 19 Jul 2016 15:56:41 +0100, Jun Wu wrote:
> > Yeah, I have that patch to make commandserver channels thread safe. Do you
> > have any extensions that can share ui across threads?  
> 
> We have some "daemonize" code that use processes to build or rearrange cache
> in the background.

When parent and child processes write to ui.fout/ferr, they have different
stdio buffers and duplicated fds. So two writes won't be serialized even if
we do write(header + data) instead of write(header) + write(data).

> I think we may want some standard "daemonize" code in
> util to make sure stdios are closed.

Sounds good.
Jun Wu - July 20, 2016, 3:47 p.m.
Excerpts from Yuya Nishihara's message of 2016-07-21 00:18:35 +0900:
> When parent and child processes write to ui.fout/ferr, they have different
> stdio buffers and duplicated fds. So two writes won't be serialized even if
> we do write(header + data) instead of write(header) + write(data).

I'm aware of that but I'm not sure if it is the cause. Mercurial uses
threads to do batch updating and that may cause issues as well. Even if
we only have a single thread, it could also be problematic if exceptions
are raised during writing.
Yuya Nishihara - July 21, 2016, 2:29 p.m.
On Wed, 20 Jul 2016 16:47:20 +0100, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-07-21 00:18:35 +0900:
> > When parent and child processes write to ui.fout/ferr, they have different
> > stdio buffers and duplicated fds. So two writes won't be serialized even if
> > we do write(header + data) instead of write(header) + write(data).  
> 
> I'm aware of that but I'm not sure if it is the cause. Mercurial uses
> threads to do batch updating and that may cause issues as well.

batch updating is driven by worker processes, which can cause problems on
plain commandserver, but that's okay for chg because the worker processes
can directly write to client's stdio.

> Even if
> we only have a single thread, it could also be problematic if exceptions
> are raised during writing.

Good point. write(header + data) will be more reliable against errors such
as KeyboardInterrupt.

Patch

diff --git a/contrib/chg/hgclient.c b/contrib/chg/hgclient.c
--- a/contrib/chg/hgclient.c
+++ b/contrib/chg/hgclient.c
@@ -126,10 +126,15 @@  static void readchannel(hgclient_t *hgc)
 		return;  /* assumes input request */
 
 	size_t cursize = 0;
+	int emptycount = 0;
 	while (cursize < hgc->ctx.datasize) {
 		rsize = recv(hgc->sockfd, hgc->ctx.data + cursize,
 			     hgc->ctx.datasize - cursize, 0);
-		if (rsize < 0)
+		/* rsize == 0 normally indicates EOF, while it's also a valid
+		 * packet size for unix socket. treat it as EOF and abort if
+		 * we get many empty responses in a row. */
+		emptycount = (rsize == 0 ? emptycount + 1 : 0);
+		if (rsize < 0 || emptycount > 20)
 			abortmsg("failed to read data block");
 		cursize += rsize;
 	}