Patchwork [6,of,6] bundle2: don't attempt I/O after an I/O error occurs (issue4784)

login
register
mail settings
Submitter Gregory Szorc
Date April 14, 2017, 7:44 a.m.
Message ID <d1b31c378dc8e29d9827.1492155850@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/20199/
State Accepted
Headers show

Comments

Gregory Szorc - April 14, 2017, 7:44 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1492155656 25200
#      Fri Apr 14 00:40:56 2017 -0700
# Node ID d1b31c378dc8e29d9827f9ded9fd023d5a00b0c9
# Parent  8525abda8397f2a5a94edc5db2279549ef53b8e8
bundle2: don't attempt I/O after an I/O error occurs (issue4784)

Many have seen a "stream ended unexpectedly" error. This message is
raised from changegroup.readexactly() when a read(n) operation fails
to return exactly N bytes.

I believe most occurrences of this error in the wild stem from
the code changed in this patch. Before, if bundle2's part applicator
raised an Exception when processing/applying parts, the exception
handler would attempt to iterate the remaining parts. If I/O
during this iteration failed, it would likely raise the
"stream ended unexpectedly" exception.

The problem with this approach is that if we already encountered
an I/O error iterating the bundle2 data during application, then
any further I/O would almost certainly fail. If the original stream
were closed, changegroup.readexactly() would obtain an empty
string, triggering "stream ended unexpectedly" with "got 0." This is
the error message that users would see. What's worse is that the
original I/O related exception would be lost since a new exception
would be raised. This made debugging the actual I/O failure
effectively impossible.

This patch changes the exception handler for bundle2 application to
not perform additional stream I/O if the original exception was
I/O related. As tests demonstrate, the "stream ended unexpectedly"
error goes away and it is replaced with something actually useful.

A subtle bug with this change is that *all* I/O errors will influence
behavior, not just I/O errors on the bundle2 stream. e.g. an I/O error
from the local filesystem could result in not seeking the bundle. This
is the source of the test-acl.t test change, for example. I question
whether we care. Considering pretty much all bundle2 stream I/O errors
resulted in this code not running before (due to a fast I/O error
re-raise), we seem to be getting along just fine without the code. And
since this error recovery code is only tested in test-acl.t, I'm
guessing it isn't important. Or, at least it isn't as important as
accurately reporting the original I/O error. Considering these "stream
ended unexpectedly" errors have been masking several problems, I
think the new code is acceptable. We can always improve the stream
error detection code later if the new model isn't good enough.
Yuya Nishihara - April 16, 2017, 3:41 a.m.
On Fri, 14 Apr 2017 00:44:10 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1492155656 25200
> #      Fri Apr 14 00:40:56 2017 -0700
> # Node ID d1b31c378dc8e29d9827f9ded9fd023d5a00b0c9
> # Parent  8525abda8397f2a5a94edc5db2279549ef53b8e8
> bundle2: don't attempt I/O after an I/O error occurs (issue4784)

> A subtle bug with this change is that *all* I/O errors will influence
> behavior, not just I/O errors on the bundle2 stream. e.g. an I/O error
> from the local filesystem could result in not seeking the bundle. This
> is the source of the test-acl.t test change, for example. I question
> whether we care. Considering pretty much all bundle2 stream I/O errors
> resulted in this code not running before (due to a fast I/O error
> re-raise), we seem to be getting along just fine without the code. And
> since this error recovery code is only tested in test-acl.t, I'm
> guessing it isn't important. Or, at least it isn't as important as
> accurately reporting the original I/O error. Considering these "stream
> ended unexpectedly" errors have been masking several problems, I
> think the new code is acceptable. We can always improve the stream
> error detection code later if the new model isn't good enough.

> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> --- a/mercurial/bundle2.py
> +++ b/mercurial/bundle2.py
> @@ -354,9 +354,13 @@ def processbundle(repo, unbundler, trans
>          for nbpart, part in iterparts:
>              _processpart(op, part)
>      except Exception as exc:
> -        for nbpart, part in iterparts:
> -            # consume the bundle content
> -            part.seek(0, 2)
> +        # Don't attempt further I/O after we've encountered an I/O failure.
> +        # This will almost certainly just raise another I/O failure and will
> +        # only mask the underlying failure.
> +        if not isinstance(exc, (IOError, error.RichIOError)):
> +            for nbpart, part in iterparts:
> +                # consume the bundle content
> +                part.seek(0, 2)

I think this is okay, but if we have to care about a local IOError, maybe we
could try consuming bundle and suppress errors?

  except Exception as exc:
      try:
          # consume the bundle content
      except (IOError, error.RichIOError):
          pass
Gregory Szorc - April 16, 2017, 7 p.m.
On Sat, Apr 15, 2017 at 8:41 PM, Yuya Nishihara <yuya@tcha.org> wrote:

> On Fri, 14 Apr 2017 00:44:10 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1492155656 25200
> > #      Fri Apr 14 00:40:56 2017 -0700
> > # Node ID d1b31c378dc8e29d9827f9ded9fd023d5a00b0c9
> > # Parent  8525abda8397f2a5a94edc5db2279549ef53b8e8
> > bundle2: don't attempt I/O after an I/O error occurs (issue4784)
>
> > A subtle bug with this change is that *all* I/O errors will influence
> > behavior, not just I/O errors on the bundle2 stream. e.g. an I/O error
> > from the local filesystem could result in not seeking the bundle. This
> > is the source of the test-acl.t test change, for example. I question
> > whether we care. Considering pretty much all bundle2 stream I/O errors
> > resulted in this code not running before (due to a fast I/O error
> > re-raise), we seem to be getting along just fine without the code. And
> > since this error recovery code is only tested in test-acl.t, I'm
> > guessing it isn't important. Or, at least it isn't as important as
> > accurately reporting the original I/O error. Considering these "stream
> > ended unexpectedly" errors have been masking several problems, I
> > think the new code is acceptable. We can always improve the stream
> > error detection code later if the new model isn't good enough.
>
> > diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> > --- a/mercurial/bundle2.py
> > +++ b/mercurial/bundle2.py
> > @@ -354,9 +354,13 @@ def processbundle(repo, unbundler, trans
> >          for nbpart, part in iterparts:
> >              _processpart(op, part)
> >      except Exception as exc:
> > -        for nbpart, part in iterparts:
> > -            # consume the bundle content
> > -            part.seek(0, 2)
> > +        # Don't attempt further I/O after we've encountered an I/O
> failure.
> > +        # This will almost certainly just raise another I/O failure and
> will
> > +        # only mask the underlying failure.
> > +        if not isinstance(exc, (IOError, error.RichIOError)):
> > +            for nbpart, part in iterparts:
> > +                # consume the bundle content
> > +                part.seek(0, 2)
>
> I think this is okay, but if we have to care about a local IOError, maybe
> we
> could try consuming bundle and suppress errors?
>
>   except Exception as exc:
>       try:
>           # consume the bundle content
>       except (IOError, error.RichIOError):
>           pass
>

I went with something similar to this in the series I just sent. But it is
different enough that we may want to discuss it further.

Patch

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -354,9 +354,13 @@  def processbundle(repo, unbundler, trans
         for nbpart, part in iterparts:
             _processpart(op, part)
     except Exception as exc:
-        for nbpart, part in iterparts:
-            # consume the bundle content
-            part.seek(0, 2)
+        # Don't attempt further I/O after we've encountered an I/O failure.
+        # This will almost certainly just raise another I/O failure and will
+        # only mask the underlying failure.
+        if not isinstance(exc, (IOError, error.RichIOError)):
+            for nbpart, part in iterparts:
+                # consume the bundle content
+                part.seek(0, 2)
         # Small hack to let caller code distinguish exceptions from bundle2
         # processing from processing the old format. This is mostly
         # needed to handle different return codes to unbundle according to the
diff --git a/tests/test-acl.t b/tests/test-acl.t
--- a/tests/test-acl.t
+++ b/tests/test-acl.t
@@ -896,7 +896,7 @@  file specified by acl.config does not ex
   acl: checking access for user "barney"
   error: pretxnchangegroup.acl hook raised an exception: [Errno 2] No such file or directory: '../acl.config'
   bundle2-input-part: total payload size 1553
-  bundle2-input-bundle: 3 parts total
+  bundle2-input-bundle: 2 parts total
   transaction abort!
   rollback completed
   abort: No such file or directory: ../acl.config
diff --git a/tests/test-http-bad-server.t b/tests/test-http-bad-server.t
--- a/tests/test-http-bad-server.t
+++ b/tests/test-http-bad-server.t
@@ -696,7 +696,8 @@  Server stops sending after bundle2 part 
   adding changesets
   transaction abort!
   rollback completed
-  abort: stream ended unexpectedly (got 0 bytes, expected 4)
+  abort: HTTP request error (incomplete response)
+  (this may be an intermittent network failure; if the error persists, consider contacting the network or server operator)
   [255]
 
   $ killdaemons.py $DAEMON_PIDS
@@ -725,7 +726,8 @@  Server stops after bundle2 part payload 
   adding changesets
   transaction abort!
   rollback completed
-  abort: stream ended unexpectedly (got 0 bytes, expected 4)
+  abort: HTTP request error (incomplete response)
+  (this may be an intermittent network failure; if the error persists, consider contacting the network or server operator)
   [255]
 
   $ killdaemons.py $DAEMON_PIDS
@@ -755,7 +757,8 @@  Server stops sending in middle of bundle
   adding changesets
   transaction abort!
   rollback completed
-  abort: stream ended unexpectedly (got 0 bytes, expected 4)
+  abort: HTTP request error (incomplete response)
+  (this may be an intermittent network failure; if the error persists, consider contacting the network or server operator)
   [255]
 
   $ killdaemons.py $DAEMON_PIDS