Patchwork run-tests: "fix" race condition in race condition fix

login
register
mail settings
Submitter Bryan O'Sullivan
Date Jan. 22, 2016, 7 p.m.
Message ID <6bb64e83ac188ee99528.1453489217@bryano-mbp.local>
Download mbox | patch
Permalink /patch/12868/
State Superseded
Commit a6833e464b070c2f51edff8d6e55be79f5d9db22
Delegated to: Laurent Charignon
Headers show

Comments

Bryan O'Sullivan - Jan. 22, 2016, 7 p.m.
# HG changeset patch
# User Bryan O'Sullivan <bryano@fb.com>
# Date 1453489213 28800
#      Fri Jan 22 11:00:13 2016 -0800
# Branch stable
# Node ID 6bb64e83ac188ee99528ad0237028f00047ab505
# Parent  0b752757a0c4f30ac755389014e2af9d2e0b4a3f
run-tests: "fix" race condition in race condition fix

Laurent's commit 3203dfe341f9 still suffers from a race: by the
time the "job" function tries to assign to channels[channel], that
list has been truncated to empty.  The result is that every job
thread raises an IndexError.

Earlier, I tried an approach of correctly locking channels, but
that caused run-tests to hang on KeyboardInterrupt sometimes.

This approach is strictly hackier, but seems to actually work
reliably.
timeless - Jan. 22, 2016, 7:30 p.m.
On Fri, Jan 22, 2016 at 2:00 PM, Bryan O'Sullivan <bos@serpentine.com> wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano@fb.com>
> # Date 1453489213 28800
> #      Fri Jan 22 11:00:13 2016 -0800
> # Branch stable
> # Node ID 6bb64e83ac188ee99528ad0237028f00047ab505
> # Parent  0b752757a0c4f30ac755389014e2af9d2e0b4a3f
> run-tests: "fix" race condition in race condition fix
>
> Laurent's commit 3203dfe341f9 still suffers from a race: by the
> time the "job" function tries to assign to channels[channel], that
> list has been truncated to empty.  The result is that every job
> thread raises an IndexError.
>
> Earlier, I tried an approach of correctly locking channels, but
> that caused run-tests to hang on KeyboardInterrupt sometimes.
>
> This approach is strictly hackier, but seems to actually work
> reliably.

I really wish this worked :(

(py)[timeless@gcc2-power8 crew]$ hg import ../crew0/0
applying ../crew0/0
(py)[timeless@gcc2-power8 crew]$ cd tests/
(py)[timeless@gcc2-power8 tests]$ ./run-tests.py -l -j120
..ss^C
Skipped test-convert-svn-source.t: missing feature: subversion python bindings
Skipped test-convert-svn-move.t: missing feature: subversion python bindings
# Ran 122 tests, 2 skipped, 0 warned, 0 failed.
INTERRUPTED: test-convert-cvs.t (after 2 seconds)
INTERRUPTED: test-contrib-check-code.t (after 2 seconds)
INTERRUPTED: test-pull-pull-corruption.t (after 2 seconds)
INTERRUPTED: test-mv-cp-st-diff.t (after 2 seconds)
INTERRUPTED: test-check-code.t (after 2 seconds)
INTERRUPTED: test-check-py3-compat.t (after 2 seconds)
INTERRUPTED: test-convert-svn-encoding.t (after 2 seconds)
INTERRUPTED: test-check-commit.t (after 2 seconds)
EINTERRUPTED: test-histedit-fold.t (after 1 seconds)
EINTERRUPTED: test-lfconvert.t (after 1 seconds)
EINTERRUPTED: test-hgweb-symrev.t (after 2 seconds)
INTERRUPTED: test-hgweb-empty.t (after 1 seconds)
INTERRUPTED: test-bundle2-exchange.t (after 2 seconds)
INTERRUPTED: test-encoding-textwrap.t (after 2 seconds)
INTERRUPTED: test-rebase-check-restore.t (after 2 seconds)
INTERRUPTED: test-gendoc.t (after 3 seconds)
INTERRUPTED: test-obsolete-checkheads.t (after 3 seconds)
INTERRUPTED: test-notify.t (after 2 seconds)
INTERRUPTED: test-check-execute.t (after 3 seconds)
INTERRUPTED: test-contrib-perf.t (after 4 seconds)
INTERRUPTED: test-bundle2-format.t (after 4 seconds)
INTERRUPTED: test-acl.t (after 5 seconds)
INTERRUPTED: test-hgweb-commands.t (after 5 seconds)
INTERRUPTED: test-patchbomb.t (after 7 seconds)
INTERRUPTED: test-merge-changedelete.t (after 7 seconds)
INTERRUPTED: test-merge-force.t (after 10 seconds)
INTERRUPTED: test-rename-merge2.t (after 10 seconds)


^C^C^C^C^C^C^C^Z
[4]+  Stopped                 ./run-tests.py -l -j120
(py)[timeless@gcc2-power8 tests]$ hg log -r .
changeset:   30698:6bb64e83ac18
branch:      stable
tag:         tip
parent:      30636:0b752757a0c4
user:        Bryan O'Sullivan <bryano@fb.com>
date:        Fri Jan 22 11:00:13 2016 -0800
summary:     run-tests: "fix" race condition in race condition fix

(py)[timeless@gcc2-power8 tests]$ hg log -r . -v
changeset:   30698:6bb64e83ac18
branch:      stable
tag:         tip
parent:      30636:0b752757a0c4
user:        Bryan O'Sullivan <bryano@fb.com>
date:        Fri Jan 22 11:00:13 2016 -0800
files:       tests/run-tests.py
description:
run-tests: "fix" race condition in race condition fix

Laurent's commit 3203dfe341f9 still suffers from a race: by the
time the "job" function tries to assign to channels[channel], that
list has been truncated to empty.  The result is that every job
thread raises an IndexError.

Earlier, I tried an approach of correctly locking channels, but
that caused run-tests to hang on KeyboardInterrupt sometimes.

This approach is strictly hackier, but seems to actually work
reliably.


(py)[timeless@gcc2-power8 tests]$
Pierre-Yves David - Jan. 22, 2016, 7:44 p.m.
On 01/22/2016 11:30 AM, timeless wrote:
> On Fri, Jan 22, 2016 at 2:00 PM, Bryan O'Sullivan <bos@serpentine.com> wrote:
>> # HG changeset patch
>> # User Bryan O'Sullivan <bryano@fb.com>
>> # Date 1453489213 28800
>> #      Fri Jan 22 11:00:13 2016 -0800
>> # Branch stable
>> # Node ID 6bb64e83ac188ee99528ad0237028f00047ab505
>> # Parent  0b752757a0c4f30ac755389014e2af9d2e0b4a3f
>> run-tests: "fix" race condition in race condition fix
>>
>> Laurent's commit 3203dfe341f9 still suffers from a race: by the
>> time the "job" function tries to assign to channels[channel], that
>> list has been truncated to empty.  The result is that every job
>> thread raises an IndexError.
>>
>> Earlier, I tried an approach of correctly locking channels, but
>> that caused run-tests to hang on KeyboardInterrupt sometimes.
>>
>> This approach is strictly hackier, but seems to actually work
>> reliably.
>
> I really wish this worked :(

How much of your issue is related to "bryan patch failure to fix the 
race introduced by Laurent" and how much is it of "run-tests.py 
interruption have been borked since forever"? I've personnally never 
have been confortable with ctrl^C run-tests.py in years.
Bryan O'Sullivan - Jan. 22, 2016, 9:09 p.m.
On Fri, Jan 22, 2016 at 11:30 AM, timeless <timeless@gmail.com> wrote:

> I really wish this worked :(
>

What kind of system are you using? This patch works reliably on my Mac and
Linux boxes.
timeless - Jan. 22, 2016, 9:54 p.m.
This is:
$ uname -a
Linux gcc2-power8.osuosl.org 3.17.4-301.fc21.ppc64le #1 SMP Mon Dec 1
07:51:01 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux

$ lsb_release -a
LSB Version:    :core-4.1-noarch:core-4.1-ppc64le
Distributor ID:    Fedora
Description:    Fedora release 21 (Twenty One)
Release:    21
Codename:    TwentyOne

$ virtualenv --version
1.11.6

$ python --version
Python 2.7.8

I'm ssh'ing from a mac, but I doubt that's relevant.

$ cat /proc/cpuinfo |perl -pne 's/(processor\s*:).*/$1 X/'|sort|uniq
-c|sort -n      1 firmware    : OPAL v3
      1 machine        : PowerNV 8247-22L
      1 model        : 8247-22L
      1 platform    : PowerNV
      1 timebase    : 512000000
    160
    160 clock        : 3425.000000MHz
    160 cpu        : POWER8E (raw), altivec supported
    160 processor    : X
    160 revision    : 2.1 (pvr 004b 0201)

(X is just a processor number, so there are 160 lines w/ numbers
0..159, not very interesting)

On Fri, Jan 22, 2016 at 4:09 PM, Bryan O'Sullivan <bos@serpentine.com> wrote:
>
> On Fri, Jan 22, 2016 at 11:30 AM, timeless <timeless@gmail.com> wrote:
>>
>> I really wish this worked :(
>
>
> What kind of system are you using? This patch works reliably on my Mac and
> Linux boxes.
Pierre-Yves David - Jan. 23, 2016, 12:03 a.m.
On 01/22/2016 11:00 AM, Bryan O'Sullivan wrote:
> # HG changeset patch
> # User Bryan O'Sullivan <bryano@fb.com>
> # Date 1453489213 28800
> #      Fri Jan 22 11:00:13 2016 -0800
> # Branch stable
> # Node ID 6bb64e83ac188ee99528ad0237028f00047ab505
> # Parent  0b752757a0c4f30ac755389014e2af9d2e0b4a3f
> run-tests: "fix" race condition in race condition fix

After discussion with Laurent, I'm taking this as this seems strictly 
better than the current situation.

Pushed to the clowncopter.
Bryan O'Sullivan - Jan. 26, 2016, 5:08 p.m.
On Fri, Jan 22, 2016 at 4:03 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> After discussion with Laurent, I'm taking this as this seems strictly
> better than the current situation.
>

Thanks, I agree with your reasoning :-)

Patch

diff --git a/tests/run-tests.py b/tests/run-tests.py
--- a/tests/run-tests.py
+++ b/tests/run-tests.py
@@ -1526,13 +1526,16 @@  class TestSuite(unittest.TestSuite):
             channels[channel] = "=" + test.name[5:].split(".")[0]
             try:
                 test(result)
-                channels[channel] = ''
                 done.put(None)
             except KeyboardInterrupt:
-                channels[channel] = ''
+                pass
             except: # re-raises
                 done.put(('!', test, 'run-test raised an error, see traceback'))
                 raise
+            try:
+                channels[channel] = ''
+            except IndexError:
+                pass
 
         def stat():
             count = 0