Submitter | Katsunori FUJIWARA |
---|---|
Date | May 20, 2016, 5:53 p.m. |
Message ID | <8c5e880c7e25e94354d3.1463766831@juju> |
Permalink | /patch/15178/ |
State | Accepted |
Comments
21.05.2016, 02:08, "FUJIWARA Katsunori" <foozy@lares.dti.ne.jp>:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1463766531 -32400
> # Sat May 21 02:48:51 2016 +0900
> # Branch stable
> # Node ID 8c5e880c7e25e94354d312d582d2ba19ca419423
> # Parent 854556c5f3bf6493a99481a355c5112b2ea0ed37
> tests: escape bytes setting MSB in input of grep for portability
>
> GNU grep (2.21-2 or later) assumes that input is encoded in LC_CTYPE,
> and input is binary if it contains byte sequence not valid for that
> encoding.
>
> For example, if locale is configured as C, a byte setting most
> significant bit (MSB) makes such GNU grep show "Binary file <FILENAME>
> matches" message instead of matched lines unintentionally.
>
> This behavior is recognized as a bug, and fixed in GNU grep 2.25-1 or
> later. But some distributions are shipped with such buggy version
> (e.g. Ubuntu xenial, which is used by launchpad buildbot).
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19230
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800670
> http://packages.ubuntu.com/xenial/grep
>
> This causes failure of test-commit-interactive.t, which applies grep
> on CP932 byte sequence since 1111e84de635.
>
> But, explicit setting LC_CTYPE for CP932 might cause another problem,
> because it can't be assumed that all environment running Mercurial
> tests allows arbitrary locale setting.
>
> To resolve this issue, this patch escapes bytes setting MSB in input
> of grep.
>
> For this purpose:
>
> - str.encode('string-escape') isn't useful, because it escapes also
>   control code (less than 0x20), and makes EOL handling complicated
>
> - "f --hexdump" isn't useful, because it isn't line-oriented
>
> - "sed -n" seems reasonable, but "sed" itself sometimes causes
>   portability issue, too (e.g. 900767dfa80d or afb86ee925bf)
>
> This patch is posted with "stable" flag, because 1111e84de635 is on
> stable branch.
>
> diff --git a/tests/test-commit-interactive.t b/tests/test-commit-interactive.t
> --- a/tests/test-commit-interactive.t
> +++ b/tests/test-commit-interactive.t
> @@ -895,11 +895,24 @@ This tests that translated help message
>    $ LANGUAGE=ja
>    $ export LANGUAGE
>
> -  $ hg commit -i --encoding cp932 2>&1 <<EOF | grep '^y - '
> +  $ cat > $TESTTMP/escape.py <<EOF
> +  > from __future__ import absolute_import
> +  > import sys
> +  > def escape(c):
> +  >     o = ord(c)
> +  >     if o < 0x80:
> +  >         return c
> +  >     else:
> +  >         return r'\x%02x' % o # escape char setting MSB
> +  > for l in sys.stdin:
> +  >     sys.stdout.write(''.join(escape(c) for c in l))
> +  > EOF
> +
> +  $ hg commit -i --encoding cp932 2>&1 <<EOF | python $TESTTMP/escape.py | grep '^y - '
>    > ?
>    > q
>    > EOF
> -  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes) (esc)
> +  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes)
>
>    $ LANGUAGE=
>  #endif

Passes on a Xenial box in Vagrant.
On Sat, May 21, 2016 at 02:53:51AM +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1463766531 -32400
> # Sat May 21 02:48:51 2016 +0900
> # Branch stable
> # Node ID 8c5e880c7e25e94354d312d582d2ba19ca419423
> # Parent 854556c5f3bf6493a99481a355c5112b2ea0ed37
> tests: escape bytes setting MSB in input of grep for portability

queued for stable, thanks

>
> GNU grep (2.21-2 or later) assumes that input is encoded in LC_CTYPE,
> and input is binary if it contains byte sequence not valid for that
> encoding.
>
> For example, if locale is configured as C, a byte setting most
> significant bit (MSB) makes such GNU grep show "Binary file <FILENAME>
> matches" message instead of matched lines unintentionally.
>
> This behavior is recognized as a bug, and fixed in GNU grep 2.25-1 or
> later. But some distributions are shipped with such buggy version
> (e.g. Ubuntu xenial, which is used by launchpad buildbot).
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19230
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800670
> http://packages.ubuntu.com/xenial/grep
>
> This causes failure of test-commit-interactive.t, which applies grep
> on CP932 byte sequence since 1111e84de635.
>
> But, explicit setting LC_CTYPE for CP932 might cause another problem,
> because it can't be assumed that all environment running Mercurial
> tests allows arbitrary locale setting.
>
> To resolve this issue, this patch escapes bytes setting MSB in input
> of grep.
>
> For this purpose:
>
> - str.encode('string-escape') isn't useful, because it escapes also
>   control code (less than 0x20), and makes EOL handling complicated
>
> - "f --hexdump" isn't useful, because it isn't line-oriented
>
> - "sed -n" seems reasonable, but "sed" itself sometimes causes
>   portability issue, too (e.g. 900767dfa80d or afb86ee925bf)
>
> This patch is posted with "stable" flag, because 1111e84de635 is on
> stable branch.
>
> diff --git a/tests/test-commit-interactive.t b/tests/test-commit-interactive.t
> --- a/tests/test-commit-interactive.t
> +++ b/tests/test-commit-interactive.t
> @@ -895,11 +895,24 @@ This tests that translated help message
>    $ LANGUAGE=ja
>    $ export LANGUAGE
>
> -  $ hg commit -i --encoding cp932 2>&1 <<EOF | grep '^y - '
> +  $ cat > $TESTTMP/escape.py <<EOF
> +  > from __future__ import absolute_import
> +  > import sys
> +  > def escape(c):
> +  >     o = ord(c)
> +  >     if o < 0x80:
> +  >         return c
> +  >     else:
> +  >         return r'\x%02x' % o # escape char setting MSB
> +  > for l in sys.stdin:
> +  >     sys.stdout.write(''.join(escape(c) for c in l))
> +  > EOF
> +
> +  $ hg commit -i --encoding cp932 2>&1 <<EOF | python $TESTTMP/escape.py | grep '^y - '
>    > ?
>    > q
>    > EOF
> -  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes) (esc)
> +  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes)
>
>    $ LANGUAGE=
>  #endif
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Patch
diff --git a/tests/test-commit-interactive.t b/tests/test-commit-interactive.t
--- a/tests/test-commit-interactive.t
+++ b/tests/test-commit-interactive.t
@@ -895,11 +895,24 @@ This tests that translated help message
   $ LANGUAGE=ja
   $ export LANGUAGE
 
-  $ hg commit -i --encoding cp932 2>&1 <<EOF | grep '^y - '
+  $ cat > $TESTTMP/escape.py <<EOF
+  > from __future__ import absolute_import
+  > import sys
+  > def escape(c):
+  >     o = ord(c)
+  >     if o < 0x80:
+  >         return c
+  >     else:
+  >         return r'\x%02x' % o # escape char setting MSB
+  > for l in sys.stdin:
+  >     sys.stdout.write(''.join(escape(c) for c in l))
+  > EOF
+
+  $ hg commit -i --encoding cp932 2>&1 <<EOF | python $TESTTMP/escape.py | grep '^y - '
   > ?
   > q
   > EOF
-  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes) (esc)
+  y - \x82\xb1\x82\xcc\x95\xcf\x8dX\x82\xf0\x8bL\x98^(yes)
 
   $ LANGUAGE=
 #endif
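
For readers who want to try the escaping idea outside the test suite, here is a minimal sketch of the same transformation written for Python 3 and operating on bytes. It is not part of the patch: the names `escape_msb` and `escape_everything` are hypothetical. The second function uses the `unicode_escape` codec as a rough stand-in for the `str.encode('string-escape')` approach that the commit message rejects, to show how a broader escape also rewrites the newline.

```python
def escape_msb(data: bytes) -> bytes:
    """Rewrite each byte >= 0x80 as a literal backslash-x escape; keep ASCII as is."""
    out = bytearray()
    for b in data:  # iterating over bytes yields ints on Python 3
        if b < 0x80:
            out.append(b)  # ASCII, including the newline, passes through untouched
        else:
            out += b"\\x%02x" % b  # e.g. 0x82 becomes the four ASCII bytes \x82
    return bytes(out)


def escape_everything(data: bytes) -> bytes:
    """Broader escaping, for contrast only: control codes such as newline are rewritten too."""
    return data.decode("latin-1").encode("unicode_escape")


# A shortened CP932-style sample line (assumed input, not taken from a real run).
line = b"y - \x82\xb1\x82\xcc(yes)\n"
print(escape_msb(line))         # newline survives, so line-oriented tools still work
print(escape_everything(line))  # newline is escaped as well
```

Because `escape_msb` leaves every byte below 0x80 alone, its output is pure ASCII with the original line endings, which is the property the patch relies on when piping into grep.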