Patchwork [STABLE] extdiff: quote only options specified at runtime (issue4463)

login
register
mail settings
Submitter Katsunori FUJIWARA
Date Dec. 1, 2014, 5:44 p.m.
Message ID <d887d27304009d1cac05.1417455843@juju>
Download mbox | patch
Permalink /patch/6932/
State Superseded
Delegated to: Matt Mackall
Headers show

Comments

Katsunori FUJIWARA - Dec. 1, 2014, 5:44 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1417454278 -32400
#      Tue Dec 02 02:17:58 2014 +0900
# Branch stable
# Node ID d887d27304009d1cac05a1db4399f69d1778c5f0
# Parent  edf29f9c15f0f171847f4c7a8184cca4e95c8b31
extdiff: quote only options specified at runtime (issue4463)

For arguments and options including white spaces, changeset
72a89cf86fcd introduced fully quoting on all command line arguments
and options for external diff command.

But this causes unexpected behavior of extdiff with WinMerge, because
WinMerge can't work correctly, when command line options in Windows
standard style are quoted: for example, 'WinMerge /r ....' is OK, but
'WinMerge "/r" ....' is NG).

"contrib/mergetools.hgrc" file also specifies some options in Windows
standard style for WinMerge.

See also https://bitbucket.org/tortoisehg/thg/issue/3978/ for detail
about this problem.

On the other hand, users can't specify command line arguments and
options including white spaces, without such quoting in extdiff.

To resolve both issues, this patch introduces mixed quoting policy
"quote only options specified at runtime".

Command line arguments and options specified at runtime by "-o" of
extdiff should be quoted in extdiff, because quoting in the command
line is stripped by the command shell. This stripping breaks arguments
and options including white spaces, if they aren't quoted in extdiff.

On the other hand, command line arguments and options specified in
configuration files can be quoted arbitrarily by users. Such quoting
is still kept at "util.system()" invocation.

BTW, this patch may save the external diff configurations using
command separation (e.g. ';'/'&&'/'||' on POSIX, '&'/'&&'/'||' on
Windows), because:

  - "shlex.split" doesn't suppose them

    "echo foo;echo bar" is splitted into ["echo", "foo;echo", "bar"],
    for example.

  - quoting by "util.shellquote" prevents such separators from working
    correctly
Matt Mackall - Dec. 2, 2014, 12:13 a.m.
On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1417454278 -32400
> #      Tue Dec 02 02:17:58 2014 +0900
> # Branch stable
> # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0
> # Parent  edf29f9c15f0f171847f4c7a8184cca4e95c8b31
> extdiff: quote only options specified at runtime (issue4463)
> 
> For arguments and options including white spaces, changeset
> 72a89cf86fcd introduced fully quoting on all command line arguments
> and options for external diff command.
> 
> But this causes unexpected behavior of extdiff with WinMerge, because
> WinMerge can't work correctly, when command line options in Windows
> standard style are quoted: for example, 'WinMerge /r ....' is OK, but
> 'WinMerge "/r" ....' is NG).

Presumably this is no good because WinMerge doesn't use the standard
argument parser in Microsoft's C runtime (the thing that gives argc and
argv to main()) and instead uses the raw single-string lpCmdLine passed
to WinMain... and then does its own broken parsing that doesn't handle
quoting on things that aren't filenames. Sigh.

I'm more than a little worried that we're putting this sort of obscure
knowledge in a special function in a weird extension and not somewhere
in util.

> +def _splitcmdline(cmdline):
> +    '''Split diff command line configuration into command path and options
> +
> +    This returns ``(commandpath, options)`` tuple. ``options`` is a
> +    list, but each command line options specified in ``cmdline`` are
> +    not splitted. So, ``len(options)`` is 0 or 1.

Seems silly for it to be a list then?

> +    stream = cStringIO.StringIO(cmdline)
> +    # according to "shlex.split" implementation, ``posix`` is True
> +    # even on Windows
> +    lex = shlex.shlex(stream, posix=True)

This function accepts strings, no need to mess with StringIO.

Since it's getting late, looks like this has to wait for 3.2.3.
Yuya Nishihara - Dec. 2, 2014, 3:05 p.m.
On Mon, 01 Dec 2014 18:13:15 -0600, Matt Mackall wrote:
> On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1417454278 -32400
> > #      Tue Dec 02 02:17:58 2014 +0900
> > # Branch stable
> > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0
> > # Parent  edf29f9c15f0f171847f4c7a8184cca4e95c8b31
> > extdiff: quote only options specified at runtime (issue4463)
> > 
> > For arguments and options including white spaces, changeset
> > 72a89cf86fcd introduced fully quoting on all command line arguments
> > and options for external diff command.
> > 
> > But this causes unexpected behavior of extdiff with WinMerge, because
> > WinMerge can't work correctly, when command line options in Windows
> > standard style are quoted: for example, 'WinMerge /r ....' is OK, but
> > 'WinMerge "/r" ....' is NG).
> 
> Presumably this is no good because WinMerge doesn't use the standard
> argument parser in Microsoft's C runtime (the thing that gives argc and
> argv to main()) and instead uses the raw single-string lpCmdLine passed
> to WinMain... and then does its own broken parsing that doesn't handle
> quoting on things that aren't filenames. Sigh.

How about making windows.shellquote() not quote known-safe argument?
As many GUI applications start from WinMain() (or MFC's equivalent), I think,
they can easily invent broken command-line parsers.

subprocess.list2cmdline() might have some hints, which says "using the same
rules as the MS C runtime."

Regards,
Katsunori FUJIWARA - Dec. 4, 2014, 11:40 a.m.
At Mon, 01 Dec 2014 18:13:15 -0600,
Matt Mackall wrote:
> 
> On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1417454278 -32400
> > #      Tue Dec 02 02:17:58 2014 +0900
> > # Branch stable
> > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0
> > # Parent  edf29f9c15f0f171847f4c7a8184cca4e95c8b31
> > extdiff: quote only options specified at runtime (issue4463)
> > 
> > For arguments and options including white spaces, changeset
> > 72a89cf86fcd introduced fully quoting on all command line arguments
> > and options for external diff command.
> > 
> > But this causes unexpected behavior of extdiff with WinMerge, because
> > WinMerge can't work correctly, when command line options in Windows
> > standard style are quoted: for example, 'WinMerge /r ....' is OK, but
> > 'WinMerge "/r" ....' is NG).
> 
> Presumably this is no good because WinMerge doesn't use the standard
> argument parser in Microsoft's C runtime (the thing that gives argc and
> argv to main()) and instead uses the raw single-string lpCmdLine passed
> to WinMain... and then does its own broken parsing that doesn't handle
> quoting on things that aren't filenames. Sigh.
> 
> I'm more than a little worried that we're putting this sort of obscure
> knowledge in a special function in a weird extension and not somewhere
> in util.

At this posting, I focused on localizing changes in extdiff.

I'll post more generalized ones.


> > +def _splitcmdline(cmdline):
> > +    '''Split diff command line configuration into command path and options
> > +
> > +    This returns ``(commandpath, options)`` tuple. ``options`` is a
> > +    list, but each command line options specified in ``cmdline`` are
> > +    not splitted. So, ``len(options)`` is 0 or 1.
> 
> Seems silly for it to be a list then?

To keep specified configuration as it is, "options" in the tuple
returned by "_splitcmdline()" never be splitted into list. It is
always treated as a single string.

For example:

    $ hg showconfig extdiff
    extdiff.winmerge = winmerge /r "FOO" BAR
    $ hg winmerge -o "foo bar" -o baz

Extdiff in the case above causes:

    1. "_splitcmdline()" returns "('winmerge', '/r \"FOO\" BAR')", and
      "diffopts" is initialized as "['/r \"FOO\" BAR']"

    2. "['foo bar', 'baz']" are passed as "opts['options']", and
       "runtimeopts" is initialized as "['\"foo bar\"', 'baz']"
       (= "map(util.shellquote, opts['option'])")

    3. then, "dodiff()" is invoked with "['/r \"FOO\" BAR', '\"foo bar\"', 'baz']"
       (= "diffopts + runtimeopts") as "diffopts", and

    4. command line for "util.system" is build by "' '.join(diffopts)"

    => '/r \"FOO\" BAR \"foo bar\" baz'

This allows users to quote arguments/options arbitrarily in the
configuration files.

 
> > +    stream = cStringIO.StringIO(cmdline)
> > +    # according to "shlex.split" implementation, ``posix`` is True
> > +    # even on Windows
> > +    lex = shlex.shlex(stream, posix=True)
> 
> This function accepts strings, no need to mess with StringIO.

Yes, "shlex.shlex" can accept string. In fact, I
implemented "_splitcmdline" with code below at first.

    cmdpath = shlex.split(cmdline)[0]
    diffopts = cmdline[(cmdpath.index(cmdpath) + len(cmdpath)):]
    if diffopts and diffopts[0] in '"\'':
        diffopts = diffopts[1:] # skip closing quotation
    return (cmdpath, diffopts)

But this doesn't work correclty for complicated cmdpath, for
example '/"usr/local"/bin/foobar' (may I worry too much ? :-))

On the other hand, passing cStringIO allows to know exactly what
characters are not yet read in by shlex, via "stream.tell()".

 
> Since it's getting late, looks like this has to wait for 3.2.3.
> 
> -- 
> Mathematics is the supreme nostalgia of our time.
> 
> 
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Katsunori FUJIWARA - Dec. 4, 2014, 11:48 a.m.
At Wed, 3 Dec 2014 00:05:04 +0900,
Yuya Nishihara wrote:
> 
> On Mon, 01 Dec 2014 18:13:15 -0600, Matt Mackall wrote:
> > On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote:
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > > # Date 1417454278 -32400
> > > #      Tue Dec 02 02:17:58 2014 +0900
> > > # Branch stable
> > > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0
> > > # Parent  edf29f9c15f0f171847f4c7a8184cca4e95c8b31
> > > extdiff: quote only options specified at runtime (issue4463)
> > > 
> > > For arguments and options including white spaces, changeset
> > > 72a89cf86fcd introduced fully quoting on all command line arguments
> > > and options for external diff command.
> > > 
> > > But this causes unexpected behavior of extdiff with WinMerge, because
> > > WinMerge can't work correctly, when command line options in Windows
> > > standard style are quoted: for example, 'WinMerge /r ....' is OK, but
> > > 'WinMerge "/r" ....' is NG).
> > 
> > Presumably this is no good because WinMerge doesn't use the standard
> > argument parser in Microsoft's C runtime (the thing that gives argc and
> > argv to main()) and instead uses the raw single-string lpCmdLine passed
> > to WinMain... and then does its own broken parsing that doesn't handle
> > quoting on things that aren't filenames. Sigh.
> 
> How about making windows.shellquote() not quote known-safe argument?
> As many GUI applications start from WinMain() (or MFC's equivalent), I think,
> they can easily invent broken command-line parsers.
> 
> subprocess.list2cmdline() might have some hints, which says "using the same
> rules as the MS C runtime."

It may work incorrectly, if environment variables are used in
configuration.

Also with cmd.exe on Windows, environment variable expansions are
executed before splitting command line into each arguments. For
example, when "CONCAT" has "foo bar baz" value:

  winmerge %CONCAT%   => winmerge foo bar baz   (taking 3 arguments)
  winmerge "%CONCAT%" => winmerge "foo bar baz" (taking 1 argument)

If "windows.shellquote()" doesn't quote arguments including %CONCAT%,
"shlex.split + windows.shellquote" combination doesn't treat %CONCAT%
as a single argument, even if users quote %CONCAT% explicitly in the
configuration.

On the other hand, users may not want to treat %CONCAT% as a single
argument (shouldn't users do such thing ? maybe, but we shouldn't
expect so). In this case, quoting %CONCAT% in "windows.shellquote"
works incorrectly for users.


IMHO, root cause of this problem is that "shlex.split" loses whether
users really want to quote each of arguments or not.

Keeping the configuration string specified by users as it is (=
avoiding "shlex.split") isn't the best solution, but seems to be
better than others in almost all cases.


Quoting options specified in the command line seems to be reasonable,
because environment variables should be already expanded by
"cmd.exe". Even though users may delay expansion by "%%CONCAT%%", such
users should be able to avoid problem by their own skill :-)


> Regards,
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Yuya Nishihara - Dec. 4, 2014, 4:13 p.m.
On Thu, 04 Dec 2014 20:48:24 +0900, FUJIWARA Katsunori wrote:
> At Wed, 3 Dec 2014 00:05:04 +0900,
> Yuya Nishihara wrote:
> > On Mon, 01 Dec 2014 18:13:15 -0600, Matt Mackall wrote:
> > > On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote:
> > > > # HG changeset patch
> > > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > > > # Date 1417454278 -32400
> > > > #      Tue Dec 02 02:17:58 2014 +0900
> > > > # Branch stable
> > > > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0
> > > > # Parent  edf29f9c15f0f171847f4c7a8184cca4e95c8b31
> > > > extdiff: quote only options specified at runtime (issue4463)
> > > > 
> > > > For arguments and options including white spaces, changeset
> > > > 72a89cf86fcd introduced fully quoting on all command line arguments
> > > > and options for external diff command.
> > > > 
> > > > But this causes unexpected behavior of extdiff with WinMerge, because
> > > > WinMerge can't work correctly, when command line options in Windows
> > > > standard style are quoted: for example, 'WinMerge /r ....' is OK, but
> > > > 'WinMerge "/r" ....' is NG).
> > > 
> > > Presumably this is no good because WinMerge doesn't use the standard
> > > argument parser in Microsoft's C runtime (the thing that gives argc and
> > > argv to main()) and instead uses the raw single-string lpCmdLine passed
> > > to WinMain... and then does its own broken parsing that doesn't handle
> > > quoting on things that aren't filenames. Sigh.
> > 
> > How about making windows.shellquote() not quote known-safe argument?
> > As many GUI applications start from WinMain() (or MFC's equivalent), I think,
> > they can easily invent broken command-line parsers.
> > 
> > subprocess.list2cmdline() might have some hints, which says "using the same
> > rules as the MS C runtime."
> 
> It may work incorrectly, if environment variables are used in
> configuration.
> 
> Also with cmd.exe on Windows, environment variable expansions are
> executed before splitting command line into each arguments. For
> example, when "CONCAT" has "foo bar baz" value:
> 
>   winmerge %CONCAT%   => winmerge foo bar baz   (taking 3 arguments)
>   winmerge "%CONCAT%" => winmerge "foo bar baz" (taking 1 argument)
> 
> If "windows.shellquote()" doesn't quote arguments including %CONCAT%,
> "shlex.split + windows.shellquote" combination doesn't treat %CONCAT%
> as a single argument, even if users quote %CONCAT% explicitly in the
> configuration.
> 
> On the other hand, users may not want to treat %CONCAT% as a single
> argument (shouldn't users do such thing ? maybe, but we shouldn't
> expect so). In this case, quoting %CONCAT% in "windows.shellquote"
> works incorrectly for users.

It's the same for Unix. $CONCAT shouldn't be quoted in that case.

> IMHO, root cause of this problem is that "shlex.split" loses whether
> users really want to quote each of arguments or not.
>
> Keeping the configuration string specified by users as it is (=
> avoiding "shlex.split") isn't the best solution, but seems to be
> better than others in almost all cases.

I misread 72a89cf86fcd.  I thought it really wanted to quote options, but it
seems it just wanted to disable <, >, |, &, or ; that terminates a command.
So I agree that shlex.split()+shellquote() is not appropriate.

Regards,

Patch

diff --git a/hgext/extdiff.py b/hgext/extdiff.py
--- a/hgext/extdiff.py
+++ b/hgext/extdiff.py
@@ -64,7 +64,7 @@  pretty fast (at least faster than having
 from mercurial.i18n import _
 from mercurial.node import short, nullid
 from mercurial import cmdutil, scmutil, util, commands, encoding
-import os, shlex, shutil, tempfile, re
+import os, shlex, shutil, tempfile, re, cStringIO
 
 cmdtable = {}
 command = cmdutil.command(cmdtable)
@@ -121,7 +121,7 @@  def dodiff(ui, repo, diffcmd, diffopts, 
 
     revs = opts.get('rev')
     change = opts.get('change')
-    args = ' '.join(map(util.shellquote, diffopts))
+    args = ' '.join(diffopts)
     do3way = '$parent2' in args
 
     if revs and change:
@@ -268,26 +268,73 @@  def extdiff(ui, repo, *pats, **opts):
     revisions are specified, the working directory files are compared
     to its parent.'''
     program = opts.get('program')
-    option = opts.get('option')
+    # they should be quoted explicitly, because quotations
+    # for them are already stripped by shell
+    option = map(util.shellquote, opts['option'])
     if not program:
         program = 'diff'
         option = option or ['-Npru']
     return dodiff(ui, repo, program, option, pats, opts)
 
+def _splitcmdline(cmdline):
+    '''Split diff command line configuration into command path and options
+
+    This returns ``(commandpath, options)`` tuple. ``options`` is a
+    list, but each command line options specified in ``cmdline`` are
+    not splitted. So, ``len(options)`` is 0 or 1.
+
+    This requires explicit quoting for command path and options
+    including white spaces in configuration files. But on the other
+    hand, users can control quoting for command path and options by
+    themselves.
+
+    >>> _splitcmdline('foo')
+    ('foo', [])
+    >>> _splitcmdline('foo bar baz')
+    ('foo', ['bar baz'])
+    >>> _splitcmdline('"foo foo"')
+    ('foo foo', [])
+    >>> _splitcmdline('"foo foo" bar baz')
+    ('foo foo', ['bar baz'])
+    >>> _splitcmdline('"foo foo" "bar" baz')
+    ('foo foo', ['"bar" baz'])
+    >>> _splitcmdline('foo "bar" baz')
+    ('foo', ['"bar" baz'])
+    >>> _splitcmdline('"foo"/"foo" "bar" baz')
+    ('foo/foo', ['"bar" baz'])
+    '''
+    stream = cStringIO.StringIO(cmdline)
+    # according to "shlex.split" implementation, ``posix`` is True
+    # even on Windows
+    lex = shlex.shlex(stream, posix=True)
+    lex.whitespace_split = True
+    lex.commenters = ''
+
+    cmdpath = lex.next()
+    # current position of stream should be next of the first white
+    # space after diff command path, because characters in stream are
+    # read one by one via "StringIO.read(1)"
+    diffopts = cmdline[stream.tell():]
+
+    if diffopts:
+        return (cmdpath, [diffopts])
+    else:
+        return (cmdpath, [])
+
 def uisetup(ui):
     for cmd, path in ui.configitems('extdiff'):
         if cmd.startswith('cmd.'):
             cmd = cmd[4:]
             if not path:
                 path = cmd
-            diffopts = shlex.split(ui.config('extdiff', 'opts.' + cmd, ''))
+            diffopts = ui.config('extdiff', 'opts.' + cmd, '')
+            diffopts = diffopts and [diffopts] or []
         elif cmd.startswith('opts.'):
             continue
         else:
             # command = path opts
             if path:
-                diffopts = shlex.split(path)
-                path = diffopts.pop(0)
+                path, diffopts = _splitcmdline(path)
             else:
                 path, diffopts = cmd, []
         # look for diff arguments in [diff-tools] then [merge-tools]
@@ -295,11 +342,14 @@  def uisetup(ui):
             args = ui.config('diff-tools', cmd+'.diffargs') or \
                    ui.config('merge-tools', cmd+'.diffargs')
             if args:
-                diffopts = shlex.split(args)
+                diffopts = [args]
         def save(cmd, path, diffopts):
             '''use closure to save diff command to use'''
             def mydiff(ui, repo, *pats, **opts):
-                return dodiff(ui, repo, path, diffopts + opts['option'],
+                # they should be quoted explicitly, because quotations
+                # for them are already stripped by shell
+                runtimeopts = map(util.shellquote, opts['option'])
+                return dodiff(ui, repo, path, diffopts + runtimeopts,
                               pats, opts)
             doc = _('''\
 use %(path)s to diff repository (or selected files)
diff --git a/tests/test-doctest.py b/tests/test-doctest.py
--- a/tests/test-doctest.py
+++ b/tests/test-doctest.py
@@ -31,4 +31,5 @@  testmod('mercurial.util', testtarget='pl
 testmod('hgext.convert.cvsps')
 testmod('hgext.convert.filemap')
 testmod('hgext.convert.subversion')
+testmod('hgext.extdiff')
 testmod('hgext.mq')