Submitter | Katsunori FUJIWARA |
---|---|
Date | Dec. 1, 2014, 5:44 p.m. |
Message ID | <d887d27304009d1cac05.1417455843@juju> |
Download | mbox | patch |
Permalink | /patch/6932/ |
State | Superseded |
Delegated to: | Matt Mackall |
Headers | show |
Comments
On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote: > # HG changeset patch > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > # Date 1417454278 -32400 > # Tue Dec 02 02:17:58 2014 +0900 > # Branch stable > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0 > # Parent edf29f9c15f0f171847f4c7a8184cca4e95c8b31 > extdiff: quote only options specified at runtime (issue4463) > > For arguments and options including white spaces, changeset > 72a89cf86fcd introduced fully quoting on all command line arguments > and options for external diff command. > > But this causes unexpected behavior of extdiff with WinMerge, because > WinMerge can't work correctly, when command line options in Windows > standard style are quoted: for example, 'WinMerge /r ....' is OK, but > 'WinMerge "/r" ....' is NG). Presumably this is no good because WinMerge doesn't use the standard argument parser in Microsoft's C runtime (the thing that gives argc and argv to main()) and instead uses the raw single-string lpCmdLine passed to WinMain... and then does its own broken parsing that doesn't handle quoting on things that aren't filenames. Sigh. I'm more than a little worried that we're putting this sort of obscure knowledge in a special function in a weird extension and not somewhere in util. > +def _splitcmdline(cmdline): > + '''Split diff command line configuration into command path and options > + > + This returns ``(commandpath, options)`` tuple. ``options`` is a > + list, but each command line options specified in ``cmdline`` are > + not splitted. So, ``len(options)`` is 0 or 1. Seems silly for it to be a list then? > + stream = cStringIO.StringIO(cmdline) > + # according to "shlex.split" implementation, ``posix`` is True > + # even on Windows > + lex = shlex.shlex(stream, posix=True) This function accepts strings, no need to mess with StringIO. Since it's getting late, looks like this has to wait for 3.2.3.
On Mon, 01 Dec 2014 18:13:15 -0600, Matt Mackall wrote: > On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote: > > # HG changeset patch > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > > # Date 1417454278 -32400 > > # Tue Dec 02 02:17:58 2014 +0900 > > # Branch stable > > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0 > > # Parent edf29f9c15f0f171847f4c7a8184cca4e95c8b31 > > extdiff: quote only options specified at runtime (issue4463) > > > > For arguments and options including white spaces, changeset > > 72a89cf86fcd introduced fully quoting on all command line arguments > > and options for external diff command. > > > > But this causes unexpected behavior of extdiff with WinMerge, because > > WinMerge can't work correctly, when command line options in Windows > > standard style are quoted: for example, 'WinMerge /r ....' is OK, but > > 'WinMerge "/r" ....' is NG). > > Presumably this is no good because WinMerge doesn't use the standard > argument parser in Microsoft's C runtime (the thing that gives argc and > argv to main()) and instead uses the raw single-string lpCmdLine passed > to WinMain... and then does its own broken parsing that doesn't handle > quoting on things that aren't filenames. Sigh. How about making windows.shellquote() not quote known-safe argument? As many GUI applications start from WinMain() (or MFC's equivalent), I think, they can easily invent broken command-line parsers. subprocess.list2cmdline() might have some hints, which says "using the same rules as the MS C runtime." Regards,
At Mon, 01 Dec 2014 18:13:15 -0600, Matt Mackall wrote: > > On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote: > > # HG changeset patch > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > > # Date 1417454278 -32400 > > # Tue Dec 02 02:17:58 2014 +0900 > > # Branch stable > > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0 > > # Parent edf29f9c15f0f171847f4c7a8184cca4e95c8b31 > > extdiff: quote only options specified at runtime (issue4463) > > > > For arguments and options including white spaces, changeset > > 72a89cf86fcd introduced fully quoting on all command line arguments > > and options for external diff command. > > > > But this causes unexpected behavior of extdiff with WinMerge, because > > WinMerge can't work correctly, when command line options in Windows > > standard style are quoted: for example, 'WinMerge /r ....' is OK, but > > 'WinMerge "/r" ....' is NG). > > Presumably this is no good because WinMerge doesn't use the standard > argument parser in Microsoft's C runtime (the thing that gives argc and > argv to main()) and instead uses the raw single-string lpCmdLine passed > to WinMain... and then does its own broken parsing that doesn't handle > quoting on things that aren't filenames. Sigh. > > I'm more than a little worried that we're putting this sort of obscure > knowledge in a special function in a weird extension and not somewhere > in util. At this posting, I focused on localizing changes in extdiff. I'll post more generalized ones. > > +def _splitcmdline(cmdline): > > + '''Split diff command line configuration into command path and options > > + > > + This returns ``(commandpath, options)`` tuple. ``options`` is a > > + list, but each command line options specified in ``cmdline`` are > > + not splitted. So, ``len(options)`` is 0 or 1. > > Seems silly for it to be a list then? To keep specified configuration as it is, "options" in the tuple returned by "_splitcmdline()" never be splitted into list. It is always treated as a single string. For example: $ hg showconfig extdiff extdiff.winmerge = winmerge /r "FOO" BAR $ hg winmerge -o "foo bar" -o baz Extdiff in the case above causes: 1. "_splitcmdline()" returns "('winmerge', '/r \"FOO\" BAR')", and "diffopts" is initialized as "['/r \"FOO\" BAR']" 2. "['foo bar', 'baz']" are passed as "opts['options']", and "runtimeopts" is initialized as "['\"foo bar\"', 'baz']" (= "map(util.shellquote, opts['option'])") 3. then, "dodiff()" is invoked with "['/r \"FOO\" BAR', '\"foo bar\"', 'baz']" (= "diffopts + runtimeopts") as "diffopts", and 4. command line for "util.system" is build by "' '.join(diffopts)" => '/r \"FOO\" BAR \"foo bar\" baz' This allows users to quote arguments/options arbitrarily in the configuration files. > > + stream = cStringIO.StringIO(cmdline) > > + # according to "shlex.split" implementation, ``posix`` is True > > + # even on Windows > > + lex = shlex.shlex(stream, posix=True) > > This function accepts strings, no need to mess with StringIO. Yes, "shlex.shlex" can accept string. In fact, I implemented "_splitcmdline" with code below at first. cmdpath = shlex.split(cmdline)[0] diffopts = cmdline[(cmdpath.index(cmdpath) + len(cmdpath)):] if diffopts and diffopts[0] in '"\'': diffopts = diffopts[1:] # skip closing quotation return (cmdpath, diffopts) But this doesn't work correclty for complicated cmdpath, for example '/"usr/local"/bin/foobar' (may I worry too much ? :-)) On the other hand, passing cStringIO allows to know exactly what characters are not yet read in by shlex, via "stream.tell()". > Since it's getting late, looks like this has to wait for 3.2.3. > > -- > Mathematics is the supreme nostalgia of our time. > > > ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
At Wed, 3 Dec 2014 00:05:04 +0900, Yuya Nishihara wrote: > > On Mon, 01 Dec 2014 18:13:15 -0600, Matt Mackall wrote: > > On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote: > > > # HG changeset patch > > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > > > # Date 1417454278 -32400 > > > # Tue Dec 02 02:17:58 2014 +0900 > > > # Branch stable > > > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0 > > > # Parent edf29f9c15f0f171847f4c7a8184cca4e95c8b31 > > > extdiff: quote only options specified at runtime (issue4463) > > > > > > For arguments and options including white spaces, changeset > > > 72a89cf86fcd introduced fully quoting on all command line arguments > > > and options for external diff command. > > > > > > But this causes unexpected behavior of extdiff with WinMerge, because > > > WinMerge can't work correctly, when command line options in Windows > > > standard style are quoted: for example, 'WinMerge /r ....' is OK, but > > > 'WinMerge "/r" ....' is NG). > > > > Presumably this is no good because WinMerge doesn't use the standard > > argument parser in Microsoft's C runtime (the thing that gives argc and > > argv to main()) and instead uses the raw single-string lpCmdLine passed > > to WinMain... and then does its own broken parsing that doesn't handle > > quoting on things that aren't filenames. Sigh. > > How about making windows.shellquote() not quote known-safe argument? > As many GUI applications start from WinMain() (or MFC's equivalent), I think, > they can easily invent broken command-line parsers. > > subprocess.list2cmdline() might have some hints, which says "using the same > rules as the MS C runtime." It may work incorrectly, if environment variables are used in configuration. Also with cmd.exe on Windows, environment variable expansions are executed before splitting command line into each arguments. For example, when "CONCAT" has "foo bar baz" value: winmerge %CONCAT% => winmerge foo bar baz (taking 3 arguments) winmerge "%CONCAT%" => winmerge "foo bar baz" (taking 1 argument) If "windows.shellquote()" doesn't quote arguments including %CONCAT%, "shlex.split + windows.shellquote" combination doesn't treat %CONCAT% as a single argument, even if users quote %CONCAT% explicitly in the configuration. On the other hand, users may not want to treat %CONCAT% as a single argument (shouldn't users do such thing ? maybe, but we shouldn't expect so). In this case, quoting %CONCAT% in "windows.shellquote" works incorrectly for users. IMHO, root cause of this problem is that "shlex.split" loses whether users really want to quote each of arguments or not. Keeping the configuration string specified by users as it is (= avoiding "shlex.split") isn't the best solution, but seems to be better than others in almost all cases. Quoting options specified in the command line seems to be reasonable, because environment variables should be already expanded by "cmd.exe". Even though users may delay expansion by "%%CONCAT%%", such users should be able to avoid problem by their own skill :-) > Regards, > ---------------------------------------------------------------------- [FUJIWARA Katsunori] foozy@lares.dti.ne.jp
On Thu, 04 Dec 2014 20:48:24 +0900, FUJIWARA Katsunori wrote: > At Wed, 3 Dec 2014 00:05:04 +0900, > Yuya Nishihara wrote: > > On Mon, 01 Dec 2014 18:13:15 -0600, Matt Mackall wrote: > > > On Tue, 2014-12-02 at 02:44 +0900, FUJIWARA Katsunori wrote: > > > > # HG changeset patch > > > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > > > > # Date 1417454278 -32400 > > > > # Tue Dec 02 02:17:58 2014 +0900 > > > > # Branch stable > > > > # Node ID d887d27304009d1cac05a1db4399f69d1778c5f0 > > > > # Parent edf29f9c15f0f171847f4c7a8184cca4e95c8b31 > > > > extdiff: quote only options specified at runtime (issue4463) > > > > > > > > For arguments and options including white spaces, changeset > > > > 72a89cf86fcd introduced fully quoting on all command line arguments > > > > and options for external diff command. > > > > > > > > But this causes unexpected behavior of extdiff with WinMerge, because > > > > WinMerge can't work correctly, when command line options in Windows > > > > standard style are quoted: for example, 'WinMerge /r ....' is OK, but > > > > 'WinMerge "/r" ....' is NG). > > > > > > Presumably this is no good because WinMerge doesn't use the standard > > > argument parser in Microsoft's C runtime (the thing that gives argc and > > > argv to main()) and instead uses the raw single-string lpCmdLine passed > > > to WinMain... and then does its own broken parsing that doesn't handle > > > quoting on things that aren't filenames. Sigh. > > > > How about making windows.shellquote() not quote known-safe argument? > > As many GUI applications start from WinMain() (or MFC's equivalent), I think, > > they can easily invent broken command-line parsers. > > > > subprocess.list2cmdline() might have some hints, which says "using the same > > rules as the MS C runtime." > > It may work incorrectly, if environment variables are used in > configuration. > > Also with cmd.exe on Windows, environment variable expansions are > executed before splitting command line into each arguments. For > example, when "CONCAT" has "foo bar baz" value: > > winmerge %CONCAT% => winmerge foo bar baz (taking 3 arguments) > winmerge "%CONCAT%" => winmerge "foo bar baz" (taking 1 argument) > > If "windows.shellquote()" doesn't quote arguments including %CONCAT%, > "shlex.split + windows.shellquote" combination doesn't treat %CONCAT% > as a single argument, even if users quote %CONCAT% explicitly in the > configuration. > > On the other hand, users may not want to treat %CONCAT% as a single > argument (shouldn't users do such thing ? maybe, but we shouldn't > expect so). In this case, quoting %CONCAT% in "windows.shellquote" > works incorrectly for users. It's the same for Unix. $CONCAT shouldn't be quoted in that case. > IMHO, root cause of this problem is that "shlex.split" loses whether > users really want to quote each of arguments or not. > > Keeping the configuration string specified by users as it is (= > avoiding "shlex.split") isn't the best solution, but seems to be > better than others in almost all cases. I misread 72a89cf86fcd. I thought it really wanted to quote options, but it seems it just wanted to disable <, >, |, &, or ; that terminates a command. So I agree that shlex.split()+shellquote() is not appropriate. Regards,
Patch
diff --git a/hgext/extdiff.py b/hgext/extdiff.py --- a/hgext/extdiff.py +++ b/hgext/extdiff.py @@ -64,7 +64,7 @@ pretty fast (at least faster than having from mercurial.i18n import _ from mercurial.node import short, nullid from mercurial import cmdutil, scmutil, util, commands, encoding -import os, shlex, shutil, tempfile, re +import os, shlex, shutil, tempfile, re, cStringIO cmdtable = {} command = cmdutil.command(cmdtable) @@ -121,7 +121,7 @@ def dodiff(ui, repo, diffcmd, diffopts, revs = opts.get('rev') change = opts.get('change') - args = ' '.join(map(util.shellquote, diffopts)) + args = ' '.join(diffopts) do3way = '$parent2' in args if revs and change: @@ -268,26 +268,73 @@ def extdiff(ui, repo, *pats, **opts): revisions are specified, the working directory files are compared to its parent.''' program = opts.get('program') - option = opts.get('option') + # they should be quoted explicitly, because quotations + # for them are already stripped by shell + option = map(util.shellquote, opts['option']) if not program: program = 'diff' option = option or ['-Npru'] return dodiff(ui, repo, program, option, pats, opts) +def _splitcmdline(cmdline): + '''Split diff command line configuration into command path and options + + This returns ``(commandpath, options)`` tuple. ``options`` is a + list, but each command line options specified in ``cmdline`` are + not splitted. So, ``len(options)`` is 0 or 1. + + This requires explicit quoting for command path and options + including white spaces in configuration files. But on the other + hand, users can control quoting for command path and options by + themselves. + + >>> _splitcmdline('foo') + ('foo', []) + >>> _splitcmdline('foo bar baz') + ('foo', ['bar baz']) + >>> _splitcmdline('"foo foo"') + ('foo foo', []) + >>> _splitcmdline('"foo foo" bar baz') + ('foo foo', ['bar baz']) + >>> _splitcmdline('"foo foo" "bar" baz') + ('foo foo', ['"bar" baz']) + >>> _splitcmdline('foo "bar" baz') + ('foo', ['"bar" baz']) + >>> _splitcmdline('"foo"/"foo" "bar" baz') + ('foo/foo', ['"bar" baz']) + ''' + stream = cStringIO.StringIO(cmdline) + # according to "shlex.split" implementation, ``posix`` is True + # even on Windows + lex = shlex.shlex(stream, posix=True) + lex.whitespace_split = True + lex.commenters = '' + + cmdpath = lex.next() + # current position of stream should be next of the first white + # space after diff command path, because characters in stream are + # read one by one via "StringIO.read(1)" + diffopts = cmdline[stream.tell():] + + if diffopts: + return (cmdpath, [diffopts]) + else: + return (cmdpath, []) + def uisetup(ui): for cmd, path in ui.configitems('extdiff'): if cmd.startswith('cmd.'): cmd = cmd[4:] if not path: path = cmd - diffopts = shlex.split(ui.config('extdiff', 'opts.' + cmd, '')) + diffopts = ui.config('extdiff', 'opts.' + cmd, '') + diffopts = diffopts and [diffopts] or [] elif cmd.startswith('opts.'): continue else: # command = path opts if path: - diffopts = shlex.split(path) - path = diffopts.pop(0) + path, diffopts = _splitcmdline(path) else: path, diffopts = cmd, [] # look for diff arguments in [diff-tools] then [merge-tools] @@ -295,11 +342,14 @@ def uisetup(ui): args = ui.config('diff-tools', cmd+'.diffargs') or \ ui.config('merge-tools', cmd+'.diffargs') if args: - diffopts = shlex.split(args) + diffopts = [args] def save(cmd, path, diffopts): '''use closure to save diff command to use''' def mydiff(ui, repo, *pats, **opts): - return dodiff(ui, repo, path, diffopts + opts['option'], + # they should be quoted explicitly, because quotations + # for them are already stripped by shell + runtimeopts = map(util.shellquote, opts['option']) + return dodiff(ui, repo, path, diffopts + runtimeopts, pats, opts) doc = _('''\ use %(path)s to diff repository (or selected files) diff --git a/tests/test-doctest.py b/tests/test-doctest.py --- a/tests/test-doctest.py +++ b/tests/test-doctest.py @@ -31,4 +31,5 @@ testmod('mercurial.util', testtarget='pl testmod('hgext.convert.cvsps') testmod('hgext.convert.filemap') testmod('hgext.convert.subversion') +testmod('hgext.extdiff') testmod('hgext.mq')