Patchwork [1,of,2,STABLE,V2] util: add "shellsplit()" to centralize the logic to split command line

Submitter Katsunori FUJIWARA
Date Dec. 4, 2014, 5:22 p.m.
Message ID <427cb697d124bea1a7e6.1417713752@juju>
Permalink /patch/7006/
State Superseded

Comments

Katsunori FUJIWARA - Dec. 4, 2014, 5:22 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1417712035 -32400
#      Fri Dec 05 01:53:55 2014 +0900
# Branch stable
# Node ID 427cb697d124bea1a7e67ebcdeca2fea81a925c0
# Parent  57d35d3c1cf170513cc150c5021f5e3e0d7cdafb
util: add "shellsplit()" to centralize the logic to split command line

This is preparation for fixing issue4463.

This patch uses a StringIO object to find exactly where the first
shell delimiter character is, because "shlex.next()" returns the
de-quoted token, and removing that token from the original string is
difficult in some complicated cases.

For example, "shlex.next()" for '"foo"/"foo" bar baz' returns 'foo/foo'.

On the other hand, "StringIO.tell()" tells us exactly which
characters have not yet been read by shlex.
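
The approach described above can be sketched in standalone form as
follows. This is only an illustration, not the code from the patch: it
assumes Python 3, so it uses "io.StringIO" and "get_token()" where the
patch uses "cStringIO" and "lex.next()", and the name "split_first" is
hypothetical.

```python
import io
import shlex


def split_first(cmdline):
    """Return (first_token, rest), leaving quotes in ``rest`` untouched.

    Hypothetical helper for illustration; not part of Mercurial.
    """
    stream = io.StringIO(cmdline)
    lex = shlex.shlex(stream, posix=True)
    lex.whitespace_split = True  # split on whitespace only
    lex.commenters = ''          # do not treat '#' as a comment start
    first = lex.get_token()      # de-quoted first token (None if empty input)
    # shlex reads the stream one character at a time, so tell() points
    # just past the delimiter that ended the first token.
    rest = cmdline[stream.tell():].lstrip()
    return first, rest


if __name__ == '__main__':
    print(split_first('"foo"/"foo" "bar" baz'))
    print(shlex.split('"foo"/"foo" "bar" baz'))
```

In this sketch the remainder still carries its original quoting
('"bar" baz'), whereas "shlex.split()" de-quotes every component.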

Another patch series for the "default" branch will replace existing
"shlex.split()" invocations with "util.shellsplit()" (and add a new
"check-code.py" rule to prevent use of "shlex.split()").
Katsunori FUJIWARA - Dec. 5, 2014, 11:22 a.m.

Please ignore this series if it has not yet been reviewed/queued.

I found that patch #1 of this series can be made easier for
"check-code.py" to check with a "use util.shellsplit instead of
shlex.split" rule.

I'll post a revised series soon.

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -19,7 +19,7 @@  import error, osutil, encoding
 import errno, shutil, sys, tempfile, traceback
 import re as remod
 import os, time, datetime, calendar, textwrap, signal, collections
-import imp, socket, urllib
+import imp, socket, urllib, shlex, cStringIO
 
 if os.name == 'nt':
     import windows as platform
@@ -609,6 +609,49 @@  def _sethgexecutable(path):
     global _hgexecutable
     _hgexecutable = path
 
+def shellsplit(cmdline, all=True):
+    '''Split a command line into components
+
+    If ``all`` is true (default), this splits ``cmdline`` into
+    components, and returns a list of them.
+
+    Otherwise, this splits ``cmdline`` only at the first shell
+    delimiter character, and returns a ``(command, rest)`` tuple.
+
+    Code paths splitting a string from configuration files into its
+    components should use this instead of ``shlex.split``, because the
+    latter loses track of whether users really want to quote each
+    component or not (see issue4463 for details).
+
+    >>> shellsplit('foo bar baz')
+    ['foo', 'bar', 'baz']
+    >>> shellsplit('foo', all=False)
+    ('foo', '')
+    >>> shellsplit('foo   bar baz', all=False)
+    ('foo', 'bar baz')
+    >>> shellsplit('"foo foo"', all=False)
+    ('foo foo', '')
+    >>> shellsplit('"foo foo" bar baz', all=False)
+    ('foo foo', 'bar baz')
+    >>> shellsplit('"foo foo"   "bar" baz', all=False)
+    ('foo foo', '"bar" baz')
+    >>> shellsplit('foo "bar" baz', all=False)
+    ('foo', '"bar" baz')
+    >>> shellsplit('"foo"/"foo" "bar" baz', all=False)
+    ('foo/foo', '"bar" baz')
+    '''
+    if all:
+        return shlex.split(cmdline)
+
+    stream = cStringIO.StringIO(cmdline)
+    # According to "shlex.split" implementation, ``posix`` is True
+    # even on Windows
+    lex = shlex.shlex(stream, posix=True)
+    lex.whitespace_split = True
+    lex.commenters = ''
+
+    return (lex.next(), cmdline[stream.tell():].lstrip())
+
 def system(cmd, environ={}, cwd=None, onerr=None, errprefix=None, out=None):
     '''enhanced shell command execution.
     run with environment maybe modified, maybe in different dir.