Patchwork [V2] convert: parse quoted keys and values in mapfiles (BC)

login
register
mail settings
Submitter Ingmar Blonk
Date Sept. 24, 2017, 4:44 p.m.
Message ID <3a3276c2-d685-9234-d69f-e206d71e391d@gmail.com>
Download mbox | patch
Permalink /patch/24137/
State Changes Requested
Headers show

Comments

Ingmar Blonk - Sept. 24, 2017, 4:44 p.m.
# HG changeset patch
# User Ingmar Blonk <ingmar.blonk@gmail.com>
# Date 1506267061 -7200
#      Sun Sep 24 17:31:01 2017 +0200
# Node ID 9d95751ccea50ed91a457315762c18f3a9cf69f4
# Parent  575097b4dce054a5b8d992fe15797d9d62ceaf71
convert: parse quoted keys and values in mapfiles (BC)

Parsing quoted texts in mapfiles allows whitespace usage for e.g. branch 
names
in a banchmap.

        Mercurial Source
        ################
@@ -589,7 +589,7 @@
    convert_source=mysource

    $ cat > branchmap.txt << EOF
-  > old branch new_branch
+  > "old branch" new_branch
    > EOF

    $ hg -R a branch -q 'old branch'
Yuya Nishihara - Sept. 25, 2017, 1:23 p.m.
On Sun, 24 Sep 2017 18:44:57 +0200, Ingmar Blonk wrote:
> # HG changeset patch
> # User Ingmar Blonk <ingmar.blonk@gmail.com>
> # Date 1506267061 -7200
> #      Sun Sep 24 17:31:01 2017 +0200
> # Node ID 9d95751ccea50ed91a457315762c18f3a9cf69f4
> # Parent  575097b4dce054a5b8d992fe15797d9d62ceaf71
> convert: parse quoted keys and values in mapfiles (BC)
> 
> Parsing quoted texts in mapfiles allows whitespace usage for e.g. branch 
> names
> in a banchmap.

This patch appears to be whitespace damaged. Can you use the patchbomb?

https://www.mercurial-scm.org/wiki/ContributingChanges#Emailing_patches

> diff -r 575097b4dce0 -r 9d95751ccea5 hgext/convert/common.py
> --- a/hgext/convert/common.py    Thu Sep 14 13:14:32 2017 -0700
> +++ b/hgext/convert/common.py    Sun Sep 24 17:31:01 2017 +0200
> @@ -10,7 +10,9 @@
>   import datetime
>   import errno
>   import os
> +import pipes
>   import re
> +import shlex
>   import subprocess
> 
>   from mercurial.i18n import _
> @@ -461,7 +463,7 @@
>                   # Ignore blank lines
>                   continue
>               try:
> -                key, value = line.rsplit(' ', 1)
> +                key, value = shlex.split(line, posix=True)

Even if we can't catch all possibilities of BC breakages, I think it's better
to save common scenario.

> @@ -479,7 +481,7 @@
>                   raise error.Abort(
>                       _('could not open map file %r: %s') %
>                       (self.path, encoding.strtolocal(err.strerror)))
> -        self.fp.write('%s %s\n' % (key, value))
> +        self.fp.write('%s %s\n' % (pipes.quote(key), pipes.quote(value)))
>           self.fp.flush()
>           super(mapfile, self).__setitem__(key, value)

I meant the parsing of branchmap should be a separate function. The mapfile
class has write functionality because it's also used as an internal storage.
rsplit() was introduced at af1117f37fa7, and digging it further, it was
necessary to fix the issue 1224. So changing the syntax of mapfile will
probably break existing repositories.

https://bz.mercurial-scm.org/show_bug.cgi?id=1224
Ingmar Blonk - Sept. 26, 2017, 8:20 p.m.
On 25-9-2017 15:23, Yuya Nishihara wrote:
> On Sun, 24 Sep 2017 18:44:57 +0200, Ingmar Blonk wrote:
>> # HG changeset patch
>> # User Ingmar Blonk<ingmar.blonk@gmail.com>
>> # Date 1506267061 -7200
>> #      Sun Sep 24 17:31:01 2017 +0200
>> # Node ID 9d95751ccea50ed91a457315762c18f3a9cf69f4
>> # Parent  575097b4dce054a5b8d992fe15797d9d62ceaf71
>> convert: parse quoted keys and values in mapfiles (BC)
>>
>> Parsing quoted texts in mapfiles allows whitespace usage for e.g. branch
>> names
>> in a banchmap.
> This patch appears to be whitespace damaged. Can you use the patchbomb?
>
> https://www.mercurial-scm.org/wiki/ContributingChanges#Emailing_patches

Just looked into it, it seems to be my Thunderbird configuration. Will 
try to get it right next time.

>> diff -r 575097b4dce0 -r 9d95751ccea5 hgext/convert/common.py
>> --- a/hgext/convert/common.py    Thu Sep 14 13:14:32 2017 -0700
>> +++ b/hgext/convert/common.py    Sun Sep 24 17:31:01 2017 +0200
>> @@ -10,7 +10,9 @@
>>    import datetime
>>    import errno
>>    import os
>> +import pipes
>>    import re
>> +import shlex
>>    import subprocess
>>
>>    from mercurial.i18n import _
>> @@ -461,7 +463,7 @@
>>                    # Ignore blank lines
>>                    continue
>>                try:
>> -                key, value = line.rsplit(' ', 1)
>> +                key, value = shlex.split(line, posix=True)
> Even if we can't catch all possibilities of BC breakages, I think it's better
> to save common scenario.

Ok, detecting single and double quote usage is difficult taking escapes 
and correct closing quotes into account.
The best I can come up with is the following snippet:

     try:
         key, value = shlex.split(line, posix=True)
     except ValueError:
         try:
             # Fall back to old behavior
             key, value = line.rsplit(' ', 1)
         except ValueError:
             raise error.Abort(
                 _('syntax error in %s(%d): key/value pair expected')
                 % (self.path, i + 1))

The only situations when this could go wrong are when one forgets a 
closing quote or when one expects the old behavior, quotes are present 
(including closing quote) and the unpack doesn't raise.
You agree?

>> @@ -479,7 +481,7 @@
>>                    raise error.Abort(
>>                        _('could not open map file %r: %s') %
>>                        (self.path, encoding.strtolocal(err.strerror)))
>> -        self.fp.write('%s %s\n' % (key, value))
>> +        self.fp.write('%s %s\n' % (pipes.quote(key), pipes.quote(value)))
>>            self.fp.flush()
>>            super(mapfile, self).__setitem__(key, value)
> I meant the parsing of branchmap should be a separate function. The mapfile
> class has write functionality because it's also used as an internal storage.
> rsplit() was introduced at af1117f37fa7, and digging it further, it was
> necessary to fix the issue 1224. So changing the syntax of mapfile will
> probably break existing repositories.
>
> https://bz.mercurial-scm.org/show_bug.cgi?id=1224

I understood, however I looked a bit further into it.
The mapfile class is basically a dict backed by a file. The dict 
supports values with spaces, the file format only for the key. Adding 
the quote() in the write functionality will make the file support all 
dict keys and values.
Looking into issue 1224, the generated shamap will contain the spaces of 
the subversion directory paths as keys. The rsplit() allows the spaces 
in the keys. To me this seems to fix only half of the problem where 
you'd like full support for whitespace.
The proposed solution using pipes.quote() only adds surrounding quotes 
when spaces or quotes are in the string itself. This in combination with 
the above proposed parsing fall back mechanism should cover all cases 
that I can think of. At least test test-convert-svn-source runs fine.

I understand that for something like Mercurial stability and backwards 
compatibility is important, but on the other hand I'm trying to avoid 
writing extra code when not needed (and possibly leaving out an 
improvement). This is my first patch for Mercurial, so I'm not sure how 
these things are handled.
Yuya Nishihara - Sept. 27, 2017, 7:05 a.m.
On Tue, 26 Sep 2017 22:20:14 +0200, Ingmar Blonk wrote:
> On 25-9-2017 15:23, Yuya Nishihara wrote:
> Ok, detecting single and double quote usage is difficult taking escapes 
> and correct closing quotes into account.
> The best I can come up with is the following snippet:
> 
>      try:
>          key, value = shlex.split(line, posix=True)
>      except ValueError:
>          try:
>              # Fall back to old behavior
>              key, value = line.rsplit(' ', 1)
>          except ValueError:
>              raise error.Abort(
>                  _('syntax error in %s(%d): key/value pair expected')
>                  % (self.path, i + 1))
> 
> The only situations when this could go wrong are when one forgets a 
> closing quote or when one expects the old behavior, quotes are present 
> (including closing quote) and the unpack doesn't raise.
> You agree?

Yeah, that's what I have in mind.

> >> @@ -479,7 +481,7 @@
> >>                    raise error.Abort(
> >>                        _('could not open map file %r: %s') %
> >>                        (self.path, encoding.strtolocal(err.strerror)))
> >> -        self.fp.write('%s %s\n' % (key, value))
> >> +        self.fp.write('%s %s\n' % (pipes.quote(key), pipes.quote(value)))
> >>            self.fp.flush()
> >>            super(mapfile, self).__setitem__(key, value)
> > I meant the parsing of branchmap should be a separate function. The mapfile
> > class has write functionality because it's also used as an internal storage.
> > rsplit() was introduced at af1117f37fa7, and digging it further, it was
> > necessary to fix the issue 1224. So changing the syntax of mapfile will
> > probably break existing repositories.
> >
> > https://bz.mercurial-scm.org/show_bug.cgi?id=1224
> 
> I understood, however I looked a bit further into it.
> The mapfile class is basically a dict backed by a file. The dict 
> supports values with spaces, the file format only for the key. Adding 
> the quote() in the write functionality will make the file support all 
> dict keys and values.
> Looking into issue 1224, the generated shamap will contain the spaces of 
> the subversion directory paths as keys. The rsplit() allows the spaces 
> in the keys. To me this seems to fix only half of the problem where 
> you'd like full support for whitespace.
> The proposed solution using pipes.quote() only adds surrounding quotes 
> when spaces or quotes are in the string itself. This in combination with 
> the above proposed parsing fall back mechanism should cover all cases 
> that I can think of. At least test test-convert-svn-source runs fine.

shamap values should never contain spaces. That's why rsplit() was just
fine for data which the mapfile class was originally designed for. IMHO,
there's no need to break compatibility of the storage across hg versions.

> I understand that for something like Mercurial stability and backwards 
> compatibility is important, but on the other hand I'm trying to avoid 
> writing extra code when not needed (and possibly leaving out an 
> improvement). This is my first patch for Mercurial, so I'm not sure how 
> these things are handled.

If I understand correctly, branchmap can be a plain dict (with no write
function), so the new code would be just ~10 lines.

Patch

diff -r 575097b4dce0 -r 9d95751ccea5 hgext/convert/__init__.py
--- a/hgext/convert/__init__.py    Thu Sep 14 13:14:32 2017 -0700
+++ b/hgext/convert/__init__.py    Sun Sep 24 17:31:01 2017 +0200
@@ -263,8 +263,8 @@ 

      where "original_branch_name" is the name of the branch in the
      source repository, and "new_branch_name" is the name of the branch
-    is the destination repository. No whitespace is allowed in the new
-    branch name. This can be used to (for instance) move code in one
+    is the destination repository. Branch names containing whitespace
+    should be quoted. This can be used to (for instance) move code in one
      repository from "default" to a named branch.

      Mercurial Source
diff -r 575097b4dce0 -r 9d95751ccea5 hgext/convert/common.py
--- a/hgext/convert/common.py    Thu Sep 14 13:14:32 2017 -0700
+++ b/hgext/convert/common.py    Sun Sep 24 17:31:01 2017 +0200
@@ -10,7 +10,9 @@ 
  import datetime
  import errno
  import os
+import pipes
  import re
+import shlex
  import subprocess

  from mercurial.i18n import _
@@ -461,7 +463,7 @@ 
                  # Ignore blank lines
                  continue
              try:
-                key, value = line.rsplit(' ', 1)
+                key, value = shlex.split(line, posix=True)
              except ValueError:
                  raise error.Abort(
                      _('syntax error in %s(%d): key/value pair expected')
@@ -479,7 +481,7 @@ 
                  raise error.Abort(
                      _('could not open map file %r: %s') %
                      (self.path, encoding.strtolocal(err.strerror)))
-        self.fp.write('%s %s\n' % (key, value))
+        self.fp.write('%s %s\n' % (pipes.quote(key), pipes.quote(value)))
          self.fp.flush()
          super(mapfile, self).__setitem__(key, value)

diff -r 575097b4dce0 -r 9d95751ccea5 tests/test-convert.t
--- a/tests/test-convert.t    Thu Sep 14 13:14:32 2017 -0700
+++ b/tests/test-convert.t    Sun Sep 24 17:31:01 2017 +0200
@@ -125,9 +125,9 @@ 

        where "original_branch_name" is the name of the branch in the source
        repository, and "new_branch_name" is the name of the branch is the
-      destination repository. No whitespace is allowed in the new 
branch name.
-      This can be used to (for instance) move code in one repository from
-      "default" to a named branch.
+      destination repository. Branch names containing whitespace should be
+      quoted. This can be used to (for instance) move code in one 
repository
+      from "default" to a named branch.