Patchwork convert: parse quoted keys and values in mapfiles

login
register
mail settings
Submitter Ingmar Blonk
Date Sept. 18, 2017, 8:28 p.m.
Message ID <17657fd6-7de1-5a6a-cc42-e3d2aba66d94@gmail.com>
Download mbox | patch
Permalink /patch/23992/
State Changes Requested
Headers show

Comments

Ingmar Blonk - Sept. 18, 2017, 8:28 p.m.
# HG changeset patch
# User Ingmar Blonk <ingmar.blonk@gmail.com>
# Date 1505662991 -7200
#      Sun Sep 17 17:43:11 2017 +0200
# Node ID af658bed1729dd0e48a60c67a82f4bc44f578c97
# Parent  448725a2ef7356524bcc9638a5c5eaaf59f263af
convert: parse quoted keys and values in mapfiles

Parsing quoted texts in mapfiles allows whitespace usage for e.g. branch 
names
in a banchmap.

        Mercurial Source
        ################
@@ -589,7 +589,7 @@
    convert_source=mysource

    $ cat > branchmap.txt << EOF
-  > old branch new_branch
+  > "old branch" new_branch
    > EOF

    $ hg -R a branch -q 'old branch'
Matt Harbison - Sept. 19, 2017, 2:16 a.m.
On Mon, 18 Sep 2017 16:28:45 -0400, Ingmar Blonk <ingmar.blonk@gmail.com>  
wrote:

> # HG changeset patch
> # User Ingmar Blonk <ingmar.blonk@gmail.com>
> # Date 1505662991 -7200
> #      Sun Sep 17 17:43:11 2017 +0200
> # Node ID af658bed1729dd0e48a60c67a82f4bc44f578c97
> # Parent  448725a2ef7356524bcc9638a5c5eaaf59f263af
> convert: parse quoted keys and values in mapfiles

This should probably be flagged as (BC), since it is changing documented  
behavior.  It hasn't been documented for long though- I just updated it in  
June (3a57bfd369d4).  Alternately, check the line to see if it contains  
'"', to decide to split the old way or new way?  (Keep the old test too if  
you go this route.)

Also, '"' is apparently a valid character in a branch name.  Maybe it  
should be banned (separately)?  (Though branchmap also works when  
converting foreign repos, so that may not buy much here.)

> Parsing quoted texts in mapfiles allows whitespace usage for e.g. branch  
> names
> in a banchmap.
>
> diff -r 448725a2ef73 -r af658bed1729 hgext/convert/__init__.py
> --- a/hgext/convert/__init__.py    Fri Sep 15 00:01:57 2017 -0700
> +++ b/hgext/convert/__init__.py    Sun Sep 17 17:43:11 2017 +0200
> @@ -263,8 +263,8 @@
>
>       where "original_branch_name" is the name of the branch in the
>       source repository, and "new_branch_name" is the name of the branch
> -    is the destination repository. No whitespace is allowed in the new
> -    branch name. This can be used to (for instance) move code in one
> +    is the destination repository. Branch names containing whitespace
> +    should be quoted. This can be used to (for instance) move code in  
> one
>       repository from "default" to a named branch.
>
>       Mercurial Source
> diff -r 448725a2ef73 -r af658bed1729 hgext/convert/common.py
> --- a/hgext/convert/common.py    Fri Sep 15 00:01:57 2017 -0700
> +++ b/hgext/convert/common.py    Sun Sep 17 17:43:11 2017 +0200
> @@ -11,6 +11,7 @@
>   import errno
>   import os
>   import re
> +import shlex
>   import subprocess
>
>   from mercurial.i18n import _
> @@ -461,7 +462,7 @@
>                   # Ignore blank lines
>                   continue
>               try:
> -                key, value = line.rsplit(' ', 1)
> +                key, value = shlex.split(line, posix=True)
>               except ValueError:
>                   raise error.Abort(
>                       _('syntax error in %s(%d): key/value pair  
> expected')
> diff -r 448725a2ef73 -r af658bed1729 tests/test-convert.t
> --- a/tests/test-convert.t    Fri Sep 15 00:01:57 2017 -0700
> +++ b/tests/test-convert.t    Sun Sep 17 17:43:11 2017 +0200
> @@ -125,9 +125,9 @@
>
>         where "original_branch_name" is the name of the branch in the  
> source
>         repository, and "new_branch_name" is the name of the branch is  
> the
> -      destination repository. No whitespace is allowed in the new  
> branch name.
> -      This can be used to (for instance) move code in one repository  
> from
> -      "default" to a named branch.
> +      destination repository. Branch names containing whitespace should  
> be
> +      quoted. This can be used to (for instance) move code in one  
> repository
> +      from "default" to a named branch.
>
>         Mercurial Source
>         ################
> @@ -589,7 +589,7 @@
>     convert_source=mysource
>
>     $ cat > branchmap.txt << EOF
> -  > old branch new_branch
> +  > "old branch" new_branch
>     > EOF
>
>     $ hg -R a branch -q 'old branch'
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Sept. 20, 2017, 1:57 p.m.
On Mon, 18 Sep 2017 22:16:49 -0400, Matt Harbison wrote:
> On Mon, 18 Sep 2017 16:28:45 -0400, Ingmar Blonk <ingmar.blonk@gmail.com>  
> wrote:
> 
> > # HG changeset patch
> > # User Ingmar Blonk <ingmar.blonk@gmail.com>
> > # Date 1505662991 -7200
> > #      Sun Sep 17 17:43:11 2017 +0200
> > # Node ID af658bed1729dd0e48a60c67a82f4bc44f578c97
> > # Parent  448725a2ef7356524bcc9638a5c5eaaf59f263af
> > convert: parse quoted keys and values in mapfiles
> 
> This should probably be flagged as (BC), since it is changing documented  
> behavior.

Yes.

> It hasn't been documented for long though- I just updated it in  
> June (3a57bfd369d4).  Alternately, check the line to see if it contains  
> '"', to decide to split the old way or new way?  (Keep the old test too if  
> you go this route.)

I don't think we can fix this problem reliably without a BC, but maybe we
can try to preserve the old behavior if parsed line has more than two
tokens. And more conservative "wordchars" will minimize the behavior change.
See filemap.filemapper for example.

> > diff -r 448725a2ef73 -r af658bed1729 hgext/convert/common.py
> > --- a/hgext/convert/common.py    Fri Sep 15 00:01:57 2017 -0700
> > +++ b/hgext/convert/common.py    Sun Sep 17 17:43:11 2017 +0200
> > @@ -11,6 +11,7 @@
> >   import errno
> >   import os
> >   import re
> > +import shlex
> >   import subprocess
> >
> >   from mercurial.i18n import _
> > @@ -461,7 +462,7 @@
> >                   # Ignore blank lines
> >                   continue
> >               try:
> > -                key, value = line.rsplit(' ', 1)
> > +                key, value = shlex.split(line, posix=True)
> >               except ValueError:
> >                   raise error.Abort(
> >                       _('syntax error in %s(%d): key/value pair  
> > expected')

The mapfile class supports read and write, but this only changes the read
function, which doesn't make sense. Perhaps we should stop using the mapfile
class for branchmap and write a dedicated parse function.
Yuya Nishihara - Sept. 21, 2017, 1:47 p.m.
On Wed, 20 Sep 2017 22:57:10 +0900, Yuya Nishihara wrote:
> And more conservative "wordchars" will minimize the behavior change.
> See filemap.filemapper for example.

Never mind. wordchars appears not important if whitespace_split is set.
Ingmar Blonk - Sept. 24, 2017, 4:45 p.m.
On 20-9-2017 15:57, Yuya Nishihara wrote:
> On Mon, 18 Sep 2017 22:16:49 -0400, Matt Harbison wrote:
>> On Mon, 18 Sep 2017 16:28:45 -0400, Ingmar Blonk <ingmar.blonk@gmail.com>
>> wrote:
>>
>>> # HG changeset patch
>>> # User Ingmar Blonk <ingmar.blonk@gmail.com>
>>> # Date 1505662991 -7200
>>> #      Sun Sep 17 17:43:11 2017 +0200
>>> # Node ID af658bed1729dd0e48a60c67a82f4bc44f578c97
>>> # Parent  448725a2ef7356524bcc9638a5c5eaaf59f263af
>>> convert: parse quoted keys and values in mapfiles
>> This should probably be flagged as (BC), since it is changing documented
>> behavior.
> Yes.
>
>> It hasn't been documented for long though- I just updated it in
>> June (3a57bfd369d4).  Alternately, check the line to see if it contains
>> '"', to decide to split the old way or new way?  (Keep the old test too if
>> you go this route.)
> I don't think we can fix this problem reliably without a BC, but maybe we
> can try to preserve the old behavior if parsed line has more than two
> tokens. And more conservative "wordchars" will minimize the behavior change.
> See filemap.filemapper for example.
Thought about it, but there always seems be a possible scenario which 
isn't handled well or could give unexpected results.
Didn't change it for patch v2, but did add the (BC) flag.
>>> diff -r 448725a2ef73 -r af658bed1729 hgext/convert/common.py
>>> --- a/hgext/convert/common.py    Fri Sep 15 00:01:57 2017 -0700
>>> +++ b/hgext/convert/common.py    Sun Sep 17 17:43:11 2017 +0200
>>> @@ -11,6 +11,7 @@
>>>    import errno
>>>    import os
>>>    import re
>>> +import shlex
>>>    import subprocess
>>>
>>>    from mercurial.i18n import _
>>> @@ -461,7 +462,7 @@
>>>                    # Ignore blank lines
>>>                    continue
>>>                try:
>>> -                key, value = line.rsplit(' ', 1)
>>> +                key, value = shlex.split(line, posix=True)
>>>                except ValueError:
>>>                    raise error.Abort(
>>>                        _('syntax error in %s(%d): key/value pair
>>> expected')
> The mapfile class supports read and write, but this only changes the read
> function, which doesn't make sense. Perhaps we should stop using the mapfile
> class for branchmap and write a dedicated parse function.
Write support is fixed in patch v2.

Patch

diff -r 448725a2ef73 -r af658bed1729 hgext/convert/__init__.py
--- a/hgext/convert/__init__.py    Fri Sep 15 00:01:57 2017 -0700
+++ b/hgext/convert/__init__.py    Sun Sep 17 17:43:11 2017 +0200
@@ -263,8 +263,8 @@ 

      where "original_branch_name" is the name of the branch in the
      source repository, and "new_branch_name" is the name of the branch
-    is the destination repository. No whitespace is allowed in the new
-    branch name. This can be used to (for instance) move code in one
+    is the destination repository. Branch names containing whitespace
+    should be quoted. This can be used to (for instance) move code in one
      repository from "default" to a named branch.

      Mercurial Source
diff -r 448725a2ef73 -r af658bed1729 hgext/convert/common.py
--- a/hgext/convert/common.py    Fri Sep 15 00:01:57 2017 -0700
+++ b/hgext/convert/common.py    Sun Sep 17 17:43:11 2017 +0200
@@ -11,6 +11,7 @@ 
  import errno
  import os
  import re
+import shlex
  import subprocess

  from mercurial.i18n import _
@@ -461,7 +462,7 @@ 
                  # Ignore blank lines
                  continue
              try:
-                key, value = line.rsplit(' ', 1)
+                key, value = shlex.split(line, posix=True)
              except ValueError:
                  raise error.Abort(
                      _('syntax error in %s(%d): key/value pair expected')
diff -r 448725a2ef73 -r af658bed1729 tests/test-convert.t
--- a/tests/test-convert.t    Fri Sep 15 00:01:57 2017 -0700
+++ b/tests/test-convert.t    Sun Sep 17 17:43:11 2017 +0200
@@ -125,9 +125,9 @@ 

        where "original_branch_name" is the name of the branch in the source
        repository, and "new_branch_name" is the name of the branch is the
-      destination repository. No whitespace is allowed in the new 
branch name.
-      This can be used to (for instance) move code in one repository from
-      "default" to a named branch.
+      destination repository. Branch names containing whitespace should be
+      quoted. This can be used to (for instance) move code in one 
repository
+      from "default" to a named branch.