Patchwork patch: improve handling of filenames containing spaces on import (issue3723)

login
register
mail settings
Submitter Matt Mackall
Date Dec. 6, 2012, 9:44 p.m.
Message ID <1354830287.6733.303.camel@calx>
Download mbox | patch
Permalink /patch/27/
State Not Applicable
Headers show

Comments

Matt Mackall - Dec. 6, 2012, 9:44 p.m.
On Thu, 2012-12-06 at 21:00 +0000, Alastair Houghton wrote:
> On 6 Dec 2012, at 18:55, Matt Mackall <mpm at selenic.com> wrote:
> 
> > Between your bug report and this description, there's lots of confusion
> > about whether you're talking about git diffs, standard diffs, or diffs
> > in general. I've been assuming the latter up to now.
> 
> Well, it is a general problem, but I certainly hadn't intended to try
> to fix it for the general case because there isn't a good general
> solution.

Have you actually tested it with standard diffs or does someone actually
need to do that? Ok, I just did:

 hg init a
 cd a
 mkdir 'foo b'
 echo b > 'foo b/foo'
 hg ci -Am0
 hg up null
 hg export --git tip > p
 cat p
 hg import p --no-commit
 hg st

Works as expected. Also, the hg-generated patch is accepted correctly by
patch(1). As previously demonstrated, diff(1) generates patches that
work with patch(1) and hg import as well. All is good in standard diff
land.

So the entirety of this discussion can be restricted to git diffs. In
the above test, if we add --git to the export, it breaks as described.

For a baseline, if I try to import the git-diff generated by hg into an
empty git repo with git-apply:


..it almost works. By almost, I mean it creates 'foo\ b/foo' (literal
backslash out of thin air) rather than 'foo b/foo'. LOLWUT. This also
breaks in the same way with the known-to-agree-with-diff-and-patch
standard diff that Mercurial produces.

Conclusions:

a) Git is wacky
b) we should probably handle the above patch without complaint, but
without emulating Git's wackiness
c) we should accept escaping on the input, at least on git patches
d) we should eventually do some escaping for --git diffs

We should probably not do (d) until well after (c) is released to
minimize cross-version compatibility issues between hg clients.
Pierre-Yves David - Dec. 7, 2012, 10:28 a.m.
On Thu, Dec 06, 2012 at 03:44:47PM -0600, Matt Mackall wrote:
> On Thu, 2012-12-06 at 21:00 +0000, Alastair Houghton wrote:
 For a baseline, if I try to import the git-diff generated by hg into an
> empty git repo with git-apply:
> 
> diff --git a/foo b/foo b/foo b/foo
> new file mode 100644
> --- /dev/null
> +++ b/foo b/foo	
> @@ -0,0 +1,1 @@
> +b
> 
> ..it almost works. By almost, I mean it creates 'foo\ b/foo' (literal
> backslash out of thin air) rather than 'foo b/foo'. LOLWUT. This also
> breaks in the same way with the known-to-agree-with-diff-and-patch
> standard diff that Mercurial produces.

I was unable to reproduce that here using:

git version 1.7.2.5
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu) qs /bin/sh

What version are you using?
Alastair Houghton - Dec. 7, 2012, 11:37 a.m.
On 7 Dec 2012, at 10:28, Pierre-Yves David <pierre-yves.david at logilab.fr> wrote:

> On Thu, Dec 06, 2012 at 03:44:47PM -0600, Matt Mackall wrote:
>> On Thu, 2012-12-06 at 21:00 +0000, Alastair Houghton wrote:
> For a baseline, if I try to import the git-diff generated by hg into an
>> empty git repo with git-apply:
>> 
>> diff --git a/foo b/foo b/foo b/foo
>> new file mode 100644
>> --- /dev/null
>> +++ b/foo b/foo	
>> @@ -0,0 +1,1 @@
>> +b
>> 
>> ..it almost works. By almost, I mean it creates 'foo\ b/foo' (literal
>> backslash out of thin air) rather than 'foo b/foo'. LOLWUT. This also
>> breaks in the same way with the known-to-agree-with-diff-and-patch
>> standard diff that Mercurial produces.
> 
> I was unable to reproduce that here using:
> 
> git version 1.7.2.5
> GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu) qs /bin/sh
> 
> What version are you using?

I think when I originally tested I had quite an old version of git on
my machine (I don't actually use git very often---we use Mercurial
here).  I just tried with 1.8.0.1, and it seemed to work.

Further testing seems to indicate that git doesn't escape for spaces,
but it will do for non-ASCII characters and control characters, among
others.  e.g. I just renamed a file ?foo? to ?f?? and got

  diff --git a/foo b/foo b/foo b/foo
  deleted file mode 100644
  index 6178079..0000000
  --- a/foo b/foo 
  +++ /dev/null
  @@ -1 +0,0 @@
  -b
  diff --git "a/foo b/f\303\266" "b/foo b/f\303\266"
  new file mode 100644
  index 0000000..6178079
  --- /dev/null
  +++ "b/foo b/f\303\266" 
  @@ -0,0 +1 @@
  +b

from git diff HEAD

So, I think:

(a) Git has already fixed this bug.  The current version (1.8.0.1)
    works with a binary file too; I tried

      hg init a
      cd a
      mkdir 'foo b'
      dd if=/dev/random of="foo b/foo" bs=1024 count=4
      hg ci -Am0
      hg up null
      hg export --git tip > p
      cd ..
      git init b
      git apply ../a/p

    Looking at git's current source code,

      https://github.com/git/git/blob/master/builtin/apply.c

    it only uses the "diff --git" line to obtain the filenames in
    the case where they aren't anywhere else, and requires the two
    names it finds to match exactly.
(b) hg should certainly handle the patch without complaint (my patch
    fixes that)
(c) hg should consider handling escaping on the input for git patches
    (my patch does *not* fix that)
(d) Agree with Matt; hg can't generate escaped filenames until some
    time after (c) if fixed.

FWIW, (c) and (d) seem to raise all kinds of filename encoding issues;
git's escaping system seems poorly designed (in particular, it really
should support \u and \U so that it's possible to unambiguously
specify filenames in a patch, no matter what the system it's being
applied on does; git doesn't appear to support those escapes, but I
guess it just punts on filename encoding anyway).  IIRC I read some
pages on the Mercurial wiki about encoding issues, so you guys have
thought about that already, right?

Kind regards,

Alastair.

--
http://alastairs-place.net
Matt Mackall - Dec. 7, 2012, 4:44 p.m.
On Fri, 2012-12-07 at 11:28 +0100, Pierre-Yves David wrote:
> On Thu, Dec 06, 2012 at 03:44:47PM -0600, Matt Mackall wrote:
> > On Thu, 2012-12-06 at 21:00 +0000, Alastair Houghton wrote:
>  For a baseline, if I try to import the git-diff generated by hg into an
> > empty git repo with git-apply:
> > 
> > diff --git a/foo b/foo b/foo b/foo
> > new file mode 100644
> > --- /dev/null
> > +++ b/foo b/foo	
> > @@ -0,0 +1,1 @@
> > +b
> > 
> > ..it almost works. By almost, I mean it creates 'foo\ b/foo' (literal
> > backslash out of thin air) rather than 'foo b/foo'. LOLWUT. This also
> > breaks in the same way with the known-to-agree-with-diff-and-patch
> > standard diff that Mercurial produces.
> 
> I was unable to reproduce that here using:
> 
> git version 1.7.2.5
> GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu) qs /bin/sh

git version 1.7.10
GNU bash, version 4.2.20(1)-release (x86_64-pc-linux-gnu)
Matt Mackall - Dec. 7, 2012, 5:03 p.m.
On Fri, 2012-12-07 at 11:37 +0000, Alastair Houghton wrote:
> On 7 Dec 2012, at 10:28, Pierre-Yves David <pierre-yves.david at logilab.fr> wrote:
> 
> > On Thu, Dec 06, 2012 at 03:44:47PM -0600, Matt Mackall wrote:
> >> On Thu, 2012-12-06 at 21:00 +0000, Alastair Houghton wrote:
> > For a baseline, if I try to import the git-diff generated by hg into an
> >> empty git repo with git-apply:
> >> 
> >> diff --git a/foo b/foo b/foo b/foo
> >> new file mode 100644
> >> --- /dev/null
> >> +++ b/foo b/foo	
> >> @@ -0,0 +1,1 @@
> >> +b
> >> 
> >> ..it almost works. By almost, I mean it creates 'foo\ b/foo' (literal
> >> backslash out of thin air) rather than 'foo b/foo'. LOLWUT. This also
> >> breaks in the same way with the known-to-agree-with-diff-and-patch
> >> standard diff that Mercurial produces.
> > 
> > I was unable to reproduce that here using:
> > 
> > git version 1.7.2.5
> > GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu) qs /bin/sh
> > 
> > What version are you using?
> 
> I think when I originally tested I had quite an old version of git on
> my machine (I don't actually use git very often---we use Mercurial
> here).  I just tried with 1.8.0.1, and it seemed to work.
> 
> Further testing seems to indicate that git doesn't escape for spaces,
> but it will do for non-ASCII characters and control characters, among
> others.  e.g. I just renamed a file ?foo? to ?f?? and got
> 
>   diff --git a/foo b/foo b/foo b/foo
>   deleted file mode 100644
>   index 6178079..0000000
>   --- a/foo b/foo 
>   +++ /dev/null
>   @@ -1 +0,0 @@
>   -b
>   diff --git "a/foo b/f\303\266" "b/foo b/f\303\266"
>   new file mode 100644
>   index 0000000..6178079
>   --- /dev/null
>   +++ "b/foo b/f\303\266" 
>   @@ -0,0 +1 @@
>   +b
> 
> from git diff HEAD
> 
> So, I think:
> 
> (a) Git has already fixed this bug.  The current version (1.8.0.1)
>     works with a binary file too; I tried
> 
>       hg init a
>       cd a
>       mkdir 'foo b'
>       dd if=/dev/random of="foo b/foo" bs=1024 count=4
>       hg ci -Am0
>       hg up null
>       hg export --git tip > p
>       cd ..
>       git init b
>       git apply ../a/p
> 
>     Looking at git's current source code,
> 
>       https://github.com/git/git/blob/master/builtin/apply.c
> 
>     it only uses the "diff --git" line to obtain the filenames in
>     the case where they aren't anywhere else, and requires the two
>     names it finds to match exactly.
> (b) hg should certainly handle the patch without complaint (my patch
>     fixes that)
> (c) hg should consider handling escaping on the input for git patches
>     (my patch does *not* fix that)
> (d) Agree with Matt; hg can't generate escaped filenames until some
>     time after (c) if fixed.

Ok, sounds good. Can we get your patch again with a more precise
description?

Patch

diff --git a/foo b/foo b/foo b/foo
new file mode 100644
--- /dev/null
+++ b/foo b/foo	
@@ -0,0 +1,1 @@ 
+b