Patchwork [STABLE] match: fix a caseonly rename + explicit path commit on icasefs (issue4768)

login
register
mail settings
Submitter Matt Harbison
Date Aug. 7, 2015, 1:56 a.m.
Message ID <c54ca3f987fd1eaaba3a.1438912566@Envy>
Download mbox | patch
Permalink /patch/10112/
State Accepted
Commit 9ac4e81b96592fb4611860a446bbe6ff91b7ee73
Headers show

Comments

Matt Harbison - Aug. 7, 2015, 1:56 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1438909216 14400
#      Thu Aug 06 21:00:16 2015 -0400
# Branch stable
# Node ID c54ca3f987fd1eaaba3a26d62806270fff535346
# Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
match: fix a caseonly rename + explicit path commit on icasefs (issue4768)

The problem was that the former name and the new name are both normalized to the
case in dirstate, so matcher._files would be ['ABC.txt', 'ABC.txt'].
localrepo.commit() calls localrepo.status(), passing along the matcher.  Inside
dirstate.status(), _walkexplicit() simply grabs matcher.files() and processes
those items.  Since the old name isn't present, it is silently dropped.  There's
a fundamental tension here, because the status command should also accept files
that don't match the filesystem, so we can't drop the normalization in status.
The problem originated in baa11dde8c0e.

Unfortunately with this change, the case of the old file must still be specified
exactly, or the old file is again silently excluded.  I went back to
baa11dde8c0e^, and that had the same behavior, so we are no worse off.  I'm open
to ideas from a matcher or dirstate expert on how to fix that half.
Pierre-Yves David - Aug. 7, 2015, 2:03 a.m.
On 08/06/2015 06:56 PM, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1438909216 14400
> #      Thu Aug 06 21:00:16 2015 -0400
> # Branch stable
> # Node ID c54ca3f987fd1eaaba3a26d62806270fff535346
> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
> match: fix a caseonly rename + explicit path commit on icasefs (issue4768)
>
> The problem was that the former name and the new name are both normalized to the
> case in dirstate, so matcher._files would be ['ABC.txt', 'ABC.txt'].
> localrepo.commit() calls localrepo.status(), passing along the matcher.  Inside
> dirstate.status(), _walkexplicit() simply grabs matcher.files() and processes
> those items.  Since the old name isn't present, it is silently dropped.  There's
> a fundamental tension here, because the status command should also accept files
> that don't match the filesystem, so we can't drop the normalization in status.
> The problem originated in baa11dde8c0e.
>
> Unfortunately with this change, the case of the old file must still be specified
> exactly, or the old file is again silently excluded.  I went back to
> baa11dde8c0e^, and that had the same behavior, so we are no worse off.  I'm open
> to ideas from a matcher or dirstate expert on how to fix that half.

The code looks good to me with a small question. I will wait for a 
distate wizard to double review this before pushing it.

>
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -386,7 +386,8 @@
>       def __init__(self, root, cwd, patterns, include, exclude, default, auditor,
>                    ctx, listsubrepos=False, badfn=None):
>           init = super(icasefsmatcher, self).__init__
> -        self._dsnormalize = ctx.repo().dirstate.normalize
> +        self._dirstate = ctx.repo().dirstate

Can we keep a dirstate reference in the matcher without trouble?
Matt Harbison - Aug. 7, 2015, 2:15 a.m.
On Thu, 06 Aug 2015 22:03:09 -0400, Pierre-Yves David  
<pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 08/06/2015 06:56 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1438909216 14400
>> #      Thu Aug 06 21:00:16 2015 -0400
>> # Branch stable
>> # Node ID c54ca3f987fd1eaaba3a26d62806270fff535346
>> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
>> match: fix a caseonly rename + explicit path commit on icasefs  
>> (issue4768)
>>
[snip]
>>
>> diff --git a/mercurial/match.py b/mercurial/match.py
>> --- a/mercurial/match.py
>> +++ b/mercurial/match.py
>> @@ -386,7 +386,8 @@
>>       def __init__(self, root, cwd, patterns, include, exclude,  
>> default, auditor,
>>                    ctx, listsubrepos=False, badfn=None):
>>           init = super(icasefsmatcher, self).__init__
>> -        self._dsnormalize = ctx.repo().dirstate.normalize
>> +        self._dirstate = ctx.repo().dirstate
>
> Can we keep a dirstate reference in the matcher without trouble?

It's safe for the icasefsmatcher class because that is only created inside  
workingctx, so the optional ctx arg is always given.  But there are some  
creators of matcher that don't pass ctx, so we should probably avoid a  
maybe-its-valid-or-maybe-not member there.  (Not to mention that the base  
matcher may not be for a wdir()).
Pierre-Yves David - Aug. 10, 2015, 7:10 a.m.
On 08/06/2015 07:15 PM, Matt Harbison wrote:
> On Thu, 06 Aug 2015 22:03:09 -0400, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>
>>
>>
>> On 08/06/2015 06:56 PM, Matt Harbison wrote:
>>> # HG changeset patch
>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>> # Date 1438909216 14400
>>> #      Thu Aug 06 21:00:16 2015 -0400
>>> # Branch stable
>>> # Node ID c54ca3f987fd1eaaba3a26d62806270fff535346
>>> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
>>> match: fix a caseonly rename + explicit path commit on icasefs
>>> (issue4768)
>>>
> [snip]
>>>
>>> diff --git a/mercurial/match.py b/mercurial/match.py
>>> --- a/mercurial/match.py
>>> +++ b/mercurial/match.py
>>> @@ -386,7 +386,8 @@
>>>       def __init__(self, root, cwd, patterns, include, exclude,
>>> default, auditor,
>>>                    ctx, listsubrepos=False, badfn=None):
>>>           init = super(icasefsmatcher, self).__init__
>>> -        self._dsnormalize = ctx.repo().dirstate.normalize
>>> +        self._dirstate = ctx.repo().dirstate
>>
>> Can we keep a dirstate reference in the matcher without trouble?
>
> It's safe for the icasefsmatcher class because that is only created
> inside workingctx, so the optional ctx arg is always given.  But there
> are some creators of matcher that don't pass ctx, so we should probably
> avoid a maybe-its-valid-or-maybe-not member there.  (Not to mention that
> the base matcher may not be for a wdir()).

So, I'm now more confused, I initially asked the question with two 
possible issue in mind:

- reference cycle,
- layer violation,

But reading your reply, it seems like the code in your patch "Is working 
fine now" but unprotected againt other (valid) way to call this code 
that could crash.

Did I got this right? If so, this looks like a bad idea, isn't it?
Yuya Nishihara - Aug. 10, 2015, 10:16 a.m.
On Mon, 10 Aug 2015 00:10:03 -0700, Pierre-Yves David wrote:
> On 08/06/2015 07:15 PM, Matt Harbison wrote:
> > On Thu, 06 Aug 2015 22:03:09 -0400, Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org> wrote:
> >> On 08/06/2015 06:56 PM, Matt Harbison wrote:
> >>> # HG changeset patch
> >>> # User Matt Harbison <matt_harbison@yahoo.com>
> >>> # Date 1438909216 14400
> >>> #      Thu Aug 06 21:00:16 2015 -0400
> >>> # Branch stable
> >>> # Node ID c54ca3f987fd1eaaba3a26d62806270fff535346
> >>> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
> >>> match: fix a caseonly rename + explicit path commit on icasefs
> >>> (issue4768)
> >>>
> > [snip]
> >>>
> >>> diff --git a/mercurial/match.py b/mercurial/match.py
> >>> --- a/mercurial/match.py
> >>> +++ b/mercurial/match.py
> >>> @@ -386,7 +386,8 @@
> >>>       def __init__(self, root, cwd, patterns, include, exclude,
> >>> default, auditor,
> >>>                    ctx, listsubrepos=False, badfn=None):
> >>>           init = super(icasefsmatcher, self).__init__
> >>> -        self._dsnormalize = ctx.repo().dirstate.normalize
> >>> +        self._dirstate = ctx.repo().dirstate
> >>
> >> Can we keep a dirstate reference in the matcher without trouble?
> >
> > It's safe for the icasefsmatcher class because that is only created
> > inside workingctx, so the optional ctx arg is always given.  But there
> > are some creators of matcher that don't pass ctx, so we should probably
> > avoid a maybe-its-valid-or-maybe-not member there.  (Not to mention that
> > the base matcher may not be for a wdir()).
> 
> So, I'm now more confused, I initially asked the question with two 
> possible issue in mind:
> 
> - reference cycle,
> - layer violation,
> 
> But reading your reply, it seems like the code in your patch "Is working 
> fine now" but unprotected againt other (valid) way to call this code 
> that could crash.
> 
> Did I got this right? If so, this looks like a bad idea, isn't it?

There's already a reference to dirstate.normalize, which is a bound method
of dirstate, so I think this patch won't make things worse.
Matt Harbison - Aug. 11, 2015, 1:21 a.m.
On Mon, 10 Aug 2015 03:10:03 -0400, Pierre-Yves David  
<pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 08/06/2015 07:15 PM, Matt Harbison wrote:
>> On Thu, 06 Aug 2015 22:03:09 -0400, Pierre-Yves David
>> <pierre-yves.david@ens-lyon.org> wrote:
>>
>>>
>>>
>>> On 08/06/2015 06:56 PM, Matt Harbison wrote:
>>>> # HG changeset patch
>>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>>> # Date 1438909216 14400
>>>> #      Thu Aug 06 21:00:16 2015 -0400
>>>> # Branch stable
>>>> # Node ID c54ca3f987fd1eaaba3a26d62806270fff535346
>>>> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
>>>> match: fix a caseonly rename + explicit path commit on icasefs
>>>> (issue4768)
>>>>
>> [snip]
>>>>
>>>> diff --git a/mercurial/match.py b/mercurial/match.py
>>>> --- a/mercurial/match.py
>>>> +++ b/mercurial/match.py
>>>> @@ -386,7 +386,8 @@
>>>>       def __init__(self, root, cwd, patterns, include, exclude,
>>>> default, auditor,
>>>>                    ctx, listsubrepos=False, badfn=None):
>>>>           init = super(icasefsmatcher, self).__init__
>>>> -        self._dsnormalize = ctx.repo().dirstate.normalize
>>>> +        self._dirstate = ctx.repo().dirstate
>>>
>>> Can we keep a dirstate reference in the matcher without trouble?
>>
>> It's safe for the icasefsmatcher class because that is only created
>> inside workingctx, so the optional ctx arg is always given.  But there
>> are some creators of matcher that don't pass ctx, so we should probably
>> avoid a maybe-its-valid-or-maybe-not member there.  (Not to mention that
>> the base matcher may not be for a wdir()).
>
> So, I'm now more confused, I initially asked the question with two  
> possible issue in mind:
>
> - reference cycle,
> - layer violation,
>
> But reading your reply, it seems like the code in your patch "Is working  
> fine now" but unprotected againt other (valid) way to call this code  
> that could crash.
>
> Did I got this right? If so, this looks like a bad idea, isn't it?

I misunderstood then- I thought you were nudging me to move dirstate into  
the base matcher class for some reason.

But basically, what Yuya said.  I originally did all of this normalizing  
in dirstate [1], but sid0 had performance concerns.  I don't think there's  
any real danger of this special matcher class being used anywhere else  
though.

[1] https://selenic.com/pipermail/mercurial-devel/2015-April/068183.html
Matt Mackall - Aug. 11, 2015, 10:13 p.m.
On Thu, 2015-08-06 at 21:56 -0400, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1438909216 14400
> #      Thu Aug 06 21:00:16 2015 -0400
> # Branch stable
> # Node ID c54ca3f987fd1eaaba3a26d62806270fff535346
> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
> match: fix a caseonly rename + explicit path commit on icasefs (issue4768)

Looks fine to me. Queued for stable, thanks.

Patch

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -386,7 +386,8 @@ 
     def __init__(self, root, cwd, patterns, include, exclude, default, auditor,
                  ctx, listsubrepos=False, badfn=None):
         init = super(icasefsmatcher, self).__init__
-        self._dsnormalize = ctx.repo().dirstate.normalize
+        self._dirstate = ctx.repo().dirstate
+        self._dsnormalize = self._dirstate.normalize
 
         init(root, cwd, patterns, include, exclude, default, auditor=auditor,
              ctx=ctx, listsubrepos=listsubrepos, badfn=badfn)
@@ -402,7 +403,13 @@ 
         kindpats = []
         for kind, pats, source in self._kp:
             if kind not in ('re', 'relre'):  # regex can't be normalized
+                p = pats
                 pats = self._dsnormalize(pats)
+
+                # Preserve the original to handle a case only rename.
+                if p != pats and p in self._dirstate:
+                    kindpats.append((kind, p, source))
+
             kindpats.append((kind, pats, source))
         return kindpats
 
diff --git a/tests/test-add.t b/tests/test-add.t
--- a/tests/test-add.t
+++ b/tests/test-add.t
@@ -232,9 +232,17 @@ 
   -xyz
   +def
 
+  $ hg mv CapsDir1/CapsDir/abc.txt CapsDir1/CapsDir/ABC.txt
+  moving CapsDir1/CapsDir/AbC.txt to CapsDir1/CapsDir/ABC.txt (glob)
+  $ hg ci -m "case changing rename" CapsDir1/CapsDir/AbC.txt CapsDir1/CapsDir/ABC.txt
+
+  $ hg status -A capsdir1/capsdir
+  M CapsDir1/CapsDir/SubDir/Def.txt
+  C CapsDir1/CapsDir/ABC.txt
+
   $ hg remove -f 'glob:**.txt' -X capsdir1/capsdir
   $ hg remove -f 'glob:**.txt' -I capsdir1/capsdir
-  removing CapsDir1/CapsDir/AbC.txt (glob)
+  removing CapsDir1/CapsDir/ABC.txt (glob)
   removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
 #endif