Patchwork [6,of,6,v2] paths: allow util.finddirs touse os.path.sep

login
register
mail settings
Submitter Kostia Balytskyi
Date Sept. 19, 2017, 4:57 p.m.
Message ID <0a561e2299b6880dd0e6.1505840270@devvm1416.lla2.facebook.com>
Download mbox | patch
Permalink /patch/24029/
State Deferred, archived
Headers show

Comments

Kostia Balytskyi - Sept. 19, 2017, 4:57 p.m.
# HG changeset patch
# User Kostia Balytskyi <ikostia@fb.com>
# Date 1505837708 25200
#      Tue Sep 19 09:15:08 2017 -0700
# Node ID 0a561e2299b6880dd0e6e91294c92bbc0aef726b
# Parent  29007a29700f107b7fd5bf139fa192f9528d6349
paths: allow util.finddirs touse os.path.sep
Augie Fackler - Sept. 20, 2017, 4:46 p.m.
On Tue, Sep 19, 2017 at 09:57:50AM -0700, Kostia Balytskyi wrote:
> # HG changeset patch
> # User Kostia Balytskyi <ikostia@fb.com>
> # Date 1505837708 25200
> #      Tue Sep 19 09:15:08 2017 -0700
> # Node ID 0a561e2299b6880dd0e6e91294c92bbc0aef726b
> # Parent  29007a29700f107b7fd5bf139fa192f9528d6349
> paths: allow util.finddirs touse os.path.sep

This all sounds necessary, but I lack the required Windows
understanding to know if it's a good idea to do it this specific way.

I've added our resident Windows experts to see what they think. The
patches themselves look reasonably straightforward, and you can
probably give a thumbs-up or thumbs-down on the basis of the approach
outlined in patch 1.

>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -3130,11 +3130,12 @@ class dirs(object):
>  if safehasattr(parsers, 'dirs'):
>      dirs = parsers.dirs
>
> -def finddirs(path):
> -    pos = path.rfind('/')
> +def finddirs(path, useossep=False):
> +    sep = os.path.sep if useossep else '/'
> +    pos = path.rfind(sep)
>      while pos != -1:
>          yield path[:pos]
> -        pos = path.rfind('/', 0, pos)
> +        pos = path.rfind(sep, 0, pos)
>
>  # compression code
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Matt Harbison - Sept. 21, 2017, 4:37 a.m.
> On Sep 20, 2017, at 12:46 PM, Augie Fackler <raf@durin42.com> wrote:
> 
>> On Tue, Sep 19, 2017 at 09:57:50AM -0700, Kostia Balytskyi wrote:
>> # HG changeset patch
>> # User Kostia Balytskyi <ikostia@fb.com>
>> # Date 1505837708 25200
>> #      Tue Sep 19 09:15:08 2017 -0700
>> # Node ID 0a561e2299b6880dd0e6e91294c92bbc0aef726b
>> # Parent  29007a29700f107b7fd5bf139fa192f9528d6349
>> paths: allow util.finddirs touse os.path.sep
> 
> This all sounds necessary, but I lack the required Windows
> understanding to know if it's a good idea to do it this specific way.
> 
> I've added our resident Windows experts to see what they think. The
> patches themselves look reasonably straightforward, and you can
> probably give a thumbs-up or thumbs-down on the basis of the approach
> outlined in patch 1.

I didn't get a chance to really look at this yet.  I've never used long paths myself, so IDK if I'll have any insight.  I do remember mpm not liking this for interoperability reasons [1].  I'm not sure if that situation has improved since then.  (I assume that we don't care about XP and Vista anymore.)  If it hasn't, do we need to do something like use long paths, but abort before creating a file that exceeds MAX_PATH unless some config item is enabled?  That seems nasty, but I'm not sure what else to do.

Alternately, I just found an article that they may have lifted the restriction on Windows 10, if you opt in.  I'm too tired to digest this now, but here it is:

https://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx#maxpath

If we can enable this by editing the manifest, it's probably cleaner than adding more Windows hacks.

[1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2012-April/039382.html

>> 
>> diff --git a/mercurial/util.py b/mercurial/util.py
>> --- a/mercurial/util.py
>> +++ b/mercurial/util.py
>> @@ -3130,11 +3130,12 @@ class dirs(object):
>> if safehasattr(parsers, 'dirs'):
>>     dirs = parsers.dirs
>> 
>> -def finddirs(path):
>> -    pos = path.rfind('/')
>> +def finddirs(path, useossep=False):
>> +    sep = os.path.sep if useossep else '/'
>> +    pos = path.rfind(sep)
>>     while pos != -1:
>>         yield path[:pos]
>> -        pos = path.rfind('/', 0, pos)
>> +        pos = path.rfind(sep, 0, pos)
>> 
>> # compression code
>> 
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Adrian Buehlmann - Sept. 21, 2017, 6:33 a.m.
On 2017-09-21 06:37, Matt Harbison wrote:
> 
> On Sep 20, 2017, at 12:46 PM, Augie Fackler <raf@durin42.com
> <mailto:raf@durin42.com>> wrote:
> 
>> On Tue, Sep 19, 2017 at 09:57:50AM -0700, Kostia Balytskyi wrote:
>>> # HG changeset patch
>>> # User Kostia Balytskyi <ikostia@fb.com <mailto:ikostia@fb.com>>
>>> # Date 1505837708 25200
>>> #      Tue Sep 19 09:15:08 2017 -0700
>>> # Node ID 0a561e2299b6880dd0e6e91294c92bbc0aef726b
>>> # Parent  29007a29700f107b7fd5bf139fa192f9528d6349
>>> paths: allow util.finddirs touse os.path.sep
>>
>> This all sounds necessary, but I lack the required Windows
>> understanding to know if it's a good idea to do it this specific way.
>>
>> I've added our resident Windows experts to see what they think. The
>> patches themselves look reasonably straightforward, and you can
>> probably give a thumbs-up or thumbs-down on the basis of the approach
>> outlined in patch 1.
> 
> I didn't get a chance to really look at this yet.  I've never used long
> paths myself, so IDK if I'll have any insight.  I do remember mpm not
> liking this for interoperability reasons [1].  I'm not sure if that
> situation has improved since then.  (I assume that we don't care about
> XP and Vista anymore.)  If it hasn't, do we need to do something like
> use long paths, but abort before creating a file that exceeds MAX_PATH
> unless some config item is enabled?  That seems nasty, but I'm not sure
> what else to do.
> 
> Alternately, I just found an article that they may have lifted the
> restriction on Windows 10, if you opt in.  I'm too tired to digest this
> now, but here it is:
> 
> https://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx#maxpath
> 
> 
> If we can enable this by editing the manifest, it's probably cleaner
> than adding more Windows hacks.
> 
> 
> [1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2012-April/039382.html

The thing is that lots of tools on Windows traditionally could not
handle files created using the "long path" API.

With this API (as I remember), you can create files with filenames that
Windows explorer couldn't even delete anymore. Those names don't even
need to be long. Some special file names are sufficient in order to keep
explorer from being able to delete them (I would have to dig out
examples again).

I don't know if explorer of e.g. Windows 10 still has this problem
(right now I'm not that interested to check again).

Other tools might have problems too. Think of backup tools.

On the other hand, this "only" affects files in the working directory
(the store file names are carefully encoded) and Mercurial itself should
then at least be able to delete such files, using - for example -
commands like "hg update null". You would then of course need a hg
version which is recent enough (i.e. includes the new "long path" API
calls).

Another mitigating factor (IMHO) is, that Mercurial has become quite a
niche tool on Windows. I think git has largely won the race on Windows.
So, folks who still use Mercurial on Windows are likely rather expert
users which are likely more supposed to know what they do.

Perhaps, it would be sufficient to have some warnings when users try to
add problematic file names (I think we already have some warnings in place).

So, maybe this could be implemented by default-disabling it with some
new config knob, which expert users - who are willing to deal with the
consequences - then can enable.

I once hacked a small tool for getting rid of trees containing such
files: https://bitbucket.org/abuehl/winrm/src . I used that for getting
rid of trees containing files with nasty names used for testing.
Matt Harbison - Sept. 23, 2017, 4:20 a.m.
On Thu, 21 Sep 2017 02:33:47 -0400, Adrian Buehlmann <adrian@cadifra.com>  
wrote:

> On 2017-09-21 06:37, Matt Harbison wrote:
>>
>> On Sep 20, 2017, at 12:46 PM, Augie Fackler <raf@durin42.com
>> <mailto:raf@durin42.com>> wrote:
>>
>>> On Tue, Sep 19, 2017 at 09:57:50AM -0700, Kostia Balytskyi wrote:
>>>> # HG changeset patch
>>>> # User Kostia Balytskyi <ikostia@fb.com <mailto:ikostia@fb.com>>
>>>> # Date 1505837708 25200
>>>> #      Tue Sep 19 09:15:08 2017 -0700
>>>> # Node ID 0a561e2299b6880dd0e6e91294c92bbc0aef726b
>>>> # Parent  29007a29700f107b7fd5bf139fa192f9528d6349
>>>> paths: allow util.finddirs touse os.path.sep
>>>
>>> This all sounds necessary, but I lack the required Windows
>>> understanding to know if it's a good idea to do it this specific way.
>>>
>>> I've added our resident Windows experts to see what they think. The
>>> patches themselves look reasonably straightforward, and you can
>>> probably give a thumbs-up or thumbs-down on the basis of the approach
>>> outlined in patch 1.
>>
>> I didn't get a chance to really look at this yet.  I've never used long
>> paths myself, so IDK if I'll have any insight.  I do remember mpm not
>> liking this for interoperability reasons [1].  I'm not sure if that
>> situation has improved since then.  (I assume that we don't care about
>> XP and Vista anymore.)  If it hasn't, do we need to do something like
>> use long paths, but abort before creating a file that exceeds MAX_PATH
>> unless some config item is enabled?  That seems nasty, but I'm not sure
>> what else to do.
>>
>> Alternately, I just found an article that they may have lifted the
>> restriction on Windows 10, if you opt in.  I'm too tired to digest this
>> now, but here it is:
>>
>> https://msdn.microsoft.com/en-us/library/aa365247(v=vs.85).aspx#maxpath
>>
>>
>> If we can enable this by editing the manifest, it's probably cleaner
>> than adding more Windows hacks.
>>
>>
>> [1]  
>> https://www.mercurial-scm.org/pipermail/mercurial-devel/2012-April/039382.html
>
> The thing is that lots of tools on Windows traditionally could not
> handle files created using the "long path" API.
>
> With this API (as I remember), you can create files with filenames that
> Windows explorer couldn't even delete anymore. Those names don't even
> need to be long. Some special file names are sufficient in order to keep
> explorer from being able to delete them (I would have to dig out
> examples again).
>
> I don't know if explorer of e.g. Windows 10 still has this problem
> (right now I'm not that interested to check again).
>
> Other tools might have problems too. Think of backup tools.

Or merge and diff tools.  I hadn't thought of that before.

> On the other hand, this "only" affects files in the working directory
> (the store file names are carefully encoded) and Mercurial itself should
> then at least be able to delete such files, using - for example -
> commands like "hg update null". You would then of course need a hg
> version which is recent enough (i.e. includes the new "long path" API
> calls).
>
> Another mitigating factor (IMHO) is, that Mercurial has become quite a
> niche tool on Windows. I think git has largely won the race on Windows.
> So, folks who still use Mercurial on Windows are likely rather expert
> users which are likely more supposed to know what they do.

I'm not sure that the expert bit is true in a corporate environment.  But  
the config switch seems like a good idea.

> Perhaps, it would be sufficient to have some warnings when users try to
> add problematic file names (I think we already have some warnings in  
> place).

I thought about that, but it doesn't seem reliable.  Special names aside,  
`cd c:\really_long_path && hg clone http://...` could create 'a.txt' that  
needs special handling.

> So, maybe this could be implemented by default-disabling it with some
> new config knob, which expert users - who are willing to deal with the
> consequences - then can enable.
>
> I once hacked a small tool for getting rid of trees containing such
> files: https://bitbucket.org/abuehl/winrm/src . I used that for getting
> rid of trees containing files with nasty names used for testing.

Kostia- I'll try to look deeper at this series this weekend.  But does the  
suggestion in the 2012 thread above seem reasonable, about just tweaking  
the open function instead of (I assume) trying to pass full \\?\ paths  
around all over?  That seems like it would be easier than trying to get  
the paths converted everywhere, and I think would be transparent to the  
test runner and the end user.

Patch

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -3130,11 +3130,12 @@  class dirs(object):
 if safehasattr(parsers, 'dirs'):
     dirs = parsers.dirs
 
-def finddirs(path):
-    pos = path.rfind('/')
+def finddirs(path, useossep=False):
+    sep = os.path.sep if useossep else '/'
+    pos = path.rfind(sep)
     while pos != -1:
         yield path[:pos]
-        pos = path.rfind('/', 0, pos)
+        pos = path.rfind(sep, 0, pos)
 
 # compression code