Patchwork [STABLE] subrepo: handle 'C:' style paths on the command line (issue5770)

login
register
mail settings
Submitter Matt Harbison
Date Jan. 21, 2018, 8:50 p.m.
Message ID <a79c2c7aeffeee2f4681.1516567833@Envy>
Download mbox | patch
Permalink /patch/27024/
State Accepted
Headers show

Comments

Matt Harbison - Jan. 21, 2018, 8:50 p.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1516560845 18000
#      Sun Jan 21 13:54:05 2018 -0500
# Node ID a79c2c7aeffeee2f4681f5410864ef688e688b36
# Parent  1dd5724664ab53e0cf2a69809bb3f4158d4e30d7
subrepo: handle 'C:' style paths on the command line (issue5770)

If you think 'C:' and 'C:\' are equivalent paths, see the inline comment before
proceeding.

The problem here was that several commands that take a URL argument (incoming,
outgoing, pull, and push) will use that value to set 'repo._subtoppath' on the
repository object after command specific manipulation of it, but before
converting it to an absolute path.  When an operation is performed on a relative
subrepo, subrepo._abssource() will posixpath.join() this value with the relative
subrepo path.  That adds a '/' after the drive letter, changing how it is
evaluated by abspath()/realpath() in vfsmod.vfs(..., realpath=True) as the
subrepo is instantiated.

I initially tried sanitizing the path in url.localpath(), because url.isabs()
only checks that it starts with a drive letter.  By the sample behavior, this is
clearly not an absolute path.  (Though the comment in isabs() is weasely- this
style path can't be joined either.)  But not everything funnels through there,
and it required explicitly calling localpath() in hg.parseurl() and assigning to
url.path to fix.  But then tests failed with urls like 'a#0'.

Next up was sanitizing the path in the url constructor.  That caused doctest
failures, because there are drive letter tests, so those got expanded in system
specific ways.  Yuya correctly pointed out that util.url is a parser, and
shouldn't be substituting the path too.

Rather than fixing every command call site, just convert it in the common
subrepo location.  I don't see any sanitizing on the path config options, so I
fixed those too.  Note that while the behavior is fixed here, there are still
places where 'comparing with C:' gets printed out, and that's not great for
debugging purposes.  (Specifically I saw it in `hg incoming -B C:`, without
subrepos.)  While clone will write out an absolute default path, I wonder what
would happen if a user edited that path to be 'C:'.  (I don't think supporting
relative paths in .hgrc is a sane thing to do, but while we're poking holes in
things...)

Since this is such an oddball case, it still leaks through in places, and there
seems to be a lot of duplicate url parsing, maybe the url parsing should be
moved to dispatch, and provide the command with a url object?  Then we could
convert this to an absolute path once, and not have to worry about it in the
rest of the code.

I also checked '--cwd C:' on the command line, and it was previously working
because os.chdir() will DTRT.

Finally, one other note from the url.localpath() experimenting.  I don't see any
cases where 'self._hostport' can hold a drive letter.  So I'm wondering if that
is wrong/old code.
Yuya Nishihara - Jan. 22, 2018, 12:38 p.m.
On Sun, 21 Jan 2018 15:50:33 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1516560845 18000
> #      Sun Jan 21 13:54:05 2018 -0500
> # Node ID a79c2c7aeffeee2f4681f5410864ef688e688b36
> # Parent  1dd5724664ab53e0cf2a69809bb3f4158d4e30d7
> subrepo: handle 'C:' style paths on the command line (issue5770)

Queued, many thanks.

> Rather than fixing every command call site, just convert it in the common
> subrepo location.  I don't see any sanitizing on the path config options, so I
> fixed those too.  Note that while the behavior is fixed here, there are still
> places where 'comparing with C:' gets printed out, and that's not great for
> debugging purposes.  (Specifically I saw it in `hg incoming -B C:`, without
> subrepos.)  While clone will write out an absolute default path, I wonder what
> would happen if a user edited that path to be 'C:'.  (I don't think supporting
> relative paths in .hgrc is a sane thing to do, but while we're poking holes in
> things...)
> 
> Since this is such an oddball case, it still leaks through in places, and there
> seems to be a lot of duplicate url parsing, maybe the url parsing should be
> moved to dispatch, and provide the command with a url object?  Then we could
> convert this to an absolute path once, and not have to worry about it in the
> rest of the code.

It wouldn't be easy to determine which argument is a URL at dispatcher level.
Perhaps we can instead add a utility function. FYI, there's another weird bug
you might be interested. :)

  $ mkdir '$foo'
  $ cd '$foo'
  $ hg init
  $ foo=bar hg root
  abort: repository $TESTTMP/$foo not found!

https://bz.mercurial-scm.org/show_bug.cgi?id=5739

> +        path = None
>          if util.safehasattr(repo, '_subtoppath'):
> -            return repo._subtoppath
> -        if push and repo.ui.config('paths', 'default-push'):
> -            return repo.ui.config('paths', 'default-push')
> -        if repo.ui.config('paths', 'default'):
> -            return repo.ui.config('paths', 'default')

I don't fully understand the _subtopath magic, but maybe we could resolve
the peer subrepo URLs recursively from the peerrepo.url(), not from the
localrepo attribute.
Matt Harbison - Jan. 23, 2018, 1:33 a.m.
On Mon, 22 Jan 2018 07:38:39 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 21 Jan 2018 15:50:33 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1516560845 18000
>> #      Sun Jan 21 13:54:05 2018 -0500
>> # Node ID a79c2c7aeffeee2f4681f5410864ef688e688b36
>> # Parent  1dd5724664ab53e0cf2a69809bb3f4158d4e30d7
>> subrepo: handle 'C:' style paths on the command line (issue5770)
>
> Queued, many thanks.
>
>> Rather than fixing every command call site, just convert it in the  
>> common
>> subrepo location.  I don't see any sanitizing on the path config  
>> options, so I
>> fixed those too.  Note that while the behavior is fixed here, there are  
>> still
>> places where 'comparing with C:' gets printed out, and that's not great  
>> for
>> debugging purposes.  (Specifically I saw it in `hg incoming -B C:`,  
>> without
>> subrepos.)  While clone will write out an absolute default path, I  
>> wonder what
>> would happen if a user edited that path to be 'C:'.  (I don't think  
>> supporting
>> relative paths in .hgrc is a sane thing to do, but while we're poking  
>> holes in
>> things...)
>>
>> Since this is such an oddball case, it still leaks through in places,  
>> and there
>> seems to be a lot of duplicate url parsing, maybe the url parsing  
>> should be
>> moved to dispatch, and provide the command with a url object?  Then we  
>> could
>> convert this to an absolute path once, and not have to worry about it  
>> in the
>> rest of the code.
>
> It wouldn't be easy to determine which argument is a URL at dispatcher  
> level. Perhaps we can instead add a utility function.

I was afraid of that.  But each of the related commands.py functions have  
an explicit source/dest parameter, so I figured there had to be some magic  
in dispatch to line this up.

The good news is explicit file paths work without help:

   D:\test>hg cat -R C: C:.hgeol
   [...]
   D:\test>hg cat --cwd C: C:.hgeol
   [...]

> FYI, there's another weird bug
> you might be interested. :)
>
>   $ mkdir '$foo'
>   $ cd '$foo'
>   $ hg init
>   $ foo=bar hg root
>   abort: repository $TESTTMP/$foo not found!

Yikes.  I'm assuming the expandpath() is in there to handle `hg init  
~\myrepo` on Windows.  Moving it out would avoid breaking existing repos,  
but how do we avoid the BC when given on the command line?

> https://bz.mercurial-scm.org/show_bug.cgi?id=5739
>
>> +        path = None
>>          if util.safehasattr(repo, '_subtoppath'):
>> -            return repo._subtoppath
>> -        if push and repo.ui.config('paths', 'default-push'):
>> -            return repo.ui.config('paths', 'default-push')
>> -        if repo.ui.config('paths', 'default'):
>> -            return repo.ui.config('paths', 'default')
>
> I don't fully understand the _subtopath magic, but maybe we could resolve
> the peer subrepo URLs recursively from the peerrepo.url(), not from the
> localrepo attribute.

I don't think there's much magic here other than to recursively climb to  
the top level repo to get the base path, and then as the stack unwinds,  
tack on the current repo's relative path.  These path checks were probably  
added in the order somebody thought of them.  Certainly the shared path  
was a late add.

It's an interesting idea, I'll keep it in mind.
Yuya Nishihara - Jan. 23, 2018, 11:57 a.m.
On Mon, 22 Jan 2018 20:33:21 -0500, Matt Harbison wrote:
> On Mon, 22 Jan 2018 07:38:39 -0500, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Sun, 21 Jan 2018 15:50:33 -0500, Matt Harbison wrote:
> >> Rather than fixing every command call site, just convert it in the  
> >> common
> >> subrepo location.  I don't see any sanitizing on the path config  
> >> options, so I
> >> fixed those too.  Note that while the behavior is fixed here, there are  
> >> still
> >> places where 'comparing with C:' gets printed out, and that's not great  
> >> for
> >> debugging purposes.  (Specifically I saw it in `hg incoming -B C:`,  
> >> without
> >> subrepos.)  While clone will write out an absolute default path, I  
> >> wonder what
> >> would happen if a user edited that path to be 'C:'.  (I don't think  
> >> supporting
> >> relative paths in .hgrc is a sane thing to do, but while we're poking  
> >> holes in
> >> things...)
> >>
> >> Since this is such an oddball case, it still leaks through in places,  
> >> and there
> >> seems to be a lot of duplicate url parsing, maybe the url parsing  
> >> should be
> >> moved to dispatch, and provide the command with a url object?  Then we  
> >> could
> >> convert this to an absolute path once, and not have to worry about it  
> >> in the
> >> rest of the code.
> >
> > It wouldn't be easy to determine which argument is a URL at dispatcher  
> > level. Perhaps we can instead add a utility function.
> 
> I was afraid of that.  But each of the related commands.py functions have  
> an explicit source/dest parameter, so I figured there had to be some magic  
> in dispatch to line this up.

It's just an alternative form to *args.

> > FYI, there's another weird bug
> > you might be interested. :)
> >
> >   $ mkdir '$foo'
> >   $ cd '$foo'
> >   $ hg init
> >   $ foo=bar hg root
> >   abort: repository $TESTTMP/$foo not found!
> 
> Yikes.  I'm assuming the expandpath() is in there to handle `hg init  
> ~\myrepo` on Windows.  Moving it out would avoid breaking existing repos,  
> but how do we avoid the BC when given on the command line?

Here we basically do expandpath(getcwd()). That's the problem. Perhaps we
should instead apply expandpath() only to command arguments.
Yuya Nishihara - Jan. 23, 2018, 12:29 p.m.
On Tue, 23 Jan 2018 20:57:54 +0900, Yuya Nishihara wrote:
> > >   $ mkdir '$foo'
> > >   $ cd '$foo'
> > >   $ hg init
> > >   $ foo=bar hg root
> > >   abort: repository $TESTTMP/$foo not found!
> > 
> > Yikes.  I'm assuming the expandpath() is in there to handle `hg init  
> > ~\myrepo` on Windows.  Moving it out would avoid breaking existing repos,  
> > but how do we avoid the BC when given on the command line?
> 
> Here we basically do expandpath(getcwd()). That's the problem. Perhaps we
> should instead apply expandpath() only to command arguments.

More precisely, command arguments (maybe only on Windows?) and paths read
from hgrc.

Patch

diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
--- a/mercurial/subrepo.py
+++ b/mercurial/subrepo.py
@@ -398,15 +398,35 @@ 
             parent.path = posixpath.normpath(parent.path)
             return bytes(parent)
     else: # recursion reached top repo
+        path = None
         if util.safehasattr(repo, '_subtoppath'):
-            return repo._subtoppath
-        if push and repo.ui.config('paths', 'default-push'):
-            return repo.ui.config('paths', 'default-push')
-        if repo.ui.config('paths', 'default'):
-            return repo.ui.config('paths', 'default')
-        if repo.shared():
-            # chop off the .hg component to get the default path form
+            path = repo._subtoppath
+        elif push and repo.ui.config('paths', 'default-push'):
+            path = repo.ui.config('paths', 'default-push')
+        elif repo.ui.config('paths', 'default'):
+            path = repo.ui.config('paths', 'default')
+        elif repo.shared():
+            # chop off the .hg component to get the default path form.  This has
+            # already run through vfsmod.vfs(..., realpath=True), so it doesn't
+            # have problems with 'C:'
             return os.path.dirname(repo.sharedpath)
+        if path:
+            # issue5770: 'C:\' and 'C:' are not equivalent paths.  The former is
+            # as expected: an absolute path to the root of the C: drive.  The
+            # latter is a relative path, and works like so:
+            #
+            #   C:\>cd C:\some\path
+            #   C:\>D:
+            #   D:\>python -c "import os; print os.path.abspath('C:')"
+            #   C:\some\path
+            #
+            #   D:\>python -c "import os; print os.path.abspath('C:relative')"
+            #   C:\some\path\relative
+            if util.hasdriveletter(path):
+                if len(path) == 2 or path[2:3] not in r'\/':
+                    path = os.path.abspath(path)
+            return path
+
     if abort:
         raise error.Abort(_("default path for subrepository not found"))