Patchwork [2,of,4,V2] clone: use the HTTP 301 location to record the default path

login
register
mail settings
Submitter Matt Harbison
Date Feb. 16, 2017, 9:41 p.m.
Message ID <27a4bc77e8b8e50abc76.1487281269@Envy>
Download mbox | patch
Permalink /patch/18572/
State Changes Requested
Delegated to: Yuya Nishihara
Headers show

Comments

Matt Harbison - Feb. 16, 2017, 9:41 p.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1487087965 18000
#      Tue Feb 14 10:59:25 2017 -0500
# Node ID 27a4bc77e8b8e50abc76c387f117082e5853c47e
# Parent  4f2862487d789edc1f36b5687d828a2914e1dc32
clone: use the HTTP 301 location to record the default path

This is needed to simplify making `hg serve -S` paths the same as `hg serve`.
Since 301 is permanent and cachable, this is arguably the proper behavior
anyway.  For example, once Firefox sees a 301, it automatically forwards,
without querying the original URL again.

Code hosting software may make use of redirects.  I only have access to SCM
Manager.  It uses 302 to access subrepos, so it isn't affected by this.  My
understanding of the python libraries being used is minimal, and this was
inspired by http://www.diveintopython.net/http_web_services/redirects.html.

Since `hg serve` doesn't issue redirects, the added tests aren't showing the new
redirect functionality yet.  But they set a baseline that won't change when
redirects are issued.  Check-commit complains about the methods with '_', but
those names are defined by python.
Yuya Nishihara - Feb. 20, 2017, 2:33 p.m.
On Thu, 16 Feb 2017 16:41:09 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1487087965 18000
> #      Tue Feb 14 10:59:25 2017 -0500
> # Node ID 27a4bc77e8b8e50abc76c387f117082e5853c47e
> # Parent  4f2862487d789edc1f36b5687d828a2914e1dc32
> clone: use the HTTP 301 location to record the default path

> @@ -230,6 +230,16 @@
>          if self._url.rstrip('/') != resp_url.rstrip('/'):
>              if not self.ui.quiet:
>                  self.ui.warn(_('real URL is %s\n') % resp_url)
> +            if resp.status == httplib.MOVED_PERMANENTLY:
> +                u = util.url(resp_url.rstrip('/'))
> +
> +                # The path has auth info on creation.  Restore that here.
> +                creds = self._authinfo
> +                if creds:
> +                    u.user = creds[2]
> +                    u.passwd = creds[3] or None
> +                self.path = str(u)

I didn't carefully review this patch, but is it safe to reuse the credential
for redirected server?
Matt Harbison - Feb. 21, 2017, 1:29 a.m.
On Mon, 20 Feb 2017 09:33:05 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Thu, 16 Feb 2017 16:41:09 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1487087965 18000
>> #      Tue Feb 14 10:59:25 2017 -0500
>> # Node ID 27a4bc77e8b8e50abc76c387f117082e5853c47e
>> # Parent  4f2862487d789edc1f36b5687d828a2914e1dc32
>> clone: use the HTTP 301 location to record the default path
>
>> @@ -230,6 +230,16 @@
>>          if self._url.rstrip('/') != resp_url.rstrip('/'):
>>              if not self.ui.quiet:
>>                  self.ui.warn(_('real URL is %s\n') % resp_url)
>> +            if resp.status == httplib.MOVED_PERMANENTLY:
>> +                u = util.url(resp_url.rstrip('/'))
>> +
>> +                # The path has auth info on creation.  Restore that  
>> here.
>> +                creds = self._authinfo
>> +                if creds:
>> +                    u.user = creds[2]
>> +                    u.passwd = creds[3] or None
>> +                self.path = str(u)
>
> I didn't carefully review this patch, but is it safe to reuse the  
> credential
> for redirected server?

Hmmm.. Maybe not.  At least for the clone path, it isn't even necessary,  
because the password is removed before writing out the hgrc file.  I  
didn't see that until after I added this, and then forgot to remove it.   
The username is stored in the hgrc file though.

It looks like url.passwordmgr stores the username/password, but I don't  
have a handy setup that will password prompt and redirect to see if that  
info can easily be obtained here.  I have no problem with dropping the 'if  
creds' (and related changes not shown here) bits.

Patch

diff --git a/mercurial/hg.py b/mercurial/hg.py
--- a/mercurial/hg.py
+++ b/mercurial/hg.py
@@ -621,6 +621,15 @@ 
 
         destrepo = destpeer.local()
         if destrepo:
+            # Update for 301 redirects
+            endsrc = srcpeer.url()
+            if not islocal(endsrc):
+                abspath = endsrc
+
+                # statichttprepository never sees the 'static-' prefix, so that
+                # need to be accounted for here.
+                if origsource.startswith('static-http'):
+                    abspath = 'static-' + abspath
             template = uimod.samplehgrcs['cloned']
             fp = destrepo.vfs("hgrc", "w", text=True)
             u = util.url(abspath)
diff --git a/mercurial/httppeer.py b/mercurial/httppeer.py
--- a/mercurial/httppeer.py
+++ b/mercurial/httppeer.py
@@ -88,12 +88,12 @@ 
                              (u.query or u.fragment))
 
         # urllib cannot handle URLs with embedded user or passwd
-        self._url, authinfo = u.authinfo()
+        self._url, self._authinfo = u.authinfo()
 
         self.ui = ui
         self.ui.debug('using %s\n' % self._url)
 
-        self.urlopener = url.opener(ui, authinfo)
+        self.urlopener = url.opener(ui, self._authinfo)
         self.requestbuilder = urlreq.request
 
     def __del__(self):
@@ -230,6 +230,16 @@ 
         if self._url.rstrip('/') != resp_url.rstrip('/'):
             if not self.ui.quiet:
                 self.ui.warn(_('real URL is %s\n') % resp_url)
+            if resp.status == httplib.MOVED_PERMANENTLY:
+                u = util.url(resp_url.rstrip('/'))
+
+                # The path has auth info on creation.  Restore that here.
+                creds = self._authinfo
+                if creds:
+                    u.user = creds[2]
+                    u.passwd = creds[3] or None
+                self.path = str(u)
+
         self._url = resp_url
         try:
             proto = resp.getheader('content-type')
diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -227,6 +227,7 @@ 
         "HTTPDigestAuthHandler",
         "HTTPHandler",
         "HTTPPasswordMgrWithDefaultRealm",
+        "HTTPRedirectHandler",
         "HTTPSHandler",
         "install_opener",
         "ProxyHandler",
diff --git a/mercurial/url.py b/mercurial/url.py
--- a/mercurial/url.py
+++ b/mercurial/url.py
@@ -272,6 +272,30 @@ 
 
     return False
 
+class httpredirecthandler(urlreq.httpredirecthandler):
+    def __init__(self):
+        self._permmoved = True
+
+    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
+        '''Called by all http_error_3xx() to monitor redirect types seen'''
+        # Avoid treating 302 -> 301 -> 200 or 301 -> 302 -> 200 as permanent
+        # redirects.
+        self._permmoved = self._permmoved and code == httplib.MOVED_PERMANENTLY
+
+        impl = urlreq.httpredirecthandler.redirect_request
+        return impl(self, req, fp, code, msg, hdrs, newurl)
+
+    def http_error_301(self, req, fp, code, msg, headers):
+        '''Capture the permanent redirect status for later access'''
+        impl = urlreq.httpredirecthandler.http_error_301
+        result = impl(self, req, fp, code, msg, headers)
+
+        # For an unbroken chain of 301, indicate 301 in the status.  Otherwise,
+        # keep the 200 status.
+        if self._permmoved:
+            result.status = code
+        return result
+
 class httphandler(keepalive.HTTPHandler):
     def http_open(self, req):
         return self.do_open(httpconnection, req)
@@ -437,6 +461,7 @@ 
             handlers.append(httpshandler(ui))
 
     handlers.append(proxyhandler(ui))
+    handlers.append(httpredirecthandler())
 
     passmgr = passwordmgr(ui, ui.httppasswordmgrdb)
     if authinfo is not None:
diff --git a/tests/test-subrepo-deep-nested-change.t b/tests/test-subrepo-deep-nested-change.t
--- a/tests/test-subrepo-deep-nested-change.t
+++ b/tests/test-subrepo-deep-nested-change.t
@@ -84,20 +84,20 @@ 
   adding sub2/ = $TESTTMP/main/sub1/sub2 (glob)
   $ cat hg1.pid >> $DAEMON_PIDS
 
-  $ hg clone http://localhost:$HGPORT/main httpclone --config progress.disable=True
+  $ hg clone http://user@localhost:$HGPORT/main httpclone --config progress.disable=True
   requesting all changes
   adding changesets
   adding manifests
   adding file changes
   added 1 changesets with 3 changes to 3 files
   updating to branch default
-  cloning subrepo sub1 from http://localhost:$HGPORT/sub1
+  cloning subrepo sub1 from http://user@localhost:$HGPORT/sub1
   requesting all changes
   adding changesets
   adding manifests
   adding file changes
   added 1 changesets with 3 changes to 3 files
-  cloning subrepo sub1/sub2 from http://localhost:$HGPORT/sub2 (glob)
+  cloning subrepo sub1/sub2 from http://user@localhost:$HGPORT/sub2 (glob)
   requesting all changes
   adding changesets
   adding manifests
@@ -105,6 +105,11 @@ 
   added 1 changesets with 1 changes to 1 files
   3 files updated, 0 files merged, 0 files removed, 0 files unresolved
 
+  $ hg -R httpclone showconfig paths.default
+  http://user@localhost:$HGPORT/main
+  $ hg -R httpclone/sub1 showconfig paths.default
+  http://user@localhost:$HGPORT/sub1
+
   $ cat access.log
   * "GET /main?cmd=capabilities HTTP/1.1" 200 - (glob)
   * "GET /main?cmd=batch HTTP/1.1" 200 - * (glob)
diff --git a/tests/test-subrepo-recursion.t b/tests/test-subrepo-recursion.t
--- a/tests/test-subrepo-recursion.t
+++ b/tests/test-subrepo-recursion.t
@@ -263,20 +263,20 @@ 
   adding repo/foo/bar/ = $TESTTMP/repo/foo/bar (glob)
   $ cat hg1.pid >> $DAEMON_PIDS
 
-  $ hg clone http://localhost:$HGPORT/repo clone  --config progress.disable=True
+  $ hg clone http://user:pass@localhost:$HGPORT/repo clone  --config progress.disable=True
   requesting all changes
   adding changesets
   adding manifests
   adding file changes
   added 3 changesets with 5 changes to 3 files
   updating to branch default
-  cloning subrepo foo from http://localhost:$HGPORT/repo/foo
+  cloning subrepo foo from http://user@localhost:$HGPORT/repo/foo
   requesting all changes
   adding changesets
   adding manifests
   adding file changes
   added 4 changesets with 7 changes to 3 files
-  cloning subrepo foo/bar from http://localhost:$HGPORT/repo/foo/bar (glob)
+  cloning subrepo foo/bar from http://user@localhost:$HGPORT/repo/foo/bar (glob)
   requesting all changes
   adding changesets
   adding manifests
@@ -284,6 +284,11 @@ 
   added 3 changesets with 3 changes to 1 files
   3 files updated, 0 files merged, 0 files removed, 0 files unresolved
 
+  $ hg -R clone showconfig paths.default
+  http://user@localhost:$HGPORT/repo
+  $ hg -R clone/foo showconfig paths.default
+  http://user@localhost:$HGPORT/repo/foo
+
   $ cat clone/foo/bar/z.txt
   z1
   z2