Patchwork D3433: httppeer: detect redirect to URL without query string (issue5860)

login
register
mail settings
Submitter phabricator
Date April 30, 2018, 11:31 p.m.
Message ID <differential-rev-PHID-DREV-se62v32tq5aggaanjppb-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/31246/
State Superseded
Headers show

Comments

phabricator - April 30, 2018, 11:31 p.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  https://phab.mercurial-scm.org/rHG197d10e157ce848129ff5e7a53cf81d4ca63a932 subtly changed the HTTP peer's handling of HTTP redirects.
  
  Before that changeset, we instantiated an HTTP peer instance and
  performed the capabilities lookup with that instance. The old code had
  the following relevant properties:
  
  1. The HTTP request layer would automatically follow HTTP redirects.
  2. An encountered HTTP redirect would update a peer instance variable pointing to the repo URL.
  3. The peer would automagically perform a "capabilities" command request if a caller requested capabilities but capabilities were not yet defined.
  
  The first HTTP request issued by a peer is for ?cmd=capabilities. If
  the server responds with an HTTP redirect to a ?cmd=capabilities URL,
  the HTTP request layer automatically followed it, retrieved a valid
  capabilities response, and the peer's base URL was updated
  automatically so subsequent requests used the proper URL. In other
  words, things "just worked."
  
  In the case where the server redirected to a URL without the
  ?cmd=capabilities query string, the HTTP request layer would follow
  the redirect and likely encounter HTML. The peer's base URL would be
  updated and the unexpected Content-Type would raise a RepoError. We
  would catch RepoError and immediately call between() (testing the case
  for pre 0.9.1 servers not supporting the "capabilities" command). e.g.
  
    try:
        inst._fetchcaps()
    except error.RepoError:
        inst.between([(nullid, nullid)])
  
  between() would eventually call into _callstream(). And _callstream()
  made a call to self.capable('httpheader'). capable() would call
  self.capabilities(), which would see that no capabilities were set
  (because HTML was returned for that request) and call the "capabilities"
  command to fetch capabilities. Because the base URL had been updated
  from the redirect, this 2nd "capabilities" command would succeed and
  the client would immediately call "between," which would also succeed.
  The legacy handshake succeeded. Only because "capabilities" was
  successfully executed as a side effect did the peer recognize that it
  was talking to a modern server. In other words, this all appeared to
  work accidentally.
  
  After https://phab.mercurial-scm.org/rHG197d10e157ce848129ff5e7a53cf81d4ca63a932, we stopped calling the "capabilities" command on
  the peer instance. Instead, we made the request via a low-level opener,
  detected the redirect as part of response handling code, and passed the
  redirected URL into the constructed peer instance.
  
  For cases where the redirected URL included the query string, this
  "just worked." But for cases where the redirected URL stripped the query
  string, we threw RepoError and because we removed the "between" handshake
  fallback, we fell through to the "is a static HTTP repo" check and
  performed an HTTP request for .hg/requires.
  
  While https://phab.mercurial-scm.org/rHG197d10e157ce848129ff5e7a53cf81d4ca63a932 was marked as backwards incompatible, the only
  intended backwards incompatible behavior was not performing the
  "between" fallback. It was not realized that the "between" command
  had the side-effect of recovering from an errant redirect that
  dropped the query string.
  
  This commit restores the previous behavior and allows clients to
  handle a redirect that drops the query string. In the case where
  the request is redirected and the query string is dropped, we raise
  a special case of RepoError. We then catch this special exception in
  the handshake code and perform another "capabilities" request against
  the redirected URL. If that works, all is well. Otherwise, we fall back
  to the "is a static HTTP repo" check.
  
  The new could is arguably better than before https://phab.mercurial-scm.org/rHG197d10e157ce848129ff5e7a53cf81d4ca63a932, as it is
  explicit about the expected behavior and we avoid performing a
  "between" request, saving a server round trip.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

AFFECTED FILES
  mercurial/httppeer.py
  tests/test-http-protocol.t

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - April 30, 2018, 11:38 p.m.
indygreg added a comment.


  So it is documented somewhere, https://phab.mercurial-scm.org/rHG197d10e157ce848129ff5e7a53cf81d4ca63a932 only regressed behavior for the "redirect drops query string case:" the new code handled HTTP redirects with the query string properly. Although it doesn't appear as if there were any tests for it. This commit adds tests for both the "good" and "bad" HTTP redirect cases.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - May 1, 2018, 2:07 p.m.
mharbison72 added a comment.


  This fixes the issue, thanks again.  The only thing I noticed in practice is that the "real URL is xxx" notice is printed twice.  The trace looks sane, (and the test captures this too), so I think it's purely cosmetic.  The *.t test fails on Windows though:
  
    --- e:/Projects/hg/tests/test-http-protocol.t
    +++ e:/Projects/hg/tests/test-http-protocol.t.err
    @@ -395,19 +395,42 @@
       s>     Content-Length: 10\r\n
       s>     \r\n
       s>     redirected
    -  s>     GET /redirected?cmd=capabilities HTTP/1.1\r\n
    -  s>     Accept-Encoding: identity\r\n
    -  s>     user-agent: test\r\n
    -  s>     host: $LOCALIP:$HGPORT\r\n (glob)
    -  s>     \r\n
    -  s> makefile('rb', None)
    -  s>     HTTP/1.1 200 Script output follows\r\n
    -  s>     Server: testing stub value\r\n
    -  s>     Date: $HTTP_DATE$\r\n
    -  s>     Content-Type: application/mercurial-0.1\r\n
    -  s>     Content-Length: 453\r\n
    -  s>     \r\n
    -  s>     batch branchmap $USUAL_BUNDLE2_CAPS_SERVER$ changegroupsubset compression=$BUNDLE2_COMPRESSIONS$ get
    bundle httpheader=1024 httpmediatype=0.1rx,0.1tx,0.2tx known lookup pushkey streamreqs=generaldelta,revlogv1 u
    nbundle=HG10GZ,HG10BZ,HG10UN unbundlehash
    +  ** Unknown exception encountered with possibly-broken third-party extension drawdag
    +  ** which supports versions unknown of Mercurial.
    +  ** Please disable drawdag and try your action again.
    +  ** If that fixes the bug please report it to the extension author.
    +  ** Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 20:42:59) [MSC v.1500 32 bit (Intel)]
    +  ** Mercurial Distributed SCM (version 4.6rc1+5-80695628adcb)
    +  ** Extensions loaded: drawdag
    +  Traceback (most recent call last):
    +    File "e:/Projects/hg/hg", line 41, in <module>
    +      dispatch.run()
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 90, in run
    +      status = (dispatch(req) or 0)
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 210, in dispatch
    +      ret = _runcatch(req)
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 351, in _runcatch
    +      return _callcatch(ui, _runcatchfunc)
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 359, in _callcatch
    +      return scmutil.callcatch(ui, func)
    +    File "e:\Projects\hg\mercurial\scmutil.py", line 160, in callcatch
    +      return func()
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 341, in _runcatchfunc
    +      return _dispatch(req)
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 971, in _dispatch
    +      cmdpats, cmdoptions)
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 727, in runcommand
    +      ret = _runcommand(ui, options, cmd, d)
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 979, in _runcommand
    +      return cmdfunc()
    +    File "e:\Projects\hg\mercurial\dispatch.py", line 968, in <lambda>
    +      d = lambda: util.checksignature(func)(ui, *args, **strcmdopt)
    +    File "e:\Projects\hg\mercurial\util.py", line 1553, in check
    +      return func(*args, **kwargs)
    +    File "e:\Projects\hg\mercurial\debugcommands.py", line 3091, in debugwireproto
    +      e.read()
    +  AttributeError: 'URLError' object has no attribute 'read'
    +  [1]

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 2, 2018, 4:53 p.m.
indygreg added a comment.


  I have no clue how this test would be failing on Windows but not on other platforms :/

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 2, 2018, 5:28 p.m.
mharbison72 added a comment.


  In https://phab.mercurial-scm.org/D3433#54703, @indygreg wrote:
  
  > I have no clue how this test would be failing on Windows but not on other platforms :/
  
  
  Me either.  But I didn't see it in the docs for this class, or when I ran `dir(URLError)`.  (I also tried on FreeBSD 11, where the test works.)  So maybe this is being added in at runtime?  While I'd like to see the tests green on stable, I'm more worried about if this is laying on other error paths for non-debug commands on Windows, that generally aren't exercised.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 2:44 a.m.
mharbison72 added a comment.


  It looks like the redirect is subtly changing the host.  I printed `req.get_full_url()` in keepalive.do_open(), and got:
  
    keepalive: http://$LOCALIP:$HGPORT/redirector?cmd=capabilities
    s>     GET /redirector?cmd=capabilities HTTP/1.1\r\n
    s>     Accept-Encoding: identity\r\n
    s>     user-agent: test\r\n
    s>     host: $LOCALIP:$HGPORT\r\n (glob)
    s>     \r\n
    s> makefile('rb', None)
    s>     HTTP/1.1 301 Redirect\r\n
    s>     Server: testing stub value\r\n
    s>     Date: $HTTP_DATE$\r\n
    s>     Location: http://$LOCALIP:$HGPORT/redirected?cmd=capabilities\r\n (glob)
    s>     Content-Type: text/plain\r\n
    s>     Content-Length: 10\r\n
    s>     \r\n
    s>     redirected
    keepalive: http://Envy:$HGPORT/redirected?cmd=capabilities
  
  The subsequent httppeer test is failing with connection refused.  This looks like as promising a lead as any.  I wonder if this e.read() (and any others) needs to check that the attribute exists first, in case of failures without data available.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 3:10 a.m.
mharbison72 added a comment.


  With these fixes, it works on Windows. I'm not sure if the advertisedbaseurl a couple lines above needs to be adjusted too.
  
  I've got no idea why there seems to be a subtle difference between baseurl and advertisedbaseurl on Windows.  Nor why what I think is the raw redirect header in the output seemingly correct with advertisedbaseurl.  I did use baseurl in the LFS server though, because I think I saw this subtle difference.

INLINE COMMENTS

> test-http-protocol.t:361
> +  >     res.status = b'301 Redirect'
> +  >     newurl = b'%s/redirected%s' % (req.advertisedbaseurl, relpath)
> +  >     if not repo.ui.configbool('testing', 'redirectqs', True) and b'?' in newurl:

If this is changed to req.baseurl, it works.

> test-http-protocol.t:375
> +  >     serve --web-conf paths.conf --pid-file hg.pid -p $HGPORT -d
> +  $ cat hg.pid > $DAEMON_PIDS
> +

>> $DAEMON_PIDS

> test-http-protocol.t:475
> +  >     serve --web-conf paths.conf --pid-file hg.pid -p $HGPORT -d
> +  $ cat hg.pid > $DAEMON_PIDS
> +

>> $DAEMON_PIDS

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 7:17 p.m.
indygreg planned changes to this revision.
indygreg added a comment.


  Thanks for the info @mharbison72! It's really helpful. I'll try to get updated patches out in the next few hours.
  
  FWIW it looks like this is the sole blocker to releasing 4.6. So if you could keep an eye out for updates and can verify patches when they are submitted, it would be appreciated.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 7:34 p.m.
indygreg added inline comments.

INLINE COMMENTS

> mharbison72 wrote in test-http-protocol.t:361
> If this is changed to req.baseurl, it works.

The distinction between these URLs is that one uses the host:port as it actually is and the other uses the host:port as identified by the client. Since we are sending a URL back to the client, we should be using the advertised host:port.

The fact that the advertised host:port is being mangled seemingly points to a bug with the formulation of the `advertised*` variables. This is likely the result of code somewhere using `gethostname()` to populate a hostname field. I'm not sure if this is happening on the client or server. And I'm not sure why it would only be happening on Windows.

It is something to keep our eye on. But I think it is an issue in the test harness and not the hgweb code. We have unit test coverage for URL formulation in `test-wsgirequest.py` and I'm pretty confident it adheres to PEP 3333.

> mharbison72 wrote in test-http-protocol.t:375
> >> $DAEMON_PIDS

We run `killdaemons.py` above, so overwriting `$DAEMON_PIDS` should be fine.

FWIW, I have patches queued up to change `killdaemons.py` behavior so it removes the file by default. Will send those once the 3.7 window opens.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 7:51 p.m.
mharbison72 added a comment.


  I'll probably be at work for another ~2 hours if you need me to test with SCM Manager, but I can stay later if needed.

INLINE COMMENTS

> indygreg wrote in test-http-protocol.t:361
> The distinction between these URLs is that one uses the host:port as it actually is and the other uses the host:port as identified by the client. Since we are sending a URL back to the client, we should be using the advertised host:port.
> 
> The fact that the advertised host:port is being mangled seemingly points to a bug with the formulation of the `advertised*` variables. This is likely the result of code somewhere using `gethostname()` to populate a hostname field. I'm not sure if this is happening on the client or server. And I'm not sure why it would only be happening on Windows.
> 
> It is something to keep our eye on. But I think it is an issue in the test harness and not the hgweb code. We have unit test coverage for URL formulation in `test-wsgirequest.py` and I'm pretty confident it adheres to PEP 3333.

That makes sense.  And test-wsgirequest.py does run on Windows.

What baffles me is that the header that's printed (e.g. line 393) is correct on Windows with either URL.  So I have no idea where it could have divergent behavior.

Should I change the LFS behavior now, or punt because it's experimental?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 8:40 p.m.
indygreg added inline comments.

INLINE COMMENTS

> mharbison72 wrote in test-http-protocol.t:361
> That makes sense.  And test-wsgirequest.py does run on Windows.
> 
> What baffles me is that the header that's printed (e.g. line 393) is correct on Windows with either URL.  So I have no idea where it could have divergent behavior.
> 
> Should I change the LFS behavior now, or punt because it's experimental?

You should consider changing it. But since it is experimental, it isn't critical. We can always do it in 4.6.1 if it is a problem in the wild.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 9:43 p.m.
mharbison72 added a comment.


  Just to confirm that `phabimport` brought this in correctly, the only thing you changed is advertisedurl to baseurl?  If that's the case, the LFS change will have to be punted.
  
  I can confirm that 'test-http*' now runs on Windows, and both Windows and FreeBSD can now clone a repo+subrepo from SCM Manager with this change.
  
  Thanks again for this.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 3, 2018, 9:59 p.m.
indygreg added a comment.


  Yes, I just changed the advertised URL bit in the test extension.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - May 4, 2018, 5:17 p.m.
martinvonz added a comment.


  Queued with s/new could/new code/ in the commit message. Thanks!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: martinvonz, mharbison72, mercurial-devel
phabricator - May 5, 2018, 12:57 a.m.
yuja added inline comments.

INLINE COMMENTS

> httppeer.py:894
> +        resp = sendrequest(ui, opener, req)
> +        respurl, ct, resp = parsev1commandresponse(ui, url, requrl, qs, resp,
> +                                                   compressible=False,

Nit: `baseurl=url` seems not correct here since we've explicitly followed the redirect.

Anyway, this isn't a blocker as the `baseurl` (and `safeurl`) are used only for error
reporting.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: yuja, martinvonz, mharbison72, mercurial-devel
phabricator - May 5, 2018, 3:33 a.m.
mharbison72 added inline comments.

INLINE COMMENTS

> indygreg wrote in test-http-protocol.t:361
> The distinction between these URLs is that one uses the host:port as it actually is and the other uses the host:port as identified by the client. Since we are sending a URL back to the client, we should be using the advertised host:port.
> 
> The fact that the advertised host:port is being mangled seemingly points to a bug with the formulation of the `advertised*` variables. This is likely the result of code somewhere using `gethostname()` to populate a hostname field. I'm not sure if this is happening on the client or server. And I'm not sure why it would only be happening on Windows.
> 
> It is something to keep our eye on. But I think it is an issue in the test harness and not the hgweb code. We have unit test coverage for URL formulation in `test-wsgirequest.py` and I'm pretty confident it adheres to PEP 3333.

> The fact that the advertised host:port is being mangled seemingly points to a bug
>  with the formulation of the advertised* variables. This is likely the result of code
>  somewhere using gethostname() to populate a hostname field. I'm not sure if this
>  is happening on the client or server. And I'm not sure why it would only be
>  happening on Windows.

The plot thickens...

I stripped down the *.t, ran it with --keep, and then launched a non-daemon `hg serve` + client from the leftover repo, without the test harness.  If I use baseurl, it succeeds as expected, using 127.0.0.1.  If I use advertisedbaseurl, it redirects to the hostname, and the access log shows the external IP address.  So it seems like more than a test harness issue.  (Though in the test harness, this fails with a connection refused.  But ProcessExplorer says it is listening on 0.0.0.0.)

There aren't too many uses of 'gethostname' (insensitive) in python 2.7.15, and none of them seem like an obvious problem.  None of the hits in hg code are relevant.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3433

To: indygreg, #hg-reviewers
Cc: yuja, martinvonz, mharbison72, mercurial-devel

Patch

diff --git a/tests/test-http-protocol.t b/tests/test-http-protocol.t
--- a/tests/test-http-protocol.t
+++ b/tests/test-http-protocol.t
@@ -333,3 +333,394 @@ 
   response: [b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
 
   $ killdaemons.py
+
+HTTP client follows HTTP redirect on handshake to new repo
+
+  $ cd $TESTTMP
+
+  $ hg init redirector
+  $ hg init redirected
+  $ cd redirected
+  $ touch foo
+  $ hg -q commit -A -m initial
+  $ cd ..
+
+  $ cat > paths.conf << EOF
+  > [paths]
+  > / = $TESTTMP/*
+  > EOF
+
+  $ cat > redirectext.py << EOF
+  > from mercurial import extensions, wireprotoserver
+  > def wrappedcallhttp(orig, repo, req, res, proto, cmd):
+  >     path = req.advertisedurl[len(req.advertisedbaseurl):]
+  >     if not path.startswith(b'/redirector'):
+  >         return orig(repo, req, res, proto, cmd)
+  >     relpath = path[len(b'/redirector'):]
+  >     res.status = b'301 Redirect'
+  >     newurl = b'%s/redirected%s' % (req.advertisedbaseurl, relpath)
+  >     if not repo.ui.configbool('testing', 'redirectqs', True) and b'?' in newurl:
+  >         newurl = newurl[0:newurl.index(b'?')]
+  >     res.headers[b'Location'] = newurl
+  >     res.headers[b'Content-Type'] = b'text/plain'
+  >     res.setbodybytes(b'redirected')
+  >     return True
+  > 
+  > extensions.wrapfunction(wireprotoserver, '_callhttp', wrappedcallhttp)
+  > EOF
+
+  $ hg --config extensions.redirect=$TESTTMP/redirectext.py \
+  >    --config server.compressionengines=zlib \
+  >     serve --web-conf paths.conf --pid-file hg.pid -p $HGPORT -d
+  $ cat hg.pid > $DAEMON_PIDS
+
+Verify our HTTP 301 is served properly
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET /redirector?cmd=capabilities
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     GET /redirector?cmd=capabilities HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 301 Redirect\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Location: http://$LOCALIP:$HGPORT/redirected?cmd=capabilities\r\n (glob)
+  s>     Content-Type: text/plain\r\n
+  s>     Content-Length: 10\r\n
+  s>     \r\n
+  s>     redirected
+  s>     GET /redirected?cmd=capabilities HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.1\r\n
+  s>     Content-Length: 453\r\n
+  s>     \r\n
+  s>     batch branchmap $USUAL_BUNDLE2_CAPS_SERVER$ changegroupsubset compression=$BUNDLE2_COMPRESSIONS$ getbundle httpheader=1024 httpmediatype=0.1rx,0.1tx,0.2tx known lookup pushkey streamreqs=generaldelta,revlogv1 unbundle=HG10GZ,HG10BZ,HG10UN unbundlehash
+
+Test with the HTTP peer
+
+  $ hg --verbose debugwireproto http://$LOCALIP:$HGPORT/redirector << EOF
+  > command heads
+  > EOF
+  s>     GET /redirector?cmd=capabilities HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     accept: application/mercurial-0.1\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     user-agent: Mercurial debugwireproto\r\n
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 301 Redirect\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Location: http://$LOCALIP:$HGPORT/redirected?cmd=capabilities\r\n (glob)
+  s>     Content-Type: text/plain\r\n
+  s>     Content-Length: 10\r\n
+  s>     \r\n
+  s>     redirected
+  s>     GET /redirected?cmd=capabilities HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     accept: application/mercurial-0.1\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     user-agent: Mercurial debugwireproto\r\n
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.1\r\n
+  s>     Content-Length: 453\r\n
+  s>     \r\n
+  real URL is http://$LOCALIP:$HGPORT/redirected (glob)
+  s>     batch branchmap $USUAL_BUNDLE2_CAPS_SERVER$ changegroupsubset compression=$BUNDLE2_COMPRESSIONS$ getbundle httpheader=1024 httpmediatype=0.1rx,0.1tx,0.2tx known lookup pushkey streamreqs=generaldelta,revlogv1 unbundle=HG10GZ,HG10BZ,HG10UN unbundlehash
+  sending heads command
+  s>     GET /redirected?cmd=heads HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     vary: X-HgProto-1\r\n
+  s>     x-hgproto-1: 0.1 0.2 comp=$USUAL_COMPRESSIONS$ partial-pull\r\n
+  s>     accept: application/mercurial-0.1\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     user-agent: Mercurial debugwireproto\r\n
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.1\r\n
+  s>     Content-Length: 41\r\n
+  s>     \r\n
+  s>     96ee1d7354c4ad7372047672c36a1f561e3a6a4c\n
+  response: [b'\x96\xee\x1dsT\xc4\xadsr\x04vr\xc3j\x1fV\x1e:jL']
+
+  $ killdaemons.py
+
+Now test a variation where we strip the query string from the redirect URL.
+(SCM Manager apparently did this and clients would recover from it)
+
+  $ hg --config extensions.redirect=$TESTTMP/redirectext.py \
+  >    --config server.compressionengines=zlib \
+  >    --config testing.redirectqs=false \
+  >     serve --web-conf paths.conf --pid-file hg.pid -p $HGPORT -d
+  $ cat hg.pid > $DAEMON_PIDS
+
+  $ hg --verbose debugwireproto --peer raw http://$LOCALIP:$HGPORT << EOF
+  > httprequest GET /redirector?cmd=capabilities
+  >     user-agent: test
+  > EOF
+  using raw connection to peer
+  s>     GET /redirector?cmd=capabilities HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 301 Redirect\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Location: http://$LOCALIP:$HGPORT/redirected\r\n (glob)
+  s>     Content-Type: text/plain\r\n
+  s>     Content-Length: 10\r\n
+  s>     \r\n
+  s>     redirected
+  s>     GET /redirected HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     user-agent: test\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     ETag: W/"*"\r\n (glob)
+  s>     Content-Type: text/html; charset=ascii\r\n
+  s>     Transfer-Encoding: chunked\r\n
+  s>     \r\n
+  s>     414\r\n
+  s>     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n
+  s>     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">\n
+  s>     <head>\n
+  s>     <link rel="icon" href="/redirected/static/hgicon.png" type="image/png" />\n
+  s>     <meta name="robots" content="index, nofollow" />\n
+  s>     <link rel="stylesheet" href="/redirected/static/style-paper.css" type="text/css" />\n
+  s>     <script type="text/javascript" src="/redirected/static/mercurial.js"></script>\n
+  s>     \n
+  s>     <title>redirected: log</title>\n
+  s>     <link rel="alternate" type="application/atom+xml"\n
+  s>        href="/redirected/atom-log" title="Atom feed for redirected" />\n
+  s>     <link rel="alternate" type="application/rss+xml"\n
+  s>        href="/redirected/rss-log" title="RSS feed for redirected" />\n
+  s>     </head>\n
+  s>     <body>\n
+  s>     \n
+  s>     <div class="container">\n
+  s>     <div class="menu">\n
+  s>     <div class="logo">\n
+  s>     <a href="https://mercurial-scm.org/">\n
+  s>     <img src="/redirected/static/hglogo.png" alt="mercurial" /></a>\n
+  s>     </div>\n
+  s>     <ul>\n
+  s>     <li class="active">log</li>\n
+  s>     <li><a href="/redirected/graph/tip">graph</a></li>\n
+  s>     <li><a href="/redirected/tags">tags</a></li>\n
+  s>     <li><a href="
+  s>     \r\n
+  s>     810\r\n
+  s>     /redirected/bookmarks">bookmarks</a></li>\n
+  s>     <li><a href="/redirected/branches">branches</a></li>\n
+  s>     </ul>\n
+  s>     <ul>\n
+  s>     <li><a href="/redirected/rev/tip">changeset</a></li>\n
+  s>     <li><a href="/redirected/file/tip">browse</a></li>\n
+  s>     </ul>\n
+  s>     <ul>\n
+  s>     \n
+  s>     </ul>\n
+  s>     <ul>\n
+  s>      <li><a href="/redirected/help">help</a></li>\n
+  s>     </ul>\n
+  s>     <div class="atom-logo">\n
+  s>     <a href="/redirected/atom-log" title="subscribe to atom feed">\n
+  s>     <img class="atom-logo" src="/redirected/static/feed-icon-14x14.png" alt="atom feed" />\n
+  s>     </a>\n
+  s>     </div>\n
+  s>     </div>\n
+  s>     \n
+  s>     <div class="main">\n
+  s>     <h2 class="breadcrumb"><a href="/">Mercurial</a> &gt; <a href="/redirected">redirected</a> </h2>\n
+  s>     <h3>log</h3>\n
+  s>     \n
+  s>     \n
+  s>     <form class="search" action="/redirected/log">\n
+  s>     \n
+  s>     <p><input name="rev" id="search1" type="text" size="30" value="" /></p>\n
+  s>     <div id="hint">Find changesets by keywords (author, files, the commit message), revision\n
+  s>     number or hash, or <a href="/redirected/help/revsets">revset expression</a>.</div>\n
+  s>     </form>\n
+  s>     \n
+  s>     <div class="navigate">\n
+  s>     <a href="/redirected/shortlog/tip?revcount=30">less</a>\n
+  s>     <a href="/redirected/shortlog/tip?revcount=120">more</a>\n
+  s>     | rev 0: <a href="/redirected/shortlog/96ee1d7354c4">(0)</a> <a href="/redirected/shortlog/tip">tip</a> \n
+  s>     </div>\n
+  s>     \n
+  s>     <table class="bigtable">\n
+  s>     <thead>\n
+  s>      <tr>\n
+  s>       <th class="age">age</th>\n
+  s>       <th class="author">author</th>\n
+  s>       <th class="description">description</th>\n
+  s>      </tr>\n
+  s>     </thead>\n
+  s>     <tbody class="stripes2">\n
+  s>      <tr>\n
+  s>       <td class="age">Thu, 01 Jan 1970 00:00:00 +0000</td>\n
+  s>       <td class="author">test</td>\n
+  s>       <td class="description">\n
+  s>        <a href="/redirected/rev/96ee1d7354c4">initial</a>\n
+  s>        <span class="phase">draft</span> <span class="branchhead">default</span> <span class="tag">tip</span> \n
+  s>       </td>\n
+  s>      </tr>\n
+  s>     \n
+  s>     </tbody>\n
+  s>     </table>\n
+  s>     \n
+  s>     <div class="navigate">\n
+  s>     <a href="/redirected/shortlog/tip?revcount=30">less</a>\n
+  s>     <a href="/redirected/shortlog/tip?revcount=120">more</a>\n
+  s>     | rev 0: <a href="/redirected/shortlog/96ee1d7354c4">(0)</a> <a href="/redirected/shortlog/tip">tip</a> \n
+  s>     </div>\n
+  s>     \n
+  s>     <script type="text/javascript">\n
+  s>         ajaxScrollInit(\n
+  s>                 \'/redirected/shortlog/%next%\',\n
+  s>                 \'\', <!-- NEXTHASH\n
+  s>                 function (htmlText) {
+  s>     \r\n
+  s>     14a\r\n
+  s>     \n
+  s>                     var m = htmlText.match(/\'(\\w+)\', <!-- NEXTHASH/);\n
+  s>                     return m ? m[1] : null;\n
+  s>                 },\n
+  s>                 \'.bigtable > tbody\',\n
+  s>                 \'<tr class="%class%">\\\n
+  s>                 <td colspan="3" style="text-align: center;">%text%</td>\\\n
+  s>                 </tr>\'\n
+  s>         );\n
+  s>     </script>\n
+  s>     \n
+  s>     </div>\n
+  s>     </div>\n
+  s>     \n
+  s>     \n
+  s>     \n
+  s>     </body>\n
+  s>     </html>\n
+  s>     \n
+  s>     \r\n
+  s>     0\r\n
+  s>     \r\n
+
+  $ hg --verbose debugwireproto http://$LOCALIP:$HGPORT/redirector << EOF
+  > command heads
+  > EOF
+  s>     GET /redirector?cmd=capabilities HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     accept: application/mercurial-0.1\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     user-agent: Mercurial debugwireproto\r\n
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 301 Redirect\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Location: http://$LOCALIP:$HGPORT/redirected\r\n (glob)
+  s>     Content-Type: text/plain\r\n
+  s>     Content-Length: 10\r\n
+  s>     \r\n
+  s>     redirected
+  s>     GET /redirected HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     accept: application/mercurial-0.1\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     user-agent: Mercurial debugwireproto\r\n
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     ETag: W/"*"\r\n (glob)
+  s>     Content-Type: text/html; charset=ascii\r\n
+  s>     Transfer-Encoding: chunked\r\n
+  s>     \r\n
+  real URL is http://$LOCALIP:$HGPORT/redirected (glob)
+  s>     414\r\n
+  s>     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n
+  s>     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US">\n
+  s>     <head>\n
+  s>     <link rel="icon" href="/redirected/static/hgicon.png" type="image/png" />\n
+  s>     <meta name="robots" content="index, nofollow" />\n
+  s>     <link rel="stylesheet" href="/redirected/static/style-paper.css" type="text/css" />\n
+  s>     <script type="text/javascript" src="/redirected/static/mercurial.js"></script>\n
+  s>     \n
+  s>     <title>redirected: log</title>\n
+  s>     <link rel="alternate" type="application/atom+xml"\n
+  s>        href="/redirected/atom-log" title="Atom feed for redirected" />\n
+  s>     <link rel="alternate" type="application/rss+xml"\n
+  s>        href="/redirected/rss-log" title="RSS feed for redirected" />\n
+  s>     </head>\n
+  s>     <body>\n
+  s>     \n
+  s>     <div class="container">\n
+  s>     <div class="menu">\n
+  s>     <div class="logo">\n
+  s>     <a href="https://mercurial-scm.org/">\n
+  s>     <img src="/redirected/static/hglogo.png" alt="mercurial" /></a>\n
+  s>     </div>\n
+  s>     <ul>\n
+  s>     <li class="active">log</li>\n
+  s>     <li><a href="/redirected/graph/tip">graph</a></li>\n
+  s>     <li><a href="/redirected/tags">tags</a
+  s>     GET /redirected?cmd=capabilities HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     accept: application/mercurial-0.1\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     user-agent: Mercurial debugwireproto\r\n
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.1\r\n
+  s>     Content-Length: 453\r\n
+  s>     \r\n
+  real URL is http://$LOCALIP:$HGPORT/redirected (glob)
+  s>     batch branchmap $USUAL_BUNDLE2_CAPS_SERVER$ changegroupsubset compression=$BUNDLE2_COMPRESSIONS$ getbundle httpheader=1024 httpmediatype=0.1rx,0.1tx,0.2tx known lookup pushkey streamreqs=generaldelta,revlogv1 unbundle=HG10GZ,HG10BZ,HG10UN unbundlehash
+  sending heads command
+  s>     GET /redirected?cmd=heads HTTP/1.1\r\n
+  s>     Accept-Encoding: identity\r\n
+  s>     vary: X-HgProto-1\r\n
+  s>     x-hgproto-1: 0.1 0.2 comp=$USUAL_COMPRESSIONS$ partial-pull\r\n
+  s>     accept: application/mercurial-0.1\r\n
+  s>     host: $LOCALIP:$HGPORT\r\n (glob)
+  s>     user-agent: Mercurial debugwireproto\r\n
+  s>     \r\n
+  s> makefile('rb', None)
+  s>     HTTP/1.1 200 Script output follows\r\n
+  s>     Server: testing stub value\r\n
+  s>     Date: $HTTP_DATE$\r\n
+  s>     Content-Type: application/mercurial-0.1\r\n
+  s>     Content-Length: 41\r\n
+  s>     \r\n
+  s>     96ee1d7354c4ad7372047672c36a1f561e3a6a4c\n
+  response: [b'\x96\xee\x1dsT\xc4\xadsr\x04vr\xc3j\x1fV\x1e:jL']
diff --git a/mercurial/httppeer.py b/mercurial/httppeer.py
--- a/mercurial/httppeer.py
+++ b/mercurial/httppeer.py
@@ -328,13 +328,24 @@ 
 
     return res
 
+class RedirectedRepoError(error.RepoError):
+    def __init__(self, msg, respurl):
+        super(RedirectedRepoError, self).__init__(msg)
+        self.respurl = respurl
+
 def parsev1commandresponse(ui, baseurl, requrl, qs, resp, compressible,
                            allowcbor=False):
     # record the url we got redirected to
+    redirected = False
     respurl = pycompat.bytesurl(resp.geturl())
     if respurl.endswith(qs):
         respurl = respurl[:-len(qs)]
+        qsdropped = False
+    else:
+        qsdropped = True
+
     if baseurl.rstrip('/') != respurl.rstrip('/'):
+        redirected = True
         if not ui.quiet:
             ui.warn(_('real URL is %s\n') % respurl)
 
@@ -351,10 +362,16 @@ 
     # application/hg-changegroup. We don't support such old servers.
     if not proto.startswith('application/mercurial-'):
         ui.debug("requested URL: '%s'\n" % util.hidepassword(requrl))
-        raise error.RepoError(
-            _("'%s' does not appear to be an hg repository:\n"
-              "---%%<--- (%s)\n%s\n---%%<---\n")
-            % (safeurl, proto or 'no content-type', resp.read(1024)))
+        msg = _("'%s' does not appear to be an hg repository:\n"
+                "---%%<--- (%s)\n%s\n---%%<---\n") % (
+            safeurl, proto or 'no content-type', resp.read(1024))
+
+        # Some servers may strip the query string from the redirect. We
+        # raise a special error type so callers can react to this specially.
+        if redirected and qsdropped:
+            raise RedirectedRepoError(msg, respurl)
+        else:
+            raise error.RepoError(msg)
 
     try:
         subtype = proto.split('-', 1)[1]
@@ -434,8 +451,6 @@ 
 
     # End of ipeercommands interface.
 
-    # look up capabilities only when needed
-
     def _callstream(self, cmd, _compressible=False, **args):
         args = pycompat.byteskwargs(args)
 
@@ -853,12 +868,32 @@ 
     req, requrl, qs = makev1commandrequest(ui, requestbuilder, caps,
                                            capable, url, 'capabilities',
                                            args)
-
     resp = sendrequest(ui, opener, req)
 
-    respurl, ct, resp = parsev1commandresponse(ui, url, requrl, qs, resp,
-                                               compressible=False,
-                                               allowcbor=advertisev2)
+    # The server may redirect us to the repo root, stripping the
+    # ?cmd=capabilities query string from the URL. The server would likely
+    # return HTML in this case and ``parsev1commandresponse()`` would raise.
+    # We catch this special case and re-issue the capabilities request against
+    # the new URL.
+    #
+    # We should ideally not do this, as a redirect that drops the query
+    # string from the URL is arguably a server bug. (Garbage in, garbage out).
+    # However,  Mercurial clients for several years appeared to handle this
+    # issue without behavior degradation. And according to issue 5860, it may
+    # be a longstanding bug in some server implementations. So we allow a
+    # redirect that drops the query string to "just work."
+    try:
+        respurl, ct, resp = parsev1commandresponse(ui, url, requrl, qs, resp,
+                                                   compressible=False,
+                                                   allowcbor=advertisev2)
+    except RedirectedRepoError as e:
+        req, requrl, qs = makev1commandrequest(ui, requestbuilder, caps,
+                                               capable, e.respurl,
+                                               'capabilities', args)
+        resp = sendrequest(ui, opener, req)
+        respurl, ct, resp = parsev1commandresponse(ui, url, requrl, qs, resp,
+                                                   compressible=False,
+                                                   allowcbor=advertisev2)
 
     try:
         rawdata = resp.read()