Patchwork context: don't hex encode all unknown 20 char revision specs (issue4890)

login
register
mail settings
Submitter Mads Kiilerich
Date Oct. 8, 2015, 11:20 p.m.
Message ID <dfa056583dbbec4e0a2c.1444346427@localhost.localdomain>
Download mbox | patch
Permalink /patch/10904/
State Accepted
Commit a3fcc8e3136bd19012d28b863d6bf4429948c573
Headers show

Comments

Mads Kiilerich - Oct. 8, 2015, 11:20 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1444346377 -7200
#      Fri Oct 09 01:19:37 2015 +0200
# Node ID dfa056583dbbec4e0a2cc31366a46599c989a979
# Parent  3f234db6fe8d7cc3d39083f7e7523444bc21e5a2
context: don't hex encode all unknown 20 char revision specs (issue4890)

d3908c911d5e introduced nice hexified display of missing nodes. It did however
also make missing 20 character revision specifications be shown as hex - very
confusing.

Users are often wrong and somehow specify revisions that don't exist. Nodes
will however rarely be missing ... and they will only look like a user provided
revision specification and be all ascii in 1 of 4*10**9.

With this change, missing revisions will only be hexified if they really look
like binary nodes. This change will thus improve the error reporting UI in the
common case and only very rarely make it confusing in the opposite direction of
how it was before.
Matt Mackall - Oct. 9, 2015, 4:16 a.m.
On Fri, 2015-10-09 at 01:20 +0200, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1444346377 -7200
> #      Fri Oct 09 01:19:37 2015 +0200
> # Node ID dfa056583dbbec4e0a2cc31366a46599c989a979
> # Parent  3f234db6fe8d7cc3d39083f7e7523444bc21e5a2
> context: don't hex encode all unknown 20 char revision specs (issue4890)
> 
> d3908c911d5e introduced nice hexified display of missing nodes. It did however
> also make missing 20 character revision specifications be shown as hex - very
> confusing.
> 
> Users are often wrong and somehow specify revisions that don't exist. Nodes
> will however rarely be missing ... and they will only look like a user provided
> revision specification and be all ascii in 1 of 4*10**9.

It's actually 4e8, not 9. Which means it has something like a 1% chance
for mysteriously breaking one of the repos at my day job.

> -                if len(changeid) == 20:
> +                if len(changeid) == 20 and nonascii(changeid):
>                      changeid = hex(changeid)

You cannot put a regex check in the context creation fast path. That
code desperately needs to be faster, not slower.

You are, however, welcome to make the error handler as slow as you want.
Mads Kiilerich - Oct. 9, 2015, 10:57 a.m.
On 10/09/2015 06:16 AM, Matt Mackall wrote:
> On Fri, 2015-10-09 at 01:20 +0200, Mads Kiilerich wrote:
>> # HG changeset patch
>> # User Mads Kiilerich <madski@unity3d.com>
>> # Date 1444346377 -7200
>> #      Fri Oct 09 01:19:37 2015 +0200
>> # Node ID dfa056583dbbec4e0a2cc31366a46599c989a979
>> # Parent  3f234db6fe8d7cc3d39083f7e7523444bc21e5a2
>> context: don't hex encode all unknown 20 char revision specs (issue4890)
>>
>> d3908c911d5e introduced nice hexified display of missing nodes. It did however
>> also make missing 20 character revision specifications be shown as hex - very
>> confusing.
>>
>> Users are often wrong and somehow specify revisions that don't exist. Nodes
>> will however rarely be missing ... and they will only look like a user provided
>> revision specification and be all ascii in 1 of 4*10**9.
> It's actually 4e8, not 9. Which means it has something like a 1% chance
> for mysteriously breaking one of the repos at my day job.

Yes, if you make lookup of all the nodes and none of them exists.

I don't know, what is the simplest way of triggering a lookup of a 
non-existing node?

>> -                if len(changeid) == 20:
>> +                if len(changeid) == 20 and nonascii(changeid):
>>                       changeid = hex(changeid)
> You cannot put a regex check in the context creation fast path. That
> code desperately needs to be faster, not slower.
>
> You are, however, welcome to make the error handler as slow as you want.

This _is_ in the error handler.

/Mads
Matt Mackall - Oct. 11, 2015, 3:57 p.m.
On Fri, 2015-10-09 at 12:57 +0200, Mads Kiilerich wrote:
> >> -                if len(changeid) == 20:
> >> +                if len(changeid) == 20 and nonascii(changeid):
> >>                       changeid = hex(changeid)
> > You cannot put a regex check in the context creation fast path. That
> > code desperately needs to be faster, not slower.
> >
> > You are, however, welcome to make the error handler as slow as you want.
> 
> This _is_ in the error handler.

Right, now it all makes sense. I thought this was the earlier "== 20"
block. Queued for default, thanks.
Mads Kiilerich - Oct. 12, 2015, 6 p.m.
On 10/11/2015 05:57 PM, Matt Mackall wrote:
> On Fri, 2015-10-09 at 12:57 +0200, Mads Kiilerich wrote:
>>>> -                if len(changeid) == 20:
>>>> +                if len(changeid) == 20 and nonascii(changeid):
>>>>                        changeid = hex(changeid)
>>> You cannot put a regex check in the context creation fast path. That
>>> code desperately needs to be faster, not slower.
>>>
>>> You are, however, welcome to make the error handler as slow as you want.
>> This _is_ in the error handler.
> Right, now it all makes sense. I thought this was the earlier "== 20"
> block. Queued for default, thanks.

I looked more at it and also agree that the end of the __init__ not 
_just_ is an _error_ handler. It is also reached from localrepo 
__contains__ and lookup where RepoLookupError is caught and the nice 
localized message is discarded.

Some anecdotal evidence from the test suite: it only ends up in the case 
with missing 20 char changeids 50 times. 37 of them are 
'_rebasedefaultdest()'. Only 7 of them are binary. The code did thus 
already have room for improvement and the extra regexp check is not in a 
fast fast path and will not slow anything down in a noticeable way.

Also agreed, as a part of the mentioned desperately needed improvement, 
these nice and often irrelevant messages could be moved to the "real" 
error handler in dispatch.

/Mads

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -5,6 +5,8 @@ 
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
+import re
+
 from node import nullid, nullrev, wdirid, short, hex, bin
 from i18n import _
 import mdiff, error, util, scmutil, subrepo, patch, encoding, phases
@@ -22,6 +24,8 @@  propertycache = util.propertycache
 # dirty in the working copy.
 _newnode = '!' * 21
 
+nonascii = re.compile(r'[^\x21-\x7f]').search
+
 class basectx(object):
     """A basectx object represents the common logic for its children:
     changectx: read-only context that is already present in the repo,
@@ -466,7 +470,7 @@  class changectx(basectx):
                 msg = _("working directory has unknown parent '%s'!")
                 raise error.Abort(msg % short(changeid))
             try:
-                if len(changeid) == 20:
+                if len(changeid) == 20 and nonascii(changeid):
                     changeid = hex(changeid)
             except TypeError:
                 pass
diff --git a/tests/test-pull.t b/tests/test-pull.t
--- a/tests/test-pull.t
+++ b/tests/test-pull.t
@@ -52,6 +52,18 @@ 
   $ hg rollback --dry-run --verbose
   repository tip rolled back to revision -1 (undo pull: http://foo:***@localhost:$HGPORT/)
 
+Test pull of non-existing 20 character revision specification, making sure plain ascii identifiers
+not are encoded like a node:
+
+  $ hg pull -r 'xxxxxxxxxxxxxxxxxxxy'
+  pulling from http://foo@localhost:$HGPORT/
+  abort: unknown revision 'xxxxxxxxxxxxxxxxxxxy'!
+  [255]
+  $ hg pull -r 'xxxxxxxxxxxxxxxxxx y'
+  pulling from http://foo@localhost:$HGPORT/
+  abort: unknown revision '7878787878787878787878787878787878782079'!
+  [255]
+
 Issue622: hg init && hg pull -u URL doesn't checkout default branch
 
   $ cd ..