Patchwork [V3] revset: make id() to not abort on an unknown 40-byte hexadecimal string

login
register
mail settings
Submitter Alexander Drozdov
Date April 16, 2015, 4:49 a.m.
Message ID <a4bd33c69fc54357e500.1429159755@localhost6.localdomain6>
Download mbox | patch
Permalink /patch/8702/
State Accepted
Headers show

Comments

Alexander Drozdov - April 16, 2015, 4:49 a.m.
# HG changeset patch
# User Alexander Drozdov <al.drozdov@gmail.com>
# Date 1429159436 -10800
#      Thu Apr 16 07:43:56 2015 +0300
# Node ID a4bd33c69fc54357e5002bb320f7abd762010e37
# Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
revset: make id() to not abort on an unknown 40-byte hexadecimal string

Instead, just return the empty set in the case as for less-then-40-byte
strings.
Yuya Nishihara - April 16, 2015, 1:07 p.m.
On Thu, 16 Apr 2015 07:49:15 +0300, Alexander Drozdov wrote:
> # HG changeset patch
> # User Alexander Drozdov <al.drozdov@gmail.com>
> # Date 1429159436 -10800
> #      Thu Apr 16 07:43:56 2015 +0300
> # Node ID a4bd33c69fc54357e5002bb320f7abd762010e37
> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
> revset: make id() to not abort on an unknown 40-byte hexadecimal string
> 
> Instead, just return the empty set in the case as for less-then-40-byte
> strings.
> 
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1294,7 +1294,13 @@ def node_(repo, subset, x):
>      # i18n: "id" is a keyword
>      n = getstring(l[0], _("id requires a string"))
>      if len(n) == 40:
> -        rn = repo[n].rev()
> +        try:
> +            rn = repo.changelog.rev(node.bin(n))
> +        except (TypeError):
> +            # i18n: "id" is a keyword
> +            raise error.ParseError(_("id expects a hexadecimal string"))

If we raise ParseError here, short non-hexadecimal hash should also be handled
the same way. But node.bin() can't be applied to odd-length hex. My suggestion
was to make separate patches so that the first one, no abort on unknown hash,
would likely be accepted.

Regards,
Alexander Drozdov - April 16, 2015, 4:24 p.m.
On Thu, 16 Apr 2015 22:07:27 +0900, Yuya Nishihara wrote:
> On Thu, 16 Apr 2015 07:49:15 +0300, Alexander Drozdov wrote:
>> # HG changeset patch
>> # User Alexander Drozdov <al.drozdov@gmail.com>
>> # Date 1429159436 -10800
>> #      Thu Apr 16 07:43:56 2015 +0300
>> # Node ID a4bd33c69fc54357e5002bb320f7abd762010e37
>> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
>> revset: make id() to not abort on an unknown 40-byte hexadecimal string
>>
>> Instead, just return the empty set in the case as for less-then-40-byte
>> strings.
>>
>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>> --- a/mercurial/revset.py
>> +++ b/mercurial/revset.py
>> @@ -1294,7 +1294,13 @@ def node_(repo, subset, x):
>>       # i18n: "id" is a keyword
>>       n = getstring(l[0], _("id requires a string"))
>>       if len(n) == 40:
>> -        rn = repo[n].rev()
>> +        try:
>> +            rn = repo.changelog.rev(node.bin(n))
>> +        except (TypeError):
>> +            # i18n: "id" is a keyword
>> +            raise error.ParseError(_("id expects a hexadecimal string"))
>
> If we raise ParseError here, short non-hexadecimal hash should also be handled
> the same way. But node.bin() can't be applied to odd-length hex. My suggestion
> was to make separate patches so that the first one, no abort on unknown hash,
> would likely be accepted.

I think extra checks for strings shorter than 40 characters here may cause a
performance penalty as partialmatch() also do the checks. Instead, I think it is
better to raise (actually to just re-raise) an exception from partialmatch() to
just catch it here. C and pure python partialmatch() code should be changed.
That change also requires to review all the code calling partialmatch(). And I
thing that change is what should be done later by a separate patch, isn't it?

>
> Regards,
>
Matt Mackall - April 16, 2015, 10:56 p.m.
On Thu, 2015-04-16 at 19:24 +0300, Alexander Drozdov wrote:
> On Thu, 16 Apr 2015 22:07:27 +0900, Yuya Nishihara wrote:
> > On Thu, 16 Apr 2015 07:49:15 +0300, Alexander Drozdov wrote:
> >> # HG changeset patch
> >> # User Alexander Drozdov <al.drozdov@gmail.com>
> >> # Date 1429159436 -10800
> >> #      Thu Apr 16 07:43:56 2015 +0300
> >> # Node ID a4bd33c69fc54357e5002bb320f7abd762010e37
> >> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
> >> revset: make id() to not abort on an unknown 40-byte hexadecimal string
> >>
> >> Instead, just return the empty set in the case as for less-then-40-byte
> >> strings.
> >>
> >> diff --git a/mercurial/revset.py b/mercurial/revset.py
> >> --- a/mercurial/revset.py
> >> +++ b/mercurial/revset.py
> >> @@ -1294,7 +1294,13 @@ def node_(repo, subset, x):
> >>       # i18n: "id" is a keyword
> >>       n = getstring(l[0], _("id requires a string"))
> >>       if len(n) == 40:
> >> -        rn = repo[n].rev()
> >> +        try:
> >> +            rn = repo.changelog.rev(node.bin(n))
> >> +        except (TypeError):
> >> +            # i18n: "id" is a keyword
> >> +            raise error.ParseError(_("id expects a hexadecimal string"))
> >
> > If we raise ParseError here, short non-hexadecimal hash should also be handled
> > the same way. But node.bin() can't be applied to odd-length hex. My suggestion
> > was to make separate patches so that the first one, no abort on unknown hash,
> > would likely be accepted.
> 
> I think extra checks for strings shorter than 40 characters here may cause a
> performance penalty as partialmatch() also do the checks. Instead, I think it is
> better to raise (actually to just re-raise) an exception from partialmatch() to
> just catch it here. C and pure python partialmatch() code should be changed.
> That change also requires to review all the code calling partialmatch(). And I
> thing that change is what should be done later by a separate patch, isn't it?

Looks like we're still iterating here, but I'll probably accept patches
for this after the freeze.
Yuya Nishihara - April 17, 2015, 12:58 p.m.
On Thu, 16 Apr 2015 19:24:48 +0300, Alexander Drozdov wrote:
> On Thu, 16 Apr 2015 22:07:27 +0900, Yuya Nishihara wrote:
> > On Thu, 16 Apr 2015 07:49:15 +0300, Alexander Drozdov wrote:
> >> # HG changeset patch
> >> # User Alexander Drozdov <al.drozdov@gmail.com>
> >> # Date 1429159436 -10800
> >> #      Thu Apr 16 07:43:56 2015 +0300
> >> # Node ID a4bd33c69fc54357e5002bb320f7abd762010e37
> >> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
> >> revset: make id() to not abort on an unknown 40-byte hexadecimal string
> >>
> >> Instead, just return the empty set in the case as for less-then-40-byte
> >> strings.
> >>
> >> diff --git a/mercurial/revset.py b/mercurial/revset.py
> >> --- a/mercurial/revset.py
> >> +++ b/mercurial/revset.py
> >> @@ -1294,7 +1294,13 @@ def node_(repo, subset, x):
> >>       # i18n: "id" is a keyword
> >>       n = getstring(l[0], _("id requires a string"))
> >>       if len(n) == 40:
> >> -        rn = repo[n].rev()
> >> +        try:
> >> +            rn = repo.changelog.rev(node.bin(n))
> >> +        except (TypeError):
> >> +            # i18n: "id" is a keyword
> >> +            raise error.ParseError(_("id expects a hexadecimal string"))
> >
> > If we raise ParseError here, short non-hexadecimal hash should also be handled
> > the same way. But node.bin() can't be applied to odd-length hex. My suggestion
> > was to make separate patches so that the first one, no abort on unknown hash,
> > would likely be accepted.
> 
> I think extra checks for strings shorter than 40 characters here may cause a
> performance penalty as partialmatch() also do the checks. Instead, I think it is
> better to raise (actually to just re-raise) an exception from partialmatch() to
> just catch it here. C and pure python partialmatch() code should be changed.

I'm not sure if partialmatch() should be changed that way. And I guess the
extra checks would be much cheaper compared to partialmatch().

> That change also requires to review all the code calling partialmatch(). And I
> thing that change is what should be done later by a separate patch, isn't it?

Yes, by a separate patch, So for now, I think ParseError shouldn't be raised
at all.

This patch (partially) addresses the following issues.

 a. id(unknown_full_hash) aborts, but id(unknown_short_hash) doesn't
 b. id(40byte_tag_or_bookmark) returns tagged/bookmarked revision
 c. id(non_hexadecimal_string) should raise ParseError just like
    rev(non_integer) does?

Perhaps (a) and (b) will be accepted for stable, but I don't think (c) will.
I think (c) is correct, but it's only my opinion.

Regards,
Alexander Drozdov - April 18, 2015, 11:49 a.m.
On Fri, 17 Apr 2015 21:58:26 +0900, Yuya Nishihara wrote:
> On Thu, 16 Apr 2015 19:24:48 +0300, Alexander Drozdov wrote:
>> On Thu, 16 Apr 2015 22:07:27 +0900, Yuya Nishihara wrote:
>>> On Thu, 16 Apr 2015 07:49:15 +0300, Alexander Drozdov wrote:
>>>> # HG changeset patch
>>>> # User Alexander Drozdov <al.drozdov@gmail.com>
>>>> # Date 1429159436 -10800
>>>> #      Thu Apr 16 07:43:56 2015 +0300
>>>> # Node ID a4bd33c69fc54357e5002bb320f7abd762010e37
>>>> # Parent  c560d8c687916cb70a6d54c2c9ddcb5c9e457be2
>>>> revset: make id() to not abort on an unknown 40-byte hexadecimal string
>>>>
>>>> Instead, just return the empty set in the case as for less-then-40-byte
>>>> strings.
>>>>
>>>> diff --git a/mercurial/revset.py b/mercurial/revset.py
>>>> --- a/mercurial/revset.py
>>>> +++ b/mercurial/revset.py
>>>> @@ -1294,7 +1294,13 @@ def node_(repo, subset, x):
>>>>        # i18n: "id" is a keyword
>>>>        n = getstring(l[0], _("id requires a string"))
>>>>        if len(n) == 40:
>>>> -        rn = repo[n].rev()
>>>> +        try:
>>>> +            rn = repo.changelog.rev(node.bin(n))
>>>> +        except (TypeError):
>>>> +            # i18n: "id" is a keyword
>>>> +            raise error.ParseError(_("id expects a hexadecimal string"))
>>>
>>> If we raise ParseError here, short non-hexadecimal hash should also be handled
>>> the same way. But node.bin() can't be applied to odd-length hex. My suggestion
>>> was to make separate patches so that the first one, no abort on unknown hash,
>>> would likely be accepted.
>>
>> I think extra checks for strings shorter than 40 characters here may cause a
>> performance penalty as partialmatch() also do the checks. Instead, I think it is
>> better to raise (actually to just re-raise) an exception from partialmatch() to
>> just catch it here. C and pure python partialmatch() code should be changed.
>
> I'm not sure if partialmatch() should be changed that way. And I guess the
> extra checks would be much cheaper compared to partialmatch().
>
>> That change also requires to review all the code calling partialmatch(). And I
>> thing that change is what should be done later by a separate patch, isn't it?
>
> Yes, by a separate patch, So for now, I think ParseError shouldn't be raised
> at all.
>
> This patch (partially) addresses the following issues.
>
>   a. id(unknown_full_hash) aborts, but id(unknown_short_hash) doesn't
>   b. id(40byte_tag_or_bookmark) returns tagged/bookmarked revision
>   c. id(non_hexadecimal_string) should raise ParseError just like
>      rev(non_integer) does?
>
> Perhaps (a) and (b) will be accepted for stable, but I don't think (c) will.
> I think (c) is correct, but it's only my opinion.

Thanks! I'll resend (a) & (b).

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1294,7 +1294,13 @@  def node_(repo, subset, x):
     # i18n: "id" is a keyword
     n = getstring(l[0], _("id requires a string"))
     if len(n) == 40:
-        rn = repo[n].rev()
+        try:
+            rn = repo.changelog.rev(node.bin(n))
+        except (TypeError):
+            # i18n: "id" is a keyword
+            raise error.ParseError(_("id expects a hexadecimal string"))
+        except (LookupError):
+            rn = None
     else:
         rn = None
         pm = repo.changelog._partialmatch(n)
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -554,6 +554,22 @@  Test explicit numeric revision
   hg: parse error: rev expects a number
   [255]
 
+Test hexadecimal revision
+  $ log 'id(2)'
+  abort: 00changelog.i@2: ambiguous identifier!
+  [255]
+  $ log 'id(23268)'
+  4
+  $ log 'id(2785f51eece)'
+  0
+  $ log 'id(d5d0dcbdc4d9ff5dbb2d336f32f0bb561c1a532c)'
+  8
+  $ log 'id(d5d0dcbdc4a)'
+  $ log 'id(d5d0dcbdc4d9ff5dbb2d336f32f0bb561c1a532d)'
+  $ log 'id(d5d0dcbdc4d9ff5dbb2d336f32f0bb561c1a532q)'
+  hg: parse error: id expects a hexadecimal string
+  [255]
+
 Test null revision
   $ log '(null)'
   -1