Patchwork [2,of,5] revlog: use hashlib.sha1 directly instead of through util

login
register
mail settings
Submitter Augie Fackler
Date June 10, 2016, 4:41 a.m.
Message ID <bd2caf6762219aba40ed.1465533675@imladris.local>
Download mbox | patch
Permalink /patch/15453/
State Accepted
Headers show

Comments

Augie Fackler - June 10, 2016, 4:41 a.m.
# HG changeset patch
# User Augie Fackler <raf@durin42.com>
# Date 1465531834 14400
#      Fri Jun 10 00:10:34 2016 -0400
# Node ID bd2caf6762219aba40edd4dd697fee70de306940
# Parent  16f5b31b1228212dc175b5a0baf3601cca48330d
revlog: use hashlib.sha1 directly instead of through util

Also remove module-local _sha alias, which was barely used.
Matt Mackall - June 10, 2016, 8:47 p.m.
On Fri, 2016-06-10 at 00:41 -0400, Augie Fackler wrote:
> # HG changeset patch
> # User Augie Fackler <raf@durin42.com>
> # Date 1465531834 14400
> #      Fri Jun 10 00:10:34 2016 -0400
> # Node ID bd2caf6762219aba40edd4dd697fee70de306940
> # Parent  16f5b31b1228212dc175b5a0baf3601cca48330d
> revlog: use hashlib.sha1 directly instead of through util
> 
> Also remove module-local _sha alias, which was barely used.

@@ -40,7 +41,6 @@ from . import (
>  _unpack = struct.unpack
>  _compress = zlib.compress
>  _decompress = zlib.decompress
> -_sha = util.sha1

These are each a micro-optimization on Python's abysmal lookup path.

-- 
Mathematics is the supreme nostalgia of our time.
Augie Fackler - June 10, 2016, 9:36 p.m.
> On Jun 10, 2016, at 4:47 PM, Matt Mackall <mpm@selenic.com> wrote:
> 
> On Fri, 2016-06-10 at 00:41 -0400, Augie Fackler wrote:
>> # HG changeset patch
>> # User Augie Fackler <raf@durin42.com>
>> # Date 1465531834 14400
>> #      Fri Jun 10 00:10:34 2016 -0400
>> # Node ID bd2caf6762219aba40edd4dd697fee70de306940
>> # Parent  16f5b31b1228212dc175b5a0baf3601cca48330d
>> revlog: use hashlib.sha1 directly instead of through util
>> 
>> Also remove module-local _sha alias, which was barely used.
> 
> @@ -40,7 +41,6 @@ from . import (
>>  _unpack = struct.unpack
>>  _compress = zlib.compress
>>  _decompress = zlib.decompress
>> -_sha = util.sha1
> 
> These are each a micro-optimization on Python's abysmal lookup path.

Ugh. Does it make a measurable difference on a perf command?

> 
> --
> Mathematics is the supreme nostalgia of our time.
> 
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Gregory Szorc - June 10, 2016, 9:41 p.m.
On Fri, Jun 10, 2016 at 2:36 PM, Augie Fackler <raf@durin42.com> wrote:

>
> > On Jun 10, 2016, at 4:47 PM, Matt Mackall <mpm@selenic.com> wrote:
> >
> > On Fri, 2016-06-10 at 00:41 -0400, Augie Fackler wrote:
> >> # HG changeset patch
> >> # User Augie Fackler <raf@durin42.com>
> >> # Date 1465531834 14400
> >> #      Fri Jun 10 00:10:34 2016 -0400
> >> # Node ID bd2caf6762219aba40edd4dd697fee70de306940
> >> # Parent  16f5b31b1228212dc175b5a0baf3601cca48330d
> >> revlog: use hashlib.sha1 directly instead of through util
> >>
> >> Also remove module-local _sha alias, which was barely used.
> >
> > @@ -40,7 +41,6 @@ from . import (
> >>  _unpack = struct.unpack
> >>  _compress = zlib.compress
> >>  _decompress = zlib.decompress
> >> -_sha = util.sha1
> >
> > These are each a micro-optimization on Python's abysmal lookup path.
>
> Ugh. Does it make a measurable difference on a perf command?
>

I would go for a revset that accesses non-indexed changelog revision data.
Something like author() or date(). That will perform a sha1 for every
changeset without involving too many other operations.


>
> >
> > --
> > Mathematics is the supreme nostalgia of our time.
> >
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
Augie Fackler - June 24, 2016, 1:51 a.m.
> On Jun 10, 2016, at 5:41 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> 
> On Fri, Jun 10, 2016 at 2:36 PM, Augie Fackler <raf@durin42.com <mailto:raf@durin42.com>> wrote:
> 
> > On Jun 10, 2016, at 4:47 PM, Matt Mackall <mpm@selenic.com <mailto:mpm@selenic.com>> wrote:
> >
> > On Fri, 2016-06-10 at 00:41 -0400, Augie Fackler wrote:
> >> # HG changeset patch
> >> # User Augie Fackler <raf@durin42.com <mailto:raf@durin42.com>>
> >> # Date 1465531834 14400
> >> #      Fri Jun 10 00:10:34 2016 -0400
> >> # Node ID bd2caf6762219aba40edd4dd697fee70de306940
> >> # Parent  16f5b31b1228212dc175b5a0baf3601cca48330d
> >> revlog: use hashlib.sha1 directly instead of through util
> >>
> >> Also remove module-local _sha alias, which was barely used.
> >
> > @@ -40,7 +41,6 @@ from . import (
> >>  _unpack = struct.unpack
> >>  _compress = zlib.compress
> >>  _decompress = zlib.decompress
> >> -_sha = util.sha1
> >
> > These are each a micro-optimization on Python's abysmal lookup path.
> 
> Ugh. Does it make a measurable difference on a perf command?
> 
> I would go for a revset that accesses non-indexed changelog revision data. Something like author() or date(). That will perform a sha1 for every changeset without involving too many other operations.

Curiously, this appears to make things worse? See http://hg.durin42.com/hg-wip/rev/997c8cf8f586 for a patch and some perfrevset results.

I’m a little suspicious of my results, but this result lines up with what I’ve seen elsewhere - it’s either a little worse or too close to be a meaningful result.

> 
> 
> >
> > --
> > Mathematics is the supreme nostalgia of our time.
> >
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org <mailto:Mercurial-devel@mercurial-scm.org>
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel <https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel>
> 
> 
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org <mailto:Mercurial-devel@mercurial-scm.org>
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel <https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel>
Matt Mackall - June 24, 2016, 8:10 p.m.
On Thu, 2016-06-23 at 21:51 -0400, Augie Fackler wrote:
> > 
> > On Jun 10, 2016, at 5:41 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> > 
> > On Fri, Jun 10, 2016 at 2:36 PM, Augie Fackler <raf@durin42.com <mailto:raf@
> > durin42.com>> wrote:
> > 
> > > 
> > > On Jun 10, 2016, at 4:47 PM, Matt Mackall <mpm@selenic.com <mailto:mpm@sel
> > > enic.com>> wrote:
> > > 
> > > On Fri, 2016-06-10 at 00:41 -0400, Augie Fackler wrote:
> > > > 
> > > > # HG changeset patch
> > > > # User Augie Fackler <raf@durin42.com <mailto:raf@durin42.com>>
> > > > # Date 1465531834 14400
> > > > #      Fri Jun 10 00:10:34 2016 -0400
> > > > # Node ID bd2caf6762219aba40edd4dd697fee70de306940
> > > > # Parent  16f5b31b1228212dc175b5a0baf3601cca48330d
> > > > revlog: use hashlib.sha1 directly instead of through util
> > > > 
> > > > Also remove module-local _sha alias, which was barely used.
> > > @@ -40,7 +41,6 @@ from . import (
> > > > 
> > > >  _unpack = struct.unpack
> > > >  _compress = zlib.compress
> > > >  _decompress = zlib.decompress
> > > > -_sha = util.sha1
> > > These are each a micro-optimization on Python's abysmal lookup path.
> > Ugh. Does it make a measurable difference on a perf command?
> > 
> > I would go for a revset that accesses non-indexed changelog revision data.
> > Something like author() or date(). That will perform a sha1 for every
> > changeset without involving too many other operations.
> Curiously, this appears to make things worse? See http://hg.durin42.com/hg-wip
> /rev/997c8cf8f586 for a patch and some perfrevset results.
> 
> I’m a little suspicious of my results, but this result lines up with what I’ve
> seen elsewhere - it’s either a little worse or too close to be a meaningful
> result.

I couldn't measure it last week. Power management effects seemed to dominate. Oh
well.

-- 
Mathematics is the supreme nostalgia of our time.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -15,6 +15,7 @@  from __future__ import absolute_import
 
 import collections
 import errno
+import hashlib
 import os
 import struct
 import zlib
@@ -40,7 +41,6 @@  from . import (
 _unpack = struct.unpack
 _compress = zlib.compress
 _decompress = zlib.decompress
-_sha = util.sha1
 
 # revlog header flags
 REVLOGV0 = 0
@@ -74,7 +74,7 @@  def gettype(q):
 def offset_type(offset, type):
     return long(long(offset) << 16 | type)
 
-_nullhash = _sha(nullid)
+_nullhash = hashlib.sha1(nullid)
 
 def hash(text, p1, p2):
     """generate a hash from the given text and its parent hashes
@@ -92,7 +92,7 @@  def hash(text, p1, p2):
         # none of the parent nodes are nullid
         l = [p1, p2]
         l.sort()
-        s = _sha(l[0])
+        s = hashlib.sha1(l[0])
         s.update(l[1])
     s.update(text)
     return s.digest()