Patchwork [4,of,6,v6,lazy-manifest] manifest: use custom C implementation of lazymanifest

login
register
mail settings
Submitter Augie Fackler
Date March 7, 2015, 5:16 p.m.
Message ID <821f59a10986d380352d.1425748579@imladris.local>
Download mbox | patch
Permalink /patch/7929/
State Accepted
Headers show

Comments

Augie Fackler - March 7, 2015, 5:16 p.m.
# HG changeset patch
# User Augie Fackler <augie@google.com>
# Date 1425695387 18000
#      Fri Mar 06 21:29:47 2015 -0500
# Node ID 821f59a10986d380352d4f88f65f9398e3671d4e
# Parent  74e64852b07f9cfb5a7b89d827dd9e1f01314b1b
manifest: use custom C implementation of lazymanifest

This version is actually lazy, unlike the pure-python version. The
latter could stand to be optimized if anyone actually wants to use it
seriously. I put no work into it.

Before any of my related changes on mozilla-central:

perfmanifest tip
! wall 0.268805 comb 0.260000 user 0.260000 sys 0.000000 (best of 37)
perftags
! result: 162
! wall 0.007099 comb 0.000000 user 0.000000 sys 0.000000 (best of 401)
perfstatus
! wall 0.415680 comb 0.420000 user 0.260000 sys 0.160000 (best of 24)
hgperf export tip
! wall 0.142118 comb 0.140000 user 0.140000 sys 0.000000 (best of 67)

after all of my changes on mozilla-central:

./hg:
perfmanifest tip
! wall 0.232640 comb 0.230000 user 0.220000 sys 0.010000 (best of 43)
perftags
! result: 162
! wall 0.007057 comb 0.010000 user 0.000000 sys 0.010000 (best of 395)
perfstatus
! wall 0.415503 comb 0.420000 user 0.280000 sys 0.140000 (best of 24)
hgperf export tip
! wall 0.025096 comb 0.030000 user 0.030000 sys 0.000000 (best of 102)

so it's no real change in performance on perf{manifest,tags,status},
but is a huge win on 'hgperf export tip'.

There's a little performance work that could still be done here:
fastdelta() could be done significantly more intelligently by using
the internal state of the lazymanifest type in C, but that seems like
good future work.
Matt Harbison - March 12, 2015, 2:18 a.m.
On Sat, 07 Mar 2015 12:16:19 -0500, Augie Fackler <raf@durin42.com> wrote:

> # HG changeset patch
> # User Augie Fackler <augie@google.com>
> # Date 1425695387 18000
> #      Fri Mar 06 21:29:47 2015 -0500
> # Node ID 821f59a10986d380352d4f88f65f9398e3671d4e
> # Parent  74e64852b07f9cfb5a7b89d827dd9e1f01314b1b
> manifest: use custom C implementation of lazymanifest

I'm getting a python crash on Windows, starting with this when running  
'./run-tests.py --local test-locate.t', with 2.7.2.  Letting Visual Studio  
debug it only indicates that it dies in a free(), on top of a fairly large  
python27.dll call stack.

What's the best way to go about debugging this?  Is there a way to build  
the dll in debug mode, and maybe it will trigger some assertion closer to  
the original heap corruption?

--Matt
Matt Harbison - March 12, 2015, 4:25 a.m.
On Wed, 11 Mar 2015 22:18:44 -0400, Matt Harbison <mharbison72@gmail.com>  
wrote:

> On Sat, 07 Mar 2015 12:16:19 -0500, Augie Fackler <raf@durin42.com>  
> wrote:
>
>> # HG changeset patch
>> # User Augie Fackler <augie@google.com>
>> # Date 1425695387 18000
>> #      Fri Mar 06 21:29:47 2015 -0500
>> # Node ID 821f59a10986d380352d4f88f65f9398e3671d4e
>> # Parent  74e64852b07f9cfb5a7b89d827dd9e1f01314b1b
>> manifest: use custom C implementation of lazymanifest
>
> I'm getting a python crash on Windows, starting with this when running  
> './run-tests.py --local test-locate.t', with 2.7.2.  Letting Visual  
> Studio debug it only indicates that it dies in a free(), on top of a  
> fairly large python27.dll call stack.
>
> What's the best way to go about debugging this?  Is there a way to build  
> the dll in debug mode, and maybe it will trigger some assertion closer  
> to the original heap corruption?
>
> --Matt

I managed to get a debug version built with various _ASSERTE(  
_CrtCheckMemory( ) ) calls sprinkled around.  It asserts in  
lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2),  
"%02hhx", hash[i]);'.

I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't  
change anything.  I got this from DebugView, tracing the relevant  
variables:

[7704] dest is 006F1A90, plen is 1, i is 17, total is 006F1AB4
[7704] dest is 006F1A90, plen is 1, i is 18, total is 006F1AB6
[7704] dest is 006F1A90, plen is 1, i is 19, total is 006F1AB8
[7704] HEAP CORRUPTION DETECTED: after Normal block (#100) at 0x006F1A90.
[7704] CRT detected that the application wrote to memory after end of heap  
buffer.
[7704] Normal located at 0x006F1A90 is 43 bytes long.

It seems to happen the first time it gets to 19 in the loop.  After  
selecting 'close program', a ton more tracing comes out where it goes  
through that loop with no issue.

I'll pick this up tomorrow when I'm not half asleep if nobody else figures  
it out first.

--Matt

[1] https://msdn.microsoft.com/en-us/library/56e442dc.aspx
Martin von Zweigbergk - March 12, 2015, 5:53 a.m.
On Wed, Mar 11, 2015 at 9:26 PM Matt Harbison <mharbison72@gmail.com> wrote:

> I managed to get a debug version built with various _ASSERTE(
> _CrtCheckMemory( ) ) calls sprinkled around.  It asserts in
> lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2),
> "%02hhx", hash[i]);'.
>
> I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't
> change anything.  I got this from DebugView, tracing the relevant
> variables:
>

It seems unclear what 'hh' really means, see [1].

The '2' for the 'width' parameter is a *minimum*. I suppose if the hash was
negative, it might get sign-extended and then printed as 8 hex characters?
Does casting "hash[i]" to "unsigned char" help? But maybe it's better to
use snprintf?

Also, if the hash byte (char) get sign-extended, it seems like that would
mean that any negative byte would be result in "ff", because "ffffffxx"
gets written and most of it gets overwritten.

While writing the above, I thought that C's "char" was always signed (and
had always thought so), but according to [2], it's implementation-defined
whether it's signed or unsigned. It seems like most people's
implementations use unsigned chars, while yours uses signed chars. I
suppose that means that when your version does succeed without crashing,
your repo is likely to not verify, since the hash will be messed up. Does
it sometimes not crash? Can you check whether 'hg verify' works after
making a commit or two?

[1]
http://stackoverflow.com/questions/4586962/what-is-the-purpose-of-the-h-and-hh-modifiers-for-printf
.
[2]
http://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default
Martin von Zweigbergk - March 12, 2015, 3:33 p.m.
On Wed, Mar 11, 2015 at 10:53 PM Martin von Zweigbergk <
martinvonz@google.com> wrote:

> On Wed, Mar 11, 2015 at 9:26 PM Matt Harbison <mharbison72@gmail.com>
> wrote:
>
>> I managed to get a debug version built with various _ASSERTE(
>> _CrtCheckMemory( ) ) calls sprinkled around.  It asserts in
>> lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2),
>> "%02hhx", hash[i]);'.
>>
>> I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't
>> change anything.  I got this from DebugView, tracing the relevant
>> variables:
>>
>
> It seems unclear what 'hh' really means, see [1].
>
> The '2' for the 'width' parameter is a *minimum*. I suppose if the hash
> was negative, it might get sign-extended and then printed as 8 hex
> characters? Does casting "hash[i]" to "unsigned char" help? But maybe it's
> better to use snprintf?
>
> Also, if the hash byte (char) get sign-extended, it seems like that would
> mean that any negative byte would be result in "ff", because "ffffffxx"
> gets written and most of it gets overwritten.
>
> While writing the above, I thought that C's "char" was always signed (and
> had always thought so), but according to [2], it's implementation-defined
> whether it's signed or unsigned. It seems like most people's
> implementations use unsigned chars, while yours uses signed chars. I
> suppose that means that when your version does succeed without crashing,
> your repo is likely to not verify, since the hash will be messed up. Does
> it sometimes not crash? Can you check whether 'hg verify' works after
> making a commit or two?
>
> [1] http://stackoverflow.com/questions/4586962/what-is-the-
> purpose-of-the-h-and-hh-modifiers-for-printf.
> [2] http://stackoverflow.com/questions/2054939/is-char-
> signed-or-unsigned-by-default
>
>
It turns out char is signed on my system too, so I suppose what saves me is
that my sprintf does treat 'hh' by treating the parameter as char width,
while it seems like your sprintf doesn't. So, I think dropping the "hh" and
sticking a cast to unsigned char in there is a good solution. Augie, what
do you think?
Martin von Zweigbergk - March 12, 2015, 4:15 p.m.
I'll send a patch soon. Matt, let me know if it helps you.

On Thu, Mar 12, 2015 at 8:33 AM Martin von Zweigbergk <martinvonz@google.com>
wrote:

> On Wed, Mar 11, 2015 at 10:53 PM Martin von Zweigbergk <
> martinvonz@google.com> wrote:
>
>> On Wed, Mar 11, 2015 at 9:26 PM Matt Harbison <mharbison72@gmail.com>
>> wrote:
>>
>>> I managed to get a debug version built with various _ASSERTE(
>>> _CrtCheckMemory( ) ) calls sprinkled around.  It asserts in
>>> lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2),
>>> "%02hhx", hash[i]);'.
>>>
>>> I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't
>>> change anything.  I got this from DebugView, tracing the relevant
>>> variables:
>>>
>>
>> It seems unclear what 'hh' really means, see [1].
>>
>> The '2' for the 'width' parameter is a *minimum*. I suppose if the hash
>> was negative, it might get sign-extended and then printed as 8 hex
>> characters? Does casting "hash[i]" to "unsigned char" help? But maybe it's
>> better to use snprintf?
>>
>> Also, if the hash byte (char) get sign-extended, it seems like that would
>> mean that any negative byte would be result in "ff", because "ffffffxx"
>> gets written and most of it gets overwritten.
>>
>> While writing the above, I thought that C's "char" was always signed (and
>> had always thought so), but according to [2], it's implementation-defined
>> whether it's signed or unsigned. It seems like most people's
>> implementations use unsigned chars, while yours uses signed chars. I
>> suppose that means that when your version does succeed without crashing,
>> your repo is likely to not verify, since the hash will be messed up. Does
>> it sometimes not crash? Can you check whether 'hg verify' works after
>> making a commit or two?
>>
>> [1] http://stackoverflow.com/questions/4586962/what-is-the-p
>> urpose-of-the-h-and-hh-modifiers-for-printf.
>> [2] http://stackoverflow.com/questions/2054939/is-char-signed-
>> or-unsigned-by-default
>>
>>
> It turns out char is signed on my system too, so I suppose what saves me
> is that my sprintf does treat 'hh' by treating the parameter as char width,
> while it seems like your sprintf doesn't. So, I think dropping the "hh" and
> sticking a cast to unsigned char in there is a good solution. Augie, what
> do you think?
>
>

Patch

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -88,6 +88,11 @@  class _lazymanifest(dict):
         return ''.join("%s\0%s%s\n" % (
             f, _hex(n[:20]), flag) for f, n, flag in fl)
 
+try:
+    _lazymanifest = parsers.lazymanifest
+except AttributeError:
+    pass
+
 class manifestdict(object):
     def __init__(self, data=''):
         self._lm = _lazymanifest(data)