Submitter | Augie Fackler |
---|---|
Date | March 7, 2015, 5:16 p.m. |
Message ID | <821f59a10986d380352d.1425748579@imladris.local> |
Download | mbox | patch |
Permalink | /patch/7929/ |
State | Accepted |
Headers | show |
Comments
On Sat, 07 Mar 2015 12:16:19 -0500, Augie Fackler <raf@durin42.com> wrote: > # HG changeset patch > # User Augie Fackler <augie@google.com> > # Date 1425695387 18000 > # Fri Mar 06 21:29:47 2015 -0500 > # Node ID 821f59a10986d380352d4f88f65f9398e3671d4e > # Parent 74e64852b07f9cfb5a7b89d827dd9e1f01314b1b > manifest: use custom C implementation of lazymanifest I'm getting a python crash on Windows, starting with this when running './run-tests.py --local test-locate.t', with 2.7.2. Letting Visual Studio debug it only indicates that it dies in a free(), on top of a fairly large python27.dll call stack. What's the best way to go about debugging this? Is there a way to build the dll in debug mode, and maybe it will trigger some assertion closer to the original heap corruption? --Matt
On Wed, 11 Mar 2015 22:18:44 -0400, Matt Harbison <mharbison72@gmail.com> wrote: > On Sat, 07 Mar 2015 12:16:19 -0500, Augie Fackler <raf@durin42.com> > wrote: > >> # HG changeset patch >> # User Augie Fackler <augie@google.com> >> # Date 1425695387 18000 >> # Fri Mar 06 21:29:47 2015 -0500 >> # Node ID 821f59a10986d380352d4f88f65f9398e3671d4e >> # Parent 74e64852b07f9cfb5a7b89d827dd9e1f01314b1b >> manifest: use custom C implementation of lazymanifest > > I'm getting a python crash on Windows, starting with this when running > './run-tests.py --local test-locate.t', with 2.7.2. Letting Visual > Studio debug it only indicates that it dies in a free(), on top of a > fairly large python27.dll call stack. > > What's the best way to go about debugging this? Is there a way to build > the dll in debug mode, and maybe it will trigger some assertion closer > to the original heap corruption? > > --Matt I managed to get a debug version built with various _ASSERTE( _CrtCheckMemory( ) ) calls sprinkled around. It asserts in lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2), "%02hhx", hash[i]);'. I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't change anything. I got this from DebugView, tracing the relevant variables: [7704] dest is 006F1A90, plen is 1, i is 17, total is 006F1AB4 [7704] dest is 006F1A90, plen is 1, i is 18, total is 006F1AB6 [7704] dest is 006F1A90, plen is 1, i is 19, total is 006F1AB8 [7704] HEAP CORRUPTION DETECTED: after Normal block (#100) at 0x006F1A90. [7704] CRT detected that the application wrote to memory after end of heap buffer. [7704] Normal located at 0x006F1A90 is 43 bytes long. It seems to happen the first time it gets to 19 in the loop. After selecting 'close program', a ton more tracing comes out where it goes through that loop with no issue. I'll pick this up tomorrow when I'm not half asleep if nobody else figures it out first. --Matt [1] https://msdn.microsoft.com/en-us/library/56e442dc.aspx
On Wed, Mar 11, 2015 at 9:26 PM Matt Harbison <mharbison72@gmail.com> wrote: > I managed to get a debug version built with various _ASSERTE( > _CrtCheckMemory( ) ) calls sprinkled around. It asserts in > lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2), > "%02hhx", hash[i]);'. > > I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't > change anything. I got this from DebugView, tracing the relevant > variables: > It seems unclear what 'hh' really means, see [1]. The '2' for the 'width' parameter is a *minimum*. I suppose if the hash was negative, it might get sign-extended and then printed as 8 hex characters? Does casting "hash[i]" to "unsigned char" help? But maybe it's better to use snprintf? Also, if the hash byte (char) get sign-extended, it seems like that would mean that any negative byte would be result in "ff", because "ffffffxx" gets written and most of it gets overwritten. While writing the above, I thought that C's "char" was always signed (and had always thought so), but according to [2], it's implementation-defined whether it's signed or unsigned. It seems like most people's implementations use unsigned chars, while yours uses signed chars. I suppose that means that when your version does succeed without crashing, your repo is likely to not verify, since the hash will be messed up. Does it sometimes not crash? Can you check whether 'hg verify' works after making a commit or two? [1] http://stackoverflow.com/questions/4586962/what-is-the-purpose-of-the-h-and-hh-modifiers-for-printf . [2] http://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default
On Wed, Mar 11, 2015 at 10:53 PM Martin von Zweigbergk < martinvonz@google.com> wrote: > On Wed, Mar 11, 2015 at 9:26 PM Matt Harbison <mharbison72@gmail.com> > wrote: > >> I managed to get a debug version built with various _ASSERTE( >> _CrtCheckMemory( ) ) calls sprinkled around. It asserts in >> lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2), >> "%02hhx", hash[i]);'. >> >> I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't >> change anything. I got this from DebugView, tracing the relevant >> variables: >> > > It seems unclear what 'hh' really means, see [1]. > > The '2' for the 'width' parameter is a *minimum*. I suppose if the hash > was negative, it might get sign-extended and then printed as 8 hex > characters? Does casting "hash[i]" to "unsigned char" help? But maybe it's > better to use snprintf? > > Also, if the hash byte (char) get sign-extended, it seems like that would > mean that any negative byte would be result in "ff", because "ffffffxx" > gets written and most of it gets overwritten. > > While writing the above, I thought that C's "char" was always signed (and > had always thought so), but according to [2], it's implementation-defined > whether it's signed or unsigned. It seems like most people's > implementations use unsigned chars, while yours uses signed chars. I > suppose that means that when your version does succeed without crashing, > your repo is likely to not verify, since the hash will be messed up. Does > it sometimes not crash? Can you check whether 'hg verify' works after > making a commit or two? > > [1] http://stackoverflow.com/questions/4586962/what-is-the- > purpose-of-the-h-and-hh-modifiers-for-printf. > [2] http://stackoverflow.com/questions/2054939/is-char- > signed-or-unsigned-by-default > > It turns out char is signed on my system too, so I suppose what saves me is that my sprintf does treat 'hh' by treating the parameter as char width, while it seems like your sprintf doesn't. So, I think dropping the "hh" and sticking a cast to unsigned char in there is a good solution. Augie, what do you think?
I'll send a patch soon. Matt, let me know if it helps you. On Thu, Mar 12, 2015 at 8:33 AM Martin von Zweigbergk <martinvonz@google.com> wrote: > On Wed, Mar 11, 2015 at 10:53 PM Martin von Zweigbergk < > martinvonz@google.com> wrote: > >> On Wed, Mar 11, 2015 at 9:26 PM Matt Harbison <mharbison72@gmail.com> >> wrote: >> >>> I managed to get a debug version built with various _ASSERTE( >>> _CrtCheckMemory( ) ) calls sprinkled around. It asserts in >>> lazymanifest_setitem() after calling 'sprintf(dest + plen + 1 + (i * 2), >>> "%02hhx", hash[i]);'. >>> >>> I don't see an 'hh' size prefix [1], but dropping one of the 'h' didn't >>> change anything. I got this from DebugView, tracing the relevant >>> variables: >>> >> >> It seems unclear what 'hh' really means, see [1]. >> >> The '2' for the 'width' parameter is a *minimum*. I suppose if the hash >> was negative, it might get sign-extended and then printed as 8 hex >> characters? Does casting "hash[i]" to "unsigned char" help? But maybe it's >> better to use snprintf? >> >> Also, if the hash byte (char) get sign-extended, it seems like that would >> mean that any negative byte would be result in "ff", because "ffffffxx" >> gets written and most of it gets overwritten. >> >> While writing the above, I thought that C's "char" was always signed (and >> had always thought so), but according to [2], it's implementation-defined >> whether it's signed or unsigned. It seems like most people's >> implementations use unsigned chars, while yours uses signed chars. I >> suppose that means that when your version does succeed without crashing, >> your repo is likely to not verify, since the hash will be messed up. Does >> it sometimes not crash? Can you check whether 'hg verify' works after >> making a commit or two? >> >> [1] http://stackoverflow.com/questions/4586962/what-is-the-p >> urpose-of-the-h-and-hh-modifiers-for-printf. >> [2] http://stackoverflow.com/questions/2054939/is-char-signed- >> or-unsigned-by-default >> >> > It turns out char is signed on my system too, so I suppose what saves me > is that my sprintf does treat 'hh' by treating the parameter as char width, > while it seems like your sprintf doesn't. So, I think dropping the "hh" and > sticking a cast to unsigned char in there is a good solution. Augie, what > do you think? > >
Patch
diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -88,6 +88,11 @@ class _lazymanifest(dict): return ''.join("%s\0%s%s\n" % ( f, _hex(n[:20]), flag) for f, n, flag in fl) +try: + _lazymanifest = parsers.lazymanifest +except AttributeError: + pass + class manifestdict(object): def __init__(self, data=''): self._lm = _lazymanifest(data)