Patchwork lazymanifest: don't depend on printf's 'hh' format to work

login
register
mail settings
Submitter Martin von Zweigbergk
Date March 12, 2015, 4:18 p.m.
Message ID <ffee672e53d8861b75f4.1426177126@martinvonz.mtv.corp.google.com>
Download mbox | patch
Permalink /patch/8025/
State Accepted
Commit 9c4663abf48a62585866508d9909fda36f181b19
Headers show

Comments

Martin von Zweigbergk - March 12, 2015, 4:18 p.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1426176405 25200
#      Thu Mar 12 09:06:45 2015 -0700
# Node ID ffee672e53d8861b75f4190e3ab4e17428134776
# Parent  7cf9a9e0cf893e7ae82dc576a03c843fd6640438
lazymanifest: don't depend on printf's 'hh' format to work

Where we convert a 20-byte binary to a 40-byte hex string in
lazymanifest_setitem(), we use sprintf("%02hhx", hash[i]). As Matt
Harbison found out, 'hh' seems to be ignored on some platforms (Visual
Studio?). If char is signed on such platforms, the value gets
sign-extended as it gets promoted into an int when passed into the
variadic sprintf(). The resulting integer value will then be printed
with leading f's (14 of them on 64-bit systems), since the '2' in the
format string indicates only minimum number of characters. This is
both incorrect and runs the risk of writing outside of allocated
memory (as Matt reported).

To fix, let's cast the value to unsigned char before passing it to
sprintf(). Also drop the poorly supported 'hh' formatting that now
becomes unnecessary.
Matt Harbison - March 12, 2015, 5 p.m.

Augie Fackler - March 12, 2015, 5:18 p.m.
On Mar 12, 2015, at 1:00 PM, Matt Harbison <matt_harbison@yahoo.com> wrote:

> On Mar 12, 2015 12:18 PM, Martin von Zweigbergk <martinvonz@google.com> wrote: 
> 
> # HG changeset patch 
> # User Martin von Zweigbergk <martinvonz@google.com> 
> # Date 1426176405 25200 
> # Thu Mar 12 09:06:45 2015 -0700 
> # Node ID ffee672e53d8861b75f4190e3ab4e17428134776 
> # Parent 7cf9a9e0cf893e7ae82dc576a03c843fd6640438 
> lazymanifest: don't depend on printf's 'hh' format to work 
>  
> This works for me on 32bit Win7.  I'll try the original machine tonight.  Thanks!

Was it broken before on 32bit? I've lost track a little bit.
Matt Harbison - March 12, 2015, 5:31 p.m.

Augie Fackler - March 12, 2015, 5:32 p.m.
On Mar 12, 2015, at 1:31 PM, Matt Harbison <matt_harbison@yahoo.com> wrote:

> On Mar 12, 2015 1:18 PM, Augie Fackler <raf@durin42.com> wrote: 
> 
> 
> On Mar 12, 2015, at 1:00 PM, Matt Harbison <matt_harbison@yahoo.com> wrote: 
> 
> > On Mar 12, 2015 12:18 PM, Martin von Zweigbergk <martinvonz@google.com> wrote: 
> > 
> > # HG changeset patch 
> > # User Martin von Zweigbergk <martinvonz@google.com> 
> > # Date 1426176405 25200 
> > # Thu Mar 12 09:06:45 2015 -0700 
> > # Node ID ffee672e53d8861b75f4190e3ab4e17428134776 
> > # Parent 7cf9a9e0cf893e7ae82dc576a03c843fd6640438 
> > lazymanifest: don't depend on printf's 'hh' format to work 
> > 
> > This works for me on 32bit Win7. I'll try the original machine tonight. Thanks! 
> 
> Was it broken before on 32bit? I've lost track a little bit. 
> 
> There was a crash on 32 bit before I applied this that went away with this.  All the debugging last night was on 64 bit.  I assume its the same issue, but don't last night's setup by me ATM.


Okay. In that case I'm going to queue this since it fixes at least one problem and seems to tiptoe around C shenanigans better.

(Hello patchbot: this is queued.)
Matt Harbison - March 13, 2015, 1:37 a.m.
On Thu, 12 Mar 2015 12:18:46 -0400, Martin von Zweigbergk  
<martinvonz@google.com> wrote:

> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1426176405 25200
> #      Thu Mar 12 09:06:45 2015 -0700
> # Node ID ffee672e53d8861b75f4190e3ab4e17428134776
> # Parent  7cf9a9e0cf893e7ae82dc576a03c843fd6640438
> lazymanifest: don't depend on printf's 'hh' format to work
>
> Where we convert a 20-byte binary to a 40-byte hex string in
> lazymanifest_setitem(), we use sprintf("%02hhx", hash[i]). As Matt
> Harbison found out, 'hh' seems to be ignored on some platforms (Visual
> Studio?). If char is signed on such platforms, the value gets
> sign-extended as it gets promoted into an int when passed into the
> variadic sprintf(). The resulting integer value will then be printed
> with leading f's (14 of them on 64-bit systems), since the '2' in the
> format string indicates only minimum number of characters. This is
> both incorrect and runs the risk of writing outside of allocated
> memory (as Matt reported).
>
> To fix, let's cast the value to unsigned char before passing it to
> sprintf(). Also drop the poorly supported 'hh' formatting that now
> becomes unnecessary.
>
> diff -r 7cf9a9e0cf89 -r ffee672e53d8 mercurial/manifest.c
> --- a/mercurial/manifest.c	Wed Mar 11 15:22:34 2015 -0700
> +++ b/mercurial/manifest.c	Thu Mar 12 09:06:45 2015 -0700
> @@ -466,7 +466,10 @@
>  	}
>  	memcpy(dest, path, plen + 1);
>  	for (i = 0; i < 20; i++) {
> -		sprintf(dest + plen + 1 + (i * 2), "%02hhx", hash[i]);
> +		/* Cast to unsigned, so it will not get sign-extended when promoted
> +		 * to int (as is done when passing to a variadic function)
> +		 */
> +		sprintf(dest + plen + 1 + (i * 2), "%02x", (unsigned char)hash[i]);
>  	}
>  	memcpy(dest + plen + 41, flags, flen);
>  	dest[plen + 41 + flen] = '\n';

To wrap this up, this fixes it on my 64bit system as well.  I added a  
couple more traces prior to applying this, and the "%02hhx" was converting  
the last char into "fffe", explaining the overrun.  The full hash was  
printed as "a", so verify wouldn't have been happy.

FWIW, it looks like MSVC can build with char either signed or unsigned  
based on limits.h, as mentioned in one of the links you referenced.  But  
this certainly seems safer.

Thanks for picking this up and figuring it out.

--Matt
Martin von Zweigbergk - March 13, 2015, 1:41 a.m.
Great. Thanks for a detailed bug report and for verifying the fix.

On Thu, Mar 12, 2015 at 6:37 PM Matt Harbison <matt_harbison@yahoo.com>
wrote:

> On Thu, 12 Mar 2015 12:18:46 -0400, Martin von Zweigbergk
> <martinvonz@google.com> wrote:
>
> > # HG changeset patch
> > # User Martin von Zweigbergk <martinvonz@google.com>
> > # Date 1426176405 25200
> > #      Thu Mar 12 09:06:45 2015 -0700
> > # Node ID ffee672e53d8861b75f4190e3ab4e17428134776
> > # Parent  7cf9a9e0cf893e7ae82dc576a03c843fd6640438
> > lazymanifest: don't depend on printf's 'hh' format to work
> >
> > Where we convert a 20-byte binary to a 40-byte hex string in
> > lazymanifest_setitem(), we use sprintf("%02hhx", hash[i]). As Matt
> > Harbison found out, 'hh' seems to be ignored on some platforms (Visual
> > Studio?). If char is signed on such platforms, the value gets
> > sign-extended as it gets promoted into an int when passed into the
> > variadic sprintf(). The resulting integer value will then be printed
> > with leading f's (14 of them on 64-bit systems), since the '2' in the
> > format string indicates only minimum number of characters. This is
> > both incorrect and runs the risk of writing outside of allocated
> > memory (as Matt reported).
> >
> > To fix, let's cast the value to unsigned char before passing it to
> > sprintf(). Also drop the poorly supported 'hh' formatting that now
> > becomes unnecessary.
> >
> > diff -r 7cf9a9e0cf89 -r ffee672e53d8 mercurial/manifest.c
> > --- a/mercurial/manifest.c    Wed Mar 11 15:22:34 2015 -0700
> > +++ b/mercurial/manifest.c    Thu Mar 12 09:06:45 2015 -0700
> > @@ -466,7 +466,10 @@
> >       }
> >       memcpy(dest, path, plen + 1);
> >       for (i = 0; i < 20; i++) {
> > -             sprintf(dest + plen + 1 + (i * 2), "%02hhx", hash[i]);
> > +             /* Cast to unsigned, so it will not get sign-extended when
> promoted
> > +              * to int (as is done when passing to a variadic function)
> > +              */
> > +             sprintf(dest + plen + 1 + (i * 2), "%02x", (unsigned
> char)hash[i]);
> >       }
> >       memcpy(dest + plen + 41, flags, flen);
> >       dest[plen + 41 + flen] = '\n';
>
> To wrap this up, this fixes it on my 64bit system as well.  I added a
> couple more traces prior to applying this, and the "%02hhx" was converting
> the last char into "fffe", explaining the overrun.  The full hash was
> printed as "a", so verify wouldn't have been happy.
>
> FWIW, it looks like MSVC can build with char either signed or unsigned
> based on limits.h, as mentioned in one of the links you referenced.  But
> this certainly seems safer.
>
> Thanks for picking this up and figuring it out.
>
> --Matt
>

Patch

diff -r 7cf9a9e0cf89 -r ffee672e53d8 mercurial/manifest.c
--- a/mercurial/manifest.c	Wed Mar 11 15:22:34 2015 -0700
+++ b/mercurial/manifest.c	Thu Mar 12 09:06:45 2015 -0700
@@ -466,7 +466,10 @@ 
 	}
 	memcpy(dest, path, plen + 1);
 	for (i = 0; i < 20; i++) {
-		sprintf(dest + plen + 1 + (i * 2), "%02hhx", hash[i]);
+		/* Cast to unsigned, so it will not get sign-extended when promoted
+		 * to int (as is done when passing to a variadic function)
+		 */
+		sprintf(dest + plen + 1 + (i * 2), "%02x", (unsigned char)hash[i]);
 	}
 	memcpy(dest + plen + 41, flags, flen);
 	dest[plen + 41 + flen] = '\n';