Patchwork D7030: dirs: fix trivial over-read of input data

login
register
mail settings
Submitter phabricator
Date Oct. 9, 2019, 3 p.m.
Message ID <differential-rev-PHID-DREV-d5iorktc4qb5g7uuxg4o-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/42136/
State Superseded
Headers show

Comments

phabricator - Oct. 9, 2019, 3 p.m.
durin42 created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This code, introduced in 8c0a7eeda06d <https://phab.mercurial-scm.org/rHG8c0a7eeda06d2773ec92b14527280db3e0167588>, was intentionally over-reading
  an input string to avoid getting a shared string object for a one-byte
  input. Unfortunately with an empty input (like in the case of a fuzzer
  getting started) this was a trivial over-read and triggered an
  AddressSanitizer failure.
  
  I went out of my way to make sure the code still does the
  copy-avoidance tricks. I don't think this change will cost us much
  performance since the one-character strings should be cached
  aggressively anyway.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7030

AFFECTED FILES
  mercurial/cext/dirs.c

CHANGE DETAILS




To: durin42, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/cext/dirs.c b/mercurial/cext/dirs.c
--- a/mercurial/cext/dirs.c
+++ b/mercurial/cext/dirs.c
@@ -68,26 +68,41 @@ 
 	while ((pos = _finddir(cpath, pos - 1)) != -1) {
 		PyObject *val;
 
-		/* It's likely that every prefix already has an entry
-		   in our dict. Try to avoid allocating and
-		   deallocating a string for each prefix we check. */
-		if (key != NULL)
-			((PyBytesObject *)key)->ob_shash = -1;
-		else {
-			/* Force Python to not reuse a small shared string. */
-			key = PyBytes_FromStringAndSize(cpath,
-							 pos < 2 ? 2 : pos);
+		if (pos < 2) {
+			key = PyBytes_FromStringAndSize(cpath, pos);
 			if (key == NULL)
 				goto bail;
+		} else {
+			/* It's likely that every prefix already has an entry
+			   in our dict. Try to avoid allocating and
+			   deallocating a string for each prefix we check. */
+			if (key != NULL)
+				((PyBytesObject *)key)->ob_shash = -1;
+			else {
+				/* We know pos >= 2, so we won't get a small
+				 * shared string. */
+				key = PyBytes_FromStringAndSize(cpath, pos);
+				if (key == NULL)
+					goto bail;
+			}
+			/* Py_SIZE(o) refers to the ob_size member of
+			 * the struct. Yes, assigning to what looks
+			 * like a function seems wrong. */
+			Py_SIZE(key) = pos;
+			((PyBytesObject *)key)->ob_sval[pos] = '\0';
 		}
-		/* Py_SIZE(o) refers to the ob_size member of the struct. Yes,
-		* assigning to what looks like a function seems wrong. */
-		Py_SIZE(key) = pos;
-		((PyBytesObject *)key)->ob_sval[pos] = '\0';
 
 		val = PyDict_GetItem(dirs, key);
 		if (val != NULL) {
 			PYLONG_VALUE(val) += 1;
+			if (pos < 2) {
+				/* This was a short string, so we
+				 * probably got a small shared string
+				 * we can't mutate on the next loop
+				 * iteration. Clear it.
+				 */
+				Py_CLEAR(key);
+			}
 			break;
 		}