Patchwork D7252: dirs: reject consecutive slashes in paths

login
register
mail settings
Submitter phabricator
Date Nov. 6, 2019, 5:16 a.m.
Message ID <differential-rev-PHID-DREV-ogrsmordypahfwve6ojs-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/42798/
State Superseded
Headers show

Comments

phabricator - Nov. 6, 2019, 5:16 a.m.
durin42 created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  We shouldn't ever see those, and the fuzzer go really excited that if
  it gives us a 65k string with 55k slashes in it we use a lot of RAM.
  
  This is a better fix than what I tried in D7105 <https://phab.mercurial-scm.org/D7105>. It was suggested by
  Yuya, and I verified it does in fact cause the fuzzer to not OOM.
  
  This is a revision of D7234 <https://phab.mercurial-scm.org/D7234>, but with the missing set of an error
  added. I added a unit test of the dirs behavior because I needed to
  reason more carefully about the failure modes around consecutive
  slashes.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7252

AFFECTED FILES
  mercurial/cext/dirs.c
  mercurial/util.py
  tests/test-dirs.py

CHANGE DETAILS




To: durin42, #hg-reviewers
Cc: mercurial-devel
phabricator - Nov. 19, 2019, 10:30 a.m.
Alphare added a comment.


  Sorry to necropost, but since this broke the Rust implementation, I was wondering what the best approach would be to replicate this behavior, and I am starting to think that this should be reverted.
  
  IIUC, currently any new path passes through the `pathauditor` first for validation, so any further check would be completely redundant. 
  My intuition is that adding this check is purely here to satisfy the fuzzer, but would never happen in real life. Adding checks to this (very) internal data structure comes at a cost, both in performance and in code ergonomics.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7252/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7252

To: durin42, #hg-reviewers, indygreg
Cc: Alphare, mercurial-devel
phabricator - Nov. 19, 2019, 3:38 p.m.
durin42 added a comment.


  In D7252#109627 <https://phab.mercurial-scm.org/D7252#109627>, @Alphare wrote:
  
  > Sorry to necropost, but since this broke the Rust implementation, I was wondering what the best approach would be to replicate this behavior, and I am starting to think that this should be reverted.
  
  Oh, you mean the Rust version doesn't do the same rejection?
  
  Given that you're about to do a hash lookup, I'm a little skeptical that a `endswith('/')` check would show up meaningfully in a profiler, but I'm willing to be proven wrong?
  
  > IIUC, currently any new path passes through the `pathauditor` first for validation, so any further check would be completely redundant.
  
  Plausible, but I'd like some sort of test coverage demonstrating that.
  
  > My intuition is that adding this check is purely here to satisfy the fuzzer, but would never happen in real life. Adding checks to this (very) internal data structure comes at a cost, both in performance and in code ergonomics.
  
  Yes, this was largely added to make the fuzzer not get stuck on OOM conditions. That said, if the pathauditor can't catch this, we need to defend against this DoS vector at this layer, and it's such a small check at this layer I'm inclined to keep it unless it is measurably slowing down real uses...

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7252/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7252

To: durin42, #hg-reviewers, indygreg
Cc: Alphare, mercurial-devel
phabricator - Nov. 19, 2019, 3:39 p.m.
durin42 added a comment.


  In D7252#109656 <https://phab.mercurial-scm.org/D7252#109656>, @durin42 wrote:
  
  > In D7252#109627 <https://phab.mercurial-scm.org/D7252#109627>, @Alphare wrote:
  >
  >> Sorry to necropost, but since this broke the Rust implementation, I was wondering what the best approach would be to replicate this behavior, and I am starting to think that this should be reverted.
  >
  > Oh, you mean the Rust version doesn't do the same rejection?
  > Given that you're about to do a hash lookup, I'm a little skeptical that a `endswith('/')` check would show up meaningfully in a profiler, but I'm willing to be proven wrong?
  >
  >> IIUC, currently any new path passes through the `pathauditor` first for validation, so any further check would be completely redundant.
  >
  > Plausible, but I'd like some sort of test coverage demonstrating that.
  >
  >> My intuition is that adding this check is purely here to satisfy the fuzzer, but would never happen in real life. Adding checks to this (very) internal data structure comes at a cost, both in performance and in code ergonomics.
  >
  > Yes, this was largely added to make the fuzzer not get stuck on OOM conditions. That said, if the pathauditor can't catch this, we need to defend against this DoS vector at this layer, and it's such a small check at this layer I'm inclined to keep it unless it is measurably slowing down real uses...
  
  I should add: if we can substantiate that such a path can't make it through the pathauditor (and we have tests for that in the pathauditor layer so we don't break that in the future!) we can push this consecutive-slashes check into dirs_fuzzer.cc and remove it from dirs.c.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7252/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7252

To: durin42, #hg-reviewers, indygreg
Cc: Alphare, mercurial-devel
phabricator - Nov. 19, 2019, 3:52 p.m.
Alphare added a comment.


  > Oh, you mean the Rust version doesn't do the same rejection?
  
  It does not, currently.
  
  > Given that you're about to do a hash lookup, I'm a little skeptical that a `endswith('/')` check would show up meaningfully in a profiler, but I'm willing to be proven wrong?
  
  I have the same intuition you do, it's mostly about not repeating the same operations that have any impact at all on performance/code.
  
  > I should add: if we can substantiate that such a path can't make it through the pathauditor (and we have tests for that in the pathauditor layer so we don't break that in the future!) we can push this consecutive-slashes check into dirs_fuzzer.cc and remove it from dirs.c.
  
  It was my candid intuition that the `pathauditor` was already "ensured" as the barrier for path-based vulnerabilities. 
  Since my plan for the Rust implementation was to enforce that very fact, I guess I'll be the one to write said tests when I write a Rust `pathauditor`. It should come up not too long after my current work (in-progress) about matchers, since handling unknown files in any capacity will require the `pathauditor`.
  
  I'm not 100% sure of the implications of writing the aforementioned tests, if it's too much work, I can replicate this check in Rust for the time being.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7252/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7252

To: durin42, #hg-reviewers, indygreg
Cc: Alphare, mercurial-devel
phabricator - Nov. 22, 2019, 9:18 a.m.
marmoute added a comment.


  In practice, this means the tests consistently fails when testing with Rust. Can we either have a quick fix of the Rust code or a temporary backout of this (until we Rust code is fixed)?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7252/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7252

To: durin42, #hg-reviewers, indygreg
Cc: marmoute, Alphare, mercurial-devel

Patch

diff --git a/tests/test-dirs.py b/tests/test-dirs.py
new file mode 100644
--- /dev/null
+++ b/tests/test-dirs.py
@@ -0,0 +1,27 @@ 
+from __future__ import absolute_import
+
+import unittest
+
+import silenttestrunner
+
+from mercurial import util
+
+
+class dirstests(unittest.TestCase):
+    def testdirs(self):
+        for case, want in [
+            (b'a/a/a', [b'a', b'a/a', b'']),
+            (b'alpha/beta/gamma', [b'', b'alpha', b'alpha/beta']),
+        ]:
+            d = util.dirs()
+            d.addpath(case)
+            self.assertEqual(sorted(d), sorted(want))
+
+    def testinvalid(self):
+        with self.assertRaises(ValueError):
+            d = util.dirs()
+            d.addpath(b'a//b')
+
+
+if __name__ == '__main__':
+    silenttestrunner.main(__name__)
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -3545,6 +3545,8 @@ 
 def finddirs(path):
     pos = path.rfind(b'/')
     while pos != -1:
+        if path[pos - 2 : pos - 1] == '/':
+            raise ValueError("found invalid consecutive slashes in path")
         yield path[:pos]
         pos = path.rfind(b'/', 0, pos)
     yield b''
diff --git a/mercurial/cext/dirs.c b/mercurial/cext/dirs.c
--- a/mercurial/cext/dirs.c
+++ b/mercurial/cext/dirs.c
@@ -66,6 +66,14 @@ 
 	while ((pos = _finddir(cpath, pos - 1)) != -1) {
 		PyObject *val;
 
+		/* Sniff for trailing slashes, a marker of an invalid input. */
+		if (pos > 0 && cpath[pos - 1] == '/') {
+			PyErr_SetString(
+			    PyExc_ValueError,
+			    "found invalid consecutive slashes in path");
+			goto bail;
+		}
+
 		key = PyBytes_FromStringAndSize(cpath, pos);
 		if (key == NULL)
 			goto bail;