Patchwork changelog: keep track of file end in appender (issue5444)

login
register
mail settings
Submitter Durham Goode
Date Dec. 15, 2016, 7:01 p.m.
Message ID <6db5cb70cecbd50835f7.1481828469@dev111.prn1.facebook.com>
Download mbox | patch
Permalink /patch/17928/
State Accepted
Headers show

Comments

Durham Goode - Dec. 15, 2016, 7:01 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1481828418 28800
#      Thu Dec 15 11:00:18 2016 -0800
# Node ID 6db5cb70cecbd50835f7c8f28770643279001e76
# Parent  07bcd1bf61514be28956cf32280cc3d442ffc760
changelog: keep track of file end in appender (issue5444)

Previously, changelog.appender.end() would compute the end of the file by
joining all the current appended data and checking the length. This is an O(n)
operation.  e240e914d226 introduced a seek call before every revlog write, which
means we are hitting this O(n) behavior n times, which causes changelog writes
during a pull to be n^2.

In our large repo, this caused pulling 100k commits to go from 17s to 130s. With
this fix, it's back to 17s.
Augie Fackler - Dec. 16, 2016, 2:16 a.m.
> On Dec 15, 2016, at 2:01 PM, Durham Goode <durham@fb.com> wrote:
> 
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1481828418 28800
> #      Thu Dec 15 11:00:18 2016 -0800
> # Node ID 6db5cb70cecbd50835f7c8f28770643279001e76
> # Parent  07bcd1bf61514be28956cf32280cc3d442ffc760
> changelog: keep track of file end in appender (issue5444)

Nice fix. Queued.

> 
> Previously, changelog.appender.end() would compute the end of the file by
> joining all the current appended data and checking the length. This is an O(n)
> operation.  e240e914d226 introduced a seek call before every revlog write, which
> means we are hitting this O(n) behavior n times, which causes changelog writes
> during a pull to be n^2.
> 
> In our large repo, this caused pulling 100k commits to go from 17s to 130s. With
> this fix, it's back to 17s.
> 
> diff --git a/mercurial/changelog.py b/mercurial/changelog.py
> --- a/mercurial/changelog.py
> +++ b/mercurial/changelog.py
> @@ -79,9 +79,10 @@ class appender(object):
>         self.fp = fp
>         self.offset = fp.tell()
>         self.size = vfs.fstat(fp).st_size
> +        self._end = self.size
> 
>     def end(self):
> -        return self.size + len("".join(self.data))
> +        return self._end
>     def tell(self):
>         return self.offset
>     def flush(self):
> @@ -121,6 +122,7 @@ class appender(object):
>     def write(self, s):
>         self.data.append(str(s))
>         self.offset += len(s)
> +        self._end += len(s)
> 
> def _divertopener(opener, target):
>     """build an opener that writes in 'target.a' instead of 'target'"""
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Gregory Szorc - Dec. 16, 2016, 5:56 a.m.
Ahh - this is why I didn't find this regression: I was only testing the initial unbundle code path, which doesn't exercise this changing appending code.

I'm glad this perf issue is due to something wonky in hg and not some filesystem issue.

> On Dec 15, 2016, at 11:01, Durham Goode <durham@fb.com> wrote:
> 
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1481828418 28800
> #      Thu Dec 15 11:00:18 2016 -0800
> # Node ID 6db5cb70cecbd50835f7c8f28770643279001e76
> # Parent  07bcd1bf61514be28956cf32280cc3d442ffc760
> changelog: keep track of file end in appender (issue5444)
> 
> Previously, changelog.appender.end() would compute the end of the file by
> joining all the current appended data and checking the length. This is an O(n)
> operation.  e240e914d226 introduced a seek call before every revlog write, which
> means we are hitting this O(n) behavior n times, which causes changelog writes
> during a pull to be n^2.
> 
> In our large repo, this caused pulling 100k commits to go from 17s to 130s. With
> this fix, it's back to 17s.
> 
> diff --git a/mercurial/changelog.py b/mercurial/changelog.py
> --- a/mercurial/changelog.py
> +++ b/mercurial/changelog.py
> @@ -79,9 +79,10 @@ class appender(object):
>         self.fp = fp
>         self.offset = fp.tell()
>         self.size = vfs.fstat(fp).st_size
> +        self._end = self.size
> 
>     def end(self):
> -        return self.size + len("".join(self.data))
> +        return self._end
>     def tell(self):
>         return self.offset
>     def flush(self):
> @@ -121,6 +122,7 @@ class appender(object):
>     def write(self, s):
>         self.data.append(str(s))
>         self.offset += len(s)
> +        self._end += len(s)
> 
> def _divertopener(opener, target):
>     """build an opener that writes in 'target.a' instead of 'target'"""

Patch

diff --git a/mercurial/changelog.py b/mercurial/changelog.py
--- a/mercurial/changelog.py
+++ b/mercurial/changelog.py
@@ -79,9 +79,10 @@  class appender(object):
         self.fp = fp
         self.offset = fp.tell()
         self.size = vfs.fstat(fp).st_size
+        self._end = self.size
 
     def end(self):
-        return self.size + len("".join(self.data))
+        return self._end
     def tell(self):
         return self.offset
     def flush(self):
@@ -121,6 +122,7 @@  class appender(object):
     def write(self, s):
         self.data.append(str(s))
         self.offset += len(s)
+        self._end += len(s)
 
 def _divertopener(opener, target):
     """build an opener that writes in 'target.a' instead of 'target'"""