Patchwork [08,of,10,V4] context: floor adjustlinkrev graph walk during copy tracing

login
register
mail settings
Submitter Boris Feld
Date Oct. 4, 2018, 2:44 p.m.
Message ID <53c0bf99c013909bd628.1538664282@localhost.localdomain>
Download mbox | patch
Permalink /patch/35456/
State Accepted
Headers show

Comments

Boris Feld - Oct. 4, 2018, 2:44 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1536255188 14400
#      Thu Sep 06 13:33:08 2018 -0400
# Node ID 53c0bf99c013909bd628aa5254c26d301236ba26
# Parent  462edcfdf6346b522eee9a64c5c1ca9ff566d7b8
# EXP-Topic copy-perf
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 53c0bf99c013
context: floor adjustlinkrev graph walk during copy tracing

The `_adjustlinkrev` method gains an optional "stoprev" argument. The linkrev
adjustment will give up once this floor is reached. The relevant functions
using `_adjustlinkrev` are updated to pass an appropriate value in the copy
tracing code.

In some private repository, about 10% of the status call triggered the
pathological case addressed by this change. The speedup varies from one call
to another, the best-observed win is moving from 170s to 11s.
via Mercurial-devel - Oct. 4, 2018, 5:23 p.m.
On Thu, Oct 4, 2018 at 7:44 AM Boris Feld <boris.feld@octobus.net> wrote:

> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1536255188 14400
> #      Thu Sep 06 13:33:08 2018 -0400
> # Node ID 53c0bf99c013909bd628aa5254c26d301236ba26
> # Parent  462edcfdf6346b522eee9a64c5c1ca9ff566d7b8
> # EXP-Topic copy-perf
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
> 53c0bf99c013
> context: floor adjustlinkrev graph walk during copy tracing
>
> The `_adjustlinkrev` method gains an optional "stoprev" argument. The
> linkrev
> adjustment will give up once this floor is reached. The relevant functions
> using `_adjustlinkrev` are updated to pass an appropriate value in the copy
> tracing code.
>

When does this happen? In normal repos or only in broken ones? We don't
seem to have any such repos in our test suite (raising exception instead of
returning None does not fail any tests).


>
> In some private repository, about 10% of the status call triggered the
> pathological case addressed by this change. The speedup varies from one
> call
> to another, the best-observed win is moving from 170s to 11s.
>

Is that from just this patch or the entire series?


> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -719,7 +719,7 @@ class basefilectx(object):
>
>          return True
>
> -    def _adjustlinkrev(self, srcrev, inclusive=False):
> +    def _adjustlinkrev(self, srcrev, inclusive=False, stoprev=None):
>          """return the first ancestor of <srcrev> introducing <fnode>
>
>          If the linkrev of the file revision does not point to an ancestor
> of
> @@ -728,6 +728,10 @@ class basefilectx(object):
>
>          :srcrev: the changeset revision we search ancestors from
>          :inclusive: if true, the src revision will also be checked
> +        :stoprev: an optional revision to stop the walk at. If no
> introduction
> +                  of this file content could be found before this floor
> +                  revision, the function will returns "None" and stops its
> +                  iteration.
>          """
>          repo = self._repo
>          cl = repo.unfiltered().changelog
> @@ -755,6 +759,8 @@ class basefilectx(object):
>              fnode = self._filenode
>              path = self._path
>              for a in iteranc:
> +                if stoprev is not None and a < stoprev:
> +                    return None
>                  ac = cl.read(a) # get changeset data (we avoid object
> creation)
>                  if path in ac[3]: # checking the 'files' field.
>                      # The file has been touched, check if the content is
> @@ -770,8 +776,12 @@ class basefilectx(object):
>      def isintroducedafter(self, changelogrev):
>          """True if a filectx have been introduced after a given floor
> revision
>          """
> -        return (self.linkrev() > changelogrev
> -                or self._introrev() > changelogrev)
> +        if self.linkrev() > changelogrev:
> +            return True
> +        introrev = self._introrev(stoprev=changelogrev)
> +        if introrev is None:
> +            return False
> +        return introrev > changelogrev
>
>      def introrev(self):
>          """return the rev of the changeset which introduced this file
> revision
> @@ -784,7 +794,15 @@ class basefilectx(object):
>          """
>          return self._introrev()
>
> -    def _introrev(self):
> +    def _introrev(self, stoprev=None):
> +        """
> +        Same as `introrev` but, with an extra argument to limit changelog
> +        iteration range in some internal usecase.
> +
> +        If `stoprev` is set, the `introrev` will not be searched past that
> +        `stoprev` revision and "None" might be returned. This is useful to
> +        limit the iteration range.
> +        """
>          toprev = None
>          attrs = vars(self)
>          if r'_changeid' in attrs:
> @@ -795,11 +813,12 @@ class basefilectx(object):
>              toprev = self._changectx.rev()
>
>          if toprev is not None:
> -            return self._adjustlinkrev(toprev, inclusive=True)
> +            return self._adjustlinkrev(toprev, inclusive=True,
> stoprev=stoprev)
>          elif r'_descendantrev' in attrs:
> -            introrev = self._adjustlinkrev(self._descendantrev)
> +            introrev = self._adjustlinkrev(self._descendantrev,
> stoprev=stoprev)
>              # be nice and cache the result of the computation
> -            self._changeid = introrev
> +            if introrev is not None:
> +                self._changeid = introrev
>              return introrev
>          else:
>              return self.linkrev()
>
Boris Feld - Oct. 9, 2018, 7 p.m.
On 04/10/2018 19:23, Martin von Zweigbergk via Mercurial-devel wrote:
>
>
> On Thu, Oct 4, 2018 at 7:44 AM Boris Feld <boris.feld@octobus.net
> <mailto:boris.feld@octobus.net>> wrote:
>
>     # HG changeset patch
>     # User Boris Feld <boris.feld@octobus.net
>     <mailto:boris.feld@octobus.net>>
>     # Date 1536255188 14400
>     #      Thu Sep 06 13:33:08 2018 -0400
>     # Node ID 53c0bf99c013909bd628aa5254c26d301236ba26
>     # Parent  462edcfdf6346b522eee9a64c5c1ca9ff566d7b8
>     # EXP-Topic copy-perf
>     # Available At https://bitbucket.org/octobus/mercurial-devel/
>     #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/ -r 53c0bf99c013
>     context: floor adjustlinkrev graph walk during copy tracing
>
>     The `_adjustlinkrev` method gains an optional "stoprev" argument.
>     The linkrev
>     adjustment will give up once this floor is reached. The relevant
>     functions
>     using `_adjustlinkrev` are updated to pass an appropriate value in
>     the copy
>     tracing code.
>
>
> When does this happen? In normal repos or only in broken ones? We
> don't seem to have any such repos in our test suite (raising exception
> instead of returning None does not fail any tests).
It happens in normal repository. The repository in the test suite might
be too simple to trigger this.
>  
>
>
>     In some private repository, about 10% of the status call triggered the
>     pathological case addressed by this change. The speedup varies
>     from one call
>     to another, the best-observed win is moving from 170s to 11s.
>
>
> Is that from just this patch or the entire series?
This patch is doing most of the performance win, some of the
intermediate refactoring steps might have an intermediate effect on
performance too (eg: the one removing the .rev()) call. However, we
don't have timing for them.
>
>
>     diff --git a/mercurial/context.py b/mercurial/context.py
>     --- a/mercurial/context.py
>     +++ b/mercurial/context.py
>     @@ -719,7 +719,7 @@ class basefilectx(object):
>
>              return True
>
>     -    def _adjustlinkrev(self, srcrev, inclusive=False):
>     +    def _adjustlinkrev(self, srcrev, inclusive=False, stoprev=None):
>              """return the first ancestor of <srcrev> introducing <fnode>
>
>              If the linkrev of the file revision does not point to an
>     ancestor of
>     @@ -728,6 +728,10 @@ class basefilectx(object):
>
>              :srcrev: the changeset revision we search ancestors from
>              :inclusive: if true, the src revision will also be checked
>     +        :stoprev: an optional revision to stop the walk at. If no
>     introduction
>     +                  of this file content could be found before this
>     floor
>     +                  revision, the function will returns "None" and
>     stops its
>     +                  iteration.
>              """
>              repo = self._repo
>              cl = repo.unfiltered().changelog
>     @@ -755,6 +759,8 @@ class basefilectx(object):
>                  fnode = self._filenode
>                  path = self._path
>                  for a in iteranc:
>     +                if stoprev is not None and a < stoprev:
>     +                    return None
>                      ac = cl.read(a) # get changeset data (we avoid
>     object creation)
>                      if path in ac[3]: # checking the 'files' field.
>                          # The file has been touched, check if the
>     content is
>     @@ -770,8 +776,12 @@ class basefilectx(object):
>          def isintroducedafter(self, changelogrev):
>              """True if a filectx have been introduced after a given
>     floor revision
>              """
>     -        return (self.linkrev() > changelogrev
>     -                or self._introrev() > changelogrev)
>     +        if self.linkrev() > changelogrev:
>     +            return True
>     +        introrev = self._introrev(stoprev=changelogrev)
>     +        if introrev is None:
>     +            return False
>     +        return introrev > changelogrev
>
>          def introrev(self):
>              """return the rev of the changeset which introduced this
>     file revision
>     @@ -784,7 +794,15 @@ class basefilectx(object):
>              """
>              return self._introrev()
>
>     -    def _introrev(self):
>     +    def _introrev(self, stoprev=None):
>     +        """
>     +        Same as `introrev` but, with an extra argument to limit
>     changelog
>     +        iteration range in some internal usecase.
>     +
>     +        If `stoprev` is set, the `introrev` will not be searched
>     past that
>     +        `stoprev` revision and "None" might be returned. This is
>     useful to
>     +        limit the iteration range.
>     +        """
>              toprev = None
>              attrs = vars(self)
>              if r'_changeid' in attrs:
>     @@ -795,11 +813,12 @@ class basefilectx(object):
>                  toprev = self._changectx.rev()
>
>              if toprev is not None:
>     -            return self._adjustlinkrev(toprev, inclusive=True)
>     +            return self._adjustlinkrev(toprev, inclusive=True,
>     stoprev=stoprev)
>              elif r'_descendantrev' in attrs:
>     -            introrev = self._adjustlinkrev(self._descendantrev)
>     +            introrev = self._adjustlinkrev(self._descendantrev,
>     stoprev=stoprev)
>                  # be nice and cache the result of the computation
>     -            self._changeid = introrev
>     +            if introrev is not None:
>     +                self._changeid = introrev
>                  return introrev
>              else:
>                  return self.linkrev()
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
via Mercurial-devel - Oct. 9, 2018, 10:18 p.m.
On Tue, Oct 9, 2018 at 12:00 PM Boris FELD <boris.feld@octobus.net> wrote:

>
> On 04/10/2018 19:23, Martin von Zweigbergk via Mercurial-devel wrote:
>
>
>
> On Thu, Oct 4, 2018 at 7:44 AM Boris Feld <boris.feld@octobus.net> wrote:
>
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1536255188 14400
>> #      Thu Sep 06 13:33:08 2018 -0400
>> # Node ID 53c0bf99c013909bd628aa5254c26d301236ba26
>> # Parent  462edcfdf6346b522eee9a64c5c1ca9ff566d7b8
>> # EXP-Topic copy-perf
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r
>> 53c0bf99c013
>> context: floor adjustlinkrev graph walk during copy tracing
>>
>> The `_adjustlinkrev` method gains an optional "stoprev" argument. The
>> linkrev
>> adjustment will give up once this floor is reached. The relevant functions
>> using `_adjustlinkrev` are updated to pass an appropriate value in the
>> copy
>> tracing code.
>>
>
> When does this happen? In normal repos or only in broken ones? We don't
> seem to have any such repos in our test suite (raising exception instead of
> returning None does not fail any tests).
>
> It happens in normal repository. The repository in the test suite might be
> too simple to trigger this.
>

I tried this (after replacing "return None" by "raise 1"):

for f in $(hg files -r .); do ./hg log -fG $f > /dev/null || echo $f; done

That didn't raise any exceptions in my hg core repo (after running for a
little over an hour). Do you have a repo that you can share where it does
happen?


>
>
>>
>> In some private repository, about 10% of the status call triggered the
>> pathological case addressed by this change. The speedup varies from one
>> call
>> to another, the best-observed win is moving from 170s to 11s.
>>
>
> Is that from just this patch or the entire series?
>
> This patch is doing most of the performance win, some of the intermediate
> refactoring steps might have an intermediate effect on performance too (eg:
> the one removing the .rev()) call. However, we don't have timing for them.
>
>
>
>> diff --git a/mercurial/context.py b/mercurial/context.py
>> --- a/mercurial/context.py
>> +++ b/mercurial/context.py
>> @@ -719,7 +719,7 @@ class basefilectx(object):
>>
>>          return True
>>
>> -    def _adjustlinkrev(self, srcrev, inclusive=False):
>> +    def _adjustlinkrev(self, srcrev, inclusive=False, stoprev=None):
>>          """return the first ancestor of <srcrev> introducing <fnode>
>>
>>          If the linkrev of the file revision does not point to an
>> ancestor of
>> @@ -728,6 +728,10 @@ class basefilectx(object):
>>
>>          :srcrev: the changeset revision we search ancestors from
>>          :inclusive: if true, the src revision will also be checked
>> +        :stoprev: an optional revision to stop the walk at. If no
>> introduction
>> +                  of this file content could be found before this floor
>> +                  revision, the function will returns "None" and stops
>> its
>> +                  iteration.
>>          """
>>          repo = self._repo
>>          cl = repo.unfiltered().changelog
>> @@ -755,6 +759,8 @@ class basefilectx(object):
>>              fnode = self._filenode
>>              path = self._path
>>              for a in iteranc:
>> +                if stoprev is not None and a < stoprev:
>> +                    return None
>>                  ac = cl.read(a) # get changeset data (we avoid object
>> creation)
>>                  if path in ac[3]: # checking the 'files' field.
>>                      # The file has been touched, check if the content is
>> @@ -770,8 +776,12 @@ class basefilectx(object):
>>      def isintroducedafter(self, changelogrev):
>>          """True if a filectx have been introduced after a given floor
>> revision
>>          """
>> -        return (self.linkrev() > changelogrev
>> -                or self._introrev() > changelogrev)
>> +        if self.linkrev() > changelogrev:
>> +            return True
>> +        introrev = self._introrev(stoprev=changelogrev)
>> +        if introrev is None:
>> +            return False
>> +        return introrev > changelogrev
>>
>>      def introrev(self):
>>          """return the rev of the changeset which introduced this file
>> revision
>> @@ -784,7 +794,15 @@ class basefilectx(object):
>>          """
>>          return self._introrev()
>>
>> -    def _introrev(self):
>> +    def _introrev(self, stoprev=None):
>> +        """
>> +        Same as `introrev` but, with an extra argument to limit changelog
>> +        iteration range in some internal usecase.
>> +
>> +        If `stoprev` is set, the `introrev` will not be searched past
>> that
>> +        `stoprev` revision and "None" might be returned. This is useful
>> to
>> +        limit the iteration range.
>> +        """
>>          toprev = None
>>          attrs = vars(self)
>>          if r'_changeid' in attrs:
>> @@ -795,11 +813,12 @@ class basefilectx(object):
>>              toprev = self._changectx.rev()
>>
>>          if toprev is not None:
>> -            return self._adjustlinkrev(toprev, inclusive=True)
>> +            return self._adjustlinkrev(toprev, inclusive=True,
>> stoprev=stoprev)
>>          elif r'_descendantrev' in attrs:
>> -            introrev = self._adjustlinkrev(self._descendantrev)
>> +            introrev = self._adjustlinkrev(self._descendantrev,
>> stoprev=stoprev)
>>              # be nice and cache the result of the computation
>> -            self._changeid = introrev
>> +            if introrev is not None:
>> +                self._changeid = introrev
>>              return introrev
>>          else:
>>              return self.linkrev()
>>
>
> _______________________________________________
> Mercurial-devel mailing listMercurial-devel@mercurial-scm.orghttps://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
Boris Feld - Oct. 10, 2018, 8:20 a.m.
On 10/10/2018 00:18, Martin von Zweigbergk via Mercurial-devel wrote:
>
>
> On Tue, Oct 9, 2018 at 12:00 PM Boris FELD <boris.feld@octobus.net
> <mailto:boris.feld@octobus.net>> wrote:
>
>
>     On 04/10/2018 19:23, Martin von Zweigbergk via Mercurial-devel wrote:
>>
>>
>>     On Thu, Oct 4, 2018 at 7:44 AM Boris Feld <boris.feld@octobus.net
>>     <mailto:boris.feld@octobus.net>> wrote:
>>
>>         # HG changeset patch
>>         # User Boris Feld <boris.feld@octobus.net
>>         <mailto:boris.feld@octobus.net>>
>>         # Date 1536255188 14400
>>         #      Thu Sep 06 13:33:08 2018 -0400
>>         # Node ID 53c0bf99c013909bd628aa5254c26d301236ba26
>>         # Parent  462edcfdf6346b522eee9a64c5c1ca9ff566d7b8
>>         # EXP-Topic copy-perf
>>         # Available At https://bitbucket.org/octobus/mercurial-devel/
>>         #              hg pull
>>         https://bitbucket.org/octobus/mercurial-devel/ -r 53c0bf99c013
>>         context: floor adjustlinkrev graph walk during copy tracing
>>
>>         The `_adjustlinkrev` method gains an optional "stoprev"
>>         argument. The linkrev
>>         adjustment will give up once this floor is reached. The
>>         relevant functions
>>         using `_adjustlinkrev` are updated to pass an appropriate
>>         value in the copy
>>         tracing code.
>>
>>
>>     When does this happen? In normal repos or only in broken ones? We
>>     don't seem to have any such repos in our test suite (raising
>>     exception instead of returning None does not fail any tests).
>     It happens in normal repository. The repository in the test suite
>     might be too simple to trigger this.
>
>
> I tried this (after replacing "return None" by "raise 1"):
>
> for f in $(hg files -r .); do ./hg log -fG $f > /dev/null || echo $f; done
>
> That didn't raise any exceptions in my hg core repo (after running for
> a little over an hour). Do you have a repo that you can share where it
> does happen?
The issue happens when running ` hg status --copies`.

We cannot share the repository where we detected the issue, we actually
don't have access to it right now. We are planning to build better
tooling to bench the copy tracing but this won't happen this cycle.
Delta computation work has a higher priority than that. However, since
this code exists and provides such speedup, we would rather see it in
4.8 than in 4.9.

>  
>
>>      
>>
>>
>>         In some private repository, about 10% of the status call
>>         triggered the
>>         pathological case addressed by this change. The speedup
>>         varies from one call
>>         to another, the best-observed win is moving from 170s to 11s.
>>
>>
>>     Is that from just this patch or the entire series?
>     This patch is doing most of the performance win, some of the
>     intermediate refactoring steps might have an intermediate effect
>     on performance too (eg: the one removing the .rev()) call.
>     However, we don't have timing for them.
>>
>>
>>         diff --git a/mercurial/context.py b/mercurial/context.py
>>         --- a/mercurial/context.py
>>         +++ b/mercurial/context.py
>>         @@ -719,7 +719,7 @@ class basefilectx(object):
>>
>>                  return True
>>
>>         -    def _adjustlinkrev(self, srcrev, inclusive=False):
>>         +    def _adjustlinkrev(self, srcrev, inclusive=False,
>>         stoprev=None):
>>                  """return the first ancestor of <srcrev> introducing
>>         <fnode>
>>
>>                  If the linkrev of the file revision does not point
>>         to an ancestor of
>>         @@ -728,6 +728,10 @@ class basefilectx(object):
>>
>>                  :srcrev: the changeset revision we search ancestors from
>>                  :inclusive: if true, the src revision will also be
>>         checked
>>         +        :stoprev: an optional revision to stop the walk at.
>>         If no introduction
>>         +                  of this file content could be found before
>>         this floor
>>         +                  revision, the function will returns "None"
>>         and stops its
>>         +                  iteration.
>>                  """
>>                  repo = self._repo
>>                  cl = repo.unfiltered().changelog
>>         @@ -755,6 +759,8 @@ class basefilectx(object):
>>                      fnode = self._filenode
>>                      path = self._path
>>                      for a in iteranc:
>>         +                if stoprev is not None and a < stoprev:
>>         +                    return None
>>                          ac = cl.read(a) # get changeset data (we
>>         avoid object creation)
>>                          if path in ac[3]: # checking the 'files' field.
>>                              # The file has been touched, check if
>>         the content is
>>         @@ -770,8 +776,12 @@ class basefilectx(object):
>>              def isintroducedafter(self, changelogrev):
>>                  """True if a filectx have been introduced after a
>>         given floor revision
>>                  """
>>         -        return (self.linkrev() > changelogrev
>>         -                or self._introrev() > changelogrev)
>>         +        if self.linkrev() > changelogrev:
>>         +            return True
>>         +        introrev = self._introrev(stoprev=changelogrev)
>>         +        if introrev is None:
>>         +            return False
>>         +        return introrev > changelogrev
>>
>>              def introrev(self):
>>                  """return the rev of the changeset which introduced
>>         this file revision
>>         @@ -784,7 +794,15 @@ class basefilectx(object):
>>                  """
>>                  return self._introrev()
>>
>>         -    def _introrev(self):
>>         +    def _introrev(self, stoprev=None):
>>         +        """
>>         +        Same as `introrev` but, with an extra argument to
>>         limit changelog
>>         +        iteration range in some internal usecase.
>>         +
>>         +        If `stoprev` is set, the `introrev` will not be
>>         searched past that
>>         +        `stoprev` revision and "None" might be returned.
>>         This is useful to
>>         +        limit the iteration range.
>>         +        """
>>                  toprev = None
>>                  attrs = vars(self)
>>                  if r'_changeid' in attrs:
>>         @@ -795,11 +813,12 @@ class basefilectx(object):
>>                      toprev = self._changectx.rev()
>>
>>                  if toprev is not None:
>>         -            return self._adjustlinkrev(toprev, inclusive=True)
>>         +            return self._adjustlinkrev(toprev,
>>         inclusive=True, stoprev=stoprev)
>>                  elif r'_descendantrev' in attrs:
>>         -            introrev = self._adjustlinkrev(self._descendantrev)
>>         +            introrev =
>>         self._adjustlinkrev(self._descendantrev, stoprev=stoprev)
>>                      # be nice and cache the result of the computation
>>         -            self._changeid = introrev
>>         +            if introrev is not None:
>>         +                self._changeid = introrev
>>                      return introrev
>>                  else:
>>                      return self.linkrev()
>>
>>
>>     _______________________________________________
>>     Mercurial-devel mailing list
>>     Mercurial-devel@mercurial-scm.org <mailto:Mercurial-devel@mercurial-scm.org>
>>     https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
via Mercurial-devel - Oct. 10, 2018, 3:19 p.m.
On Wed, Oct 10, 2018 at 1:20 AM Boris FELD <boris.feld@octobus.net> wrote:

>
> On 10/10/2018 00:18, Martin von Zweigbergk via Mercurial-devel wrote:
>
>
>
> On Tue, Oct 9, 2018 at 12:00 PM Boris FELD <boris.feld@octobus.net> wrote:
>
>>
>> On 04/10/2018 19:23, Martin von Zweigbergk via Mercurial-devel wrote:
>>
>>
>>
>> On Thu, Oct 4, 2018 at 7:44 AM Boris Feld <boris.feld@octobus.net> wrote:
>>
>>> # HG changeset patch
>>> # User Boris Feld <boris.feld@octobus.net>
>>> # Date 1536255188 14400
>>> #      Thu Sep 06 13:33:08 2018 -0400
>>> # Node ID 53c0bf99c013909bd628aa5254c26d301236ba26
>>> # Parent  462edcfdf6346b522eee9a64c5c1ca9ff566d7b8
>>> # EXP-Topic copy-perf
>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/
>>> -r 53c0bf99c013
>>> context: floor adjustlinkrev graph walk during copy tracing
>>>
>>> The `_adjustlinkrev` method gains an optional "stoprev" argument. The
>>> linkrev
>>> adjustment will give up once this floor is reached. The relevant
>>> functions
>>> using `_adjustlinkrev` are updated to pass an appropriate value in the
>>> copy
>>> tracing code.
>>>
>>
>> When does this happen? In normal repos or only in broken ones? We don't
>> seem to have any such repos in our test suite (raising exception instead of
>> returning None does not fail any tests).
>>
>> It happens in normal repository. The repository in the test suite might
>> be too simple to trigger this.
>>
>
> I tried this (after replacing "return None" by "raise 1"):
>
> for f in $(hg files -r .); do ./hg log -fG $f > /dev/null || echo $f; done
>
> That didn't raise any exceptions in my hg core repo (after running for a
> little over an hour). Do you have a repo that you can share where it does
> happen?
>
> The issue happens when running ` hg status --copies`.
>
> We cannot share the repository where we detected the issue, we actually
> don't have access to it right now.
>

That's understandable. However, can you at least explain when this can
happen? Would I be wasting my time (again) if I try to trigger it by
running `hg status --copies --change <rev>` for every revision in the hg
repo?


> We are planning to build better tooling to bench the copy tracing but this
> won't happen this cycle. Delta computation work has a higher priority than
> that. However, since this code exists and provides such speedup, we would
> rather see it in 4.8 than in 4.9.
>
>
>
>>
>>
>>>
>>> In some private repository, about 10% of the status call triggered the
>>> pathological case addressed by this change. The speedup varies from one
>>> call
>>> to another, the best-observed win is moving from 170s to 11s.
>>>
>>
>> Is that from just this patch or the entire series?
>>
>> This patch is doing most of the performance win, some of the intermediate
>> refactoring steps might have an intermediate effect on performance too (eg:
>> the one removing the .rev()) call. However, we don't have timing for them.
>>
>>
>>
>>> diff --git a/mercurial/context.py b/mercurial/context.py
>>> --- a/mercurial/context.py
>>> +++ b/mercurial/context.py
>>> @@ -719,7 +719,7 @@ class basefilectx(object):
>>>
>>>          return True
>>>
>>> -    def _adjustlinkrev(self, srcrev, inclusive=False):
>>> +    def _adjustlinkrev(self, srcrev, inclusive=False, stoprev=None):
>>>          """return the first ancestor of <srcrev> introducing <fnode>
>>>
>>>          If the linkrev of the file revision does not point to an
>>> ancestor of
>>> @@ -728,6 +728,10 @@ class basefilectx(object):
>>>
>>>          :srcrev: the changeset revision we search ancestors from
>>>          :inclusive: if true, the src revision will also be checked
>>> +        :stoprev: an optional revision to stop the walk at. If no
>>> introduction
>>> +                  of this file content could be found before this floor
>>> +                  revision, the function will returns "None" and stops
>>> its
>>> +                  iteration.
>>>          """
>>>          repo = self._repo
>>>          cl = repo.unfiltered().changelog
>>> @@ -755,6 +759,8 @@ class basefilectx(object):
>>>              fnode = self._filenode
>>>              path = self._path
>>>              for a in iteranc:
>>> +                if stoprev is not None and a < stoprev:
>>> +                    return None
>>>                  ac = cl.read(a) # get changeset data (we avoid object
>>> creation)
>>>                  if path in ac[3]: # checking the 'files' field.
>>>                      # The file has been touched, check if the content is
>>> @@ -770,8 +776,12 @@ class basefilectx(object):
>>>      def isintroducedafter(self, changelogrev):
>>>          """True if a filectx have been introduced after a given floor
>>> revision
>>>          """
>>> -        return (self.linkrev() > changelogrev
>>> -                or self._introrev() > changelogrev)
>>> +        if self.linkrev() > changelogrev:
>>> +            return True
>>> +        introrev = self._introrev(stoprev=changelogrev)
>>> +        if introrev is None:
>>> +            return False
>>> +        return introrev > changelogrev
>>>
>>>      def introrev(self):
>>>          """return the rev of the changeset which introduced this file
>>> revision
>>> @@ -784,7 +794,15 @@ class basefilectx(object):
>>>          """
>>>          return self._introrev()
>>>
>>> -    def _introrev(self):
>>> +    def _introrev(self, stoprev=None):
>>> +        """
>>> +        Same as `introrev` but, with an extra argument to limit
>>> changelog
>>> +        iteration range in some internal usecase.
>>> +
>>> +        If `stoprev` is set, the `introrev` will not be searched past
>>> that
>>> +        `stoprev` revision and "None" might be returned. This is useful
>>> to
>>> +        limit the iteration range.
>>> +        """
>>>          toprev = None
>>>          attrs = vars(self)
>>>          if r'_changeid' in attrs:
>>> @@ -795,11 +813,12 @@ class basefilectx(object):
>>>              toprev = self._changectx.rev()
>>>
>>>          if toprev is not None:
>>> -            return self._adjustlinkrev(toprev, inclusive=True)
>>> +            return self._adjustlinkrev(toprev, inclusive=True,
>>> stoprev=stoprev)
>>>          elif r'_descendantrev' in attrs:
>>> -            introrev = self._adjustlinkrev(self._descendantrev)
>>> +            introrev = self._adjustlinkrev(self._descendantrev,
>>> stoprev=stoprev)
>>>              # be nice and cache the result of the computation
>>> -            self._changeid = introrev
>>> +            if introrev is not None:
>>> +                self._changeid = introrev
>>>              return introrev
>>>          else:
>>>              return self.linkrev()
>>>
>>
>> _______________________________________________
>> Mercurial-devel mailing listMercurial-devel@mercurial-scm.orghttps://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>>
>>
> _______________________________________________
> Mercurial-devel mailing listMercurial-devel@mercurial-scm.orghttps://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
Boris Feld - Oct. 10, 2018, 7:43 p.m.
On 10/10/2018 17:19, Martin von Zweigbergk via Mercurial-devel wrote:
>
>
> On Wed, Oct 10, 2018 at 1:20 AM Boris FELD <boris.feld@octobus.net
> <mailto:boris.feld@octobus.net>> wrote:
>
>
>     On 10/10/2018 00:18, Martin von Zweigbergk via Mercurial-devel wrote:
>>
>>
>>     On Tue, Oct 9, 2018 at 12:00 PM Boris FELD
>>     <boris.feld@octobus.net <mailto:boris.feld@octobus.net>> wrote:
>>
>>
>>         On 04/10/2018 19:23, Martin von Zweigbergk via
>>         Mercurial-devel wrote:
>>>
>>>
>>>         On Thu, Oct 4, 2018 at 7:44 AM Boris Feld
>>>         <boris.feld@octobus.net <mailto:boris.feld@octobus.net>> wrote:
>>>
>>>             # HG changeset patch
>>>             # User Boris Feld <boris.feld@octobus.net
>>>             <mailto:boris.feld@octobus.net>>
>>>             # Date 1536255188 14400
>>>             #      Thu Sep 06 13:33:08 2018 -0400
>>>             # Node ID 53c0bf99c013909bd628aa5254c26d301236ba26
>>>             # Parent  462edcfdf6346b522eee9a64c5c1ca9ff566d7b8
>>>             # EXP-Topic copy-perf
>>>             # Available At
>>>             https://bitbucket.org/octobus/mercurial-devel/
>>>             #              hg pull
>>>             https://bitbucket.org/octobus/mercurial-devel/ -r
>>>             53c0bf99c013
>>>             context: floor adjustlinkrev graph walk during copy tracing
>>>
>>>             The `_adjustlinkrev` method gains an optional "stoprev"
>>>             argument. The linkrev
>>>             adjustment will give up once this floor is reached. The
>>>             relevant functions
>>>             using `_adjustlinkrev` are updated to pass an
>>>             appropriate value in the copy
>>>             tracing code.
>>>
>>>
>>>         When does this happen? In normal repos or only in broken
>>>         ones? We don't seem to have any such repos in our test suite
>>>         (raising exception instead of returning None does not fail
>>>         any tests).
>>         It happens in normal repository. The repository in the test
>>         suite might be too simple to trigger this.
>>
>>
>>     I tried this (after replacing "return None" by "raise 1"):
>>
>>     for f in $(hg files -r .); do ./hg log -fG $f > /dev/null || echo
>>     $f; done
>>
>>     That didn't raise any exceptions in my hg core repo (after
>>     running for a little over an hour). Do you have a repo that you
>>     can share where it does happen?
>     The issue happens when running ` hg status --copies`.
>
>     We cannot share the repository where we detected the issue, we
>     actually don't have access to it right now.
>
>
> That's understandable. However, can you at least explain when this can
> happen? Would I be wasting my time (again) if I try to trigger it by
> running `hg status --copies --change <rev>` for every revision in the
> hg repo?

This is rename detection over a range of changesets. Example of
occurrence should be merges or `hg status --copies --rev .~1000`.

Running it on `hg status --copies --change <rev>` will not work as it
only looks into a single changeset.
>  
>
>     We are planning to build better tooling to bench the copy tracing
>     but this won't happen this cycle. Delta computation work has a
>     higher priority than that. However, since this code exists and
>     provides such speedup, we would rather see it in 4.8 than in 4.9.
>
>>      
>>
>>>          
>>>
>>>
>>>             In some private repository, about 10% of the status call
>>>             triggered the
>>>             pathological case addressed by this change. The speedup
>>>             varies from one call
>>>             to another, the best-observed win is moving from 170s to
>>>             11s.
>>>
>>>
>>>         Is that from just this patch or the entire series?
>>         This patch is doing most of the performance win, some of the
>>         intermediate refactoring steps might have an intermediate
>>         effect on performance too (eg: the one removing the .rev())
>>         call. However, we don't have timing for them.
>>>
>>>
>>>             diff --git a/mercurial/context.py b/mercurial/context.py
>>>             --- a/mercurial/context.py
>>>             +++ b/mercurial/context.py
>>>             @@ -719,7 +719,7 @@ class basefilectx(object):
>>>
>>>                      return True
>>>
>>>             -    def _adjustlinkrev(self, srcrev, inclusive=False):
>>>             +    def _adjustlinkrev(self, srcrev, inclusive=False,
>>>             stoprev=None):
>>>                      """return the first ancestor of <srcrev>
>>>             introducing <fnode>
>>>
>>>                      If the linkrev of the file revision does not
>>>             point to an ancestor of
>>>             @@ -728,6 +728,10 @@ class basefilectx(object):
>>>
>>>                      :srcrev: the changeset revision we search
>>>             ancestors from
>>>                      :inclusive: if true, the src revision will also
>>>             be checked
>>>             +        :stoprev: an optional revision to stop the walk
>>>             at. If no introduction
>>>             +                  of this file content could be found
>>>             before this floor
>>>             +                  revision, the function will returns
>>>             "None" and stops its
>>>             +                  iteration.
>>>                      """
>>>                      repo = self._repo
>>>                      cl = repo.unfiltered().changelog
>>>             @@ -755,6 +759,8 @@ class basefilectx(object):
>>>                          fnode = self._filenode
>>>                          path = self._path
>>>                          for a in iteranc:
>>>             +                if stoprev is not None and a < stoprev:
>>>             +                    return None
>>>                              ac = cl.read(a) # get changeset data
>>>             (we avoid object creation)
>>>                              if path in ac[3]: # checking the
>>>             'files' field.
>>>                                  # The file has been touched, check
>>>             if the content is
>>>             @@ -770,8 +776,12 @@ class basefilectx(object):
>>>                  def isintroducedafter(self, changelogrev):
>>>                      """True if a filectx have been introduced after
>>>             a given floor revision
>>>                      """
>>>             -        return (self.linkrev() > changelogrev
>>>             -                or self._introrev() > changelogrev)
>>>             +        if self.linkrev() > changelogrev:
>>>             +            return True
>>>             +        introrev = self._introrev(stoprev=changelogrev)
>>>             +        if introrev is None:
>>>             +            return False
>>>             +        return introrev > changelogrev
>>>
>>>                  def introrev(self):
>>>                      """return the rev of the changeset which
>>>             introduced this file revision
>>>             @@ -784,7 +794,15 @@ class basefilectx(object):
>>>                      """
>>>                      return self._introrev()
>>>
>>>             -    def _introrev(self):
>>>             +    def _introrev(self, stoprev=None):
>>>             +        """
>>>             +        Same as `introrev` but, with an extra argument
>>>             to limit changelog
>>>             +        iteration range in some internal usecase.
>>>             +
>>>             +        If `stoprev` is set, the `introrev` will not be
>>>             searched past that
>>>             +        `stoprev` revision and "None" might be
>>>             returned. This is useful to
>>>             +        limit the iteration range.
>>>             +        """
>>>                      toprev = None
>>>                      attrs = vars(self)
>>>                      if r'_changeid' in attrs:
>>>             @@ -795,11 +813,12 @@ class basefilectx(object):
>>>                          toprev = self._changectx.rev()
>>>
>>>                      if toprev is not None:
>>>             -            return self._adjustlinkrev(toprev,
>>>             inclusive=True)
>>>             +            return self._adjustlinkrev(toprev,
>>>             inclusive=True, stoprev=stoprev)
>>>                      elif r'_descendantrev' in attrs:
>>>             -            introrev =
>>>             self._adjustlinkrev(self._descendantrev)
>>>             +            introrev =
>>>             self._adjustlinkrev(self._descendantrev, stoprev=stoprev)
>>>                          # be nice and cache the result of the
>>>             computation
>>>             -            self._changeid = introrev
>>>             +            if introrev is not None:
>>>             +                self._changeid = introrev
>>>                          return introrev
>>>                      else:
>>>                          return self.linkrev()
>>>
>>>
>>>         _______________________________________________
>>>         Mercurial-devel mailing list
>>>         Mercurial-devel@mercurial-scm.org <mailto:Mercurial-devel@mercurial-scm.org>
>>>         https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>>
>>
>>     _______________________________________________
>>     Mercurial-devel mailing list
>>     Mercurial-devel@mercurial-scm.org <mailto:Mercurial-devel@mercurial-scm.org>
>>     https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -719,7 +719,7 @@  class basefilectx(object):
 
         return True
 
-    def _adjustlinkrev(self, srcrev, inclusive=False):
+    def _adjustlinkrev(self, srcrev, inclusive=False, stoprev=None):
         """return the first ancestor of <srcrev> introducing <fnode>
 
         If the linkrev of the file revision does not point to an ancestor of
@@ -728,6 +728,10 @@  class basefilectx(object):
 
         :srcrev: the changeset revision we search ancestors from
         :inclusive: if true, the src revision will also be checked
+        :stoprev: an optional revision to stop the walk at. If no introduction
+                  of this file content could be found before this floor
+                  revision, the function will returns "None" and stops its
+                  iteration.
         """
         repo = self._repo
         cl = repo.unfiltered().changelog
@@ -755,6 +759,8 @@  class basefilectx(object):
             fnode = self._filenode
             path = self._path
             for a in iteranc:
+                if stoprev is not None and a < stoprev:
+                    return None
                 ac = cl.read(a) # get changeset data (we avoid object creation)
                 if path in ac[3]: # checking the 'files' field.
                     # The file has been touched, check if the content is
@@ -770,8 +776,12 @@  class basefilectx(object):
     def isintroducedafter(self, changelogrev):
         """True if a filectx have been introduced after a given floor revision
         """
-        return (self.linkrev() > changelogrev
-                or self._introrev() > changelogrev)
+        if self.linkrev() > changelogrev:
+            return True
+        introrev = self._introrev(stoprev=changelogrev)
+        if introrev is None:
+            return False
+        return introrev > changelogrev
 
     def introrev(self):
         """return the rev of the changeset which introduced this file revision
@@ -784,7 +794,15 @@  class basefilectx(object):
         """
         return self._introrev()
 
-    def _introrev(self):
+    def _introrev(self, stoprev=None):
+        """
+        Same as `introrev` but, with an extra argument to limit changelog
+        iteration range in some internal usecase.
+
+        If `stoprev` is set, the `introrev` will not be searched past that
+        `stoprev` revision and "None" might be returned. This is useful to
+        limit the iteration range.
+        """
         toprev = None
         attrs = vars(self)
         if r'_changeid' in attrs:
@@ -795,11 +813,12 @@  class basefilectx(object):
             toprev = self._changectx.rev()
 
         if toprev is not None:
-            return self._adjustlinkrev(toprev, inclusive=True)
+            return self._adjustlinkrev(toprev, inclusive=True, stoprev=stoprev)
         elif r'_descendantrev' in attrs:
-            introrev = self._adjustlinkrev(self._descendantrev)
+            introrev = self._adjustlinkrev(self._descendantrev, stoprev=stoprev)
             # be nice and cache the result of the computation
-            self._changeid = introrev
+            if introrev is not None:
+                self._changeid = introrev
             return introrev
         else:
             return self.linkrev()