Submitter | Jun Wu |
---|---|
Date | Sept. 2, 2016, 3:10 p.m. |
Message ID | <c9a9040c95add098eb8e.1472829030@x1c> |
Download | mbox | patch |
Permalink | /patch/16533/ |
State | Changes Requested |
Headers | show |
Comments
On Fri, 2 Sep 2016 16:10:30 +0100, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1472826059 -3600 > # Fri Sep 02 15:20:59 2016 +0100 > # Node ID c9a9040c95add098eb8e280fa60edaa766154141 > # Parent d130a38ef41f3c9e2d2f26df3535d89a93f87301 > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r c9a9040c95ad > annotate: pre-calculate the "needed" dictionary > > The "needed" dict is used as a reference counter to free items in the giant > "hist" dict. However, currently it is not very accurate and can lead to > dropping "hist" items unnecessarily, for example, with the following DAG, > > -3- > / \ > --1--2--4-- > > The current algorithm will visit and calculate rev 1 twice, undesired. Also that would generate wrong result probably because hist[f] was deleted too early and pcache[f] was emptied. Compare "hg annotate mercurial/commands.py" for example. Can you add a test case for that? This patch appears to fix old bug, so I think it's good for stable. > + # 1st DFS pre-calculates pcache and needed > visit = [base] > - hist = {} > pcache = {} > needed = {base: 1} > while visit: > f = visit[-1] > - pcached = f in pcache > - if not pcached: > - pcache[f] = parents(f) > + visit.pop() Nit: could be f = visit.pop() > + pl = parents(f) > + pcache[f] = pl > + for p in pl: > + if p not in needed: > + needed[p] = 1 > + else: > + needed[p] += 1 > + if p not in pcache: > + visit.append(p) There can be more than one route to 'p' from 'visit[]', so 'p not in pcache' isn't enough to avoid unneeded calculation of 'parents(f)', and needed[p] can be over-counted.
Good catch about both issues! I will prepare a V2. Excerpts from Yuya Nishihara's message of 2016-09-05 00:21:45 +0900: > On Fri, 2 Sep 2016 16:10:30 +0100, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1472826059 -3600 > > # Fri Sep 02 15:20:59 2016 +0100 > > # Node ID c9a9040c95add098eb8e280fa60edaa766154141 > > # Parent d130a38ef41f3c9e2d2f26df3535d89a93f87301 > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r c9a9040c95ad > > annotate: pre-calculate the "needed" dictionary > > > > The "needed" dict is used as a reference counter to free items in the giant > > "hist" dict. However, currently it is not very accurate and can lead to > > dropping "hist" items unnecessarily, for example, with the following DAG, > > > > -3- > > / \ > > --1--2--4-- > > > > The current algorithm will visit and calculate rev 1 twice, undesired. > > Also that would generate wrong result probably because hist[f] was deleted too > early and pcache[f] was emptied. Compare "hg annotate mercurial/commands.py" > for example. > > Can you add a test case for that? This patch appears to fix old bug, so I think > it's good for stable. > > > + # 1st DFS pre-calculates pcache and needed > > visit = [base] > > - hist = {} > > pcache = {} > > needed = {base: 1} > > while visit: > > f = visit[-1] > > - pcached = f in pcache > > - if not pcached: > > - pcache[f] = parents(f) > > + visit.pop() > > Nit: could be f = visit.pop() > > > + pl = parents(f) > > + pcache[f] = pl > > + for p in pl: > > + if p not in needed: > > + needed[p] = 1 > > + else: > > + needed[p] += 1 > > + if p not in pcache: > > + visit.append(p) > > There can be more than one route to 'p' from 'visit[]', so 'p not in pcache' > isn't enough to avoid unneeded calculation of 'parents(f)', and needed[p] can > be over-counted.
Patch
diff --git a/mercurial/context.py b/mercurial/context.py --- a/mercurial/context.py +++ b/mercurial/context.py @@ -991,13 +991,26 @@ class basefilectx(object): # depth-first search. + # 1st DFS pre-calculates pcache and needed visit = [base] - hist = {} pcache = {} needed = {base: 1} while visit: f = visit[-1] - pcached = f in pcache - if not pcached: - pcache[f] = parents(f) + visit.pop() + pl = parents(f) + pcache[f] = pl + for p in pl: + if p not in needed: + needed[p] = 1 + else: + needed[p] += 1 + if p not in pcache: + visit.append(p) + + # 2nd DFS does the actual annotate + visit[:] = [base] + hist = {} + while visit: + f = visit[-1] ready = True @@ -1007,6 +1020,4 @@ class basefilectx(object): ready = False visit.append(p) - if not pcached: - needed[p] = needed.get(p, 0) + 1 if ready: visit.pop()