Patchwork D6561: copies: simplify merging of copy dicts on merge commits

login
register
mail settings
Submitter phabricator
Date June 20, 2019, 9:09 p.m.
Message ID <differential-rev-PHID-DREV-fh4g6ufirvk523hpzp4v-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/40633/
State Superseded
Headers show

Comments

phabricator - June 20, 2019, 9:09 p.m.
martinvonz created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  After we removed some filtering in 35d674a3d5db <https://phab.mercurial-scm.org/rHG35d674a3d5db319b47560f1ef8949b0f77c1cafb> (copies: don't filter
  out copy targets created on other side of merge commit, 2019-04-18),
  we will always include all entries from "copies1", so we can simplify
  the code based on that.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6561

AFFECTED FILES
  mercurial/copies.py

CHANGE DETAILS




To: martinvonz, #hg-reviewers
Cc: mercurial-devel
Yuya Nishihara - June 25, 2019, 11:25 p.m.
>      while work:
> -        r, i1, copies1 = heapq.heappop(work)
> +        r, i1, copies = heapq.heappop(work)
>          if work and work[0][0] == r:
>              # We are tracing copies from both parents
>              r, i2, copies2 = heapq.heappop(work)
> -            copies = {}
> -            allcopies = set(copies1) | set(copies2)
> -            for dst in allcopies:
> +            for dst, src in copies2.items():
>                  # Unlike when copies are stored in the filelog, we consider
>                  # it a copy even if the destination already existed on the
>                  # other branch. It's simply too expensive to check if the
>                  # file existed in the manifest.
> -                if dst in copies1:
> -                    # If it was copied on the p1 side, mark it as copied from
> +                if dst not in copies:
> +                    # If it was copied on the p1 side, leave it as copied from
>                      # that side, even if it was also copied on the p2 side.
> -                    copies[dst] = copies1[dst]
> -                else:
>                      copies[dst] = copies2[dst]

Are we sure there's no `copies` alias held by later `work`?

I'm just asking because we've optimized some `copies.copy()`s away at
5ceb91136ebe. I don't know if it's safe or not to mutate `copies` at this
point.
phabricator - June 25, 2019, 11:28 p.m.
yuja added a comment.


  >   while work:
  >
  > - r, i1, copies1 = heapq.heappop(work)
  >
  > +        r, i1, copies = heapq.heappop(work)
  >
  >   if work and work[0][0] == r:
  >       # We are tracing copies from both parents
  >       r, i2, copies2 = heapq.heappop(work)
  >
  > - copies = {}
  > - allcopies = set(copies1) | set(copies2)
  > - for dst in allcopies:
  >
  > +            for dst, src in copies2.items():
  >
  >   1. Unlike when copies are stored in the filelog, we consider
  >   2. it a copy even if the destination already existed on the
  >   3. other branch. It's simply too expensive to check if the
  >   4. file existed in the manifest.
  > - if dst in copies1:
  > - # If it was copied on the p1 side, mark it as copied from
  >
  > +                if dst not in copies:
  > +                    # If it was copied on the p1 side, leave it as copied from
  >
  >   1. that side, even if it was also copied on the p2 side.
  > - copies[dst] = copies1[dst]
  > - else: copies[dst] = copies2[dst]
  
  Are we sure there's no `copies` alias held by later `work`?
  
  I'm just asking because we've optimized some `copies.copy()`s away at
  5ceb91136ebe <https://phab.mercurial-scm.org/rHG5ceb91136ebe79215da1723d3a905009b5f2a64f>. I don't know if it's safe or not to mutate `copies` at this
  point.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6561/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6561

To: martinvonz, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - June 26, 2019, 1:08 a.m.
martinvonz added a comment.


  In D6561#95834 <https://phab.mercurial-scm.org/D6561#95834>, @yuja wrote:
  
  >>   while work:
  >>
  >> - r, i1, copies1 = heapq.heappop(work)
  >>
  >> +        r, i1, copies = heapq.heappop(work)
  >>
  >>   if work and work[0][0] == r:
  >>       # We are tracing copies from both parents
  >>       r, i2, copies2 = heapq.heappop(work)
  >>
  >> - copies = {}
  >> - allcopies = set(copies1) | set(copies2)
  >> - for dst in allcopies:
  >>
  >> +            for dst, src in copies2.items():
  >>
  >>   1. Unlike when copies are stored in the filelog, we consider
  >>   2. it a copy even if the destination already existed on the
  >>   3. other branch. It's simply too expensive to check if the
  >>   4. file existed in the manifest.
  >> - if dst in copies1:
  >> - # If it was copied on the p1 side, mark it as copied from
  >>
  >> +                if dst not in copies:
  >> +                    # If it was copied on the p1 side, leave it as copied from
  >>
  >>   1. that side, even if it was also copied on the p2 side.
  >> - copies[dst] = copies1[dst]
  >> - else: copies[dst] = copies2[dst]
  >
  > Are we sure there's no `copies` alias held by later `work`?
  > I'm just asking because we've optimized some `copies.copy()`s away at
  > 5ceb91136ebe <https://phab.mercurial-scm.org/rHG5ceb91136ebe79215da1723d3a905009b5f2a64f>. I don't know if it's safe or not to mutate `copies` at this
  > point.
  
  I *think* it should be safe. The policy is that each instance can only appear in one place in the queue. I should send a patch to document that policy. Let me know if you see a violation of that policy.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6561/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6561

To: martinvonz, #hg-reviewers
Cc: yuja, mercurial-devel

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -272,25 +272,19 @@ 
     heapq.heapify(work)
     alwaysmatch = match.always()
     while work:
-        r, i1, copies1 = heapq.heappop(work)
+        r, i1, copies = heapq.heappop(work)
         if work and work[0][0] == r:
             # We are tracing copies from both parents
             r, i2, copies2 = heapq.heappop(work)
-            copies = {}
-            allcopies = set(copies1) | set(copies2)
-            for dst in allcopies:
+            for dst, src in copies2.items():
                 # Unlike when copies are stored in the filelog, we consider
                 # it a copy even if the destination already existed on the
                 # other branch. It's simply too expensive to check if the
                 # file existed in the manifest.
-                if dst in copies1:
-                    # If it was copied on the p1 side, mark it as copied from
+                if dst not in copies:
+                    # If it was copied on the p1 side, leave it as copied from
                     # that side, even if it was also copied on the p2 side.
-                    copies[dst] = copies1[dst]
-                else:
                     copies[dst] = copies2[dst]
-        else:
-            copies = copies1
         if r == b.rev():
             _filter(a, b, copies)
             return copies