Patchwork [1,of,3,V2] merge: check for dir rename dest before adding ACTION_KEEP

login
register
mail settings
Submitter Pulkit Goyal
Date Sept. 16, 2020, 1:13 p.m.
Message ID <72edc2d86f6524d1d65f.1600262026@workspace>
Download mbox | patch
Permalink /patch/47174/
State New
Headers show

Comments

Pulkit Goyal - Sept. 16, 2020, 1:13 p.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1599034669 -19800
#      Wed Sep 02 13:47:49 2020 +0530
# Node ID 72edc2d86f6524d1d65f33e49cdb46606826200f
# Parent  4f27ca3ef97d8372b9bb288bd8102460d372d160
# EXP-Topic merge-newnode
merge: check for dir rename dest before adding ACTION_KEEP

A previous patch in the series blindly uses `ACTION_KEEP` if the file is not
present on both remote and ancestor. This was wrong.

We tries to detect directory renames and in some graft cases, it can be possible
that file is not present on both sides but is created in rename destination of
other directory which exists on remote. In such cases, we need to merge the
rename source from remote with rename dest from local.

This patch makes sure we checks for such rename cases before falling back to
`ACTION_KEEP`.

Action for rename destination can be added while processing rename destination
or processing rename source. In some cases, when an optimization only diffing
files changes between m2-vs-ma are in play, either of rename dest or rename
source might not be processed.
Hence we need to be extra sure about adding action related to rename
destination.

This issue of missing to check dir rename dest was spotted by a future change
where some tests were failing.

Differential Revision: https://phab.mercurial-scm.org/D8977
Yuya Nishihara - Sept. 17, 2020, 11:04 a.m.
On Wed, 16 Sep 2020 18:43:46 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1599034669 -19800
> #      Wed Sep 02 13:47:49 2020 +0530
> # Node ID 72edc2d86f6524d1d65f33e49cdb46606826200f
> # Parent  4f27ca3ef97d8372b9bb288bd8102460d372d160
> # EXP-Topic merge-newnode
> merge: check for dir rename dest before adding ACTION_KEEP

> +    # we might miss adding an action for the rename destination file as it
> +    # didn't change between m2 and ma and hence was optimized out. However,
> +    # the file is important as it contains rename information which is
> +    # important. Stores such actions and makes sure they are all there
> +    # in mresult object. If not, adds them.
> +    rename_dest_actions = {}
> +
>      for f, ((n1, fl1), (n2, fl2)) in pycompat.iteritems(diff):
>          if n1 and n2:  # file exists on both local and remote side
>              if f not in ma:
> @@ -933,12 +940,41 @@ def manifestmerge(
>                          f, mergestatemod.ACTION_REMOVE, None, b'other deleted',
>                      )
>              else:  # file not in ancestor, not in remote
> -                mresult.addfile(
> -                    f,
> -                    mergestatemod.ACTION_KEEP,
> -                    None,
> -                    b'ancestor missing, remote missing',
> -                )
> +                rename_found = False
> +                for source, dest in branch_copies1.dirmove.items():
> +                    if f.startswith(dest):
> +                        # this directory in which this file is created
> +                        # is moved from somewhere else
> +                        # we should check if the source file exists on
> +                        # remote side
> +                        sf = source + f[len(dest) :]
> +                        if sf in m2:
> +                            rename_found = True
> +                            if m2[sf] == ma[sf]:
> +                                mresult.addfile(
> +                                    f,
> +                                    mergestatemod.ACTION_KEEP,
> +                                    None,
> +                                    b'file is rename dest and rename source'
> +                                    b' did not change',
> +                                )
> +                            else:
> +                                mresult.addfile(
> +                                    f,
> +                                    mergestatemod.ACTION_MERGE,
> +                                    (f, sf, sf, False, pa.node()),
> +                                    b'local directory rename - respect move '
> +                                    b'from %s' % sf,
> +                                )
> +                            break
> +
> +                if not rename_found:
> +                    mresult.addfile(
> +                        f,
> +                        mergestatemod.ACTION_KEEP,
> +                        None,
> +                        b'ancestor missing, remote missing',
> +                    )
>  
>          elif n2:  # file exists only on remote side
>              if f in copied1:
> @@ -1010,8 +1046,19 @@ def manifestmerge(
>                          df = branch_copies1.dirmove[d] + f[len(d) :]
>                          break
>                  if df is not None and df in m1:
> +                    # this file is not present in local however its a source
> +                    # of rename for another file. keep this file not
> +                    # present in local.
>                      mresult.addfile(
> -                        df,
> +                        f,
> +                        mergestatemod.ACTION_KEEP_ABSENT,
> +                        None,
> +                        b'remote directory is rename source - respect move '
> +                        b'to %s' % df,
> +                    )
> +                    # action for rename destination is important and can be
> +                    # missed due to some optimizations.
> +                    rename_dest_actions[df] = (
>                          mergestatemod.ACTION_MERGE,
>                          (df, f, f, False, pa.node()),
>                          b'local directory rename - respect move '
> @@ -1039,6 +1086,12 @@ def manifestmerge(
>                      b'local not present, remote unchanged',
>                  )
>  
> +    # make sure actions for rename destination are there and if not
> +    # add them
> +    for f, data in pycompat.iteritems(rename_dest_actions):
> +        if mresult.getfile(f) is None:
> +            mresult.addfile(f, *data)

Are we sure if this "f not in mresult: addfile(f, ACTION_MERGE)" logic is
not necessary for the "file exists only on local side" case?

I'm not sure, but I feel it's a bit odd that the n1 logic isn't similar
to the n2 one. The KEEP action might have to be "weak" in both cases.
Pulkit Goyal - Sept. 17, 2020, 12:53 p.m.
On Thu, Sep 17, 2020 at 4:34 PM Yuya Nishihara <yuya@tcha.org> wrote:
>
> On Wed, 16 Sep 2020 18:43:46 +0530, Pulkit Goyal wrote:
> > # HG changeset patch
> > # User Pulkit Goyal <7895pulkit@gmail.com>
> > # Date 1599034669 -19800
> > #      Wed Sep 02 13:47:49 2020 +0530
> > # Node ID 72edc2d86f6524d1d65f33e49cdb46606826200f
> > # Parent  4f27ca3ef97d8372b9bb288bd8102460d372d160
> > # EXP-Topic merge-newnode
> > merge: check for dir rename dest before adding ACTION_KEEP
>
> > +    # we might miss adding an action for the rename destination file as it
> > +    # didn't change between m2 and ma and hence was optimized out. However,
> > +    # the file is important as it contains rename information which is
> > +    # important. Stores such actions and makes sure they are all there
> > +    # in mresult object. If not, adds them.
> > +    rename_dest_actions = {}
> > +
> >      for f, ((n1, fl1), (n2, fl2)) in pycompat.iteritems(diff):
> >          if n1 and n2:  # file exists on both local and remote side
> >              if f not in ma:
> > @@ -933,12 +940,41 @@ def manifestmerge(
> >                          f, mergestatemod.ACTION_REMOVE, None, b'other deleted',
> >                      )
> >              else:  # file not in ancestor, not in remote
> > -                mresult.addfile(
> > -                    f,
> > -                    mergestatemod.ACTION_KEEP,
> > -                    None,
> > -                    b'ancestor missing, remote missing',
> > -                )
> > +                rename_found = False
> > +                for source, dest in branch_copies1.dirmove.items():
> > +                    if f.startswith(dest):
> > +                        # this directory in which this file is created
> > +                        # is moved from somewhere else
> > +                        # we should check if the source file exists on
> > +                        # remote side
> > +                        sf = source + f[len(dest) :]
> > +                        if sf in m2:
> > +                            rename_found = True
> > +                            if m2[sf] == ma[sf]:
> > +                                mresult.addfile(
> > +                                    f,
> > +                                    mergestatemod.ACTION_KEEP,
> > +                                    None,
> > +                                    b'file is rename dest and rename source'
> > +                                    b' did not change',
> > +                                )
> > +                            else:
> > +                                mresult.addfile(
> > +                                    f,
> > +                                    mergestatemod.ACTION_MERGE,
> > +                                    (f, sf, sf, False, pa.node()),
> > +                                    b'local directory rename - respect move '
> > +                                    b'from %s' % sf,
> > +                                )
> > +                            break
> > +
> > +                if not rename_found:
> > +                    mresult.addfile(
> > +                        f,
> > +                        mergestatemod.ACTION_KEEP,
> > +                        None,
> > +                        b'ancestor missing, remote missing',
> > +                    )
> >
> >          elif n2:  # file exists only on remote side
> >              if f in copied1:
> > @@ -1010,8 +1046,19 @@ def manifestmerge(
> >                          df = branch_copies1.dirmove[d] + f[len(d) :]
> >                          break
> >                  if df is not None and df in m1:
> > +                    # this file is not present in local however its a source
> > +                    # of rename for another file. keep this file not
> > +                    # present in local.
> >                      mresult.addfile(
> > -                        df,
> > +                        f,
> > +                        mergestatemod.ACTION_KEEP_ABSENT,
> > +                        None,
> > +                        b'remote directory is rename source - respect move '
> > +                        b'to %s' % df,
> > +                    )
> > +                    # action for rename destination is important and can be
> > +                    # missed due to some optimizations.
> > +                    rename_dest_actions[df] = (
> >                          mergestatemod.ACTION_MERGE,
> >                          (df, f, f, False, pa.node()),
> >                          b'local directory rename - respect move '
> > @@ -1039,6 +1086,12 @@ def manifestmerge(
> >                      b'local not present, remote unchanged',
> >                  )
> >
> > +    # make sure actions for rename destination are there and if not
> > +    # add them
> > +    for f, data in pycompat.iteritems(rename_dest_actions):
> > +        if mresult.getfile(f) is None:
> > +            mresult.addfile(f, *data)
>
> Are we sure if this "f not in mresult: addfile(f, ACTION_MERGE)" logic is
> not necessary for the "file exists only on local side" case?

Hm, thinking about this now, seems like it should be there for "file
exists only on local side" case. I will investigate and see why
currently we don't have that and add if we should.
>
> I'm not sure, but I feel it's a bit odd that the n1 logic isn't similar
> to the n2 one.

Agreed

> The KEEP action might have to be "weak" in both cases.

Sorry, didn't understand what "weak" means here.
Yuya Nishihara - Sept. 17, 2020, 12:59 p.m.
On Thu, 17 Sep 2020 18:23:21 +0530, Pulkit Goyal wrote:
> On Thu, Sep 17, 2020 at 4:34 PM Yuya Nishihara <yuya@tcha.org> wrote:
> > > +    # make sure actions for rename destination are there and if not
> > > +    # add them
> > > +    for f, data in pycompat.iteritems(rename_dest_actions):
> > > +        if mresult.getfile(f) is None:
> > > +            mresult.addfile(f, *data)
> >
> > Are we sure if this "f not in mresult: addfile(f, ACTION_MERGE)" logic is
> > not necessary for the "file exists only on local side" case?
> 
> Hm, thinking about this now, seems like it should be there for "file
> exists only on local side" case. I will investigate and see why
> currently we don't have that and add if we should.
> >
> > I'm not sure, but I feel it's a bit odd that the n1 logic isn't similar
> > to the n2 one.
> 
> Agreed
> 
> > The KEEP action might have to be "weak" in both cases.
> 
> Sorry, didn't understand what "weak" means here.

I just mean it will be added only if f not in mresult.

Patch

diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -812,6 +812,13 @@  def manifestmerge(
 
     diff = m1.diff(m2, match=matcher)
 
+    # we might miss adding an action for the rename destination file as it
+    # didn't change between m2 and ma and hence was optimized out. However,
+    # the file is important as it contains rename information which is
+    # important. Stores such actions and makes sure they are all there
+    # in mresult object. If not, adds them.
+    rename_dest_actions = {}
+
     for f, ((n1, fl1), (n2, fl2)) in pycompat.iteritems(diff):
         if n1 and n2:  # file exists on both local and remote side
             if f not in ma:
@@ -933,12 +940,41 @@  def manifestmerge(
                         f, mergestatemod.ACTION_REMOVE, None, b'other deleted',
                     )
             else:  # file not in ancestor, not in remote
-                mresult.addfile(
-                    f,
-                    mergestatemod.ACTION_KEEP,
-                    None,
-                    b'ancestor missing, remote missing',
-                )
+                rename_found = False
+                for source, dest in branch_copies1.dirmove.items():
+                    if f.startswith(dest):
+                        # this directory in which this file is created
+                        # is moved from somewhere else
+                        # we should check if the source file exists on
+                        # remote side
+                        sf = source + f[len(dest) :]
+                        if sf in m2:
+                            rename_found = True
+                            if m2[sf] == ma[sf]:
+                                mresult.addfile(
+                                    f,
+                                    mergestatemod.ACTION_KEEP,
+                                    None,
+                                    b'file is rename dest and rename source'
+                                    b' did not change',
+                                )
+                            else:
+                                mresult.addfile(
+                                    f,
+                                    mergestatemod.ACTION_MERGE,
+                                    (f, sf, sf, False, pa.node()),
+                                    b'local directory rename - respect move '
+                                    b'from %s' % sf,
+                                )
+                            break
+
+                if not rename_found:
+                    mresult.addfile(
+                        f,
+                        mergestatemod.ACTION_KEEP,
+                        None,
+                        b'ancestor missing, remote missing',
+                    )
 
         elif n2:  # file exists only on remote side
             if f in copied1:
@@ -1010,8 +1046,19 @@  def manifestmerge(
                         df = branch_copies1.dirmove[d] + f[len(d) :]
                         break
                 if df is not None and df in m1:
+                    # this file is not present in local however its a source
+                    # of rename for another file. keep this file not
+                    # present in local.
                     mresult.addfile(
-                        df,
+                        f,
+                        mergestatemod.ACTION_KEEP_ABSENT,
+                        None,
+                        b'remote directory is rename source - respect move '
+                        b'to %s' % df,
+                    )
+                    # action for rename destination is important and can be
+                    # missed due to some optimizations.
+                    rename_dest_actions[df] = (
                         mergestatemod.ACTION_MERGE,
                         (df, f, f, False, pa.node()),
                         b'local directory rename - respect move '
@@ -1039,6 +1086,12 @@  def manifestmerge(
                     b'local not present, remote unchanged',
                 )
 
+    # make sure actions for rename destination are there and if not
+    # add them
+    for f, data in pycompat.iteritems(rename_dest_actions):
+        if mresult.getfile(f) is None:
+            mresult.addfile(f, *data)
+
     if repo.ui.configbool(b'experimental', b'merge.checkpathconflicts'):
         # If we are merging, look for path conflicts.
         checkpathconflicts(repo, wctx, p2, mresult)