Patchwork [2,of,6] filemerge: turn filemerge into a generator

login
register
mail settings
Submitter Siddharth Agarwal
Date Oct. 9, 2015, 6:46 p.m.
Message ID <66ed0fddfd25ea6070c5.1444416369@dev6666.prn1.facebook.com>
Download mbox | patch
Permalink /patch/10923/
State Accepted
Headers show

Comments

Siddharth Agarwal - Oct. 9, 2015, 6:46 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1444371736 25200
#      Thu Oct 08 23:22:16 2015 -0700
# Node ID 66ed0fddfd25ea6070c578d0c58e4d198e9236d1
# Parent  f32ecec8697102c6daeda1f3a98711a8305f928d
filemerge: turn filemerge into a generator

As described in the previous patch, we separate out premerge and merge into
separate steps. We pause the file merge operation in between the premerge and
merge steps while retaining all the state.

We don't take advantage of this yet in the merge state, but will soon.

Why a generator and not an explicit state machine?

- The filemerge operation is a linear state machine with no retries or anything
  of that sort. That is the sort of operation that a generator is best at
  expressing.

- There is a huge amount of state shared between steps, and a lot of state is
  initialized in one step before being used in the next. A state machine would
  have to either have all those initializations as explicit steps (verbose), use
  memoized getters (slightly less verbose but harder to follow execution flow),
  or just have earlier states dump fields onto the object that later states
  access (error-prone).

  In contrast, the generator allows us to follow execution flow linearly, with
  a single 'yield' statement inserted in between.

- The state machine would have to have an explicit cleanup state. That's much
  more naturally handled by the try/finally in the generator.

I actually did try writing an explicit state machine, but found that it was
significantly more complicated, with five total states, and had more
confusing transitions due to the need to capture the cleanup step in the state
machine. It was also much longer (~220 lines vs ~130).

The API for this is kind of crappy, but we'll improve on it in an upcoming
patch.
Matt Mackall - Oct. 11, 2015, 4:38 p.m.
On Fri, 2015-10-09 at 11:46 -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1444371736 25200
> #      Thu Oct 08 23:22:16 2015 -0700
> # Node ID 66ed0fddfd25ea6070c578d0c58e4d198e9236d1
> # Parent  f32ecec8697102c6daeda1f3a98711a8305f928d
> filemerge: turn filemerge into a generator

> As described in the previous patch, we separate out premerge and merge into
> separate steps.

Recently there was a BTS discussion about aborting huge merges[1]. My
contention there is that we actually have to do ALL the premerges before
ALL the merges. This gives us two things: a maximally complete state
before we involve the user and a notion of how many actual merges
they'll have to do (which we can report and/or use to just drop them to
resolve if they're over some threshold).

When you were talking about separate steps, I'd actually assumed this is
what you were aiming for. But it doesn't seem to be?

Let's instead just pass a flag that says premerge=True|False|None (= do
both) to filemerge, pass it through mergestate.resolve and have two
loops in applyupdates. We'll redo some work, yes, but in practice most
of the premerges should be successful so we'll never do the second pass.

Also, I think mergectx is no good as a name. If it's not in context.py,
it probably shouldn't be a "ctx". But it shouldn't be needed if we do
the above.

[1] Which apparently only like two people read and I can't even find
now, which is why you should never use the BTS for design discussion.
Siddharth Agarwal - Oct. 11, 2015, 4:45 p.m.
On Sunday, October 11, 2015, Matt Mackall <mpm@selenic.com> wrote:

> On Fri, 2015-10-09 at 11:46 -0700, Siddharth Agarwal wrote:
> > # HG changeset patch
> > # User Siddharth Agarwal <sid0@fb.com <javascript:;>>
> > # Date 1444371736 25200
> > #      Thu Oct 08 23:22:16 2015 -0700
> > # Node ID 66ed0fddfd25ea6070c578d0c58e4d198e9236d1
> > # Parent  f32ecec8697102c6daeda1f3a98711a8305f928d
> > filemerge: turn filemerge into a generator
>
> > As described in the previous patch, we separate out premerge and merge
> into
> > separate steps.
>
> Recently there was a BTS discussion about aborting huge merges[1]. My
> contention there is that we actually have to do ALL the premerges before
> ALL the merges.



Yeah, that's exactly what this series does.


> This gives us two things: a maximally complete state
> before we involve the user and a notion of how many actual merges
> they'll have to do (which we can report and/or use to just drop them to
> resolve if they're over some threshold).
>
> When you were talking about separate steps, I'd actually assumed this is
> what you were aiming for. But it doesn't seem to be?


The overall goal is indeed that -- this patch just prepares for that.


>
> Let's instead just pass a flag that says premerge=True|False|None (= do
> both) to filemerge, pass it through mergestate.resolve and have two
> loops in applyupdates.



 The two loops in applyupdates is indeed something I'm doing in a future
patch. The premerge true vs false thing was actually the way I was first
going to do it, but after looking at the code a bit I decided that the
generator approach was a little cleaner. Not sure how strongly you feel
about this.




> We'll redo some work, yes, but in practice most
> of the premerges should be successful so we'll never do the second pass.


> Also, I think mergectx is no good as a name. If it's not in context.py,
> it probably shouldn't be a "ctx". But it shouldn't be needed if we do
> the above.


I can change the name in a followup.


>
> [1] Which apparently only like two people read and I can't even find
> now, which is why you should never use the BTS for design discussion.
>
> --
> Mathematics is the supreme nostalgia of our time.
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com <javascript:;>
> https://selenic.com/mailman/listinfo/mercurial-devel
>
Siddharth Agarwal - Oct. 11, 2015, 4:49 p.m.
On Sunday, October 11, 2015, Siddharth Agarwal <sid@less-broken.com> wrote:

>
>
> On Sunday, October 11, 2015, Matt Mackall <mpm@selenic.com
> <javascript:_e(%7B%7D,'cvml','mpm@selenic.com');>> wrote:
>
>> On Fri, 2015-10-09 at 11:46 -0700, Siddharth Agarwal wrote:
>> > # HG changeset patch
>> > # User Siddharth Agarwal <sid0@fb.com>
>> > # Date 1444371736 25200
>> > #      Thu Oct 08 23:22:16 2015 -0700
>> > # Node ID 66ed0fddfd25ea6070c578d0c58e4d198e9236d1
>> > # Parent  f32ecec8697102c6daeda1f3a98711a8305f928d
>> > filemerge: turn filemerge into a generator
>>
>> > As described in the previous patch, we separate out premerge and merge
>> into
>> > separate steps.
>>
>> Recently there was a BTS discussion about aborting huge merges[1]. My
>> contention there is that we actually have to do ALL the premerges before
>> ALL the merges.
>
>
>
> Yeah, that's exactly what this series does.
>

More precisely, that will be the state of the world at the end of the next
series that's in flight, IIRC.


>
>
>> This gives us two things: a maximally complete state
>> before we involve the user and a notion of how many actual merges
>> they'll have to do (which we can report and/or use to just drop them to
>> resolve if they're over some threshold).
>>
>> When you were talking about separate steps, I'd actually assumed this is
>> what you were aiming for. But it doesn't seem to be?
>
>
> The overall goal is indeed that -- this patch just prepares for that.
>
>
>>
>> Let's instead just pass a flag that says premerge=True|False|None (= do
>> both) to filemerge, pass it through mergestate.resolve and have two
>> loops in applyupdates.
>
>
>
>  The two loops in applyupdates is indeed something I'm doing in a future
> patch. The premerge true vs false thing was actually the way I was first
> going to do it, but after looking at the code a bit I decided that the
> generator approach was a little cleaner. Not sure how strongly you feel
> about this.
>
>
>
>
>> We'll redo some work, yes, but in practice most
>> of the premerges should be successful so we'll never do the second pass.
>
>
>> Also, I think mergectx is no good as a name. If it's not in context.py,
>> it probably shouldn't be a "ctx". But it shouldn't be needed if we do
>> the above.
>
>
> I can change the name in a followup.
>
>
>>
>> [1] Which apparently only like two people read and I can't even find
>> now, which is why you should never use the BTS for design discussion.
>>
>> --
>> Mathematics is the supreme nostalgia of our time.
>>
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> https://selenic.com/mailman/listinfo/mercurial-devel
>>
>
Matt Mackall - Oct. 11, 2015, 5:48 p.m.
On Sun, 2015-10-11 at 09:45 -0700, Siddharth Agarwal wrote:
> On Sunday, October 11, 2015, Matt Mackall <mpm@selenic.com> wrote:
> 
> > On Fri, 2015-10-09 at 11:46 -0700, Siddharth Agarwal wrote:
> > > # HG changeset patch
> > > # User Siddharth Agarwal <sid0@fb.com <javascript:;>>
> > > # Date 1444371736 25200
> > > #      Thu Oct 08 23:22:16 2015 -0700
> > > # Node ID 66ed0fddfd25ea6070c578d0c58e4d198e9236d1
> > > # Parent  f32ecec8697102c6daeda1f3a98711a8305f928d
> > > filemerge: turn filemerge into a generator
> >
> > > As described in the previous patch, we separate out premerge and merge
> > into
> > > separate steps.
> >
> > Recently there was a BTS discussion about aborting huge merges[1]. My
> > contention there is that we actually have to do ALL the premerges before
> > ALL the merges.
> 
> 
> 
> Yeah, that's exactly what this series does.
> 
> 
> > This gives us two things: a maximally complete state
> > before we involve the user and a notion of how many actual merges
> > they'll have to do (which we can report and/or use to just drop them to
> > resolve if they're over some threshold).
> >
> > When you were talking about separate steps, I'd actually assumed this is
> > what you were aiming for. But it doesn't seem to be?
> 
> 
> The overall goal is indeed that -- this patch just prepares for that.
> 
> 
> >
> > Let's instead just pass a flag that says premerge=True|False|None (= do
> > both) to filemerge, pass it through mergestate.resolve and have two
> > loops in applyupdates.
> 
> 
> 
>  The two loops in applyupdates is indeed something I'm doing in a future
> patch. The premerge true vs false thing was actually the way I was first
> going to do it, but after looking at the code a bit I decided that the
> generator approach was a little cleaner. Not sure how strongly you feel
> about this.

So the plan is to stash the generator objects in a list for later
completion? What resources are held from one call to the other? Temp
files? Filectxs and associated filelogs and their caches? For a
disastrous merge, this could exhaust memory or /tmp for some people.

We should generally assume we've only got enough memory to work on one
file at a time. This maximizes the size of the largest file people can
work on.
Siddharth Agarwal - Oct. 11, 2015, 5:57 p.m.
On 10/11/15 10:48, Matt Mackall wrote:
> So the plan is to stash the generator objects in a list for later 
> completion? What resources are held from one call to the other? Temp 
> files? Filectxs and associated filelogs and their caches? For a 
> disastrous merge, this could exhaust memory or /tmp for some people.

Yeah, both of those. The memory use is a good point and one I hadn't 
considered -- I guess I'll have to rework this a bit :)

Please drop patch 2 onwards, and the set of 5 I sent after that. I'll 
clean all this up -- shouldn't be too much work -- and send a V2.

- Siddharth


>
> We should generally assume we've only got enough memory to work on one
> file at a time. This maximizes the size of the largest file people can
> work on.
>
Pierre-Yves David - Oct. 11, 2015, 7:38 p.m.
On 10/11/2015 09:38 AM, Matt Mackall wrote:
> On Fri, 2015-10-09 at 11:46 -0700, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0@fb.com>
>> # Date 1444371736 25200
>> #      Thu Oct 08 23:22:16 2015 -0700
>> # Node ID 66ed0fddfd25ea6070c578d0c58e4d198e9236d1
>> # Parent  f32ecec8697102c6daeda1f3a98711a8305f928d
>> filemerge: turn filemerge into a generator
>
>> As described in the previous patch, we separate out premerge and merge into
>> separate steps.
>
> Recently there was a BTS discussion about aborting huge merges[1].
 > [1] Which apparently only like two people read and I can't even find
 > now, which is why you should never use the BTS for design discussion.

https://bz.mercurial-scm.org/show_bug.cgi?id=4841

Here it is. Found it searching for "conflict" I'm adding "merge" to the 
title for good measure.

BTS might not be a great place for slow paced design discussion, but 
I've not found better the mailing list if much more noisy and finding 
old discussion there is a nightmare.

Patch

diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
--- a/hgext/largefiles/overrides.py
+++ b/hgext/largefiles/overrides.py
@@ -539,7 +539,10 @@  def mergerecordupdates(orig, repo, actio
 def overridefilemerge(origfn, repo, mynode, orig, fcd, fco, fca, labels=None):
     if not lfutil.isstandin(orig):
         return origfn(repo, mynode, orig, fcd, fco, fca, labels=labels)
+    return _filemergegen(repo, mynode, orig, fcd, fco, fca, labels=labels)
 
+def _filemergegen(repo, mynode, orig, fcd, fco, fca, labels=None):
+    '''fake generator that looks similar to filemerge.filemerge'''
     ahash = fca.data().strip().lower()
     dhash = fcd.data().strip().lower()
     ohash = fco.data().strip().lower()
@@ -553,7 +556,8 @@  def overridefilemerge(origfn, repo, myno
                (lfutil.splitstandin(orig), ahash, dhash, ohash),
              0) == 1)):
         repo.wwrite(fcd.path(), fco.data(), fco.flags())
-    return 0
+    raise StopIteration(0)
+    yield  # make this a pretend generator
 
 def copiespathcopies(orig, ctx1, ctx2, match=None):
     copies = orig(ctx1, ctx2, match=match)
diff --git a/mercurial/filemerge.py b/mercurial/filemerge.py
--- a/mercurial/filemerge.py
+++ b/mercurial/filemerge.py
@@ -438,6 +438,16 @@  def _formatlabels(repo, fcd, fco, fca, l
 def filemerge(repo, mynode, orig, fcd, fco, fca, labels=None):
     """perform a 3-way merge in the working directory
 
+    This is a coroutine with the following protocol:
+    - The first time this yields, a 'premerge' step has happened.
+    - On continuing it, a 'merge' step will be performed and the iteration will
+      complete.
+    - This may create temporary files. These files will be cleaned up when
+      close() is called on the coroutine.
+
+    A StopIteration exception may be thrown at any time. The return value of the
+    function is args[0] of the exception.
+
     mynode = parent node before merge
     orig = original local filename before merge
     fco = other file context
@@ -456,7 +466,7 @@  def filemerge(repo, mynode, orig, fcd, f
             return name
 
         if not fco.cmp(fcd): # files identical?
-            return None
+            raise StopIteration(None)
 
         ui = repo.ui
         fd = fcd.path()
@@ -483,7 +493,8 @@  def filemerge(repo, mynode, orig, fcd, f
         toolconf = tool, toolpath, binary, symlink
 
         if mergetype == nomerge:
-            return func(repo, mynode, orig, fcd, fco, fca, toolconf)
+            r = func(repo, mynode, orig, fcd, fco, fca, toolconf)
+            raise StopIteration(r)
 
         if orig != fco.path():
             ui.status(_("merging %s and %s to %s\n") % (orig, fco.path(), fd))
@@ -496,7 +507,7 @@  def filemerge(repo, mynode, orig, fcd, f
                                      toolconf):
             if onfailure:
                 ui.warn(onfailure % fd)
-            return 1
+            raise StopIteration(1)
 
         a = repo.wjoin(fd)
         b = temp("base", fca)
@@ -516,6 +527,8 @@  def filemerge(repo, mynode, orig, fcd, f
         if mergetype == fullmerge:
             r = _premerge(repo, toolconf, files, labels=labels)
 
+        yield
+
         if not r:  # premerge successfully merged the file
             needcheck = False
         else:
@@ -529,7 +542,8 @@  def filemerge(repo, mynode, orig, fcd, f
             if onfailure:
                 ui.warn(onfailure % fd)
 
-        return r
+        raise StopIteration(r)
+
     finally:
         if not r:
             util.unlink(back)
diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -309,8 +309,14 @@  class mergestate(object):
         f = self._repo.vfs('merge/' + hash)
         self._repo.wwrite(dfile, f.read(), flags)
         f.close()
-        r = filemerge.filemerge(self._repo, self._local, lfile, fcd, fco, fca,
-                                labels=labels)
+        mergecr = filemerge.filemerge(self._repo, self._local, lfile, fcd, fco,
+                                      fca, labels=labels)
+        try:
+            next(mergecr)  # premerge
+            next(mergecr)  # merge
+        except StopIteration as e:
+            r = e.args[0]
+
         if r is None:
             # no real conflict
             del self._state[dfile]