Patchwork [2,of,4] merge: use file context objects instead of acting directly on localrepo

login
register
mail settings
Submitter Sean Farley
Date Aug. 19, 2014, 2:42 a.m.
Message ID <d1a21cc0bc4ba5764a1f.1408408959@1.0.0.127.in-addr.arpa>
Download mbox | patch
Permalink /patch/5491/
State Changes Requested
Headers show

Comments

Sean Farley - Aug. 19, 2014, 2:42 a.m.
# HG changeset patch
# User Sean Farley <sean.michael.farley@gmail.com>
# Date 1406339242 18000
#      Fri Jul 25 20:47:22 2014 -0500
# Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
# Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
merge: use file context objects instead of acting directly on localrepo

Arguably, one might say that there is a regression by removing the function
caching, but that was slowed down by disk I/O anyway so there will be a net
speed up when acting directly on memory.
Pierre-Yves David - Aug. 19, 2014, 6:02 a.m.
On 08/18/2014 07:42 PM, Sean Farley wrote:
> # HG changeset patch
> # User Sean Farley <sean.michael.farley@gmail.com>
> # Date 1406339242 18000
> #      Fri Jul 25 20:47:22 2014 -0500
> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
> merge: use file context objects instead of acting directly on localrepo

Can you remind me what is the motivation behind have those methods 
(write/removes) on the filectx instead of the changectx itself?

Given the potential cycle between filectx and changectx, I'm not sure 
the "methods on filectx" way can fly easily.

Small nits: We could also use one function update per patch for 
readability purpose.
Sean Farley - Aug. 19, 2014, 6:08 a.m.
Pierre-Yves David writes:

> On 08/18/2014 07:42 PM, Sean Farley wrote:
>> # HG changeset patch
>> # User Sean Farley <sean.michael.farley@gmail.com>
>> # Date 1406339242 18000
>> #      Fri Jul 25 20:47:22 2014 -0500
>> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
>> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
>> merge: use file context objects instead of acting directly on localrepo
>
> Can you remind me what is the motivation behind have those methods 
> (write/removes) on the filectx instead of the changectx itself?

Merge (and friends) need a way to change the data for a filectx.
Currently, this is done by actually writing merged data onto disk, then
having localrepo.commit read that off of disk.

If we want to do a merge in memory, then we need a way to change this
data in a filectx. That's why the previous patch needs to cache the
filectx in __getitem__ (so that the changed data is not thrown away).

This kind of refactoring will greatly help alleviate the pain of
localrepo.commit having to read off of disk directly (but that's another
patch series or two away).

> Given the potential cycle between filectx and changectx, I'm not sure 
> the "methods on filectx" way can fly easily.

I don't quite know what you're getting at but I see no other semantic
way than something like:

ctx[filename].data = newdata
Pierre-Yves David - Aug. 19, 2014, 6:29 a.m.
So, reply to patch 1 failed to be sent so you got reply to patch 2 
without reply to patch one, lacking some useful context.

On 08/18/2014 11:08 PM, Sean Farley wrote:
>
> Pierre-Yves David writes:
>
>> On 08/18/2014 07:42 PM, Sean Farley wrote:
>>> # HG changeset patch
>>> # User Sean Farley <sean.michael.farley@gmail.com>
>>> # Date 1406339242 18000
>>> #      Fri Jul 25 20:47:22 2014 -0500
>>> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
>>> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
>>> merge: use file context objects instead of acting directly on localrepo
>>
>> Can you remind me what is the motivation behind have those methods
>> (write/removes) on the filectx instead of the changectx itself?
>
> Merge (and friends) need a way to change the data for a filectx.
> Currently, this is done by actually writing merged data onto disk, then
> having localrepo.commit read that off of disk.

Why does it especially needs to be from a filectx? Why can't it be from 
a changectx?

> If we want to do a merge in memory, then we need a way to change this
> data in a filectx. That's why the previous patch needs to cache the
> filectx in __getitem__ (so that the changed data is not thrown away).

I know we need a was to store the data. but why can't this be done in 
changectx?

> This kind of refactoring will greatly help alleviate the pain of
> localrepo.commit having to read off of disk directly (but that's another
> patch series or two away).
>
>> Given the potential cycle between filectx and changectx, I'm not sure
>> the "methods on filectx" way can fly easily.
>
> I don't quite know what you're getting at

The cycle issue is covered in the reply to patch 1. It should have made 
it to the rest interweb by now. (sorry about that)

> but I see no other semantic
> way than something like:
>
> ctx[filename].data = newdata

What about:

   ctx.setcontent(filename, content)
Sean Farley - Aug. 20, 2014, 7:56 p.m.
Pierre-Yves David writes:

> So, reply to patch 1 failed to be sent so you got reply to patch 2 
> without reply to patch one, lacking some useful context.

Ah, yeah, that did help :-)

> On 08/18/2014 11:08 PM, Sean Farley wrote:
>>
>> Pierre-Yves David writes:
>>
>>> On 08/18/2014 07:42 PM, Sean Farley wrote:
>>>> # HG changeset patch
>>>> # User Sean Farley <sean.michael.farley@gmail.com>
>>>> # Date 1406339242 18000
>>>> #      Fri Jul 25 20:47:22 2014 -0500
>>>> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
>>>> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
>>>> merge: use file context objects instead of acting directly on localrepo
>>>
>>> Can you remind me what is the motivation behind have those methods
>>> (write/removes) on the filectx instead of the changectx itself?
>>
>> Merge (and friends) need a way to change the data for a filectx.
>> Currently, this is done by actually writing merged data onto disk, then
>> having localrepo.commit read that off of disk.
>
> Why does it especially needs to be from a filectx? Why can't it be from 
> a changectx?

Because filectxs hold the data of the file.

>> If we want to do a merge in memory, then we need a way to change this
>> data in a filectx. That's why the previous patch needs to cache the
>> filectx in __getitem__ (so that the changed data is not thrown away).
>
> I know we need a was to store the data. but why can't this be done in 
> changectx?

Because filectxs hold the data of the file.

>> This kind of refactoring will greatly help alleviate the pain of
>> localrepo.commit having to read off of disk directly (but that's another
>> patch series or two away).
>>
>>> Given the potential cycle between filectx and changectx, I'm not sure
>>> the "methods on filectx" way can fly easily.
>>
>> I don't quite know what you're getting at
>
> The cycle issue is covered in the reply to patch 1. It should have made 
> it to the rest interweb by now. (sorry about that)
>
>> but I see no other semantic
>> way than something like:
>>
>> ctx[filename].data = newdata
>
> What about:
>
>    ctx.setcontent(filename, content)

Because filectxs hold the data of the file.
Pierre-Yves David - Aug. 20, 2014, 8:07 p.m.
On 08/20/2014 12:56 PM, Sean Farley wrote:
>
> Pierre-Yves David writes:
>
>> So, reply to patch 1 failed to be sent so you got reply to patch 2
>> without reply to patch one, lacking some useful context.
>
> Ah, yeah, that did help :-)
>
>> On 08/18/2014 11:08 PM, Sean Farley wrote:
>>>
>>> Pierre-Yves David writes:
>>>
>>>> On 08/18/2014 07:42 PM, Sean Farley wrote:
>>>>> # HG changeset patch
>>>>> # User Sean Farley <sean.michael.farley@gmail.com>
>>>>> # Date 1406339242 18000
>>>>> #      Fri Jul 25 20:47:22 2014 -0500
>>>>> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
>>>>> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
>>>>> merge: use file context objects instead of acting directly on localrepo
>>>>
>>>> Can you remind me what is the motivation behind have those methods
>>>> (write/removes) on the filectx instead of the changectx itself?
>>>
>>> Merge (and friends) need a way to change the data for a filectx.
>>> Currently, this is done by actually writing merged data onto disk, then
>>> having localrepo.commit read that off of disk.
>>
>> Why does it especially needs to be from a filectx? Why can't it be from
>> a changectx?
>
> Because filectxs hold the data of the file.
>
>>> If we want to do a merge in memory, then we need a way to change this
>>> data in a filectx. That's why the previous patch needs to cache the
>>> filectx in __getitem__ (so that the changed data is not thrown away).
>>
>> I know we need a was to store the data. but why can't this be done in
>> changectx?
>
> Because filectxs hold the data of the file.

Let me rephrase:

Why do you insist for having the filectxs holding the data?

The changectx could be holding the data too. What is the argument to get 
put them on the filectxs ?
Sean Farley - Aug. 20, 2014, 8:16 p.m.
Pierre-Yves David writes:

> On 08/20/2014 12:56 PM, Sean Farley wrote:
>>
>> Pierre-Yves David writes:
>>
>>> So, reply to patch 1 failed to be sent so you got reply to patch 2
>>> without reply to patch one, lacking some useful context.
>>
>> Ah, yeah, that did help :-)
>>
>>> On 08/18/2014 11:08 PM, Sean Farley wrote:
>>>>
>>>> Pierre-Yves David writes:
>>>>
>>>>> On 08/18/2014 07:42 PM, Sean Farley wrote:
>>>>>> # HG changeset patch
>>>>>> # User Sean Farley <sean.michael.farley@gmail.com>
>>>>>> # Date 1406339242 18000
>>>>>> #      Fri Jul 25 20:47:22 2014 -0500
>>>>>> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
>>>>>> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
>>>>>> merge: use file context objects instead of acting directly on localrepo
>>>>>
>>>>> Can you remind me what is the motivation behind have those methods
>>>>> (write/removes) on the filectx instead of the changectx itself?
>>>>
>>>> Merge (and friends) need a way to change the data for a filectx.
>>>> Currently, this is done by actually writing merged data onto disk, then
>>>> having localrepo.commit read that off of disk.
>>>
>>> Why does it especially needs to be from a filectx? Why can't it be from
>>> a changectx?
>>
>> Because filectxs hold the data of the file.
>>
>>>> If we want to do a merge in memory, then we need a way to change this
>>>> data in a filectx. That's why the previous patch needs to cache the
>>>> filectx in __getitem__ (so that the changed data is not thrown away).
>>>
>>> I know we need a was to store the data. but why can't this be done in
>>> changectx?
>>
>> Because filectxs hold the data of the file.
>
> Let me rephrase:
>
> Why do you insist for having the filectxs holding the data?
>
> The changectx could be holding the data too. What is the argument to get 
> put them on the filectxs ?

This is the way filectx and context is designed. It makes the most
sense. If you're going to put file data into contexts, then why not just
put it all into localrepo? You could also have a convenient way to
access it:

repo.getdata(ctx, filename)
Pierre-Yves David - Aug. 20, 2014, 8:30 p.m.
On 08/20/2014 01:16 PM, Sean Farley wrote:
>
> Pierre-Yves David writes:
>
>> On 08/20/2014 12:56 PM, Sean Farley wrote:
>>>
>>> Pierre-Yves David writes:
>>>
>>>> So, reply to patch 1 failed to be sent so you got reply to patch 2
>>>> without reply to patch one, lacking some useful context.
>>>
>>> Ah, yeah, that did help :-)
>>>
>>>> On 08/18/2014 11:08 PM, Sean Farley wrote:
>>>>>
>>>>> Pierre-Yves David writes:
>>>>>
>>>>>> On 08/18/2014 07:42 PM, Sean Farley wrote:
>>>>>>> # HG changeset patch
>>>>>>> # User Sean Farley <sean.michael.farley@gmail.com>
>>>>>>> # Date 1406339242 18000
>>>>>>> #      Fri Jul 25 20:47:22 2014 -0500
>>>>>>> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
>>>>>>> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
>>>>>>> merge: use file context objects instead of acting directly on localrepo
>>>>>>
>>>>>> Can you remind me what is the motivation behind have those methods
>>>>>> (write/removes) on the filectx instead of the changectx itself?
>>>>>
>>>>> Merge (and friends) need a way to change the data for a filectx.
>>>>> Currently, this is done by actually writing merged data onto disk, then
>>>>> having localrepo.commit read that off of disk.
>>>>
>>>> Why does it especially needs to be from a filectx? Why can't it be from
>>>> a changectx?
>>>
>>> Because filectxs hold the data of the file.
>>>
>>>>> If we want to do a merge in memory, then we need a way to change this
>>>>> data in a filectx. That's why the previous patch needs to cache the
>>>>> filectx in __getitem__ (so that the changed data is not thrown away).
>>>>
>>>> I know we need a was to store the data. but why can't this be done in
>>>> changectx?
>>>
>>> Because filectxs hold the data of the file.
>>
>> Let me rephrase:
>>
>> Why do you insist for having the filectxs holding the data?
>>
>> The changectx could be holding the data too. What is the argument to get
>> put them on the filectxs ?
>
> This is the way filectx and context is designed. It makes the most
> sense.

So, data are stored in filelog. And filectx is just a nice overlay to 
access it. The filelog is not storing the data itself.

> If you're going to put file data into contexts, then why not just
> put it all into localrepo? You could also have a convenient way to
> access it:
>
> repo.getdata(ctx, filename)

We are talking about setting additional, in memory data here. They need 
to be stored somewhere and having them in the memctx would solve the 
cycle probleme (as the memctx need access to this data and cannot 
reference filectx).

(Finding the appropriate name for the fallacy you just perpetrated is 
left as an exercise for the reader)
Sean Farley - Aug. 21, 2014, 1:59 a.m.
Pierre-Yves David writes:

> On 08/20/2014 01:16 PM, Sean Farley wrote:
>>
>> Pierre-Yves David writes:
>>
>>> On 08/20/2014 12:56 PM, Sean Farley wrote:
>>>>
>>>> Pierre-Yves David writes:
>>>>
>>>>> So, reply to patch 1 failed to be sent so you got reply to patch 2
>>>>> without reply to patch one, lacking some useful context.
>>>>
>>>> Ah, yeah, that did help :-)
>>>>
>>>>> On 08/18/2014 11:08 PM, Sean Farley wrote:
>>>>>>
>>>>>> Pierre-Yves David writes:
>>>>>>
>>>>>>> On 08/18/2014 07:42 PM, Sean Farley wrote:
>>>>>>>> # HG changeset patch
>>>>>>>> # User Sean Farley <sean.michael.farley@gmail.com>
>>>>>>>> # Date 1406339242 18000
>>>>>>>> #      Fri Jul 25 20:47:22 2014 -0500
>>>>>>>> # Node ID d1a21cc0bc4ba5764a1f8760fd041a91939b4d00
>>>>>>>> # Parent  594d00e49cea6cd2ebb92c0570e7502f8797e232
>>>>>>>> merge: use file context objects instead of acting directly on localrepo
>>>>>>>
>>>>>>> Can you remind me what is the motivation behind have those methods
>>>>>>> (write/removes) on the filectx instead of the changectx itself?
>>>>>>
>>>>>> Merge (and friends) need a way to change the data for a filectx.
>>>>>> Currently, this is done by actually writing merged data onto disk, then
>>>>>> having localrepo.commit read that off of disk.
>>>>>
>>>>> Why does it especially needs to be from a filectx? Why can't it be from
>>>>> a changectx?
>>>>
>>>> Because filectxs hold the data of the file.
>>>>
>>>>>> If we want to do a merge in memory, then we need a way to change this
>>>>>> data in a filectx. That's why the previous patch needs to cache the
>>>>>> filectx in __getitem__ (so that the changed data is not thrown away).
>>>>>
>>>>> I know we need a was to store the data. but why can't this be done in
>>>>> changectx?
>>>>
>>>> Because filectxs hold the data of the file.
>>>
>>> Let me rephrase:
>>>
>>> Why do you insist for having the filectxs holding the data?
>>>
>>> The changectx could be holding the data too. What is the argument to get
>>> put them on the filectxs ?
>>
>> This is the way filectx and context is designed. It makes the most
>> sense.
>
> So, data are stored in filelog. And filectx is just a nice overlay to 
> access it. The filelog is not storing the data itself.
>
>> If you're going to put file data into contexts, then why not just
>> put it all into localrepo? You could also have a convenient way to
>> access it:
>>
>> repo.getdata(ctx, filename)
>
> We are talking about setting additional, in memory data here. They need 
> to be stored somewhere and having them in the memctx would solve the 
> cycle probleme (as the memctx need access to this data and cannot 
> reference filectx).

I didn't understand why you were so concerned about object cycles until
I read about garbage collection. I'll rework this series to store the
data in memctx and access it through memfilectx.

> (Finding the appropriate name for the fallacy you just perpetrated is 
> left as an exercise for the reader)

It was supposed to be a joke about adding new methods to localrepo :-P

Patch

diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -553,53 +553,49 @@  def manifestmerge(repo, wctx, p2, pa, br
         else:
             _checkcollision(repo, m1, actions)
 
     return actions
 
-def batchremove(repo, actions):
+def batchremove(repo, wctx, actions):
     """apply removes to the working directory
 
     yields tuples for progress updates
     """
     verbose = repo.ui.verbose
-    unlink = util.unlinkpath
-    wjoin = repo.wjoin
     audit = repo.wopener.audit
     i = 0
     for f, args, msg in actions:
         repo.ui.debug(" %s: %s -> r\n" % (f, msg))
         if verbose:
             repo.ui.note(_("removing %s\n") % f)
         audit(f)
         try:
-            unlink(wjoin(f), ignoremissing=True)
+            wctx[f].remove(ignoremissing=True)
         except OSError, inst:
             repo.ui.warn(_("update failed to remove %s: %s!\n") %
                          (f, inst.strerror))
         if i == 100:
             yield i, f
             i = 0
         i += 1
     if i > 0:
         yield i, f
 
-def batchget(repo, mctx, actions):
+def batchget(repo, wctx, mctx, actions):
     """apply gets to the working directory
 
     mctx is the context to get from
 
     yields tuples for progress updates
     """
     verbose = repo.ui.verbose
-    fctx = mctx.filectx
-    wwrite = repo.wwrite
     i = 0
     for f, args, msg in actions:
         repo.ui.debug(" %s: %s -> g\n" % (f, msg))
         if verbose:
             repo.ui.note(_("getting %s\n") % f)
-        wwrite(f, fctx(f).data(), args[0])
+        wctx[f].write(mctx[f].data(), args[0])
         if i == 100:
             yield i, f
             i = 0
         i += 1
     if i > 0:
@@ -656,18 +652,18 @@  def applyupdates(repo, actions, wctx, mc
     if [a for a in actions['r'] if a[0] == '.hgsubstate']:
         subrepo.submerge(repo, wctx, mctx, wctx, overwrite)
 
     # remove in parallel (must come first)
     z = 0
-    prog = worker.worker(repo.ui, 0.001, batchremove, (repo,), actions['r'])
+    prog = worker.worker(repo.ui, 0.001, batchremove, (repo, wctx), actions['r'])
     for i, item in prog:
         z += i
         progress(_updating, z, item=item, total=numupdates, unit=_files)
     removed = len(actions['r'])
 
     # get in parallel
-    prog = worker.worker(repo.ui, 0.001, batchget, (repo, mctx), actions['g'])
+    prog = worker.worker(repo.ui, 0.001, batchget, (repo, wctx, mctx), actions['g'])
     for i, item in prog:
         z += i
         progress(_updating, z, item=item, total=numupdates, unit=_files)
     updated = len(actions['g'])
 
@@ -717,22 +713,22 @@  def applyupdates(repo, actions, wctx, mc
         z += 1
         progress(_updating, z, item=f, total=numupdates, unit=_files)
         f0, flags = args
         repo.ui.note(_("moving %s to %s\n") % (f0, f))
         audit(f)
-        repo.wwrite(f, wctx.filectx(f0).data(), flags)
-        util.unlinkpath(repo.wjoin(f0))
+        wctx[f].write(wctx.filectx(f0).data(), flags)
+        wctx[f0].remove()
         updated += 1
 
     # local directory rename, get
     for f, args, msg in actions['dg']:
         repo.ui.debug(" %s: %s -> dg\n" % (f, msg))
         z += 1
         progress(_updating, z, item=f, total=numupdates, unit=_files)
         f0, flags = args
         repo.ui.note(_("getting %s to %s\n") % (f0, f))
-        repo.wwrite(f, mctx.filectx(f0).data(), flags)
+        wctx[f].write(mctx.filectx(f0).data(), flags)
         updated += 1
 
     # divergent renames
     for f, args, msg in actions['dr']:
         repo.ui.debug(" %s: %s -> dr\n" % (f, msg))