Patchwork [1,of,8] largefiles: replace "_isXXXXing" attributes with "_lfautocommit"

Submitter Katsunori FUJIWARA
Date Sept. 9, 2014, 6:18 p.m.
Message ID <a0d96709737b2fb101d2.1410286725@feefifofum>
Permalink /patch/5747/
State Changes Requested

Comments

Katsunori FUJIWARA - Sept. 9, 2014, 6:18 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1410286453 -32400
#      Wed Sep 10 03:14:13 2014 +0900
# Node ID a0d96709737b2fb101d2a937a54e044aefadd7e8
# Parent  5c153c69fdb28ff5b1bf6578e5f07c50bf25833c
largefiles: replace "_isXXXXing" attributes with "_lfautocommit"

Before this patch, the Boolean attributes "_isrebasing" and
"_istransplanting" are used to indicate whether special handling for
automated committing/updating is needed. But this approach is not
extensible.

For example, the commands below will also need the same special
handling for largefiles readiness, but adding an "_isXXXXing"
attribute for each of them would be too complicated to maintain.

  - backout
  - graft
  - import
  - histedit
  - mq (qpush, qpop)

In addition, individual "_isXXXXing" attributes prevent us from
introducing a framework that handles automated committing/updating in
a unified way.

This patch replaces "_isXXXXing" attributes with "_lfautocommit".

The "_lfautocommit" attribute holds the name of the command in
progress, or False. This is safe enough, because the scopes of
automated committing/updating for these commands don't overlap with
each other.
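The scheme described above can be sketched as follows. This is a simplified, hypothetical stand-in (the `Repo` class and the bare `orig` callable are invented for illustration), not actual Mercurial code, but it shows the set-name/try/finally-reset pattern the patch applies around each wrapped command:

```python
# Simplified sketch of the "_lfautocommit" scheme: a single attribute
# holding the name of the command in progress, or False.

class Repo:
    """Minimal stand-in for a localrepository object (hypothetical)."""
    _lfautocommit = False

def overriderebase(repo, orig):
    """Wrap a command, marking the repo as auto-committing for 'rebase'."""
    repo._lfautocommit = 'rebase'
    try:
        return orig()
    finally:
        # Always reset, even if the wrapped command raises.
        repo._lfautocommit = False

repo = Repo()
# A wrapped command can inspect which automated command is running:
seen = overriderebase(repo, lambda: getattr(repo, '_lfautocommit', False))
print(seen)                # -> rebase
print(repo._lfautocommit)  # -> False
```

Because the attribute carries the command name rather than a per-command Boolean, code that only needs "is any automated commit in progress?" can use a single `getattr(repo, "_lfautocommit", False)` check, as the diff below does.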
Mads Kiilerich - Sept. 22, 2014, 9:14 a.m.
Hi

I like that this patch series improves largefiles to handle more corner 
cases correctly. Nice work!

I do not so much like the increasing amount of hacks we have to use. It 
is not as though we finally got it right and can now start cleaning up 
and simplifying the code. Instead, in order to take the next steps, this 
series has to introduce more complexity and more callbacks in core 
Mercurial.

I wonder if we should try a different approach. As you point out, it is 
almost impossible to do the right thing in the internal commit functions 
- they don't have enough context. One alternative could be to hook in at 
the very low level, updating the standin files whenever they are read 
and propagate their change whenever they are written. Another approach 
would be to do all synchronization before/after calling in to core 
Mercurial and let the core pretty much treat standin files like any 
other file. I have played around with that approach and posted a proof 
of concept.

/Mads
Sean Farley - Sept. 22, 2014, 5:02 p.m.
Mads Kiilerich writes:

> I wonder if we should try a different approach. As you point out, it is 
> almost impossible to do the right thing in the internal commit functions 
> - they don't have enough context. One alternative could be to hook in at 
> the very low level, updating the standin files whenever they are read 
> and propagate their change whenever they are written. Another approach 
> would be to do all synchronization before/after calling in to core 
> Mercurial and let the core pretty much treat standin files like any 
> other file. I have played around with that approach and posted a proof 
> of concept.

Just a shot in the dark here, but what about using a custom context
object (that will soon grow a commit method)?
Mads Kiilerich - Sept. 22, 2014, 5:18 p.m.
On 09/22/2014 07:02 PM, Sean Farley wrote:
> Mads Kiilerich writes:
>
>> I wonder if we should try a different approach. As you point out, it is
>> almost impossible to do the right thing in the internal commit functions
>> - they don't have enough context. One alternative could be to hook in at
>> the very low level, updating the standin files whenever they are read
>> and propagate their change whenever they are written. Another approach
>> would be to do all synchronization before/after calling in to core
>> Mercurial and let the core pretty much treat standin files like any
>> other file. I have played around with that approach and posted a proof
>> of concept.
> Just a shot in the dark here, but what about using a custom context
> object (that will soon grow a commit method)?

I doubt that would help.

The problem (or a part of it) is that commands like rebase, transplant 
and histedit do several commits and updates - and that it is quite a bit 
blurry when they should propagate largefiles to standins or the other 
way around. I guess it would require quite a bit of largefile awareness 
in creation/initialization of these context objects to handle that 
"correctly".

/Mads
Sean Farley - Sept. 22, 2014, 5:41 p.m.
Mads Kiilerich writes:

> On 09/22/2014 07:02 PM, Sean Farley wrote:
>> Mads Kiilerich writes:
>>
>>> I wonder if we should try a different approach. As you point out, it is
>>> almost impossible to do the right thing in the internal commit functions
>>> - they don't have enough context. One alternative could be to hook in at
>>> the very low level, updating the standin files whenever they are read
>>> and propagate their change whenever they are written. Another approach
>>> would be to do all synchronization before/after calling in to core
>>> Mercurial and let the core pretty much treat standin files like any
>>> other file. I have played around with that approach and posted a proof
>>> of concept.
>> Just a shot in the dark here, but what about using a custom context
>> object (that will soon grow a commit method)?
>
> I doubt that would help.
>
> The problem (or a part of it) is that commands like rebase, transplant 
> and histedit do several commits and updates - and that it is quite a bit 
> blurry when they should propagate largefiles to standins or the other 
> way around. I guess it would require quite a bit of largefile awareness 
> in creation/initialization of these context objects to handle that 
> "correctly".

Hopefully in the future, those commands will just take arbitrary context
objects (I am the bottleneck in that, though).
Pierre-Yves David - Sept. 24, 2014, midnight
On 09/22/2014 02:14 AM, Mads Kiilerich wrote:
> Hi
>
> I like that this patch series improve largefiles to handle more corner
> cases correctly. Nice work!

So what is the plan for this series? Any chance it gets accepted as is, 
or should I drop it from patchwork?
Katsunori FUJIWARA - Sept. 24, 2014, 1:11 p.m.
On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:
> Hi
> 
> I like that this patch series improve largefiles to handle more corner 
> cases correctly. Nice work!
> 
> I do not so much like increasing amount of hacks we have to use. It is 
> not like I feel we finally got it right and now can start cleaning up 
> and simplifying the code. Instead, in order to take the next steps, this 
> series has to introduce more complexity and more callbacks in core 
> Mercurial.
> 
> I wonder if we should try a different approach. As you point out, it is 
> almost impossible to do the right thing in the internal commit functions 
> - they don't have enough context. One alternative could be to hook in at 
> the very low level, updating the standin files whenever they are read 
> and propagate their change whenever they are written. Another approach 
> would be to do all synchronization before/after calling in to core 
> Mercurial and let the core pretty much treat standin files like any 
> other file. I have played around with that approach and posted a proof 
> of concept.
> 
> /Mads

Thank you for your comments, and sorry for the late response.

Your approach posted as a PoC looks simple and good!

But on the other hand, I also worry about the following points:

- performance impact on large-scale repositories

  Examining and updating ALL standins after the original command logic
  may have a serious performance impact on large-scale repositories,
  because it may imply file I/O on many standins.

  In many cases, the updated standins can be known exactly, and deep
  hooking can avoid this performance impact (even though the revert,
  update and merge cases are not yet optimized :-P)

    - import       => scmutil.marktouched with filelist
    - revert       => cmdutil._performrevert with actions
    - update/merge => merge.applyupdates with actions


- visibility of largefiles for hooks

  The hooks below can't see the appropriate largefiles in the working
  directory, because only standins are updated during command
  execution.

  - precommit, pretxncommit and commit hooks for automated
    committing

  - update hooks for "hg update", "hg merge" and automated
    committing that implies a working directory update/merge.

  Even though these hooks can't see the appropriate largefiles now
  either :-), deep hooking can fix this problem.


What do you think about these points?
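The "updated standins can be known exactly" point can be illustrated with a toy sketch: diffing standin states taken before and after a command yields only the changed largefiles, avoiding I/O on every standin. This is a simplified illustration (the dict-of-hashes representation is an assumption here); the real logic lives in lfutil.getstandinsstate / lfutil.getlfilestoupdate:

```python
# Toy version of computing the exact set of largefiles to update from
# two snapshots of standin state (mapping largefile name -> standin hash).

def getlfilestoupdate(oldstate, newstate):
    """Return largefile names whose standin changed, appeared or vanished."""
    allfiles = set(oldstate) | set(newstate)
    return sorted(f for f in allfiles if oldstate.get(f) != newstate.get(f))

old = {'big.bin': 'aa11', 'data.tar': 'bb22'}
new = {'big.bin': 'aa11', 'data.tar': 'cc33', 'extra.iso': 'dd44'}
print(getlfilestoupdate(old, new))  # -> ['data.tar', 'extra.iso']
```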
Katsunori FUJIWARA - Sept. 24, 2014, 2:31 p.m.
On Tue, 23 Sep 2014 17:00:54 -0700, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> On 09/22/2014 02:14 AM, Mads Kiilerich wrote:
>> Hi
>>
>> I like that this patch series improve largefiles to handle more corner
>> cases correctly. Nice work!
> 
> So what is the plan for this series, any chance it get accepted as is or 
> should I drop it from patchwork?

I withdraw this series; please drop it from patchwork.

Even if my approach is accepted, I'll post a revised series including
details about the conclusion of the discussion and so on.
Mads Kiilerich - Sept. 24, 2014, 10:43 p.m.
On 09/24/2014 03:11 PM, FUJIWARA Katsunori wrote:
> On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:
>> Hi
>>
>> I like that this patch series improve largefiles to handle more corner 
>> cases correctly. Nice work!
>>
>> I do not so much like increasing amount of hacks we have to use. It is 
>> not like I feel we finally got it right and now can start cleaning up 
>> and simplifying the code. Instead, in order to take the next steps, this 
>> series has to introduce more complexity and more callbacks in core 
>> Mercurial.
>>
>> I wonder if we should try a different approach. As you point out, it is 
>> almost impossible to do the right thing in the internal commit functions 
>> - they don't have enough context. One alternative could be to hook in at 
>> the very low level, updating the standin files whenever they are read 
>> and propagate their change whenever they are written. Another approach 
>> would be to do all synchronization before/after calling in to core 
>> Mercurial and let the core pretty much treat standin files like any 
>> other file. I have played around with that approach and posted a proof 
>> of concept.
>>
>> /Mads
> Thank you for your comments, and sorry for late response.
>
> Your approach posted as PoC looks simple and good !
>
> But on the other hand, I'm also worry about below for it.
>
> - performance impact on large scale repositories
>
>   Examination of updating for ALL standins after original command
>   logic may have serious performance impact on large scale
>   repositories, because it may imply file I/O on many standins.

In the PoC I reuse the existing simple method of reading all standins.
That could be optimized. We could rely on the dirstate to see which
files are relevant to process.

>   In many cases, updated standins can be known exactly, and deep
>   hooking can avoid such performance impact (even though revert,
>   update and merge cases are not yet optimized :-P)
>
>     - import       => scmutil.marktouched with filelist
>     - revert       => cmdutil._performrevert with actions
>     - update/merge => merge.applyupdates with actions

Yes, it is impressive that these methods work. But they are also complex
and fragile and hard to maintain.

> - visibility of largefiles for hooks
>
>   Hooks below can't see appropriate largefiles in the working
>   directory, because only standins are updated while command
>   execution.

Yes, I think that is acceptable if that is what it takes to give a
simple model that works consistently everywhere.

>   Even though these hooks can't see appropriate largefiles now too :-),
>   deep hooking can fix this problem.

If we really want deep hooking then I would consider experimenting with
hooking into the vfs layer, updating standin files every time they are
read or walked and are out of date, and propagating their value every
time they are written.
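The vfs-layer idea sketched above could look roughly like this. All names here are invented for illustration (the real Mercurial vfs interface is richer, and `.hglf/` standin detection is handled by lfutil in practice); this is only a minimal model of "refresh on read, propagate on write":

```python
# Hypothetical sketch of vfs-layer hooking for standin files: a wrapper
# that refreshes a standin before it is read and propagates its value
# after it is written.

class HookedVFS:
    def __init__(self, files, refresh, propagate):
        self.files = files          # path -> content (toy backing store)
        self.refresh = refresh      # called before reading a standin
        self.propagate = propagate  # called after writing a standin

    @staticmethod
    def isstandin(path):
        return path.startswith('.hglf/')

    def read(self, path):
        if self.isstandin(path):
            self.refresh(path)      # e.g. update the standin if out of date
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data
        if self.isstandin(path):
            self.propagate(path)    # e.g. fetch/copy the matching largefile

events = []
vfs = HookedVFS({'.hglf/big.bin': 'hash1'},
                refresh=lambda p: events.append(('refresh', p)),
                propagate=lambda p: events.append(('propagate', p)))
vfs.write('.hglf/big.bin', 'hash2')
print(vfs.read('.hglf/big.bin'))  # -> hash2
print(events)
```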

> What do you think about these points ?

I think they are valid. There is no perfect solution ... and probably
not even a good one. But I think we should reconsider whether the
current approach of violating all layers and adding hacks everywhere is
the best solution.

/Mads
Matt Mackall - Sept. 29, 2014, 10:51 p.m.
On Thu, 2014-09-25 at 00:43 +0200, Mads Kiilerich wrote:
> On 09/24/2014 03:11 PM, FUJIWARA Katsunori wrote:
> > On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:
> >> Hi
> >>
> >> I like that this patch series improve largefiles to handle more corner 
> >> cases correctly. Nice work!
> >>
> >> I do not so much like increasing amount of hacks we have to use. It is 
> >> not like I feel we finally got it right and now can start cleaning up 
> >> and simplifying the code. Instead, in order to take the next steps, this 
> >> series has to introduce more complexity and more callbacks in core 
> >> Mercurial.
> >>
> >> I wonder if we should try a different approach. As you point out, it is 
> >> almost impossible to do the right thing in the internal commit functions 
> >> - they don't have enough context. One alternative could be to hook in at 
> >> the very low level, updating the standin files whenever they are read 
> >> and propagate their change whenever they are written. Another approach 
> >> would be to do all synchronization before/after calling in to core 
> >> Mercurial and let the core pretty much treat standin files like any 
> >> other file. I have played around with that approach and posted a proof 
> >> of concept.
> >>
> >> /Mads
> > Thank you for your comments, and sorry for late response.
> >
> > Your approach posted as PoC looks simple and good !
> >
> > But on the other hand, I'm also worry about below for it.
> >
> > - performance impact on large scale repositories
> >
> >   Examination of updating for ALL standins after original command
> >   logic may have serious performance impact on large scale
> >   repositories, because it may imply file I/O on many standins.
> 
> In the PoC I reuse the existing simple method of reading all standins.
> That could be optimized. We could rely on the dirstates to see which
> files it is relevant to process.
> 
> >   In many cases, updated standins can be known exactly, and deep
> >   hooking can avoid such performance impact (even though revert,
> >   update and merge cases are not yet optimized :-P)
> >
> >     - import       => scmutil.marktouched with filelist
> >     - revert       => cmdutil._performrevert with actions
> >     - update/merge => merge.applyupdates with actions
> 
> Yes, it is impressive that these methods work. But they are also complex
> and fragile and hard to maintain.
> 
> > - visibility of largefiles for hooks
> >
> >   Hooks below can't see appropriate largefiles in the working
> >   directory, because only standins are updated while command
> >   execution.
> 
> Yes, I think that is a acceptable if that is what it takes to give a
> simple model that works consistently everywhere.
> 
> >   Even though these hooks can't see appropriate largefiles now too :-),
> >   deep hooking can fix this problem.
> 
> If we really want deep hooking then I would consider experimenting with
> hooking in to the vfs layer, updating standin files every time they are
> read or walked and are out of date, and propagating their value every
> time they are written.
> 
> > What do you think about these points ?
> 
> I think they are valid. There is no perfect solution ... and probably
> not even a good one. But I think we should reconsider whether the
> current approach of violating all layers and add hacks everywhere is the
> best solution.

I think if we know how to get largefiles more correct (for instance, by
applying this patch series), then we should do that. While this might
increase the complexity, it also puts us in a better place for measuring
regressions when refactoring, and it also delivers immediate benefits to
users.
Mads Kiilerich - Sept. 29, 2014, 11:06 p.m.
On 09/30/2014 12:51 AM, Matt Mackall wrote:
> On Thu, 2014-09-25 at 00:43 +0200, Mads Kiilerich wrote:
>> I think they are valid. There is no perfect solution ... and probably
>> not even a good one. But I think we should reconsider whether the
>> current approach of violating all layers and add hacks everywhere is the
>> best solution.
> I think if we know how to get largefiles more correct (for instance, by
> applying this patch series), then we should do that. While this might
> increase the complexity, it also puts us in a better place for measuring
> regressions when refactoring.. and also delivers immediate benefits to
> users.

Fine with me. I would love to get the benefits too.

/Mads
Matt Mackall - Sept. 29, 2014, 11:15 p.m.
On Tue, 2014-09-30 at 01:06 +0200, Mads Kiilerich wrote:
> On 09/30/2014 12:51 AM, Matt Mackall wrote:
> > On Thu, 2014-09-25 at 00:43 +0200, Mads Kiilerich wrote:
> >> I think they are valid. There is no perfect solution ... and probably
> >> not even a good one. But I think we should reconsider whether the
> >> current approach of violating all layers and add hacks everywhere is the
> >> best solution.
> > I think if we know how to get largefiles more correct (for instance, by
> > applying this patch series), then we should do that. While this might
> > increase the complexity, it also puts us in a better place for measuring
> > regressions when refactoring.. and also delivers immediate benefits to
> > users.
> 
> Fine with me. I would love to get the benefits too.

I guess we're waiting for a resend in any case.
Katsunori FUJIWARA - Oct. 6, 2014, 3:24 p.m.
At Thu, 25 Sep 2014 00:43:23 +0200,
Mads Kiilerich wrote:
> 
> On 09/24/2014 03:11 PM, FUJIWARA Katsunori wrote:
>
> > On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:

> >   Even though these hooks can't see appropriate largefiles now too :-),
> >   deep hooking can fix this problem.
> 
> If we really want deep hooking then I would consider experimenting with
> hooking in to the vfs layer, updating standin files every time they are
> read or walked and are out of date, and propagating their value every
> time they are written.

"vfs layer hooking" looked good, and I thought about its feasibility.
But I finally concluded that it would cause a performance problem when
getting largefiles from the remote. For example, in the case of "hg
update":

  - now:

    largefiles are fetched all together from the remote by
    "updatelfiles" at the end of "merge.update".

    In this case, the network channel is opened only once for each "hg
    update".

  - after "vfs layer hooking":

    each largefile is fetched separately at the end of "vfs.write" for
    its standin.

    In this case, the network channel is opened for each largefile
    during "hg update" (or some kind of batching of largefile fetches
    during "hg update" would be needed).
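The channel-count concern above can be modeled with a toy comparison. The functions and counts here are purely illustrative, not Mercurial code; they only show why per-standin fetching multiplies connection setup cost compared with the batched "updatelfiles" approach:

```python
# Toy model: naive vfs hooking fetches each largefile as its standin is
# written (one channel per file), while batching opens one channel for
# the whole update.

channels = []

def fetch_per_file(largefiles):
    for lf in largefiles:
        channels.append('open')  # a fresh channel per largefile
        # ... transfer lf from the remote store ...

def fetch_batched(largefiles):
    channels.append('open')      # one channel for the whole "hg update"
    for lf in largefiles:
        pass                     # ... transfer lf over the same channel ...

fetch_per_file(['a.bin', 'b.bin', 'c.bin'])
per_file_opens = len(channels)
channels.clear()
fetch_batched(['a.bin', 'b.bin', 'c.bin'])
print(per_file_opens, len(channels))  # -> 3 1
```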


> > What do you think about these points ?
> 
> I think they are valid. There is no perfect solution ... and probably
> not even a good one. But I think we should reconsider whether the
> current approach of violating all layers and add hacks everywhere is the
> best solution.

Then I hit upon another approach to solve the issue without adding
hook points into Mercurial core (like "automatedcommit" in this series).

I'll send the revised series soon!


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
--- a/hgext/largefiles/overrides.py
+++ b/hgext/largefiles/overrides.py
@@ -723,7 +723,7 @@ 
         source = 'default'
     repo.lfpullsource = source
     if opts.get('rebase', False):
-        repo._isrebasing = True
+        repo._lfautocommit = 'rebase'
         try:
             if opts.get('update'):
                 del opts['update']
@@ -742,7 +742,7 @@ 
             if revspostpull > revsprepull:
                 result = result or rebase.rebase(ui, repo)
         finally:
-            repo._isrebasing = False
+            repo._lfautocommit = False
     else:
         result = orig(ui, repo, source, **opts)
     revspostpull = len(repo)
@@ -818,11 +818,11 @@ 
     return result
 
 def overriderebase(orig, ui, repo, **opts):
-    repo._isrebasing = True
+    repo._lfautocommit = 'rebase'
     try:
         return orig(ui, repo, **opts)
     finally:
-        repo._isrebasing = False
+        repo._lfautocommit = False
 
 def overridearchive(orig, repo, dest, node, kind, decode=True, matchfn=None,
             prefix=None, mtime=None, subrepos=None):
@@ -1199,14 +1199,14 @@ 
 def overridetransplant(orig, ui, repo, *revs, **opts):
     try:
         oldstandins = lfutil.getstandinsstate(repo)
-        repo._istransplanting = True
+        repo._lfautocommit = 'transplant'
         result = orig(ui, repo, *revs, **opts)
         newstandins = lfutil.getstandinsstate(repo)
         filelist = lfutil.getlfilestoupdate(oldstandins, newstandins)
         lfcommands.updatelfiles(repo.ui, repo, filelist=filelist,
                                 printmessage=True)
     finally:
-        repo._istransplanting = False
+        repo._lfautocommit = False
     return result
 
 def overridecat(orig, ui, repo, file1, *pats, **opts):
@@ -1309,8 +1309,7 @@ 
             filelist = lfutil.getlfilestoupdate(oldstandins, newstandins)
 
         # suppress status message while automated committing
-        printmessage = not (getattr(repo, "_isrebasing", False) or
-                            getattr(repo, "_istransplanting", False))
+        printmessage = not getattr(repo, "_lfautocommit", False)
         lfcommands.updatelfiles(repo.ui, repo, filelist=filelist,
                                 printmessage=printmessage,
                                 normallookup=partial)
diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py
--- a/hgext/largefiles/reposetup.py
+++ b/hgext/largefiles/reposetup.py
@@ -281,8 +281,7 @@ 
                # (2) aborting when standins are matched by "match",
                 #     because automated committing may specify them directly
                 #
-                if getattr(self, "_isrebasing", False) or \
-                        getattr(self, "_istransplanting", False):
+                if getattr(self, "_lfautocommit", False):
                     result = orig(text=text, user=user, date=date, match=match,
                                     force=force, editor=editor, extra=extra)