| Submitter | Katsunori FUJIWARA |
| --- | --- |
| Date | Sept. 9, 2014, 6:18 p.m. |
| Message ID | <a0d96709737b2fb101d2.1410286725@feefifofum> |
| Permalink | /patch/5747/ |
| State | Changes Requested |
Comments
Hi

I like that this patch series improves largefiles to handle more corner cases correctly. Nice work!

I do not much like the increasing amount of hacks we have to use. It is not as if I feel we finally got it right and can now start cleaning up and simplifying the code. Instead, in order to take the next steps, this series has to introduce more complexity and more callbacks in core Mercurial.

I wonder if we should try a different approach. As you point out, it is almost impossible to do the right thing in the internal commit functions - they don't have enough context. One alternative could be to hook in at the very low level, updating the standin files whenever they are read and propagating their changes whenever they are written. Another approach would be to do all synchronization before/after calling into core Mercurial and let the core treat standin files pretty much like any other file. I have played around with that approach and posted a proof of concept.

/Mads
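The "synchronize before/after" idea above can be sketched roughly as follows. This is a toy model, not Mercurial code: all names (`Repo`, `with_standin_sync`, `update_standins`, ...) are hypothetical, and a standin is modeled as just a hash of the large file's content.

```python
import hashlib

class Repo:
    def __init__(self):
        self.largefiles = {}   # path -> content of the real large file
        self.standins = {}     # path -> hash recorded in the standin

    def update_standins(self):
        # "before": make every standin reflect the current largefile content
        for path, data in self.largefiles.items():
            self.standins[path] = hashlib.sha1(data).hexdigest()

    def changed_standins(self, before):
        # "after": standins the core operation rewrote identify the
        # largefiles that must now be refreshed in the working directory
        return [p for p, h in self.standins.items() if before.get(p) != h]

def with_standin_sync(repo, core_operation):
    """Sync standins around a core operation that only sees standins."""
    repo.update_standins()
    snapshot = dict(repo.standins)
    core_operation(repo)
    return repo.changed_standins(snapshot)

repo = Repo()
repo.largefiles["big.bin"] = b"v1"

def core_op(r):
    # the core rewrites a standin, e.g. while updating to another revision
    r.standins["big.bin"] = hashlib.sha1(b"v2").hexdigest()

changed = with_standin_sync(repo, core_op)
# changed == ["big.bin"]: that largefile needs refreshing afterwards
```

The point of the sketch is that core code never touches `largefiles` directly; it only reads and writes standins, and the wrapper reconciles the two sides once per command.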
Mads Kiilerich writes:

> I wonder if we should try a different approach. As you point out, it is
> almost impossible to do the right thing in the internal commit functions
> - they don't have enough context. One alternative could be to hook in at
> the very low level, updating the standin files whenever they are read
> and propagate their change whenever they are written. Another approach
> would be to do all synchronization before/after calling in to core
> Mercurial and let the core pretty much treat standin files like any
> other file. I have played around with that approach and posted a proof
> of concept.

Just a shot in the dark here, but what about using a custom context object (that will soon grow a commit method)?
On 09/22/2014 07:02 PM, Sean Farley wrote:
> Mads Kiilerich writes:
>
>> I wonder if we should try a different approach. As you point out, it is
>> almost impossible to do the right thing in the internal commit functions
>> - they don't have enough context. One alternative could be to hook in at
>> the very low level, updating the standin files whenever they are read
>> and propagate their change whenever they are written. Another approach
>> would be to do all synchronization before/after calling in to core
>> Mercurial and let the core pretty much treat standin files like any
>> other file. I have played around with that approach and posted a proof
>> of concept.
> Just a shot in the dark here, but what about using a custom context
> object (that will soon grow a commit method)?

I doubt that would help.

The problem (or a part of it) is that commands like rebase, transplant and histedit do several commits and updates - and that it is quite a bit blurry when they should propagate largefiles to standins or the other way around. I guess it would require quite a bit of largefile awareness in creation/initialization of these context objects to handle that "correctly".

/Mads
Mads Kiilerich writes:

> On 09/22/2014 07:02 PM, Sean Farley wrote:
>> Mads Kiilerich writes:
>>
>>> I wonder if we should try a different approach. As you point out, it is
>>> almost impossible to do the right thing in the internal commit functions
>>> - they don't have enough context. One alternative could be to hook in at
>>> the very low level, updating the standin files whenever they are read
>>> and propagate their change whenever they are written. Another approach
>>> would be to do all synchronization before/after calling in to core
>>> Mercurial and let the core pretty much treat standin files like any
>>> other file. I have played around with that approach and posted a proof
>>> of concept.
>> Just a shot in the dark here, but what about using a custom context
>> object (that will soon grow a commit method)?
>
> I doubt that would help.
>
> The problem (or a part of it) is that commands like rebase, transplant
> and histedit do several commits and updates - and that it is quite a bit
> blurry when they should propagate largefiles to standins or the other
> way around. I guess it would require quite a bit of largefile awareness
> in creation/initialization of these context objects to handle that
> "correctly".

Hopefully in the future, those commands will just take arbitrary context objects (I am the bottleneck in that, though).
On 09/22/2014 02:14 AM, Mads Kiilerich wrote:
> Hi
>
> I like that this patch series improve largefiles to handle more corner
> cases correctly. Nice work!

So what is the plan for this series? Any chance it gets accepted as-is, or should I drop it from patchwork?
On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:
> Hi
>
> I like that this patch series improve largefiles to handle more corner
> cases correctly. Nice work!
>
> I do not so much like increasing amount of hacks we have to use. It is
> not like I feel we finally got it right and now can start cleaning up
> and simplifying the code. Instead, in order to take the next steps, this
> series has to introduce more complexity and more callbacks in core
> Mercurial.
>
> I wonder if we should try a different approach. As you point out, it is
> almost impossible to do the right thing in the internal commit functions
> - they don't have enough context. One alternative could be to hook in at
> the very low level, updating the standin files whenever they are read
> and propagate their change whenever they are written. Another approach
> would be to do all synchronization before/after calling in to core
> Mercurial and let the core pretty much treat standin files like any
> other file. I have played around with that approach and posted a proof
> of concept.
>
> /Mads

Thank you for your comments, and sorry for the late response.

Your approach, posted as a PoC, looks simple and good!

On the other hand, I worry about the following points:

- performance impact on large-scale repositories

  Updating ALL standins after the original command logic may have a serious performance impact on large-scale repositories, because it may imply file I/O on many standins.

  In many cases the updated standins can be known exactly, and deep hooking can avoid this performance impact (even though the revert, update and merge cases are not yet optimized :-P):

  - import => scmutil.marktouched with filelist
  - revert => cmdutil._performrevert with actions
  - update/merge => merge.applyupdates with actions

- visibility of largefiles for hooks

  The hooks below can't see the appropriate largefiles in the working directory, because only standins are updated during command execution:

  - precommit, pretxncommit and commit hooks for automated committing
  - update hooks for "hg update", "hg merge" and automated committing implying a working directory update/merge

  Even though these hooks can't see the appropriate largefiles now either :-), deep hooking can fix this problem.

What do you think about these points?
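The performance point above can be made concrete with a toy comparison (hypothetical names; real Mercurial I/O costs are of course more subtle): a post-command pass must touch every standin, while a deep hook already has the exact file list.

```python
def sync_all(standins, sync_one):
    # PoC style: touch every standin after the command,
    # so the cost grows with repository size, not with the change
    for path in standins:
        sync_one(path)
    return len(standins)

def sync_touched(touched, sync_one):
    # deep-hook style: marktouched/_performrevert/applyupdates
    # hand over exactly the files the command modified
    for path in touched:
        sync_one(path)
    return len(touched)

io_ops = []
all_standins = ["file-%d" % i for i in range(10000)]
full_cost = sync_all(all_standins, io_ops.append)
hook_cost = sync_touched(["file-1", "file-2"], io_ops.append)
# full_cost is 10000 even for a two-file change; hook_cost is 2
```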
On Tue, 23 Sep 2014 17:00:54 -0700, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> On 09/22/2014 02:14 AM, Mads Kiilerich wrote:
>> Hi
>>
>> I like that this patch series improve largefiles to handle more corner
>> cases correctly. Nice work!
>
> So what is the plan for this series, any chance it get accepted as is or
> should I drop it from patchwork?

I withdraw this series; please drop it from patchwork. Even if my approach is accepted, I'll post a revised series including details about the conclusion of this discussion and so on.
On 09/24/2014 03:11 PM, FUJIWARA Katsunori wrote:
> On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:
>> Hi
>>
>> I like that this patch series improve largefiles to handle more corner
>> cases correctly. Nice work!
>>
>> I do not so much like increasing amount of hacks we have to use. It is
>> not like I feel we finally got it right and now can start cleaning up
>> and simplifying the code. Instead, in order to take the next steps, this
>> series has to introduce more complexity and more callbacks in core
>> Mercurial.
>>
>> I wonder if we should try a different approach. As you point out, it is
>> almost impossible to do the right thing in the internal commit functions
>> - they don't have enough context. One alternative could be to hook in at
>> the very low level, updating the standin files whenever they are read
>> and propagate their change whenever they are written. Another approach
>> would be to do all synchronization before/after calling in to core
>> Mercurial and let the core pretty much treat standin files like any
>> other file. I have played around with that approach and posted a proof
>> of concept.
>>
>> /Mads
> Thank you for your comments, and sorry for late response.
>
> Your approach posted as PoC looks simple and good !
>
> But on the other hand, I'm also worry about below for it.
>
> - performance impact on large scale repositories
>
>   Examination of updating for ALL standins after original command
>   logic may have serious performance impact on large scale
>   repositories, because it may imply file I/O on many standins.

In the PoC I reuse the existing simple method of reading all standins. That could be optimized. We could rely on the dirstates to see which files it is relevant to process.

> In many cases, updated standins can be known exactly, and deep
> hooking can avoid such performance impact (even though revert,
> update and merge cases are not yet optimized :-P)
>
> - import => scmutil.marktouched with filelist
> - revert => cmdutil._performrevert with actions
> - update/merge => merge.applyupdates with actions

Yes, it is impressive that these methods work. But they are also complex and fragile and hard to maintain.

> - visibility of largefiles for hooks
>
>   Hooks below can't see appropriate largefiles in the working
>   directory, because only standins are updated while command
>   execution.

Yes, I think that is acceptable if that is what it takes to give a simple model that works consistently everywhere.

> Even though these hooks can't see appropriate largefiles now too :-),
> deep hooking can fix this problem.

If we really want deep hooking then I would consider experimenting with hooking into the vfs layer, updating standin files every time they are read or walked and are out of date, and propagating their value every time they are written.

> What do you think about these points ?

I think they are valid. There is no perfect solution ... and probably not even a good one. But I think we should reconsider whether the current approach of violating all layers and adding hacks everywhere is the best solution.

/Mads
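The vfs-layer hooking Mads floats here can be sketched as a thin wrapper around file access (a toy model with hypothetical names, not Mercurial's actual vfs): refresh a stale standin on read, and record a propagation to do whenever a standin is written.

```python
import hashlib

def standin_hash(data):
    # a standin is modeled as just the hash of the largefile's content
    return hashlib.sha1(data).hexdigest()

class StandinVfs:
    """Toy vfs wrapper that keeps standins and largefiles in sync."""

    def __init__(self, largefiles):
        self.largefiles = largefiles  # path -> largefile content
        self.files = {}               # path -> stored standin content
        self.propagated = []          # standin writes we must act on

    def read(self, path):
        # on read: refresh the standin if it is out of date
        current = standin_hash(self.largefiles[path])
        if self.files.get(path) != current:
            self.files[path] = current
        return self.files[path]

    def write(self, path, data):
        # on write: remember that this largefile must be fetched/updated
        self.files[path] = data
        self.propagated.append(path)

vfs = StandinVfs({"big.bin": b"v1"})
first_read = vfs.read("big.bin")        # standin refreshed on read
vfs.write("big.bin", standin_hash(b"v2"))  # write triggers propagation
```

With this shape, core code that reads or writes through the vfs never needs largefile awareness; the cost is that every standin write becomes a potential largefile operation, which is exactly the concern raised later in the thread.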
On Thu, 2014-09-25 at 00:43 +0200, Mads Kiilerich wrote:
> On 09/24/2014 03:11 PM, FUJIWARA Katsunori wrote:
> > On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:
> >> Hi
> >>
> >> I like that this patch series improve largefiles to handle more corner
> >> cases correctly. Nice work!
> >>
> >> I do not so much like increasing amount of hacks we have to use. It is
> >> not like I feel we finally got it right and now can start cleaning up
> >> and simplifying the code. Instead, in order to take the next steps, this
> >> series has to introduce more complexity and more callbacks in core
> >> Mercurial.
> >>
> >> I wonder if we should try a different approach. As you point out, it is
> >> almost impossible to do the right thing in the internal commit functions
> >> - they don't have enough context. One alternative could be to hook in at
> >> the very low level, updating the standin files whenever they are read
> >> and propagate their change whenever they are written. Another approach
> >> would be to do all synchronization before/after calling in to core
> >> Mercurial and let the core pretty much treat standin files like any
> >> other file. I have played around with that approach and posted a proof
> >> of concept.
> >>
> >> /Mads
> > Thank you for your comments, and sorry for late response.
> >
> > Your approach posted as PoC looks simple and good !
> >
> > But on the other hand, I'm also worry about below for it.
> >
> > - performance impact on large scale repositories
> >
> >   Examination of updating for ALL standins after original command
> >   logic may have serious performance impact on large scale
> >   repositories, because it may imply file I/O on many standins.
>
> In the PoC I reuse the existing simple method of reading all standins.
> That could be optimized. We could rely on the dirstates to see which
> files it is relevant to process.
>
> > In many cases, updated standins can be known exactly, and deep
> > hooking can avoid such performance impact (even though revert,
> > update and merge cases are not yet optimized :-P)
> >
> > - import => scmutil.marktouched with filelist
> > - revert => cmdutil._performrevert with actions
> > - update/merge => merge.applyupdates with actions
>
> Yes, it is impressive that these methods work. But they are also complex
> and fragile and hard to maintain.
>
> > - visibility of largefiles for hooks
> >
> >   Hooks below can't see appropriate largefiles in the working
> >   directory, because only standins are updated while command
> >   execution.
>
> Yes, I think that is acceptable if that is what it takes to give a
> simple model that works consistently everywhere.
>
> > Even though these hooks can't see appropriate largefiles now too :-),
> > deep hooking can fix this problem.
>
> If we really want deep hooking then I would consider experimenting with
> hooking in to the vfs layer, updating standin files every time they are
> read or walked and are out of date, and propagating their value every
> time they are written.
>
> > What do you think about these points ?
>
> I think they are valid. There is no perfect solution ... and probably
> not even a good one. But I think we should reconsider whether the
> current approach of violating all layers and add hacks everywhere is the
> best solution.

I think if we know how to get largefiles more correct (for instance, by applying this patch series), then we should do that. While this might increase the complexity, it also puts us in a better place for measuring regressions when refactoring, and also delivers immediate benefits to users.
On 09/30/2014 12:51 AM, Matt Mackall wrote:
> On Thu, 2014-09-25 at 00:43 +0200, Mads Kiilerich wrote:
>> I think they are valid. There is no perfect solution ... and probably
>> not even a good one. But I think we should reconsider whether the
>> current approach of violating all layers and add hacks everywhere is the
>> best solution.
> I think if we know how to get largefiles more correct (for instance, by
> applying this patch series), then we should do that. While this might
> increase the complexity, it also puts us in a better place for measuring
> regressions when refactoring.. and also delivers immediate benefits to
> users.

Fine with me. I would love to get the benefits too.

/Mads
On Tue, 2014-09-30 at 01:06 +0200, Mads Kiilerich wrote:
> On 09/30/2014 12:51 AM, Matt Mackall wrote:
> > On Thu, 2014-09-25 at 00:43 +0200, Mads Kiilerich wrote:
> >> I think they are valid. There is no perfect solution ... and probably
> >> not even a good one. But I think we should reconsider whether the
> >> current approach of violating all layers and add hacks everywhere is the
> >> best solution.
> > I think if we know how to get largefiles more correct (for instance, by
> > applying this patch series), then we should do that. While this might
> > increase the complexity, it also puts us in a better place for measuring
> > regressions when refactoring.. and also delivers immediate benefits to
> > users.
>
> Fine with me. I would love to get the benefits too.

I guess we're waiting for a resend in any case.
At Thu, 25 Sep 2014 00:43:23 +0200, Mads Kiilerich wrote:
>
> On 09/24/2014 03:11 PM, FUJIWARA Katsunori wrote:
> > On Mon, 22 Sep 2014 11:14:56 +0200, Mads Kiilerich <mads@kiilerich.com> wrote:
> > Even though these hooks can't see appropriate largefiles now too :-),
> > deep hooking can fix this problem.
>
> If we really want deep hooking then I would consider experimenting with
> hooking in to the vfs layer, updating standin files every time they are
> read or walked and are out of date, and propagating their value every
> time they are written.

"vfs layer hooking" looked good, and I thought about its feasibility. But I finally concluded that it causes a performance problem when getting largefiles from the remote.

For example, in the case of "hg update":

- now: largefiles are all fetched from the remote together by "updatelfiles" at the end of "merge.update". In this case, the network channel is opened only once per "hg update".

- after "vfs layer hooking": each largefile is fetched separately at the end of "vfs.write" for its standin. In this case, a network channel is opened for each largefile during "hg update" (or some kind of batched fetching of largefiles during "hg update" would be needed).

> > What do you think about these points ?
>
> I think they are valid. There is no perfect solution ... and probably
> not even a good one. But I think we should reconsider whether the
> current approach of violating all layers and add hacks everywhere is the
> best solution.

Then, I've hit upon another approach that solves the issue without adding hook points into Mercurial core (like "automatedcommit" in this series). I'll send the revised series soon!

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
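The network-channel argument above can be illustrated with a toy model (all names here, `Remote`, `fetch`, etc., are hypothetical): fetching on every standin write opens one connection per largefile, while deferring to a single updatelfiles-style call at the end of the command opens one connection total.

```python
class Remote:
    """Toy remote store; counting calls stands in for opening channels."""

    def __init__(self):
        self.channels_opened = 0

    def fetch(self, hashes):
        self.channels_opened += 1   # one connection per fetch() call
        return {h: b"largefile-data" for h in hashes}

def per_write_fetch(remote, written_standins):
    # vfs-hook style: fetch each largefile as its standin is written
    for h in written_standins:
        remote.fetch([h])

def batched_fetch(remote, written_standins):
    # current "hg update" style: one fetch for the whole command
    remote.fetch(written_standins)

eager, batched = Remote(), Remote()
per_write_fetch(eager, ["h1", "h2", "h3"])
batched_fetch(batched, ["h1", "h2", "h3"])
# eager opens 3 channels; the batched variant opens only 1
```

This is why the thread concludes that pure vfs hooking would need some form of batching before it could match the existing end-of-command fetch.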
Patch
diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
--- a/hgext/largefiles/overrides.py
+++ b/hgext/largefiles/overrides.py
@@ -723,7 +723,7 @@
         source = 'default'
     repo.lfpullsource = source
     if opts.get('rebase', False):
-        repo._isrebasing = True
+        repo._lfautocommit = 'rebase'
         try:
             if opts.get('update'):
                 del opts['update']
@@ -742,7 +742,7 @@
             if revspostpull > revsprepull:
                 result = result or rebase.rebase(ui, repo)
         finally:
-            repo._isrebasing = False
+            repo._lfautocommit = False
     else:
         result = orig(ui, repo, source, **opts)
         revspostpull = len(repo)
@@ -818,11 +818,11 @@
     return result

 def overriderebase(orig, ui, repo, **opts):
-    repo._isrebasing = True
+    repo._lfautocommit = 'rebase'
     try:
         return orig(ui, repo, **opts)
     finally:
-        repo._isrebasing = False
+        repo._lfautocommit = False

 def overridearchive(orig, repo, dest, node, kind, decode=True,
             matchfn=None, prefix=None, mtime=None, subrepos=None):
@@ -1199,14 +1199,14 @@
 def overridetransplant(orig, ui, repo, *revs, **opts):
     try:
         oldstandins = lfutil.getstandinsstate(repo)
-        repo._istransplanting = True
+        repo._lfautocommit = 'transplant'
         result = orig(ui, repo, *revs, **opts)
         newstandins = lfutil.getstandinsstate(repo)
         filelist = lfutil.getlfilestoupdate(oldstandins, newstandins)
         lfcommands.updatelfiles(repo.ui, repo, filelist=filelist,
                                 printmessage=True)
     finally:
-        repo._istransplanting = False
+        repo._lfautocommit = False
     return result

 def overridecat(orig, ui, repo, file1, *pats, **opts):
@@ -1309,8 +1309,7 @@
     filelist = lfutil.getlfilestoupdate(oldstandins, newstandins)

     # suppress status message while automated committing
-    printmessage = not (getattr(repo, "_isrebasing", False) or
-                        getattr(repo, "_istransplanting", False))
+    printmessage = not getattr(repo, "_lfautocommit", False)
     lfcommands.updatelfiles(repo.ui, repo, filelist=filelist,
                             printmessage=printmessage, normallookup=partial)
diff --git a/hgext/largefiles/reposetup.py b/hgext/largefiles/reposetup.py
--- a/hgext/largefiles/reposetup.py
+++ b/hgext/largefiles/reposetup.py
@@ -281,8 +281,7 @@
                 # (2) aborting when stadnins are matched by "match",
                 #     because automated committing may specify them directly
                 #
-                if getattr(self, "_isrebasing", False) or \
-                        getattr(self, "_istransplanting", False):
+                if getattr(self, "_lfautocommit", False):
                     result = orig(text=text, user=user, date=date,
                                   match=match, force=force, editor=editor,
                                   extra=extra)
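The refactoring in the patch above collapses the two booleans `_isrebasing` and `_istransplanting` into a single `_lfautocommit` attribute that holds the command name (or False), queried with `getattr` so code works even on repos where largefiles never set it. A minimal standalone illustration of that pattern (the `Repo` class here is a stand-in, not Mercurial's):

```python
class Repo:
    """Stand-in for a repository object; largefiles sets attributes on it."""

def printmessage_for(repo):
    # mirrors the patched logic: suppress status messages during any
    # automated commit, whatever command drives it
    return not getattr(repo, "_lfautocommit", False)

repo = Repo()
verbose_default = printmessage_for(repo)      # attribute never set -> True

repo._lfautocommit = 'rebase'                 # set by overriderebase
verbose_during = printmessage_for(repo)       # suppressed -> False

repo._lfautocommit = False                    # cleared in the finally block
verbose_after = printmessage_for(repo)        # back to True
```

One attribute instead of two also means future automated-commit commands (histedit, etc.) only need to set `_lfautocommit` rather than teaching every `getattr` site about a new flag.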