Patchwork [stable?] largefiles: don't cache largefiles for all new pulled branchheads

Submitter Mads Kiilerich
Date Feb. 26, 2013, 3:21 a.m.
Message ID <edce3a16a9ececeeb4cc.1361848910@mk-desktop>
Permalink /patch/1057/
State Superseded, archived

Comments

Mads Kiilerich - Feb. 26, 2013, 3:21 a.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1361848730 -3600
# Branch stable
# Node ID edce3a16a9ececeeb4cc212d50dcb1cc81bf4dc6
# Parent  89fd28cd4c011a974aa683a28e7302008f01e81e
largefiles: don't cache largefiles for all new pulled branchheads

Fetching largefiles for all branchheads doesn't scale to setups with several
branches and a huge volume of changed largefiles.

The only case where it is relevant to fetch at pull time is when pulling from a
repository with only temporary access ... but in that case it is essential that
--all-largefiles is used anyway. If fetching largefiles for all revisions is
for some reason too much, then it is very likely that fetching largefiles for
all branchheads is too much too, and a better solution would be to use 'pull
-u'.

In all other cases the largefiles will be fetched on demand anyway.

And with or without this automatic caching of largefiles there will be cases
where a workaround must be used anyway - for instance setting a default pull
path with '--config paths.default=...'.

This is thus a slight change of behavior, but only in the strategy and
heuristics for when it is most appropriate to fetch largefiles. There is no
fundamental change.
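The before/after behavior of the pull override can be illustrated with a small
sketch in plain Python. This is not Mercurial's actual API; the sets of head
hashes and the `cached_heads_after_pull` helper merely stand in for what
`lfutil.getcurrentheads(repo)` and the removed caching loop did:

```python
# Sketch of the behavior REMOVED by this patch: after a pull, the old code
# eagerly cached largefiles for every branch head that was new in the repo.
# Head hashes are modeled as plain strings; sets stand in for
# lfutil.getcurrentheads(repo) before and after the pull.

def cached_heads_after_pull(oldheads, heads):
    """Return the heads whose largefiles the OLD code would cache eagerly."""
    return set(heads) - set(oldheads)

# Before the pull the repo had two heads; the pull brings in three more
# (e.g. merged feature branches the user never intends to update to).
oldheads = {"aaa", "bbb"}
heads = {"aaa", "bbb", "ccc", "ddd", "eee"}

# Old behavior: largefiles for all three new heads were downloaded up front.
assert cached_heads_after_pull(oldheads, heads) == {"ccc", "ddd", "eee"}

# New behavior (this patch): nothing is cached at pull time; largefiles are
# fetched on demand when a head is actually updated to, merged, or rebased.
```

With the patch applied, the equivalent of the set difference above is simply
never computed during pull.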
Mads Kiilerich - Feb. 26, 2013, 3:23 a.m.
On 02/26/2013 04:21 AM, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1361848730 -3600
> # Branch stable
> # Node ID edce3a16a9ececeeb4cc212d50dcb1cc81bf4dc6
> # Parent  89fd28cd4c011a974aa683a28e7302008f01e81e
> largefiles: don't cache largefiles for all new pulled branchheads

Na'Tosha's "largefiles: don't cache largefiles for pulled heads by 
default" is now in http://selenic.com/hg/rev/d69585a5c5c0 . The same 
issue is, however, becoming more of a problem for us.

I don't think it has been stated sufficiently explicitly what is going 
on. When 2.5.1 pulls from a repository with several named branches, it 
will retrieve largefiles for all the branch heads. The more named 
branches a repository gets, and the more activity there is on otherwise 
independent branches, the bigger a problem it becomes that largefiles 
are fetched for all of them - no matter whether they were pulled 
explicitly or came along because they had been merged into something you 
pulled. The user has to download more and more data that he never asked 
for and will never use. That goes directly against the idea of 
largefiles.
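A back-of-the-envelope calculation shows why this grows badly. The numbers
below are purely hypothetical, just to illustrate the shape of the cost:

```python
# Hypothetical figures: a repo with many active named branches, each head
# touching a handful of largefiles since the last pull.
heads = 20                  # new branch heads brought in by one pull
changed_lfiles_per_head = 5
avg_size_mb = 50

# Old behavior: largefiles for ALL new heads are downloaded at pull time,
# so the cost scales with the number of heads.
eager_download_mb = heads * changed_lfiles_per_head * avg_size_mb
assert eager_download_mb == 5000    # 5 GB, most of it likely never used

# On-demand behavior: the user only pays for the head actually updated to.
on_demand_mb = 1 * changed_lfiles_per_head * avg_size_mb
assert on_demand_mb == 250
```

The eager cost scales with the number of active branch heads, while the
on-demand cost stays proportional to what the user actually checks out.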

It is "only" a performance/resource issue, not really a regression, and 
fixing it is a bit of a change of behaviour. But for us it has become an 
issue that we would like to see solved in stable.

And to get back to d69585a5c5c0: I don't think there is any reason to 
introduce a command-line option for the old behaviour. I would prefer to 
remove it and the supporting code again, as this patch shows.

Opinions?

/Mads


> [full commit message and patch quoted; see the Patch section below]

Patch

diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
--- a/hgext/largefiles/overrides.py
+++ b/hgext/largefiles/overrides.py
@@ -722,20 +722,7 @@ 
         if not source:
             source = 'default'
         repo.lfpullsource = source
-        oldheads = lfutil.getcurrentheads(repo)
         result = orig(ui, repo, source, **opts)
-        # If we do not have the new largefiles for any new heads we pulled, we
-        # will run into a problem later if we try to merge or rebase with one of
-        # these heads, so cache the largefiles now directly into the system
-        # cache.
-        ui.status(_("caching new largefiles\n"))
-        numcached = 0
-        heads = lfutil.getcurrentheads(repo)
-        newheads = set(heads).difference(set(oldheads))
-        for head in newheads:
-            (cached, missing) = lfcommands.cachelfiles(ui, repo, head)
-            numcached += len(cached)
-        ui.status(_("%d largefiles cached\n") % numcached)
     if opts.get('all_largefiles'):
         revspostpull = len(repo)
         revs = []
diff --git a/tests/test-largefiles-cache.t b/tests/test-largefiles-cache.t
--- a/tests/test-largefiles-cache.t
+++ b/tests/test-largefiles-cache.t
@@ -40,8 +40,6 @@ 
   adding file changes
   added 2 changesets with 1 changes to 1 files
   (run 'hg update' to get a working copy)
-  caching new largefiles
-  0 largefiles cached
 
 Update working directory to "tip", which requires largefile("large"),
 but there is no cache file for it.  So, hg must treat it as
@@ -83,8 +81,6 @@ 
   adding file changes
   added 1 changesets with 1 changes to 1 files
   (run 'hg update' to get a working copy)
-  caching new largefiles
-  1 largefiles cached
 
 #if unix-permissions
 
diff --git a/tests/test-largefiles.t b/tests/test-largefiles.t
--- a/tests/test-largefiles.t
+++ b/tests/test-largefiles.t
@@ -883,9 +883,7 @@ 
   adding file changes
   added 6 changesets with 16 changes to 8 files
   (run 'hg update' to get a working copy)
-  caching new largefiles
-  3 largefiles cached
-  3 additional largefiles cached
+  6 additional largefiles cached
   $ cd ..
 
 Rebasing between two repositories does not revert largefiles to old
@@ -969,8 +967,6 @@ 
   adding file changes
   added 1 changesets with 2 changes to 2 files (+1 heads)
   (run 'hg heads' to see heads, 'hg merge' to merge)
-  caching new largefiles
-  0 largefiles cached
   $ hg rebase
   Invoking status precommit hook
   M sub/normal4
@@ -1323,8 +1319,6 @@ 
   pulling from $TESTTMP/d (glob)
   searching for changes
   no changes found
-  caching new largefiles
-  0 largefiles cached
   0 additional largefiles cached
 
 Merging does not revert to old versions of largefiles and also check
@@ -1362,8 +1356,6 @@ 
   adding file changes
   added 2 changesets with 4 changes to 4 files (+1 heads)
   (run 'hg heads' to see heads, 'hg merge' to merge)
-  caching new largefiles
-  2 largefiles cached
   $ hg merge
   merging sub/large4
   largefile sub/large4 has a merge conflict
@@ -1371,6 +1363,20 @@ 
   3 files updated, 1 files merged, 0 files removed, 0 files unresolved
   (branch merge, don't forget to commit)
   getting changed largefiles
+  error getting id b9ac37c6767a5dbc01e43afa32957d8d789be72a from url file:$TESTTMP/temp for file sub2/large6: can't get file locally (glob)
+  0 largefiles updated, 0 removed
+  $ hg st
+  M normal3
+  M sub/normal4
+  ! sub2/large6
+  $ hg up -Cqr.
+  $ hg merge --config paths.default=../e
+  merging sub/large4
+  largefile sub/large4 has a merge conflict
+  keep (l)ocal or take (o)ther? l
+  3 files updated, 1 files merged, 0 files removed, 0 files unresolved
+  (branch merge, don't forget to commit)
+  getting changed largefiles
   1 largefiles updated, 0 removed
   $ hg commit -m "Merge repos e and f"
   Invoking status precommit hook