Patchwork [2,of,2] largefiles: don't rehash largefiles in updatelfiles if standin hash changed

Submitter Mads Kiilerich
Date Jan. 16, 2015, 6:51 p.m.
Message ID <a6e406d51bf21152025f.1421434305@ssl.google-analytics.com>
Permalink /patch/7491/
State Accepted
Commit f21a0d6d6efddfcb286d094113b2d82e8fe05d92

Comments

Mads Kiilerich - Jan. 16, 2015, 6:51 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1420827009 -3600
#      Fri Jan 09 19:10:09 2015 +0100
# Node ID a6e406d51bf21152025f185cdb79d2ed8c871774
# Parent  dead34ad89f94629703d2595e2df6e0e351c5cc4
largefiles: don't rehash largefiles in updatelfiles if standin hash changed

Standins are read before and after an update/merge, and all the standins that
change are handed to updatelfiles for getting their corresponding largefiles
updated. updatelfiles would then hash the largefile and see if it already
matched the new expected hash. If so, it would skip the update. If different,
the largefile would be updated.
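
As an illustration of the "read before and after, hand over what changed" step,
here is a minimal standalone sketch. The helper name changed_standins and the
dict-based snapshots are assumptions made for this example; this is not the
actual lfutil.getlfilestoupdate code.

```python
def changed_standins(old, new):
    """Return the largefile names whose standin hash differs.

    ``old`` and ``new`` are hypothetical snapshots mapping a largefile
    name to the hash recorded in its standin, taken before and after
    the update. Names that appear in only one snapshot count as changed.
    """
    names = set(old) | set(new)
    return sorted(n for n in names if old.get(n) != new.get(n))


if __name__ == "__main__":
    before = {"big.bin": "aaaa", "video.mp4": "bbbb"}
    after = {"big.bin": "aaaa", "video.mp4": "cccc", "new.iso": "dddd"}
    # Only the entries returned here would be passed on for updating.
    print(changed_standins(before, after))
```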

It would happen very rarely that the largefile happened to match the new hash
(and thus not the old one) and the hashing would thus be pointless ... and
hashing is not cheap.

Instead, when it is known that the standin hash changed (from an update), just
update the largefile unconditionally. If the largefile was "unsure" before the
update, it was hashed at that point, so we know there is nothing to preserve.
(Also, the hashing in updatelfiles was not used to preserve changes, but only
to be lazy about updating the largefile, so nothing is lost by not doing this
extra hashing.)
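
The decision described above can be sketched as a small standalone function.
This is a simplified model, not the code in lfcommands.py: needs_update,
hash_file and the hasher parameter are names invented for this example, and
SHA-1 is assumed as the hash. Only the checked flag and the order of the
tests mirror the patch.

```python
import hashlib
import os


def hash_file(path):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(128 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_update(path, expecthash, checked=False, hasher=hash_file):
    """Decide whether the largefile at ``path`` must be rewritten.

    ``checked`` means the caller already knows the standin hash changed,
    so the (expensive) hashing of the existing file is skipped.
    """
    if expecthash == "":
        return False          # nothing expected, nothing to fetch
    if checked:
        return True           # known to be stale: do not hash at all
    if not os.path.exists(path):
        return True           # missing file must be fetched
    return expecthash != hasher(path)  # lazy path: hash and compare
```

With checked=True the hasher is never called, which is where the time saving
comes from. The cost is that a largefile which already happened to match the
new hash is rewritten anyway.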

There might be rare situations where we now will update largefiles that didn't
have to be updated, but in all relevant cases (?) this will improve
performance.

Updates on a repo with some big largefiles have been seen to go from 9.19 s to
6.8 s - that is 26% less painful.
Matt Mackall - Jan. 17, 2015, 12:02 a.m.
On Fri, 2015-01-16 at 19:51 +0100, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1420827009 -3600
> #      Fri Jan 09 19:10:09 2015 +0100
> # Node ID a6e406d51bf21152025f185cdb79d2ed8c871774
> # Parent  dead34ad89f94629703d2595e2df6e0e351c5cc4
> largefiles: don't rehash largefiles in updatelfiles if standin hash changed

These are queued for default, thanks.

Patch

diff --git a/hgext/largefiles/lfcommands.py b/hgext/largefiles/lfcommands.py
--- a/hgext/largefiles/lfcommands.py
+++ b/hgext/largefiles/lfcommands.py
@@ -437,7 +437,7 @@  def downloadlfiles(ui, repo, rev=None):
     return totalsuccess, totalmissing
 
 def updatelfiles(ui, repo, filelist=None, printmessage=None,
-                 normallookup=False):
+                 normallookup=False, checked=False):
     '''Update largefiles according to standins in the working directory
 
     If ``printmessage`` is other than ``None``, it means "print (or
@@ -465,7 +465,8 @@  def updatelfiles(ui, repo, filelist=None
                     util.unlinkpath(absstandin + '.orig')
                 expecthash = lfutil.readstandin(repo, lfile)
                 if (expecthash != '' and
-                    (not os.path.exists(abslfile) or
+                    (checked or
+                     not os.path.exists(abslfile) or
                      expecthash != lfutil.hashfile(abslfile))):
                     if lfile not in repo[None]: # not switched to normal file
                         util.unlinkpath(abslfile, ignoremissing=True)
diff --git a/hgext/largefiles/overrides.py b/hgext/largefiles/overrides.py
--- a/hgext/largefiles/overrides.py
+++ b/hgext/largefiles/overrides.py
@@ -1324,7 +1324,7 @@  def mergeupdate(orig, repo, node, branch
             filelist = lfutil.getlfilestoupdate(oldstandins, newstandins)
 
         lfcommands.updatelfiles(repo.ui, repo, filelist=filelist,
-                                normallookup=partial)
+                                normallookup=partial, checked=linearmerge)
 
         return result
     finally: