Patchwork D11784: update: filter the ambiguous mtime in update directly

login
register
mail settings
Submitter phabricator
Date Nov. 24, 2021, 11:12 a.m.
Message ID <differential-rev-PHID-DREV-eod3rw6ejq5cylgeaeyr-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/50110/
State New
Headers show

Comments

phabricator - Nov. 24, 2021, 11:12 a.m.
marmoute created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  Right now, this filtering is done at `dirstate.write` time. However that
  filtering is done too late for most of the case, we are about to change that
  code to be more accurate. However `hg update` still need such filtering (mostly
  because it is actually quite racy, even with such filtering). So we explicitly
  implement it there.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D11784

AFFECTED FILES
  mercurial/merge.py

CHANGE DETAILS




To: marmoute, #hg-reviewers
Cc: mercurial-patches, mercurial-devel

Patch

diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -2169,6 +2169,64 @@ 
                 mresult.len((mergestatemod.ACTION_GET,)) if wantfiledata else 0
             )
             with repo.dirstate.parentchange():
+                ### Filter Filedata
+                #
+                # We gathered "cache" information for the clean file while
+                # updating them: mtime, size and mode.
+                #
+                # At the time this comment is written, they are various issues
+                # with how we gather this information (see the comment in
+                # `batchget`).
+                #
+                # We are going to smooth one of this issue here : mtime ambiguity.
+                #
+                # i.e. even if the mtime gathered during `batchget` was
+                # correct[1] a change happening right after it could change the
+                # content while keeping the same mtime[2].
+                #
+                # So, not we the update operation is finished, we will only
+                # keep the file data for files with mtime that are stricly in
+                # the past, i.e. whose mtime is strictly lower than the current
+                # time. This protect us from race condition from operation that
+                # could run right after this one, especially other Mercurial
+                # operation that could be waiting for the wlock to touch file
+                # content and the dirstate.
+                #
+                # In an ideal world, we could only get reliable information in
+                # `getfiledata` (from `getbatch`), however the current approach
+                # have been a successful compromise since many years.
+                #
+                # At the time this comment is written, not using any "cache"
+                # file data at all here would not be viable. As it would result is
+                # a very large amount of work (equivalement to the previous `hg
+                # update` during the next status after an update).
+                #
+                # [1] the current code cannot garantee that the mtime is
+                # correct, but the result is "okay in practice", so lets ignore
+                # that sub issue.
+                #
+                # [2] using nano-second precision can greatly help here because
+                # it makes the "different write with same mtime" issue
+                # virtually vanish. However, dirstate v1 cannot store such
+                # precision and a bunch of python runtime, operating system and
+                # filesystem does not provide use with such precision, so we
+                # have to operate as if it wasn't available.
+                if getfiledata:
+                    ambiguous_mtime = {}
+                    now = timestamp.get_fs_now(repo.vfs)
+                    if now is None:
+                        # we can't write to the FS, so we won't actually update
+                        # the dirstate content anyway, no need to put cache
+                        # information.
+                        getfiledata = None
+                    else:
+                        now_sec = now[0]
+                        for f, m in pycompat.iteritems(getfiledata):
+                            if m is not None and m[2][0] >= now_sec:
+                                ambiguous_mtime[f] = (m[0], m[1], None)
+                        for f, m in pycompat.iteritems(ambiguous_mtime):
+                            getfiledata[f] = m
+
                 repo.setparents(fp1, fp2)
                 mergestatemod.recordupdates(
                     repo, mresult.actionsdict, branchmerge, getfiledata