Patchwork [4,of,5] dirstate: make writing in-memory changes aware of transaction activity

login
register
mail settings
Submitter Katsunori FUJIWARA
Date Oct. 9, 2015, 6:59 p.m.
Message ID <fdbfcbe61c00c684c713.1444417168@feefifofum>
Download mbox | patch
Permalink /patch/10929/
State Superseded
Commit 09bb1ee7e73e057179964ae51c9217da4dc76e61
Delegated to: Pierre-Yves David
Headers show

Comments

Katsunori FUJIWARA - Oct. 9, 2015, 6:59 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1444416862 -32400
#      Sat Oct 10 03:54:22 2015 +0900
# Node ID fdbfcbe61c00c684c71335b3be483df5a20d526a
# Parent  0c77ac7b070e4109765ee2e7759c40de7a52b2fc
dirstate: make writing in-memory changes aware of transaction activity

This patch delays writing in-memory changes out, if transaction is
running.

'_getfsnow()' is defined as a function, to hook it easily for
ambiguous timestamp tests (see also fakedirstatewritetime.py)

'if tr:' code path in this patch is still disabled at this revision,
because there is no client invoking 'dirstate.write()' with repo
object.

BTW, this patch changes 'dirstate.invalidate()' semantics around
'dirstate.write()' in a transaction scope:

  before:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write() # change for A is written out here
        dirstate.CHANGE('B')
        dirstate.invalidate() # discards only change for B

  after:
    with repo.transaction():
        dirstate.CHANGE('A')
        dirstate.write() # change for A is still kept in memory
        dirstate.CHANGE('B')
        dirstate.invalidate() # discards changes for A and B

Fortunately, there is no code path expecting the former, at least, in
Mercurial itself, because 'dirstateguard' was introduced to remove
such 'dirstate.invalidate()'.
Yuya Nishihara - Oct. 11, 2015, 1:48 p.m.
On Sat, 10 Oct 2015 03:59:28 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1444416862 -32400
> #      Sat Oct 10 03:54:22 2015 +0900
> # Node ID fdbfcbe61c00c684c71335b3be483df5a20d526a
> # Parent  0c77ac7b070e4109765ee2e7759c40de7a52b2fc
> dirstate: make writing in-memory changes aware of transaction activity

> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -27,6 +27,15 @@
>      def join(self, obj, fname):
>          return obj._join(fname)
>  
> +def _getfsnow(vfs):
> +    '''Get "now" timestamp on filesystem'''
> +    tmpfd, tmpname = vfs.mkstemp()
> +    try:
> +        return os.fstat(tmpfd).st_mtime
> +    finally:
> +        os.close(tmpfd)
> +        vfs.unlink(tmpname)
[...]
> +        if tr:
> +            # 'dirstate.write()' is not only for writing in-memory
> +            # changes out, but also for dropping ambiguous timestamp.
> +            # delayed writing re-raise "ambiguous timestamp issue".
> +            # See also the wiki page below for detail:
> +            # https://www.mercurial-scm.org/wiki/DirstateTransactionPlan
> +
> +            # emulate dropping timestamp in 'parsers.pack_dirstate'
> +            now = _getfsnow(repo.vfs)
> +            dmap = self._map
> +            for f, e in dmap.iteritems():
> +                if e[0] == 'n' and e[3] == now:
> +                    dmap[f] = dirstatetuple(e[0], e[1], e[2], -1)

It seems "now" is float but "e[3]" is int, so "e[3] == now" won't be true
on modern filesystem.
Katsunori FUJIWARA - Oct. 11, 2015, 4:11 p.m.
At Sun, 11 Oct 2015 22:48:01 +0900,
Yuya Nishihara wrote:
> 
> On Sat, 10 Oct 2015 03:59:28 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1444416862 -32400
> > #      Sat Oct 10 03:54:22 2015 +0900
> > # Node ID fdbfcbe61c00c684c71335b3be483df5a20d526a
> > # Parent  0c77ac7b070e4109765ee2e7759c40de7a52b2fc
> > dirstate: make writing in-memory changes aware of transaction activity
> 
> > diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> > --- a/mercurial/dirstate.py
> > +++ b/mercurial/dirstate.py
> > @@ -27,6 +27,15 @@
> >      def join(self, obj, fname):
> >          return obj._join(fname)
> >  
> > +def _getfsnow(vfs):
> > +    '''Get "now" timestamp on filesystem'''
> > +    tmpfd, tmpname = vfs.mkstemp()
> > +    try:
> > +        return os.fstat(tmpfd).st_mtime
> > +    finally:
> > +        os.close(tmpfd)
> > +        vfs.unlink(tmpname)
> [...]
> > +        if tr:
> > +            # 'dirstate.write()' is not only for writing in-memory
> > +            # changes out, but also for dropping ambiguous timestamp.
> > +            # delayed writing re-raise "ambiguous timestamp issue".
> > +            # See also the wiki page below for detail:
> > +            # https://www.mercurial-scm.org/wiki/DirstateTransactionPlan
> > +
> > +            # emulate dropping timestamp in 'parsers.pack_dirstate'
> > +            now = _getfsnow(repo.vfs)
> > +            dmap = self._map
> > +            for f, e in dmap.iteritems():
> > +                if e[0] == 'n' and e[3] == now:
> > +                    dmap[f] = dirstatetuple(e[0], e[1], e[2], -1)
> 
> It seems "now" is float but "e[3]" is int, so "e[3] == now" won't be true
> on modern filesystem.

Oh, I forgot that now we should use 'util.statmtimesec' for st_mtime
in sec. Thank you for pointing out.

BTW, for consistent st_mtime handling, we should use
'util.statmtimesec'-ed st_mtime of 'util.fstat(st)' also for calling
'parsers.pack_dirstate()', shouldn't we ?

  - dirstate.write calls parsers.pack_dirstate" with
    util.fstat(st).st_mtime directly

    https://selenic.com/repo/hg/file/6e715040c172/mercurial/dirstate.py#l627

  - pack_dirstate() in parsers.c uses downcast "(uint32_t)now"

  - pack_dirstate() in pure/parsers.py uses downcast "int(now)"




----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Yuya Nishihara - Oct. 11, 2015, 5:08 p.m.
On Mon, 12 Oct 2015 01:11:00 +0900, FUJIWARA Katsunori wrote:
> At Sun, 11 Oct 2015 22:48:01 +0900,
> Yuya Nishihara wrote:
> > 
> > On Sat, 10 Oct 2015 03:59:28 +0900, FUJIWARA Katsunori wrote:
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > > # Date 1444416862 -32400
> > > #      Sat Oct 10 03:54:22 2015 +0900
> > > # Node ID fdbfcbe61c00c684c71335b3be483df5a20d526a
> > > # Parent  0c77ac7b070e4109765ee2e7759c40de7a52b2fc
> > > dirstate: make writing in-memory changes aware of transaction activity
> > 
> > > diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> > > --- a/mercurial/dirstate.py
> > > +++ b/mercurial/dirstate.py
> > > @@ -27,6 +27,15 @@
> > >      def join(self, obj, fname):
> > >          return obj._join(fname)
> > >  
> > > +def _getfsnow(vfs):
> > > +    '''Get "now" timestamp on filesystem'''
> > > +    tmpfd, tmpname = vfs.mkstemp()
> > > +    try:
> > > +        return os.fstat(tmpfd).st_mtime
> > > +    finally:
> > > +        os.close(tmpfd)
> > > +        vfs.unlink(tmpname)
> > [...]
> > > +        if tr:
> > > +            # 'dirstate.write()' is not only for writing in-memory
> > > +            # changes out, but also for dropping ambiguous timestamp.
> > > +            # delayed writing re-raise "ambiguous timestamp issue".
> > > +            # See also the wiki page below for detail:
> > > +            # https://www.mercurial-scm.org/wiki/DirstateTransactionPlan
> > > +
> > > +            # emulate dropping timestamp in 'parsers.pack_dirstate'
> > > +            now = _getfsnow(repo.vfs)
> > > +            dmap = self._map
> > > +            for f, e in dmap.iteritems():
> > > +                if e[0] == 'n' and e[3] == now:
> > > +                    dmap[f] = dirstatetuple(e[0], e[1], e[2], -1)
> > 
> > It seems "now" is float but "e[3]" is int, so "e[3] == now" won't be true
> > on modern filesystem.
> 
> Oh, I forgot that now we should use 'util.statmtimesec' for st_mtime
> in sec. Thank you for pointing out.
> 
> BTW, for consistent st_mtime handling, we should use
> 'util.statmtimesec'-ed st_mtime of 'util.fstat(st)' also for calling
> 'parsers.pack_dirstate()', shouldn't we ?
> 
>   - dirstate.write calls parsers.pack_dirstate" with
>     util.fstat(st).st_mtime directly
> 
>     https://selenic.com/repo/hg/file/6e715040c172/mercurial/dirstate.py#l627
> 
>   - pack_dirstate() in parsers.c uses downcast "(uint32_t)now"
> 
>   - pack_dirstate() in pure/parsers.py uses downcast "int(now)"

Perhaps int will be good for consistency.

I didn't change them at 13272104bb07 because it might cause TypeError if
old .py calls new C pack_dirstate() or vice versa, and I thought floating-point
inaccuracy would be acceptable there from the comment in pure/parsers.py.

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -27,6 +27,15 @@ 
     def join(self, obj, fname):
         return obj._join(fname)
 
+def _getfsnow(vfs):
+    '''Get "now" timestamp on filesystem'''
+    tmpfd, tmpname = vfs.mkstemp()
+    try:
+        return os.fstat(tmpfd).st_mtime
+    finally:
+        os.close(tmpfd)
+        vfs.unlink(tmpname)
+
 class dirstate(object):
 
     def __init__(self, opener, ui, root, validate):
@@ -611,7 +620,7 @@ 
         self._pl = (parent, nullid)
         self._dirty = True
 
-    def write(self):
+    def write(self, repo=None):
         if not self._dirty:
             return
 
@@ -622,7 +631,41 @@ 
             import time # to avoid useless import
             time.sleep(delaywrite)
 
-        st = self._opener(self._filename, "w", atomictemp=True)
+        filename = self._filename
+        if not repo:
+            tr = None
+            if self._opener.lexists(self._pendingfilename):
+                # if pending file already exists, in-memory changes
+                # should be written into it, because it has priority
+                # to '.hg/dirstate' at reading under HG_PENDING mode
+                filename = self._pendingfilename
+        else:
+            tr = repo.currenttransaction()
+
+        if tr:
+            # 'dirstate.write()' is not only for writing in-memory
+            # changes out, but also for dropping ambiguous timestamp.
+            # delayed writing re-raise "ambiguous timestamp issue".
+            # See also the wiki page below for detail:
+            # https://www.mercurial-scm.org/wiki/DirstateTransactionPlan
+
+            # emulate dropping timestamp in 'parsers.pack_dirstate'
+            now = _getfsnow(repo.vfs)
+            dmap = self._map
+            for f, e in dmap.iteritems():
+                if e[0] == 'n' and e[3] == now:
+                    dmap[f] = dirstatetuple(e[0], e[1], e[2], -1)
+
+            # emulate that all 'dirstate.normal' results are written out
+            self._lastnormaltime = 0
+
+            # delay writing in-memory changes out
+            tr.addfilegenerator('dirstate', (self._filename,),
+                                self._writedirstate,
+                                location='plain', backup=False)
+            return
+
+        st = self._opener(filename, "w", atomictemp=True)
         self._writedirstate(st)
 
     def _writedirstate(self, st):
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1000,6 +1000,11 @@ 
         def releasefn(tr, success):
             repo = reporef()
             if success:
+                # this should be explicitly invoked here, because
+                # in-memory changes aren't written out at closing
+                # transaction, if tr.addfilegenerator (via
+                # dirstate.write or so) isn't invoked while
+                # transaction running
                 repo.dirstate.write()
             else:
                 # prevent in-memory changes from being written out at
diff --git a/tests/fakedirstatewritetime.py b/tests/fakedirstatewritetime.py
--- a/tests/fakedirstatewritetime.py
+++ b/tests/fakedirstatewritetime.py
@@ -5,7 +5,7 @@ 
 #   - 'workingctx._checklookup()' (= 'repo.status()')
 #   - 'committablectx.markcommitted()'
 
-from mercurial import context, extensions, parsers, util
+from mercurial import context, dirstate, extensions, parsers, util
 
 def pack_dirstate(fakenow, orig, dmap, copymap, pl, now):
     # execute what original parsers.pack_dirstate should do actually
@@ -35,13 +35,16 @@ 
     fakenow = float(timestamp)
 
     orig_pack_dirstate = parsers.pack_dirstate
+    orig_dirstate_getfsnow = dirstate._getfsnow
     wrapper = lambda *args: pack_dirstate(fakenow, orig_pack_dirstate, *args)
 
     parsers.pack_dirstate = wrapper
+    dirstate._getfsnow = lambda *args: fakenow
     try:
         return func()
     finally:
         parsers.pack_dirstate = orig_pack_dirstate
+        dirstate._getfsnow = orig_dirstate_getfsnow
 
 def _checklookup(orig, workingctx, files):
     ui = workingctx.repo().ui