Patchwork [2 of 7] clone: put streaming clones in a transaction

Submitter Durham Goode
Date March 25, 2014, 2:33 a.m.
Message ID <08595987c5b0e8af5aa8.1395714831@dev2000.prn2.facebook.com>
Permalink /patch/4048/
State Superseded
Commit 925c2d6043890e64c26b2b802a2e681f2464c5b2

Comments

Durham Goode - March 25, 2014, 2:33 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1395700700 25200
#      Mon Mar 24 15:38:20 2014 -0700
# Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
# Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
clone: put streaming clones in a transaction

Streaming clones were writing to files outside of a transaction. Soon the
fncache will be written at transaction close time, so we need streaming clones
to be in a transaction.
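
For context, the change wraps the streamed writes in Mercurial's usual transaction pattern; condensed from the full diff below (the per-file write loop is elided):

    tr = self.transaction(_('clone'))   # open a 'clone' transaction
    try:
        # ... read each streamed file from the remote and write it out ...
        tr.close()                      # commit; the fncache will soon be written here at close time
    finally:
        tr.release()                    # abort and clean up if close() was never reached

    # Writing straight to files circumvented the inmemory caches
    self.invalidate()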
Matt Mackall - March 26, 2014, 7:24 p.m.
On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1395700700 25200
> #      Mon Mar 24 15:38:20 2014 -0700
> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
> clone: put streaming clones in a transaction
> 
> Streaming clones were writing to files outside of a transaction. Soon the
> fncache will be written at transaction close time, so we need streaming clones
> to be in a transaction.

Hmm. But you're not adding the streamed files to the transaction? 

> +            # Writing straight to files circumvented the inmemory caches
> +            self.invalidate()

Isn't there a lock held here that will invalidate() on release?
Durham Goode - March 31, 2014, 5:22 p.m.
On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:

>On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1395700700 25200
>> #      Mon Mar 24 15:38:20 2014 -0700
>> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
>> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
>> clone: put streaming clones in a transaction
>> 
>> Streaming clones were writing to files outside of a transaction. Soon
>>the
>> fncache will be written at transaction close time, so we need streaming
>>clones
>> to be in a transaction.
>
>Hmm. But you're not adding the streamed files to the transaction?

The clone command deletes the entire repo if something goes wrong, so
transacting these writes wasn't necessary.

> 
>
>> +            # Writing straight to files circumvented the inmemory
>>caches
>> +            self.invalidate()
>
>Isn't there a lock held here that will invalidate() on release?

No, things get invalidated at lock acquisition time, but not at lock
release (since they shouldn't be invalid since no one else should have
touched the disk while we had the lock). Regardless of that though, we're
doing this invalidate inside the lock so lock release won't have run yet.
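
For illustration, the "deletes the entire repo" behavior referred to above is clone's cleanup of a freshly created destination when anything fails. A rough sketch of that pattern (illustrative only, with hypothetical names; not the actual hg.py code):

    import shutil

    def clone_into(dest, do_stream_clone):
        # Hypothetical helper mirroring the behavior described above: if the
        # clone fails partway through, the partially written destination is
        # removed, so the raw streamed writes need no per-file rollback.
        keep = False
        try:
            do_stream_clone(dest)    # raw file writes into the new store
            keep = True              # success: keep the repository
        finally:
            if not keep:
                shutil.rmtree(dest, ignore_errors=True)   # discard the partial clone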
Matt Mackall - March 31, 2014, 6:27 p.m.
On Mon, 2014-03-31 at 17:22 +0000, Durham Goode wrote:
> On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:
> 
> >On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1395700700 25200
> >> #      Mon Mar 24 15:38:20 2014 -0700
> >> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
> >> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
> >> clone: put streaming clones in a transaction
> >> 
> >> Streaming clones were writing to files outside of a transaction. Soon
> >>the
> >> fncache will be written at transaction close time, so we need streaming
> >>clones
> >> to be in a transaction.
> >
> >Hmm. But you're not adding the streamed files to the transaction?
> 
> The clone command deletes the entire repo if something goes wrong, so
> transacting these writes wasn't necessary.

Sure.

> >> +            # Writing straight to files circumvented the inmemory
> >>caches
> >> +            self.invalidate()
> >
> >Isn't there a lock held here that will invalidate() on release?
> 
> No, things get invalidated at lock acquisition time, but not at lock
> release (since they shouldn't be invalid since no one else should have
> touched the disk while we had the lock).

Actually, both are places where we should be invalidating. But we
actually invalidate on unlock:

localrepo:def lock:

        l = self._lock(self.svfs, "lock", wait, unlock,
                      self.invalidate, _('repository %s') % self.origroot)

lock.py:def init:

    def __init__(self, vfs, file, timeout=-1, releasefn=None, desc=None):

lock.py:def release:

            if self.releasefn:
                self.releasefn()

That's because:

a) reading before locking is wasted effort we should avoid
b) if we're holding a lock, we may have writers queued up to immediately
change what's on disk when we unlock
Durham Goode - March 31, 2014, 9:51 p.m.
On 3/31/14 11:27 AM, "Matt Mackall" <mpm@selenic.com> wrote:

>On Mon, 2014-03-31 at 17:22 +0000, Durham Goode wrote:
>> On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:
>> 
>> >On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
>> >> # HG changeset patch
>> >> # User Durham Goode <durham@fb.com>
>> >> # Date 1395700700 25200
>> >> #      Mon Mar 24 15:38:20 2014 -0700
>> >> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
>> >> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
>> >> clone: put streaming clones in a transaction
>> >>
>> >> +            # Writing straight to files circumvented the inmemory
>> >>caches
>> >> +            self.invalidate()
>> >
>> >Isn't there a lock held here that will invalidate() on release?
>> 
>> No, things get invalidated at lock acquisition time, but not at lock
>> release (since they shouldn't be invalid since no one else should have
>> touched the disk while we had the lock).
>
>Actually, both are places where we should be invalidating. But we
>actually invalidate on unlock:
>
>localrepo:def lock:
>
>        l = self._lock(self.svfs, "lock", wait, unlock,
>                      self.invalidate, _('repository %s') % self.origroot)

I think you misread (or maybe I am). self.invalidate is being passed in as
the 'acquirefn' to localrepo._lock

    def _lock(self, vfs, lockname, wait, releasefn, acquirefn, desc):

And the acquirefn is not passed to the lock object.  'unlock', which is
the releasefn, doesn't call invalidate:

         def unlock():
            self.store.write()
            if hasunfilteredcache(self, '_phasecache'):
                self._phasecache.write()
            for k, ce in self._filecache.items():
                if k == 'dirstate' or k not in self.__dict__:
                    continue
                ce.refresh()

'unlock' is actually jumping through a few hoops to *avoid* invalidating
caches (by calling refresh on each filecache).
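
To make the split concrete, here is a simplified illustration of the flow described above (not the real mercurial.lock code): acquirefn, which is repo.invalidate in the snippet quoted earlier, runs when the lock is taken, while only releasefn (unlock) is handed to the lock object and runs when it is dropped.

    class lock(object):
        # Simplified stand-in for the real lock class quoted above.
        def __init__(self, releasefn=None):
            self.releasefn = releasefn

        def release(self):
            if self.releasefn:
                self.releasefn()      # unlock(): write stores, refresh filecache entries

    def _lock(releasefn, acquirefn):
        l = lock(releasefn)           # only releasefn reaches the lock object
        if acquirefn:
            acquirefn()               # self.invalidate(): drop possibly-stale caches on acquisition
        return l

So by the time the patch's new self.invalidate() call runs, the lock machinery has only invalidated at acquisition; release merely refreshes the filecache stat entries.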
Matt Mackall - April 2, 2014, 11:40 p.m.
On Mon, 2014-03-31 at 21:51 +0000, Durham Goode wrote:
> On 3/31/14 11:27 AM, "Matt Mackall" <mpm@selenic.com> wrote:
> 
> >On Mon, 2014-03-31 at 17:22 +0000, Durham Goode wrote:
> >> On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:
> >> 
> >> >On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
> >> >> # HG changeset patch
> >> >> # User Durham Goode <durham@fb.com>
> >> >> # Date 1395700700 25200
> >> >> #      Mon Mar 24 15:38:20 2014 -0700
> >> >> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
> >> >> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
> >> >> clone: put streaming clones in a transaction
> >> >>
> >> >> +            # Writing straight to files circumvented the inmemory
> >> >>caches
> >> >> +            self.invalidate()
> >> >
> >> >Isn't there a lock held here that will invalidate() on release?
> >> 
> >> No, things get invalidated at lock acquisition time, but not at lock
> >> release (since they shouldn't be invalid since no one else should have
> >> touched the disk while we had the lock).
> >
> >Actually, both are places where we should be invalidating. But we
> >actually invalidate on unlock:
> >
> >localrepo:def lock:
> >
> >        l = self._lock(self.svfs, "lock", wait, unlock,
> >                      self.invalidate, _('repository %s') % self.origroot)
> 
> I think you misread (or maybe I am). self.invalidate is being passed in as
> the 'acquirefn' to localrepo._lock
> 
>     def _lock(self, vfs, lockname, wait, releasefn, acquirefn, desc):
> 
> And the acquirefn is not passed to the lock object.  'unlock', which is
> the releasefn, doesn't call invalidate:
> 
>          def unlock():
>             self.store.write()
>             if hasunfilteredcache(self, '_phasecache'):
>                 self._phasecache.write()
>             for k, ce in self._filecache.items():
>                 if k == 'dirstate' or k not in self.__dict__:
>                     continue
>                 ce.refresh()
> 
> 'unlock' is actually jumping through a few hoops to *avoid* invalidating
> caches (by calling refresh on each filecache).

Yeah, my mistake here. Our terminology is all a little broken here: the
act of marking data in a cache potentially-stale is also invalidating.
And we need some form of invalidation at both acquisition and release
time.

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -2022,26 +2022,36 @@ 
             handled_bytes = 0
             self.ui.progress(_('clone'), 0, total=total_bytes)
             start = time.time()
-            for i in xrange(total_files):
-                # XXX doesn't support '\n' or '\r' in filenames
-                l = fp.readline()
-                try:
-                    name, size = l.split('\0', 1)
-                    size = int(size)
-                except (ValueError, TypeError):
-                    raise error.ResponseError(
-                        _('unexpected response from remote server:'), l)
-                if self.ui.debugflag:
-                    self.ui.debug('adding %s (%s)\n' %
-                                  (name, util.bytecount(size)))
-                # for backwards compat, name was partially encoded
-                ofp = self.sopener(store.decodedir(name), 'w')
-                for chunk in util.filechunkiter(fp, limit=size):
-                    handled_bytes += len(chunk)
-                    self.ui.progress(_('clone'), handled_bytes,
-                                     total=total_bytes)
-                    ofp.write(chunk)
-                ofp.close()
+
+            tr = self.transaction(_('clone'))
+            try:
+                for i in xrange(total_files):
+                    # XXX doesn't support '\n' or '\r' in filenames
+                    l = fp.readline()
+                    try:
+                        name, size = l.split('\0', 1)
+                        size = int(size)
+                    except (ValueError, TypeError):
+                        raise error.ResponseError(
+                            _('unexpected response from remote server:'), l)
+                    if self.ui.debugflag:
+                        self.ui.debug('adding %s (%s)\n' %
+                                      (name, util.bytecount(size)))
+                    # for backwards compat, name was partially encoded
+                    ofp = self.sopener(store.decodedir(name), 'w')
+                    for chunk in util.filechunkiter(fp, limit=size):
+                        handled_bytes += len(chunk)
+                        self.ui.progress(_('clone'), handled_bytes,
+                                         total=total_bytes)
+                        ofp.write(chunk)
+                    ofp.close()
+                tr.close()
+            finally:
+                tr.release()
+
+            # Writing straight to files circumvented the inmemory caches
+            self.invalidate()
+
             elapsed = time.time() - start
             if elapsed <= 0:
                 elapsed = 0.001