Submitter | Durham Goode
---|---
Date | March 25, 2014, 2:33 a.m.
Message ID | <08595987c5b0e8af5aa8.1395714831@dev2000.prn2.facebook.com>
Download | mbox, patch
Permalink | /patch/4048/
State | Superseded
Commit | 925c2d6043890e64c26b2b802a2e681f2464c5b2
Comments
On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1395700700 25200
> #      Mon Mar 24 15:38:20 2014 -0700
> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
> clone: put streaming clones in a transaction
>
> Streaming clones were writing to files outside of a transaction. Soon the
> fncache will be written at transaction close time, so we need streaming
> clones to be in a transaction.

Hmm. But you're not adding the streamed files to the transaction?

> +        # Writing straight to files circumvented the inmemory caches
> +        self.invalidate()

Isn't there a lock held here that will invalidate() on release?
On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:
>On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1395700700 25200
>> #      Mon Mar 24 15:38:20 2014 -0700
>> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
>> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
>> clone: put streaming clones in a transaction
>>
>> Streaming clones were writing to files outside of a transaction. Soon the
>> fncache will be written at transaction close time, so we need streaming
>> clones to be in a transaction.
>
>Hmm. But you're not adding the streamed files to the transaction?

The clone command deletes the entire repo if something goes wrong, so
transacting these writes wasn't necessary.

>> +        # Writing straight to files circumvented the inmemory caches
>> +        self.invalidate()
>
>Isn't there a lock held here that will invalidate() on release?

No, things get invalidated at lock acquisition time, but not at lock
release (they shouldn't be invalid, since no one else should have touched
the disk while we held the lock). Regardless of that, we're doing this
invalidate inside the lock, so lock release won't have run yet.
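[Editorial aside: the "clone deletes the entire repo if something goes wrong" argument above is a cleanup-on-failure pattern. A minimal sketch of that pattern, with a hypothetical `clone_into` helper that is not Mercurial's actual code:]

```python
import os
import shutil
import tempfile

def clone_into(destdir, write_files):
    """Create destdir and run write_files(destdir); on any failure,
    delete the whole directory so no partial clone is left behind."""
    os.mkdir(destdir)
    try:
        write_files(destdir)
    except Exception:
        shutil.rmtree(destdir, ignore_errors=True)
        raise

base = tempfile.mkdtemp()
dest = os.path.join(base, 'repo')

def bad_writer(d):
    # write one file, then simulate a broken stream
    open(os.path.join(d, 'data'), 'w').close()
    raise IOError('stream interrupted')

try:
    clone_into(dest, bad_writer)
except IOError:
    pass
# the half-written destination was removed wholesale
assert not os.path.exists(dest)
```

With this recovery strategy, individual file writes need no journaling: the unit of rollback is the whole destination directory.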
On Mon, 2014-03-31 at 17:22 +0000, Durham Goode wrote:
> On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:
>
> >On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1395700700 25200
> >> #      Mon Mar 24 15:38:20 2014 -0700
> >> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
> >> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
> >> clone: put streaming clones in a transaction
> >>
> >> Streaming clones were writing to files outside of a transaction. Soon
> >> the fncache will be written at transaction close time, so we need
> >> streaming clones to be in a transaction.
> >
> >Hmm. But you're not adding the streamed files to the transaction?
>
> The clone command deletes the entire repo if something goes wrong, so
> transacting these writes wasn't necessary.

Sure.

> >> +        # Writing straight to files circumvented the inmemory caches
> >> +        self.invalidate()
> >
> >Isn't there a lock held here that will invalidate() on release?
>
> No, things get invalidated at lock acquisition time, but not at lock
> release (they shouldn't be invalid, since no one else should have touched
> the disk while we held the lock).

Actually, both are places where we should be invalidating. But we
actually invalidate on unlock:

localrepo, def lock:

        l = self._lock(self.svfs, "lock", wait, unlock,
                       self.invalidate, _('repository %s') % self.origroot)

lock.py, def __init__:

    def __init__(self, vfs, file, timeout=-1, releasefn=None, desc=None):

lock.py, def release:

            if self.releasefn:
                self.releasefn()

That's because:

a) reading before locking is wasted effort we should avoid
b) if we're holding a lock, we may have writers queued up to immediately
   change what's on disk when we unlock
On 3/31/14 11:27 AM, "Matt Mackall" <mpm@selenic.com> wrote:
>On Mon, 2014-03-31 at 17:22 +0000, Durham Goode wrote:
>> On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:
>>
>> >On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
>> >> # HG changeset patch
>> >> # User Durham Goode <durham@fb.com>
>> >> # Date 1395700700 25200
>> >> #      Mon Mar 24 15:38:20 2014 -0700
>> >> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
>> >> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
>> >> clone: put streaming clones in a transaction
>> >>
>> >> +        # Writing straight to files circumvented the inmemory caches
>> >> +        self.invalidate()
>> >
>> >Isn't there a lock held here that will invalidate() on release?
>>
>> No, things get invalidated at lock acquisition time, but not at lock
>> release (they shouldn't be invalid, since no one else should have touched
>> the disk while we held the lock).
>
>Actually, both are places where we should be invalidating. But we
>actually invalidate on unlock:
>
>localrepo, def lock:
>
>        l = self._lock(self.svfs, "lock", wait, unlock,
>                       self.invalidate, _('repository %s') % self.origroot)

I think you misread (or maybe I am). self.invalidate is being passed in as
the 'acquirefn' to localrepo._lock:

    def _lock(self, vfs, lockname, wait, releasefn, acquirefn, desc):

And the acquirefn is not passed to the lock object. 'unlock', which is the
releasefn, doesn't call invalidate:

        def unlock():
            self.store.write()
            if hasunfilteredcache(self, '_phasecache'):
                self._phasecache.write()
            for k, ce in self._filecache.items():
                if k == 'dirstate' or k not in self.__dict__:
                    continue
                ce.refresh()

'unlock' is actually jumping through a few hoops to *avoid* invalidating
caches (by calling refresh on each filecache).
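[Editorial aside: the acquirefn/releasefn wiring being discussed can be condensed into a toy model. The class and attribute names below are hypothetical, not Mercurial's actual implementation; the point is only that the acquire callback (cache invalidation) runs at lock acquisition, while the release callback does write-out work:]

```python
class Lock(object):
    """Toy lock: only knows about a releasefn, run when the last
    reference is released. The acquirefn is never stored here."""
    def __init__(self, releasefn=None):
        self.releasefn = releasefn
        self.held = 1

    def release(self):
        self.held -= 1
        if self.held == 0 and self.releasefn:
            self.releasefn()

class Repo(object):
    def __init__(self):
        self.events = []

    def invalidate(self):
        # acquirefn: drop in-memory caches so we re-read from disk
        self.events.append('invalidate')

    def lock(self):
        def unlock():
            # releasefn: flush pending state; refresh, don't invalidate
            self.events.append('store.write')
        l = Lock(releasefn=unlock)
        self.invalidate()   # caller runs the acquirefn at acquisition time
        return l

repo = Repo()
l = repo.lock()
l.release()
assert repo.events == ['invalidate', 'store.write']
```

This matches the thread's conclusion: invalidation happens on acquisition, and release only writes out and refreshes.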
On Mon, 2014-03-31 at 21:51 +0000, Durham Goode wrote:
> On 3/31/14 11:27 AM, "Matt Mackall" <mpm@selenic.com> wrote:
>
> >On Mon, 2014-03-31 at 17:22 +0000, Durham Goode wrote:
> >> On 3/26/14 12:24 PM, "Matt Mackall" <mpm@selenic.com> wrote:
> >>
> >> >On Mon, 2014-03-24 at 19:33 -0700, Durham Goode wrote:
> >> >> # HG changeset patch
> >> >> # User Durham Goode <durham@fb.com>
> >> >> # Date 1395700700 25200
> >> >> #      Mon Mar 24 15:38:20 2014 -0700
> >> >> # Node ID 08595987c5b0e8af5aa8fec4debd7260f5a79e8f
> >> >> # Parent  ab3be74e8022c31b7c95975fb09b3602ed7775d8
> >> >> clone: put streaming clones in a transaction
> >> >>
> >> >> +        # Writing straight to files circumvented the inmemory caches
> >> >> +        self.invalidate()
> >> >
> >> >Isn't there a lock held here that will invalidate() on release?
> >>
> >> No, things get invalidated at lock acquisition time, but not at lock
> >> release (they shouldn't be invalid, since no one else should have
> >> touched the disk while we held the lock).
> >
> >Actually, both are places where we should be invalidating. But we
> >actually invalidate on unlock:
> >
> >localrepo, def lock:
> >
> >        l = self._lock(self.svfs, "lock", wait, unlock,
> >                       self.invalidate, _('repository %s') % self.origroot)
>
> I think you misread (or maybe I am). self.invalidate is being passed in as
> the 'acquirefn' to localrepo._lock:
>
>     def _lock(self, vfs, lockname, wait, releasefn, acquirefn, desc):
>
> And the acquirefn is not passed to the lock object. 'unlock', which is the
> releasefn, doesn't call invalidate:
>
>         def unlock():
>             self.store.write()
>             if hasunfilteredcache(self, '_phasecache'):
>                 self._phasecache.write()
>             for k, ce in self._filecache.items():
>                 if k == 'dirstate' or k not in self.__dict__:
>                     continue
>                 ce.refresh()
>
> 'unlock' is actually jumping through a few hoops to *avoid* invalidating
> caches (by calling refresh on each filecache).

Yeah, my mistake here.
Our terminology is all a little broken here: the act of marking data in a cache potentially-stale is also invalidating. And we need some form of invalidation at both acquisition and release time.
Patch
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -2022,26 +2022,36 @@
         handled_bytes = 0
         self.ui.progress(_('clone'), 0, total=total_bytes)
         start = time.time()
-        for i in xrange(total_files):
-            # XXX doesn't support '\n' or '\r' in filenames
-            l = fp.readline()
-            try:
-                name, size = l.split('\0', 1)
-                size = int(size)
-            except (ValueError, TypeError):
-                raise error.ResponseError(
-                    _('unexpected response from remote server:'), l)
-            if self.ui.debugflag:
-                self.ui.debug('adding %s (%s)\n' %
-                              (name, util.bytecount(size)))
-            # for backwards compat, name was partially encoded
-            ofp = self.sopener(store.decodedir(name), 'w')
-            for chunk in util.filechunkiter(fp, limit=size):
-                handled_bytes += len(chunk)
-                self.ui.progress(_('clone'), handled_bytes,
-                                 total=total_bytes)
-                ofp.write(chunk)
-            ofp.close()
+
+        tr = self.transaction(_('clone'))
+        try:
+            for i in xrange(total_files):
+                # XXX doesn't support '\n' or '\r' in filenames
+                l = fp.readline()
+                try:
+                    name, size = l.split('\0', 1)
+                    size = int(size)
+                except (ValueError, TypeError):
+                    raise error.ResponseError(
+                        _('unexpected response from remote server:'), l)
+                if self.ui.debugflag:
+                    self.ui.debug('adding %s (%s)\n' %
+                                  (name, util.bytecount(size)))
+                # for backwards compat, name was partially encoded
+                ofp = self.sopener(store.decodedir(name), 'w')
+                for chunk in util.filechunkiter(fp, limit=size):
+                    handled_bytes += len(chunk)
+                    self.ui.progress(_('clone'), handled_bytes,
+                                     total=total_bytes)
+                    ofp.write(chunk)
+                ofp.close()
+            tr.close()
+        finally:
+            tr.release()
+
+        # Writing straight to files circumvented the inmemory caches
+        self.invalidate()
+
         elapsed = time.time() - start
         if elapsed <= 0:
             elapsed = 0.001
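[Editorial aside: the patch wraps the streaming loop in the close()-then-release() transaction idiom. A minimal sketch of that idiom, with a hypothetical Transaction class rather than Mercurial's real one: release() in the finally block is a no-op after a successful close(), but rolls back if close() was never reached:]

```python
class Transaction(object):
    """Toy transaction: release() aborts unless close() committed first."""
    def __init__(self):
        self.closed = False
        self.aborted = False

    def close(self):
        # commit: mark success so release() has nothing to undo
        self.closed = True

    def release(self):
        if not self.closed:
            # close() never ran: an error escaped, so roll back
            self.aborted = True

# success path: close() runs, release() is a no-op
tr = Transaction()
try:
    tr.close()
finally:
    tr.release()
assert not tr.aborted

# failure path: an exception skips close(), release() rolls back
tr2 = Transaction()
try:
    try:
        raise IOError('stream broke')  # simulated mid-stream failure
    except IOError:
        pass
finally:
    tr2.release()
assert tr2.aborted
```

The try/finally shape guarantees release() always runs, so a partially streamed clone never leaves a transaction open.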