Submitter | Durham Goode |
---|---|
Date | March 2, 2017, 12:37 a.m. |
Message ID | <f99c019063bea8a0b5f6.1488415055@dev111.prn1.facebook.com> |
Download | mbox | patch |
Permalink | /patch/18856/ |
State | Changes Requested |
Delegated to: | Martin von Zweigbergk |
Headers | show |
Comments
For now, I'm just adding back mercurial-devel that I accidentally dropped in my reply (sorry, the default is different between Gmail and Inbox, and I use both). On Fri, Mar 3, 2017 at 10:05 AM, Durham Goode <durham@fb.com> wrote: > > > On 3/3/17 9:25 AM, Martin von Zweigbergk wrote: >> >> On Wed, Mar 1, 2017 at 4:37 PM, Durham Goode <durham@fb.com> wrote: >>> >>> # HG changeset patch >>> # User Durham Goode <durham@fb.com> >>> # Date 1488413981 28800 >>> # Wed Mar 01 16:19:41 2017 -0800 >>> # Node ID f99c019063bea8a0b5f66c1e79dbd0b98b1326d8 >>> # Parent 9c89185b63f574cad1d4009f6c9dd511a2cba986 >>> treemanifest: make node reuse match flat manifest behavior >>> >>> In a flat manifest, a node with the same content but different parents is >>> still >>> considered a new node. In the current tree manifests however, if the >>> content is >>> the same, we ignore the parents entirely and just reuse the existing >>> node. >> >> >> I think I copied that behavior from filelogs. > > > filelog.add() calls straight to revlog.addrevision() which only exits early > if the full p1+p2+data hash matches an existing full hash. In fact, it > couldn't exit early for a content-only match because it would have to > decompress the p1 content to even check. The only reason treemanifest could > do it is because it happens to always have the p1 in memory and available > for comparison. > >>> >>> In our external treemanifest extension, we want to allow having one >>> treemanifest >>> for every flat manifests, >> >> >> Nit: did you mean it the other way around? Because it sounds like you >> can already have one treemanifest per flat manifest, but not one flat >> manifest per treemanifest. But the next sentence says you want them >> 1:1, so maybe add "and vice versa" to be clear, or "one unique >> treemanifest per flat manifest". > > > K, I can add the word unique to it. When I said 'one x for every y' I meant > len(x) == len(y), but maybe it's not so clear. > >>> as a way of easeing the migration to treemanifests. To >>> make this possible, let's change the root node treemanifest behavior to >>> match >>> the behavior for flat manifests, so we can have a 1:1 relationship. >> >> >> Could you elaborate on what complications the current implementation >> causes for you? It may not be necessary to put it in an updated commit >> message, but I'd like to understand the reason better. > > > As part of our migration to treemanifest, I'm backfilling all the trees for > our entire repo, and I'm giving the root node the original flat-manifest > hash, so eventually we can even delete the flat manifest and our old commits > will still work. To make this possible, I need to have 1 unique treemanifest > for every flat manifest, so all the old manifest hashes can still be > referenced. > >> It's a little strange to me to use a different condition for the root >> manifest. If we think that we should always emit a new revision if >> either content or parents have changed, then maybe we should do that >> for all directories. If we don't, it seems like we should not do it >> for the root manifest either. In that case, I'd be happier to extract >> a function or two that makes it easier for you guys to override it >> internally instead. > > > One difference between the root manifest and the sub-trees is that > manifest.py does not control the p1/p2 for the root, but it does control the > p1/p2 for the sub-trees. If manifest.py controlled p1/p2 for the root, we > could be smart and not produce a new tree when the content matches p1 or p2. > But because an external caller provides p1/p2, if manifest.add(p1, p2, > content) returns a node that doesn't actually have p1 and p2 as the parents > (which is what can happen today), then we're not actually respecting what > the caller asked us to do. That's why we need to respect it for the root, > but not necessarily for the sub-trees. > > Usually Mercurial is good about not calling manifest.add(p1, p2, content) > when the content matches p1, which is why we don't see a ton of empty > manifest deltas in the repo. This does become an issue for merge commits > though, where the code doesn't just want to throw out the p2 information so > it calls mf.add(p1, p2, same_content), which is how we found the issue in > the first place. > > Technically we could apply this behavior to every sub-tree as well, but that > introduces complications. For instance, I think it'd mean we have to create > a new empty sub-tree for every commit, since for every sub-tree the content > is the same but the parents are not. > >>> >>> While this sounds like a BC breakage, it's not actually a state users can >>> normally get in because: A) you can't make empty commits, and B) even if >>> you try >>> to make an empty commit (by making a commit then amending it's changes >>> away), >>> the higher level commit logic in localrepo.commitctx() forces the commit >>> to use >>> the original p1 manifest node if no files were changed. So this would >>> only >>> affect extensions and automation that reached passed the normal >>> localrepo.commit() logic straight into the manifest logic. >> >> >> So what does the included test case do? There is no extension involved >> there. > > > It just verifies the existing behavior. I wrote it hoping it'd expose the > bug, but it just worked, which led me notice the localrepo.commitctx logic. > > >> >>> >>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py >>> --- a/mercurial/manifest.py >>> +++ b/mercurial/manifest.py >>> @@ -1233,7 +1233,7 @@ class manifestrevlog(revlog.revlog): >>> def _addtree(self, m, transaction, link, m1, m2, readtree): >>> # If the manifest is unchanged compared to one parent, >>> # don't write a new revision >>> - if m.unmodifiedsince(m1) or m.unmodifiedsince(m2): >>> + if self._dir != '' and (m.unmodifiedsince(m1) or >>> m.unmodifiedsince(m2)): >>> return m.node() >>> def writesubtree(subm, subp1, subp2): >>> sublog = self.dirlog(subm.dir()) >>> @@ -1241,13 +1241,17 @@ class manifestrevlog(revlog.revlog): >>> readtree=readtree) >>> m.writesubtrees(m1, m2, writesubtree) >>> text = m.dirtext(self._usemanifestv2) >>> - # Double-check whether contents are unchanged to one parent >>> - if text == m1.dirtext(self._usemanifestv2): >>> - n = m1.node() >>> - elif text == m2.dirtext(self._usemanifestv2): >>> - n = m2.node() >>> - else: >>> + n = None >>> + if self._dir != '': >>> + # Double-check whether contents are unchanged to one parent >>> + if text == m1.dirtext(self._usemanifestv2): >>> + n = m1.node() >>> + elif text == m2.dirtext(self._usemanifestv2): >>> + n = m2.node() >>> + >>> + if not n: >>> n = self.addrevision(text, transaction, link, m1.node(), >>> m2.node()) >>> + >>> # Save nodeid so parent manifest can calculate its nodeid >>> m.setnode(n) >>> return n >>> diff --git a/tests/test-treemanifest.t b/tests/test-treemanifest.t >>> --- a/tests/test-treemanifest.t >>> +++ b/tests/test-treemanifest.t >>> @@ -825,3 +825,13 @@ other branch >>> added 1 changesets with 1 changes to 1 files (+1 heads) >>> (run 'hg heads' to see heads, 'hg merge' to merge) >>> >>> +Committing a empty commit does not duplicate root treemanifest >>> + $ echo z >> z >>> + $ hg commit -Aqm 'pre-empty commit' >>> + $ hg rm z >>> + $ hg commit --amend -m 'empty commit' >>> + saved backup bundle to >>> $TESTTMP/grafted-dir-repo-clone/.hg/strip-backup/cb99d5717cea-de37743b-amend-backup.hg >>> (glob) >> >> >> I slightly simpler way of creating a commit with not manifest changes: >> >> $ hg branch foo >> $ hg ci -m 'new branch' >> >>> + $ hg log -r 'tip + tip^' -T '{manifest}\n' >>> + 1:678d3574b88c >>> + 1:678d3574b88c >>> + $ hg --config extensions.strip= strip -r . -q
On Fri, Mar 3, 2017 at 10:35 AM, Martin von Zweigbergk <martinvonz@google.com> wrote: > For now, I'm just adding back mercurial-devel that I accidentally > dropped in my reply (sorry, the default is different between Gmail and > Inbox, and I use both). > > On Fri, Mar 3, 2017 at 10:05 AM, Durham Goode <durham@fb.com> wrote: >> >> >> On 3/3/17 9:25 AM, Martin von Zweigbergk wrote: >>> >>> On Wed, Mar 1, 2017 at 4:37 PM, Durham Goode <durham@fb.com> wrote: >>>> >>>> # HG changeset patch >>>> # User Durham Goode <durham@fb.com> >>>> # Date 1488413981 28800 >>>> # Wed Mar 01 16:19:41 2017 -0800 >>>> # Node ID f99c019063bea8a0b5f66c1e79dbd0b98b1326d8 >>>> # Parent 9c89185b63f574cad1d4009f6c9dd511a2cba986 >>>> treemanifest: make node reuse match flat manifest behavior Durham and I looked into this a bit more and compared it to how filelogs and manifests behave. I do wonder if it wouldn't be more consistent also create new manifest revisions for subdirectories that have changed on at least one side of the merge, but I also don't know that it matters. We can do that later. So, I've queued this patch, thanks. >>>> >>>> In a flat manifest, a node with the same content but different parents is >>>> still >>>> considered a new node. In the current tree manifests however, if the >>>> content is >>>> the same, we ignore the parents entirely and just reuse the existing >>>> node. >>> >>> >>> I think I copied that behavior from filelogs. >> >> >> filelog.add() calls straight to revlog.addrevision() which only exits early >> if the full p1+p2+data hash matches an existing full hash. In fact, it >> couldn't exit early for a content-only match because it would have to >> decompress the p1 content to even check. The only reason treemanifest could >> do it is because it happens to always have the p1 in memory and available >> for comparison. >> >>>> >>>> In our external treemanifest extension, we want to allow having one >>>> treemanifest >>>> for every flat manifests, >>> >>> >>> Nit: did you mean it the other way around? Because it sounds like you >>> can already have one treemanifest per flat manifest, but not one flat >>> manifest per treemanifest. But the next sentence says you want them >>> 1:1, so maybe add "and vice versa" to be clear, or "one unique >>> treemanifest per flat manifest". >> >> >> K, I can add the word unique to it. When I said 'one x for every y' I meant >> len(x) == len(y), but maybe it's not so clear. >> >>>> as a way of easeing the migration to treemanifests. To >>>> make this possible, let's change the root node treemanifest behavior to >>>> match >>>> the behavior for flat manifests, so we can have a 1:1 relationship. >>> >>> >>> Could you elaborate on what complications the current implementation >>> causes for you? It may not be necessary to put it in an updated commit >>> message, but I'd like to understand the reason better. >> >> >> As part of our migration to treemanifest, I'm backfilling all the trees for >> our entire repo, and I'm giving the root node the original flat-manifest >> hash, so eventually we can even delete the flat manifest and our old commits >> will still work. To make this possible, I need to have 1 unique treemanifest >> for every flat manifest, so all the old manifest hashes can still be >> referenced. >> >>> It's a little strange to me to use a different condition for the root >>> manifest. If we think that we should always emit a new revision if >>> either content or parents have changed, then maybe we should do that >>> for all directories. If we don't, it seems like we should not do it >>> for the root manifest either. In that case, I'd be happier to extract >>> a function or two that makes it easier for you guys to override it >>> internally instead. >> >> >> One difference between the root manifest and the sub-trees is that >> manifest.py does not control the p1/p2 for the root, but it does control the >> p1/p2 for the sub-trees. If manifest.py controlled p1/p2 for the root, we >> could be smart and not produce a new tree when the content matches p1 or p2. >> But because an external caller provides p1/p2, if manifest.add(p1, p2, >> content) returns a node that doesn't actually have p1 and p2 as the parents >> (which is what can happen today), then we're not actually respecting what >> the caller asked us to do. That's why we need to respect it for the root, >> but not necessarily for the sub-trees. >> >> Usually Mercurial is good about not calling manifest.add(p1, p2, content) >> when the content matches p1, which is why we don't see a ton of empty >> manifest deltas in the repo. This does become an issue for merge commits >> though, where the code doesn't just want to throw out the p2 information so >> it calls mf.add(p1, p2, same_content), which is how we found the issue in >> the first place. >> >> Technically we could apply this behavior to every sub-tree as well, but that >> introduces complications. For instance, I think it'd mean we have to create >> a new empty sub-tree for every commit, since for every sub-tree the content >> is the same but the parents are not. >> >>>> >>>> While this sounds like a BC breakage, it's not actually a state users can >>>> normally get in because: A) you can't make empty commits, and B) even if >>>> you try >>>> to make an empty commit (by making a commit then amending it's changes >>>> away), >>>> the higher level commit logic in localrepo.commitctx() forces the commit >>>> to use >>>> the original p1 manifest node if no files were changed. So this would >>>> only >>>> affect extensions and automation that reached passed the normal >>>> localrepo.commit() logic straight into the manifest logic. >>> >>> >>> So what does the included test case do? There is no extension involved >>> there. >> >> >> It just verifies the existing behavior. I wrote it hoping it'd expose the >> bug, but it just worked, which led me notice the localrepo.commitctx logic. >> >> >>> >>>> >>>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py >>>> --- a/mercurial/manifest.py >>>> +++ b/mercurial/manifest.py >>>> @@ -1233,7 +1233,7 @@ class manifestrevlog(revlog.revlog): >>>> def _addtree(self, m, transaction, link, m1, m2, readtree): >>>> # If the manifest is unchanged compared to one parent, >>>> # don't write a new revision >>>> - if m.unmodifiedsince(m1) or m.unmodifiedsince(m2): >>>> + if self._dir != '' and (m.unmodifiedsince(m1) or >>>> m.unmodifiedsince(m2)): >>>> return m.node() >>>> def writesubtree(subm, subp1, subp2): >>>> sublog = self.dirlog(subm.dir()) >>>> @@ -1241,13 +1241,17 @@ class manifestrevlog(revlog.revlog): >>>> readtree=readtree) >>>> m.writesubtrees(m1, m2, writesubtree) >>>> text = m.dirtext(self._usemanifestv2) >>>> - # Double-check whether contents are unchanged to one parent >>>> - if text == m1.dirtext(self._usemanifestv2): >>>> - n = m1.node() >>>> - elif text == m2.dirtext(self._usemanifestv2): >>>> - n = m2.node() >>>> - else: >>>> + n = None >>>> + if self._dir != '': >>>> + # Double-check whether contents are unchanged to one parent >>>> + if text == m1.dirtext(self._usemanifestv2): >>>> + n = m1.node() >>>> + elif text == m2.dirtext(self._usemanifestv2): >>>> + n = m2.node() >>>> + >>>> + if not n: >>>> n = self.addrevision(text, transaction, link, m1.node(), >>>> m2.node()) >>>> + >>>> # Save nodeid so parent manifest can calculate its nodeid >>>> m.setnode(n) >>>> return n >>>> diff --git a/tests/test-treemanifest.t b/tests/test-treemanifest.t >>>> --- a/tests/test-treemanifest.t >>>> +++ b/tests/test-treemanifest.t >>>> @@ -825,3 +825,13 @@ other branch >>>> added 1 changesets with 1 changes to 1 files (+1 heads) >>>> (run 'hg heads' to see heads, 'hg merge' to merge) >>>> >>>> +Committing a empty commit does not duplicate root treemanifest >>>> + $ echo z >> z >>>> + $ hg commit -Aqm 'pre-empty commit' >>>> + $ hg rm z >>>> + $ hg commit --amend -m 'empty commit' >>>> + saved backup bundle to >>>> $TESTTMP/grafted-dir-repo-clone/.hg/strip-backup/cb99d5717cea-de37743b-amend-backup.hg >>>> (glob) >>> >>> >>> I slightly simpler way of creating a commit with not manifest changes: >>> >>> $ hg branch foo >>> $ hg ci -m 'new branch' >>> >>>> + $ hg log -r 'tip + tip^' -T '{manifest}\n' >>>> + 1:678d3574b88c >>>> + 1:678d3574b88c >>>> + $ hg --config extensions.strip= strip -r . -q
Patch
diff --git a/mercurial/manifest.py b/mercurial/manifest.py --- a/mercurial/manifest.py +++ b/mercurial/manifest.py @@ -1233,7 +1233,7 @@ class manifestrevlog(revlog.revlog): def _addtree(self, m, transaction, link, m1, m2, readtree): # If the manifest is unchanged compared to one parent, # don't write a new revision - if m.unmodifiedsince(m1) or m.unmodifiedsince(m2): + if self._dir != '' and (m.unmodifiedsince(m1) or m.unmodifiedsince(m2)): return m.node() def writesubtree(subm, subp1, subp2): sublog = self.dirlog(subm.dir()) @@ -1241,13 +1241,17 @@ class manifestrevlog(revlog.revlog): readtree=readtree) m.writesubtrees(m1, m2, writesubtree) text = m.dirtext(self._usemanifestv2) - # Double-check whether contents are unchanged to one parent - if text == m1.dirtext(self._usemanifestv2): - n = m1.node() - elif text == m2.dirtext(self._usemanifestv2): - n = m2.node() - else: + n = None + if self._dir != '': + # Double-check whether contents are unchanged to one parent + if text == m1.dirtext(self._usemanifestv2): + n = m1.node() + elif text == m2.dirtext(self._usemanifestv2): + n = m2.node() + + if not n: n = self.addrevision(text, transaction, link, m1.node(), m2.node()) + # Save nodeid so parent manifest can calculate its nodeid m.setnode(n) return n diff --git a/tests/test-treemanifest.t b/tests/test-treemanifest.t --- a/tests/test-treemanifest.t +++ b/tests/test-treemanifest.t @@ -825,3 +825,13 @@ other branch added 1 changesets with 1 changes to 1 files (+1 heads) (run 'hg heads' to see heads, 'hg merge' to merge) +Committing a empty commit does not duplicate root treemanifest + $ echo z >> z + $ hg commit -Aqm 'pre-empty commit' + $ hg rm z + $ hg commit --amend -m 'empty commit' + saved backup bundle to $TESTTMP/grafted-dir-repo-clone/.hg/strip-backup/cb99d5717cea-de37743b-amend-backup.hg (glob) + $ hg log -r 'tip + tip^' -T '{manifest}\n' + 1:678d3574b88c + 1:678d3574b88c + $ hg --config extensions.strip= strip -r . -q