Patchwork [3,of,3] test-convert: demonstrate an unstable hash issue for bzr -> hg -> hg

Submitter Matt Harbison
Date July 6, 2018, 2:11 a.m.
Message ID <7ba60a5ffc69e2fc713f.1530843095@Envy>
Permalink /patch/32662/
State Accepted

Comments

Matt Harbison - July 6, 2018, 2:11 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1530817649 14400
#      Thu Jul 05 15:07:29 2018 -0400
# Node ID 7ba60a5ffc69e2fc713f0510a7b53e36b3dbdc10
# Parent  7871e05503668294a5cf35697fe51db0045d8086
test-convert: demonstrate an unstable hash issue for bzr -> hg -> hg

It looks like the manifest value changing is the only difference, but I'm not
sure why it's happening.  I've got a similar divergence in a production repo
that was also converted from bzr and has an octopus merge[1].  Unlike here, the
manifest values for the destination merge commits reflect the initial merge
only, instead of all four merges agreeing like this test.

    $ hg -R src_repo manifest -r 310 --debug | grep file  # octopus fixup merge
    2d8775bc2481bd28ac87038ecdf33e1dbddc80e9 644   file1
    6adb9353a55bb8be76e71382efc724ec3ccf7ed5 644   file2

    $ hg -R src_repo manifest -r 309 --debug | grep file  # first merge
    362e7cb5163153c4989daad1a834871ae849f205 644   file1
    2c65d947191938c3ea616b7ceb7648ff3843261f 644   file2

    $ hg -R dst_repo manifest -r 273 --debug | grep file  # octopus fixup merge
    362e7cb5163153c4989daad1a834871ae849f205 644   file1
    2c65d947191938c3ea616b7ceb7648ff3843261f 644   file2

    $ hg -R dst_repo manifest -r 272 --debug | grep file  # first merge
    362e7cb5163153c4989daad1a834871ae849f205 644   file1
    2c65d947191938c3ea616b7ceb7648ff3843261f 644   file2
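
For comparing listings like the ones above without eyeballing hashes, here is a minimal Python sketch. The helper names (`parse_manifest_debug`, `differing_paths`) are made up for illustration, and the line shape it expects is only inferred from the output quoted above.

```python
import re

# Assumed line shape, inferred from the output quoted above:
# <40-hex file node> <mode> <one flag column> <path>
MANIFEST_LINE = re.compile(r"^([0-9a-f]{40}) (\d{3}) (.) (.*)$")


def parse_manifest_debug(text):
    """Map each path to its (file node, mode) pair."""
    entries = {}
    for raw in text.splitlines():
        match = MANIFEST_LINE.match(raw.strip())
        if match:
            node, mode, _flag, path = match.groups()
            entries[path] = (node, mode)
    return entries


def differing_paths(left, right):
    """Sorted paths whose entries differ, or that exist on one side only."""
    return sorted(p for p in set(left) | set(right) if left.get(p) != right.get(p))


# Values copied from the src_repo r310 and dst_repo r273 listings above.
src_310 = parse_manifest_debug("""
    2d8775bc2481bd28ac87038ecdf33e1dbddc80e9 644   file1
    6adb9353a55bb8be76e71382efc724ec3ccf7ed5 644   file2
""")
dst_273 = parse_manifest_debug("""
    362e7cb5163153c4989daad1a834871ae849f205 644   file1
    2c65d947191938c3ea616b7ceb7648ff3843261f 644   file2
""")
print(differing_paths(src_310, dst_273))  # ['file1', 'file2']
```

Fed the r309 and r272 listings instead, the same call would return an empty list, which matches the observation that the destination only reflects the first merge.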

This divergence is especially annoying because, unlike changelog differences, I
haven't figured out a way to fix this in code.  The only way I found to work
around it is to convert up to the point of divergence, `hg bundle` the bad
revision in the source, apply it to the destination, add a line to the shamap,
and fire off the conversion again.
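
The "add a line to the shamap" step could be scripted roughly as follows. This is only a sketch: `add_shamap_entry` is a hypothetical helper, and it assumes the revision map is a plain text file with one `<source ID> <destination ID>` pair per line, with both IDs being 40-character hex nodes as in an hg -> hg conversion. After applying the bundle, the two IDs would be the same node.

```python
import re
from pathlib import Path

HEX40 = re.compile(r"^[0-9a-f]{40}$")


def add_shamap_entry(shamap_path, source_id, dest_id):
    """Append '<source_id> <dest_id>' unless it is already present.

    Returns True when a line was written, False when it already existed.
    """
    for node in (source_id, dest_id):
        if not HEX40.match(node):
            raise ValueError("expected a 40-character hex node, got %r" % node)

    path = Path(shamap_path)
    existing = path.read_text() if path.exists() else ""
    entry = "%s %s" % (source_id, dest_id)
    if entry in existing.splitlines():
        return False

    with path.open("a") as fh:
        # Keep the new entry on its own line even if the file lacks a final newline.
        if existing and not existing.endswith("\n"):
            fh.write("\n")
        fh.write(entry + "\n")
    return True
```

The path is passed in explicitly so that the sketch makes no claim about where a given conversion keeps its map.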

But I suspect that there's more to it than just the octopus merge because
I also have a commit in the same repo, done in Mercurial (well after the
conversion) that is exhibiting a similar issue (and it's not a merge commit).
I'm almost positive that it was created with 4.4 or later.  Any ideas?

[1] https://www.mercurial-scm.org/pipermail/mercurial/2018-June/050924.html
Yuya Nishihara - July 6, 2018, 3:51 p.m.
On Thu, 05 Jul 2018 22:11:35 -0400, Matt Harbison wrote:

> +  $ hg -R source-hg manifest --debug -r tip
> +  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
> +  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
> +  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
> +  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent
> +  $ hg -R source-hg manifest --debug -r 'tip^'
> +  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
> +  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
> +  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
> +  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent
> +
> +  $ hg -R hg2hg manifest --debug -r tip
> +  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
> +  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
> +  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
> +  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent
> +  $ hg -R hg2hg manifest --debug -r 'tip^'
> +  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
> +  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
> +  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
> +  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent

It appears that the source-hg repository contains an unchanged manifest entry,
such that ctx.files() comes back empty (and thus the p1 manifest is reused)
on hg2hg conversion.

https://www.mercurial-scm.org/repo/hg/file/4.6.2/mercurial/localrepo.py#l2031

   $ hg -R source-hg debugindex -m
    rev linkrev nodeid       p1           p2
      0       0 4c7fb614fc68 000000000000 000000000000
      1       1 f75efda72db5 4c7fb614fc68 000000000000
      2       2 bdc9800ba402 4c7fb614fc68 000000000000
      3       3 59e0bab8c58e bdc9800ba402 000000000000
      4       4 daa315d56a98 bdc9800ba402 f75efda72db5
      5       5 1109e42bdcbd daa315d56a98 59e0bab8c58e
   $ hg -R hg2hg debugindex -m
    rev linkrev nodeid       p1           p2
      0       0 4c7fb614fc68 000000000000 000000000000
      1       1 f75efda72db5 4c7fb614fc68 000000000000
      2       2 bdc9800ba402 4c7fb614fc68 000000000000
      3       3 59e0bab8c58e bdc9800ba402 000000000000
      4       4 daa315d56a98 bdc9800ba402 f75efda72db5

I don't know where things go wrong, but I suspect that a commit using memctx
could create such an entry somehow.
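
As background for the two debugindex listings: a revlog node is a SHA-1 over the two parent nodes (sorted) followed by the revision text, so a manifest whose file list is unchanged still gets a new node when it is stored with different parents. That would explain how source rev 5 (1109e42bdcbd) can list the same files as rev 4 (daa315d56a98) and still be a distinct entry. A minimal Python sketch, with made-up parents and text standing in for real manifest data:

```python
import hashlib

NULLID = b"\0" * 20


def revlog_node(text, p1=NULLID, p2=NULLID):
    """SHA-1 over the two sorted parent nodes followed by the revision text."""
    first, second = sorted((p1, p2))
    digest = hashlib.sha1(first)
    digest.update(second)
    digest.update(text)
    return digest.digest()


# Illustrative values only; these are not the nodes from the test repository.
manifest_text = b"file\0" + b"cd" * 20 + b"\n"
parent_a = hashlib.sha1(b"parent a").digest()
parent_b = hashlib.sha1(b"parent b").digest()
parent_c = hashlib.sha1(b"parent c").digest()

first_merge = revlog_node(manifest_text, parent_a, parent_b)
fixup_merge = revlog_node(manifest_text, first_merge, parent_c)

# Identical text, different parents: the nodes differ.
print(first_merge.hex() == fixup_merge.hex())  # False
```

Since the changeset records the manifest node, pointing the converted changeset at the p1 manifest instead of a freshly written one is enough to change the changeset hash.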
Matt Harbison - July 7, 2018, 4:19 a.m.
On Fri, 06 Jul 2018 11:51:07 -0400, Yuya Nishihara <yuya@tcha.org> wrote:

>    $ hg -R source-hg debugindex -m
>     rev linkrev nodeid       p1           p2
>       0       0 4c7fb614fc68 000000000000 000000000000
>       1       1 f75efda72db5 4c7fb614fc68 000000000000
>       2       2 bdc9800ba402 4c7fb614fc68 000000000000
>       3       3 59e0bab8c58e bdc9800ba402 000000000000
>       4       4 daa315d56a98 bdc9800ba402 f75efda72db5
>       5       5 1109e42bdcbd daa315d56a98 59e0bab8c58e
>    $ hg -R hg2hg debugindex -m
>     rev linkrev nodeid       p1           p2
>       0       0 4c7fb614fc68 000000000000 000000000000
>       1       1 f75efda72db5 4c7fb614fc68 000000000000
>       2       2 bdc9800ba402 4c7fb614fc68 000000000000
>       3       3 59e0bab8c58e bdc9800ba402 000000000000
>       4       4 daa315d56a98 bdc9800ba402 f75efda72db5
>
> I don't know where things go wrong, but I suspect that a commit using memctx
> could create such an entry somehow.

Thanks, I wasn't aware of this command.

It looks like in the test case, the dict of files passed to  
sink.putcommit() is empty for the fixup merge in hg2hg.  In the initial  
bzr -> hg processing, the octopus merge gets a list of files on input, and  
both loop iterations in putcommit() use that same list.  So memctx doesn't  
seem to be the issue here, and it seems like the source class decided  
nothing changed.  Eventually I saw that passing --full fixed it for the  
test.  I wonder if it would be feasible to detect when this happens in the  
source (but I'm not sure what all needs to be handled), and temporarily  
switch to --full mode.  (But that has to be coordinated with the sink.)  I  
can't think of a scenario where non-merges would have this problem.
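
To make the mechanism concrete, here is a toy Python model of the branch discussed above. It is not Mercurial's code: `recorded_manifest` and its arguments are hypothetical, and the only behavior it borrows is "an empty file list means the p1 manifest is reused; otherwise a manifest is written against both parents".

```python
import hashlib

NULLID = b"\0" * 20


def node(text, p1=NULLID, p2=NULLID):
    """Revlog-style node: SHA-1 of the sorted parents plus the text."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).digest()


def recorded_manifest(manifest_text, p1_manifest, p2_manifest, reported_files):
    """Toy commit step: pick the manifest node the new changeset points at."""
    if not reported_files:
        # Nothing reported as changed: reuse the first parent's manifest.
        return p1_manifest
    # Otherwise store a manifest revision with both manifest parents.
    return node(manifest_text, p1_manifest, p2_manifest)


# Made-up manifests for a first merge (p1) and the remaining branch (p2).
merged_text = b"manifest after the first merge"
p1_manifest = node(merged_text, node(b"branch1"), node(b"branch2"))
p2_manifest = node(b"parent branch")

# bzr -> hg: the fixup merge is handed a file list, so a manifest is written.
with_files = recorded_manifest(merged_text, p1_manifest, p2_manifest, ["file"])
# hg -> hg: the source reports no files, so the p1 manifest is reused.
without_files = recorded_manifest(merged_text, p1_manifest, p2_manifest, [])

print(without_files == p1_manifest)  # True
print(with_files == p1_manifest)     # False
```

In this model, reporting every file for the fixup merge (which is how I understand --full to behave) takes the second branch and produces the same node as the source.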

When I tried --full on the production repo giving me problems, it made a
10-minute convert of ~1700 commits take 2 hours to get through only a couple
hundred commits.  It also seemed to change other commit hashes that weren't
previously a problem, but then I realized I was using a hacked-up hg.  So
it's running over the weekend with -stable to see what happens.

Patch

diff --git a/tests/test-convert-bzr-merges.t b/tests/test-convert-bzr-merges.t
--- a/tests/test-convert-bzr-merges.t
+++ b/tests/test-convert-bzr-merges.t
@@ -69,4 +69,76 @@  test multiple merges at once
   644   file-branch2
   644   file-parent
 
+  $ hg convert source-hg hg2hg
+  initializing destination hg2hg repository
+  scanning source...
+  sorting...
+  converting...
+  5 Initial add
+  4 Added branch1 file
+  3 Added parent file
+  2 Added brach2 file
+  1 Merged branches
+  0 (octopus merge fixup)
+  $ hg -R hg2hg out source-hg -T compact
+  comparing with source-hg
+  searching for changes
+  5[tip]:4,3   6bd55e826939   2009-10-10 08:00 +0100   foo
+    (octopus merge fixup)
+  
+XXX: The manifest lines should probably agree, to avoid changing the hash when
+converting hg -> hg
+
+  $ hg -R source-hg log --debug -r tip
+  changeset:   5:b209510f11b2c987f920749cd8e352aa4b3230f2
+  branch:      source
+  tag:         tip
+  phase:       draft
+  parent:      4:1dc38c377bb35eeea4fa955056fbe4440d54a743
+  parent:      3:4aaba1bfb426b8941bbf63f9dd52301152695164
+  manifest:    5:1109e42bdcbd1f51baa69bc91079011d77057dbb
+  user:        Foo Bar <foo.bar@example.com>
+  date:        Sat Oct 10 08:00:04 2009 +0100
+  extra:       branch=source
+  description:
+  (octopus merge fixup)
+  
+  
+  $ hg -R hg2hg log --debug -r tip
+  changeset:   5:6bd55e8269392769783345686faf7ff7b3b0215d
+  branch:      source
+  tag:         tip
+  phase:       draft
+  parent:      4:1dc38c377bb35eeea4fa955056fbe4440d54a743
+  parent:      3:4aaba1bfb426b8941bbf63f9dd52301152695164
+  manifest:    4:daa315d56a98ba20811fdd0d9d575861f65cfa8c
+  user:        Foo Bar <foo.bar@example.com>
+  date:        Sat Oct 10 08:00:04 2009 +0100
+  extra:       branch=source
+  description:
+  (octopus merge fixup)
+  
+  
+  $ hg -R source-hg manifest --debug -r tip
+  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
+  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
+  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
+  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent
+  $ hg -R source-hg manifest --debug -r 'tip^'
+  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
+  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
+  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
+  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent
+
+  $ hg -R hg2hg manifest --debug -r tip
+  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
+  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
+  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
+  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent
+  $ hg -R hg2hg manifest --debug -r 'tip^'
+  cdf31ed9242b209cd94697112160e2c5b37a667d 644   file
+  5108144f585149b29779d7c7e51d61dd22303ffe 644   file-branch1
+  80753c4a9ac3806858405b96b24a907b309e3616 644   file-branch2
+  7108421418404a937c684d2479a34a24d2ce4757 644   file-parent
+
   $ cd ..