Patchwork [STABLE] convert: don't drop missing or corrupt tag entries

login
register
mail settings
Submitter Matt Harbison
Date Aug. 15, 2018, 2:09 a.m.
Message ID <op.znrmxdlv9lwrgf@envy>
Download mbox | patch
Permalink /patch/33741/
State New
Headers show

Comments

Matt Harbison - Aug. 15, 2018, 2:09 a.m.
On Tue, 14 Aug 2018 19:21:05 -0400, Yuya Nishihara <yuya@tcha.org> wrote:

> On Tue, 14 Aug 2018 17:18:08 -0400, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1534269635 14400
>> #      Tue Aug 14 14:00:35 2018 -0400
>> # Branch stable
>> # Node ID 48c3e568caa07eab1ca4e0514451335f4b074dba
>> # Parent  5caee9f923ae19baa74386fb0ac8ff50ab8e2669
>> convert: don't drop missing or corrupt tag entries
>>
>> Cleaning up the tags file could be a useful feature in some cases, so  
>> maybe
>> there should be a switch for this.  However, the default hg -> hg  
>> convert tries
>> to maintain identical hashes (thus convert.hg.saverev is off by  
>> default, but is
>> on by default for other source types).  It looks like  
>> _rewritesubstate() has a
>> `continue` in it, and therefore a similar problem.
>>
>> I ran into this conversion divergence when a coworker "merged" two  
>> repositories
>> by copy/pasting all of the files from the source repo and massaging the  
>> code,
>> and forgetting to revert the .hg* files.  That silently emptied the  
>> .hgtags file
>> after the conversion.  (This isn't the manifest node bug Yuya has been  
>> helping
>> with- this occurred well after the bzr -> hg conversion and wasn't a  
>> merge
>> commit, which made it extra puzzling.  That bug is still an issue.)
>>
>> diff --git a/hgext/convert/hg.py b/hgext/convert/hg.py
>> --- a/hgext/convert/hg.py
>> +++ b/hgext/convert/hg.py
>> @@ -143,12 +143,17 @@ class mercurial_sink(common.converter_si
>>          for line in data.splitlines():
>>              s = line.split(' ', 1)
>>              if len(s) != 2:
>> +                self.ui.warn(_('invalid tag entry: "%s"\n') % line)
>> +                fp.write('%s\n' % line)  # Bogus, but keep for hash  
>> stability
>>                  continue
>>              revid = revmap.get(source.lookuprev(s[0]))
>>              if not revid:
>>                  if s[0] == nodemod.nullhex:
>>                      revid = s[0]
>>                  else:
>> +                    # missing, but keep for hash stability
>> +                    self.ui.warn(_('missing tag entry: "%s"\n') % line)
>> +                    fp.write('%s\n' % line)
>
> Doesn't it leave tags on skipped revisions?

Prior to this patch, not exactly.  I verified this is still a problem:

https://bz.mercurial-scm.org/show_bug.cgi?id=5737#c2

Prior to trying that test, I tried this change, and it seems to drop the  
tag entry entirely without the patch proposed in this thread.  If this is  
changed to `hg tag --rev 2`, it keeps the tag, but moves it to the  
parent(!) (So I guess dropping it with --rev 0 is because there is no  
p1(0)?)  The proposed patch doesn't interfere with that shift.


(Remaining output changes omitted for brevity.)


> I understand this sort of feature is needed, but there would be no point  
> to keep invalid/unknown tags if we aren't trying to preserve hashes.

Mostly agree.  But I think preserving the hashes by default is desirable,  
and what
at least part of the code is trying to do.  For 5737, it should probably
keep the original hash unchanged, and warn (which the proposed patch  
does).  I'd be fine with a config knob that defaults to preserve.

I guess potential concerns with silently dropping entries is that empty  
commits are created, and it isn't obvious or easy to tell that this  
happened or where.  I think it's slightly easier to track down if there's  
a tag entry that doesn't show in `hg tags`, rather than an old tag that's  
been forgotten about and suddenly gone.
Yuya Nishihara - Aug. 15, 2018, 4:32 a.m.
On Tue, 14 Aug 2018 22:09:03 -0400, Matt Harbison wrote:
> On Tue, 14 Aug 2018 19:21:05 -0400, Yuya Nishihara <yuya@tcha.org> wrote:
> > Doesn't it leave tags on skipped revisions?
> 
> Prior to this patch, not exactly.  I verified this is still a problem:

Ok, I agree that this patch doesn't make things worse. Queued for stable,
thanks.

> https://bz.mercurial-scm.org/show_bug.cgi?id=5737#c2
> 
> Prior to trying that test, I tried this change, and it seems to drop the  
> tag entry entirely without the patch proposed in this thread.

That seems correct (except it results in an empty commit.)

> If this is
> changed to `hg tag --rev 2`, it keeps the tag, but moves it to the  
> parent(!) (So I guess dropping it with --rev 0 is because there is no  
> p1(0)?)

Perhaps. A skipped revision is mapped to its parent at converter.copy().

> > I understand this sort of feature is needed, but there would be no point  
> > to keep invalid/unknown tags if we aren't trying to preserve hashes.
> 
> Mostly agree.  But I think preserving the hashes by default is desirable,  
> and what
> at least part of the code is trying to do.  For 5737, it should probably
> keep the original hash unchanged, and warn (which the proposed patch  
> does).  I'd be fine with a config knob that defaults to preserve.

Suppose the purpose of hg->hg conversion is to rewrite a part of history,
I don't think hash preservation is the most important thing. That said,
it should be fine to keep originally invalid tags as invalid.

Patch

diff --git a/tests/test-convert-hg-sink.t b/tests/test-convert-hg-sink.t
--- a/tests/test-convert-hg-sink.t
+++ b/tests/test-convert-hg-sink.t
@@ -206,6 +206,8 @@  Convert excluding rev 0 and dir/ (and th
    > exclude dir
    > EOF

+  $ hg -R source tag -r 0 skipped_tag
+
    $ hg convert --filemap filemap source dest --config convert.hg.revs=1::
    initializing destination dest repository
    scanning source...
@@ -231,6 +233,7 @@  Verify that conversion skipped rev 2:
    |/
    o  0 a4a1dae0fe35 (draft) "1: add a and dir/b" files: 0 a

+  $ hg --cwd dest cat .hgtags -r tip

  Verify mapping correct in both directions: