Patchwork [1,of,2,RFC] export: list all key/values from extra in the patch header

login
register
mail settings
Submitter Matt Harbison
Date Dec. 21, 2015, 3:52 a.m.
Message ID <eff20351cc9b29232493.1450669946@Envy>
Download mbox | patch
Permalink /patch/12193/
State Deferred
Headers show

Comments

Matt Harbison - Dec. 21, 2015, 3:52 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1450661679 18000
#      Sun Dec 20 20:34:39 2015 -0500
# Node ID eff20351cc9b29232493885887e82576f772d1b6
# Parent  5df74b2f296df7f44a08106df4f9dd97a5aa726a
export: list all key/values from extra in the patch header

This uses the existing mechanism for extensions to export data in the header,
but deviates from the design of handling specific keys.  That seems reasonable
for core behavior, since:

  1) I don't see any way to locate all of the extras used in core and hgext/
  2) Even if there was, I'm not sure that there's a way to enforce handling them
     here
  3) Even if there was, exporting via a new hg and importing with an older one
     would result in silent data loss.

If each extension handled the import/export processing for its own keys, that
would mean the extension would need to be loaded for both import and export.
That may not be reasonable for some extensions.  (e.g. I only load convert when
actively converting something.)  This doesn't preclude extensions from adding
headers.  For example, it may be useful for transplant to emit a human readable
header for 'transplant_source', since the raw value is an escaped binary.  No
processing would be necessary on import, since the raw key/value is still
present.

The generic handling necessitated a common prefix.  This also has the nice side
effect of avoiding conflicts between headers and keys in extra that have the
same name.  (e.g. if somebody added 'Parent' as key in extras, the parsing isn't
thrown off).

Side note: Misunderstanding what str.join() would do with an empty list, I
initially returned None if header == '\n# '.  That had the effect of adding an
extra "# \n" line, which in turn affected hashes in several tests, such as
test-alias.t.  It would be simple enough to change the caller of these
registered hooks to ignore '' instead of writing out the line, but this implies
an issue with the existing import code that I wasn't able to figure out.  When I
logged the alias test with --debug, the only difference was the manifest value-
the extras were the same.  I was however, unable to induce this behavior by
adding an extra "# \n" to the manually created patch at the bottom of
test-import.t.
Yuya Nishihara - Dec. 25, 2015, 1:16 p.m.
On Sun, 20 Dec 2015 22:52:26 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1450661679 18000
> #      Sun Dec 20 20:34:39 2015 -0500
> # Node ID eff20351cc9b29232493885887e82576f772d1b6
> # Parent  5df74b2f296df7f44a08106df4f9dd97a5aa726a
> export: list all key/values from extra in the patch header
> 
> This uses the existing mechanism for extensions to export data in the header,
> but deviates from the design of handling specific keys.  That seems reasonable
> for core behavior, since:
> 
>   1) I don't see any way to locate all of the extras used in core and hgext/
>   2) Even if there was, I'm not sure that there's a way to enforce handling them
>      here
>   3) Even if there was, exporting via a new hg and importing with an older one
>      would result in silent data loss.

I'm not excited about this. I think most of extra data (e.g. amend_source)
are noisy rather than informative.

> If each extension handled the import/export processing for its own keys, that
> would mean the extension would need to be loaded for both import and export.
> That may not be reasonable for some extensions.  (e.g. I only load convert when
> actively converting something.)  This doesn't preclude extensions from adding
> headers.  For example, it may be useful for transplant to emit a human readable
> header for 'transplant_source', since the raw value is an escaped binary.  No
> processing would be necessary on import, since the raw key/value is still
> present.

Really? IIRC, 'transplant_source' is an unescaped binary node id, which may
contain random bytes including NUL, CR and LF.
Matt Harbison - Dec. 27, 2015, 8:29 p.m.
On Fri, 25 Dec 2015 08:16:01 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 20 Dec 2015 22:52:26 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1450661679 18000
>> #      Sun Dec 20 20:34:39 2015 -0500
>> # Node ID eff20351cc9b29232493885887e82576f772d1b6
>> # Parent  5df74b2f296df7f44a08106df4f9dd97a5aa726a
>> export: list all key/values from extra in the patch header
>>
>> This uses the existing mechanism for extensions to export data in the  
>> header,
>> but deviates from the design of handling specific keys.  That seems  
>> reasonable
>> for core behavior, since:
>>
>>   1) I don't see any way to locate all of the extras used in core and  
>> hgext/
>>   2) Even if there was, I'm not sure that there's a way to enforce  
>> handling them
>>      here
>>   3) Even if there was, exporting via a new hg and importing with an  
>> older one
>>      would result in silent data loss.
>
> I'm not excited about this. I think most of extra data (e.g.  
> amend_source)
> are noisy rather than informative.

I agree preserving some may not be generally useful if exporting from one  
repo and importing elsewhere (e.g. amend_source).  It's hard to tell what  
a user will be interested in though, and picking and choosing is  
problematic as mentioned above.  I can see how unconditional import of  
everything would be a pain for people running a patch based submission  
process.

I went looking for a bug report I read this summer where someone  
complained about import trouble, and it was diagnosed as not preserving  
all of the extras.  I ended up finding this bug where mpm pondered making  
"more/all" extra available, as well as again in the more recent bug  
referenced in the second patch.  Maybe he can clarify.

   https://bz.mercurial-scm.org/show_bug.cgi?id=2742

(I couldn't find the report I'm thinking of, either in bz or the ML, but  
maybe it was a rehash of something like this?   
http://markmail.org/message/pnfg7exvvtc7wzzx)

>> If each extension handled the import/export processing for its own  
>> keys, that
>> would mean the extension would need to be loaded for both import and  
>> export.
>> That may not be reasonable for some extensions.  (e.g. I only load  
>> convert when
>> actively converting something.)  This doesn't preclude extensions from  
>> adding
>> headers.  For example, it may be useful for transplant to emit a human  
>> readable
>> header for 'transplant_source', since the raw value is an escaped  
>> binary.  No
>> processing would be necessary on import, since the raw key/value is  
>> still
>> present.
>
> Really? IIRC, 'transplant_source' is an unescaped binary node id, which  
> may
> contain random bytes including NUL, CR and LF.

Sorry, I meant this as two separate thoughts:

1) Needing extensions loaded at the time of export or import is bad.

2) Extensions could still export machine write-only, but human readable  
lines, and it would be OK.  Similar to how there are two lines for date-  
the old machine readable line is the only thing parsed on import.  If the  
transplant extension wrote a line with the hex node to be human friendly,  
it wouldn't need to deal with that line on import because core already  
handled the raw escaped value (see the tests).  Therefore, no data loss on  
import if transplant isn't loaded.

Basically, I was trying to justify that even though extras are generically  
exported from core, there may still be a reason for the existing framework  
to provide extensions a way to hook into the process.  (I assume that the  
"Available At" line that's popped up recently is courtesy of an extension,  
but dealing with things outside of ctx.extra() like that is probably rare.)
Yuya Nishihara - Dec. 28, 2015, 3:21 p.m.
On Sun, 27 Dec 2015 15:29:57 -0500, Matt Harbison wrote:
> On Fri, 25 Dec 2015 08:16:01 -0500, Yuya Nishihara <yuya@tcha.org> wrote:
> >> This uses the existing mechanism for extensions to export data in the  
> >> header,
> >> but deviates from the design of handling specific keys.  That seems  
> >> reasonable
> >> for core behavior, since:
> >>
> >>   1) I don't see any way to locate all of the extras used in core and  
> >> hgext/
> >>   2) Even if there was, I'm not sure that there's a way to enforce  
> >> handling them
> >>      here
> >>   3) Even if there was, exporting via a new hg and importing with an  
> >> older one
> >>      would result in silent data loss.
> >
> > I'm not excited about this. I think most of extra data (e.g.  
> > amend_source)
> > are noisy rather than informative.
> 
> I agree preserving some may not be generally useful if exporting from one  
> repo and importing elsewhere (e.g. amend_source).  It's hard to tell what  
> a user will be interested in though, and picking and choosing is  
> problematic as mentioned above.  I can see how unconditional import of  
> everything would be a pain for people running a patch based submission  
> process.
> 
> I went looking for a bug report I read this summer where someone  
> complained about import trouble, and it was diagnosed as not preserving  
> all of the extras.  I ended up finding this bug where mpm pondered making  
> "more/all" extra available, as well as again in the more recent bug  
> referenced in the second patch.  Maybe he can clarify.
> 
>    https://bz.mercurial-scm.org/show_bug.cgi?id=2742
> 
> (I couldn't find the report I'm thinking of, either in bz or the ML, but  
> maybe it was a rehash of something like this?   
> http://markmail.org/message/pnfg7exvvtc7wzzx)

Good point. Importing all extras would sometimes useful if he want to
reproduce exactly the same commit.

> >> If each extension handled the import/export processing for its own  
> >> keys, that
> >> would mean the extension would need to be loaded for both import and  
> >> export.
> >> That may not be reasonable for some extensions.  (e.g. I only load  
> >> convert when
> >> actively converting something.)  This doesn't preclude extensions from  
> >> adding
> >> headers.  For example, it may be useful for transplant to emit a human  
> >> readable
> >> header for 'transplant_source', since the raw value is an escaped  
> >> binary.  No
> >> processing would be necessary on import, since the raw key/value is  
> >> still
> >> present.
> >
> > Really? IIRC, 'transplant_source' is an unescaped binary node id, which  
> > may
> > contain random bytes including NUL, CR and LF.
> 
> Sorry, I meant this as two separate thoughts:
> 
> 1) Needing extensions loaded at the time of export or import is bad.
> 
> 2) Extensions could still export machine write-only, but human readable  
> lines, and it would be OK.  Similar to how there are two lines for date-  
> the old machine readable line is the only thing parsed on import.  If the  
> transplant extension wrote a line with the hex node to be human friendly,  
> it wouldn't need to deal with that line on import because core already  
> handled the raw escaped value (see the tests).  Therefore, no data loss on  
> import if transplant isn't loaded.

Maybe I don't follow, but the core can't handle raw "transplant_source"
because it may contain LF. It isn't escaped, the test runner does escape.
So the core have to at least escape or ignore "transplant_source".

> Basically, I was trying to justify that even though extras are generically  
> exported from core, there may still be a reason for the existing framework  
> to provide extensions a way to hook into the process.  (I assume that the  
> "Available At" line that's popped up recently is courtesy of an extension,  
> but dealing with things outside of ctx.extra() like that is probably rare.)
Matt Harbison - Jan. 6, 2016, 4:48 a.m.
On Mon, 28 Dec 2015 10:21:58 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 27 Dec 2015 15:29:57 -0500, Matt Harbison wrote:
>> On Fri, 25 Dec 2015 08:16:01 -0500, Yuya Nishihara <yuya@tcha.org>  
>> wrote:
>> >> This uses the existing mechanism for extensions to export data in the
>> >> header,
>> >> but deviates from the design of handling specific keys.  That seems
>> >> reasonable
>> >> for core behavior, since:
>> >>
>> >>   1) I don't see any way to locate all of the extras used in core and
>> >> hgext/
>> >>   2) Even if there was, I'm not sure that there's a way to enforce
>> >> handling them
>> >>      here
>> >>   3) Even if there was, exporting via a new hg and importing with an
>> >> older one
>> >>      would result in silent data loss.
>> >
>> > I'm not excited about this. I think most of extra data (e.g.
>> > amend_source)
>> > are noisy rather than informative.
>>
>> I agree preserving some may not be generally useful if exporting from  
>> one
>> repo and importing elsewhere (e.g. amend_source).  It's hard to tell  
>> what
>> a user will be interested in though, and picking and choosing is
>> problematic as mentioned above.  I can see how unconditional import of
>> everything would be a pain for people running a patch based submission
>> process.
>>
>> I went looking for a bug report I read this summer where someone
>> complained about import trouble, and it was diagnosed as not preserving
>> all of the extras.  I ended up finding this bug where mpm pondered  
>> making
>> "more/all" extra available, as well as again in the more recent bug
>> referenced in the second patch.  Maybe he can clarify.
>>
>>    https://bz.mercurial-scm.org/show_bug.cgi?id=2742
>>
>> (I couldn't find the report I'm thinking of, either in bz or the ML, but
>> maybe it was a rehash of something like this?
>> http://markmail.org/message/pnfg7exvvtc7wzzx)
>
> Good point. Importing all extras would sometimes useful if he want to
> reproduce exactly the same commit.
>
>> >> If each extension handled the import/export processing for its own
>> >> keys, that
>> >> would mean the extension would need to be loaded for both import and
>> >> export.
>> >> That may not be reasonable for some extensions.  (e.g. I only load
>> >> convert when
>> >> actively converting something.)  This doesn't preclude extensions  
>> from
>> >> adding
>> >> headers.  For example, it may be useful for transplant to emit a  
>> human
>> >> readable
>> >> header for 'transplant_source', since the raw value is an escaped
>> >> binary.  No
>> >> processing would be necessary on import, since the raw key/value is
>> >> still
>> >> present.
>> >
>> > Really? IIRC, 'transplant_source' is an unescaped binary node id,  
>> which
>> > may
>> > contain random bytes including NUL, CR and LF.
>>
>> Sorry, I meant this as two separate thoughts:
>>
>> 1) Needing extensions loaded at the time of export or import is bad.
>>
>> 2) Extensions could still export machine write-only, but human readable
>> lines, and it would be OK.  Similar to how there are two lines for date-
>> the old machine readable line is the only thing parsed on import.  If  
>> the
>> transplant extension wrote a line with the hex node to be human  
>> friendly,
>> it wouldn't need to deal with that line on import because core already
>> handled the raw escaped value (see the tests).  Therefore, no data loss  
>> on
>> import if transplant isn't loaded.
>
> Maybe I don't follow, but the core can't handle raw "transplant_source"
> because it may contain LF. It isn't escaped, the test runner does escape.
> So the core have to at least escape or ignore "transplant_source".

Oh, I see what you mean.  I didn't realize the test runner was what was  
doing that, since I've never transplanted outside of the tests.  (I'm a  
little surprised nobody has complained about mangled output with -T  
{extras}.)

So I guess if core has to special case this one attribute, it might as  
well go a few more lines and node.hex()/node.bin() it.  Then this whole  
paragraph goes away.  I guess it also points out that it would be useful  
to test for and skip attributes with non printable characters in case some  
3rd party extension decided to stuff binary data in extra.  Then it would  
be on them to handle it.

>> Basically, I was trying to justify that even though extras are  
>> generically
>> exported from core, there may still be a reason for the existing  
>> framework
>> to provide extensions a way to hook into the process.  (I assume that  
>> the
>> "Available At" line that's popped up recently is courtesy of an  
>> extension,
>> but dealing with things outside of ctx.extra() like that is probably  
>> rare.)
Matt Mackall - Jan. 16, 2016, 6:09 p.m.
On Sun, 2015-12-20 at 22:52 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1450661679 18000
> #      Sun Dec 20 20:34:39 2015 -0500
> # Node ID eff20351cc9b29232493885887e82576f772d1b6
> # Parent  5df74b2f296df7f44a08106df4f9dd97a5aa726a
> export: list all key/values from extra in the patch header

One problem I foresee here is that there is going to be a set of extras that
need rewriting to be correct. Rebase/graft headers come to mind, as do the
signature headers we've been recently discussing. And we may not be able to
identify them or have enough information to rewrite them.

But I agree that it should be possible. So we either want to put this behind a
flag/config or provide some sort of opt-out method. For instance, don't export
keys that start with "_".

-- 
Mathematics is the supreme nostalgia of our time.

Patch

diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -1016,13 +1016,24 @@ 
     finally:
         os.unlink(tmpname)
 
+def _exportextras(seqno, ctx):
+    header = '\n# '.join(['HgExtra.%s %s' % (k, v)
+                          for k, v in sorted(ctx.extra().iteritems())
+                          if k != 'branch'])
+    if header == '':
+        return None
+
+    return header
+
 # facility to let extensions include additional data in an exported patch
 # list of identifiers to be executed in order
-extraexport = []
+extraexport = ['hgextra']
 # mapping from identifier to actual export function
 # function as to return a string to be added to the header or None
 # it is given two arguments (sequencenumber, changectx)
-extraexportmap = {}
+extraexportmap = {
+    'hgextra': _exportextras
+}
 
 def export(repo, revs, template='hg-%h.patch', fp=None, switch_parent=False,
            opts=None, match=None):
diff --git a/tests/test-graft.t b/tests/test-graft.t
--- a/tests/test-graft.t
+++ b/tests/test-graft.t
@@ -101,6 +101,7 @@ 
   #      Thu Jan 01 00:00:00 1970 +0000
   # Node ID ef0ef43d49e79e81ddafdc7997401ba0041efc82
   # Parent  68795b066622ca79a25816a662041d8f78f3cd9e
+  # HgExtra.source 5c095ad7e90f871700f02dd1fa5012cb4498a2d4
   2
   
   diff --git a/a b/b
@@ -343,7 +344,7 @@ 
   $ hg extdiff --config extensions.extdiff= --patch -r 2 -r 13
   --- */hg-5c095ad7e90f.patch	* +0000 (glob)
   +++ */hg-7a4785234d87.patch	* +0000 (glob)
-  @@ -1,18 +1,18 @@
+  @@ -1,18 +1,20 @@
    # HG changeset patch
   -# User test
   +# User foo
@@ -353,6 +354,8 @@ 
   -# Parent  5d205f8b35b66bc36375c9534ffd3237730e8f04
   +# Node ID 7a4785234d87ec1aa420ed6b11afe40fa73e12a9
   +# Parent  b592ea63bb0c19a6c5c44685ee29a2284f9f1b8f
+  +# HgExtra.intermediate-source ef0ef43d49e79e81ddafdc7997401ba0041efc82
+  +# HgExtra.source 5c095ad7e90f871700f02dd1fa5012cb4498a2d4
    2
    
   -diff -r 5d205f8b35b6 -r 5c095ad7e90f a
@@ -374,7 +377,7 @@ 
   $ hg extdiff --config extensions.extdiff= --patch -r 2 -r 13 -X .
   --- */hg-5c095ad7e90f.patch	* +0000 (glob)
   +++ */hg-7a4785234d87.patch	* +0000 (glob)
-  @@ -1,8 +1,8 @@
+  @@ -1,8 +1,10 @@
    # HG changeset patch
   -# User test
   +# User foo
@@ -384,6 +387,8 @@ 
   -# Parent  5d205f8b35b66bc36375c9534ffd3237730e8f04
   +# Node ID 7a4785234d87ec1aa420ed6b11afe40fa73e12a9
   +# Parent  b592ea63bb0c19a6c5c44685ee29a2284f9f1b8f
+  +# HgExtra.intermediate-source ef0ef43d49e79e81ddafdc7997401ba0041efc82
+  +# HgExtra.source 5c095ad7e90f871700f02dd1fa5012cb4498a2d4
    2
    
   [1]
@@ -449,6 +454,7 @@ 
   #      Thu Jan 01 00:00:00 1970 +0000
   # Node ID f67661df0c4804d301f064f332b57e7d5ddaf2be
   # Parent  aaa4406d4f0ae9befd6e58c82ec63706460cbca6
+  # HgExtra.source 5d205f8b35b66bc36375c9534ffd3237730e8f04
   1
   
   diff --git a/a b/a
@@ -478,6 +484,7 @@ 
   #      Thu Jan 01 00:00:00 1970 +0000
   # Node ID 9627f653b421c61fc1ea4c4e366745070fa3d2bc
   # Parent  ee295f490a40b97f3d18dd4c4f1c8936c233b612
+  # HgExtra.source 5c095ad7e90f871700f02dd1fa5012cb4498a2d4
   2
   
   diff --git a/a b/b
diff --git a/tests/test-histedit-commute.t b/tests/test-histedit-commute.t
--- a/tests/test-histedit-commute.t
+++ b/tests/test-histedit-commute.t
@@ -416,6 +416,7 @@ 
   #      Thu Jan 01 00:00:00 1970 +0000
   # Node ID fffadc26f8f85623ce60b028a3f1ccc3730f8530
   # Parent  0000000000000000000000000000000000000000
+  # HgExtra.histedit_source b0f4233702ca3d2dc495941dc98d7136d67a4cf6,5e8704a8f2d2e850b45d790e34c225e3893424a1
   pick b0f4233702ca 0 initial commit
   fold 5e8704a8f2d2 1 moved and changed
   pick 40e7299e8fa7 2 renamed
@@ -434,6 +435,7 @@ 
   #      Thu Jan 01 00:00:00 1970 +0000
   # Node ID 9b730d82b00af8a2766facebfa47cc124405a118
   # Parent  fffadc26f8f85623ce60b028a3f1ccc3730f8530
+  # HgExtra.histedit_source 40e7299e8fa76d5f470efb6eed04bc91b0e55575
   renamed
   
   diff --git a/another-dir/initial-file b/another-dir/renamed-file
diff --git a/tests/test-histedit-fold.t b/tests/test-histedit-fold.t
--- a/tests/test-histedit-fold.t
+++ b/tests/test-histedit-fold.t
@@ -398,6 +398,7 @@ 
   #      Thu Jan 01 00:00:00 1970 +0000
   # Node ID 10c647b2cdd54db0603ecb99b2ff5ce66d5a5323
   # Parent  0189ba417d34df9dda55f88b637dcae9917b5964
+  # HgExtra.histedit_source 617f94f13c0faff2ff307641901637b91cbd7c7b,251d831eeec50a48af84c8eba1be20b9fb63de9e
   +4
   ***
   +5.2
diff --git a/tests/test-rebase-mq.t b/tests/test-rebase-mq.t
--- a/tests/test-rebase-mq.t
+++ b/tests/test-rebase-mq.t
@@ -109,6 +109,7 @@ 
   #      Thu Jan 01 00:00:01 1970 +0000
   # Node ID ebe9914c0d1c3f60096e952fa4dbb3d377dea3ab
   # Parent  bac9ed9960d8992bcad75864a879fa76cadaf1b0
+  # HgExtra.rebase_source 3504f44bffc02ec095be7b0b2ff403da909bece9
   P0
   
   diff -r bac9ed9960d8 -r ebe9914c0d1c f
@@ -132,6 +133,7 @@ 
   #      Thu Jan 01 00:00:02 1970 +0000
   # Node ID 462012cf340c97d44d62377c985a423f6bb82f07
   # Parent  ebe9914c0d1c3f60096e952fa4dbb3d377dea3ab
+  # HgExtra.rebase_source 929394423cd3ca95f8dee2dd72ea518a75a02af0
   P1
   
   diff -r ebe9914c0d1c -r 462012cf340c f
@@ -224,6 +226,7 @@ 
   #      Thu Jan 01 00:00:03 1970 +0000
   # Node ID 12d9f6a3bbe560dee50c7c454d434add7fb8e837
   # Parent  bac9ed9960d8992bcad75864a879fa76cadaf1b0
+  # HgExtra.rebase_source 0c587ffcb4803566520e16bc213bed72d8445875
   P0 (git)
   
   diff --git a/p b/p
@@ -240,6 +243,7 @@ 
   #      Thu Jan 01 00:00:04 1970 +0000
   # Node ID c77a2661c64c60d82f63c4f7aefd95b3a948a557
   # Parent  12d9f6a3bbe560dee50c7c454d434add7fb8e837
+  # HgExtra.rebase_source c7f18665e4bcbfbfc6ed843eb1f6d28fa6b4a252
   P1
   
   diff -r 12d9f6a3bbe5 -r c77a2661c64c p