Patchwork [03,of,15] commitctx: extract copy information encoding into extra into commit.py

login
register
mail settings
Submitter Pierre-Yves David
Date July 29, 2020, 4:57 p.m.
Message ID <3e4e9837d03bd1345438.1596041853@nodosa.octobus.net>
Download mbox | patch
Permalink /patch/46920/
State Accepted
Headers show

Comments

Pierre-Yves David - July 29, 2020, 4:57 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1595682805 -7200
#      Sat Jul 25 15:13:25 2020 +0200
# Node ID 3e4e9837d03bd1345438e69ae154b72daa5cee6b
# Parent  a926f7ac92a4024511389ea5c5d9a8e8eae065d4
# EXP-Topic files-change
# Available At https://foss.heptapod.net/octobus/mercurial-devel/
#              hg pull https://foss.heptapod.net/octobus/mercurial-devel/ -r 3e4e9837d03b
commitctx: extract copy information encoding into extra into commit.py

The encoding of copy information into extra has multiple subcases and become
quite complicated (eg: empty list can be explicitly or implicitly stored for
example). In addition, it is niche experimental feature since as it affect the
hash, it is only suitable for user who don't mercurial for storage server side
(ie: Google).

Having this complexity part of the changelog will get in the way of further
cleanup. We could have to either move more of that logic into the changelog or
to move or extract more of the logic at the higher level. We take the second
approach and start gather logic in dedicated function in commit.py.
Augie Fackler - Sept. 7, 2020, 7:41 p.m.
> On Jul 29, 2020, at 12:57 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1595682805 -7200
> #      Sat Jul 25 15:13:25 2020 +0200
> # Node ID 3e4e9837d03bd1345438e69ae154b72daa5cee6b
> # Parent  a926f7ac92a4024511389ea5c5d9a8e8eae065d4
> # EXP-Topic files-change
> # Available At https://foss.heptapod.net/octobus/mercurial-devel/
> #              hg pull https://foss.heptapod.net/octobus/mercurial-devel/ -r 3e4e9837d03b
> commitctx: extract copy information encoding into extra into commit.py

[snip]

> --- a/mercurial/commit.py
> +++ b/mercurial/commit.py
> @@ -69,6 +69,20 @@ def commitctx(repo, ctx, error=False, or
> 
>         extra = ctx.extra().copy()
> 
> +        files = sorted(files)
> +        if extra is not None:
> +            for name in (
> +                b'p1copies',
> +                b'p2copies',
> +                b'filesadded',
> +                b'filesremoved',
> +            ):
> +                extra.pop(name, None)
> +        if repo.changelog._copiesstorage == b'extra':

This kind of abstraction-breaking needs to stop being a crutch in our codebase: this broke the git extension because it bypasses the revlog interface, and probably also broke the sql storage extension. I’ll see about mailing a workaround.

(I’m not very upset, but wanted to bring this up since it’s not the first time I’ve observed this kind of architectural backsliding in low-level core code in 2020.)

> +            extra = _extra_with_copies(
> +                repo, extra, files, p1copies, p2copies, filesadded, filesremoved
> +            )
> +
Pierre-Yves David - Sept. 23, 2020, 7:03 a.m.
On 9/7/20 9:41 PM, Augie Fackler wrote:
> 
> 
>> On Jul 29, 2020, at 12:57 PM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
>>
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@octobus.net>
>> # Date 1595682805 -7200
>> #      Sat Jul 25 15:13:25 2020 +0200
>> # Node ID 3e4e9837d03bd1345438e69ae154b72daa5cee6b
>> # Parent  a926f7ac92a4024511389ea5c5d9a8e8eae065d4
>> # EXP-Topic files-change
>> # Available At https://foss.heptapod.net/octobus/mercurial-devel/
>> #              hg pull https://foss.heptapod.net/octobus/mercurial-devel/ -r 3e4e9837d03b
>> commitctx: extract copy information encoding into extra into commit.py
> 
> [snip]
> 
>> --- a/mercurial/commit.py
>> +++ b/mercurial/commit.py
>> @@ -69,6 +69,20 @@ def commitctx(repo, ctx, error=False, or
>>
>>          extra = ctx.extra().copy()
>>
>> +        files = sorted(files)
>> +        if extra is not None:
>> +            for name in (
>> +                b'p1copies',
>> +                b'p2copies',
>> +                b'filesadded',
>> +                b'filesremoved',
>> +            ):
>> +                extra.pop(name, None)
>> +        if repo.changelog._copiesstorage == b'extra':
> 
> This kind of abstraction-breaking needs to stop being a crutch in our codebase: this broke the git extension because it bypasses the revlog interface, and probably also broke the sql storage extension. I’ll see about mailing a workaround.
> 
> (I’m not very upset, but wanted to bring this up since it’s not the first time I’ve observed this kind of architectural backsliding in low-level core code in 2020.)

My plan here is mostly to do a large cleanup once the sidedata copy 
tracing is ready for prime time and the "extra" mode can be eventually 
dropped.

Patch

diff --git a/mercurial/changelog.py b/mercurial/changelog.py
--- a/mercurial/changelog.py
+++ b/mercurial/changelog.py
@@ -561,43 +561,21 @@  class changelog(revlog.revlog):
                 )
         sortedfiles = sorted(files)
         sidedata = None
-        if extra is not None:
-            for name in (
-                b'p1copies',
-                b'p2copies',
-                b'filesadded',
-                b'filesremoved',
-            ):
-                extra.pop(name, None)
-        if p1copies is not None:
-            p1copies = metadata.encodecopies(sortedfiles, p1copies)
-        if p2copies is not None:
-            p2copies = metadata.encodecopies(sortedfiles, p2copies)
-        if filesadded is not None:
-            filesadded = metadata.encodefileindices(sortedfiles, filesadded)
-        if filesremoved is not None:
-            filesremoved = metadata.encodefileindices(sortedfiles, filesremoved)
-        if self._copiesstorage == b'extra':
-            extrasentries = p1copies, p2copies, filesadded, filesremoved
-            if extra is None and any(x is not None for x in extrasentries):
-                extra = {}
-            if p1copies is not None:
-                extra[b'p1copies'] = p1copies
-            if p2copies is not None:
-                extra[b'p2copies'] = p2copies
-            if filesadded is not None:
-                extra[b'filesadded'] = filesadded
-            if filesremoved is not None:
-                extra[b'filesremoved'] = filesremoved
-        elif self._copiesstorage == b'changeset-sidedata':
+        if self._copiesstorage == b'changeset-sidedata':
             sidedata = {}
             if p1copies:
+                p1copies = metadata.encodecopies(sortedfiles, p1copies)
                 sidedata[sidedatamod.SD_P1COPIES] = p1copies
             if p2copies:
+                p2copies = metadata.encodecopies(sortedfiles, p2copies)
                 sidedata[sidedatamod.SD_P2COPIES] = p2copies
             if filesadded:
+                filesadded = metadata.encodefileindices(sortedfiles, filesadded)
                 sidedata[sidedatamod.SD_FILESADDED] = filesadded
             if filesremoved:
+                filesremoved = metadata.encodefileindices(
+                    sortedfiles, filesremoved
+                )
                 sidedata[sidedatamod.SD_FILESREMOVED] = filesremoved
             if not sidedata:
                 sidedata = None
diff --git a/mercurial/commit.py b/mercurial/commit.py
--- a/mercurial/commit.py
+++ b/mercurial/commit.py
@@ -69,6 +69,20 @@  def commitctx(repo, ctx, error=False, or
 
         extra = ctx.extra().copy()
 
+        files = sorted(files)
+        if extra is not None:
+            for name in (
+                b'p1copies',
+                b'p2copies',
+                b'filesadded',
+                b'filesremoved',
+            ):
+                extra.pop(name, None)
+        if repo.changelog._copiesstorage == b'extra':
+            extra = _extra_with_copies(
+                repo, extra, files, p1copies, p2copies, filesadded, filesremoved
+            )
+
         # update changelog
         repo.ui.note(_(b"committing changelog\n"))
         repo.changelog.delayupdate(tr)
@@ -407,3 +421,25 @@  def _commit_manifest(tr, linkrev, ctx, m
         mn = p1.manifestnode()
 
     return mn
+
+
+def _extra_with_copies(
+    repo, extra, files, p1copies, p2copies, filesadded, filesremoved
+):
+    """encode copy information into a `extra` dictionnary"""
+    extrasentries = p1copies, p2copies, filesadded, filesremoved
+    if extra is None and any(x is not None for x in extrasentries):
+        extra = {}
+    if p1copies is not None:
+        p1copies = metadata.encodecopies(files, p1copies)
+        extra[b'p1copies'] = p1copies
+    if p2copies is not None:
+        p2copies = metadata.encodecopies(files, p2copies)
+        extra[b'p2copies'] = p2copies
+    if filesadded is not None:
+        filesadded = metadata.encodefileindices(files, filesadded)
+        extra[b'filesadded'] = filesadded
+    if filesremoved is not None:
+        filesremoved = metadata.encodefileindices(files, filesremoved)
+        extra[b'filesremoved'] = filesremoved
+    return extra