Patchwork D6167: fix: allow fixer tools to return metadata in addition to the file content

login
register
mail settings
Submitter phabricator
Date March 22, 2019, 2:57 a.m.
Message ID <differential-rev-PHID-DREV-mzweyt3e573hk4r6aozr-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/39356/
State New
Headers show

Comments

phabricator - March 22, 2019, 2:57 a.m.
hooper created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  With this change, fixer tools can be configured to output a JSON object that
  will be parsed and passed to hooks that can be used to print summaries of what
  code was formatted or perform other post-fixing work.
  
  The motivation for this change is to allow parallel executions of a
  "meta-formatter" tool to report back statistics, which are then aggregated and
  processed after all formatting has completed. Providing an extensible mechanism
  inside fix.py is far simpler, and more portable, than trying to make a tool
  like this communicate through some other channel.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6167

AFFECTED FILES
  hgext/fix.py
  tests/test-fix-metadata.t
  tests/test-fix.t

CHANGE DETAILS




To: hooper, #hg-reviewers
Cc: mercurial-devel
phabricator - March 27, 2019, 6:13 p.m.
durin42 added subscribers: indygreg, durin42.
durin42 accepted this revision as: durin42.
durin42 added a comment.


  I'm what I'll call an unenthusiastic fan of this: it feels a little gross, but I have no better ideas (in fact, at least part of this interface was my idea...) so I'd like to see this land as it'll open us up to carry fewer painful patches at Google.
  
  As such, it has my +1, but I want someone not from Google to land it since it feels borderline.
  
  @indygreg maybe you have opinions?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6167

To: hooper, #hg-reviewers, durin42
Cc: durin42, indygreg, mercurial-devel
phabricator - April 11, 2019, 5:35 p.m.
durin42 added a subscriber: lothiraldan.
durin42 added a comment.


  @lothiraldan do y'all have opinions on this? I'd love to see this get landed.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6167

To: hooper, #hg-reviewers, durin42
Cc: lothiraldan, durin42, indygreg, mercurial-devel

Patch

diff --git a/tests/test-fix.t b/tests/test-fix.t
--- a/tests/test-fix.t
+++ b/tests/test-fix.t
@@ -185,6 +185,34 @@ 
   tool may see different values for the arguments added by the :linerange
   suboption.
   
+  Each fixer tool is allowed to return some metadata in addition to the fixed
+  file content. The metadata must be placed before the file content on stdout,
+  separated from the file content by a zero byte. The metadata is parsed as a
+  JSON value (so, it should be UTF-8 encoded and contain no zero bytes). A fixer
+  tool is expected to produce this metadata encoding if and only if the
+  :metadata suboption is true:
+  
+    [fix]
+    tool:command = tool --prepend-json-metadata
+    tool:metadata = true
+  
+  The metadata values are passed to hooks, which can be used to print summaries
+  or perform other post-fixing work. The supported hooks are:
+  
+  "postfixfile"
+    Run once for each file in each revision where any fixer tools made changes
+    to the file content. Provides "$HG_REV" and "$HG_PATH" to identify the file,
+    and "$HG_METADATA" with a list of metadata values from fixer tools that
+    affected the file.
+  
+  "postfix"
+    Run once after all files and revisions have been handled. Provides
+    "$HG_REPLACEMENTS" with information about what revisions were created and
+    made obsolete. Provides a boolean "$HG_WDIRWRITTEN" to indicate whether any
+    files in the working copy were updated. Provides a list "$HG_ALLMETADATA"
+    containing all metadata values returned from fixer tool executions that
+    modified a file.
+  
   list of commands:
   
    fix           rewrite file content in changesets or working directory
diff --git a/tests/test-fix-metadata.t b/tests/test-fix-metadata.t
new file mode 100644
--- /dev/null
+++ b/tests/test-fix-metadata.t
@@ -0,0 +1,83 @@ 
+A python hook for "hg fix" that prints out the number of files and revisions
+that were affected, along with which fixer tools were applied. Also checks how
+many times it sees a specific key generated by one of the fixer tools defined
+below.
+
+  $ cat >> $TESTTMP/postfixhook.py <<EOF
+  > import collections
+  > def file(ui, repo, rev=None, path='', metadata=None, **kwargs):
+  >   ui.status('fixed %s in revision %d\n' % (path, rev))
+  > def summarize(ui, repo, replacements=None, wdirwritten=False,
+  >               allmetadata=None, **kwargs):
+  >     counts = collections.defaultdict(int)
+  >     keys = 0
+  >     for metadata in allmetadata:
+  >         if 'fixer-applied' in metadata:
+  >             counts[metadata['fixer-applied']] += 1
+  >         if 'key' in metadata:
+  >             keys += 1
+  >     ui.status('saw "key" %d times\n' % (keys,))
+  >     for name, count in sorted(counts.items()):
+  >         ui.status('fixed %d files with %s\n' % (count, name))
+  >     if replacements:
+  >         ui.status('fixed %d revisions\n' % (len(replacements),))
+  >     if wdirwritten:
+  >         ui.status('fixed the working copy\n')
+  > EOF
+
+Some mock output for fixer tools that demonstrate what could go wrong with
+expecting the metadata output format.
+
+  $ printf 'new content\n' > $TESTTMP/missing
+  $ printf 'not valid json\0new content\n' > $TESTTMP/invalid
+  $ printf '{"key": "value"}\0new content\n' > $TESTTMP/valid
+
+Configure some fixer tools based on the output defined above, and enable the
+hooks defined above. Disable parallelism to make output of the parallel file
+processing phase stable.
+
+  $ cat >> $HGRCPATH <<EOF
+  > [extensions]
+  > fix =
+  > [fix]
+  > missing:command=cat $TESTTMP/missing
+  > missing:pattern=missing
+  > missing:metadata=true
+  > invalid:command=cat $TESTTMP/invalid
+  > invalid:pattern=invalid
+  > invalid:metadata=true
+  > valid:command=cat $TESTTMP/valid
+  > valid:pattern=valid
+  > valid:metadata=true
+  > [hooks]
+  > postfixfile = python:$TESTTMP/postfixhook.py:file
+  > postfix = python:$TESTTMP/postfixhook.py:summarize
+  > [worker]
+  > enabled=false
+  > EOF
+
+See what happens when we execute each of the fixer tools. Some print warnings,
+some write back to the file.
+
+  $ hg init repo
+  $ cd repo
+
+  $ printf "old content\n" > invalid
+  $ printf "old content\n" > missing
+  $ printf "old content\n" > valid
+  $ hg add -q
+
+  $ hg fix -w
+  ignored invalid output from fixer tool: invalid
+  ignored invalid output from fixer tool: missing
+  fixed valid in revision 2147483647
+  saw "key" 1 times
+  fixed 1 files with valid
+  fixed the working copy
+
+  $ cat missing invalid valid
+  old content
+  old content
+  new content
+
+  $ cd ..
diff --git a/hgext/fix.py b/hgext/fix.py
--- a/hgext/fix.py
+++ b/hgext/fix.py
@@ -72,12 +72,40 @@ 
 To account for changes made by each tool, the line numbers used for incremental
 formatting are recomputed before executing the next tool. So, each tool may see
 different values for the arguments added by the :linerange suboption.
+
+Each fixer tool is allowed to return some metadata in addition to the fixed file
+content. The metadata must be placed before the file content on stdout,
+separated from the file content by a zero byte. The metadata is parsed as a JSON
+value (so, it should be UTF-8 encoded and contain no zero bytes). A fixer tool
+is expected to produce this metadata encoding if and only if the :metadata
+suboption is true::
+
+  [fix]
+  tool:command = tool --prepend-json-metadata
+  tool:metadata = true
+
+The metadata values are passed to hooks, which can be used to print summaries or
+perform other post-fixing work. The supported hooks are::
+
+"postfixfile"
+  Run once for each file in each revision where any fixer tools made changes to
+  the file content. Provides "$HG_REV" and "$HG_PATH" to identify the file, and
+  "$HG_METADATA" with a list of metadata values from fixer tools that affected
+  the file.
+
+"postfix"
+  Run once after all files and revisions have been handled. Provides
+  "$HG_REPLACEMENTS" with information about what revisions were created and made
+  obsolete. Provides a boolean "$HG_WDIRWRITTEN" to indicate whether any files
+  in the working copy were updated. Provides a list "$HG_ALLMETADATA" containing
+  all metadata values returned from fixer tool executions that modified a file.
 """
 
 from __future__ import absolute_import
 
 import collections
 import itertools
+import json
 import os
 import re
 import subprocess
@@ -124,6 +152,7 @@ 
     'fileset': None,
     'pattern': None,
     'priority': 0,
+    'metadata': False,
 }
 
 for key, default in FIXER_ATTRS.items():
@@ -201,10 +230,12 @@ 
             for rev, path in items:
                 ctx = repo[rev]
                 olddata = ctx[path].data()
-                newdata = fixfile(ui, opts, fixers, ctx, path, basectxs[rev])
+                metadata, newdata = fixfile(ui, opts, fixers, ctx, path,
+                                            basectxs[rev])
                 # Don't waste memory/time passing unchanged content back, but
                 # produce one result per item either way.
-                yield (rev, path, newdata if newdata != olddata else None)
+                yield (rev, path, metadata,
+                       newdata if newdata != olddata else None)
         results = worker.worker(ui, 1.0, getfixes, tuple(), workqueue,
                                 threadsafe=False)
 
@@ -215,15 +246,24 @@ 
         # the tests deterministic. It might also be considered a feature since
         # it makes the results more easily reproducible.
         filedata = collections.defaultdict(dict)
+        allmetadata = []
         replacements = {}
         wdirwritten = False
         commitorder = sorted(revstofix, reverse=True)
         with ui.makeprogress(topic=_('fixing'), unit=_('files'),
                              total=sum(numitems.values())) as progress:
-            for rev, path, newdata in results:
+            for rev, path, metadata, newdata in results:
                 progress.increment(item=path)
+                allmetadata.extend(metadata)
                 if newdata is not None:
                     filedata[rev][path] = newdata
+                    hookargs = {
+                      'rev': rev,
+                      'path': path,
+                      'metadata': metadata,
+                    }
+                    repo.hook('postfixfile', throw=False,
+                              **pycompat.strkwargs(hookargs))
                 numitems[rev] -= 1
                 # Apply the fixes for this and any other revisions that are
                 # ready and sitting at the front of the queue. Using a loop here
@@ -240,6 +280,12 @@ 
                     del filedata[rev]
 
         cleanup(repo, replacements, wdirwritten)
+        hookargs = {
+            'replacements': replacements,
+            'wdirwritten': wdirwritten,
+            'allmetadata': allmetadata,
+        }
+        repo.hook('postfix', throw=True, **pycompat.strkwargs(hookargs))
 
 def cleanup(repo, replacements, wdirwritten):
     """Calls scmutil.cleanupnodes() with the given replacements.
@@ -491,6 +537,7 @@ 
     A fixer tool's stdout will become the file's new content if and only if it
     exits with code zero.
     """
+    metadata = []
     newdata = fixctx[path].data()
     for fixername, fixer in fixers.iteritems():
         if fixer.affects(opts, fixctx, path):
@@ -506,20 +553,30 @@ 
                 stdin=subprocess.PIPE,
                 stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE)
-            newerdata, stderr = proc.communicate(newdata)
+            stdout, stderr = proc.communicate(newdata)
             if stderr:
                 showstderr(ui, fixctx.rev(), fixername, stderr)
+            newerdata = stdout
+            if fixer.shouldoutputmetadata():
+                try:
+                    metadatajson, newerdata = stdout.split('\0', 1)
+                    metadata.append(json.loads(metadatajson))
+                except ValueError:
+                    ui.warn(_('ignored invalid output from fixer tool: %s\n') %
+                            (fixername,))
+                    continue
             if proc.returncode == 0:
                 newdata = newerdata
+                metadata.append({'fixer-applied': fixername})
             else:
                 if not stderr:
                     message = _('exited with status %d\n') % (proc.returncode,)
                     showstderr(ui, fixctx.rev(), fixername, message)
                 checktoolfailureaction(
                     ui, _('no fixes will be applied'),
                     hint=_('use --config fix.failure=continue to apply any '
                            'successful fixes anyway'))
-    return newdata
+    return metadata, newdata
 
 def showstderr(ui, rev, fixername, stderr):
     """Writes the lines of the stderr string as warnings on the ui
@@ -667,6 +724,10 @@ 
         """Should this fixer run on the file at the given path and context?"""
         return scmutil.match(fixctx, [self._pattern], opts)(path)
 
+    def shouldoutputmetadata(self):
+        """Should the stdout of this fixer start with JSON and a null byte?"""
+        return self._metadata
+
     def command(self, ui, path, rangesfn):
         """A shell command to use to invoke this fixer on the given file/lines