Patchwork D2791: hgweb: refactor fake file object proxy for archiving

login
register
mail settings
Submitter phabricator
Date March 11, 2018, 5:24 a.m.
Message ID <differential-rev-PHID-DREV-yxgdagi5zxfxjiakwuga-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/29287/
State Superseded
Headers show

Comments

phabricator - March 11, 2018, 5:24 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Python's zip file writer operates on a file object. When doing work,
  it periodically calls write(), flush(), and tell() on that object.
  
  In WSGI contexts, the start_response function returns a write()
  function. That's a function to write data, not a full file object.
  So, when the archival code was first introduced in https://phab.mercurial-scm.org/rHG2b03c6733efae2a76179cfc49bd591023fb0fa3a in
  2006, someone invented a proxy "tellable" type that wrapped a file
  object like object and kept track of write count so it could
  implement tell() and satisfy zipfile's needs.
  
  When our archival code runs, it attempts to tell() the destination
  and if that fails, converts it to a "tellable" instance. Our WSGI
  application passes the "wsgirequest" instance to the archival
  function. It fails the tell() test and is converted to a "tellable."
  It's worth noting that "wsgirequest" implements flush(), so
  "tellable" doesn't.
  
  This hackery all seems very specific to the WSGI code. So this commit
  moves the "tellable" type and the conversion of the destination file
  object into the WSGI code. There's a chance some other caller may be
  passing a file object like object that doesn't implement tell(). But
  I doubt it.
  
  As part of the refactor, our new type implements flush() and doesn't
  implement __getattr__. Given the intended limited use of this type,
  I want things to fail fast if there is an attempt to access attributes
  because I think it is important to document which attributes are being
  used for what purposes.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2791

AFFECTED FILES
  mercurial/archival.py
  mercurial/hgweb/request.py
  mercurial/hgweb/webcommands.py

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - March 15, 2018, 9:54 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> request.py:294
> +class offsettrackingwriter(object):
> +    """A file object like object that is append only and tracks write count.
> +

s/file object like object/file-like object/ ?
s/append only/append-only/ ?

> request.py:319
> +    def flush(self):
> +        pass
> +

IIUC, the old code would forward the call via __getattr__, but the new code ignores it. Any practical consequence? I don't know what flush() on a WSGI request does.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2791

To: indygreg, #hg-reviewers, durin42
Cc: martinvonz, mercurial-devel

Patch

diff --git a/mercurial/hgweb/webcommands.py b/mercurial/hgweb/webcommands.py
--- a/mercurial/hgweb/webcommands.py
+++ b/mercurial/hgweb/webcommands.py
@@ -24,6 +24,9 @@ 
     paritygen,
     staticfile,
 )
+from . import (
+    request as requestmod,
+)
 
 from .. import (
     archival,
@@ -1215,7 +1218,9 @@ 
     req.headers.extend(headers)
     req.respond(HTTP_OK, mimetype)
 
-    archival.archive(web.repo, req, cnode, artype, prefix=name,
+    bodyfh = requestmod.offsettrackingwriter(req.write)
+
+    archival.archive(web.repo, bodyfh, cnode, artype, prefix=name,
                      matchfn=match,
                      subrepos=web.configbool("web", "archivesubrepos"))
     return []
diff --git a/mercurial/hgweb/request.py b/mercurial/hgweb/request.py
--- a/mercurial/hgweb/request.py
+++ b/mercurial/hgweb/request.py
@@ -290,6 +290,37 @@ 
                          headers=headers,
                          bodyfh=bodyfh)
 
+class offsettrackingwriter(object):
+    """A file object like object that is append only and tracks write count.
+
+    Instances are bound to a callable. This callable is called with data
+    whenever a ``write()`` is attempted.
+
+    Instances track the amount of written data so they can answer ``tell()``
+    requests.
+
+    The intent of this class is to wrap the ``write()`` function returned by
+    a WSGI ``start_response()`` function. Since ``write()`` is a callable and
+    not a file object, it doesn't implement other file object methods.
+    """
+    def __init__(self, writefn):
+        self._write = writefn
+        self._offset = 0
+
+    def write(self, s):
+        res = self._write(s)
+        # Some Python objects don't report the number of bytes written.
+        if res is None:
+            self._offset += len(s)
+        else:
+            self._offset += res
+
+    def flush(self):
+        pass
+
+    def tell(self):
+        return self._offset
+
 class wsgiresponse(object):
     """Represents a response to a WSGI request.
 
diff --git a/mercurial/archival.py b/mercurial/archival.py
--- a/mercurial/archival.py
+++ b/mercurial/archival.py
@@ -195,34 +195,11 @@ 
         if self.fileobj:
             self.fileobj.close()
 
-class tellable(object):
-    '''provide tell method for zipfile.ZipFile when writing to http
-    response file object.'''
-
-    def __init__(self, fp):
-        self.fp = fp
-        self.offset = 0
-
-    def __getattr__(self, key):
-        return getattr(self.fp, key)
-
-    def write(self, s):
-        self.fp.write(s)
-        self.offset += len(s)
-
-    def tell(self):
-        return self.offset
-
 class zipit(object):
     '''write archive to zip file or stream.  can write uncompressed,
     or compressed with deflate.'''
 
     def __init__(self, dest, mtime, compress=True):
-        if not isinstance(dest, bytes):
-            try:
-                dest.tell()
-            except (AttributeError, IOError):
-                dest = tellable(dest)
         self.z = zipfile.ZipFile(pycompat.fsdecode(dest), r'w',
                                  compress and zipfile.ZIP_DEFLATED or
                                  zipfile.ZIP_STORED)