Patchwork D2939: fsmonitor: layer on another hack in bser.c for os.stat() compat (issue5811)

login
register
mail settings
Submitter phabricator
Date March 24, 2018, 6:30 p.m.
Message ID <differential-rev-PHID-DREV-zs3s7r4oup7ffmbifdru-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/29820/
State Superseded
Headers show

Comments

phabricator - March 24, 2018, 6:30 p.m.
durin42 created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  It's unclear to me how these `bserobj_tuple` objects are used, other
  than as stat objects. This should fix fsmonitor in the wake of
  https://phab.mercurial-scm.org/rHGffa3026d41964b2d06358c4f21f7e722264d1b3f and similar changes.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

AFFECTED FILES
  hgext/fsmonitor/pywatchman/bser.c

CHANGE DETAILS




To: durin42, #hg-reviewers
Cc: mercurial-devel
phabricator - March 24, 2018, 6:30 p.m.
durin42 added a subscriber: indygreg.
durin42 added a comment.


  @indygreg I think you saw this failure mode, I'd appreciate it if you could check if this fixes the watchman failures I introduced (I don't use watchman, and so I'm not quite sure how to reproduce.)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

To: durin42, #hg-reviewers
Cc: indygreg, mercurial-devel
phabricator - March 26, 2018, 4:04 p.m.
indygreg requested changes to this revision.
indygreg added a comment.
This revision now requires changes to proceed.


  This still fails for me with:
  
      File "/home/gps/src/hg/mercurial/context.py", line 1590, in _dirstatestatus
        clean=clean, unknown=unknown)
      File "/home/gps/src/hg/mercurial/dirstate.py", line 1073, in status
        elif (time != st[stat.ST_MTIME]
    IndexError: tuple index out of range
  
  Also, hacking the low-level bser data type at the bser level feels **very** wrong. I would rather we roll our own C type that holds a reference to the `bserObject` and provides the necessary object API. Although there are already Mercurial hacks in `bser.c` for `st_*` attribute access. So I could be convinced that this hack is acceptable.
  
  I can try debugging this failure if that would be helpful. Perhaps something in CPython land is evaluating `len()` and short-circuiting the `[8]` access.

INLINE COMMENTS

> bser.c:105
>  
> +  if (i == 8 && PySequence_Size(obj->values) < 7) {
> +    // Hack alert: Python 3 removed support for os.stat().st_mtime

Is there an off by 2 with `7`? Shouldn't it be `9` (since accesses up to `[7]` - an object with 8 elements) should be allowed?

> bser.c:112
> +    // st_mtime instead.
> +    return bserobj_getattrro(o, PyBytes_FromString("st_mtime"));
> +  }

This leaks the `st_mtime` PyObject.

Also, strictly speaking, we should verify that the return value is not `NULL`.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

To: durin42, #hg-reviewers, indygreg
Cc: indygreg, mercurial-devel
phabricator - March 26, 2018, 4:47 p.m.
durin42 added a comment.


  In https://phab.mercurial-scm.org/D2939#47601, @indygreg wrote:
  
  > This still fails for me with:
  >
  >     File "/home/gps/src/hg/mercurial/context.py", line 1590, in _dirstatestatus
  >       clean=clean, unknown=unknown)
  >     File "/home/gps/src/hg/mercurial/dirstate.py", line 1073, in status
  >       elif (time != st[stat.ST_MTIME]
  >   IndexError: tuple index out of range
  >
  >
  > Also, hacking the low-level bser data type at the bser level feels **very** wrong. I would rather we roll our own C type that holds a reference to the `bserObject` and provides the necessary object API. Although there are already Mercurial hacks in `bser.c` for `st_*` attribute access. So I could be convinced that this hack is acceptable.
  
  
  Yeah, I mostly wanted to verify a fix before trying to do something principled.
  
  > I can try debugging this failure if that would be helpful. Perhaps something in CPython land is evaluating `len()` and short-circuiting the `[8]` access.
  
  Yes, debugging would help a ton. I don't really have enough understanding to make enough headway here :(

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

To: durin42, #hg-reviewers, indygreg
Cc: indygreg, mercurial-devel
phabricator - March 26, 2018, 4:56 p.m.
indygreg added a comment.


  I may look into this today. Having fsmonitor disabled makes operations on Firefox repos much slower and this is hampering my ability to hack on that repo :/

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

To: durin42, #hg-reviewers, indygreg
Cc: indygreg, mercurial-devel
phabricator - March 29, 2018, 12:15 a.m.
indygreg added a comment.


  I managed to debug this.
  
  `st[state.ST_MTIME]` will invoke `PyObject_GetItem`. Its implementation does this:
  
    m = o->ob_type->tp_as_mapping;
    if (m && m->mp_subscript)
        return m->mp_subscript(o, key);
    
    if (o->ob_type->tp_as_sequence) {
        if (PyIndex_Check(key)) {
            Py_ssize_t key_value;
            key_value = PyNumber_AsSsize_t(key, PyExc_IndexError);
            if (key_value == -1 && PyErr_Occurred())
                return NULL;
            return PySequence_GetItem(o, key_value);
        }
  
  Our `bserObjectType` defines `tp_as_mapping` and `st[state.ST_MTIME]` ends up calling `bserobj_getattrro()`. That function sees the argument is an integer and ends up calling `PySequence_GetItem()`:
  
    if (PyIndex_Check(name)) {
      i = PyNumber_AsSsize_t(name, PyExc_IndexError);
      if (i == -1 && PyErr_Occurred()) {
        goto bail;
      }
      ret = PySequence_GetItem(obj->values, i);
      goto bail;
    }
  
  `PySequence_GetItem()` does the following:
  
    m = s->ob_type->tp_as_sequence;
    if (m && m->sq_item) {
        if (i < 0) {
            if (m->sq_length) {
                Py_ssize_t l = (*m->sq_length)(s);
                if (l < 0)
                    return NULL;
                i += l;
            }
        }
        return m->sq_item(s, i);
    }
  
  `sq_item` here is `tupleitem()`, not `bserobj_tuple_item()`. It tells the truth and `IndexError` is raised.
  
  So, I think we want the hack to live in `bserobj_getattrro()`. I don't think the hack needs to live in `bserobj_tuple_item()` because that function only ever gets called through `bserobj_getattrro()`.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

To: durin42, #hg-reviewers, indygreg
Cc: spectral, indygreg, mercurial-devel
phabricator - April 9, 2018, 9:31 p.m.
durin42 added a comment.


  PTAL - I think this should be ready to go.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

To: durin42, #hg-reviewers, indygreg
Cc: spectral, indygreg, mercurial-devel
phabricator - April 12, 2018, 3:52 p.m.
indygreg accepted this revision.
indygreg added a comment.
This revision is now accepted and ready to land.


  This is super hacky. That's par for the course for this file, sadly.
  
  Also, I'm kinda surprised we're fully realizing the Python tuple to represent stat results. If we made tuple attribute access lazy, that would probably make `hg status` a bit faster, since we never access every field of that tuple. These kinds of things never show up in Python profilers though. And it is difficult to pin performance problems on Python object creation. But we know it is a very real thing. Obviously work for another day.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2939

To: durin42, #hg-reviewers, indygreg
Cc: spectral, indygreg, mercurial-devel

Patch

diff --git a/hgext/fsmonitor/pywatchman/bser.c b/hgext/fsmonitor/pywatchman/bser.c
--- a/hgext/fsmonitor/pywatchman/bser.c
+++ b/hgext/fsmonitor/pywatchman/bser.c
@@ -99,6 +99,15 @@ 
 static PyObject* bserobj_tuple_item(PyObject* o, Py_ssize_t i) {
   bserObject* obj = (bserObject*)o;
 
+  if (i == 8 && PySequence_Size(obj->values) < 7) {
+    // Hack alert: Python 3 removed support for os.stat().st_mtime
+    // being an integer.Instead, if you need an integer, you have to
+    // use os.stat()[stat.ST_MTIME] instead. stat.ST_MTIME is 8, and
+    // our stat tuples are shorter than that, so we can detect
+    // requests for index 8 on tuples shorter than that and return
+    // st_mtime instead.
+    return bserobj_getattrro(o, "st_mtime");
+  }
   return PySequence_GetItem(obj->values, i);
 }