Patchwork [1 of 3 RFC] mercurial: add python re2 bindings

Submitter Siddharth Agarwal
Date Sept. 2, 2014, 9:18 p.m.
Message ID <f3b022e755dd7cf54e1d.1409692688@devbig136.prn2.facebook.com>
Permalink /patch/5684/
State Changes Requested

Comments

Siddharth Agarwal - Sept. 2, 2014, 9:18 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1409591324 25200
#      Mon Sep 01 10:08:44 2014 -0700
# Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
# Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
mercurial: add python re2 bindings

These bindings will enable packagers to build Mercurial with re2 support.

The bindings are licensed as 3-clause BSD.

I've moved re2.py to mercurial/ to allow 'from mercurial import re2' to work.
This is my first time doing this, so it is very likely I got some things wrong.
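For illustration, here is a minimal sketch of how calling code could opt into these bindings when a packager has built them, and fall back to the stdlib `re` module otherwise. It assumes the module is importable as `mercurial.re2`, matching the 'from mercurial import re2' import mentioned above; `compile_pattern` is a hypothetical helper, not part of this patch. The flag check follows the README's note that the bindings support no flags, and the fallback on compile errors covers constructs RE2 rejects, such as backreferences and lookaround.

```python
import re

try:
    # Assumption: the bindings from this patch, installed as mercurial/re2.py
    # together with the compiled _re2 extension.
    from mercurial import re2 as _re2
except ImportError:
    _re2 = None


def compile_pattern(pattern, flags=0):
    """Hypothetical helper: prefer RE2 when usable, else use stdlib re."""
    # The bindings accept no flags, so flagged patterns always go to re.
    if _re2 is not None and not flags:
        try:
            return _re2.compile(pattern)
        except _re2.error:
            # RE2 rejects backreferences and lookaround; fall back to re.
            pass
    return re.compile(pattern, flags)
```

Both engines expose `search` and `match`, so callers that stick to those methods need not care which engine they received.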
Augie Fackler - Sept. 9, 2014, 4:28 p.m.
On Tue, Sep 02, 2014 at 02:18:08PM -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1409591324 25200
> #      Mon Sep 01 10:08:44 2014 -0700
> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
> # Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
> mercurial: add python re2 bindings

It looks like this is bringing in a package that would otherwise live
on pypi, is that right?

Is it possible that we could just recommend that packagers make sure
those bindings are packaged too, rather than embedding a bonus copy in
mercurial?

>
> These bindings will enable packagers to build Mercurial with re2 support.
>
> The bindings are licensed as 3-clause BSD.
>
> I've moved re2.py to mercurial/ to allow 'from mercurial import re2' to work.
> This is my first time doing this, so it is very likely I got some things wrong.
>
> diff --git a/mercurial/pyre2/LICENSE b/mercurial/pyre2/LICENSE
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/pyre2/LICENSE
> @@ -0,0 +1,25 @@
> +Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
> +
> +Redistribution and use in source and binary forms, with or without
> +modification, are permitted provided that the following conditions
> +are met:
> +* Redistributions of source code must retain the above copyright
> +  notice, this list of conditions and the following disclaimer.
> +* Redistributions in binary form must reproduce the above copyright
> +  notice, this list of conditions and the following disclaimer in the
> +  documentation and/or other materials provided with the distribution.
> +* Neither the name of Facebook nor the names of its contributors
> +  may be used to endorse or promote products derived from this software
> +  without specific prior written permission.
> +
> +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> diff --git a/mercurial/pyre2/README.rst b/mercurial/pyre2/README.rst
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/pyre2/README.rst
> @@ -0,0 +1,71 @@
> +=====
> +pyre2
> +=====
> +
> +.. contents::
> +
> +Summary
> +=======
> +
> +pyre2 is a Python extension that wraps
> +`Google's RE2 regular expression library
> +<http://code.google.com/p/re2/>`_.
> +It implements many of the features of Python's built-in
> +``re`` module with compatible interfaces.
> +
> +
> +New Features
> +============
> +
> +* ``Regexp`` objects have a ``fullmatch`` method that works like ``match``,
> +  but anchors the match at both the start and the end.
> +* ``Regexp`` objects have
> +  ``test_search``, ``test_match``, and ``test_fullmatch``
> +  methods that work like ``search``, ``match``, and ``fullmatch``,
> +  but only return ``True`` or ``False`` to indicate
> +  whether the match was successful.
> +  These methods should be faster than the full versions,
> +  especially for patterns with capturing groups.
> +
> +
> +Missing Features
> +================
> +
> +* No substitution methods.
> +* No flags.
> +* No ``split``, ``findall``, or ``finditer``.
> +* No top-level convenience functions like ``search`` and ``match``.
> +  (Just use compile.)
> +* No compile cache.
> +  (If you care enough about performance to use RE2,
> +  you probably care enough to cache your own patterns.)
> +* No ``lastindex`` or ``lastgroup`` on ``Match`` objects.
> +
> +
> +Current Status
> +==============
> +
> +pyre2 has only received basic testing,
> +and I am by no means a Python extension expert,
> +so it is quite possible that it contains bugs.
> +I'd guess the most likely are reference leaks in error cases.
> +
> +RE2 doesn't build with -fPIC by default, so I had to build it with
> +
> +::
> +
> +  make CFLAGS='-fPIC -c -Wall -Wno-sign-compare -O3 -g -I.'
> +
> +I also had to add it to my compiler search path when building the module
> +with a command like
> +
> +::
> +
> +  env CPPFLAGS='-I/path/to/re2' LDFLAGS='-L/path/to/re2/obj' ./setup.py build
> +
> +
> +Contact
> +=======
> +
> +You can file bug reports on GitHub, or email the author:
> +David Reiss <dreiss@facebook.com>.
> diff --git a/mercurial/pyre2/_re2.cc b/mercurial/pyre2/_re2.cc
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/pyre2/_re2.cc
> @@ -0,0 +1,753 @@
> +/*
> + * Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in the
> + *   documentation and/or other materials provided with the distribution.
> + * * Neither the name of Facebook nor the names of its contributors
> + *   may be used to endorse or promote products derived from this software
> + *   without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> + * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +
> +#include <cstddef>
> +
> +#include <string>
> +#include <new>
> +using std::nothrow;
> +
> +#include <re2/re2.h>
> +using re2::RE2;
> +using re2::StringPiece;
> +
> +
> +typedef struct _RegexpObject2 {
> +  PyObject_HEAD
> +  // __dict__.  Simpler than implementing getattr and possibly faster.
> +  PyObject* attr_dict;
> +  RE2* re2_obj;
> +} RegexpObject2;
> +
> +typedef struct _MatchObject2 {
> +  PyObject_HEAD
> +  // __dict__.  Simpler than implementing getattr and possibly faster.
> +  PyObject* attr_dict;
> +  // Cache of __dict__["re"] and __dict__["string"], which are used for group()
> +  // calls. These fields do *not* own their own references.  They piggyback on
> +  // the references in attr_dict.
> +  PyObject* re;
> +  PyObject* string;
> +  // There are several possible approaches to storing the matched groups:
> +  // 1. Fully materialize the groups tuple at match time.
> +  // 2. Lazily allocate and cache PyString objects when groups are requested.
> +  // 3. Always allocate new PyStrings on demand.
> +  // I've chosen to go with #3.  It's the simplest, and I'm pretty sure it's
> +  // optimal in all cases where no group is fetched more than once.
> +  StringPiece* groups;
> +} MatchObject2;
> +
> +
> +// Imported from sre_constants.
> +static PyObject* error_class;
> +
> +
> +// Forward declarations of methods, creators, and destructors.
> +static void regexp_dealloc(RegexpObject2* self);
> +static PyObject* create_regexp(PyObject* pattern);
> +static PyObject* regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
> +static void match_dealloc(MatchObject2* self);
> +static PyObject* create_match(PyObject* re, PyObject* string, long pos, long endpos, StringPiece* groups);
> +static PyObject* match_group(MatchObject2* self, PyObject* args);
> +static PyObject* match_groups(MatchObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds);
> +static PyObject* match_start(MatchObject2* self, PyObject* args);
> +static PyObject* match_end(MatchObject2* self, PyObject* args);
> +static PyObject* match_span(MatchObject2* self, PyObject* args);
> +
> +
> +static PyMethodDef regexp_methods[] = {
> +  {"search", (PyCFunction)regexp_search, METH_VARARGS | METH_KEYWORDS,
> +    "search(string[, pos[, endpos]]) --> match object or None.\n"
> +    "    Scan through string looking for a match, and return a corresponding\n"
> +    "    MatchObject instance. Return None if no position in the string matches."
> +  },
> +  {"match", (PyCFunction)regexp_match, METH_VARARGS | METH_KEYWORDS,
> +    "match(string[, pos[, endpos]]) --> match object or None.\n"
> +    "    Matches zero or more characters at the beginning of the string"
> +  },
> +  {"fullmatch", (PyCFunction)regexp_fullmatch, METH_VARARGS | METH_KEYWORDS,
> +    "fullmatch(string[, pos[, endpos]]) --> match object or None.\n"
> +    "    Matches the entire string"
> +  },
> +  {"test_search", (PyCFunction)regexp_test_search, METH_VARARGS | METH_KEYWORDS,
> +    "test_search(string[, pos[, endpos]]) --> bool.\n"
> +    "    Like 'search', but only returns whether a match was found."
> +  },
> +  {"test_match", (PyCFunction)regexp_test_match, METH_VARARGS | METH_KEYWORDS,
> +    "test_match(string[, pos[, endpos]]) --> bool.\n"
> +    "    Like 'match', but only returns whether a match was found."
> +  },
> +  {"test_fullmatch", (PyCFunction)regexp_test_fullmatch, METH_VARARGS | METH_KEYWORDS,
> +    "test_fullmatch(string[, pos[, endpos]]) --> bool.\n"
> +    "    Like 'fullmatch', but only returns whether a match was found."
> +  },
> +  {NULL}  /* Sentinel */
> +};
> +
> +static PyMethodDef match_methods[] = {
> +  {"group", (PyCFunction)match_group, METH_VARARGS,
> +    NULL
> +  },
> +  {"groups", (PyCFunction)match_groups, METH_VARARGS | METH_KEYWORDS,
> +    NULL
> +  },
> +  {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS | METH_KEYWORDS,
> +    NULL
> +  },
> +  {"start", (PyCFunction)match_start, METH_VARARGS,
> +    NULL
> +  },
> +  {"end", (PyCFunction)match_end, METH_VARARGS,
> +    NULL
> +  },
> +  {"span", (PyCFunction)match_span, METH_VARARGS,
> +    NULL
> +  },
> +  {NULL}  /* Sentinel */
> +};
> +
> +
> +// Simple method to block setattr.
> +static int
> +_no_setattr(PyObject* obj, PyObject* name, PyObject* v) {
> +  (void)name;
> +  (void)v;
> +  PyErr_Format(PyExc_AttributeError,
> +      "'%s' object attributes are read-only",
> +      obj->ob_type->tp_name);
> +  return -1;
> +}
> +
> +
> +static PyTypeObject Regexp_Type2 = {
> +  PyObject_HEAD_INIT(NULL)
> +  0,                           /*ob_size*/
> +  "_re2.RE2_Regexp",           /*tp_name*/
> +  sizeof(RegexpObject2),       /*tp_basicsize*/
> +  0,                           /*tp_itemsize*/
> +  (destructor)regexp_dealloc,  /*tp_dealloc*/
> +  0,                           /*tp_print*/
> +  0,                           /*tp_getattr*/
> +  0,                           /*tp_setattr*/
> +  0,                           /*tp_compare*/
> +  0,                           /*tp_repr*/
> +  0,                           /*tp_as_number*/
> +  0,                           /*tp_as_sequence*/
> +  0,                           /*tp_as_mapping*/
> +  0,                           /*tp_hash*/
> +  0,                           /*tp_call*/
> +  0,                           /*tp_str*/
> +  0,                           /*tp_getattro*/
> +  _no_setattr,                 /*tp_setattro*/
> +  0,                           /*tp_as_buffer*/
> +  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
> +  "RE2 regexp objects",        /*tp_doc*/
> +  0,                           /*tp_traverse*/
> +  0,                           /*tp_clear*/
> +  0,                           /*tp_richcompare*/
> +  0,                           /*tp_weaklistoffset*/
> +  0,                           /*tp_iter*/
> +  0,                           /*tp_iternext*/
> +  regexp_methods,              /*tp_methods*/
> +  0,                           /*tp_members*/
> +  0,                           /*tp_getset*/
> +  0,                           /*tp_base*/
> +  0,                           /*tp_dict*/
> +  0,                           /*tp_descr_get*/
> +  0,                           /*tp_descr_set*/
> +  offsetof(RegexpObject2, attr_dict),  /*tp_dictoffset*/
> +  0,                           /*tp_init*/
> +  0,                           /*tp_alloc*/
> +  0,                           /*tp_new*/
> +};
> +
> +static PyTypeObject Match_Type2 = {
> +  PyObject_HEAD_INIT(NULL)
> +  0,                           /*ob_size*/
> +  "_re2.RE2_Match",            /*tp_name*/
> +  sizeof(MatchObject2),        /*tp_basicsize*/
> +  0,                           /*tp_itemsize*/
> +  (destructor)match_dealloc,   /*tp_dealloc*/
> +  0,                           /*tp_print*/
> +  0,                           /*tp_getattr*/
> +  0,                           /*tp_setattr*/
> +  0,                           /*tp_compare*/
> +  0,                           /*tp_repr*/
> +  0,                           /*tp_as_number*/
> +  0,                           /*tp_as_sequence*/
> +  0,                           /*tp_as_mapping*/
> +  0,                           /*tp_hash*/
> +  0,                           /*tp_call*/
> +  0,                           /*tp_str*/
> +  0,                           /*tp_getattro*/
> +  _no_setattr,                 /*tp_setattro*/
> +  0,                           /*tp_as_buffer*/
> +  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
> +  "RE2 match objects",         /*tp_doc*/
> +  0,                           /*tp_traverse*/
> +  0,                           /*tp_clear*/
> +  0,                           /*tp_richcompare*/
> +  0,                           /*tp_weaklistoffset*/
> +  0,                           /*tp_iter*/
> +  0,                           /*tp_iternext*/
> +  match_methods,               /*tp_methods*/
> +  0,                           /*tp_members*/
> +  0,                           /*tp_getset*/
> +  0,                           /*tp_base*/
> +  0,                           /*tp_dict*/
> +  0,                           /*tp_descr_get*/
> +  0,                           /*tp_descr_set*/
> +  offsetof(MatchObject2, attr_dict),  /*tp_dictoffset*/
> +  0,                           /*tp_init*/
> +  0,                           /*tp_alloc*/
> +  0,                           /*tp_new*/
> +};
> +
> +
> +static void
> +regexp_dealloc(RegexpObject2* self)
> +{
> +  delete self->re2_obj;
> +  Py_XDECREF(self->attr_dict);
> +  PyObject_Del(self);
> +}
> +
> +static PyObject*
> +create_regexp(PyObject* pattern)
> +{
> +  RegexpObject2* regexp = PyObject_New(RegexpObject2, &Regexp_Type2);
> +  if (regexp == NULL) {
> +    return NULL;
> +  }
> +  regexp->re2_obj = NULL;
> +  regexp->attr_dict = NULL;
> +
> +  const char* raw_pattern = PyString_AS_STRING(pattern);
> +  Py_ssize_t len_pattern = PyString_GET_SIZE(pattern);
> +
> +  RE2::Options options;
> +  options.set_log_errors(false);
> +
> +  regexp->re2_obj = new(nothrow) RE2(StringPiece(raw_pattern, (int) len_pattern), options);
> +
> +  if (regexp->re2_obj == NULL) {
> +    PyErr_NoMemory();
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  if (!regexp->re2_obj->ok()) {
> +    long code = (long)regexp->re2_obj->error_code();
> +    const std::string& msg = regexp->re2_obj->error();
> +    PyObject* value = Py_BuildValue("ls#", code, msg.data(), msg.length());
> +    if (value == NULL) {
> +      Py_DECREF(regexp);
> +      return NULL;
> +    }
> +    PyErr_SetObject(error_class, value);
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  PyObject* groupindex = PyDict_New();
> +  if (groupindex == NULL) {
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  // Build up the attr_dict early so regexp can take ownership of our reference
> +  // to groupindex.
> +  regexp->attr_dict = Py_BuildValue("{sisNsO}",
> +      "groups", regexp->re2_obj->NumberOfCapturingGroups(),
> +      "groupindex", groupindex,
> +      "pattern", pattern);
> +  if (regexp->attr_dict == NULL) {
> +    Py_DECREF(regexp);
> +    return NULL;
> +  }
> +
> +  const std::map<std::string, int>& name_map = regexp->re2_obj->NamedCapturingGroups();
> +  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
> +    PyObject* index = PyInt_FromLong(it->second);
> +    if (index == NULL) {
> +      Py_DECREF(regexp);
> +      return NULL;
> +    }
> +    int res = PyDict_SetItemString(groupindex, it->first.c_str(), index);
> +    Py_DECREF(index);
> +    if (res < 0) {
> +      Py_DECREF(regexp);
> +      return NULL;
> +    }
> +  }
> +
> +  return (PyObject*)regexp;
> +}
> +
> +static PyObject*
> +_do_search(RegexpObject2* self, PyObject* args, PyObject* kwds, RE2::Anchor anchor, bool return_match)
> +{
> +  PyObject* string;
> +  const char* subject;
> +  Py_ssize_t slen;
> +  long pos = 0;
> +  long endpos = LONG_MAX;
> +
> +  static const char* kwlist[] = {
> +    "string",
> +    "pos",
> +    "endpos",
> +    NULL};
> +
> +  // Using O! instead of s# here, because we want to stash the original
> +  // PyObject* in the match object on a successful match.
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!|ll", (char**)kwlist,
> +        &PyString_Type, &string,
> +        &pos, &endpos)) {
> +    return NULL;
> +  }
> +
> +  subject = PyString_AS_STRING(string);
> +  slen = PyString_GET_SIZE(string);
> +  if (pos < 0) pos = 0;
> +  if (pos > slen) pos = slen;
> +  if (endpos < pos) endpos = pos;
> +  if (endpos > slen) endpos = slen;
> +
> +  // Don't bother allocating these if we are just doing a test.
> +  int n_groups = 0;
> +  StringPiece* groups = NULL;
> +  if (return_match) {
> +    n_groups = self->re2_obj->NumberOfCapturingGroups() + 1;
> +    groups = new(nothrow) StringPiece[n_groups];
> +
> +    if (groups == NULL) {
> +      PyErr_NoMemory();
> +      return NULL;
> +    }
> +  }
> +
> +  bool matched = self->re2_obj->Match(
> +      StringPiece(subject, (int) slen),
> +      (int) pos,
> +      (int) endpos,
> +      anchor,
> +      groups,
> +      n_groups);
> +
> +  if (!return_match) {
> +    if (matched) {
> +      Py_RETURN_TRUE;
> +    }
> +    Py_RETURN_FALSE;
> +  }
> +
> +  if (!matched) {
> +    delete[] groups;
> +    Py_RETURN_NONE;
> +  }
> +
> +  // create_match is going to Py_BuildValue the pos and endpos into
> +  // PyObjects.  We could optimize the case where pos and/or endpos were
> +  // explicitly passed in by forwarding the existing PyObjects.
> +  // That requires much more intricate code, though.
> +  return create_match((PyObject*)self, string, pos, endpos, groups);
> +}
> +
> +static PyObject*
> +regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::UNANCHORED, true);
> +}
> +
> +static PyObject*
> +regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_START, true);
> +}
> +
> +static PyObject*
> +regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, true);
> +}
> +
> +static PyObject*
> +regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::UNANCHORED, false);
> +}
> +
> +static PyObject*
> +regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_START, false);
> +}
> +
> +static PyObject*
> +regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, false);
> +}
> +
> +
> +static void
> +match_dealloc(MatchObject2* self)
> +{
> +  delete[] self->groups;
> +  Py_XDECREF(self->attr_dict);
> +  PyObject_Del(self);
> +}
> +
> +static PyObject*
> +create_match(PyObject* re, PyObject* string,
> +    long pos, long endpos,
> +    StringPiece* groups)
> +{
> +  MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2);
> +  if (match == NULL) {
> +    delete[] groups;
> +    return NULL;
> +  }
> +  match->attr_dict = NULL;
> +  match->groups = groups;
> +  match->re = re;
> +  match->string = string;
> +
> +  match->attr_dict = Py_BuildValue("{sOsOslsl}",
> +      "re", re,
> +      "string", string,
> +      "pos", pos,
> +      "endpos", endpos);
> +  if (match->attr_dict == NULL) {
> +    Py_DECREF(match);
> +    return NULL;
> +  }
> +
> +  return (PyObject*)match;
> +}
> +
> +/**
> + * Attempt to convert an untrusted group index (PyObject* group) into
> + * a trusted one (*idx_p).  Return false on failure (exception).
> + */
> +static bool
> +_group_idx(MatchObject2* self, PyObject* group, long* idx_p)
> +{
> +  if (group == NULL) {
> +    return false;
> +  }
> +  PyErr_Clear(); // Is this necessary?
> +  long idx = PyInt_AsLong(group);
> +  if (idx == -1 && PyErr_Occurred() != NULL) {
> +    return false;
> +  }
> +  // TODO: Consider caching NumberOfCapturingGroups.
> +  if (idx < 0 || idx > ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups()) {
> +    PyErr_SetString(PyExc_IndexError, "no such group");
> +    return false;
> +  }
> +  *idx_p = idx;
> +  return true;
> +}
> +
> +/**
> + * Extract the start and end indexes of a pre-checked group number.
> + * Sets both to -1 if it did not participate in the match.
> + */
> +static bool
> +_group_span(MatchObject2* self, long idx, Py_ssize_t* o_start, Py_ssize_t* o_end)
> +{
> +  // "idx" is expected to be verified.
> +  StringPiece& piece = self->groups[idx];
> +  if (piece.data() == NULL) {
> +    *o_start = -1;
> +    *o_end = -1;
> +    return false;
> +  }
> +  Py_ssize_t start = piece.data() - PyString_AS_STRING(self->string);
> +  *o_start = start;
> +  *o_end = start + piece.length();
> +  return true;
> +}
> +
> +/**
> + * Return a pre-checked group number as a string, or default_obj
> + * if it didn't participate in the match.
> + */
> +static PyObject*
> +_group_get_i(MatchObject2* self, long idx, PyObject* default_obj)
> +{
> +  Py_ssize_t start;
> +  Py_ssize_t end;
> +  if (!_group_span(self, idx, &start, &end)) {
> +    Py_INCREF(default_obj);
> +    return default_obj;
> +  }
> +  return PySequence_GetSlice(self->string, start, end);
> +}
> +
> +/**
> + * Return the group for an unchecked group number as a string.
> + */
> +static PyObject*
> +_group_get_o(MatchObject2* self, PyObject* group)
> +{
> +  long idx;
> +  if (!_group_idx(self, group, &idx)) {
> +    return NULL;
> +  }
> +  return _group_get_i(self, idx, Py_None);
> +}
> +
> +
> +static PyObject*
> +match_group(MatchObject2* self, PyObject* args)
> +{
> +  long idx = 0;
> +  Py_ssize_t nargs = PyTuple_GET_SIZE(args);
> +  switch (nargs) {
> +    case 1:
> +      if (!_group_idx(self, PyTuple_GET_ITEM(args, 0), &idx)) {
> +        return NULL;
> +      }
> +      // Fall through.
> +    case 0:
> +      return _group_get_i(self, idx, Py_None);
> +    default:
> +      PyObject* ret = PyTuple_New(nargs);
> +      if (ret == NULL) {
> +        return NULL;
> +      }
> +
> +      for (int i = 0; i < nargs; i++) {
> +        PyObject* group = _group_get_o(self, PyTuple_GET_ITEM(args, i));
> +        if (group == NULL) {
> +          Py_DECREF(ret);
> +          return NULL;
> +        }
> +        PyTuple_SET_ITEM(ret, i, group);
> +      }
> +      return ret;
> +  }
> +}
> +
> +static PyObject*
> +match_groups(MatchObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  static const char* kwlist[] = {
> +    "default",
> +    NULL};
> +
> +  PyObject* default_obj = Py_None;
> +
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
> +        &default_obj)) {
> +    return NULL;
> +  }
> +
> +  int ngroups = ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups();
> +
> +  PyObject* ret = PyTuple_New(ngroups);
> +  if (ret == NULL) {
> +    return NULL;
> +  }
> +
> +  for (int i = 1; i <= ngroups; i++) {
> +    PyObject* group = _group_get_i(self, i, default_obj);
> +    if (group == NULL) {
> +      Py_DECREF(ret);
> +      return NULL;
> +    }
> +    PyTuple_SET_ITEM(ret, i-1, group);
> +  }
> +
> +  return ret;
> +}
> +
> +static PyObject*
> +match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds)
> +{
> +  static const char* kwlist[] = {
> +    "default",
> +    NULL};
> +
> +  PyObject* default_obj = Py_None;
> +
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
> +        &default_obj)) {
> +    return NULL;
> +  }
> +
> +  PyObject* ret = PyDict_New();
> +  if (ret == NULL) {
> +    return NULL;
> +  }
> +
> +  const std::map<std::string, int>& name_map = ((RegexpObject2*)self->re)->re2_obj->NamedCapturingGroups();
> +  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
> +    PyObject* group = _group_get_i(self, it->second, default_obj);
> +    if (group == NULL) {
> +      Py_DECREF(ret);
> +      return NULL;
> +    }
> +    // TODO: Group names with embedded zeroes?
> +    int res = PyDict_SetItemString(ret, it->first.data(), group);
> +    Py_DECREF(group);
> +    if (res < 0) {
> +      Py_DECREF(ret);
> +      return NULL;
> +    }
> +  }
> +
> +  return ret;
> +}
> +
> +enum span_mode_t { START, END, SPAN };
> +
> +static PyObject*
> +_do_span(MatchObject2* self, PyObject* args, const char* name, span_mode_t mode)
> +{
> +  long idx = 0;
> +  PyObject* group = NULL;
> +  if (!PyArg_UnpackTuple(args, name, 0, 1,
> +        &group)) {
> +    return NULL;
> +  }
> +  if (group != NULL) {
> +    if (!_group_idx(self, group, &idx)) {
> +      return NULL;
> +    }
> +  }
> +
> +  Py_ssize_t start = -1;
> +  Py_ssize_t end = -1;
> +
> +  (void)_group_span(self, idx, &start, &end);
> +  switch (mode) {
> +    case START : return Py_BuildValue("n", start );
> +    case END   : return Py_BuildValue("n", end   );
> +    case SPAN:
> +      return Py_BuildValue("nn", start, end);
> +  }
> +
> +  // Make gcc happy.
> +  return NULL;
> +}
> +
> +static PyObject*
> +match_start(MatchObject2* self, PyObject* args)
> +{
> +  return _do_span(self, args, "start", START);
> +}
> +
> +static PyObject*
> +match_end(MatchObject2* self, PyObject* args)
> +{
> +  return _do_span(self, args, "end", END);
> +}
> +
> +static PyObject*
> +match_span(MatchObject2* self, PyObject* args)
> +{
> +  return _do_span(self, args, "span", SPAN);
> +}
> +
> +
> +static PyObject*
> +_compile(PyObject* self, PyObject* args, PyObject* kwds)
> +{
> +  static const char* kwlist[] = {
> +    "pattern",
> +    NULL};
> +
> +  PyObject* pattern;
> +
> +  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!", (char**)kwlist,
> +        &PyString_Type, &pattern)) {
> +    return NULL;
> +  }
> +
> +  return create_regexp(pattern);
> +}
> +
> +static PyObject*
> +escape(PyObject* self, PyObject* args)
> +{
> +  char *str;
> +  Py_ssize_t len;
> +
> +  if (!PyArg_ParseTuple(args, "s#:escape", &str, &len)) {
> +    return NULL;
> +  }
> +
> +  std::string esc(RE2::QuoteMeta(StringPiece(str, (int) len)));
> +
> +  return PyString_FromStringAndSize(esc.c_str(), esc.size());
> +}
> +
> +static PyMethodDef methods[] = {
> +  {"_compile", (PyCFunction)_compile, METH_VARARGS | METH_KEYWORDS, NULL},
> +  {"escape", (PyCFunction)escape, METH_VARARGS,
> +   "Escape all potentially meaningful regexp characters."},
> +  {NULL}  /* Sentinel */
> +};
> +
> +PyMODINIT_FUNC
> +init_re2(void)
> +{
> +  if (PyType_Ready(&Regexp_Type2) < 0) {
> +    return;
> +  }
> +
> +  if (PyType_Ready(&Match_Type2) < 0) {
> +    return;
> +  }
> +
> +  PyObject* sre_mod = PyImport_ImportModuleNoBlock("sre_constants");
> +  if (sre_mod == NULL) {
> +    return;
> +  }
> +  /* static global */ error_class = PyObject_GetAttrString(sre_mod, "error");
> +  if (error_class == NULL) {
> +    return;
> +  }
> +
> +  PyObject* mod = Py_InitModule("_re2", methods);
> +
> +  Py_INCREF(error_class);
> +  PyModule_AddObject(mod, "error", error_class);
> +}
> diff --git a/mercurial/re2.py b/mercurial/re2.py
> new file mode 100644
> --- /dev/null
> +++ b/mercurial/re2.py
> @@ -0,0 +1,63 @@
> +#!/usr/bin/env python
> +
> +# Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in the
> +#   documentation and/or other materials provided with the distribution.
> +# * Neither the name of Facebook nor the names of its contributors
> +#   may be used to endorse or promote products derived from this software
> +#   without specific prior written permission.
> +#
> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +import _re2
> +
> +__all__ = [
> +    "error",
> +    "escape",
> +    "compile",
> +    "search",
> +    "match",
> +    "fullmatch",
> +    ]
> +
> +# Module-private compilation function, for future caching, other enhancements
> +_compile = _re2._compile
> +
> +error = _re2.error
> +escape = _re2.escape
> +
> +def compile(pattern):
> +    "Compile a regular expression pattern, returning a pattern object."
> +    return _compile(pattern)
> +
> +def search(pattern, string):
> +    """Scan through string looking for a match to the pattern, returning
> +    a match object, or None if no match was found."""
> +    return _compile(pattern).search(string)
> +
> +def match(pattern, string):
> +    """Try to apply the pattern at the start of the string, returning
> +    a match object, or None if no match was found."""
> +    return _compile(pattern).match(string)
> +
> +def fullmatch(pattern, string):
> +    """Try to apply the pattern to the entire string, returning
> +    a match object, or None if no match was found."""
> +    return _compile(pattern).fullmatch(string)
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Siddharth Agarwal - Sept. 9, 2014, 5:07 p.m.
On 09/09/2014 09:28 AM, Augie Fackler wrote:
> On Tue, Sep 02, 2014 at 02:18:08PM -0700, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0@fb.com>
>> # Date 1409591324 25200
>> #      Mon Sep 01 10:08:44 2014 -0700
>> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
>> # Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
>> mercurial: add python re2 bindings
> It looks like this is bringing in a package that would otherwise live
> on pypi, is that right?

Actually, no. There are two separate Python bindings for re2, both 
called pyre2:

- The bindings written by Facebook available at 
https://github.com/facebook/pyre2. These are known to work.
- The bindings at https://pypi.python.org/pypi/re2/ written by someone 
else. These are known to be broken, and indeed we have code in util.py 
that detects these broken bindings and disables re2 if they're found.
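A check like that can be as small as one probe match whose correct result is known in advance. The sketch below is only illustrative and is not the actual util.py code. It makes these assumptions:

- the installed module exposes `match(pattern, string)` returning an object with `.group()`, as the wrapper in this patch does;
- the helper name `re2_works` and the probe pattern are hypothetical.

```python
import importlib


def re2_works(modname="re2"):
    """Return True if `modname` imports and its match() gives the right answer.

    Hypothetical helper: imports the module by name, then runs a single
    probe match and compares the captured group with the known result.
    """
    try:
        mod = importlib.import_module(modname)
    except ImportError:
        return False
    try:
        # The probe should capture "ui" from "[ui]". A broken binding may
        # return None, capture the wrong text, or raise.
        m = mod.match(r"\[([^\[]+)\]", "[ui]")
        return m is not None and m.group(1) == "ui"
    except Exception:
        return False
```

A probe like this only catches bindings that fail this one case. Which bindings count as "broken" in practice depends on what util.py actually tests.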

> Is it possible that we could just recommend that packagers make sure
> those bindings are packaged too, rather than embedding a bonus copy in
> mercurial?

Not if they fetch from PyPI, since that gives them the broken bindings :/
I discussed this with mpm, and he thought checking the bindings in was
reasonable.
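With the bindings checked in as `mercurial/re2.py`, callers would still need a fallback for builds where the `_re2` extension was not compiled. The sketch below prefers re2 and falls back to the stdlib `re` module when re2 is missing or rejects a pattern. It is hypothetical, not code from this series:

- the `compilepat` helper and its fallback policy are assumptions;
- only the import path `from mercurial import re2` comes from the patch description.

```python
import re as remod

try:
    from mercurial import re2  # in-tree wrapper proposed by this patch
except ImportError:
    re2 = None  # extension not built, or Mercurial not installed


def compilepat(pattern, flags=0):
    """Compile `pattern` with re2 when possible, else with the stdlib re.

    Hypothetical helper. The wrapper's compile() accepts no flags, so in
    this sketch only flag-free patterns are offered to re2.
    """
    if re2 is not None and flags == 0:
        try:
            return re2.compile(pattern)
        except re2.error:
            # RE2 rejects some constructs (e.g. backreferences); fall back.
            pass
    return remod.compile(pattern, flags)
```

Falling back on `re2.error` keeps behaviour identical for patterns RE2 cannot express. The cost is that such patterns silently lose the RE2 speedup.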

>
>> These bindings will enable packagers to build Mercurial with re2 support.
>>
>> The bindings are licensed as 3-clause BSD.
>>
>> I've moved re2.py to mercurial/ to allow 'from mercurial import re2' to work.
>> This is my first time doing this, so it is very likely I got some things wrong.
>> +    "Compile a regular expression pattern, returning a pattern object."
>> +    return _compile(pattern)
>> +
>> +def search(pattern, string):
>> +    """Scan through string looking for a match to the pattern, returning
>> +    a match object, or None if no match was found."""
>> +    return _compile(pattern).search(string)
>> +
>> +def match(pattern, string):
>> +    """Try to apply the pattern at the start of the string, returning
>> +    a match object, or None if no match was found."""
>> +    return _compile(pattern).match(string)
>> +
>> +def fullmatch(pattern, string):
>> +    """Try to apply the pattern to the entire string, returning
>> +    a match object, or None if no match was found."""
>> +    return _compile(pattern).fullmatch(string)
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel
Matt Mackall - Oct. 13, 2014, 8:49 p.m.
On Tue, 2014-09-02 at 14:18 -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1409591324 25200
> #      Mon Sep 01 10:08:44 2014 -0700
> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
> # Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
> mercurial: add python re2 bindings
> 
> These bindings will enable packagers to build Mercurial with re2 support.

As mentioned on IRC, I think the right next step here is to try to get
this package properly hosted on PyPI and then mention it as
recommendation in our package config and docs. We can revisit the
bundle-with-hg approach if that proves unworkable for some reason.
Siddharth Agarwal - Dec. 13, 2014, 9:23 p.m.
On 10/13/2014 01:49 PM, Matt Mackall wrote:
> On Tue, 2014-09-02 at 14:18 -0700, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0@fb.com>
>> # Date 1409591324 25200
>> #      Mon Sep 01 10:08:44 2014 -0700
>> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
>> # Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
>> mercurial: add python re2 bindings
>>
>> These bindings will enable packagers to build Mercurial with re2 support.

Following up on this.

> As mentioned on IRC, I think the right next step here is to try to get
> this package properly hosted on PyPI

This is done: https://pypi.python.org/pypi/fb-re2

> and then mention it as
> recommendation in our package config and docs.

I'm not exactly sure how to do this -- the extras_require stuff you
mentioned seems to be setuptools only. The 'requires' that distutils
supports doesn't appear to be PyPI aware. Did you have any other
approaches in mind?
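For reference, an optional dependency declared through setuptools' `extras_require` keyword might look like the hypothetical `setup.py` fragment below. The extra's name `re2` and the `setup_kwargs` helper are assumptions for illustration; `fb-re2` is the PyPI project linked above. Plain distutils only warns about `extras_require` as an unknown option, so this route would tie the build to setuptools:

```python
# Hypothetical setup.py fragment: declare the PyPI bindings as an optional
# "extra". extras_require is a setuptools keyword; distutils ignores it.
EXTRAS_REQUIRE = {
    # With setuptools, "pip install mercurial[re2]" would also pull in fb-re2.
    "re2": ["fb-re2"],
}


def setup_kwargs():
    """Return the keyword arguments a setuptools-based setup.py might pass."""
    return {
        "name": "mercurial",
        "extras_require": EXTRAS_REQUIRE,
    }

# In a real setup.py (not executed here):
#     from setuptools import setup
#     setup(**setup_kwargs())
```

Installs that do not request the extra would behave exactly as before; only `pip install mercurial[re2]` would fetch the bindings.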
Matt Mackall - Dec. 15, 2014, 8:35 p.m.
On Sat, 2014-12-13 at 13:23 -0800, Siddharth Agarwal wrote:
> On 10/13/2014 01:49 PM, Matt Mackall wrote:
> > On Tue, 2014-09-02 at 14:18 -0700, Siddharth Agarwal wrote:
> >> # HG changeset patch
> >> # User Siddharth Agarwal <sid0@fb.com>
> >> # Date 1409591324 25200
> >> #      Mon Sep 01 10:08:44 2014 -0700
> >> # Node ID f3b022e755dd7cf54e1ded55f0217bb1415348e2
> >> # Parent  a98f6def97bc4b1bc74e37ca207445639e60b1a5
> >> mercurial: add python re2 bindings
> >>
> >> These bindings will enable packagers to build Mercurial with re2 support.
> 
> Following up on this.
> 
> > As mentioned on IRC, I think the right next step here is to try to get
> > this package properly hosted on PyPI
> 
> This is done: https://pypi.python.org/pypi/fb-re2

Nice!

> > and then mention it as
> > recommendation in our package config and docs.
> 
> I'm not exactly sure how to do this -- the extras_require stuff you
> mentioned seems to be setuptools only. The 'requires' that distutils
> supports doesn't appear to be PyPI aware. Did you have any other
> approaches in mind?

I frankly have no idea what the difference is between
setuptools/distutils/eggs/pip/whatnot, and I'm pretty sure I sleep
better because of it. The 'extras_require' thing is just what I found
when googling for "optional dependency pypi" or somesuch, and not any
arcana I actually possess.

We should probably:

- ask the PyPI community if something suitable exists
- recommend it here: http://mercurial.selenic.com/wiki/Packaging
- ping the Debian/Ubuntu/Redhat/Mac/Windows people about it
- maybe add a hint message to setup.py if it's not found
- mention it in debuginstall
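The last two items (a hint from setup.py, a line in debuginstall) both come down to probing whether the bindings are importable. A minimal sketch, assuming the PyPI package installs a top-level module named `re2` (the name `re2.py` uses in this patch); `re2_status` and the hint wording are hypothetical, not existing Mercurial code:

```python
import importlib

# Hypothetical hint text; fb-re2 is the PyPI project mentioned in this thread.
HINT = ("re2 Python bindings not found; install the fb-re2 package "
        "from PyPI to enable re2 support")


def re2_status(modname="re2"):
    """Return (available, message) for the optional re2 bindings.

    modname is a parameter only so the probe can be pointed elsewhere.
    """
    try:
        importlib.import_module(modname)
    except ImportError:
        # Covers both "not installed" and a broken extension module.
        return False, HINT
    return True, "re2 bindings found"
```

setup.py could print the message when `available` is false, and debuginstall could report the same status line.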
Pierre-Yves David - Dec. 15, 2014, 8:39 p.m.
On 12/15/2014 12:35 PM, Matt Mackall wrote:
> > > and then mention it as
> > > recommendation in our package config and docs.
> >
> > I'm not exactly sure how to do this -- the extras_require stuff you
> > mentioned seems to be setuptools only. The 'requires' that distutils
> > supports doesn't appear to be PyPI aware. Did you have any other
> > approaches in mind?
> I frankly have no idea what the difference is between
> setuptools/distutils/eggs/pip/whatnot, and I'm pretty sure I sleep
> better because of it. The 'extras_require' thing is just what I found
> when googling for "optional dependency pypi" or somesuch, and not any
> arcana I actually possess.
>
> We should probably:
>
> - ask the PyPI community if something suitable exists
> - recommend it here: http://mercurial.selenic.com/wiki/Packaging
> - ping the Debian/Ubuntu/Redhat/Mac/Windows people about it
> - maybe add a hint message to setup.py if it's not found
> - mention it in debuginstall


I'm fairly sure real packaging systems have a proper way to recommend a
package (Debian has one, for example). We can probably start by
ensuring it is done in such tools first.
Siddharth Agarwal - Dec. 15, 2014, 8:42 p.m.
On 12/15/2014 12:39 PM, Pierre-Yves David wrote:
>
> I'm fairly sure real packaging systems have a proper way to recommend a
> package (Debian has one, for example). We can probably start by
> ensuring it is done in such tools first.

The problem is that while re2 and re2-dev(el) are available in the
CentOS/Fedora/Debian/Ubuntu repos, the Python bindings aren't. None of
the packaging systems support referring to packages from PyPI, afaik.
That's the original problem I was trying to solve by checking the
package in.
Matt Mackall - Dec. 15, 2014, 10:06 p.m.
On Mon, 2014-12-15 at 12:42 -0800, Siddharth Agarwal wrote:
> On 12/15/2014 12:39 PM, Pierre-Yves David wrote:
> >
> > I'm fairly sure real packaging systems have a proper way to recommend a
> > package (Debian has one, for example). We can probably start by
> > ensuring it is done in such tools first.
> 
> The problem is that while re2 and re2-dev(el) are available in the
> CentOS/Fedora/Debian/Ubuntu repos, the Python bindings aren't. None of
> the packaging systems support referring to packages from PyPI, afaik.

Nope (and that's probably for the best). However, they're much more
likely to take the step of actually packaging the thing now that it's in
PyPI and something wants to use it.

Patch

diff --git a/mercurial/pyre2/LICENSE b/mercurial/pyre2/LICENSE
new file mode 100644
--- /dev/null
+++ b/mercurial/pyre2/LICENSE
@@ -0,0 +1,25 @@ 
+Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+* Redistributions of source code must retain the above copyright
+  notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright
+  notice, this list of conditions and the following disclaimer in the
+  documentation and/or other materials provided with the distribution.
+* Neither the name of Facebook nor the names of its contributors
+  may be used to endorse or promote products derived from this software
+  without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/mercurial/pyre2/README.rst b/mercurial/pyre2/README.rst
new file mode 100644
--- /dev/null
+++ b/mercurial/pyre2/README.rst
@@ -0,0 +1,71 @@ 
+=====
+pyre2
+=====
+
+.. contents::
+
+Summary
+=======
+
+pyre2 is a Python extension that wraps
+`Google's RE2 regular expression library
+<http://code.google.com/p/re2/>`_.
+It implements many of the features of Python's built-in
+``re`` module with compatible interfaces.
+
+
+New Features
+============
+
+* ``Regexp`` objects have a ``fullmatch`` method that works like ``match``,
+  but anchors the match at both the start and the end.
+* ``Regexp`` objects have
+  ``test_search``, ``test_match``, and ``test_fullmatch``
+  methods that work like ``search``, ``match``, and ``fullmatch``,
+  but only return ``True`` or ``False`` to indicate
+  whether the match was successful.
+  These methods should be faster than the full versions,
+  especially for patterns with capturing groups.
+
+
+Missing Features
+================
+
+* No substitution methods.
+* No flags.
+* No ``split``, ``findall``, or ``finditer``.
+* Only basic top-level convenience functions (``search``, ``match``,
+  ``fullmatch``); for anything repeated, just use compile.
+* No compile cache.
+  (If you care enough about performance to use RE2,
+  you probably care enough to cache your own patterns.)
+* No ``lastindex`` or ``lastgroup`` on ``Match`` objects.
+
+
+Current Status
+==============
+
+pyre2 has only received basic testing,
+and I am by no means a Python extension expert,
+so it is quite possible that it contains bugs.
+I'd guess the most likely are reference leaks in error cases.
+
+RE2 doesn't build with ``-fPIC``, so I had to build it with
+
+::
+
+  make CFLAGS='-fPIC -c -Wall -Wno-sign-compare -O3 -g -I.'
+
+I also had to add it to my compiler search path when building the module
+with a command like
+
+::
+
+  env CPPFLAGS='-I/path/to/re2' LDFLAGS='-L/path/to/re2/obj' ./setup.py build
+
+
+Contact
+=======
+
+You can file bug reports on GitHub, or email the author:
+David Reiss <dreiss@facebook.com>.
diff --git a/mercurial/pyre2/_re2.cc b/mercurial/pyre2/_re2.cc
new file mode 100644
--- /dev/null
+++ b/mercurial/pyre2/_re2.cc
@@ -0,0 +1,753 @@ 
+/*
+ * Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Facebook nor the names of its contributors
+ *   may be used to endorse or promote products derived from this software
+ *   without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+
+#include <cstddef>
+
+#include <string>
+#include <new>
+using std::nothrow;
+
+#include <re2/re2.h>
+using re2::RE2;
+using re2::StringPiece;
+
+
+typedef struct _RegexpObject2 {
+  PyObject_HEAD
+  // __dict__.  Simpler than implementing getattr and possibly faster.
+  PyObject* attr_dict;
+  RE2* re2_obj;
+} RegexpObject2;
+
+typedef struct _MatchObject2 {
+  PyObject_HEAD
+  // __dict__.  Simpler than implementing getattr and possibly faster.
+  PyObject* attr_dict;
+  // Cache of __dict__["re"] and __dict__["string"], which are used for group()
+  // calls. These fields do *not* own their own references.  They piggyback on
+  // the references in attr_dict.
+  PyObject* re;
+  PyObject* string;
+  // There are several possible approaches to storing the matched groups:
+  // 1. Fully materialize the groups tuple at match time.
+  // 2. Allocate and cache PyString objects when groups are requested.
+  // 3. Always allocate new PyStrings on demand.
+  // I've chosen to go with #3.  It's the simplest, and I'm pretty sure it's
+  // optimal in all cases where no group is fetched more than once.
+  StringPiece* groups;
+} MatchObject2;
+
+
+// Imported from sre_constants.
+static PyObject* error_class;
+
+
+// Forward declarations of methods, creators, and destructors.
+static void regexp_dealloc(RegexpObject2* self);
+static PyObject* create_regexp(PyObject* pattern);
+static PyObject* regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds);
+static void match_dealloc(MatchObject2* self);
+static PyObject* create_match(PyObject* re, PyObject* string, long pos, long endpos, StringPiece* groups);
+static PyObject* match_group(MatchObject2* self, PyObject* args);
+static PyObject* match_groups(MatchObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds);
+static PyObject* match_start(MatchObject2* self, PyObject* args);
+static PyObject* match_end(MatchObject2* self, PyObject* args);
+static PyObject* match_span(MatchObject2* self, PyObject* args);
+
+
+static PyMethodDef regexp_methods[] = {
+  {"search", (PyCFunction)regexp_search, METH_VARARGS | METH_KEYWORDS,
+    "search(string[, pos[, endpos]]) --> match object or None.\n"
+    "    Scan through string looking for a match, and return a corresponding\n"
+    "    MatchObject instance. Return None if no position in the string matches."
+  },
+  {"match", (PyCFunction)regexp_match, METH_VARARGS | METH_KEYWORDS,
+    "match(string[, pos[, endpos]]) --> match object or None.\n"
+    "    Matches zero or more characters at the beginning of the string"
+  },
+  {"fullmatch", (PyCFunction)regexp_fullmatch, METH_VARARGS | METH_KEYWORDS,
+    "fullmatch(string[, pos[, endpos]]) --> match object or None.\n"
+    "    Matches the entire string"
+  },
+  {"test_search", (PyCFunction)regexp_test_search, METH_VARARGS | METH_KEYWORDS,
+    "test_search(string[, pos[, endpos]]) --> bool.\n"
+    "    Like 'search', but only returns whether a match was found."
+  },
+  {"test_match", (PyCFunction)regexp_test_match, METH_VARARGS | METH_KEYWORDS,
+    "test_match(string[, pos[, endpos]]) --> bool.\n"
+    "    Like 'match', but only returns whether a match was found."
+  },
+  {"test_fullmatch", (PyCFunction)regexp_test_fullmatch, METH_VARARGS | METH_KEYWORDS,
+    "test_fullmatch(string[, pos[, endpos]]) --> bool.\n"
+    "    Like 'fullmatch', but only returns whether a match was found."
+  },
+  {NULL}  /* Sentinel */
+};
+
+static PyMethodDef match_methods[] = {
+  {"group", (PyCFunction)match_group, METH_VARARGS,
+    NULL
+  },
+  {"groups", (PyCFunction)match_groups, METH_VARARGS | METH_KEYWORDS,
+    NULL
+  },
+  {"groupdict", (PyCFunction)match_groupdict, METH_VARARGS | METH_KEYWORDS,
+    NULL
+  },
+  {"start", (PyCFunction)match_start, METH_VARARGS,
+    NULL
+  },
+  {"end", (PyCFunction)match_end, METH_VARARGS,
+    NULL
+  },
+  {"span", (PyCFunction)match_span, METH_VARARGS,
+    NULL
+  },
+  {NULL}  /* Sentinel */
+};
+
+
+// Simple method to block setattr.
+static int
+_no_setattr(PyObject* obj, PyObject* name, PyObject* v) {
+  (void)name;
+  (void)v;
+  PyErr_Format(PyExc_AttributeError,
+      "'%s' object attributes are read-only",
+      obj->ob_type->tp_name);
+  return -1;
+}
+
+
+static PyTypeObject Regexp_Type2 = {
+  PyObject_HEAD_INIT(NULL)
+  0,                           /*ob_size*/
+  "_re2.RE2_Regexp",           /*tp_name*/
+  sizeof(RegexpObject2),       /*tp_basicsize*/
+  0,                           /*tp_itemsize*/
+  (destructor)regexp_dealloc,  /*tp_dealloc*/
+  0,                           /*tp_print*/
+  0,                           /*tp_getattr*/
+  0,                           /*tp_setattr*/
+  0,                           /*tp_compare*/
+  0,                           /*tp_repr*/
+  0,                           /*tp_as_number*/
+  0,                           /*tp_as_sequence*/
+  0,                           /*tp_as_mapping*/
+  0,                           /*tp_hash*/
+  0,                           /*tp_call*/
+  0,                           /*tp_str*/
+  0,                           /*tp_getattro*/
+  _no_setattr,                 /*tp_setattro*/
+  0,                           /*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
+  "RE2 regexp objects",        /*tp_doc*/
+  0,                           /*tp_traverse*/
+  0,                           /*tp_clear*/
+  0,                           /*tp_richcompare*/
+  0,                           /*tp_weaklistoffset*/
+  0,                           /*tp_iter*/
+  0,                           /*tp_iternext*/
+  regexp_methods,              /*tp_methods*/
+  0,                           /*tp_members*/
+  0,                           /*tp_getset*/
+  0,                           /*tp_base*/
+  0,                           /*tp_dict*/
+  0,                           /*tp_descr_get*/
+  0,                           /*tp_descr_set*/
+  offsetof(RegexpObject2, attr_dict),  /*tp_dictoffset*/
+  0,                           /*tp_init*/
+  0,                           /*tp_alloc*/
+  0,                           /*tp_new*/
+};
+
+static PyTypeObject Match_Type2 = {
+  PyObject_HEAD_INIT(NULL)
+  0,                           /*ob_size*/
+  "_re2.RE2_Match",            /*tp_name*/
+  sizeof(MatchObject2),        /*tp_basicsize*/
+  0,                           /*tp_itemsize*/
+  (destructor)match_dealloc,   /*tp_dealloc*/
+  0,                           /*tp_print*/
+  0,                           /*tp_getattr*/
+  0,                           /*tp_setattr*/
+  0,                           /*tp_compare*/
+  0,                           /*tp_repr*/
+  0,                           /*tp_as_number*/
+  0,                           /*tp_as_sequence*/
+  0,                           /*tp_as_mapping*/
+  0,                           /*tp_hash*/
+  0,                           /*tp_call*/
+  0,                           /*tp_str*/
+  0,                           /*tp_getattro*/
+  _no_setattr,                 /*tp_setattro*/
+  0,                           /*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,          /*tp_flags*/
+  "RE2 match objects",         /*tp_doc*/
+  0,                           /*tp_traverse*/
+  0,                           /*tp_clear*/
+  0,                           /*tp_richcompare*/
+  0,                           /*tp_weaklistoffset*/
+  0,                           /*tp_iter*/
+  0,                           /*tp_iternext*/
+  match_methods,               /*tp_methods*/
+  0,                           /*tp_members*/
+  0,                           /*tp_getset*/
+  0,                           /*tp_base*/
+  0,                           /*tp_dict*/
+  0,                           /*tp_descr_get*/
+  0,                           /*tp_descr_set*/
+  offsetof(MatchObject2, attr_dict),  /*tp_dictoffset*/
+  0,                           /*tp_init*/
+  0,                           /*tp_alloc*/
+  0,                           /*tp_new*/
+};
+
+
+static void
+regexp_dealloc(RegexpObject2* self)
+{
+  delete self->re2_obj;
+  Py_XDECREF(self->attr_dict);
+  PyObject_Del(self);
+}
+
+static PyObject*
+create_regexp(PyObject* pattern)
+{
+  RegexpObject2* regexp = PyObject_New(RegexpObject2, &Regexp_Type2);
+  if (regexp == NULL) {
+    return NULL;
+  }
+  regexp->re2_obj = NULL;
+  regexp->attr_dict = NULL;
+
+  const char* raw_pattern = PyString_AS_STRING(pattern);
+  Py_ssize_t len_pattern = PyString_GET_SIZE(pattern);
+
+  RE2::Options options;
+  options.set_log_errors(false);
+
+  regexp->re2_obj = new(nothrow) RE2(StringPiece(raw_pattern, (int) len_pattern), options);
+
+  if (regexp->re2_obj == NULL) {
+    PyErr_NoMemory();
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  if (!regexp->re2_obj->ok()) {
+    long code = (long)regexp->re2_obj->error_code();
+    const std::string& msg = regexp->re2_obj->error();
+    PyObject* value = Py_BuildValue("ls#", code, msg.data(), msg.length());
+    if (value == NULL) {
+      Py_DECREF(regexp);
+      return NULL;
+    }
+    PyErr_SetObject(error_class, value);
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  PyObject* groupindex = PyDict_New();
+  if (groupindex == NULL) {
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  // Build up the attr_dict early so regexp can take ownership of our reference
+  // to groupindex.
+  regexp->attr_dict = Py_BuildValue("{sisNsO}",
+      "groups", regexp->re2_obj->NumberOfCapturingGroups(),
+      "groupindex", groupindex,
+      "pattern", pattern);
+  if (regexp->attr_dict == NULL) {
+    Py_DECREF(regexp);
+    return NULL;
+  }
+
+  const std::map<std::string, int>& name_map = regexp->re2_obj->NamedCapturingGroups();
+  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
+    PyObject* index = PyInt_FromLong(it->second);
+    if (index == NULL) {
+      Py_DECREF(regexp);
+      return NULL;
+    }
+    int res = PyDict_SetItemString(groupindex, it->first.c_str(), index);
+    Py_DECREF(index);
+    if (res < 0) {
+      Py_DECREF(regexp);
+      return NULL;
+    }
+  }
+
+  return (PyObject*)regexp;
+}
+
+static PyObject*
+_do_search(RegexpObject2* self, PyObject* args, PyObject* kwds, RE2::Anchor anchor, bool return_match)
+{
+  PyObject* string;
+  const char* subject;
+  Py_ssize_t slen;
+  long pos = 0;
+  long endpos = LONG_MAX;
+
+  static const char* kwlist[] = {
+    "string",
+    "pos",
+    "endpos",
+    NULL};
+
+  // Using O! instead of s# here, because we want to stash the original
+  // PyObject* in the match object on a successful match.
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!|ll", (char**)kwlist,
+        &PyString_Type, &string,
+        &pos, &endpos)) {
+    return NULL;
+  }
+
+  subject = PyString_AS_STRING(string);
+  slen = PyString_GET_SIZE(string);
+  if (pos < 0) pos = 0;
+  if (pos > slen) pos = slen;
+  if (endpos < pos) endpos = pos;
+  if (endpos > slen) endpos = slen;
+
+  // Don't bother allocating these if we are just doing a test.
+  int n_groups = 0;
+  StringPiece* groups = NULL;
+  if (return_match) {
+    n_groups = self->re2_obj->NumberOfCapturingGroups() + 1;
+    groups = new(nothrow) StringPiece[n_groups];
+
+    if (groups == NULL) {
+      PyErr_NoMemory();
+      return NULL;
+    }
+  }
+
+  bool matched = self->re2_obj->Match(
+      StringPiece(subject, (int) slen),
+      (int) pos,
+      (int) endpos,
+      anchor,
+      groups,
+      n_groups);
+
+  if (!return_match) {
+    if (matched) {
+      Py_RETURN_TRUE;
+    }
+    Py_RETURN_FALSE;
+  }
+
+  if (!matched) {
+    delete[] groups;
+    Py_RETURN_NONE;
+  }
+
+  // create_match is going to Py_BuildValue the pos and endpos into
+  // PyObjects.  We could optimize the case where pos and/or endpos were
+  // explicitly passed in by forwarding the existing PyObjects.
+  // That requires much more intricate code, though.
+  return create_match((PyObject*)self, string, pos, endpos, groups);
+}
+
+static PyObject*
+regexp_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::UNANCHORED, true);
+}
+
+static PyObject*
+regexp_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_START, true);
+}
+
+static PyObject*
+regexp_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, true);
+}
+
+static PyObject*
+regexp_test_search(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::UNANCHORED, false);
+}
+
+static PyObject*
+regexp_test_match(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_START, false);
+}
+
+static PyObject*
+regexp_test_fullmatch(RegexpObject2* self, PyObject* args, PyObject* kwds)
+{
+  return _do_search(self, args, kwds, RE2::ANCHOR_BOTH, false);
+}
+
+
+static void
+match_dealloc(MatchObject2* self)
+{
+  delete[] self->groups;
+  Py_XDECREF(self->attr_dict);
+  PyObject_Del(self);
+}
+
+static PyObject*
+create_match(PyObject* re, PyObject* string,
+    long pos, long endpos,
+    StringPiece* groups)
+{
+  MatchObject2* match = PyObject_New(MatchObject2, &Match_Type2);
+  if (match == NULL) {
+    delete[] groups;
+    return NULL;
+  }
+  match->attr_dict = NULL;
+  match->groups = groups;
+  match->re = re;
+  match->string = string;
+
+  match->attr_dict = Py_BuildValue("{sOsOslsl}",
+      "re", re,
+      "string", string,
+      "pos", pos,
+      "endpos", endpos);
+  if (match->attr_dict == NULL) {
+    Py_DECREF(match);
+    return NULL;
+  }
+
+  return (PyObject*)match;
+}
+
+/**
+ * Attempt to convert an untrusted group index (PyObject* group) into
+ * a trusted one (*idx_p).  Return false on failure (exception).
+ */
+static bool
+_group_idx(MatchObject2* self, PyObject* group, long* idx_p)
+{
+  if (group == NULL) {
+    return false;
+  }
+  PyErr_Clear(); // Is this necessary?
+  long idx = PyInt_AsLong(group);
+  if (idx == -1 && PyErr_Occurred() != NULL) {
+    return false;
+  }
+  // TODO: Consider caching NumberOfCapturingGroups.
+  if (idx < 0 || idx > ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups()) {
+    PyErr_SetString(PyExc_IndexError, "no such group");
+    return false;
+  }
+  *idx_p = idx;
+  return true;
+}
+
+/**
+ * Extract the start and end indexes of a pre-checked group number.
+ * Sets both to -1 if it did not participate in the match.
+ */
+static bool
+_group_span(MatchObject2* self, long idx, Py_ssize_t* o_start, Py_ssize_t* o_end)
+{
+  // "idx" is expected to be verified.
+  StringPiece& piece = self->groups[idx];
+  if (piece.data() == NULL) {
+    *o_start = -1;
+    *o_end = -1;
+    return false;
+  }
+  Py_ssize_t start = piece.data() - PyString_AS_STRING(self->string);
+  *o_start = start;
+  *o_end = start + piece.length();
+  return true;
+}
+
+/**
+ * Return a pre-checked group number as a string, or default_obj
+ * if it didn't participate in the match.
+ */
+static PyObject*
+_group_get_i(MatchObject2* self, long idx, PyObject* default_obj)
+{
+  Py_ssize_t start;
+  Py_ssize_t end;
+  if (!_group_span(self, idx, &start, &end)) {
+    Py_INCREF(default_obj);
+    return default_obj;
+  }
+  return PySequence_GetSlice(self->string, start, end);
+}
+
+/**
+ * Return an unchecked group number as a string (None if it didn't participate).
+ */
+static PyObject*
+_group_get_o(MatchObject2* self, PyObject* group)
+{
+  long idx;
+  if (!_group_idx(self, group, &idx)) {
+    return NULL;
+  }
+  return _group_get_i(self, idx, Py_None);
+}
+
+
+static PyObject*
+match_group(MatchObject2* self, PyObject* args)
+{
+  long idx = 0;
+  Py_ssize_t nargs = PyTuple_GET_SIZE(args);
+  switch (nargs) {
+    case 1:
+      if (!_group_idx(self, PyTuple_GET_ITEM(args, 0), &idx)) {
+        return NULL;
+      }
+      // Fall through.
+    case 0:
+      return _group_get_i(self, idx, Py_None);
+    default:
+      PyObject* ret = PyTuple_New(nargs);
+      if (ret == NULL) {
+        return NULL;
+      }
+
+      for (int i = 0; i < nargs; i++) {
+        PyObject* group = _group_get_o(self, PyTuple_GET_ITEM(args, i));
+        if (group == NULL) {
+          Py_DECREF(ret);
+          return NULL;
+        }
+        PyTuple_SET_ITEM(ret, i, group);
+      }
+      return ret;
+  }
+}
+
+static PyObject*
+match_groups(MatchObject2* self, PyObject* args, PyObject* kwds)
+{
+  static const char* kwlist[] = {
+    "default",
+    NULL};
+
+  PyObject* default_obj = Py_None;
+
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
+        &default_obj)) {
+    return NULL;
+  }
+
+  int ngroups = ((RegexpObject2*)self->re)->re2_obj->NumberOfCapturingGroups();
+
+  PyObject* ret = PyTuple_New(ngroups);
+  if (ret == NULL) {
+    return NULL;
+  }
+
+  for (int i = 1; i <= ngroups; i++) {
+    PyObject* group = _group_get_i(self, i, default_obj);
+    if (group == NULL) {
+      Py_DECREF(ret);
+      return NULL;
+    }
+    PyTuple_SET_ITEM(ret, i-1, group);
+  }
+
+  return ret;
+}
+
+static PyObject*
+match_groupdict(MatchObject2* self, PyObject* args, PyObject* kwds)
+{
+  static const char* kwlist[] = {
+    "default",
+    NULL};
+
+  PyObject* default_obj = Py_None;
+
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "|O", (char**)kwlist,
+        &default_obj)) {
+    return NULL;
+  }
+
+  PyObject* ret = PyDict_New();
+  if (ret == NULL) {
+    return NULL;
+  }
+
+  const std::map<std::string, int>& name_map = ((RegexpObject2*)self->re)->re2_obj->NamedCapturingGroups();
+  for (std::map<std::string, int>::const_iterator it = name_map.begin(); it != name_map.end(); ++it) {
+    PyObject* group = _group_get_i(self, it->second, default_obj);
+    if (group == NULL) {
+      Py_DECREF(ret);
+      return NULL;
+    }
+    // TODO: Group names with embedded zeroes?
+    int res = PyDict_SetItemString(ret, it->first.c_str(), group);
+    Py_DECREF(group);
+    if (res < 0) {
+      Py_DECREF(ret);
+      return NULL;
+    }
+  }
+
+  return ret;
+}
+
+enum span_mode_t { START, END, SPAN };
+
+static PyObject*
+_do_span(MatchObject2* self, PyObject* args, const char* name, span_mode_t mode)
+{
+  long idx = 0;
+  PyObject* group = NULL;
+  if (!PyArg_UnpackTuple(args, name, 0, 1,
+        &group)) {
+    return NULL;
+  }
+  if (group != NULL) {
+    if (!_group_idx(self, group, &idx)) {
+      return NULL;
+    }
+  }
+
+  Py_ssize_t start = -1;
+  Py_ssize_t end = -1;
+
+  (void)_group_span(self, idx, &start, &end);
+  switch (mode) {
+    case START: return Py_BuildValue("n", start);
+    case END:   return Py_BuildValue("n", end);
+    case SPAN:
+      return Py_BuildValue("nn", start, end);
+  }
+
+  // Make gcc happy.
+  return NULL;
+}
+
+static PyObject*
+match_start(MatchObject2* self, PyObject* args)
+{
+  return _do_span(self, args, "start", START);
+}
+
+static PyObject*
+match_end(MatchObject2* self, PyObject* args)
+{
+  return _do_span(self, args, "end", END);
+}
+
+static PyObject*
+match_span(MatchObject2* self, PyObject* args)
+{
+  return _do_span(self, args, "span", SPAN);
+}
+
+
+static PyObject*
+_compile(PyObject* self, PyObject* args, PyObject* kwds)
+{
+  static const char* kwlist[] = {
+    "pattern",
+    NULL};
+
+  PyObject* pattern;
+
+  if (!PyArg_ParseTupleAndKeywords(args, kwds, "O", (char**)kwlist,
+        &pattern)) {
+    return NULL;
+  }
+
+  return create_regexp(pattern);
+}
+
+static PyObject*
+escape(PyObject* self, PyObject* args)
+{
+  char *str;
+  Py_ssize_t len;
+
+  if (!PyArg_ParseTuple(args, "s#:escape", &str, &len)) {
+    return NULL;
+  }
+
+  std::string esc(RE2::QuoteMeta(StringPiece(str, (int) len)));
+
+  return PyString_FromStringAndSize(esc.c_str(), esc.size());
+}
+
+static PyMethodDef methods[] = {
+  {"_compile", (PyCFunction)_compile, METH_VARARGS | METH_KEYWORDS, NULL},
+  {"escape", (PyCFunction)escape, METH_VARARGS,
+   "Escape all potentially meaningful regexp characters."},
+  {NULL}  /* Sentinel */
+};
+
+PyMODINIT_FUNC
+init_re2(void)
+{
+  if (PyType_Ready(&Regexp_Type2) < 0) {
+    return;
+  }
+
+  if (PyType_Ready(&Match_Type2) < 0) {
+    return;
+  }
+
+  PyObject* sre_mod = PyImport_ImportModuleNoBlock("sre_constants");
+  if (sre_mod == NULL) {
+    return;
+  }
+  /* static global */ error_class = PyObject_GetAttrString(sre_mod, "error");
+  if (error_class == NULL) {
+    return;
+  }
+
+  PyObject* mod = Py_InitModule("_re2", methods);
+
+  Py_INCREF(error_class);
+  PyModule_AddObject(mod, "error", error_class);
+}
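The match-object helpers in `_re2.cpp` (`match_group`, `match_groups`, `match_groupdict` and `_do_span` behind `start`/`end`/`span`) appear to be modeled on the match-object API of Python's stdlib `re` module, from which the bindings also borrow `sre_constants.error`. Assuming the goal is parity with `re`, the sketch below exercises the same calls against the stdlib module as a reference for the expected behaviour. The pattern and input string are illustrative only.

```python
import re

# Reference behaviour from the stdlib `re` module (assumption: the _re2
# match object is meant to behave the same way for these calls).
# The optional second group does not participate when matching "hello".
m = re.match(r"(?P<word>\w+)(?:-(?P<num>\d+))?", "hello")

# group(): no args -> whole match; one arg -> that group; several -> tuple.
whole = m.group()          # 'hello'
word = m.group("word")     # 'hello'
both = m.group(1, 2)       # ('hello', None)

# groups()/groupdict(): groups that did not participate get `default`.
groups = m.groups("")      # ('hello', '')
gdict = m.groupdict("?")   # {'word': 'hello', 'num': '?'}

# start()/end()/span(): a non-participating group reports -1.
span2 = m.span(2)          # (-1, -1)
```

In the C++ code the same fallbacks come from `_group_get_i` (returning `default_obj`) and from `_do_span` initializing `start` and `end` to -1 before calling `_group_span`.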
diff --git a/mercurial/re2.py b/mercurial/re2.py
new file mode 100644
--- /dev/null
+++ b/mercurial/re2.py
@@ -0,0 +1,63 @@ 
+#!/usr/bin/env python
+
+# Copyright (c) 2010, David Reiss and Facebook, Inc. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in the
+#   documentation and/or other materials provided with the distribution.
+# * Neither the name of Facebook nor the names of its contributors
+#   may be used to endorse or promote products derived from this software
+#   without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+import _re2
+
+__all__ = [
+    "error",
+    "escape",
+    "compile",
+    "search",
+    "match",
+    "fullmatch",
+    ]
+
+# Module-private compilation function, kept separate for future caching and other enhancements
+_compile = _re2._compile
+
+error = _re2.error
+escape = _re2.escape
+
+def compile(pattern):
+    "Compile a regular expression pattern, returning a pattern object."
+    return _compile(pattern)
+
+def search(pattern, string):
+    """Scan through string looking for a match to the pattern, returning
+    a match object, or None if no match was found."""
+    return _compile(pattern).search(string)
+
+def match(pattern, string):
+    """Try to apply the pattern at the start of the string, returning
+    a match object, or None if no match was found."""
+    return _compile(pattern).match(string)
+
+def fullmatch(pattern, string):
+    """Try to apply the pattern to the entire string, returning
+    a match object, or None if no match was found."""
+    return _compile(pattern).fullmatch(string)