Submitter | Boris Feld |
---|---|
Date | Nov. 21, 2018, 6:33 p.m. |
Message ID | <03b60ccca50c77a552de.1542825236@localhost.localdomain> |
Download | mbox | patch |
Permalink | /patch/36702/ |
State | Accepted |
Headers | show |
Comments
On Wed, 21 Nov 2018 19:33:56 +0100, Boris Feld wrote: > # HG changeset patch > # User Boris Feld <boris.feld@octobus.net> > # Date 1542651672 0 > # Mon Nov 19 18:21:12 2018 +0000 > # Node ID 03b60ccca50c77a552de85ac3402c5174539f150 > # Parent aeea6de7f1a88e2e710ee01299fe4fe9e1f0d336 > # EXP-Topic perf-ignore > # Available At https://bitbucket.org/octobus/mercurial-devel/ > # hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 03b60ccca50c > match: avoid translating glob to matcher multiple times for large sets > +MAXRESIZE = 20000 > + > +def _groupregexps(regexps): > + return '(?:%s)' % '|'.join(regexps) > + > def _buildregexmatch(kindpats, globsuffix): > """Build a match function from a list of kinds and kindpats, > return regexp string and a matcher function.""" > try: > - regex = '(?:%s)' % '|'.join([_regex(k, p, globsuffix) > - for (k, p, s) in kindpats]) > - if len(regex) > 20000: > - raise OverflowError > - return regex, _rematcher(regex) > - except OverflowError: > - # We're using a Python with a tiny regex engine and we > - # made it explode, so we'll divide the pattern list in two > - # until it works > - l = len(kindpats) > - if l < 2: > - raise > - regexa, a = _buildregexmatch(kindpats[:l//2], globsuffix) > - regexb, b = _buildregexmatch(kindpats[l//2:], globsuffix) > - return regex, lambda s: a(s) or b(s) > + allgroups = [] > + regexps = [_regex(k, p, globsuffix) for (k, p, s) in kindpats] > + fullregexp = _groupregexps(regexps) > + > + startidx = 0 > + groupsize = 3 > + for idx, r in enumerate(regexps): > + piecesize = len(r) > + if piecesize > MAXRESIZE: It leaves an empty group if MAXRESIZE - 3 <= piecesize < MAXRESIZE. > + raise OverflowError Maybe we should make it raise Abort as we no longer catch it. Can you add a test? > + elif (groupsize + 1 + piecesize) > MAXRESIZE: > + group = regexps[startidx:idx] > + allgroups.append(_groupregexps(group)) > + startidx = idx > + groupsize = 3 > + groupsize += piecesize Perhaps, it should be piecesize + 1? > + > + if startidx == 0: > + func = _rematcher(fullregexp) > + else: > + group = regexps[startidx:idx] It should be regexps[startidx:]. idx is the last index, not the length. > + allgroups.append(_groupregexps(group)) > + allmatchers = [_rematcher(g) for g in allgroups] > + func = lambda s: any(m(s) for m in allmatchers) > + return fullregexp, func Just an idea. I think fullregexp could be formatted as (?:(?:..)|(?:..)|..) to reflect the internal structure. That will help debugging.
Patch
diff --git a/mercurial/match.py b/mercurial/match.py --- a/mercurial/match.py +++ b/mercurial/match.py @@ -1184,25 +1184,40 @@ def _buildmatch(kindpats, globsuffix, li else: return regex, lambda f: any(mf(f) for mf in matchfuncs) +MAXRESIZE = 20000 + +def _groupregexps(regexps): + return '(?:%s)' % '|'.join(regexps) + def _buildregexmatch(kindpats, globsuffix): """Build a match function from a list of kinds and kindpats, return regexp string and a matcher function.""" try: - regex = '(?:%s)' % '|'.join([_regex(k, p, globsuffix) - for (k, p, s) in kindpats]) - if len(regex) > 20000: - raise OverflowError - return regex, _rematcher(regex) - except OverflowError: - # We're using a Python with a tiny regex engine and we - # made it explode, so we'll divide the pattern list in two - # until it works - l = len(kindpats) - if l < 2: - raise - regexa, a = _buildregexmatch(kindpats[:l//2], globsuffix) - regexb, b = _buildregexmatch(kindpats[l//2:], globsuffix) - return regex, lambda s: a(s) or b(s) + allgroups = [] + regexps = [_regex(k, p, globsuffix) for (k, p, s) in kindpats] + fullregexp = _groupregexps(regexps) + + startidx = 0 + groupsize = 3 + for idx, r in enumerate(regexps): + piecesize = len(r) + if piecesize > MAXRESIZE: + raise OverflowError + elif (groupsize + 1 + piecesize) > MAXRESIZE: + group = regexps[startidx:idx] + allgroups.append(_groupregexps(group)) + startidx = idx + groupsize = 3 + groupsize += piecesize + + if startidx == 0: + func = _rematcher(fullregexp) + else: + group = regexps[startidx:idx] + allgroups.append(_groupregexps(group)) + allmatchers = [_rematcher(g) for g in allgroups] + func = lambda s: any(m(s) for m in allmatchers) + return fullregexp, func except re.error: for k, p, s in kindpats: try: