Patchwork D12089: encoding: fix trim() to be O(n) instead of O(n^2)

login
register
mail settings
Submitter phabricator
Date Jan. 26, 2022, 6:29 p.m.
Message ID <differential-rev-PHID-DREV-px2qa7rye7nsvm6gyjon-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/50415/
State New
Headers show

Comments

phabricator - Jan. 26, 2022, 6:29 p.m.
martinvonz created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  `encoding.trim()` iterated over the possible lengths smaller than the
  input and created a slice for each. It then calculated the column
  width of the result, which is of course O(n), so the overall algorithm
  was O(n). This patch rewrites it to iterate over the unicode
  characters, keeping track of the length so far. Also, the old
  algorithm started from the end of the string, which made it much worse
  when the input is large and the limit is small (such as the typical 72
  we pass to it).
  
  You can time it by running something like this:
  
    time python3 -c 'from mercurial.utils import stringutil; print(stringutil.ellipsis(b"0123456789" * 1000, 5))'
  
  That drops from 4.05 s to 83 ms with this patch (and most of that is
  of course startup time).

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D12089

AFFECTED FILES
  mercurial/encoding.py

CHANGE DETAILS




To: martinvonz, #hg-reviewers
Cc: mercurial-patches, mercurial-devel

Patch

diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -511,17 +511,21 @@ 
     if width <= 0:  # no enough room even for ellipsis
         return ellipsis[: width + len(ellipsis)]
 
+    chars = list(u)
     if leftside:
-        uslice = lambda i: u[i:]
-        concat = lambda s: ellipsis + s
-    else:
-        uslice = lambda i: u[:-i]
-        concat = lambda s: s + ellipsis
-    for i in pycompat.xrange(1, len(u)):
-        usub = uslice(i)
-        if ucolwidth(usub) <= width:
-            return concat(usub.encode(_sysstr(encoding)))
-    return ellipsis  # no enough room for multi-column characters
+        chars.reverse()
+    width_so_far = 0
+    for i, c in enumerate(chars):
+        width_so_far += ucolwidth(c)
+        if width_so_far > width:
+            break
+    chars = chars[:i]
+    if leftside:
+        chars.reverse()
+    u = u''.join(chars).encode(_sysstr(encoding))
+    if leftside:
+        return ellipsis + u
+    return u + ellipsis
 
 
 class normcasespecs(object):