Patchwork [1,of,6,foldmap-in-C,V2] encoding: define an enum that specifies what normcase does to ASCII strings

Submitter Siddharth Agarwal
Date April 2, 2015, 10:51 p.m.
Message ID <137677d5096e0e7f1f0f.1428015110@devbig136.prn2.facebook.com>
Permalink /patch/8459/
State Accepted

Comments

Siddharth Agarwal - April 2, 2015, 10:51 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1427872870 25200
#      Wed Apr 01 00:21:10 2015 -0700
# Node ID 137677d5096e0e7f1f0f19d795a0e54310f65a4a
# Parent  d7cf8102bf09a905662c1018e60a06e417a08af3
encoding: define an enum that specifies what normcase does to ASCII strings

For C code we don't want to pay the cost of calling into a Python function for
the common case of ASCII filenames. However, while on most POSIX platforms we
normalize filenames by lowercasing them, on Windows we uppercase them. We
define an enum here indicating the direction in which filenames should be
normalized. Some platforms (notably Cygwin) have more complicated
normalization behavior -- we add a case for that too.

In upcoming patches we'll also define a fallback function that is called if the
string has non-ASCII bytes.

This enum will be replicated in the C code to make foldmaps. There's
unfortunately no nice way to avoid that -- we can't have encoding import
parsers because of import cycles. One way might be to have parsers import
encoding, but accessing Python modules from C code is just awkward.

The name 'normcasespecs' was chosen to indicate that this is merely an integer
that specifies a behavior, not a function. The name was pluralized since in
upcoming patches we'll introduce 'normcasespec' which will be one of these
values.
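
To illustrate the dispatch described above, here is a minimal Python sketch of
how a caller might consume one of these values. It is not part of this patch:
'foldname' and 'slowfallback' are hypothetical names, the fallback is only a
stand-in for a platform's real normcase, and the sketch assumes Python 3 bytes
(for bytes.isascii) rather than the Python 2 this module currently targets.

```python
class normcasespecs(object):
    '''copy of the enum from this patch, repeated so the sketch is standalone'''
    lower = -1
    upper = 1
    other = 0

def foldname(name, spec, fallback):
    '''normalize one filename given as bytes (hypothetical helper)

    ASCII names take the fast path chosen by spec. Non-ASCII names, and every
    name when spec is 'other', go to the fallback function.'''
    if spec != normcasespecs.other and name.isascii():
        if spec == normcasespecs.lower:
            return name.lower()
        return name.upper()
    return fallback(name)

def slowfallback(name):
    # Assumption: stand-in for the platform's real normcase, which this
    # patch does not define. Here it simply lowercases UTF-8 text.
    return name.decode('utf-8').lower().encode('utf-8')
```

With spec set to normcasespecs.lower, an ASCII name such as b'README.TXT' never
reaches the fallback, which is the cost the enum is meant to avoid.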

Patch

diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -354,6 +354,19 @@  def upper(s):
     except LookupError, k:
         raise error.Abort(k, hint="please check your locale settings")
 
+class normcasespecs(object):
+    '''what a platform's normcase does to ASCII strings
+
+    This is specified per platform, and should be consistent with what normcase
+    on that platform actually does.
+
+    lower: normcase lowercases ASCII strings
+    upper: normcase uppercases ASCII strings
+    other: the fallback function should always be called'''
+    lower = -1
+    upper = 1
+    other = 0
+
 _jsonmap = {}
 
 def jsonescape(s):