Patchwork D6756: rust-utils: add normalize_case util to mirror Python one

Submitter phabricator
Date Aug. 22, 2019, 12:54 p.m.
Message ID <differential-rev-PHID-DREV-7a4psukywz2wsib7yyvn-req@mercurial-scm.org>
Permalink /patch/41381/
State Superseded

Comments

phabricator - Aug. 22, 2019, 12:54 p.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  While we still don't handle filenames properly cross-platform, this at least
  sticks closer to the Python behavior.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6756

AFFECTED FILES
  rust/hg-core/src/dirstate/dirstate_map.rs
  rust/hg-core/src/utils/files.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
Yuya Nishihara - Sept. 1, 2019, 3 a.m.
> +/// TODO improve handling of utf8 file names. Our overall strategy for
> +/// filenames has to be revisited anyway, since Windows is UTF-16.
> +pub fn normalize_case(bytes: &[u8]) -> Vec<u8> {
> +    #[cfg(windows)] // NTFS compares via upper()
> +    return bytes.to_ascii_uppercase();
> +    #[cfg(unix)]
> +    bytes.to_ascii_lowercase()
> +}

HFS+ has more complex rules, and some of them have been the source of security issues.
phabricator - Sept. 1, 2019, 3:02 a.m.
yuja added a comment.


  > +/// TODO improve handling of utf8 file names. Our overall strategy for
  > +/// filenames has to be revisited anyway, since Windows is UTF-16.
  > +pub fn normalize_case(bytes: &[u8]) -> Vec<u8> {
  > +    #[cfg(windows)] // NTFS compares via upper()
  > +    return bytes.to_ascii_uppercase();
  > +    #[cfg(unix)]
  > +    bytes.to_ascii_lowercase()
  > +}
  
  HFS+ has more complex rules, and some of them have been the source of security issues.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6756/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6756

To: Alphare, #hg-reviewers, kevincox
Cc: yuja, durin42, kevincox, mercurial-devel
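
To make the limitation raised in the comments above concrete, here is a minimal sketch of ASCII-only case folding. It is not the code under review: `normalize_case_sketch` and `ascii_fold_eq` are hypothetical names, and the sketch uses `cfg!(windows)` with an `else` branch (an assumption, not the patch's `#[cfg]` attributes) so that it compiles on any target. It shows that names differing only in ASCII case collide as intended, while names differing in the case of a non-ASCII letter do not, which is the gap the TODO and the HFS+ remark point at.

```rust
/// Hypothetical stand-in for the patch's `normalize_case`: upper case on
/// Windows, lower case everywhere else. Only ASCII letters are affected.
pub fn normalize_case_sketch(bytes: &[u8]) -> Vec<u8> {
    if cfg!(windows) {
        bytes.to_ascii_uppercase()
    } else {
        bytes.to_ascii_lowercase()
    }
}

/// Returns true when two byte paths are equal after ASCII-only folding.
pub fn ascii_fold_eq(a: &[u8], b: &[u8]) -> bool {
    normalize_case_sketch(a) == normalize_case_sketch(b)
}
```

With this sketch, `README.txt` and `readme.TXT` fold to the same bytes, but the UTF-8 encodings of `É.txt` and `é.txt` stay distinct, because every byte of a multi-byte UTF-8 sequence is outside the ASCII range and is left untouched. A filesystem that compares names with Unicode-aware rules would treat such a pair differently from this function.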

Patch

diff --git a/rust/hg-core/src/utils/files.rs b/rust/hg-core/src/utils/files.rs
--- a/rust/hg-core/src/utils/files.rs
+++ b/rust/hg-core/src/utils/files.rs
@@ -71,6 +71,15 @@ 
     dirs
 }
 
+/// TODO improve handling of utf8 file names. Our overall strategy for
+/// filenames has to be revisited anyway, since Windows is UTF-16.
+pub fn normalize_case(bytes: &[u8]) -> Vec<u8> {
+    #[cfg(windows)] // NTFS compares via upper()
+    return bytes.to_ascii_uppercase();
+    #[cfg(unix)]
+    bytes.to_ascii_lowercase()
+}
+
 #[cfg(test)]
 mod tests {
     #[test]
diff --git a/rust/hg-core/src/dirstate/dirstate_map.rs b/rust/hg-core/src/dirstate/dirstate_map.rs
--- a/rust/hg-core/src/dirstate/dirstate_map.rs
+++ b/rust/hg-core/src/dirstate/dirstate_map.rs
@@ -7,9 +7,10 @@ 
 
 use crate::{
     dirstate::{parsers::PARENT_SIZE, EntryState},
-    pack_dirstate, parse_dirstate, CopyMap, DirsMultiset, DirstateEntry,
-    DirstateError, DirstateMapError, DirstateParents, DirstateParseError,
-    StateMap,
+    pack_dirstate, parse_dirstate,
+    utils::files::normalize_case,
+    CopyMap, DirsMultiset, DirstateEntry, DirstateError, DirstateMapError,
+    DirstateParents, DirstateParseError, StateMap,
 };
 use core::borrow::Borrow;
 use std::collections::{HashMap, HashSet};
@@ -127,7 +128,7 @@ 
         }
 
         if let Some(ref mut file_fold_map) = self.file_fold_map {
-            file_fold_map.remove(&filename.to_ascii_uppercase());
+            file_fold_map.remove(&normalize_case(filename));
         }
         self.state_map.insert(
             filename.to_owned(),
@@ -162,7 +163,7 @@ 
             }
         }
         if let Some(ref mut file_fold_map) = self.file_fold_map {
-            file_fold_map.remove(&filename.to_ascii_uppercase());
+            file_fold_map.remove(&normalize_case(filename));
         }
         self.non_normal_set.remove(filename);
 
@@ -326,10 +327,8 @@ 
         for (filename, DirstateEntry { state, .. }) in self.state_map.borrow()
         {
             if *state == EntryState::Removed {
-                new_file_fold_map.insert(
-                    filename.to_ascii_uppercase().to_owned(),
-                    filename.to_owned(),
-                );
+                new_file_fold_map
+                    .insert(normalize_case(filename), filename.to_owned());
             }
         }
         self.file_fold_map = Some(new_file_fold_map);
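
The three call sites changed above all touch the same map, so they have to agree on how keys are normalized: an entry inserted under one folding cannot be found or removed under another. The following sketch illustrates that invariant in isolation. It is an assumption-laden illustration, not the `DirstateMap` implementation: `fold`, `build_fold_map`, and `lookup` are hypothetical names, the folding is fixed to ASCII lower case for simplicity, and the entry-state filter from the patch is left out.

```rust
use std::collections::HashMap;

/// Hypothetical folding helper (ASCII lower case only). The point is that
/// every insert, lookup, and removal goes through this one function.
fn fold(bytes: &[u8]) -> Vec<u8> {
    bytes.to_ascii_lowercase()
}

/// Builds a map from the case-folded name to the name as originally spelled.
pub fn build_fold_map(filenames: &[&[u8]]) -> HashMap<Vec<u8>, Vec<u8>> {
    let mut map = HashMap::new();
    for name in filenames {
        map.insert(fold(name), name.to_vec());
    }
    map
}

/// Looks up the original spelling for a candidate name, ignoring ASCII case.
pub fn lookup<'a>(
    map: &'a HashMap<Vec<u8>, Vec<u8>>,
    candidate: &[u8],
) -> Option<&'a [u8]> {
    map.get(&fold(candidate)).map(|v| v.as_slice())
}
```

Routing every access through a single helper, as the patch does with `normalize_case`, means the platform-specific choice of folding can change in one place without the insert and remove sites drifting apart.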