Patchwork D6773: rust-hgpath: add HgPath and HgPathBuf structs to encapsulate handling of paths

login
register
mail settings
Submitter phabricator
Date Aug. 29, 2019, 2:40 p.m.
Message ID <differential-rev-PHID-DREV-pyhc3mbqtgokzmekxeng-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/41428/
State New
Headers show

Comments

phabricator - Aug. 29, 2019, 2:40 p.m.
Alphare created this revision.
Herald added subscribers: mercurial-devel, kevincox, durin42.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This change is a direct consequence of this discussion on the mailing list:
  https://www.mercurial-scm.org/pipermail/mercurial-devel/2019-August/133574.html
  
  The implementations of `HgPath` and `HgPathBuf` are, for the most part, taken
  directly from `OsStr` and `OsString` respectively from the standard library.
  
  What this change does *not* yet do is implement the Windows MBCS to WTF8
  conversion, but it lays the basis for a very flexible interface for paths.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

AFFECTED FILES
  rust/hg-core/src/utils.rs
  rust/hg-core/src/utils/hg_path.rs

CHANGE DETAILS




To: Alphare, #hg-reviewers
Cc: durin42, kevincox, mercurial-devel
phabricator - Aug. 30, 2019, 9:38 a.m.
This revision now requires changes to proceed.
kevincox added a comment.
kevincox requested changes to this revision.


  It looks good overall. I just would like to have a bit more strict definition of what an HgPath can contain and preferably some init-time validation.

INLINE COMMENTS

> hg_path.rs:19
> +/// decoded from MBCS to WTF-8. If WindowsUTF8Plan is implemented, the source
> +/// character encoding will be determined per repository basis.
> +#[derive(Eq, Ord, PartialEq, PartialOrd, Debug, Hash)]

You should probably add a note about what a valid path is. My initial assumption is "A sequence of non-null characters. `/` separates directories, consecutive `/` are forbidden". It would be good to write this down, provide an `is_valid` function and add a debug_assert! on creation/modification to check that everything is as expected.

> hg_path.rs:43
> +    }
> +    pub fn iter(&self) -> HgPathIterator {
> +        HgPathIterator { path: &self }

Please call this `bytes()` to highlight that it iterates over bytes, not characters or directory components.

> hg_path.rs:136
> +#[derive(Debug)]
> +pub struct HgPathIterator<'a> {
> +    path: &'a HgPath,

I would also call this `HgPathByteIterator`

> hg_path.rs:204
> +
> +    #[inline]
> +    fn deref(&self) -> &HgPath {

This is almost certainly unnecessary.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
phabricator - Aug. 30, 2019, 2:41 p.m.
Alphare added inline comments.
Alphare marked 3 inline comments as done.

INLINE COMMENTS

> kevincox wrote in hg_path.rs:204
> This is almost certainly unnecessary.

It is useful for some methods already. The more we add, the more it is useful.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: durin42, kevincox, mercurial-devel
phabricator - Aug. 31, 2019, 3:54 p.m.
Alphare added a comment.


  In D6773#99433 <https://phab.mercurial-scm.org/D6773#99433>, @yuja wrote:
  
  >> +#[derive(Eq, Ord, PartialEq, PartialOrd, Debug, Hash)]
  >> +pub struct HgPath {
  >> +    inner: [u8],
  >> +}
  >
  > I found `std::path::Path` has a well-written inline comment about
  > the unsafe pointer cast. Let's copy it so we won't introduce a
  > memory issue.
  
  I'm not 100% sure which comment you're referring to, sorry.
  
  In D6773#99434 <https://phab.mercurial-scm.org/D6773#99434>, @yuja wrote:
  
  > I'm not sure if this `HgPath` type is designed to be always validated or not.
  > If yes, these functions can be private, and we'll need a conservative
  > `HgPath::from_bytes(s) -> Result<&HgPath>` and `unsafe fn from_bytes_unchecked(s)`
  > functions. If not, maybe `HgPath::new()` shouldn't validate the bytes at all.
  > I feel `debug_assert()` implies that the condition must always be met,
  > so the caller has to guarantee that.
  
  I feel like both the performance implications and the ergonomics of checking everything are going to be a problem. @kevincox's idea of using `debug_assert!` seemed like a decent enough check to ease development. These functions could also be used during conversions to `OsString` and the like.
  
  >> +impl Index<usize> for HgPath {
  >> +    type Output = u8;
  >> +
  >> +    fn index(&self, i: usize) -> &Self::Output {
  >> +        &self.inner[i]
  >> +    }
  >> +}
  >
  > [...]
  > Better to not implement `path[...]` since it can be considered returning
  > a path component at the specified index. We can always do
  > `path.as_bytes()[...]` explicitly.
  
  I'd leave `Index<RangeFull>`since it it not ambiguous and is a neat shorthand.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: yuja, durin42, kevincox, mercurial-devel
Yuya Nishihara - Sept. 1, 2019, 2:14 a.m.
>   > I found `std::path::Path` has a well-written inline comment about
>   > the unsafe pointer cast. Let's copy it so we won't introduce a
>   > memory issue.
>   
>   I'm not 100% sure which comment you're referring to, sorry.

This.

https://github.com/rust-lang/rust/commit/740f8db85572aef58d0734fc60bc2b54862ebbb0#diff-120c5906285d41e976dc4176265d7cbaR1754

Apparently it was added quite recently.

>   > I'm not sure if this `HgPath` type is designed to be always validated or not.
>   > If yes, these functions can be private, and we'll need a conservative
>   > `HgPath::from_bytes(s) -> Result<&HgPath>` and `unsafe fn from_bytes_unchecked(s)`
>   > functions. If not, maybe `HgPath::new()` shouldn't validate the bytes at all.
>   > I feel `debug_assert()` implies that the condition must always be met,
>   > so the caller has to guarantee that.
>   
>   I feel like both the performance implications and the ergonomics of checking everything are going to be a problem. @kevincox's idea of using `debug_assert!` seemed like a decent enough check to ease development. These functions could also be used during conversions to `OsString` and the like.

@kevincox said "it would be good to [...] add a debug_assert! on
creation/modification to check that everything is as expected."
IIUC, his idea is that the `debug_assert!()` condition must always be met,
which is the guarantee that the HgPath type provides. In order to ensure
that, we'll have to validate all paths read from dirstate, for example,
prior to casting to HgPath type. Otherwise `debug_assert!()` would fail,
meaning the implementation had a bug.

So, my question is, do we really need such strong guarantee by default?

Instead, we could lower the bar to "HgPath is merely a path represented in
platform-specific encoding though it's supposed to be a canonical path",
and add e.g. `is_canonicalized()` to check if the path meets the strong
requirement.

>   >> +impl Index<usize> for HgPath {
>   >> +    type Output = u8;
>   >> +
>   >> +    fn index(&self, i: usize) -> &Self::Output {
>   >> +        &self.inner[i]
>   >> +    }
>   >> +}
>   >
>   > [...]
>   > Better to not implement `path[...]` since it can be considered returning
>   > a path component at the specified index. We can always do
>   > `path.as_bytes()[...]` explicitly.
>   
>   I'd leave `Index<RangeFull>`since it it not ambiguous and is a neat shorthand.

What shorthand would it be used for?

I feel it's weird only `path[..]` is allowed.
phabricator - Sept. 1, 2019, 2:17 a.m.
yuja added a comment.


  >   > I found `std::path::Path` has a well-written inline comment about
  >   > the unsafe pointer cast. Let's copy it so we won't introduce a
  >   > memory issue.
  >   I'm not 100% sure which comment you're referring to, sorry.
  
  This.
  
  https://github.com/rust-lang/rust/commit/740f8db85572aef58d0734fc60bc2b54862ebbb0#diff-120c5906285d41e976dc4176265d7cbaR1754
  
  Apparently it was added quite recently.
  
  >   > I'm not sure if this `HgPath` type is designed to be always validated or not.
  >   > If yes, these functions can be private, and we'll need a conservative
  >   > `HgPath::from_bytes(s) -> Result<&HgPath>` and `unsafe fn from_bytes_unchecked(s)`
  >   > functions. If not, maybe `HgPath::new()` shouldn't validate the bytes at all.
  >   > I feel `debug_assert()` implies that the condition must always be met,
  >   > so the caller has to guarantee that.
  >   I feel like both the performance implications and the ergonomics of checking everything are going to be a problem. @kevincox's idea of using `debug_assert!` seemed like a decent enough check to ease development. These functions could also be used during conversions to `OsString` and the like.
  
  @kevincox said "it would be good to [...] add a debug_assert! on
  creation/modification to check that everything is as expected."
  IIUC, his idea is that the `debug_assert!()` condition must always be met,
  which is the guarantee that the HgPath type provides. In order to ensure
  that, we'll have to validate all paths read from dirstate, for example,
  prior to casting to HgPath type. Otherwise `debug_assert!()` would fail,
  meaning the implementation had a bug.
  
  So, my question is, do we really need such strong guarantee by default?
  
  Instead, we could lower the bar to "HgPath is merely a path represented in
  platform-specific encoding though it's supposed to be a canonical path",
  and add e.g. `is_canonicalized()` to check if the path meets the strong
  requirement.
  
  >   >> +impl Index<usize> for HgPath {
  >   >> +    type Output = u8;
  >   >> +
  >   >> +    fn index(&self, i: usize) -> &Self::Output {
  >   >> +        &self.inner[i]
  >   >> +    }
  >   >> +}
  >   >
  >   > [...]
  >   > Better to not implement `path[...]` since it can be considered returning
  >   > a path component at the specified index. We can always do
  >   > `path.as_bytes()[...]` explicitly.
  >   I'd leave `Index<RangeFull>`since it it not ambiguous and is a neat shorthand.
  
  What shorthand would it be used for?
  
  I feel it's weird only `path[..]` is allowed.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: yuja, durin42, kevincox, mercurial-devel
phabricator - Sept. 1, 2019, 9:30 a.m.
Alphare added a comment.


  > @kevincox said "it would be good to [...] add a debug_assert! on
  > creation/modification to check that everything is as expected."
  > IIUC, his idea is that the `debug_assert!()` condition must always be met,
  > which is the guarantee that the HgPath type provides. In order to ensure
  > that, we'll have to validate all paths read from dirstate, for example,
  > prior to casting to HgPath type. Otherwise `debug_assert!()` would fail,
  > meaning the implementation had a bug.
  > So, my question is, do we really need such strong guarantee by default?
  
  I indeed do not think we do, for the reasons I gave in my previous comment.
  
  > Instead, we could lower the bar to "HgPath is merely a path represented in
  > platform-specific encoding though it's supposed to be a canonical path",
  > and add e.g. `is_canonicalized()` to check if the path meets the strong
  > requirement.
  
  Is it really represented in a platform-specific encoding or it just treated as ASCII until file system operations are needed? 
  I would use the strong check when converting to platform-specific types like `OsStr` or `Path`.
  
  >>   >> +impl Index<usize> for HgPath {
  >>   >> +    type Output = u8;
  >>   >> +
  >>   >> +    fn index(&self, i: usize) -> &Self::Output {
  >>   >> +        &self.inner[i]
  >>   >> +    }
  >>   >> +}
  >>   >
  >>   > [...]
  >>   > Better to not implement `path[...]` since it can be considered returning
  >>   > a path component at the specified index. We can always do
  >>   > `path.as_bytes()[...]` explicitly.
  >>   I'd leave `Index<RangeFull>`since it it not ambiguous and is a neat shorthand.
  >
  > What shorthand would it be used for?
  > I feel it's weird only `path[..]` is allowed.
  
  This is useful in `Deref` for example: `&self[..]`. But I can remove it if needed.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: yuja, durin42, kevincox, mercurial-devel
Yuya Nishihara - Sept. 1, 2019, 1:23 p.m.
>   > @kevincox said "it would be good to [...] add a debug_assert! on
>   > creation/modification to check that everything is as expected."
>   > IIUC, his idea is that the `debug_assert!()` condition must always be met,
>   > which is the guarantee that the HgPath type provides. In order to ensure
>   > that, we'll have to validate all paths read from dirstate, for example,
>   > prior to casting to HgPath type. Otherwise `debug_assert!()` would fail,
>   > meaning the implementation had a bug.
>   > So, my question is, do we really need such strong guarantee by default?
>   
>   I indeed do not think we do, for the reasons I gave in my previous comment.

Okay, let's remove the debug_asserts, and update the documentation
accordingly.

>   > Instead, we could lower the bar to "HgPath is merely a path represented in
>   > platform-specific encoding though it's supposed to be a canonical path",
>   > and add e.g. `is_canonicalized()` to check if the path meets the strong
>   > requirement.
>   
>   Is it really represented in a platform-specific encoding or it just treated as ASCII until file system operations are needed?

It's a chunk of bytes. I just said "platform-specific encoding" to clarify
that the encoding might be different from OsStr. Sorry for confusion.

>   I would use the strong check when converting to platform-specific types like `OsStr` or `Path`.

Perhaps, that's similar to what the pathauditor would do in Python.

>   >>   >> +impl Index<usize> for HgPath {
>   >>   >> +    type Output = u8;
>   >>   >> +
>   >>   >> +    fn index(&self, i: usize) -> &Self::Output {
>   >>   >> +        &self.inner[i]
>   >>   >> +    }
>   >>   >> +}
>   >>   >
>   >>   > [...]
>   >>   > Better to not implement `path[...]` since it can be considered returning
>   >>   > a path component at the specified index. We can always do
>   >>   > `path.as_bytes()[...]` explicitly.
>   >>   I'd leave `Index<RangeFull>`since it it not ambiguous and is a neat shorthand.
>   >
>   > What shorthand would it be used for?
>   > I feel it's weird only `path[..]` is allowed.
>   
>   This is useful in `Deref` for example: `&self[..]`. But I can remove it if needed.

Maybe it can be written as `HgPath::new(&self.inner)`.
phabricator - Sept. 1, 2019, 1:25 p.m.
yuja added a comment.


  >   > @kevincox said "it would be good to [...] add a debug_assert! on
  >   > creation/modification to check that everything is as expected."
  >   > IIUC, his idea is that the `debug_assert!()` condition must always be met,
  >   > which is the guarantee that the HgPath type provides. In order to ensure
  >   > that, we'll have to validate all paths read from dirstate, for example,
  >   > prior to casting to HgPath type. Otherwise `debug_assert!()` would fail,
  >   > meaning the implementation had a bug.
  >   > So, my question is, do we really need such strong guarantee by default?
  >   I indeed do not think we do, for the reasons I gave in my previous comment.
  
  Okay, let's remove the debug_asserts, and update the documentation
  accordingly.
  
  >   > Instead, we could lower the bar to "HgPath is merely a path represented in
  >   > platform-specific encoding though it's supposed to be a canonical path",
  >   > and add e.g. `is_canonicalized()` to check if the path meets the strong
  >   > requirement.
  >   Is it really represented in a platform-specific encoding or it just treated as ASCII until file system operations are needed?
  
  It's a chunk of bytes. I just said "platform-specific encoding" to clarify
  that the encoding might be different from OsStr. Sorry for confusion.
  
  >   I would use the strong check when converting to platform-specific types like `OsStr` or `Path`.
  
  Perhaps, that's similar to what the pathauditor would do in Python.
  
  >   >>   >> +impl Index<usize> for HgPath {
  >   >>   >> +    type Output = u8;
  >   >>   >> +
  >   >>   >> +    fn index(&self, i: usize) -> &Self::Output {
  >   >>   >> +        &self.inner[i]
  >   >>   >> +    }
  >   >>   >> +}
  >   >>   >
  >   >>   > [...]
  >   >>   > Better to not implement `path[...]` since it can be considered returning
  >   >>   > a path component at the specified index. We can always do
  >   >>   > `path.as_bytes()[...]` explicitly.
  >   >>   I'd leave `Index<RangeFull>`since it it not ambiguous and is a neat shorthand.
  >   >
  >   > What shorthand would it be used for?
  >   > I feel it's weird only `path[..]` is allowed.
  >   This is useful in `Deref` for example: `&self[..]`. But I can remove it if needed.
  
  Maybe it can be written as `HgPath::new(&self.inner)`.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: yuja, durin42, kevincox, mercurial-devel
phabricator - Sept. 11, 2019, 9:50 a.m.
Alphare added a comment.


  I would love for this series to be queued. Is there anything I didn't address left to do?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: mjpieters, yuja, durin42, kevincox, mercurial-devel
phabricator - Sept. 13, 2019, 10:49 a.m.
kevincox added inline comments.
kevincox accepted this revision.

INLINE COMMENTS

> hg_path.rs:175
> +    pub fn push(&mut self, byte: u8) {
> +        self.inner.push(byte);
> +    }

debug_assert check after the mutation.

> hg_path.rs:242
> +    fn extend<T: IntoIterator<Item = u8>>(&mut self, iter: T) {
> +        self.inner.extend(iter);
> +    }

debug_assert check after the mutation.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D6773/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D6773

To: Alphare, #hg-reviewers, kevincox
Cc: mjpieters, yuja, durin42, kevincox, mercurial-devel

Patch

diff --git a/rust/hg-core/src/utils/hg_path.rs b/rust/hg-core/src/utils/hg_path.rs
new file mode 100644
--- /dev/null
+++ b/rust/hg-core/src/utils/hg_path.rs
@@ -0,0 +1,350 @@ 
+use crate::PatternFileError;
+use std::borrow::Borrow;
+use std::convert::TryInto;
+use std::ffi::OsString;
+use std::iter::FusedIterator;
+use std::ops::{Deref, Index, Range, RangeFrom, RangeFull, RangeTo};
+use std::path::{Path, PathBuf};
+
+/// This is a repository-relative path (or canonical path):
+///     - posix-like even on Windows,
+///     - no leading slash,
+///     - no "." nor ".." of special meaning,
+///     - stored in repository and shared across platforms.
+///
+/// This allows us to be encoding-transparent as much as possible, until really
+/// needed; `HgPath` can be transformed into a platform-specific path (`OsStr`
+/// or `Path`) whenever more complex operations are needed:
+/// On Unix, it's just byte-to-byte conversion. On Windows, it has to be
+/// decoded from MBCS to WTF-8. If WindowsUTF8Plan is implemented, the source
+/// character encoding will be determined per repository basis.
+#[derive(Eq, Ord, PartialEq, PartialOrd, Debug, Hash)]
+pub struct HgPath {
+    inner: [u8],
+}
+
+impl HgPath {
+    pub fn new<S: AsRef<[u8]> + ?Sized>(s: &S) -> &Self {
+        unsafe { &*(s.as_ref() as *const [u8] as *const Self) }
+    }
+    pub fn is_empty(&self) -> bool {
+        self.inner.len() == 0
+    }
+    pub fn len(&self) -> usize {
+        self.inner.len()
+    }
+    fn to_hg_path_buf(&self) -> HgPathBuf {
+        HgPathBuf {
+            inner: self.inner.to_owned(),
+        }
+    }
+    fn from_inner(inner: &[u8]) -> &Self {
+        unsafe { &*(inner as *const [u8] as *const HgPath) }
+    }
+    pub fn iter(&self) -> HgPathIterator {
+        HgPathIterator { path: &self }
+    }
+    pub fn to_ascii_uppercase(&self) -> HgPathBuf {
+        HgPathBuf::from(self.inner.to_ascii_uppercase())
+    }
+    pub fn to_ascii_lowercase(&self) -> HgPathBuf {
+        HgPathBuf::from(self.inner.to_ascii_lowercase())
+    }
+    pub fn bytes(&self) -> &[u8] {
+        unsafe { &*(&self.inner as *const _ as *const [u8]) }
+    }
+}
+
+impl Index<usize> for HgPath {
+    type Output = u8;
+
+    fn index(&self, i: usize) -> &Self::Output {
+        &self.inner[i]
+    }
+}
+
+impl Index<RangeFull> for HgPath {
+    type Output = HgPath;
+
+    #[inline]
+    fn index(&self, _index: RangeFull) -> &HgPath {
+        &self
+    }
+}
+
+impl Index<RangeTo<usize>> for HgPath {
+    type Output = HgPath;
+
+    #[inline]
+    fn index(&self, range_to: RangeTo<usize>) -> &HgPath {
+        HgPath::new(&self.inner[range_to])
+    }
+}
+
+impl Index<RangeFrom<usize>> for HgPath {
+    type Output = HgPath;
+
+    #[inline]
+    fn index(&self, range_from: RangeFrom<usize>) -> &HgPath {
+        HgPath::new(&self.inner[range_from])
+    }
+}
+impl Index<Range<usize>> for HgPath {
+    type Output = HgPath;
+
+    #[inline]
+    fn index(&self, range: Range<usize>) -> &HgPath {
+        HgPath::new(&self.inner[range])
+    }
+}
+
+impl Index<usize> for HgPathBuf {
+    type Output = u8;
+
+    fn index(&self, i: usize) -> &Self::Output {
+        &self.inner[i]
+    }
+}
+
+impl Index<RangeFull> for HgPathBuf {
+    type Output = HgPath;
+
+    #[inline]
+    fn index(&self, _index: RangeFull) -> &HgPath {
+        HgPath::from_inner(self.inner.as_slice())
+    }
+}
+
+impl Index<RangeTo<usize>> for HgPathBuf {
+    type Output = HgPath;
+
+    #[inline]
+    fn index(&self, range_to: RangeTo<usize>) -> &HgPath {
+        &HgPath::new(&self.inner[range_to])
+    }
+}
+
+impl Index<RangeFrom<usize>> for HgPathBuf {
+    type Output = HgPath;
+
+    #[inline]
+    fn index(&self, range_from: RangeFrom<usize>) -> &HgPath {
+        HgPath::new(&self.inner[range_from])
+    }
+}
+
+#[derive(Debug)]
+pub struct HgPathIterator<'a> {
+    path: &'a HgPath,
+}
+
+impl<'a> Iterator for HgPathIterator<'a> {
+    type Item = u8;
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.path.len() == 0 {
+            None
+        } else {
+            let ret = self.path[0];
+            self.path = &self.path[1..self.path.len()];
+            Some(ret)
+        }
+    }
+}
+
+impl<'a> ExactSizeIterator for HgPathIterator<'a> {
+    // We can easily calculate the remaining number of iterations.
+    fn len(&self) -> usize {
+        self.path.len()
+    }
+}
+
+impl<'a> DoubleEndedIterator for HgPathIterator<'a> {
+    fn next_back(&mut self) -> Option<Self::Item> {
+        if self.path.len() == 0 {
+            None
+        } else {
+            let back_position = self.path.len() - 1;
+            let ret = self.path[back_position];
+            self.path = &self.path[..back_position];
+            Some(ret)
+        }
+    }
+}
+
+impl<'a> FusedIterator for HgPathIterator<'a> {}
+
+#[derive(Eq, Ord, Clone, PartialEq, PartialOrd, Debug, Hash)]
+pub struct HgPathBuf {
+    inner: Vec<u8>,
+}
+
+impl HgPathBuf {
+    pub fn new() -> Self {
+        Self { inner: Vec::new() }
+    }
+    pub fn push(&mut self, byte: u8) {
+        self.inner.push(byte)
+    }
+    pub fn from_bytes(s: &[u8]) -> HgPathBuf {
+        HgPath::new(s).to_owned()
+    }
+    pub fn into_vec(self) -> Vec<u8> {
+        self.inner
+    }
+    pub fn as_vec(&self) -> &Vec<u8> {
+        &self.inner
+    }
+    pub fn as_ref(&self) -> &[u8] {
+        self.inner.as_ref()
+    }
+}
+
+impl Deref for HgPathBuf {
+    type Target = HgPath;
+
+    #[inline]
+    fn deref(&self) -> &HgPath {
+        &self[..]
+    }
+}
+
+impl From<Vec<u8>> for HgPathBuf {
+    fn from(vec: Vec<u8>) -> Self {
+        Self { inner: vec }
+    }
+}
+
+impl<T: ?Sized + AsRef<HgPath>> From<&T> for HgPathBuf {
+    fn from(s: &T) -> HgPathBuf {
+        s.as_ref().to_owned()
+    }
+}
+
+impl Into<Vec<u8>> for HgPathBuf {
+    fn into(self) -> Vec<u8> {
+        self.inner
+    }
+}
+
+impl Borrow<HgPath> for HgPathBuf {
+    fn borrow(&self) -> &HgPath {
+        &self[..]
+    }
+}
+
+impl ToOwned for HgPath {
+    type Owned = HgPathBuf;
+    fn to_owned(&self) -> HgPathBuf {
+        self.to_hg_path_buf()
+    }
+}
+
+impl AsRef<HgPath> for HgPath {
+    fn as_ref(&self) -> &HgPath {
+        self
+    }
+}
+
+impl AsRef<HgPath> for HgPathBuf {
+    fn as_ref(&self) -> &HgPath {
+        self
+    }
+}
+
+impl Extend<u8> for HgPathBuf {
+    fn extend<T: IntoIterator<Item = u8>>(&mut self, iter: T) {
+        self.inner.extend(iter)
+    }
+}
+
+impl TryInto<PathBuf> for HgPathBuf {
+    type Error = PatternFileError;
+
+    fn try_into(self) -> Result<PathBuf, Self::Error> {
+        let os_str;
+        #[cfg(unix)]
+        {
+            use std::os::unix::ffi::OsStrExt;
+            os_str = std::ffi::OsStr::from_bytes(&self.inner);
+        }
+        #[cfg(windows)]
+        {
+            // TODO: convert from Windows MBCS (ANSI encoding) to WTF8.
+            // Perhaps, the return type would have to be Result<PathBuf>.
+            unimplemented!();
+        }
+
+        Ok(Path::new(os_str).to_path_buf())
+    }
+}
+
+impl TryInto<OsString> for HgPathBuf {
+    type Error = &'static str;
+
+    fn try_into(self) -> Result<OsString, Self::Error> {
+        let os_str;
+        #[cfg(unix)]
+        {
+            use std::os::unix::ffi::OsStrExt;
+            os_str = std::ffi::OsStr::from_bytes(&self.inner);
+        }
+        #[cfg(windows)]
+        {
+            // TODO: convert from Windows MBCS (ANSI encoding) to WTF8.
+            // Perhaps, the return type would have to be Result<PathBuf>.
+            unimplemented!()
+        }
+
+        Ok(os_str.to_os_string())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_iter() {
+        let path = HgPath::new(b"a");
+        let mut iter = path.iter();
+        assert_eq!(Some(b'a'), iter.next());
+        assert_eq!(None, iter.next_back());
+        assert_eq!(None, iter.next());
+
+        let path = HgPath::new(b"a");
+        let mut iter = path.iter();
+        assert_eq!(Some(b'a'), iter.next_back());
+        assert_eq!(None, iter.next_back());
+        assert_eq!(None, iter.next());
+
+        let path = HgPath::new(b"abc");
+        let mut iter = path.iter();
+        assert_eq!(Some(b'a'), iter.next());
+        assert_eq!(Some(b'c'), iter.next_back());
+        assert_eq!(Some(b'b'), iter.next_back());
+        assert_eq!(None, iter.next_back());
+        assert_eq!(None, iter.next());
+
+        let path = HgPath::new(b"abc");
+        let mut iter = path.iter();
+        assert_eq!(Some(b'a'), iter.next());
+        assert_eq!(Some(b'b'), iter.next());
+        assert_eq!(Some(b'c'), iter.next());
+        assert_eq!(None, iter.next_back());
+        assert_eq!(None, iter.next());
+
+        let path = HgPath::new(b"abc");
+        let iter = path.iter();
+        let mut vec = Vec::new();
+        vec.extend(iter);
+        assert_eq!(vec![b'a', b'b', b'c'], vec);
+
+        let path = HgPath::new(b"abc");
+        let mut iter = path.iter();
+        assert_eq!(Some(2), iter.rposition(|c| c == b'c'));
+
+        let path = HgPath::new(b"abc");
+        let mut iter = path.iter();
+        assert_eq!(None, iter.rposition(|c| c == b'd'));
+    }
+}
diff --git a/rust/hg-core/src/utils.rs b/rust/hg-core/src/utils.rs
--- a/rust/hg-core/src/utils.rs
+++ b/rust/hg-core/src/utils.rs
@@ -8,6 +8,7 @@ 
 //! Contains useful functions, traits, structs, etc. for use in core.
 
 pub mod files;
+pub mod hg_path;
 
 /// Replaces the `from` slice with the `to` slice inside the `buf` slice.
 ///