Patchwork posix: don't compare atime when determining if a file has changed

login
register
mail settings
Submitter Siddharth Agarwal
Date Jan. 18, 2013, 11:57 p.m.
Message ID <31607df9bc05063c429f.1358553479@sid0x220>
Download mbox | patch
Permalink /patch/681/
State Accepted
Commit ecba9b0e767254c2b241657a9b591a2480f5811c
Headers show

Comments

Siddharth Agarwal - Jan. 18, 2013, 11:57 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1358553316 28800
# Node ID 31607df9bc05063c429f9071fa143bacab9c9bc1
# Parent  5986c03eea911d4712b6aefda8b41022d664488e
posix: don't compare atime when determining if a file has changed

A file's atime might change even if the file itself doesn't change. This might
cause us to invalidate caches more often than necessary.

Before this change, hg add often resulted in the dirstate being parsed twice
on systems that track atime. After this change, it is only parsed once. For a
repository with over 180,000 files, this speeds up hg add from 1.2 seconds to
1.0.
Matt Mackall - Jan. 19, 2013, 12:09 a.m.
On Fri, 2013-01-18 at 15:57 -0800, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1358553316 28800
> # Node ID 31607df9bc05063c429f9071fa143bacab9c9bc1
> # Parent  5986c03eea911d4712b6aefda8b41022d664488e
> posix: don't compare atime when determining if a file has changed

Queued for default, thanks.
Adrian Buehlmann - Jan. 19, 2013, 12:16 a.m.
On 2013-01-19 00:57, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1358553316 28800
> # Node ID 31607df9bc05063c429f9071fa143bacab9c9bc1
> # Parent  5986c03eea911d4712b6aefda8b41022d664488e
> posix: don't compare atime when determining if a file has changed
> 
> A file's atime might change even if the file itself doesn't change. This might
> cause us to invalidate caches more often than necessary.
> 
> Before this change, hg add often resulted in the dirstate being parsed twice
> on systems that track atime. After this change, it is only parsed once. For a
> repository with over 180,000 files, this speeds up hg add from 1.2 seconds to
> 1.0.
> 
> diff --git a/mercurial/posix.py b/mercurial/posix.py
> --- a/mercurial/posix.py
> +++ b/mercurial/posix.py
> @@ -478,7 +478,20 @@ class cachestat(object):
>  
>      def __eq__(self, other):
>          try:
> -            return self.stat == other.stat
> +            # Only dev, ino, size, mtime and atime are likely to change. Out
> +            # of these, we shouldn't compare atime but should compare the
> +            # rest. However, one of the other fields changing indicates
> +            # something fishy going on, so return False if anything but atime
> +            # changes.
> +            return (self.stat.st_mode == other.stat.st_mode and
> +                    self.stat.st_ino == other.stat.st_ino and
> +                    self.stat.st_dev == other.stat.st_dev and
> +                    self.stat.st_nlink == other.stat.st_nlink and
> +                    self.stat.st_uid == other.stat.st_uid and
> +                    self.stat.st_gid == other.stat.st_gid and
> +                    self.stat.st_size == other.stat.st_size and
> +                    self.stat.st_mtime == other.stat.st_mtime and
> +                    self.stat.st_ctime == other.stat.st_ctime)
>          except AttributeError:
>              return False

Probably not that relevant in practice, but perhaps nlink could be
ignored as well.

Patch

diff --git a/mercurial/posix.py b/mercurial/posix.py
--- a/mercurial/posix.py
+++ b/mercurial/posix.py
@@ -478,7 +478,20 @@  class cachestat(object):
 
     def __eq__(self, other):
         try:
-            return self.stat == other.stat
+            # Only dev, ino, size, mtime and atime are likely to change. Out
+            # of these, we shouldn't compare atime but should compare the
+            # rest. However, one of the other fields changing indicates
+            # something fishy going on, so return False if anything but atime
+            # changes.
+            return (self.stat.st_mode == other.stat.st_mode and
+                    self.stat.st_ino == other.stat.st_ino and
+                    self.stat.st_dev == other.stat.st_dev and
+                    self.stat.st_nlink == other.stat.st_nlink and
+                    self.stat.st_uid == other.stat.st_uid and
+                    self.stat.st_gid == other.stat.st_gid and
+                    self.stat.st_size == other.stat.st_size and
+                    self.stat.st_mtime == other.stat.st_mtime and
+                    self.stat.st_ctime == other.stat.st_ctime)
         except AttributeError:
             return False