Submitter | Jun Wu |
---|---|
Date | March 3, 2017, 6:13 a.m. |
Message ID | <d79c818940ff7e29c76f.1488521610@localhost.localdomain> |
Download | mbox | patch |
Permalink | /patch/18880/ |
State | Accepted |
Headers | show |
Comments
On Thu, 2 Mar 2017 22:13:30 -0800, Jun Wu wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1488521308 28800 > # Thu Mar 02 22:08:28 2017 -0800 > # Node ID d79c818940ff7e29c76ff5e985b920885aa4e7c1 > # Parent 1cf153ec3faaef92c9ad3515372a6d8591195d6e > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d79c818940ff > util: add a getfstype method Can you update debugfsinfo to include this information? > +if pycompat.sysplatform.startswith('linux'): > + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way > + # to get the current filesystem mount information > + def _getfstypetable(): > + result = {} > + try: > + with open('/etc/mtab', 'r') as f: > + for line in f.read().splitlines(): > + name, path, fs, ops, freq, passno = line.split(' ', 5) > + result[path] = fs > + except OSError: For reminder, it should be IOError. Shouldn't we take care of mtab corruption as well? IIRC, it was a plain file on old Linux. > +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes > +# mount table does not change during the lifetime of the process, which is > +# reasonable for short-lived process > +_fstypetable = None This can be problem on chg or hgweb. Short-lived cache in vfs would help.
Per discussion with Augie, we have to be more conservative and couldn't trust /proc (not to mention /etc). Fortunately, it seems modern Linux has a syscall to get the filesystem information. I'm investigating the details. Excerpts from Yuya Nishihara's message of 2017-03-11 16:48:54 -0800: > On Thu, 2 Mar 2017 22:13:30 -0800, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1488521308 28800 > > # Thu Mar 02 22:08:28 2017 -0800 > > # Node ID d79c818940ff7e29c76ff5e985b920885aa4e7c1 > > # Parent 1cf153ec3faaef92c9ad3515372a6d8591195d6e > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d79c818940ff > > util: add a getfstype method > > Can you update debugfsinfo to include this information? Good advice. Will do. > > +if pycompat.sysplatform.startswith('linux'): > > + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way > > + # to get the current filesystem mount information > > + def _getfstypetable(): > > + result = {} > > + try: > > + with open('/etc/mtab', 'r') as f: > > + for line in f.read().splitlines(): > > + name, path, fs, ops, freq, passno = line.split(' ', 5) > > + result[path] = fs > > + except OSError: > > For reminder, it should be IOError. > > Shouldn't we take care of mtab corruption as well? IIRC, it was a plain file > on old Linux. > > > +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes > > +# mount table does not change during the lifetime of the process, which is > > +# reasonable for short-lived process > > +_fstypetable = None > > This can be problem on chg or hgweb. Short-lived cache in vfs would help. Good idea.
Excerpts from Jun Wu's message of 2017-03-11 17:13:57 -0800: > Per discussion with Augie, we have to be more conservative and couldn't > trust /proc (not to mention /etc). > > Fortunately, it seems modern Linux has a syscall to get the filesystem > information. I'm investigating the details. Seems I (or strace once) was wrong. /proc is still the only way to get the filesystem infomation. Maybe we should hard-code /proc/mounts for Linux to avoid /etc/mtab being wrong. > > Excerpts from Yuya Nishihara's message of 2017-03-11 16:48:54 -0800: > > On Thu, 2 Mar 2017 22:13:30 -0800, Jun Wu wrote: > > > # HG changeset patch > > > # User Jun Wu <quark@fb.com> > > > # Date 1488521308 28800 > > > # Thu Mar 02 22:08:28 2017 -0800 > > > # Node ID d79c818940ff7e29c76ff5e985b920885aa4e7c1 > > > # Parent 1cf153ec3faaef92c9ad3515372a6d8591195d6e > > > # Available At https://bitbucket.org/quark-zju/hg-draft > > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d79c818940ff > > > util: add a getfstype method > > > > Can you update debugfsinfo to include this information? > > Good advice. Will do. > > > > +if pycompat.sysplatform.startswith('linux'): > > > + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way > > > + # to get the current filesystem mount information > > > + def _getfstypetable(): > > > + result = {} > > > + try: > > > + with open('/etc/mtab', 'r') as f: > > > + for line in f.read().splitlines(): > > > + name, path, fs, ops, freq, passno = line.split(' ', 5) > > > + result[path] = fs > > > + except OSError: > > > > For reminder, it should be IOError. > > > > Shouldn't we take care of mtab corruption as well? IIRC, it was a plain file > > on old Linux. > > > > > +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes > > > +# mount table does not change during the lifetime of the process, which is > > > +# reasonable for short-lived process > > > +_fstypetable = None > > > > This can be problem on chg or hgweb. Short-lived cache in vfs would help. > > Good idea.
Excerpts from Yuya Nishihara's message of 2017-03-11 16:48:54 -0800: > On Thu, 2 Mar 2017 22:13:30 -0800, Jun Wu wrote: > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1488521308 28800 > > # Thu Mar 02 22:08:28 2017 -0800 > > # Node ID d79c818940ff7e29c76ff5e985b920885aa4e7c1 > > # Parent 1cf153ec3faaef92c9ad3515372a6d8591195d6e > > # Available At https://bitbucket.org/quark-zju/hg-draft > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d79c818940ff > > util: add a getfstype method > > Can you update debugfsinfo to include this information? > > > +if pycompat.sysplatform.startswith('linux'): > > + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way > > + # to get the current filesystem mount information > > + def _getfstypetable(): > > + result = {} > > + try: > > + with open('/etc/mtab', 'r') as f: > > + for line in f.read().splitlines(): > > + name, path, fs, ops, freq, passno = line.split(' ', 5) > > + result[path] = fs > > + except OSError: > > For reminder, it should be IOError. > > Shouldn't we take care of mtab corruption as well? IIRC, it was a plain file > on old Linux. > > > +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes > > +# mount table does not change during the lifetime of the process, which is > > +# reasonable for short-lived process > > +_fstypetable = None > > This can be problem on chg or hgweb. Short-lived cache in vfs would help. The problem is, "util.copyfile" is not in vfs.
On Sat, 11 Mar 2017 23:28:25 -0800, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-03-11 16:48:54 -0800: > > On Thu, 2 Mar 2017 22:13:30 -0800, Jun Wu wrote: > > > # HG changeset patch > > > # User Jun Wu <quark@fb.com> > > > # Date 1488521308 28800 > > > # Thu Mar 02 22:08:28 2017 -0800 > > > # Node ID d79c818940ff7e29c76ff5e985b920885aa4e7c1 > > > # Parent 1cf153ec3faaef92c9ad3515372a6d8591195d6e > > > # Available At https://bitbucket.org/quark-zju/hg-draft > > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d79c818940ff > > > util: add a getfstype method > > > > Can you update debugfsinfo to include this information? > > > > > +if pycompat.sysplatform.startswith('linux'): > > > + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way > > > + # to get the current filesystem mount information > > > + def _getfstypetable(): > > > + result = {} > > > + try: > > > + with open('/etc/mtab', 'r') as f: > > > + for line in f.read().splitlines(): > > > + name, path, fs, ops, freq, passno = line.split(' ', 5) > > > + result[path] = fs > > > + except OSError: > > > > For reminder, it should be IOError. > > > > Shouldn't we take care of mtab corruption as well? IIRC, it was a plain file > > on old Linux. > > > > > +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes > > > +# mount table does not change during the lifetime of the process, which is > > > +# reasonable for short-lived process > > > +_fstypetable = None > > > > This can be problem on chg or hgweb. Short-lived cache in vfs would help. > > The problem is, "util.copyfile" is not in vfs. That shouldn't matter as the transaction knows vfs objects. copyfile() can be proxied through vfs.
Excerpts from Yuya Nishihara's message of 2017-03-12 08:08:55 -0700: > On Sat, 11 Mar 2017 23:28:25 -0800, Jun Wu wrote: > > Excerpts from Yuya Nishihara's message of 2017-03-11 16:48:54 -0800: > > > On Thu, 2 Mar 2017 22:13:30 -0800, Jun Wu wrote: > > > > # HG changeset patch > > > > # User Jun Wu <quark@fb.com> > > > > # Date 1488521308 28800 > > > > # Thu Mar 02 22:08:28 2017 -0800 > > > > # Node ID d79c818940ff7e29c76ff5e985b920885aa4e7c1 > > > > # Parent 1cf153ec3faaef92c9ad3515372a6d8591195d6e > > > > # Available At https://bitbucket.org/quark-zju/hg-draft > > > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d79c818940ff > > > > util: add a getfstype method > > > > > > Can you update debugfsinfo to include this information? > > > > > > > +if pycompat.sysplatform.startswith('linux'): > > > > + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way > > > > + # to get the current filesystem mount information > > > > + def _getfstypetable(): > > > > + result = {} > > > > + try: > > > > + with open('/etc/mtab', 'r') as f: > > > > + for line in f.read().splitlines(): > > > > + name, path, fs, ops, freq, passno = line.split(' ', 5) > > > > + result[path] = fs > > > > + except OSError: > > > > > > For reminder, it should be IOError. > > > > > > Shouldn't we take care of mtab corruption as well? IIRC, it was a plain file > > > on old Linux. > > > > > > > +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes > > > > +# mount table does not change during the lifetime of the process, which is > > > > +# reasonable for short-lived process > > > > +_fstypetable = None > > > > > > This can be problem on chg or hgweb. Short-lived cache in vfs would help. > > > > The problem is, "util.copyfile" is not in vfs. > > That shouldn't matter as the transaction knows vfs objects. copyfile() can be > proxied through vfs. That's good to know. I actually had 3 patches to move it to vfs, but was concerned about the future "repostorage" layer holding vfs objects that won't get invalidated across chg workers. I think it still makes sense to query the information once per process/thread. So we may want a caching layer where the key is pid+threadid, and gets invalidated when that changes.
On Sun, 12 Mar 2017 11:32:38 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-03-12 08:08:55 -0700: > > On Sat, 11 Mar 2017 23:28:25 -0800, Jun Wu wrote: > > > Excerpts from Yuya Nishihara's message of 2017-03-11 16:48:54 -0800: > > > > On Thu, 2 Mar 2017 22:13:30 -0800, Jun Wu wrote: > > > > > # HG changeset patch > > > > > # User Jun Wu <quark@fb.com> > > > > > # Date 1488521308 28800 > > > > > # Thu Mar 02 22:08:28 2017 -0800 > > > > > # Node ID d79c818940ff7e29c76ff5e985b920885aa4e7c1 > > > > > # Parent 1cf153ec3faaef92c9ad3515372a6d8591195d6e > > > > > # Available At https://bitbucket.org/quark-zju/hg-draft > > > > > # hg pull https://bitbucket.org/quark-zju/hg-draft -r d79c818940ff > > > > > util: add a getfstype method > > > > > > > > Can you update debugfsinfo to include this information? > > > > > > > > > +if pycompat.sysplatform.startswith('linux'): > > > > > + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way > > > > > + # to get the current filesystem mount information > > > > > + def _getfstypetable(): > > > > > + result = {} > > > > > + try: > > > > > + with open('/etc/mtab', 'r') as f: > > > > > + for line in f.read().splitlines(): > > > > > + name, path, fs, ops, freq, passno = line.split(' ', 5) > > > > > + result[path] = fs > > > > > + except OSError: > > > > > > > > For reminder, it should be IOError. > > > > > > > > Shouldn't we take care of mtab corruption as well? IIRC, it was a plain file > > > > on old Linux. > > > > > > > > > +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes > > > > > +# mount table does not change during the lifetime of the process, which is > > > > > +# reasonable for short-lived process > > > > > +_fstypetable = None > > > > > > > > This can be problem on chg or hgweb. Short-lived cache in vfs would help. > > > > > > The problem is, "util.copyfile" is not in vfs. > > > > That shouldn't matter as the transaction knows vfs objects. copyfile() can be > > proxied through vfs. > > That's good to know. I actually had 3 patches to move it to vfs, but was > concerned about the future "repostorage" layer holding vfs objects that > won't get invalidated across chg workers. If the filesystem at the cached repository location changed, cache should be invalidated and repostorage would be recreated anyway. > I think it still makes sense to query the information once per > process/thread. So we may want a caching layer where the key is > pid+threadid, and gets invalidated when that changes.
Excerpts from Yuya Nishihara's message of 2017-03-12 11:58:40 -0700: > > That's good to know. I actually had 3 patches to move it to vfs, but was > > concerned about the future "repostorage" layer holding vfs objects that > > won't get invalidated across chg workers. > > If the filesystem at the cached repository location changed, cache should be > invalidated and repostorage would be recreated anyway. Even if location does not change, the user can run "mount" to change the "mtab" that affects us. Querying "mtab" confidently is expensive. And "mtab" can answer all vfs queries. So I think it's better to cache "mtab" per process, instead of caching "fstype" per vfs.
On Sun, 12 Mar 2017 12:06:32 -0700, Jun Wu wrote: > Excerpts from Yuya Nishihara's message of 2017-03-12 11:58:40 -0700: > > > That's good to know. I actually had 3 patches to move it to vfs, but was > > > concerned about the future "repostorage" layer holding vfs objects that > > > won't get invalidated across chg workers. > > > > If the filesystem at the cached repository location changed, cache should be > > invalidated and repostorage would be recreated anyway. > > Even if location does not change, the user can run "mount" to change the > "mtab" that affects us. My point is that the repository should be considered changed if he mounted another filesystem onto that location. vfs caches cansymlink, execflag, etc., which are all invalid. > Querying "mtab" confidently is expensive. And "mtab" can answer all vfs > queries. So I think it's better to cache "mtab" per process, instead of > caching "fstype" per vfs. I won't object to it if it can be implemented cleanly.
Excerpts from Yuya Nishihara's message of 2017-03-12 12:26:57 -0700: > On Sun, 12 Mar 2017 12:06:32 -0700, Jun Wu wrote: > > Excerpts from Yuya Nishihara's message of 2017-03-12 11:58:40 -0700: > > > > That's good to know. I actually had 3 patches to move it to vfs, but was > > > > concerned about the future "repostorage" layer holding vfs objects that > > > > won't get invalidated across chg workers. > > > > > > If the filesystem at the cached repository location changed, cache should be > > > invalidated and repostorage would be recreated anyway. > > > > Even if location does not change, the user can run "mount" to change the > > "mtab" that affects us. > > My point is that the repository should be considered changed if he mounted > another filesystem onto that location. vfs caches cansymlink, execflag, etc., > which are all invalid. My very early idea about chgcache is that we cache things at a very low-level and vfs is only used to help calculate the hash, vfs is not cached. It's like: # in master's preload process def preloadfunc(repo, ....) hash = repo.vfs.... # vfs is constructed by the preloading framework # repo is a faked one which will be dropped # in the worker def preloading(repo, ...) hash = repo.vfs.... # vfs is new, different from the above # repo is the real one To be clear, it seems there are two objects that "repostorage" may want to do: 1. A cheap object with vfs, svfs etc. helping to calculate hashes, but is designed to be dropped by the preloading process and not reused by workers. My current plan is to just use a "faked" localrepository object in the master. With some methods patched to reduce features. It seems that may be actually cleaner as the faked repo cannot be reused, and vfs will be dropped correctly. 2. An object providing accesses to chg caches. That is the "chgcacche.repocache" object in this series. > > > Querying "mtab" confidently is expensive. And "mtab" can answer all vfs > > queries. So I think it's better to cache "mtab" per process, instead of > > caching "fstype" per vfs. > > I won't object to it if it can be implemented cleanly.
Patch
diff --git a/mercurial/posix.py b/mercurial/posix.py --- a/mercurial/posix.py +++ b/mercurial/posix.py @@ -658,2 +658,40 @@ def bindunixsocket(sock, path): os.fchdir(bakwdfd) os.close(bakwdfd) + +if pycompat.sysplatform.startswith('linux'): + # for Linux, reading /etc/mtab (symlink to /proc/self/mounts) is a sane way + # to get the current filesystem mount information + def _getfstypetable(): + result = {} + try: + with open('/etc/mtab', 'r') as f: + for line in f.read().splitlines(): + name, path, fs, ops, freq, passno = line.split(' ', 5) + result[path] = fs + except OSError: + # /etc/mtab is not guaranteed available + pass + return result +else: + # unknown platform + def _getfstypetable(): + return {} + +# cache _getfstypetable() to avoid repetitive expensive queries, it assumes +# mount table does not change during the lifetime of the process, which is +# reasonable for short-lived process +_fstypetable = None + +def getfstype(path): + """Given a path, return filesystem type or None (best-effort)""" + global _fstypetable + if _fstypetable is None: + _fstypetable = _getfstypetable() + nextpath = os.path.abspath(path) + while True: + path = nextpath + if path in _fstypetable: + return _fstypetable[path] + nextpath = os.path.dirname(path) + if nextpath == path: + return None diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -93,4 +93,5 @@ expandglobs = platform.expandglobs explainexit = platform.explainexit findexe = platform.findexe +getfstype = platform.getfstype gethgcmd = platform.gethgcmd getuser = platform.getuser diff --git a/mercurial/windows.py b/mercurial/windows.py --- a/mercurial/windows.py +++ b/mercurial/windows.py @@ -478,2 +478,7 @@ def readpipe(pipe): def bindunixsocket(sock, path): raise NotImplementedError('unsupported platform') + +def getfstype(path): + """Given a path, return filesystem type or None (best-effort)""" + # not implemented for Windows + return None