Submitter | Kaz Nishimura |
---|---|
Date | Feb. 17, 2014, 3:02 a.m. |
Message ID | <95a976e04589e3d33f69.1392606176@Canopus.local.linuxfront.com> |
Download | mbox | patch |
Permalink | /patch/3681/ |
State | Accepted |
Commit | 1a9ebc83a74c7e7872006491bd984997a9348031 |
Headers | show |
Comments
On 2014-02-17 04:02, Kaz Nishimura: > # HG changeset patch > # User Kaz Nishimura <kazssym@vx68k.org> > # Date 1381984037 -32400 > # Thu Oct 17 13:27:17 2013 +0900 > # Node ID 95a976e04589e3d33f69805dff56a249bee250ae > # Parent 0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd > win32: improve the performance of win32.unlink() over CIFS > > Emulating POSIX unlink() behavior with os.rename() and os.unlink() is often slow > especially over CIFS from Windows clients due to its protocol overhead. This > patch changes win32.unlink() to try first an exclusive open with the Win32 > delete-on-close flag, and if a sharing violation is detected, to fall back to > the original emulation. LGTM, only some minor details. No new test failure on my windows machine. And it seems to run quicker on my machine too, on a local hard drive. (I did not try on CIFS.) Not really sure about the times, they vary too mucht. My results are at the end. I checked that the new case is used by printing some output else. > > This patch also removes a test with os.path.isdir() since we expect opening a > directory shall fail as os.unlink() would. No knowledge about this. I guess os.path.isdir really is unnecessary. Maybe move this in a separate 1st patch. > > Example measurements (repeated 3-times after 1-time calibration): > > (Without this patch: hg update from null to default) > 127 files updated, 0 files merged, 0 files removed, 0 files unresolved > time: real 19.871 secs (user 0.328+0.000 sys 1.794+0.000) > time: real 19.622 secs (user 0.312+0.000 sys 2.044+0.000) > time: real 19.138 secs (user 0.250+0.000 sys 1.872+0.000) > You could add the hash or a tag instead of "default". (Minor detail. You did this in a follow up mail.) Did you mention it is the mercurial source repo? > (Without this patch: hg update from default to null) > 0 files updated, 0 files merged, 127 files removed, 0 files unresolved > time: real 35.158 secs (user 0.156+0.000 sys 2.512+0.000) > time: real 35.272 secs (user 0.250+0.000 sys 2.512+0.000) > time: real 36.569 secs (user 0.203+0.000 sys 2.387+0.000) > > (With this patch: hg update from null to default) > 127 files updated, 0 files merged, 0 files removed, 0 files unresolved > time: real 17.893 secs (user 0.328+0.000 sys 1.700+0.000) > time: real 18.512 secs (user 0.265+0.000 sys 1.529+0.000) > time: real 20.238 secs (user 0.312+0.000 sys 1.685+0.000) > > (With this patch: hg update from default to null) > 0 files updated, 0 files merged, 127 files removed, 0 files unresolved > time: real 12.312 secs (user 0.250+0.000 sys 0.811+0.000) > time: real 12.471 secs (user 0.156+0.000 sys 0.889+0.000) > time: real 9.727 secs (user 0.125+0.000 sys 0.858+0.000) > > diff -r 0e2877f8605d -r 95a976e04589 mercurial/win32.py > --- a/mercurial/win32.py > +++ b/mercurial/win32.py running in a mercurial source repo $ time hg up -r $REV with this patch $ hg up -r 0e2877f8605d 1038 files updated, 0 files merged, 0 files removed, 0 files unresolved real 0m5.946s user 0m0.000s sys 0m0.015s #high load? real 0m6.246s user 0m0.000s sys 0m0.015s #high load? real 0m3.198s user 0m0.015s sys 0m0.000s real 0m2.387s user 0m0.000s sys 0m0.030s $ hg up -r null 0 files updated, 0 files merged, 1038 files removed, 0 files unresolved real 0m1.033s user 0m0.016s sys 0m0.031s real 0m1.006s user 0m0.015s sys 0m0.015s real 0m2.102s user 0m0.000s sys 0m0.030s real 0m0.850s user 0m0.030s sys 0m0.015s without this patch $ hg up -r 0e2877f8605d 1038 files updated, 0 files merged, 0 files removed, 0 files unresolved real 0m3.222s user 0m0.000s sys 0m0.046s real 0m2.475s user 0m0.030s sys 0m0.000s real 0m4.500s user 0m0.000s sys 0m0.030s real 0m2.709s user 0m0.000s sys 0m0.030s (was 32s once!) $ hg up -r null 0 files updated, 0 files merged, 1038 files removed, 0 files unresolved real 0m6.435s user 0m0.000s sys 0m0.015s real 0m2.565s user 0m0.000s sys 0m0.015s real 0m2.400s user 0m0.000s sys 0m0.030s real 0m2.559s user 0m0.000s sys 0m0.030s
On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote: > # HG changeset patch > # User Kaz Nishimura <kazssym@vx68k.org> > # Date 1381984037 -32400 > # Thu Oct 17 13:27:17 2013 +0900 > # Node ID 95a976e04589e3d33f69805dff56a249bee250ae > # Parent 0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd > win32: improve the performance of win32.unlink() over CIFS Queued for default, thanks.
On 2014-02-19 23:03, Matt Mackall wrote: > On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote: >> # HG changeset patch >> # User Kaz Nishimura <kazssym@vx68k.org> >> # Date 1381984037 -32400 >> # Thu Oct 17 13:27:17 2013 +0900 >> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae >> # Parent 0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd >> win32: improve the performance of win32.unlink() over CIFS > > Queued for default, thanks. > Perhaps this is worth the risk. Let me just note that opening a file in exclusive mode possibly prevents deletion of other hardlinks to this file for as long as the file is held open. As per the comment in win32.unlink: # - Calling os.unlink (or os.rename) on a file f fails if f or any # hardlinked copy of f has been opened with Python's open(). There is no # way such a file can be deleted or renamed on Windows (other than # scheduling the delete or rename for the next reboot). and: http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows
This patch closes the file handle immediately after we successfully open the file exclusively for deletion, so the file does not stay open so long and I hope we can ignore the chance that another process tries to delete other hard links to the same file at the same time. Even if it could happen, it would not be so worse as any other processes might have open handles for the file anyway, in which case DeleteFile could also fail to delete the file. On Thu, Feb 20, 2014 at 8:02 PM, Adrian Buehlmann <adrian@cadifra.com>wrote: > On 2014-02-19 23:03, Matt Mackall wrote: > > On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote: > >> # HG changeset patch > >> # User Kaz Nishimura <kazssym@vx68k.org> > >> # Date 1381984037 -32400 > >> # Thu Oct 17 13:27:17 2013 +0900 > >> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae > >> # Parent 0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd > >> win32: improve the performance of win32.unlink() over CIFS > > > > Queued for default, thanks. > > > > Perhaps this is worth the risk. > > Let me just note that opening a file in exclusive mode possibly prevents > deletion of other hardlinks to this file for as long as the file is held > open. > > As per the comment in win32.unlink: > > # - Calling os.unlink (or os.rename) on a file f fails if f or any > # hardlinked copy of f has been opened with Python's open(). There > is no > # way such a file can be deleted or renamed on Windows (other than > # scheduling the delete or rename for the next reboot). > > and: > > http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows >
On 2014-02-20 13:22, Kaz Nishimura wrote: > > On Thu, Feb 20, 2014 at 8:02 PM, Adrian Buehlmann <adrian@cadifra.com > <mailto:adrian@cadifra.com>> wrote: > > On 2014-02-19 23:03, Matt Mackall wrote: > > On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote: > >> # HG changeset patch > >> # User Kaz Nishimura <kazssym@vx68k.org <mailto:kazssym@vx68k.org>> > >> # Date 1381984037 -32400 > >> # Thu Oct 17 13:27:17 2013 +0900 > >> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae > >> # Parent 0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd > >> win32: improve the performance of win32.unlink() over CIFS > > > > Queued for default, thanks. > > > > Perhaps this is worth the risk. > > Let me just note that opening a file in exclusive mode possibly prevents > deletion of other hardlinks to this file for as long as the file is > held open. > > As per the comment in win32.unlink: > > # - Calling os.unlink (or os.rename) on a file f fails if f or any > # hardlinked copy of f has been opened with Python's open(). > There is no > # way such a file can be deleted or renamed on Windows (other than > # scheduling the delete or rename for the next reboot). > > and: > > http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows > > > This patch closes the file handle immediately after we successfully open > the file exclusively for deletion, so the file does not stay open so > long and I hope we can ignore the chance that another process tries to > delete other hard links to the same file at the same time. Even if it > could happen, it would not be so worse as any other processes might have > open handles for the file anyway, in which case DeleteFile could also > fail to delete the file. (please don't top post) On Windows, Mercurial - in general - opens files with FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, (see call to CreateFile() API in osutil.c on line 534) So no, Mercurial in general deliberately opens files in a way which allows rename and delete while the file is open (by another Mercurial process). We did this for a reason. What you (possibly) introduced now is an - albeit - small chance that another process, which is trying to delete a hardlink, will fail.
To make a file completely deleted, we must have the last open file handle. If we open the file non-exclusively, we cannot assure that and must always use slower rename and unlink. If there is a way to make sure we have the last open handle, we can avoid exclusive open totally. Can you suggest any? Win32 API is unclear about hard links and the effect of the DELETE_ON_CLOSE flag. If the flag is set and all open handles are closed, the file must be deleted according to the description, but if a handle is opened from another hard link, nothing is described about what happens. I could be confused. Apparently the flag must be set on the hard link not on the file itself. We could add FLAG_SHARE_DELETE to make deletion of other hard links possible, but if any file handles with DELETE access might remain too long (I think unlikely), the file name (or hard link) can remain longer than we want. Any solution here?
On 2014-02-20 18:13, Kaz Nishimura wrote: > To make a file completely deleted, we must have the last open file > handle. If we open the file non-exclusively, we cannot assure that and > must always use slower rename and unlink. If there is a way to make > sure we have the last open handle, we can avoid exclusive open totally. > Can you suggest any? > > Win32 API is unclear about hard links and the effect of the > DELETE_ON_CLOSE flag. If the flag is set and all open handles are > closed, the file must be deleted according to the description, but if a > handle is opened from another hard link, nothing is described about what > happens. I could be confused. Apparently the flag must be set on the > hard link not on the file itself. > > We could add FLAG_SHARE_DELETE to make deletion of other hard links > possible, but if any file handles with DELETE access might remain too > long (I think unlikely), the file name (or hard link) can remain longer > than we want. Any solution here? Back then when I wrote http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows I think I found out a couple of those surprising (and silly..) aspects by experimenting, not by reading docs. I think especially the amazing things about hardlinks. I'm currently not that much interested in using recent versions of Mercurial, so I can't really help you that much taking decisions here. I just responded because I was wrestling with these kind of things in the past. Perhaps you can safely ignore my comments and just keep them in mind in case problems should appear.
I have confirmed with the following test program that exclusive open actually denies deletion of a hard link to the same file. This program just (1) creates normal file foo, (2) creates hard link bar for foo, (3) opens foo for deletion, (4) waits for input, and (5) close the handle returned in (3). I checked if I could delete bar at (4). If I changed the share mode to FILE_SHARE_DELETE, deletion and rename of bar is not denied. I am not sure if it is OK with that share mode to delete a file name immediately. #include <windows.h> #include <cstdio> using namespace std; //#define SHARE_MODE 0 #define SHARE_MODE FILE_SHARE_DELETE int main(void) { HANDLE fh = CreateFile("foo", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL); if (fh == INVALID_HANDLE_VALUE) { fprintf(stderr, "%s: CreateFile failed (error = %lu)\n", "foo", GetLastError()); return 1; } CloseHandle(fh); if (!CreateHardLink("bar", "foo", NULL)) { fprintf(stderr, "%s: CreateHardLink failed (error = %lu)\n", "bar", GetLastError()); return 1; } fh = CreateFile("foo", DELETE, SHARE_MODE, NULL, OPEN_EXISTING, FILE_FLAG_OPEN_REPARSE_POINT | FILE_FLAG_DELETE_ON_CLOSE, NULL); if (fh == INVALID_HANDLE_VALUE) { fprintf(stderr, "%s: CreateFile failed (error = %lu)\n", "foo", GetLastError()); } printf("foo is now open\n"); char line[80]; fgets(line, 80, stdin); CloseHandle(fh); return 0; }
Patch
diff -r 0e2877f8605d -r 95a976e04589 mercurial/win32.py --- a/mercurial/win32.py +++ b/mercurial/win32.py @@ -25,4 +25,5 @@ # GetLastError _ERROR_SUCCESS = 0 +_ERROR_SHARING_VIOLATION = 32 _ERROR_INVALID_PARAMETER = 87 _ERROR_INSUFFICIENT_BUFFER = 122 @@ -60,5 +61,7 @@ _OPEN_EXISTING = 3 +_FILE_FLAG_OPEN_REPARSE_POINT = 0x00200000 _FILE_FLAG_BACKUP_SEMANTICS = 0x02000000 +_FILE_FLAG_DELETE_ON_CLOSE = 0x04000000 # SetFileAttributes @@ -421,9 +424,16 @@ '''try to implement POSIX' unlink semantics on Windows''' - if os.path.isdir(f): - # use EPERM because it is POSIX prescribed value, even though - # unlink(2) on directories returns EISDIR on Linux - raise IOError(errno.EPERM, - "Unlinking directory not permitted: '%s'" % f) + # If we can open f exclusively, no other processes must have open handles + # for it and we can expect its name will be deleted immediately when we + # close the handle unless we have another in the same process. We also + # expect we shall simply fail to open f if it is a directory. + fh = _kernel32.CreateFileA(f, 0, 0, None, _OPEN_EXISTING, + _FILE_FLAG_OPEN_REPARSE_POINT | _FILE_FLAG_DELETE_ON_CLOSE, None) + if fh != _INVALID_HANDLE_VALUE: + _kernel32.CloseHandle(fh) + return + error = _kernel32.GetLastError() + if error != _ERROR_SHARING_VIOLATION: + raise ctypes.WinError(error) # POSIX allows to unlink and rename open files. Windows has serious