Patchwork [V2,RESEND] win32: improve the performance of win32.unlink() over CIFS

login
register
mail settings
Submitter Kaz Nishimura
Date Feb. 17, 2014, 3:02 a.m.
Message ID <95a976e04589e3d33f69.1392606176@Canopus.local.linuxfront.com>
Download mbox | patch
Permalink /patch/3681/
State Accepted
Commit 1a9ebc83a74c7e7872006491bd984997a9348031
Headers show

Comments

Kaz Nishimura - Feb. 17, 2014, 3:02 a.m.
# HG changeset patch
# User Kaz Nishimura <kazssym@vx68k.org>
# Date 1381984037 -32400
#      Thu Oct 17 13:27:17 2013 +0900
# Node ID 95a976e04589e3d33f69805dff56a249bee250ae
# Parent  0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd
win32: improve the performance of win32.unlink() over CIFS

Emulating POSIX unlink() behavior with os.rename() and os.unlink() is often slow
especially over CIFS from Windows clients due to its protocol overhead. This
patch changes win32.unlink() to try first an exclusive open with the Win32
delete-on-close flag, and if a sharing violation is detected, to fall back to
the original emulation.

This patch also removes a test with os.path.isdir() since we expect opening a
directory shall fail as os.unlink() would.

Example measurements (repeated 3-times after 1-time calibration):

(Without this patch: hg update from null to default)
127 files updated, 0 files merged, 0 files removed, 0 files unresolved
time: real 19.871 secs (user 0.328+0.000 sys 1.794+0.000)
time: real 19.622 secs (user 0.312+0.000 sys 2.044+0.000)
time: real 19.138 secs (user 0.250+0.000 sys 1.872+0.000)

(Without this patch: hg update from default to null)
0 files updated, 0 files merged, 127 files removed, 0 files unresolved
time: real 35.158 secs (user 0.156+0.000 sys 2.512+0.000)
time: real 35.272 secs (user 0.250+0.000 sys 2.512+0.000)
time: real 36.569 secs (user 0.203+0.000 sys 2.387+0.000)

(With this patch: hg update from null to default)
127 files updated, 0 files merged, 0 files removed, 0 files unresolved
time: real 17.893 secs (user 0.328+0.000 sys 1.700+0.000)
time: real 18.512 secs (user 0.265+0.000 sys 1.529+0.000)
time: real 20.238 secs (user 0.312+0.000 sys 1.685+0.000)

(With this patch: hg update from default to null)
0 files updated, 0 files merged, 127 files removed, 0 files unresolved
time: real 12.312 secs (user 0.250+0.000 sys 0.811+0.000)
time: real 12.471 secs (user 0.156+0.000 sys 0.889+0.000)
time: real 9.727 secs (user 0.125+0.000 sys 0.858+0.000)
Simon Heimberg - Feb. 19, 2014, 7:47 p.m.
On 2014-02-17 04:02, Kaz Nishimura:
> # HG changeset patch
> # User Kaz Nishimura <kazssym@vx68k.org>
> # Date 1381984037 -32400
> #      Thu Oct 17 13:27:17 2013 +0900
> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae
> # Parent  0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd
> win32: improve the performance of win32.unlink() over CIFS
>
> Emulating POSIX unlink() behavior with os.rename() and os.unlink() is often slow
> especially over CIFS from Windows clients due to its protocol overhead. This
> patch changes win32.unlink() to try first an exclusive open with the Win32
> delete-on-close flag, and if a sharing violation is detected, to fall back to
> the original emulation.

LGTM, only some minor details. No new test failure on my windows machine.
And it seems to run quicker on my machine too, on a local hard drive. (I 
did not try on CIFS.) Not really sure about the times, they vary too 
mucht. My results are at the end.
I checked that the new case is used by printing some output else.

>
> This patch also removes a test with os.path.isdir() since we expect opening a
> directory shall fail as os.unlink() would.

No knowledge about this. I guess os.path.isdir really is unnecessary. 
Maybe move this in a separate 1st patch.

>
> Example measurements (repeated 3-times after 1-time calibration):
>
> (Without this patch: hg update from null to default)
> 127 files updated, 0 files merged, 0 files removed, 0 files unresolved
> time: real 19.871 secs (user 0.328+0.000 sys 1.794+0.000)
> time: real 19.622 secs (user 0.312+0.000 sys 2.044+0.000)
> time: real 19.138 secs (user 0.250+0.000 sys 1.872+0.000)
>
You could add the hash or a tag instead of "default". (Minor detail. You 
did this in a follow up mail.) Did you mention it is the mercurial 
source repo?

> (Without this patch: hg update from default to null)
> 0 files updated, 0 files merged, 127 files removed, 0 files unresolved
> time: real 35.158 secs (user 0.156+0.000 sys 2.512+0.000)
> time: real 35.272 secs (user 0.250+0.000 sys 2.512+0.000)
> time: real 36.569 secs (user 0.203+0.000 sys 2.387+0.000)
>
> (With this patch: hg update from null to default)
> 127 files updated, 0 files merged, 0 files removed, 0 files unresolved
> time: real 17.893 secs (user 0.328+0.000 sys 1.700+0.000)
> time: real 18.512 secs (user 0.265+0.000 sys 1.529+0.000)
> time: real 20.238 secs (user 0.312+0.000 sys 1.685+0.000)
>
> (With this patch: hg update from default to null)
> 0 files updated, 0 files merged, 127 files removed, 0 files unresolved
> time: real 12.312 secs (user 0.250+0.000 sys 0.811+0.000)
> time: real 12.471 secs (user 0.156+0.000 sys 0.889+0.000)
> time: real 9.727 secs (user 0.125+0.000 sys 0.858+0.000)
>
> diff -r 0e2877f8605d -r 95a976e04589 mercurial/win32.py
> --- a/mercurial/win32.py
> +++ b/mercurial/win32.py

running in a mercurial source repo
$ time hg up -r $REV

with this patch
$ hg up -r 0e2877f8605d
1038 files updated, 0 files merged, 0 files removed, 0 files unresolved

real	0m5.946s user	0m0.000s sys	0m0.015s	#high load?
real	0m6.246s user	0m0.000s sys	0m0.015s	#high load?
real	0m3.198s user	0m0.015s sys	0m0.000s
real	0m2.387s user	0m0.000s sys	0m0.030s

$ hg up -r null
0 files updated, 0 files merged, 1038 files removed, 0 files unresolved

real	0m1.033s user	0m0.016s sys	0m0.031s
real	0m1.006s user	0m0.015s sys	0m0.015s
real	0m2.102s user	0m0.000s sys	0m0.030s
real	0m0.850s user	0m0.030s sys	0m0.015s


without this patch
$ hg up -r 0e2877f8605d
1038 files updated, 0 files merged, 0 files removed, 0 files unresolved

real	0m3.222s user	0m0.000s sys	0m0.046s
real	0m2.475s user	0m0.030s sys	0m0.000s
real	0m4.500s user	0m0.000s sys	0m0.030s
real	0m2.709s user	0m0.000s sys	0m0.030s
         (was 32s once!)

$ hg up -r null
0 files updated, 0 files merged, 1038 files removed, 0 files unresolved

real	0m6.435s user	0m0.000s sys	0m0.015s
real	0m2.565s user	0m0.000s sys	0m0.015s
real	0m2.400s user	0m0.000s sys	0m0.030s
real	0m2.559s user	0m0.000s sys	0m0.030s
Matt Mackall - Feb. 19, 2014, 10:03 p.m.
On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote:
> # HG changeset patch
> # User Kaz Nishimura <kazssym@vx68k.org>
> # Date 1381984037 -32400
> #      Thu Oct 17 13:27:17 2013 +0900
> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae
> # Parent  0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd
> win32: improve the performance of win32.unlink() over CIFS

Queued for default, thanks.
Adrian Buehlmann - Feb. 20, 2014, 11:02 a.m.
On 2014-02-19 23:03, Matt Mackall wrote:
> On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote:
>> # HG changeset patch
>> # User Kaz Nishimura <kazssym@vx68k.org>
>> # Date 1381984037 -32400
>> #      Thu Oct 17 13:27:17 2013 +0900
>> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae
>> # Parent  0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd
>> win32: improve the performance of win32.unlink() over CIFS
> 
> Queued for default, thanks.
> 

Perhaps this is worth the risk.

Let me just note that opening a file in exclusive mode possibly prevents
deletion of other hardlinks to this file for as long as the file is held open.

As per the comment in win32.unlink:

    # - Calling os.unlink (or os.rename) on a file f fails if f or any
    #   hardlinked copy of f has been opened with Python's open(). There is no
    #   way such a file can be deleted or renamed on Windows (other than
    #   scheduling the delete or rename for the next reboot).

and:

  http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows
Kaz Nishimura - Feb. 20, 2014, 12:22 p.m.
This patch closes the file handle immediately after we successfully open
the file exclusively for deletion, so the file does not stay open so long
and I hope we can ignore the chance that another process tries to delete
other hard links to the same file at the same time.  Even if it could
happen, it would not be so worse as any other processes might have open
handles for the file anyway, in which case DeleteFile could also fail to
delete the file.



On Thu, Feb 20, 2014 at 8:02 PM, Adrian Buehlmann <adrian@cadifra.com>wrote:

> On 2014-02-19 23:03, Matt Mackall wrote:
> > On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote:
> >> # HG changeset patch
> >> # User Kaz Nishimura <kazssym@vx68k.org>
> >> # Date 1381984037 -32400
> >> #      Thu Oct 17 13:27:17 2013 +0900
> >> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae
> >> # Parent  0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd
> >> win32: improve the performance of win32.unlink() over CIFS
> >
> > Queued for default, thanks.
> >
>
> Perhaps this is worth the risk.
>
> Let me just note that opening a file in exclusive mode possibly prevents
> deletion of other hardlinks to this file for as long as the file is held
> open.
>
> As per the comment in win32.unlink:
>
>     # - Calling os.unlink (or os.rename) on a file f fails if f or any
>     #   hardlinked copy of f has been opened with Python's open(). There
> is no
>     #   way such a file can be deleted or renamed on Windows (other than
>     #   scheduling the delete or rename for the next reboot).
>
> and:
>
>   http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows
>
Adrian Buehlmann - Feb. 20, 2014, 3:34 p.m.
On 2014-02-20 13:22, Kaz Nishimura wrote:
> 
> On Thu, Feb 20, 2014 at 8:02 PM, Adrian Buehlmann <adrian@cadifra.com
> <mailto:adrian@cadifra.com>> wrote:
> 
>     On 2014-02-19 23:03, Matt Mackall wrote:
>     > On Mon, 2014-02-17 at 12:02 +0900, Kaz Nishimura wrote:
>     >> # HG changeset patch
>     >> # User Kaz Nishimura <kazssym@vx68k.org <mailto:kazssym@vx68k.org>>
>     >> # Date 1381984037 -32400
>     >> #      Thu Oct 17 13:27:17 2013 +0900
>     >> # Node ID 95a976e04589e3d33f69805dff56a249bee250ae
>     >> # Parent  0e2877f8605dcaf4fdf2ab7e0046f1f6f80161dd
>     >> win32: improve the performance of win32.unlink() over CIFS
>     >
>     > Queued for default, thanks.
>     >
> 
>     Perhaps this is worth the risk.
> 
>     Let me just note that opening a file in exclusive mode possibly prevents
>     deletion of other hardlinks to this file for as long as the file is
>     held open.
> 
>     As per the comment in win32.unlink:
> 
>         # - Calling os.unlink (or os.rename) on a file f fails if f or any
>         #   hardlinked copy of f has been opened with Python's open().
>     There is no
>         #   way such a file can be deleted or renamed on Windows (other than
>         #   scheduling the delete or rename for the next reboot).
> 
>     and:
> 
>       http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows
> 
> 
> This patch closes the file handle immediately after we successfully open
> the file exclusively for deletion, so the file does not stay open so
> long and I hope we can ignore the chance that another process tries to
> delete other hard links to the same file at the same time.  Even if it
> could happen, it would not be so worse as any other processes might have
> open handles for the file anyway, in which case DeleteFile could also
> fail to delete the file.

(please don't top post)

On Windows, Mercurial - in general - opens files with

  			    FILE_SHARE_READ | FILE_SHARE_WRITE |
			    FILE_SHARE_DELETE,

(see call to CreateFile() API in osutil.c on line 534)

So no, Mercurial in general deliberately opens files in a way which allows
rename and delete while the file is open (by another Mercurial process).
We did this for a reason.

What you (possibly) introduced now is an - albeit - small chance that another
process, which is trying to delete a hardlink, will fail.
Kaz Nishimura - Feb. 20, 2014, 5:13 p.m.
To make a file completely deleted, we must have the last open file handle.
If we open the file non-exclusively, we cannot assure that and must always
use slower rename and unlink.  If there is a way to make sure we have the
last open handle, we can avoid exclusive open totally.  Can you suggest any?

Win32 API is unclear about hard links and the effect of the DELETE_ON_CLOSE
flag.  If the flag is set and all open handles are closed, the file must be
deleted according to the description, but if a handle is opened from
another hard link, nothing is described about what happens.  I could be
confused.  Apparently the flag must be set on the hard link not on the file
itself.

We could add FLAG_SHARE_DELETE to make deletion of other hard links
possible, but if any file handles with DELETE access might remain too long
(I think unlikely), the file name (or hard link) can remain longer than we
want.  Any solution here?
Adrian Buehlmann - Feb. 20, 2014, 6:08 p.m.
On 2014-02-20 18:13, Kaz Nishimura wrote:
> To make a file completely deleted, we must have the last open file
> handle.  If we open the file non-exclusively, we cannot assure that and
> must always use slower rename and unlink.  If there is a way to make
> sure we have the last open handle, we can avoid exclusive open totally. 
> Can you suggest any?
> 
> Win32 API is unclear about hard links and the effect of the
> DELETE_ON_CLOSE flag.  If the flag is set and all open handles are
> closed, the file must be deleted according to the description, but if a
> handle is opened from another hard link, nothing is described about what
> happens.  I could be confused.  Apparently the flag must be set on the
> hard link not on the file itself.
> 
> We could add FLAG_SHARE_DELETE to make deletion of other hard links
> possible, but if any file handles with DELETE access might remain too
> long (I think unlikely), the file name (or hard link) can remain longer
> than we want.  Any solution here?

Back then when I wrote

  http://mercurial.selenic.com/wiki/UnlinkingFilesOnWindows

I think I found out a couple of those surprising (and silly..) aspects
by experimenting, not by reading docs. I think especially the amazing
things about hardlinks.

I'm currently not that much interested in using recent versions of
Mercurial, so I can't really help you that much taking decisions here. I
just responded because I was wrestling with these kind of things in the
past. Perhaps you can safely ignore my comments and just keep them in
mind in case problems should appear.
Kaz Nishimura - Feb. 21, 2014, 3:54 a.m.
I have confirmed with the following test program that exclusive open
actually denies deletion of a hard link to the same file.

This program just (1) creates normal file foo, (2) creates hard link bar
for foo, (3) opens foo for deletion, (4) waits for input, and (5) close the
handle returned in (3).  I checked if I could delete bar at (4).

If I changed the share mode to FILE_SHARE_DELETE, deletion and rename of
bar is not denied.  I am not sure if it is OK with that share mode to
delete a file name immediately.

#include <windows.h>
#include <cstdio>
using namespace std;

//#define SHARE_MODE 0
#define SHARE_MODE FILE_SHARE_DELETE

int main(void) {
    HANDLE fh = CreateFile("foo", GENERIC_READ | GENERIC_WRITE,
            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, NULL,
            CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (fh == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "%s: CreateFile failed (error = %lu)\n", "foo",
                GetLastError());
        return 1;
    }
    CloseHandle(fh);

    if (!CreateHardLink("bar", "foo", NULL)) {
        fprintf(stderr, "%s: CreateHardLink failed (error = %lu)\n", "bar",
                GetLastError());
        return 1;
    }

    fh = CreateFile("foo", DELETE, SHARE_MODE, NULL, OPEN_EXISTING,
            FILE_FLAG_OPEN_REPARSE_POINT | FILE_FLAG_DELETE_ON_CLOSE, NULL);
    if (fh == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "%s: CreateFile failed (error = %lu)\n", "foo",
                GetLastError());
    }

    printf("foo is now open\n");

    char line[80];
    fgets(line, 80, stdin);

    CloseHandle(fh);
    return 0;
}

Patch

diff -r 0e2877f8605d -r 95a976e04589 mercurial/win32.py
--- a/mercurial/win32.py
+++ b/mercurial/win32.py
@@ -25,4 +25,5 @@ 
 # GetLastError
 _ERROR_SUCCESS = 0
+_ERROR_SHARING_VIOLATION = 32
 _ERROR_INVALID_PARAMETER = 87
 _ERROR_INSUFFICIENT_BUFFER = 122
@@ -60,5 +61,7 @@ 
 _OPEN_EXISTING = 3
 
+_FILE_FLAG_OPEN_REPARSE_POINT = 0x00200000
 _FILE_FLAG_BACKUP_SEMANTICS = 0x02000000
+_FILE_FLAG_DELETE_ON_CLOSE = 0x04000000
 
 # SetFileAttributes
@@ -421,9 +424,16 @@ 
     '''try to implement POSIX' unlink semantics on Windows'''
 
-    if os.path.isdir(f):
-        # use EPERM because it is POSIX prescribed value, even though
-        # unlink(2) on directories returns EISDIR on Linux
-        raise IOError(errno.EPERM,
-                      "Unlinking directory not permitted: '%s'" % f)
+    # If we can open f exclusively, no other processes must have open handles
+    # for it and we can expect its name will be deleted immediately when we
+    # close the handle unless we have another in the same process.  We also
+    # expect we shall simply fail to open f if it is a directory.
+    fh = _kernel32.CreateFileA(f, 0, 0, None, _OPEN_EXISTING,
+        _FILE_FLAG_OPEN_REPARSE_POINT | _FILE_FLAG_DELETE_ON_CLOSE, None)
+    if fh != _INVALID_HANDLE_VALUE:
+        _kernel32.CloseHandle(fh)
+        return
+    error = _kernel32.GetLastError()
+    if error != _ERROR_SHARING_VIOLATION:
+        raise ctypes.WinError(error)
 
     # POSIX allows to unlink and rename open files. Windows has serious