Patchwork [2,of,4] parsers: write dirstate starting with non-normal entries

login
register
mail settings
Submitter Laurent Charignon
Date Nov. 30, 2015, 5:53 p.m.
Message ID <248A8D18-5218-4EA7-A0C6-97EA19D5AFAB@fb.com>
Download mbox | patch
Permalink /patch/11671/
State Not Applicable
Headers show

Comments

Laurent Charignon - Nov. 30, 2015, 5:53 p.m.
On Nov 25, 2015, at 6:24 AM, Yuya Nishihara <yuya@tcha.org<mailto:yuya@tcha.org>> wrote:

On Tue, 24 Nov 2015 17:19:09 -0800, Laurent Charignon wrote:
# HG changeset patch
# User Laurent Charignon <lcharignon@fb.com<mailto:lcharignon@fb.com>>
# Date 1448413597 28800
#      Tue Nov 24 17:06:37 2015 -0800
# Node ID ea9d03d4e85ea3949bb8d16bd9e1a80246a8247b
# Parent  3bd86861a1618aabe6ec7f2cde1223282f9569be
parsers: write dirstate starting with non-normal entries

Before this patch we were writing the dirstate entries in a "random" way,
following the *unstable* order of a Python dictionary. This patch changes the
order in which we write the dirstate entries.

We now start with the non-normal files (that have changed and likely to have
changed) and end with the normal files. This makes the job of hg status easier
as, in most cases, it will need to access the non-normal entries of the
dirstate. This new ordering allows hg status to stop iterating over the dirstate
after processing those entries.

On our large repos, for hg status, we achieve a 40% improvement.
On the same repo, the cost of this change is a slowdown for writing the
dirstate to disk (as we do two passes). I measured the execution time of
hg debugrebuilddirstate with and without the change and observed a 5% slowdown
for the overall command.

+  * status performance as status generally only need the non normal files */
+ for (pass = 0; pass <= 1; pass++) {
for (pos = 0; PyDict_Next(map, &pos, &k, &v); ) {
dirstateTupleObject *tuple;
char state;
@@ -610,6 +613,7 @@
Py_ssize_t len, l;
PyObject *o;
char *t;
+ int normal;

if (!dirstate_tuple_check(v)) {
PyErr_SetString(PyExc_TypeError,
@@ -622,6 +626,9 @@
mode = tuple->mode;
size = tuple->size;
mtime = tuple->mtime;
+ normal = (state == 'n' && mtime != -1);
+ if (normal != pass)
+ continue;

As we have the pass to figure out memory size before the "first pass", maybe
we can know the offset to the first "normal" file beforehand. Then, we won't
need the "second pass".

I don't really see how that could work. We can indeed know the offset (pos here) to the first normal file beforehand. However, it is very likely to still have us go through most of the dict again since number of non-normal files is typically way smaller than number of normal files isn't it?
Yuya Nishihara - Dec. 1, 2015, 12:03 p.m.
On Mon, 30 Nov 2015 17:53:28 +0000, Laurent Charignon wrote:
>   As we have the pass to figure out memory size before the "first pass", maybe
>   we can know the offset to the first "normal" file beforehand. Then, we won't
>   need the "second pass".
> 
> I don't really see how that could work. We can indeed know the offset
> (pos here) to the first normal file beforehand. However, it is very likely
> to still have us go through most of the dict again since number of non-normal
> files is typically way smaller than number of normal files isn't it?

Can't it be something like this?

    /* 0th pass: figure out how much we need to allocate, per normal or not
       nbytes[2] = { non_normal, normal } */
    ...
    packobj = PyString_FromStringAndSize(NULL, nbytes[0] + nbytes[1]);
    ps[0] = packobj;
    ps[1] = packobj + nbytes[0];

    /* 1st pass: pack into either non normal area or normal area */
    for (...) {
        ...
        ps[normal] += pack_entry(ps[normal], k, v);
        ...
    }

Because there were two passes already, this won't increase the cost to iterate
the dict. But I don't know this is faster than your version as this might have
some impact on CPU cache.

Regards,

Patch

diff --git a/mercurial/parsers.c b/mercurial/parsers.c
--- a/mercurial/parsers.c
+++ b/mercurial/parsers.c
@@ -602,7 +602,10 @@ 
}
memcpy(p, s, l);
p += 20;
- if (0 == 0) {
+ int pass;

New variable can't be declared here according to C89, and MSVC does complain it.

I will fix that.

+ /* First pass, non normal files, second pass normal files. This is to improve