Patchwork [5,of,8] manifest: add manifestlog.get to obtain subdirectory instances

login
register
mail settings
Submitter Durham Goode
Date Sept. 14, 2016, 11:04 p.m.
Message ID <33a7df42b989c5559722.1473894275@dev111.prn1.facebook.com>
Download mbox | patch
Permalink /patch/16637/
State Superseded
Headers show

Comments

Durham Goode - Sept. 14, 2016, 11:04 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1473893509 25200
#      Wed Sep 14 15:51:49 2016 -0700
# Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
# Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
manifest: add manifestlog.get to obtain subdirectory instances

Previously manifestlog only allowed obtaining root level manifests. Future
patches will need direct access to subdirectory manifests as part of changegroup
creation, so let's add a get() function that knows how to deal with
subdirectories.
via Mercurial-devel - Sept. 15, 2016, 4:51 p.m.
On Wed, Sep 14, 2016 at 4:04 PM, Durham Goode <durham@fb.com> wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1473893509 25200
> #      Wed Sep 14 15:51:49 2016 -0700
> # Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
> # Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
> manifest: add manifestlog.get to obtain subdirectory instances
>
> Previously manifestlog only allowed obtaining root level manifests. Future
> patches will need direct access to subdirectory manifests as part of changegroup
> creation,

Can I ask for this patch to be moved closer to those future patches so
it's easier to see how this is related and that the API makes sense
for those patches? I haven't checked to make sure, but the next
patches in this series didn't seem to depend on this patch. I can just
drop this patch in flight if you agree.

> so let's add a get() function that knows how to deal with
> subdirectories.
>
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -1040,20 +1040,34 @@ class manifestlog(object):
>          """Retrieves the manifest instance for the given node. Throws a KeyError
>          if not found.
>          """
> -        if node in self._mancache:
> -            cachemf = self._mancache[node]
> -            # The old manifest may put non-ctx manifests in the cache, so skip
> -            # those since they don't implement the full api.
> -            if (isinstance(cachemf, manifestctx) or
> -                isinstance(cachemf, treemanifestctx)):
> -                return cachemf
> +        return self.get('', node)
>
> -        if self._treeinmem:
> -            m = treemanifestctx(self._revlog, '', node)
> +    def get(self, dir, node):
> +        """Retrieves the manifest instance for the given node. Throws a KeyError
> +        if not found.
> +        """
> +        if dir:
> +            if self._treeinmem:
> +                m = treemanifestctx(self._revlog.dirlog(dir), dir, node)
> +            else:
> +                raise error.Abort(
> +                        _("cannot ask for manifest directory '%s' in a flat "
> +                          "manifest") % dir)
>          else:
> -            m = manifestctx(self._revlog, node)
> -        if node != revlog.nullid:
> -            self._mancache[node] = m
> +            if node in self._mancache:
> +                cachemf = self._mancache[node]
> +                # The old manifest may put non-ctx manifests in the cache, so
> +                # skip those since they don't implement the full api.
> +                if (isinstance(cachemf, manifestctx) or
> +                    isinstance(cachemf, treemanifestctx)):
> +                    return cachemf
> +
> +            if self._treeinmem:
> +                m = treemanifestctx(self._revlog, '', node)
> +            else:
> +                m = manifestctx(self._revlog, node)
> +            if node != revlog.nullid:
> +                self._mancache[node] = m
>          return m
>
>      def add(self, m, transaction, link, p1, p2, added, removed):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Durham Goode - Sept. 15, 2016, 5:32 p.m.
On 9/15/16 9:51 AM, Martin von Zweigbergk wrote:
> On Wed, Sep 14, 2016 at 4:04 PM, Durham Goode <durham@fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1473893509 25200
>> #      Wed Sep 14 15:51:49 2016 -0700
>> # Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
>> # Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
>> manifest: add manifestlog.get to obtain subdirectory instances
>>
>> Previously manifestlog only allowed obtaining root level manifests. Future
>> patches will need direct access to subdirectory manifests as part of changegroup
>> creation,
> Can I ask for this patch to be moved closer to those future patches so
> it's easier to see how this is related and that the API makes sense
> for those patches? I haven't checked to make sure, but the next
> patches in this series didn't seem to depend on this patch. I can just
> drop this patch in flight if you agree.
manifestlog.get() is actually necessary for patch #7 in this series, 
which requires access to the directory specific manifestctx to replace 
readshallowfast during changegroup creation.
via Mercurial-devel - Sept. 15, 2016, 6:07 p.m.
On Wed, Sep 14, 2016 at 4:04 PM, Durham Goode <durham@fb.com> wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1473893509 25200
> #      Wed Sep 14 15:51:49 2016 -0700
> # Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
> # Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
> manifest: add manifestlog.get to obtain subdirectory instances
>
> Previously manifestlog only allowed obtaining root level manifests. Future
> patches will need direct access to subdirectory manifests as part of changegroup
> creation, so let's add a get() function that knows how to deal with
> subdirectories.
>
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -1040,20 +1040,34 @@ class manifestlog(object):
>          """Retrieves the manifest instance for the given node. Throws a KeyError
>          if not found.
>          """
> -        if node in self._mancache:
> -            cachemf = self._mancache[node]
> -            # The old manifest may put non-ctx manifests in the cache, so skip
> -            # those since they don't implement the full api.
> -            if (isinstance(cachemf, manifestctx) or
> -                isinstance(cachemf, treemanifestctx)):
> -                return cachemf
> +        return self.get('', node)
>
> -        if self._treeinmem:
> -            m = treemanifestctx(self._revlog, '', node)
> +    def get(self, dir, node):
> +        """Retrieves the manifest instance for the given node. Throws a KeyError
> +        if not found.
> +        """
> +        if dir:
> +            if self._treeinmem:

I think this should be checking treeondisk. When using flat manifests
on disk and trees in memory, it still doesn't make sense to ask for a
specific directory and node.

Speaking of that, have you tried setting treeinmem=True a few times
during this refactoring? I wouldn't be surprised if it was broken even
before you started, so that would be the first thing to check if you
haven't already.

> +                m = treemanifestctx(self._revlog.dirlog(dir), dir, node)
> +            else:
> +                raise error.Abort(
> +                        _("cannot ask for manifest directory '%s' in a flat "
> +                          "manifest") % dir)
>          else:
> -            m = manifestctx(self._revlog, node)
> -        if node != revlog.nullid:
> -            self._mancache[node] = m
> +            if node in self._mancache:
> +                cachemf = self._mancache[node]
> +                # The old manifest may put non-ctx manifests in the cache, so
> +                # skip those since they don't implement the full api.
> +                if (isinstance(cachemf, manifestctx) or
> +                    isinstance(cachemf, treemanifestctx)):
> +                    return cachemf
> +
> +            if self._treeinmem:
> +                m = treemanifestctx(self._revlog, '', node)
> +            else:
> +                m = manifestctx(self._revlog, node)
> +            if node != revlog.nullid:
> +                self._mancache[node] = m
>          return m
>
>      def add(self, m, transaction, link, p1, p2, added, removed):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
via Mercurial-devel - Sept. 15, 2016, 6:23 p.m.
On Wed, Sep 14, 2016 at 4:04 PM, Durham Goode <durham@fb.com> wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1473893509 25200
> #      Wed Sep 14 15:51:49 2016 -0700
> # Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
> # Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
> manifest: add manifestlog.get to obtain subdirectory instances
>
> Previously manifestlog only allowed obtaining root level manifests. Future
> patches will need direct access to subdirectory manifests as part of changegroup
> creation, so let's add a get() function that knows how to deal with
> subdirectories.
>
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -1040,20 +1040,34 @@ class manifestlog(object):
>          """Retrieves the manifest instance for the given node. Throws a KeyError
>          if not found.
>          """
> -        if node in self._mancache:
> -            cachemf = self._mancache[node]
> -            # The old manifest may put non-ctx manifests in the cache, so skip
> -            # those since they don't implement the full api.
> -            if (isinstance(cachemf, manifestctx) or
> -                isinstance(cachemf, treemanifestctx)):
> -                return cachemf
> +        return self.get('', node)
>
> -        if self._treeinmem:
> -            m = treemanifestctx(self._revlog, '', node)
> +    def get(self, dir, node):
> +        """Retrieves the manifest instance for the given node. Throws a KeyError
> +        if not found.
> +        """
> +        if dir:
> +            if self._treeinmem:
> +                m = treemanifestctx(self._revlog.dirlog(dir), dir, node)

This does not use the _mancache and there doesn't seem to be a
_mancache on the revlog either. I've forgotten whether or not you plan
to have one manifestlog per repo or per directory, but I think you
said per repo. So it seems like only the root directory manifests will
be cached at this point. Am I reading that right? Perhaps you're
fixing that later, but even then it seems like an unfortunate
transitional step to lose it here.

> +            else:
> +                raise error.Abort(
> +                        _("cannot ask for manifest directory '%s' in a flat "
> +                          "manifest") % dir)
>          else:
> -            m = manifestctx(self._revlog, node)
> -        if node != revlog.nullid:
> -            self._mancache[node] = m
> +            if node in self._mancache:
> +                cachemf = self._mancache[node]
> +                # The old manifest may put non-ctx manifests in the cache, so
> +                # skip those since they don't implement the full api.
> +                if (isinstance(cachemf, manifestctx) or
> +                    isinstance(cachemf, treemanifestctx)):
> +                    return cachemf
> +
> +            if self._treeinmem:
> +                m = treemanifestctx(self._revlog, '', node)
> +            else:
> +                m = manifestctx(self._revlog, node)
> +            if node != revlog.nullid:
> +                self._mancache[node] = m
>          return m
>
>      def add(self, m, transaction, link, p1, p2, added, removed):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Durham Goode - Sept. 20, 2016, 12:09 a.m.
On 9/15/16 11:07 AM, Martin von Zweigbergk wrote:
> On Wed, Sep 14, 2016 at 4:04 PM, Durham Goode <durham@fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1473893509 25200
>> #      Wed Sep 14 15:51:49 2016 -0700
>> # Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
>> # Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
>> manifest: add manifestlog.get to obtain subdirectory instances
>>
>> Previously manifestlog only allowed obtaining root level manifests. Future
>> patches will need direct access to subdirectory manifests as part of changegroup
>> creation, so let's add a get() function that knows how to deal with
>> subdirectories.
>>
>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>> --- a/mercurial/manifest.py
>> +++ b/mercurial/manifest.py
>> @@ -1040,20 +1040,34 @@ class manifestlog(object):
>>           """Retrieves the manifest instance for the given node. Throws a KeyError
>>           if not found.
>>           """
>> -        if node in self._mancache:
>> -            cachemf = self._mancache[node]
>> -            # The old manifest may put non-ctx manifests in the cache, so skip
>> -            # those since they don't implement the full api.
>> -            if (isinstance(cachemf, manifestctx) or
>> -                isinstance(cachemf, treemanifestctx)):
>> -                return cachemf
>> +        return self.get('', node)
>>
>> -        if self._treeinmem:
>> -            m = treemanifestctx(self._revlog, '', node)
>> +    def get(self, dir, node):
>> +        """Retrieves the manifest instance for the given node. Throws a KeyError
>> +        if not found.
>> +        """
>> +        if dir:
>> +            if self._treeinmem:
> I think this should be checking treeondisk. When using flat manifests
> on disk and trees in memory, it still doesn't make sense to ask for a
> specific directory and node.
>
> Speaking of that, have you tried setting treeinmem=True a few times
> during this refactoring? I wouldn't be surprised if it was broken even
> before you started, so that would be the first thing to check if you
> haven't already.
I'll switch this to treeondisk.  I've not tried treeinmem=True.  Is that 
not automatically covered by the tests?
via Mercurial-devel - Sept. 20, 2016, 12:13 a.m.
On Mon, Sep 19, 2016 at 5:09 PM, Durham Goode <durham@fb.com> wrote:
> On 9/15/16 11:07 AM, Martin von Zweigbergk wrote:
>>
>> On Wed, Sep 14, 2016 at 4:04 PM, Durham Goode <durham@fb.com> wrote:
>>>
>>> # HG changeset patch
>>> # User Durham Goode <durham@fb.com>
>>> # Date 1473893509 25200
>>> #      Wed Sep 14 15:51:49 2016 -0700
>>> # Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
>>> # Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
>>> manifest: add manifestlog.get to obtain subdirectory instances
>>>
>>> Previously manifestlog only allowed obtaining root level manifests.
>>> Future
>>> patches will need direct access to subdirectory manifests as part of
>>> changegroup
>>> creation, so let's add a get() function that knows how to deal with
>>> subdirectories.
>>>
>>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>>> --- a/mercurial/manifest.py
>>> +++ b/mercurial/manifest.py
>>> @@ -1040,20 +1040,34 @@ class manifestlog(object):
>>>           """Retrieves the manifest instance for the given node. Throws a
>>> KeyError
>>>           if not found.
>>>           """
>>> -        if node in self._mancache:
>>> -            cachemf = self._mancache[node]
>>> -            # The old manifest may put non-ctx manifests in the cache,
>>> so skip
>>> -            # those since they don't implement the full api.
>>> -            if (isinstance(cachemf, manifestctx) or
>>> -                isinstance(cachemf, treemanifestctx)):
>>> -                return cachemf
>>> +        return self.get('', node)
>>>
>>> -        if self._treeinmem:
>>> -            m = treemanifestctx(self._revlog, '', node)
>>> +    def get(self, dir, node):
>>> +        """Retrieves the manifest instance for the given node. Throws a
>>> KeyError
>>> +        if not found.
>>> +        """
>>> +        if dir:
>>> +            if self._treeinmem:
>>
>> I think this should be checking treeondisk. When using flat manifests
>> on disk and trees in memory, it still doesn't make sense to ask for a
>> specific directory and node.
>>
>> Speaking of that, have you tried setting treeinmem=True a few times
>> during this refactoring? I wouldn't be surprised if it was broken even
>> before you started, so that would be the first thing to check if you
>> haven't already.
>
> I'll switch this to treeondisk.  I've not tried treeinmem=True.  Is that not
> automatically covered by the tests?

Unfortunately not. We could write tests specifically for it if we
added some debug option to enable it, but we don't have any such test
yet. The advantage of setting it manually is that it's easy to enable
it for the entire test suite. There's no reason not to have both,
though. I just haven't done it (well, I also don't know if we have any
existing debug options other than for extra loggin, so I don't know
how to do that).
Durham Goode - Sept. 20, 2016, 7:19 p.m.
On 9/15/16 11:23 AM, Martin von Zweigbergk wrote:
> On Wed, Sep 14, 2016 at 4:04 PM, Durham Goode <durham@fb.com> wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1473893509 25200
>> #      Wed Sep 14 15:51:49 2016 -0700
>> # Node ID 33a7df42b989c555972280f6b84e2fac38babf7b
>> # Parent  d41da1522f8efb5bf5aa75a51f0093b1129b6b5a
>> manifest: add manifestlog.get to obtain subdirectory instances
>>
>> Previously manifestlog only allowed obtaining root level manifests. Future
>> patches will need direct access to subdirectory manifests as part of changegroup
>> creation, so let's add a get() function that knows how to deal with
>> subdirectories.
>>
>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>> --- a/mercurial/manifest.py
>> +++ b/mercurial/manifest.py
>> @@ -1040,20 +1040,34 @@ class manifestlog(object):
>>           """Retrieves the manifest instance for the given node. Throws a KeyError
>>           if not found.
>>           """
>> -        if node in self._mancache:
>> -            cachemf = self._mancache[node]
>> -            # The old manifest may put non-ctx manifests in the cache, so skip
>> -            # those since they don't implement the full api.
>> -            if (isinstance(cachemf, manifestctx) or
>> -                isinstance(cachemf, treemanifestctx)):
>> -                return cachemf
>> +        return self.get('', node)
>>
>> -        if self._treeinmem:
>> -            m = treemanifestctx(self._revlog, '', node)
>> +    def get(self, dir, node):
>> +        """Retrieves the manifest instance for the given node. Throws a KeyError
>> +        if not found.
>> +        """
>> +        if dir:
>> +            if self._treeinmem:
>> +                m = treemanifestctx(self._revlog.dirlog(dir), dir, node)
> This does not use the _mancache and there doesn't seem to be a
> _mancache on the revlog either. I've forgotten whether or not you plan
> to have one manifestlog per repo or per directory, but I think you
> said per repo. So it seems like only the root directory manifests will
> be cached at this point. Am I reading that right? Perhaps you're
> fixing that later, but even then it seems like an unfortunate
> transitional step to lose it here.
>
You are right.  I'll send a new series with a patch (immediately after 
this patch) that changes the manifestlog's cache to have an entry per tree.

Patch

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -1040,20 +1040,34 @@  class manifestlog(object):
         """Retrieves the manifest instance for the given node. Throws a KeyError
         if not found.
         """
-        if node in self._mancache:
-            cachemf = self._mancache[node]
-            # The old manifest may put non-ctx manifests in the cache, so skip
-            # those since they don't implement the full api.
-            if (isinstance(cachemf, manifestctx) or
-                isinstance(cachemf, treemanifestctx)):
-                return cachemf
+        return self.get('', node)
 
-        if self._treeinmem:
-            m = treemanifestctx(self._revlog, '', node)
+    def get(self, dir, node):
+        """Retrieves the manifest instance for the given node. Throws a KeyError
+        if not found.
+        """
+        if dir:
+            if self._treeinmem:
+                m = treemanifestctx(self._revlog.dirlog(dir), dir, node)
+            else:
+                raise error.Abort(
+                        _("cannot ask for manifest directory '%s' in a flat "
+                          "manifest") % dir)
         else:
-            m = manifestctx(self._revlog, node)
-        if node != revlog.nullid:
-            self._mancache[node] = m
+            if node in self._mancache:
+                cachemf = self._mancache[node]
+                # The old manifest may put non-ctx manifests in the cache, so
+                # skip those since they don't implement the full api.
+                if (isinstance(cachemf, manifestctx) or
+                    isinstance(cachemf, treemanifestctx)):
+                    return cachemf
+
+            if self._treeinmem:
+                m = treemanifestctx(self._revlog, '', node)
+            else:
+                m = manifestctx(self._revlog, node)
+            if node != revlog.nullid:
+                self._mancache[node] = m
         return m
 
     def add(self, m, transaction, link, p1, p2, added, removed):