Patchwork [1,of,4,tags-cache-split] tags: make tags cache compatible with future split of filenode cache

login
register
mail settings
Submitter Gregory Szorc
Date March 30, 2015, 6:14 a.m.
Message ID <31716032c6e7f8764a02.1427696043@vm-ubuntu-main.gateway.sonic.net>
Download mbox | patch
Permalink /patch/8366/
State Accepted
Headers show

Comments

Gregory Szorc - March 30, 2015, 6:14 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1427258716 25200
#      Tue Mar 24 21:45:16 2015 -0700
# Node ID 31716032c6e7f8764a02b212dac872cc3436c052
# Parent  2cb6b78cf818a082ee9a2518968f88c1819efed0
tags: make tags cache compatible with future split of filenode cache

Upcoming patches will split the tags cache into separate, per
filter files as well as split the .hgtags filenode cache into a
standalone file. Doing this atomically in one commit would be a
massive, difficult-to-comprehend patch.

This patch adds future-proofing to tags cache reading so this work
can be split across multiple commits.

With this patch applied, clients are able to recognize the planned
future format of the tags cache with external .hgtags filenode data.
When support for that cache lands, clients can read existing .hgtags
filenode data from the prior tags cache files and then write out the
new file format without having to recompute .hgtags filenodes. For
users of large repositories, this will potentially save minutes of
wall time.

The assumption with this commit is that it will land immediately
before per-filter tags cache files are introduced and per-filter tags
cache files will land before the .hgtags filenode split is performed.

The patches landing in this order effectively constitute a new
version of the tags cache since per-filter files introduce new files,
which can include new processing rules.

Assuming this patch lands first and the new external .hgtags
filenodes syntax doesn't land until per-filter files are introduced,
clients that know to read the per-filter tags files are guaranteed to
understand the external .hgtags filenodes syntax. So no backwards
compatibility concern should arise.

This patch could potentially be reordered and merged with the later
patch that introduces the external .hgtags filenodes cache.
Pierre-Yves David - March 30, 2015, 10:59 p.m.
On 03/29/2015 11:14 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1427258716 25200
> #      Tue Mar 24 21:45:16 2015 -0700
> # Node ID 31716032c6e7f8764a02b212dac872cc3436c052
> # Parent  2cb6b78cf818a082ee9a2518968f88c1819efed0
> tags: make tags cache compatible with future split of filenode cache
>
> Upcoming patches will split the tags cache into separate, per
> filter files as well as split the .hgtags filenode cache into a
> standalone file. Doing this atomically in one commit would be a
> massive, difficult-to-comprehend patch.
>
> This patch adds future-proofing to tags cache reading so this work
> can be split across multiple commits.
>
> With this patch applied, clients are able to recognize the planned
> future format of the tags cache with external .hgtags filenode data.
> When support for that cache lands, clients can read existing .hgtags
> filenode data from the prior tags cache files and then write out the
> new file format without having to recompute .hgtags filenodes. For
> users of large repositories, this will potentially save minutes of
> wall time.
>
> The assumption with this commit is that it will land immediately
> before per-filter tags cache files are introduced and per-filter tags
> cache files will land before the .hgtags filenode split is performed.
>
> The patches landing in this order effectively constitute a new
> version of the tags cache since per-filter files introduce new files,
> which can include new processing rules.
>
> Assuming this patch lands first and the new external .hgtags
> filenodes syntax doesn't land until per-filter files are introduced,
> clients that know to read the per-filter tags files are guaranteed to
> understand the external .hgtags filenodes syntax. So no backwards
> compatibility concern should arise.
>
> This patch could potentially be reordered and merged with the later
> patch that introduces the external .hgtags filenodes cache.

All this ordering business is very confusing. Looks like you are trying 
to explain the exponential combinaison of the possible order changes. 
This is confusing, Can you confirm the current series is self containing 
and properly ordered ?


>
> diff --git a/mercurial/tags.py b/mercurial/tags.py
> --- a/mercurial/tags.py
> +++ b/mercurial/tags.py
> @@ -27,8 +27,16 @@ import time
>   # The first part consists of lines of the form:
>   #
>   #   <headrev> <headnode> [<hgtagsnode>]
>   #
> +# *OR* a single line of the form:
> +#
> +#   "external" <tiprev> <tipnode>

How does this "external" value will be handled by old client?

What are the tiprev and tipnode for ?

> +#
> +# The first form is the historical method of storing the .hgtags filenode
> +# mapping inline. The second form (which is reserved for future use) uses
> +# a separate file for this data.
> +#
>   # <headrev> is an integer revision and <headnode> is a 40 character hex
>   # node for that changeset. These redundantly identify a repository
>   # head from the time the cache was written.
>   #
> @@ -271,8 +279,13 @@ def _readtagcache(ui, repo):
>           try:
>               for line in cachelines:
>                   if line == "\n":
>                       break
> +
> +                # Future version of cache encountered. Do nothing yet.
> +                if line.startswith('external '):
> +                    continue
> +
>                   line = line.split()
>                   cacherevs.append(int(line[0]))
>                   headnode = bin(line[1])
>                   cacheheads.append(headnode)
> diff --git a/tests/test-tags.t b/tests/test-tags.t
> --- a/tests/test-tags.t
> +++ b/tests/test-tags.t
> @@ -271,8 +271,33 @@ Dump cache:
>     bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>     bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>     78391a272241d70354aa14c874552cad6b51bb42 bar
>
> +External .hgtags filenode cache marker is handled
> +
> +  $ cat > .hg/cache/tags << EOF
> +  > external 4 0c192d7d5e6b78a714de54a2e9627952a877e25a 2e21d3312350ce63785cda82526c951211e76bab
> +  >
> +  > bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
> +  > bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
> +  > 78391a272241d70354aa14c874552cad6b51bb42 bar
> +  > EOF
> +
> +  $ hg tags
> +  tip                                4:0c192d7d5e6b
> +  bar                                1:78391a272241
> +
> +We should get an old style cache again
> +
> +  $ cat .hg/cache/tags
> +  4 0c192d7d5e6b78a714de54a2e9627952a877e25a 0c04f2a8af31de17fab7422878ee5a2dadbc943d
> +  3 6fa450212aeb2a21ed616a54aea39a4a27894cd7 7d3b718c964ef37b89e550ebdafd5789e76ce1b0
> +  2 7a94127795a33c10a370c93f731fd9fea0b79af6 0c04f2a8af31de17fab7422878ee5a2dadbc943d
> +
> +  bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
> +  bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
> +  78391a272241d70354aa14c874552cad6b51bb42 bar
> +
>   Test tag removal:
>
>     $ hg tag --remove bar     # rev 5
>     $ hg tip -vp
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Gregory Szorc - March 30, 2015, 11:40 p.m.
On Mon, Mar 30, 2015 at 3:59 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 03/29/2015 11:14 PM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.szorc@gmail.com>
>> # Date 1427258716 25200
>> #      Tue Mar 24 21:45:16 2015 -0700
>> # Node ID 31716032c6e7f8764a02b212dac872cc3436c052
>> # Parent  2cb6b78cf818a082ee9a2518968f88c1819efed0
>> tags: make tags cache compatible with future split of filenode cache
>>
>> Upcoming patches will split the tags cache into separate, per
>> filter files as well as split the .hgtags filenode cache into a
>> standalone file. Doing this atomically in one commit would be a
>> massive, difficult-to-comprehend patch.
>>
>> This patch adds future-proofing to tags cache reading so this work
>> can be split across multiple commits.
>>
>> With this patch applied, clients are able to recognize the planned
>> future format of the tags cache with external .hgtags filenode data.
>> When support for that cache lands, clients can read existing .hgtags
>> filenode data from the prior tags cache files and then write out the
>> new file format without having to recompute .hgtags filenodes. For
>> users of large repositories, this will potentially save minutes of
>> wall time.
>>
>> The assumption with this commit is that it will land immediately
>> before per-filter tags cache files are introduced and per-filter tags
>> cache files will land before the .hgtags filenode split is performed.
>>
>> The patches landing in this order effectively constitute a new
>> version of the tags cache since per-filter files introduce new files,
>> which can include new processing rules.
>>
>> Assuming this patch lands first and the new external .hgtags
>> filenodes syntax doesn't land until per-filter files are introduced,
>> clients that know to read the per-filter tags files are guaranteed to
>> understand the external .hgtags filenodes syntax. So no backwards
>> compatibility concern should arise.
>>
>> This patch could potentially be reordered and merged with the later
>> patch that introduces the external .hgtags filenodes cache.
>>
>
> All this ordering business is very confusing. Looks like you are trying to
> explain the exponential combinaison of the possible order changes. This is
> confusing, Can you confirm the current series is self containing and
> properly ordered ?
>

So, if this entire series lands at once, this patch can be rolled into the
3rd patch. However, there is still a chance a bisect could land on part 3+,
produce a new tags cache file with "external" and then we bisect to part 2
and it chokes reading "external." This patch, like its description says, is
all about guaranteeing future compatibility. (I concede I'm not sure what a
sane approach to these things is. This kind of forward compatibility is my
default way to deal with things. Too much dealing with complex
client-server interactions where things you never think occur will occur.)


>
>
>> diff --git a/mercurial/tags.py b/mercurial/tags.py
>> --- a/mercurial/tags.py
>> +++ b/mercurial/tags.py
>> @@ -27,8 +27,16 @@ import time
>>   # The first part consists of lines of the form:
>>   #
>>   #   <headrev> <headnode> [<hgtagsnode>]
>>   #
>> +# *OR* a single line of the form:
>> +#
>> +#   "external" <tiprev> <tipnode>
>>
>
> How does this "external" value will be handled by old client?
>

As is implemented in this patch, it is ignored.


> What are the tiprev and tipnode for ?
>

Cache validation checking. See part 3.


>
>  +#
>> +# The first form is the historical method of storing the .hgtags filenode
>> +# mapping inline. The second form (which is reserved for future use) uses
>> +# a separate file for this data.
>> +#
>>   # <headrev> is an integer revision and <headnode> is a 40 character hex
>>   # node for that changeset. These redundantly identify a repository
>>   # head from the time the cache was written.
>>   #
>> @@ -271,8 +279,13 @@ def _readtagcache(ui, repo):
>>           try:
>>               for line in cachelines:
>>                   if line == "\n":
>>                       break
>> +
>> +                # Future version of cache encountered. Do nothing yet.
>> +                if line.startswith('external '):
>> +                    continue
>> +
>>                   line = line.split()
>>                   cacherevs.append(int(line[0]))
>>                   headnode = bin(line[1])
>>                   cacheheads.append(headnode)
>> diff --git a/tests/test-tags.t b/tests/test-tags.t
>> --- a/tests/test-tags.t
>> +++ b/tests/test-tags.t
>> @@ -271,8 +271,33 @@ Dump cache:
>>     bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>>     bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>>     78391a272241d70354aa14c874552cad6b51bb42 bar
>>
>> +External .hgtags filenode cache marker is handled
>> +
>> +  $ cat > .hg/cache/tags << EOF
>> +  > external 4 0c192d7d5e6b78a714de54a2e9627952a877e25a
>> 2e21d3312350ce63785cda82526c951211e76bab
>> +  >
>> +  > bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>> +  > bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>> +  > 78391a272241d70354aa14c874552cad6b51bb42 bar
>> +  > EOF
>> +
>> +  $ hg tags
>> +  tip                                4:0c192d7d5e6b
>> +  bar                                1:78391a272241
>> +
>> +We should get an old style cache again
>> +
>> +  $ cat .hg/cache/tags
>> +  4 0c192d7d5e6b78a714de54a2e9627952a877e25a
>> 0c04f2a8af31de17fab7422878ee5a2dadbc943d
>> +  3 6fa450212aeb2a21ed616a54aea39a4a27894cd7
>> 7d3b718c964ef37b89e550ebdafd5789e76ce1b0
>> +  2 7a94127795a33c10a370c93f731fd9fea0b79af6
>> 0c04f2a8af31de17fab7422878ee5a2dadbc943d
>> +
>> +  bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>> +  bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
>> +  78391a272241d70354aa14c874552cad6b51bb42 bar
>> +
>>   Test tag removal:
>>
>>     $ hg tag --remove bar     # rev 5
>>     $ hg tip -vp
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel
>>
>>
> --
> Pierre-Yves David
>
Matt Mackall - March 31, 2015, 12:52 p.m.
On Sun, 2015-03-29 at 23:14 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1427258716 25200
> #      Tue Mar 24 21:45:16 2015 -0700
> # Node ID 31716032c6e7f8764a02b212dac872cc3436c052
> # Parent  2cb6b78cf818a082ee9a2518968f88c1819efed0
> tags: make tags cache compatible with future split of filenode cache
> 
> Upcoming patches will split the tags cache into separate, per
> filter files as well as split the .hgtags filenode cache into a
> standalone file. Doing this atomically in one commit would be a
> massive, difficult-to-comprehend patch.
> 
> This patch adds future-proofing to tags cache reading so this work
> can be split across multiple commits.

Our usual pattern for changing cache formats is:

- bump the name to something new
- delete the code for the old format

This has the benefit that old and new clients can interact with the same
repository without butting heads and (in this rare case) we don't need
to keep around any legacy code.

Alternately, add a wholly new cache on the side without touching the
existing one. This is what the revbranchcache did.

Since you're changing the format without changing the name, you've
killed people sharing repos on NFS and CIFS.

> With this patch applied, clients are able to recognize the planned
> future format of the tags cache with external .hgtags filenode data.
> When support for that cache lands, clients can read existing .hgtags
> filenode data from the prior tags cache files and then write out the
> new file format without having to recompute .hgtags filenodes. For
> users of large repositories, this will potentially save minutes of
> wall time.

Yeah, but don't we already have a bunch of recomputing that this is
fixing? I don't think eliminating this one-time cost is worth any extra
engineering.
Gregory Szorc - April 2, 2015, 12:05 a.m.
On Tue, Mar 31, 2015 at 5:52 AM, Matt Mackall <mpm@selenic.com> wrote:

> On Sun, 2015-03-29 at 23:14 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1427258716 25200
> > #      Tue Mar 24 21:45:16 2015 -0700
> > # Node ID 31716032c6e7f8764a02b212dac872cc3436c052
> > # Parent  2cb6b78cf818a082ee9a2518968f88c1819efed0
> > tags: make tags cache compatible with future split of filenode cache
> >
> > Upcoming patches will split the tags cache into separate, per
> > filter files as well as split the .hgtags filenode cache into a
> > standalone file. Doing this atomically in one commit would be a
> > massive, difficult-to-comprehend patch.
> >
> > This patch adds future-proofing to tags cache reading so this work
> > can be split across multiple commits.
>
> Our usual pattern for changing cache formats is:
>
> - bump the name to something new
> - delete the code for the old format
>
> This has the benefit that old and new clients can interact with the same
> repository without butting heads and (in this rare case) we don't need
> to keep around any legacy code.
>
> Alternately, add a wholly new cache on the side without touching the
> existing one. This is what the revbranchcache did.
>
> Since you're changing the format without changing the name, you've
> killed people sharing repos on NFS and CIFS.
>

I'm not changing the format. I'm allowing old clients to recognize the new
format as a no-op. The split to per-filter files ensures that an old client
will never see new format.

This patch is really just insurance against per-filter files landing
multiple days or weeks before a format refactor. If things land at the same
time, the only compatibility concern we have is for bisecting. Although,
I'm not sure if that's worth solving.


>
> > With this patch applied, clients are able to recognize the planned
> > future format of the tags cache with external .hgtags filenode data.
> > When support for that cache lands, clients can read existing .hgtags
> > filenode data from the prior tags cache files and then write out the
> > new file format without having to recompute .hgtags filenodes. For
> > users of large repositories, this will potentially save minutes of
> > wall time.
>
> Yeah, but don't we already have a bunch of recomputing that this is
> fixing? I don't think eliminating this one-time cost is worth any extra
> engineering.
>

I think this comment was left over from v1. The plan is to split into
per-filter cache files and not carry anything forward. It's just not worth
the effort.
Matt Mackall - April 2, 2015, 7 p.m.
On Wed, 2015-04-01 at 17:05 -0700, Gregory Szorc wrote:
> On Tue, Mar 31, 2015 at 5:52 AM, Matt Mackall <mpm@selenic.com> wrote:
> 
> > On Sun, 2015-03-29 at 23:14 -0700, Gregory Szorc wrote:
> > > # HG changeset patch
> > > # User Gregory Szorc <gregory.szorc@gmail.com>
> > > # Date 1427258716 25200
> > > #      Tue Mar 24 21:45:16 2015 -0700
> > > # Node ID 31716032c6e7f8764a02b212dac872cc3436c052
> > > # Parent  2cb6b78cf818a082ee9a2518968f88c1819efed0
> > > tags: make tags cache compatible with future split of filenode cache
> > >
> > > Upcoming patches will split the tags cache into separate, per
> > > filter files as well as split the .hgtags filenode cache into a
> > > standalone file. Doing this atomically in one commit would be a
> > > massive, difficult-to-comprehend patch.
> > >
> > > This patch adds future-proofing to tags cache reading so this work
> > > can be split across multiple commits.
> >
> > Our usual pattern for changing cache formats is:
> >
> > - bump the name to something new
> > - delete the code for the old format
> >
> > This has the benefit that old and new clients can interact with the same
> > repository without butting heads and (in this rare case) we don't need
> > to keep around any legacy code.
> >
> > Alternately, add a wholly new cache on the side without touching the
> > existing one. This is what the revbranchcache did.
> >
> > Since you're changing the format without changing the name, you've
> > killed people sharing repos on NFS and CIFS.
> >
> 
> I'm not changing the format. I'm allowing old clients to recognize the new
> format as a no-op. The split to per-filter files ensures that an old client
> will never see new format.

By "old clients" here, you mean "clients that have this changeset",
right? I'd rather you just bump the name to tags1 (or tags0 if this
isn't compatible with your end result).

It also might make sense to add the filenode cache first to have less
staging?

> This patch is really just insurance against per-filter files landing
> multiple days or weeks before a format refactor. If things land at the same
> time, the only compatibility concern we have is for bisecting. Although,
> I'm not sure if that's worth solving.

I bisect many times a week, so it matters to me at least.

> > > With this patch applied, clients are able to recognize the planned
> > > future format of the tags cache with external .hgtags filenode data.
> > > When support for that cache lands, clients can read existing .hgtags
> > > filenode data from the prior tags cache files and then write out the
> > > new file format without having to recompute .hgtags filenodes. For
> > > users of large repositories, this will potentially save minutes of
> > > wall time.
> >
> > Yeah, but don't we already have a bunch of recomputing that this is
> > fixing? I don't think eliminating this one-time cost is worth any extra
> > engineering.
> >
> 
> I think this comment was left over from v1. The plan is to split into
> per-filter cache files and not carry anything forward. It's just not worth
> the effort.

Great.

Patch

diff --git a/mercurial/tags.py b/mercurial/tags.py
--- a/mercurial/tags.py
+++ b/mercurial/tags.py
@@ -27,8 +27,16 @@  import time
 # The first part consists of lines of the form:
 #
 #   <headrev> <headnode> [<hgtagsnode>]
 #
+# *OR* a single line of the form:
+#
+#   "external" <tiprev> <tipnode>
+#
+# The first form is the historical method of storing the .hgtags filenode
+# mapping inline. The second form (which is reserved for future use) uses
+# a separate file for this data.
+#
 # <headrev> is an integer revision and <headnode> is a 40 character hex
 # node for that changeset. These redundantly identify a repository
 # head from the time the cache was written.
 #
@@ -271,8 +279,13 @@  def _readtagcache(ui, repo):
         try:
             for line in cachelines:
                 if line == "\n":
                     break
+
+                # Future version of cache encountered. Do nothing yet.
+                if line.startswith('external '):
+                    continue
+
                 line = line.split()
                 cacherevs.append(int(line[0]))
                 headnode = bin(line[1])
                 cacheheads.append(headnode)
diff --git a/tests/test-tags.t b/tests/test-tags.t
--- a/tests/test-tags.t
+++ b/tests/test-tags.t
@@ -271,8 +271,33 @@  Dump cache:
   bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
   bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
   78391a272241d70354aa14c874552cad6b51bb42 bar
 
+External .hgtags filenode cache marker is handled
+
+  $ cat > .hg/cache/tags << EOF
+  > external 4 0c192d7d5e6b78a714de54a2e9627952a877e25a 2e21d3312350ce63785cda82526c951211e76bab
+  > 
+  > bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
+  > bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
+  > 78391a272241d70354aa14c874552cad6b51bb42 bar
+  > EOF
+
+  $ hg tags
+  tip                                4:0c192d7d5e6b
+  bar                                1:78391a272241
+
+We should get an old style cache again
+
+  $ cat .hg/cache/tags
+  4 0c192d7d5e6b78a714de54a2e9627952a877e25a 0c04f2a8af31de17fab7422878ee5a2dadbc943d
+  3 6fa450212aeb2a21ed616a54aea39a4a27894cd7 7d3b718c964ef37b89e550ebdafd5789e76ce1b0
+  2 7a94127795a33c10a370c93f731fd9fea0b79af6 0c04f2a8af31de17fab7422878ee5a2dadbc943d
+  
+  bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
+  bbd179dfa0a71671c253b3ae0aa1513b60d199fa bar
+  78391a272241d70354aa14c874552cad6b51bb42 bar
+
 Test tag removal:
 
   $ hg tag --remove bar     # rev 5
   $ hg tip -vp