Patchwork [2,of,5,flagprocessor,v6] revlog: add 'raw' argument to revision and _addrevision

login
register
mail settings
Submitter Remi Chaintron
Date Dec. 24, 2016, 4:36 p.m.
Message ID <d0476160913323da1dad.1482597388@remi-mbp2>
Download mbox | patch
Permalink /patch/18025/
State Changes Requested
Delegated to: Pierre-Yves David
Headers show

Comments

Remi Chaintron - Dec. 24, 2016, 4:36 p.m.
# HG changeset patch
# User Remi Chaintron <remi@fb.com>
# Date 1482539184 18000
#      Fri Dec 23 19:26:24 2016 -0500
# Node ID d0476160913323da1dada49ae46e72d6a7c17d78
# Parent  c1bd9b7750cdfe1f2a1437e4ba0ed8f6e48b2dd1
revlog: add 'raw' argument to revision and _addrevision

This patch introduces a new 'raw' argument (defaults to False) to revlog's
revision() and _addrevision() methods.
When the 'raw' argument is set to True, it indicates the revision data should be
handled as raw data by the flagprocessor.

This patch adds a new 'rawrevision()' method setting the 'raw' argument to True
in the revlog.revision() call that is used to differentiate changegroup
generation and debugdata related calls to revision() from regular read accesses.

Note: Given revlog.addgroup() calls are restricted to changegroup generation, we
can always set raw to True when calling revlog._addrevision() from
revlog.addgroup().
Pierre-Yves David - Dec. 30, 2016, 10:25 a.m.
On 12/24/2016 05:36 PM, Remi Chaintron wrote:
> # HG changeset patch
> # User Remi Chaintron <remi@fb.com>
> # Date 1482539184 18000
> #      Fri Dec 23 19:26:24 2016 -0500
> # Node ID d0476160913323da1dada49ae46e72d6a7c17d78
> # Parent  c1bd9b7750cdfe1f2a1437e4ba0ed8f6e48b2dd1
> revlog: add 'raw' argument to revision and _addrevision
>
> This patch introduces a new 'raw' argument (defaults to False) to revlog's
> revision() and _addrevision() methods.
> When the 'raw' argument is set to True, it indicates the revision data should be
> handled as raw data by the flagprocessor.

That part seems fine. (that should probably mention that the argument do 
not have effect yet, but that's fine).

> This patch adds a new 'rawrevision()' method setting the 'raw' argument to True
> in the revlog.revision() call that is used to differentiate changegroup
> generation and debugdata related calls to revision() from regular read accesses.

I don't get that part. it seems like 'rawrevision(…)' is just 
'revision(…, raw=True)' so I do not understant why we need a new method. 
Can't we just call 'revision(…, raw=True)' directly. Am I missing 
something? Otherwise, we should just drop that method.

> Note: Given revlog.addgroup() calls are restricted to changegroup generation, we
> can always set raw to True when calling revlog._addrevision() from
> revlog.addgroup().

There is a couple of extra small comment that are not blocker, but worth 
considering as if we do that extra round-trip.

Can I get you to enable the 'showfunc' feature so that your patch 
includes a bit more context?

   [diff]
   showfunc=1

> diff --git a/mercurial/bundlerepo.py b/mercurial/bundlerepo.py
> --- a/mercurial/bundlerepo.py
> +++ b/mercurial/bundlerepo.py
> @@ -117,7 +117,7 @@
>          return mdiff.textdiff(self.revision(self.node(rev1)),
>                                self.revision(self.node(rev2)))
>
> -    def revision(self, nodeorrev):
> +    def revision(self, nodeorrev, raw=False):
>          """return an uncompressed revision of a given node or revision
>          number.
>          """
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -783,7 +783,7 @@
>          prefix = ''
>          if revlog.iscensored(base) or revlog.iscensored(rev):
>              try:
> -                delta = revlog.revision(node)
> +                delta = revlog.rawrevision(node)
>              except error.CensoredNodeError as e:
>                  delta = e.tombstone
>              if base == nullrev:
> @@ -792,7 +792,7 @@
>                  baselen = revlog.rawsize(base)
>                  prefix = mdiff.replacediffheader(baselen, len(delta))
>          elif base == nullrev:
> -            delta = revlog.revision(node)
> +            delta = revlog.rawrevision(node)
>              prefix = mdiff.trivialdiffheader(len(delta))
>          else:
>              delta = revlog.revdiff(base, rev)
> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -1110,6 +1110,9 @@
>          return filectx(self._repo, self._path, fileid=fileid,
>                         filelog=self._filelog, changeid=changeid)
>
> +    def rawdata(self):
> +        return self._filelog.rawrevision(self._filenode)
> +

That new method is not mentioned in the description and not used 
anywhere. What is it about?

>      def data(self):
>          try:
>              return self._filelog.read(self._filenode)
> diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py
> --- a/mercurial/debugcommands.py
> +++ b/mercurial/debugcommands.py
> @@ -445,7 +445,7 @@
>          raise error.CommandError('debugdata', _('invalid arguments'))
>      r = cmdutil.openrevlog(repo, 'debugdata', file_, opts)
>      try:
> -        ui.write(r.revision(r.lookup(rev)))
> +        ui.write(r.rawrevision(r.lookup(rev)))
>      except KeyError:
>          raise error.Abort(_('invalid revision identifier %s') % rev)
>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -1202,12 +1202,18 @@
>          return mdiff.textdiff(self.revision(rev1),
>                                self.revision(rev2))
>
> -    def revision(self, nodeorrev, _df=None):
> +    def rawrevision(self, nodeorrev):
> +        """cf. revision() for argument description."""
> +        return self.revision(nodeorrev, raw=True)
> +
> +    def revision(self, nodeorrev, _df=None, raw=False):
>          """return an uncompressed revision of a given node or revision
>          number.
>
> -        _df is an existing file handle to read from. It is meant to only be
> -        used internally.
> +        _df - an existing file handle to read from. (internal-only)
> +        raw - an optional argument specifying if the revision data is to be
> +        treated as raw data when applying flag transforms. 'raw' should be set
> +        to True when generating changegroups or in debug commands.
>          """
>          if isinstance(nodeorrev, int):
>              rev = nodeorrev
> @@ -1357,7 +1363,8 @@
>          ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
>          try:
>              return self._addrevision(node, text, transaction, link, p1, p2,
> -                                     REVIDX_DEFAULT_FLAGS, cachedelta, ifh, dfh)
> +                                     REVIDX_DEFAULT_FLAGS, cachedelta, ifh, dfh,
> +                                     raw=False)

As the default value is 'False' already, we should probably leave this 
call untouched.

>          finally:
>              if dfh:
>                  dfh.close()
> @@ -1412,13 +1419,16 @@
>          return True
>
>      def _addrevision(self, node, text, transaction, link, p1, p2, flags,
> -                     cachedelta, ifh, dfh, alwayscache=False):
> +                     cachedelta, ifh, dfh, alwayscache=False, raw=False):
>          """internal function to add revisions to the log
>
>          see addrevision for argument descriptions.
>          invariants:
>          - text is optional (can be None); if not set, cachedelta must be set.
>            if both are set, they must correspond to each other.
> +        - raw is optional; if set to True, it indicates the revision data is to
> +          be treated by processflags() as raw. It is usually set by changegroup
> +          generation and debug commands.
>          """
>          btext = [text]
>          def buildtext():
> @@ -1438,8 +1448,9 @@
>                      fh = ifh
>                  else:
>                      fh = dfh
> -                basetext = self.revision(self.node(baserev), _df=fh)
> +                basetext = self.revision(self.node(baserev), _df=fh, raw=raw)
>                  btext[0] = mdiff.patch(basetext, delta)
> +

Gratuitous unrelated new line ;-)

>              try:
>                  self.checkhash(btext[0], node, p1=p1, p2=p2)
>                  if flags & REVIDX_ISCENSORED:
> @@ -1668,10 +1679,14 @@
>                  # the added revision, which will require a call to
>                  # revision(). revision() will fast path if there is a cache
>                  # hit. So, we tell _addrevision() to always cache in this case.
> +                # We're only using addgroup() in the context of changegroup
> +                # generation so the revision data can always be handled as raw
> +                # by the flagprocessor.
>                  chain = self._addrevision(node, None, transaction, link,
>                                            p1, p2, flags, (baserev, delta),
>                                            ifh, dfh,
> -                                          alwayscache=bool(addrevisioncb))
> +                                          alwayscache=bool(addrevisioncb),
> +                                          raw=True)
>
>                  if addrevisioncb:
>                      addrevisioncb(self, chain)
> diff --git a/mercurial/unionrepo.py b/mercurial/unionrepo.py
> --- a/mercurial/unionrepo.py
> +++ b/mercurial/unionrepo.py
> @@ -93,7 +93,7 @@
>          return mdiff.textdiff(self.revision(self.node(rev1)),
>                                self.revision(self.node(rev2)))
>
> -    def revision(self, nodeorrev):
> +    def revision(self, nodeorrev, raw=False):
>          """return an uncompressed revision of a given node or revision
>          number.
>          """

Cheers,
Augie Fackler - Dec. 30, 2016, 3:38 p.m.
> On Dec 30, 2016, at 5:25 AM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
>> This patch adds a new 'rawrevision()' method setting the 'raw' argument to True
>> in the revlog.revision() call that is used to differentiate changegroup
>> generation and debugdata related calls to revision() from regular read accesses.
> 
> I don't get that part. it seems like 'rawrevision(…)' is just 'revision(…, raw=True)' so I do not understant why we need a new method. Can't we just call 'revision(…, raw=True)' directly. Am I missing something? Otherwise, we should just drop that method.

I believe it’s so that we can have extensions hook into either reading of raw revisions or non-raw revisions. In a perfect world, revision() wouldn’t even take a raw= keyword, but I’ve lost too much context this week to help without rereading the whole patch (which I don’t have time for right now.)
Rémi Chaintron - Dec. 30, 2016, 3:44 p.m.
Long story short, we need to pass the raw argument to processflags() so
that it can determine which transform use. This allows to differentiate the
regular use case (transform operations such as reading from a remote store
in lfs's case) from changegroup generation and debug commands. The only way
to not pass the raw argument to revision would be to refactor all flag
processing / checkhash out of revision and _addrevision, but that seems
like a significant rework in a really core part of revlog.
On Fri, 30 Dec 2016 at 10:38, Augie Fackler <raf@durin42.com> wrote:

>
> > On Dec 30, 2016, at 5:25 AM, Pierre-Yves David <
> pierre-yves.david@ens-lyon.org> wrote:
> >
> >> This patch adds a new 'rawrevision()' method setting the 'raw' argument
> to True
> >> in the revlog.revision() call that is used to differentiate changegroup
> >> generation and debugdata related calls to revision() from regular read
> accesses.
> >
> > I don't get that part. it seems like 'rawrevision(…)' is just
> 'revision(…, raw=True)' so I do not understant why we need a new method.
> Can't we just call 'revision(…, raw=True)' directly. Am I missing
> something? Otherwise, we should just drop that method.
>
> I believe it’s so that we can have extensions hook into either reading of
> raw revisions or non-raw revisions. In a perfect world, revision() wouldn’t
> even take a raw= keyword, but I’ve lost too much context this week to help
> without rereading the whole patch (which I don’t have time for right now.)
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Rémi Chaintron - Dec. 30, 2016, 3:46 p.m.
The actual rawrevision() method is simply a personal preference as I like
to have simpler apis on operations with optional arguments and I've seen
this in other places, but I do not feel strongly about moving it to use the
raw argument in calls to revision directly. Let me know what you prefer.
On Fri, 30 Dec 2016 at 10:44, Rémi Chaintron <remi.chaintron@gmail.com>
wrote:

> Long story short, we need to pass the raw argument to processflags() so
> that it can determine which transform use. This allows to differentiate the
> regular use case (transform operations such as reading from a remote store
> in lfs's case) from changegroup generation and debug commands. The only way
> to not pass the raw argument to revision would be to refactor all flag
> processing / checkhash out of revision and _addrevision, but that seems
> like a significant rework in a really core part of revlog.
> On Fri, 30 Dec 2016 at 10:38, Augie Fackler <raf@durin42.com> wrote:
>
>>
>> > On Dec 30, 2016, at 5:25 AM, Pierre-Yves David <
>> pierre-yves.david@ens-lyon.org> wrote:
>> >
>> >> This patch adds a new 'rawrevision()' method setting the 'raw'
>> argument to True
>> >> in the revlog.revision() call that is used to differentiate changegroup
>> >> generation and debugdata related calls to revision() from regular read
>> accesses.
>> >
>> > I don't get that part. it seems like 'rawrevision(…)' is just
>> 'revision(…, raw=True)' so I do not understant why we need a new method.
>> Can't we just call 'revision(…, raw=True)' directly. Am I missing
>> something? Otherwise, we should just drop that method.
>>
>> I believe it’s so that we can have extensions hook into either reading of
>> raw revisions or non-raw revisions. In a perfect world, revision() wouldn’t
>> even take a raw= keyword, but I’ve lost too much context this week to help
>> without rereading the whole patch (which I don’t have time for right now.)
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>>
> --
> Rémi
>
Augie Fackler - Dec. 30, 2016, 4:12 p.m.
> On Dec 30, 2016, at 10:46 AM, Rémi Chaintron <remi.chaintron@gmail.com> wrote:
> 
> The actual rawrevision() method is simply a personal preference as I like to have simpler apis on operations with optional arguments and I've seen this in other places, but I do not feel strongly about moving it to use the raw argument in calls to revision directly. Let me know what you prefer.

https://silkandspinach.net/2004/07/15/avoid-boolean-parameters/

In this particular case, my opinion is that it’s a clear win to avoid the boolean parameter since raw=False is actually defined in terms of raw=True, so it makes sense to express the one case in terms of the other using a function call to expose the structure to the reader.

Not a requirement, but that’s my reasoning for the choice. :)
Rémi Chaintron - Dec. 30, 2016, 5:04 p.m.
Following discussion with marmoute on IRC, I'll drop the rawrevision()
method for now.


On Fri, 30 Dec 2016 at 11:12 Augie Fackler <raf@durin42.com> wrote:

>
> > On Dec 30, 2016, at 10:46 AM, Rémi Chaintron <remi.chaintron@gmail.com>
> wrote:
> >
> > The actual rawrevision() method is simply a personal preference as I
> like to have simpler apis on operations with optional arguments and I've
> seen this in other places, but I do not feel strongly about moving it to
> use the raw argument in calls to revision directly. Let me know what you
> prefer.
>
> https://silkandspinach.net/2004/07/15/avoid-boolean-parameters/
>
> In this particular case, my opinion is that it’s a clear win to avoid the
> boolean parameter since raw=False is actually defined in terms of raw=True,
> so it makes sense to express the one case in terms of the other using a
> function call to expose the structure to the reader.
>
> Not a requirement, but that’s my reasoning for the choice. :)
Augie Fackler - Dec. 30, 2016, 5:09 p.m.
> On Dec 30, 2016, at 12:04 PM, Rémi Chaintron <remi.chaintron@gmail.com> wrote:
> 
> Following discussion with marmoute on IRC, I'll drop the rawrevision() method for now.

OOC, what’s the rationale for preferring the boolean parameter instead of making the code structure more self-documenting? I legitimately don’t see why the parameter version is preferable design-wise.

Thanks!

(Remi, don’t let my discussion of code organization preferences slow down your patch - do whatever Pierre-Yves is wanting so you can keep making progress.)
Pierre-Yves David - Dec. 30, 2016, 5:30 p.m.
On 12/30/2016 06:09 PM, Augie Fackler wrote:
>
>> On Dec 30, 2016, at 12:04 PM, Rémi Chaintron <remi.chaintron@gmail.com> wrote:
>>
>> Following discussion with marmoute on IRC, I'll drop the rawrevision() method for now.
>
> OOC, what’s the rationale for preferring the boolean parameter instead of making the code structure more self-documenting? I legitimately don’t see why the parameter version is preferable design-wise.

What about you check the IRC log so that remi and I can focus on getting 
this things out? The log are still warm and fresh.

Cheers,
Augie Fackler - Dec. 30, 2016, 7:46 p.m.
On Dec 30, 2016 12:30, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org> wrote:

> On 12/30/2016 06:09 PM, Augie Fackler wrote:
> 
>> On Dec 30, 2016, at 12:04 PM, Rémi Chaintron <remi.chaintron@gmail.com> wrote:
>> 
>>> Following discussion with marmoute on IRC, I'll drop the rawrevision() method for now.
>> 
>> OOC, what’s the rationale for preferring the boolean parameter instead of making the code structure more self-documenting? I legitimately don’t see why the parameter version is preferable design-wise.
> 
> What about you check the IRC log so that remi and I can focus on getting this things out? The log are still warm and fresh.

I *did* read the IRC log, which I’ve quoted at the bottom of this message for future reference (since #mercurial isn’t logged, so until now this was only privately archived by our respective clients.)

Now, I just re-read that IRC conversation and still didn't see a *reason*, just that it's your aesthetic opinion and you find my link unconvincing (it's not the best example when compared to the specific case we have here, but instead is more a defense of the general "boolean parameters tend to be a design smell" principle). I'm happy to discuss my reasoning further, but am not happy to hear my design sense discarded because you don't like the look of the resulting API with no engineering reasoning.

I’d like Remi to move along with your design in the name of Facebook making progress and not burning out on your reviews, but in parallel with that I’d like to have this design discussion so that I can better understand why you’re insisting (or preferring? I’m not actually sure based on word choices, but you seem pretty strongly opposed to the two-methods approach) on this particular design.

Thanks!
Augie


IRC transcript from today in #mercurial, times America/New_York:
> 11:57 < marmoute> remi17: if rawrevision was a whole complex and different
>                   things on its own I would get why we had a new method.
> 11:57 < remi17> You and durin42 seem to disagree on what's best here, but a)
>                 if I get any cycles on this, this will be refactored later
>                 on anyway (but I'd understand if you do not want to bet on
>                 my cycles) b) it's not something I'm feeling strongy enough
>                 about to push either way c) I'd rather coin flip rather than
>                 having you guys losing time on this specific bit :)
> 11:58 < remi17> marmoute: I completely get your argument, it's 100% code
>                 style tastes/opinions at that point
> 11:58 < marmoute> But given how similar the codepath(basically just forward
>                   the boolean far down to some flag processing, I really
>                   don't see the point (and things from the link Augie
>                   provided is mostly too far away from the actual case here)
> 11:59 < marmoute> remi17: So, lets drop it (unless you want to actually do
>                   all that refactoring now ;-) )
> 11:59 < remi17> so if you feel strongly about this, and durin42 is happy
>                 with waiting for me to take a stab at it later after the
>                 censor clean up, I'll proceed with raw=True
> 11:59 < remi17> wilco
> 12:00 < marmoute> I'm okay with processing with dropping the method and
>                   using raw=True for now.
> 12:01 < marmoute> remi17: can you drop a minimal reply to your email saying
>                   that we'll drop it as per IRC discussion (so that people
>                   following the email have the context)
> 12:09 < marmoute> remi17: in patch 4, I don't see how the flag will be
>                   processed with different action on read and on write
>                   because the two call seems undistinguishable
> 12:12 < remi17> marmoute: the flag is not processed with different actions
> 12:12 < remi17> the order in which flags are processed is reversed between
>                 read and write operations
> 12:13 < marmoute> Yep, I got that part
> 12:13 < marmoute> But 'lfs' need to do different things on read and on
>                   write, right ?
> 12:13 < remi17> yeah, that's something we cleared while VC-ing
> 12:14 < marmoute> But I'm confused about the implementation in patch 4.
> 12:14 < marmoute> Because I don't see how any information is passed to allow
>                   having a different behavior
> 12:15 < remi17> We wrap around addrevision() to add flags and transform the
>                 contents on local commits
> 12:15 < marmoute> I also know that
> 12:16 < marmoute> remi17: the flag processor is not called at all during
>                   'addrevision' ?
> 12:17 < marmoute> (well, _addrevision)
> Quits: remi17
> 12:24 < marmoute> hum _addrevision call processflag, but that is in order to
>                   retrieve and check the base text.
> 12:24 < marmoute> hum no
> 12:26 < marmoute> That code seems censors related.
> 12:31 < marmoute> hum remi fanished.
> 12:31 < marmoute> s/f/v/
Pierre-Yves David - Jan. 7, 2017, 4:39 p.m.
On 12/30/2016 08:46 PM, Augie Fackler wrote:
>
> On Dec 30, 2016 12:30, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org> wrote:
>
>> On 12/30/2016 06:09 PM, Augie Fackler wrote:
>>
>>> On Dec 30, 2016, at 12:04 PM, Rémi Chaintron <remi.chaintron@gmail.com> wrote:
>>>
>>>> Following discussion with marmoute on IRC, I'll drop the rawrevision() method for now.
>>>
>>> OOC, what’s the rationale for preferring the boolean parameter instead of making the code structure more self-documenting? I legitimately don’t see why the parameter version is preferable design-wise.
>>
>> What about you check the IRC log so that remi and I can focus on getting this things out? The log are still warm and fresh.
>
> I *did* read the IRC log, which I’ve quoted at the bottom of this message for future reference

There is more information in the IRC log, than you seems to have 
extracted. Please give it an extra reading. To help with that second 
reading, I've quoted the actual full IRC log of that conversation at the 
end of that email (yours only contained the last half of it).

My time is limited, so I'm not planning to spend around 1h to write a 
detailed email about that minor review feedback. Especially when there 
is already written material about the topic and no sign that you 
extracted it.
If your counter proposal is really important to you, fell free to come 
back with signs that you read the IRC log and a more concrete points 
than a generic blog post that doesn't really match Rémi's code.


Full IRC log about patch 2:

> 17:25 < remi17> durin42: sorry if i drop off the channel, currently hacking in a car between NYC and montreal
> 17:29 < remi17> durin42: I'm having trouble to understand if you prefer the rawrevision() method or explicitly use raw=True?
> 17:37 < durin42> I prefer rawrevision()
> 17:37 < durin42> because boolean parameters are typicall bad, and in this case I can even articulate why I don't like it :)
> 17:41 < remi17> Yeah, as I mentioned on the email thread, we can make this better for revision() by refactoring revision() and rawrevision() to use _revision() (that does the actual work but doesn't do the hash check/flagprocessing) instead and directly invoke
>                 processflags() with the right arguments
> 17:41 < remi17> It'll work well for revision but refactoring _addrevision will be another beast entirely, and I'll have to clean it up from censor before attempting it
> 17:42 < remi17> Death to all optional booleans still
> 17:47 < marmoute> remi17: resuming writing things about patch 4 now
> 17:50 < marmoute> remi17: not sure what's all that hate about the boolean raw is about. your new method is actually just using it.
> 17:52 < marmoute> remi17: I would prefer we drop the new method. Adding a new parameter is significantly less complex than adding a a new method. And in this case. We have the new parameter in all cases
> 17:53 < marmoute> So given how rare the 'raw' favor is compared to the normal one, I prefer that we drop the newmethod and stick to the new parameter only.
> 17:55 < remi17> marmoute: I honestly don't care much, this is all about code styles and I see your point about the argument being there in the first place. One point I'd still consider is that the right way forward is probably to refactor this part in a subsequent
>                 patch to keep only revision() and rawrevision(), getting rid of the argument entirely
> 17:57 < marmoute> remi17: if rawrevision was a whole complex and different things on its own I would get why we had a new method.
> 17:57 < remi17> You and durin42 seem to disagree on what's best here, but a) if I get any cycles on this, this will be refactored later on anyway (but I'd understand if you do not want to bet on my cycles) b) it's not something I'm feeling strongy enough about to push
>                 either way c) I'd rather coin flip rather than having you guys losing time on this specific bit :)
> 17:58 < remi17> marmoute: I completely get your argument, it's 100% code style tastes/opinions at that point
> 17:58 < marmoute> But given how similar the codepath(basically just forward the boolean far down to some flag processing, I really don't see the point (and things from the link Augie provided is mostly too far away from the actual case here)
> 17:59 < marmoute> remi17: So, lets drop it (unless you want to actually do all that refactoring now ;-) )
> 17:59 < remi17> so if you feel strongly about this, and durin42 is happy with waiting for me to take a stab at it later after the censor clean up, I'll proceed with raw=True
> 17:59 < remi17> wilco
> 18:00 < marmoute> I'm okay with processing with dropping the method and using raw=True for now.
> 18:01 < marmoute> remi17: can you drop a minimal reply to your email saying that we'll drop it as per IRC discussion (so that people following the email have the context)

Cheers,
Augie Fackler - Jan. 7, 2017, 5:33 p.m.
> On Jan 7, 2017, at 11:39 AM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
> 
> 
> On 12/30/2016 08:46 PM, Augie Fackler wrote:
>> 
>> On Dec 30, 2016 12:30, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org> wrote:
>> 
>>> On 12/30/2016 06:09 PM, Augie Fackler wrote:
>>> 
>>>> On Dec 30, 2016, at 12:04 PM, Rémi Chaintron <remi.chaintron@gmail.com> wrote:
>>>> 
>>>>> Following discussion with marmoute on IRC, I'll drop the rawrevision() method for now.
>>>> 
>>>> OOC, what’s the rationale for preferring the boolean parameter instead of making the code structure more self-documenting? I legitimately don’t see why the parameter version is preferable design-wise.
>>> 
>>> What about you check the IRC log so that remi and I can focus on getting this things out? The log are still warm and fresh.
>> 
>> I *did* read the IRC log, which I’ve quoted at the bottom of this message for future reference
> 
> There is more information in the IRC log, than you seems to have extracted. Please give it an extra reading. To help with that second reading, I've quoted the actual full IRC log of that conversation at the end of that email (yours only contained the last half of it).

I’ve done so, and responded inline below. Note that the actual objection I see was quoted in my mail, so if I’m missing something I’d really appreciate you taking the five minute to at least point out the line in this conversation I’m missing where you provide an engineering justification for your opinion.

> My time is limited, so I'm not planning to spend around 1h to write a detailed email about that minor review feedback. Especially when there is already written material about the topic and no sign that you extracted it.
> If your counter proposal is really important to you, fell free to come back with signs that you read the IRC log and a more concrete points than a generic blog post that doesn't really match Rémi's code.

I’m getting the sense here that you feel like I’ve been disrespectful of your time, and I’m sorry if you feel that way. I’m trying to work out what the misunderstanding here is so I can either drop my proposal with an understanding of why it’s harmful, or move forward with it.


> Full IRC log about patch 2:
> 
>> 17:25 < remi17> durin42: sorry if i drop off the channel, currently hacking in a car between NYC and montreal
>> 17:29 < remi17> durin42: I'm having trouble to understand if you prefer the rawrevision() method or explicitly use raw=True?
>> 17:37 < durin42> I prefer rawrevision()
>> 17:37 < durin42> because boolean parameters are typicall bad, and in this case I can even articulate why I don't like it :)
>> 17:41 < remi17> Yeah, as I mentioned on the email thread, we can make this better for revision() by refactoring revision() and rawrevision() to use _revision() (that does the actual work but doesn't do the hash check/flagprocessing) instead and directly invoke
>>                processflags() with the right arguments
>> 17:41 < remi17> It'll work well for revision but refactoring _addrevision will be another beast entirely, and I'll have to clean it up from censor before attempting it
>> 17:42 < remi17> Death to all optional booleans still
>> 17:47 < marmoute> remi17: resuming writing things about patch 4 now
>> 17:50 < marmoute> remi17: not sure what's all that hate about the boolean raw is about. your new method is actually just using it.
>> 17:52 < marmoute> remi17: I would prefer we drop the new method. Adding a new parameter is significantly less complex than adding a a new method. And in this case. We have the new parameter in all cases
>> 17:53 < marmoute> So given how rare the 'raw' favor is compared to the normal one, I prefer that we drop the newmethod and stick to the new parameter only.

So, this is one place where you disagree, and it’s not actually addressing my point (I’ll elaborate later).

>> 17:55 < remi17> marmoute: I honestly don't care much, this is all about code styles and I see your point about the argument being there in the first place. One point I'd still consider is that the right way forward is probably to refactor this part in a subsequent
>>                patch to keep only revision() and rawrevision(), getting rid of the argument entirely
>> 17:57 < marmoute> remi17: if rawrevision was a whole complex and different things on its own I would get why we had a new method.
>> 17:57 < remi17> You and durin42 seem to disagree on what's best here, but a) if I get any cycles on this, this will be refactored later on anyway (but I'd understand if you do not want to bet on my cycles) b) it's not something I'm feeling strongy enough about to push
>>                either way c) I'd rather coin flip rather than having you guys losing time on this specific bit :)
>> 17:58 < remi17> marmoute: I completely get your argument, it's 100% code style tastes/opinions at that point
>> 17:58 < marmoute> But given how similar the codepath(basically just forward the boolean far down to some flag processing, I really don't see the point (and things from the link Augie provided is mostly too far away from the actual case here)
>> 17:59 < marmoute> remi17: So, lets drop it (unless you want to actually do all that refactoring now ;-) )

And here you make a statement that’s actually covered by the blogpost, but I’m going to re-state my position wholly in my own words on the hope that I’ll be easier to understand (in part because I’ll use a dummy code example that’s exactly our current case, instead of merely related):

What we’ve got as of the current series is, effectively, this:

def readstuff(pointer, do_postprocessing=True):
  # read something into stuff
  if do_postprocessing:
    stuff = transform(stuff)
  return stuff

what I’d prefer is something like this:

def readrawstuff(pointer):
  # read something into stuff
  return stuff

def readstuff(pointer):
  stuff = readrawstuff(pointer)
  return transform(stuff)

I find this appealing for (at least) three architectural reasons:

1) It makes the structural relationship between the data and the transformers significantly more obvious
2) It makes chasing callsites of reading data or raw data a little more obvious what the intent was
3) The extra layer will make it easier to avoid code duplication with custom storage that’s not using revlog.

That last one, is of course, specific to this case in Mercurial, but I think it’s an important one, especially given the direction of some of the conversations Durham, Jun and I have been having about ways to modify changelog storage. There’s an immediate potential consumer on the horizon!

I’m still really only comprehending an aesthetic argument from you, that you think we don’t do raw reads enough for it to matter so you’d rather have fewer methods, which doesn’t actually address the value proposition I enumerated above. What do you think? Have I fundamentally misunderstood your position somehow?

Thanks for your time.
Augie

>> 17:59 < remi17> so if you feel strongly about this, and durin42 is happy with waiting for me to take a stab at it later after the censor clean up, I'll proceed with raw=True
>> 17:59 < remi17> wilco
>> 18:00 < marmoute> I'm okay with processing with dropping the method and using raw=True for now.
>> 18:01 < marmoute> remi17: can you drop a minimal reply to your email saying that we'll drop it as per IRC discussion (so that people following the email have the context)

A side-note for remi: it’d probably be good to quote the relevant IRC conversation (or at least post an agreed-upon summary) to the mailing list so that people who don’t lurk in IRC can still read the details of why a decision was made.

> 
> Cheers,
> 
> -- 
> Pierre-Yves David

Patch

diff --git a/mercurial/bundlerepo.py b/mercurial/bundlerepo.py

--- a/mercurial/bundlerepo.py

+++ b/mercurial/bundlerepo.py

@@ -117,7 +117,7 @@ 

         return mdiff.textdiff(self.revision(self.node(rev1)),
                               self.revision(self.node(rev2)))
 
-    def revision(self, nodeorrev):

+    def revision(self, nodeorrev, raw=False):

         """return an uncompressed revision of a given node or revision
         number.
         """
diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py

--- a/mercurial/changegroup.py

+++ b/mercurial/changegroup.py

@@ -783,7 +783,7 @@ 

         prefix = ''
         if revlog.iscensored(base) or revlog.iscensored(rev):
             try:
-                delta = revlog.revision(node)

+                delta = revlog.rawrevision(node)

             except error.CensoredNodeError as e:
                 delta = e.tombstone
             if base == nullrev:
@@ -792,7 +792,7 @@ 

                 baselen = revlog.rawsize(base)
                 prefix = mdiff.replacediffheader(baselen, len(delta))
         elif base == nullrev:
-            delta = revlog.revision(node)

+            delta = revlog.rawrevision(node)

             prefix = mdiff.trivialdiffheader(len(delta))
         else:
             delta = revlog.revdiff(base, rev)
diff --git a/mercurial/context.py b/mercurial/context.py

--- a/mercurial/context.py

+++ b/mercurial/context.py

@@ -1110,6 +1110,9 @@ 

         return filectx(self._repo, self._path, fileid=fileid,
                        filelog=self._filelog, changeid=changeid)
 
+    def rawdata(self):

+        return self._filelog.rawrevision(self._filenode)

+

     def data(self):
         try:
             return self._filelog.read(self._filenode)
diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py

--- a/mercurial/debugcommands.py

+++ b/mercurial/debugcommands.py

@@ -445,7 +445,7 @@ 

         raise error.CommandError('debugdata', _('invalid arguments'))
     r = cmdutil.openrevlog(repo, 'debugdata', file_, opts)
     try:
-        ui.write(r.revision(r.lookup(rev)))

+        ui.write(r.rawrevision(r.lookup(rev)))

     except KeyError:
         raise error.Abort(_('invalid revision identifier %s') % rev)
 
diff --git a/mercurial/revlog.py b/mercurial/revlog.py

--- a/mercurial/revlog.py

+++ b/mercurial/revlog.py

@@ -1202,12 +1202,18 @@ 

         return mdiff.textdiff(self.revision(rev1),
                               self.revision(rev2))
 
-    def revision(self, nodeorrev, _df=None):

+    def rawrevision(self, nodeorrev):

+        """cf. revision() for argument description."""

+        return self.revision(nodeorrev, raw=True)

+

+    def revision(self, nodeorrev, _df=None, raw=False):

         """return an uncompressed revision of a given node or revision
         number.
 
-        _df is an existing file handle to read from. It is meant to only be

-        used internally.

+        _df - an existing file handle to read from. (internal-only)

+        raw - an optional argument specifying if the revision data is to be

+        treated as raw data when applying flag transforms. 'raw' should be set

+        to True when generating changegroups or in debug commands.

         """
         if isinstance(nodeorrev, int):
             rev = nodeorrev
@@ -1357,7 +1363,8 @@ 

         ifh = self.opener(self.indexfile, "a+", checkambig=self._checkambig)
         try:
             return self._addrevision(node, text, transaction, link, p1, p2,
-                                     REVIDX_DEFAULT_FLAGS, cachedelta, ifh, dfh)

+                                     REVIDX_DEFAULT_FLAGS, cachedelta, ifh, dfh,

+                                     raw=False)

         finally:
             if dfh:
                 dfh.close()
@@ -1412,13 +1419,16 @@ 

         return True
 
     def _addrevision(self, node, text, transaction, link, p1, p2, flags,
-                     cachedelta, ifh, dfh, alwayscache=False):

+                     cachedelta, ifh, dfh, alwayscache=False, raw=False):

         """internal function to add revisions to the log
 
         see addrevision for argument descriptions.
         invariants:
         - text is optional (can be None); if not set, cachedelta must be set.
           if both are set, they must correspond to each other.
+        - raw is optional; if set to True, it indicates the revision data is to

+          be treated by processflags() as raw. It is usually set by changegroup

+          generation and debug commands.

         """
         btext = [text]
         def buildtext():
@@ -1438,8 +1448,9 @@ 

                     fh = ifh
                 else:
                     fh = dfh
-                basetext = self.revision(self.node(baserev), _df=fh)

+                basetext = self.revision(self.node(baserev), _df=fh, raw=raw)

                 btext[0] = mdiff.patch(basetext, delta)
+

             try:
                 self.checkhash(btext[0], node, p1=p1, p2=p2)
                 if flags & REVIDX_ISCENSORED:
@@ -1668,10 +1679,14 @@ 

                 # the added revision, which will require a call to
                 # revision(). revision() will fast path if there is a cache
                 # hit. So, we tell _addrevision() to always cache in this case.
+                # We're only using addgroup() in the context of changegroup

+                # generation so the revision data can always be handled as raw

+                # by the flagprocessor.

                 chain = self._addrevision(node, None, transaction, link,
                                           p1, p2, flags, (baserev, delta),
                                           ifh, dfh,
-                                          alwayscache=bool(addrevisioncb))

+                                          alwayscache=bool(addrevisioncb),

+                                          raw=True)

 
                 if addrevisioncb:
                     addrevisioncb(self, chain)
diff --git a/mercurial/unionrepo.py b/mercurial/unionrepo.py

--- a/mercurial/unionrepo.py

+++ b/mercurial/unionrepo.py

@@ -93,7 +93,7 @@ 

         return mdiff.textdiff(self.revision(self.node(rev1)),
                               self.revision(self.node(rev2)))
 
-    def revision(self, nodeorrev):

+    def revision(self, nodeorrev, raw=False):

         """return an uncompressed revision of a given node or revision
         number.
         """