Patchwork [1,of,2] revlog: return offset from _chunkraw()

login
register
mail settings
Submitter Gregory Szorc
Date Jan. 6, 2016, 3:52 a.m.
Message ID <2506af2e83054d590b9b.1452052332@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/12554/
State Accepted
Delegated to: Martin von Zweigbergk
Headers show

Comments

Gregory Szorc - Jan. 6, 2016, 3:52 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1452052311 28800
#      Tue Jan 05 19:51:51 2016 -0800
# Node ID 2506af2e83054d590b9b12912c91074f5f58a274
# Parent  b8405d739149cdd6d8d9bd5e3dd2ad8487b1f09a
revlog: return offset from _chunkraw()

A subsequent patch will refactor _chunks() and the calculation of the
offset will no longer occur in that function. Prepare by returning the
offset from _chunkraw().
Martin von Zweigbergk - Jan. 6, 2016, 4:39 a.m.
On Tue, Jan 5, 2016 at 7:52 PM Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> +        Callers will need to call ``self.start(rev)`` and
> ``self.length(rev)``
> +        to determine where each revision's data begins and ends.


Is that still true? Is the returned offset/start different from
self.start(rev)?
Gregory Szorc - Jan. 6, 2016, 4:47 a.m.
On Tue, Jan 5, 2016 at 8:39 PM, Martin von Zweigbergk <martinvonz@google.com
> wrote:

>
>
> On Tue, Jan 5, 2016 at 7:52 PM Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
>
>> +        Callers will need to call ``self.start(rev)`` and
>> ``self.length(rev)``
>> +        to determine where each revision's data begins and ends.
>
>
> Is that still true? Is the returned offset/start different from
> self.start(rev)?
>

The function returns data for 1 or more revisions. The first revision will
have offset == self.start(rev), so you don't need to call anything if you
are only getting a single revision.
Martin von Zweigbergk - Jan. 6, 2016, 4:52 a.m.
On Tue, Jan 5, 2016 at 8:47 PM Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> On Tue, Jan 5, 2016 at 8:39 PM, Martin von Zweigbergk <
> martinvonz@google.com> wrote:
>
>>
>>
>> On Tue, Jan 5, 2016 at 7:52 PM Gregory Szorc <gregory.szorc@gmail.com>
>> wrote:
>>
>>> +        Callers will need to call ``self.start(rev)`` and
>>> ``self.length(rev)``
>>> +        to determine where each revision's data begins and ends.
>>
>>
>> Is that still true? Is the returned offset/start different from
>> self.start(rev)?
>>
>
> The function returns data for 1 or more revisions. The first revision will
> have offset == self.start(rev), so you don't need to call anything if you
> are only getting a single revision.
>

I see. Will test and push to the clowncopter soon. Thanks for cleaning up!
And great commit message on the next patch. I'm glad someone seems to
remember the (obsolete) reason for it. I would probably have given up
trying to understand that piece of code.
Gregory Szorc - Jan. 6, 2016, 5:11 a.m.
On Tue, Jan 5, 2016 at 8:52 PM, Martin von Zweigbergk <martinvonz@google.com
> wrote:

>
> On Tue, Jan 5, 2016 at 8:47 PM Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
>
>> On Tue, Jan 5, 2016 at 8:39 PM, Martin von Zweigbergk <
>> martinvonz@google.com> wrote:
>>
>>>
>>>
>>> On Tue, Jan 5, 2016 at 7:52 PM Gregory Szorc <gregory.szorc@gmail.com>
>>> wrote:
>>>
>>>> +        Callers will need to call ``self.start(rev)`` and
>>>> ``self.length(rev)``
>>>> +        to determine where each revision's data begins and ends.
>>>
>>>
>>> Is that still true? Is the returned offset/start different from
>>> self.start(rev)?
>>>
>>
>> The function returns data for 1 or more revisions. The first revision
>> will have offset == self.start(rev), so you don't need to call anything if
>> you are only getting a single revision.
>>
>
> I see. Will test and push to the clowncopter soon. Thanks for cleaning up!
> And great commit message on the next patch. I'm glad someone seems to
> remember the (obsolete) reason for it. I would probably have given up
> trying to understand that piece of code.
>

This should be in the commit message, but
https://selenic.com/repo/hg/rev/f13728d59c0e the changed code in patch 2.
Since `hg serve` is referenced, I assume it is related to thread safety.
Actually, https://selenic.com/repo/hg/rev/4ab287c2d337 could probably be
reverted too...
Martin von Zweigbergk - Jan. 6, 2016, 5:14 a.m.
On Tue, Jan 5, 2016 at 9:11 PM Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> On Tue, Jan 5, 2016 at 8:52 PM, Martin von Zweigbergk <
> martinvonz@google.com> wrote:
>
>>
>> On Tue, Jan 5, 2016 at 8:47 PM Gregory Szorc <gregory.szorc@gmail.com>
>> wrote:
>>
>>> On Tue, Jan 5, 2016 at 8:39 PM, Martin von Zweigbergk <
>>> martinvonz@google.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 7:52 PM Gregory Szorc <gregory.szorc@gmail.com>
>>>> wrote:
>>>>
>>>>> +        Callers will need to call ``self.start(rev)`` and
>>>>> ``self.length(rev)``
>>>>> +        to determine where each revision's data begins and ends.
>>>>
>>>>
>>>> Is that still true? Is the returned offset/start different from
>>>> self.start(rev)?
>>>>
>>>
>>> The function returns data for 1 or more revisions. The first revision
>>> will have offset == self.start(rev), so you don't need to call anything if
>>> you are only getting a single revision.
>>>
>>
>> I see. Will test and push to the clowncopter soon. Thanks for cleaning
>> up! And great commit message on the next patch. I'm glad someone seems to
>> remember the (obsolete) reason for it. I would probably have given up
>> trying to understand that piece of code.
>>
>
> This should be in the commit message, but
> https://selenic.com/repo/hg/rev/f13728d59c0e the changed code in patch 2.
> Since `hg serve` is referenced, I assume it is related to thread safety.
> Actually, https://selenic.com/repo/hg/rev/4ab287c2d337 could probably be
> reverted too...
>

Want me to hold off queuing and wait for a V2? I don't mind queuing as is
and you can send a patch 3/2 later if you feel like it.
Martin von Zweigbergk - Jan. 6, 2016, 5:49 a.m.
On Tue, Jan 5, 2016 at 9:14 PM Martin von Zweigbergk <martinvonz@google.com>
wrote:

> On Tue, Jan 5, 2016 at 9:11 PM Gregory Szorc <gregory.szorc@gmail.com>
> wrote:
>
>> On Tue, Jan 5, 2016 at 8:52 PM, Martin von Zweigbergk <
>> martinvonz@google.com> wrote:
>>
>>>
>>> On Tue, Jan 5, 2016 at 8:47 PM Gregory Szorc <gregory.szorc@gmail.com>
>>> wrote:
>>>
>>>> On Tue, Jan 5, 2016 at 8:39 PM, Martin von Zweigbergk <
>>>> martinvonz@google.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 5, 2016 at 7:52 PM Gregory Szorc <gregory.szorc@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> +        Callers will need to call ``self.start(rev)`` and
>>>>>> ``self.length(rev)``
>>>>>> +        to determine where each revision's data begins and ends.
>>>>>
>>>>>
>>>>> Is that still true? Is the returned offset/start different from
>>>>> self.start(rev)?
>>>>>
>>>>
>>>> The function returns data for 1 or more revisions. The first revision
>>>> will have offset == self.start(rev), so you don't need to call anything if
>>>> you are only getting a single revision.
>>>>
>>>
>>> I see. Will test and push to the clowncopter soon. Thanks for cleaning
>>> up! And great commit message on the next patch. I'm glad someone seems to
>>> remember the (obsolete) reason for it. I would probably have given up
>>> trying to understand that piece of code.
>>>
>>
>> This should be in the commit message, but
>> https://selenic.com/repo/hg/rev/f13728d59c0e the changed code in patch
>> 2. Since `hg serve` is referenced, I assume it is related to thread safety.
>> Actually, https://selenic.com/repo/hg/rev/4ab287c2d337 could probably be
>> reverted too...
>>
>
> Want me to hold off queuing and wait for a V2? I don't mind queuing as is
> and you can send a patch 3/2 later if you feel like it.
>

Time is up, I have pushed these to the clowncopter :-)
Gregory Szorc - Jan. 6, 2016, 6:34 a.m.
It turns out I already reverted that other thread safety code in
d708873f1f33.

On Tue, Jan 5, 2016 at 9:49 PM, Martin von Zweigbergk <martinvonz@google.com
> wrote:

>
>
> On Tue, Jan 5, 2016 at 9:14 PM Martin von Zweigbergk <
> martinvonz@google.com> wrote:
>
>> On Tue, Jan 5, 2016 at 9:11 PM Gregory Szorc <gregory.szorc@gmail.com>
>> wrote:
>>
>>> On Tue, Jan 5, 2016 at 8:52 PM, Martin von Zweigbergk <
>>> martinvonz@google.com> wrote:
>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 8:47 PM Gregory Szorc <gregory.szorc@gmail.com>
>>>> wrote:
>>>>
>>>>> On Tue, Jan 5, 2016 at 8:39 PM, Martin von Zweigbergk <
>>>>> martinvonz@google.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 7:52 PM Gregory Szorc <gregory.szorc@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +        Callers will need to call ``self.start(rev)`` and
>>>>>>> ``self.length(rev)``
>>>>>>> +        to determine where each revision's data begins and ends.
>>>>>>
>>>>>>
>>>>>> Is that still true? Is the returned offset/start different from
>>>>>> self.start(rev)?
>>>>>>
>>>>>
>>>>> The function returns data for 1 or more revisions. The first revision
>>>>> will have offset == self.start(rev), so you don't need to call anything if
>>>>> you are only getting a single revision.
>>>>>
>>>>
>>>> I see. Will test and push to the clowncopter soon. Thanks for cleaning
>>>> up! And great commit message on the next patch. I'm glad someone seems to
>>>> remember the (obsolete) reason for it. I would probably have given up
>>>> trying to understand that piece of code.
>>>>
>>>
>>> This should be in the commit message, but
>>> https://selenic.com/repo/hg/rev/f13728d59c0e the changed code in patch
>>> 2. Since `hg serve` is referenced, I assume it is related to thread safety.
>>> Actually, https://selenic.com/repo/hg/rev/4ab287c2d337 could probably
>>> be reverted too...
>>>
>>
>> Want me to hold off queuing and wait for a V2? I don't mind queuing as is
>> and you can send a patch 3/2 later if you feel like it.
>>
>
> Time is up, I have pushed these to the clowncopter :-)
>

Patch

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -563,17 +563,17 @@  def perfrevlogrevision(ui, repo, file_, 
         r._checkhash(text, node, rev)
 
     def dorevision():
         if not cache:
             r.clearcaches()
         r.revision(node)
 
     chain = r._deltachain(rev)[0]
-    data = r._chunkraw(chain[0], chain[-1])
+    data = r._chunkraw(chain[0], chain[-1])[1]
     bins = r._chunks(chain)
     text = str(bins[0])
     bins = bins[1:]
     text = mdiff.patches(text, bins)
 
     benches = [
         (lambda: dorevision(), 'full'),
         (lambda: dodeltachain(rev), 'deltachain'),
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1066,38 +1066,42 @@  class revlog(object):
         """Obtain a segment of raw data corresponding to a range of revisions.
 
         Accepts the start and end revisions and an optional already-open
         file handle to be used for reading. If the file handle is read, its
         seek position will not be preserved.
 
         Requests for data may be satisfied by a cache.
 
-        Returns a str or a buffer instance of raw byte data. Callers will
-        need to call ``self.start(rev)`` and ``self.length()`` to determine
-        where each revision's data begins and ends.
+        Returns a 2-tuple of (offset, data) for the requested range of
+        revisions. Offset is the integer offset from the beginning of the
+        revlog and data is a str or buffer of the raw byte data.
+
+        Callers will need to call ``self.start(rev)`` and ``self.length(rev)``
+        to determine where each revision's data begins and ends.
         """
         start = self.start(startrev)
         end = self.end(endrev)
         if self._inline:
             start += (startrev + 1) * self._io.size
             end += (endrev + 1) * self._io.size
         length = end - start
-        return self._getchunk(start, length, df=df)
+
+        return start, self._getchunk(start, length, df=df)
 
     def _chunk(self, rev, df=None):
         """Obtain a single decompressed chunk for a revision.
 
         Accepts an integer revision and an optional already-open file handle
         to be used for reading. If used, the seek position of the file will not
         be preserved.
 
         Returns a str holding uncompressed data for the requested revision.
         """
-        return decompress(self._chunkraw(rev, rev, df=df))
+        return decompress(self._chunkraw(rev, rev, df=df)[1])
 
     def _chunks(self, revs, df=None):
         """Obtain decompressed chunks for the specified revisions.
 
         Accepts an iterable of numeric revisions that are assumed to be in
         ascending order. Also accepts an optional already-open file handle
         to be used for reading. If used, the seek position of the file will
         not be preserved.
@@ -1118,17 +1122,17 @@  class revlog(object):
         l = []
         ladd = l.append
 
         # preload the cache
         try:
             while True:
                 # ensure that the cache doesn't change out from under us
                 _cache = self._chunkcache
-                self._chunkraw(revs[0], revs[-1], df=df)
+                self._chunkraw(revs[0], revs[-1], df=df)[1]
                 if _cache == self._chunkcache:
                     break
             offset, data = _cache
         except OverflowError:
             # issue4215 - we can't cache a run of chunks greater than
             # 2G on Windows
             return [self._chunk(rev, df=df) for rev in revs]
 
@@ -1262,17 +1266,17 @@  class revlog(object):
 
         if fp:
             fp.flush()
             fp.close()
 
         df = self.opener(self.datafile, 'w')
         try:
             for r in self:
-                df.write(self._chunkraw(r, r))
+                df.write(self._chunkraw(r, r)[1])
         finally:
             df.close()
 
         fp = self.opener(self.indexfile, 'w', atomictemp=True)
         self.version &= ~(REVLOGNGINLINEDATA)
         self._inline = False
         for i in self:
             e = self._io.packentry(self.index[i], self.node, self.version, i)