Patchwork [1,of,3] hgweb: use an extensible list of file to check for refrash

login
register
mail settings
Submitter Pierre-Yves David
Date July 1, 2015, 7:26 p.m.
Message ID <7e3ad479d9bda6560885.1435778767@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/9848/
State Superseded
Headers show

Comments

Pierre-Yves David - July 1, 2015, 7:26 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1435735130 25200
#      Wed Jul 01 00:18:50 2015 -0700
# Node ID 7e3ad479d9bda6560885f08a1df63d386639b2e1
# Parent  c76e8d14383a44a740d986d87db6f58276fb57e8
hgweb: use an extensible list of file to check for refrash

The refresh feature was explicitly testing if '00changelog.i' and 'phasesroots'
changed. This is overlooking other important information like bookmarks and
obstore (bookmark have their own hack to work around it).

We move to a more extensible system with a list of file of interest that will be
used to build the repo state. The system should probably move in a more central
place so that the command server and other system are able to use it.
Extensions write will also be able to add entry to ensure that changes to
extension data are properly detected.

Also the current key (mtime, size) is notably weak for bookmarks and phases
whose files can easily change content without effect on their size.

Still, this patch seems like a valuable minimal first step.
Yuya Nishihara - July 2, 2015, 2:36 p.m.
On Wed, 01 Jul 2015 12:26:07 -0700, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1435735130 25200
> #      Wed Jul 01 00:18:50 2015 -0700
> # Node ID 7e3ad479d9bda6560885f08a1df63d386639b2e1
> # Parent  c76e8d14383a44a740d986d87db6f58276fb57e8
> hgweb: use an extensible list of file to check for refrash
> 
> The refresh feature was explicitly testing if '00changelog.i' and 'phasesroots'
> changed. This is overlooking other important information like bookmarks and
> obstore (bookmark have their own hack to work around it).
> 
> We move to a more extensible system with a list of file of interest that will be
> used to build the repo state. The system should probably move in a more central
> place so that the command server and other system are able to use it.
> Extensions write will also be able to add entry to ensure that changes to
> extension data are properly detected.
> 
> Also the current key (mtime, size) is notably weak for bookmarks and phases
> whose files can easily change content without effect on their size.
> 
> Still, this patch seems like a valuable minimal first step.
> 
> diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
> --- a/mercurial/hgweb/hgweb_mod.py
> +++ b/mercurial/hgweb/hgweb_mod.py
> @@ -24,10 +24,17 @@ perms = {
>      'listkeys': 'pull',
>      'unbundle': 'push',
>      'pushkey': 'push',
>  }
>  
> +## files of interrest
> +# used to check if the repository has changed looking at mtime and size of
> +# theses files.  This should probably be relocated a bit higher in core
> +foi = [('spath', None),
> +       ('spath', 'phaseroots'), # ! phase can change content at the same size
> +      ]
> +
>  def makebreadcrumb(url, prefix=''):
>      '''Return a 'URL breadcrumb' list
>  
>      A 'URL breadcrumb' is a list of URL-name pairs,
>      corresponding to each of the path items on a URL.
> @@ -118,14 +125,20 @@ class hgweb(object):
>              return repo.filtered(viewconfig)
>          else:
>              return repo.filtered('served')
>  
>      def refresh(self, request=None):
> -        st = get_stat(self.repo.spath)
> -        pst = get_stat(self.repo.spath, 'phaseroots')
> -        # changelog mtime and size, phaseroots mtime and size
> -        repostate = ((st.st_mtime, st.st_size), (pst.st_mtime, pst.st_size))
> +        repostate = []
> +        # file of interrests mtime and size
> +        for meth, fname in foi:
> +            pathfunc = getattr(self.repo, meth)

'repo.spath' isn't a function.

> +            if fname is None:
> +                st = get_stat(pathfunc)
> +            else:
> +                st = get_stat(pathfunc, fname)

I think the "fname is None" case is misleading because get_stat(spath) means
get_stat(spath, '00changelog.i'). Can we make fname a mandatory parameter?

Actually, I misread it as fstat('.hg/store') and wondered why it had to
check the mtime of .hg/store/phaseroots even though the mtime of .hg/store
should be changed at repo.lock().
Pierre-Yves David - July 2, 2015, 8:35 p.m.
On 07/02/2015 07:36 AM, Yuya Nishihara wrote:
> On Wed, 01 Jul 2015 12:26:07 -0700, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1435735130 25200
>> #      Wed Jul 01 00:18:50 2015 -0700
>> # Node ID 7e3ad479d9bda6560885f08a1df63d386639b2e1
>> # Parent  c76e8d14383a44a740d986d87db6f58276fb57e8
>> hgweb: use an extensible list of file to check for refrash
>>
>> The refresh feature was explicitly testing if '00changelog.i' and 'phasesroots'
>> changed. This is overlooking other important information like bookmarks and
>> obstore (bookmark have their own hack to work around it).
>>
>> We move to a more extensible system with a list of file of interest that will be
>> used to build the repo state. The system should probably move in a more central
>> place so that the command server and other system are able to use it.
>> Extensions write will also be able to add entry to ensure that changes to
>> extension data are properly detected.
>>
>> Also the current key (mtime, size) is notably weak for bookmarks and phases
>> whose files can easily change content without effect on their size.
>>
>> Still, this patch seems like a valuable minimal first step.
>>
>> diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
>> --- a/mercurial/hgweb/hgweb_mod.py
>> +++ b/mercurial/hgweb/hgweb_mod.py
>> @@ -24,10 +24,17 @@ perms = {
>>       'listkeys': 'pull',
>>       'unbundle': 'push',
>>       'pushkey': 'push',
>>   }
>>
>> +## files of interrest
>> +# used to check if the repository has changed looking at mtime and size of
>> +# theses files.  This should probably be relocated a bit higher in core
>> +foi = [('spath', None),
>> +       ('spath', 'phaseroots'), # ! phase can change content at the same size
>> +      ]
>> +
>>   def makebreadcrumb(url, prefix=''):
>>       '''Return a 'URL breadcrumb' list
>>
>>       A 'URL breadcrumb' is a list of URL-name pairs,
>>       corresponding to each of the path items on a URL.
>> @@ -118,14 +125,20 @@ class hgweb(object):
>>               return repo.filtered(viewconfig)
>>           else:
>>               return repo.filtered('served')
>>
>>       def refresh(self, request=None):
>> -        st = get_stat(self.repo.spath)
>> -        pst = get_stat(self.repo.spath, 'phaseroots')
>> -        # changelog mtime and size, phaseroots mtime and size
>> -        repostate = ((st.st_mtime, st.st_size), (pst.st_mtime, pst.st_size))
>> +        repostate = []
>> +        # file of interrests mtime and size
>> +        for meth, fname in foi:
>> +            pathfunc = getattr(self.repo, meth)
>
> 'repo.spath' isn't a function.

Well, it is a callable that take a filename and absolute file name 
value. From the user-code perspective it is indistinguishable from a 
function.

How would you like the variable to be called instead?

>> +            if fname is None:
>> +                st = get_stat(pathfunc)
>> +            else:
>> +                st = get_stat(pathfunc, fname)
>
> I think the "fname is None" case is misleading because get_stat(spath) means
> get_stat(spath, '00changelog.i'). Can we make fname a mandatory parameter?

I agree that the no-argument case is atrocious, but the final goal of 
the series is just to get bookmark change detected in a proper way.
I saw a deeper cleanup of this logic as a scope creep.
I expect the whole logic to go away eventually we me move to a more 
sensible checking mechanism.

If you think this should be fixed as part of this series, let me know, 
and I'll do it.

> Actually, I misread it as fstat('.hg/store') and wondered why it had to
> check the mtime of .hg/store/phaseroots even though the mtime of .hg/store
> should be changed at repo.lock().

(thanks for reviewing this)
Yuya Nishihara - July 3, 2015, 12:08 p.m.
On Thu, 02 Jul 2015 13:35:19 -0700, Pierre-Yves David wrote:
> On 07/02/2015 07:36 AM, Yuya Nishihara wrote:
> > On Wed, 01 Jul 2015 12:26:07 -0700, Pierre-Yves David wrote:
> >> # HG changeset patch
> >> # User Pierre-Yves David <pierre-yves.david@fb.com>
> >> # Date 1435735130 25200
> >> #      Wed Jul 01 00:18:50 2015 -0700
> >> # Node ID 7e3ad479d9bda6560885f08a1df63d386639b2e1
> >> # Parent  c76e8d14383a44a740d986d87db6f58276fb57e8
> >> hgweb: use an extensible list of file to check for refrash
> >>
> >> The refresh feature was explicitly testing if '00changelog.i' and 'phasesroots'
> >> changed. This is overlooking other important information like bookmarks and
> >> obstore (bookmark have their own hack to work around it).
> >>
> >> We move to a more extensible system with a list of file of interest that will be
> >> used to build the repo state. The system should probably move in a more central
> >> place so that the command server and other system are able to use it.
> >> Extensions write will also be able to add entry to ensure that changes to
> >> extension data are properly detected.
> >>
> >> Also the current key (mtime, size) is notably weak for bookmarks and phases
> >> whose files can easily change content without effect on their size.
> >>
> >> Still, this patch seems like a valuable minimal first step.
> >>
> >> diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
> >> --- a/mercurial/hgweb/hgweb_mod.py
> >> +++ b/mercurial/hgweb/hgweb_mod.py
> >> @@ -24,10 +24,17 @@ perms = {
> >>       'listkeys': 'pull',
> >>       'unbundle': 'push',
> >>       'pushkey': 'push',
> >>   }
> >>
> >> +## files of interrest
> >> +# used to check if the repository has changed looking at mtime and size of
> >> +# theses files.  This should probably be relocated a bit higher in core
> >> +foi = [('spath', None),
> >> +       ('spath', 'phaseroots'), # ! phase can change content at the same size
> >> +      ]
> >> +
> >>   def makebreadcrumb(url, prefix=''):
> >>       '''Return a 'URL breadcrumb' list
> >>
> >>       A 'URL breadcrumb' is a list of URL-name pairs,
> >>       corresponding to each of the path items on a URL.
> >> @@ -118,14 +125,20 @@ class hgweb(object):
> >>               return repo.filtered(viewconfig)
> >>           else:
> >>               return repo.filtered('served')
> >>
> >>       def refresh(self, request=None):
> >> -        st = get_stat(self.repo.spath)
> >> -        pst = get_stat(self.repo.spath, 'phaseroots')
> >> -        # changelog mtime and size, phaseroots mtime and size
> >> -        repostate = ((st.st_mtime, st.st_size), (pst.st_mtime, pst.st_size))
> >> +        repostate = []
> >> +        # file of interrests mtime and size
> >> +        for meth, fname in foi:
> >> +            pathfunc = getattr(self.repo, meth)
> >
> > 'repo.spath' isn't a function.
> 
> Well, it is a callable that take a filename and absolute file name 
> value. From the user-code perspective it is indistinguishable from a 
> function.

Didn't you confuse it with repo.sjoin() ? repo.spath is a str.

> >> +            if fname is None:
> >> +                st = get_stat(pathfunc)
> >> +            else:
> >> +                st = get_stat(pathfunc, fname)
> >
> > I think the "fname is None" case is misleading because get_stat(spath) means
> > get_stat(spath, '00changelog.i'). Can we make fname a mandatory parameter?
> 
> I agree that the no-argument case is atrocious, but the final goal of 
> the series is just to get bookmark change detected in a proper way.
> I saw a deeper cleanup of this logic as a scope creep.
> I expect the whole logic to go away eventually we me move to a more 
> sensible checking mechanism.
> 
> If you think this should be fixed as part of this series, let me know, 
> and I'll do it.

Well, I don't have a strong opinion about it. I just thought it would be
more comprehensible if fname is in the table:

foi = [('spath', '00changelog.i'),
       ('spath', 'phaseroots'),
Pierre-Yves David - July 3, 2015, 6:29 p.m.
On 07/03/2015 05:08 AM, Yuya Nishihara wrote:
> On Thu, 02 Jul 2015 13:35:19 -0700, Pierre-Yves David wrote:
>> On 07/02/2015 07:36 AM, Yuya Nishihara wrote:
>>> On Wed, 01 Jul 2015 12:26:07 -0700, Pierre-Yves David wrote:
>>>> # HG changeset patch
>>>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>>>> # Date 1435735130 25200
>>>> #      Wed Jul 01 00:18:50 2015 -0700
>>>> # Node ID 7e3ad479d9bda6560885f08a1df63d386639b2e1
>>>> # Parent  c76e8d14383a44a740d986d87db6f58276fb57e8
>>>> hgweb: use an extensible list of file to check for refrash
>>>>
>>>> The refresh feature was explicitly testing if '00changelog.i' and 'phasesroots'
>>>> changed. This is overlooking other important information like bookmarks and
>>>> obstore (bookmark have their own hack to work around it).
>>>>
>>>> We move to a more extensible system with a list of file of interest that will be
>>>> used to build the repo state. The system should probably move in a more central
>>>> place so that the command server and other system are able to use it.
>>>> Extensions write will also be able to add entry to ensure that changes to
>>>> extension data are properly detected.
>>>>
>>>> Also the current key (mtime, size) is notably weak for bookmarks and phases
>>>> whose files can easily change content without effect on their size.
>>>>
>>>> Still, this patch seems like a valuable minimal first step.
>>>>
>>>> diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
>>>> --- a/mercurial/hgweb/hgweb_mod.py
>>>> +++ b/mercurial/hgweb/hgweb_mod.py
>>>> @@ -24,10 +24,17 @@ perms = {
>>>>        'listkeys': 'pull',
>>>>        'unbundle': 'push',
>>>>        'pushkey': 'push',
>>>>    }
>>>>
>>>> +## files of interrest
>>>> +# used to check if the repository has changed looking at mtime and size of
>>>> +# theses files.  This should probably be relocated a bit higher in core
>>>> +foi = [('spath', None),
>>>> +       ('spath', 'phaseroots'), # ! phase can change content at the same size
>>>> +      ]
>>>> +
>>>>    def makebreadcrumb(url, prefix=''):
>>>>        '''Return a 'URL breadcrumb' list
>>>>
>>>>        A 'URL breadcrumb' is a list of URL-name pairs,
>>>>        corresponding to each of the path items on a URL.
>>>> @@ -118,14 +125,20 @@ class hgweb(object):
>>>>                return repo.filtered(viewconfig)
>>>>            else:
>>>>                return repo.filtered('served')
>>>>
>>>>        def refresh(self, request=None):
>>>> -        st = get_stat(self.repo.spath)
>>>> -        pst = get_stat(self.repo.spath, 'phaseroots')
>>>> -        # changelog mtime and size, phaseroots mtime and size
>>>> -        repostate = ((st.st_mtime, st.st_size), (pst.st_mtime, pst.st_size))
>>>> +        repostate = []
>>>> +        # file of interrests mtime and size
>>>> +        for meth, fname in foi:
>>>> +            pathfunc = getattr(self.repo, meth)
>>>
>>> 'repo.spath' isn't a function.
>>
>> Well, it is a callable that take a filename and absolute file name
>> value. From the user-code perspective it is indistinguishable from a
>> function.
>
> Didn't you confuse it with repo.sjoin() ? repo.spath is a str.

Oops, you are absolutely right, I can't read code.

I'll send a V2

(I've also killed the infamous default argument)

Patch

diff --git a/mercurial/hgweb/hgweb_mod.py b/mercurial/hgweb/hgweb_mod.py
--- a/mercurial/hgweb/hgweb_mod.py
+++ b/mercurial/hgweb/hgweb_mod.py
@@ -24,10 +24,17 @@  perms = {
     'listkeys': 'pull',
     'unbundle': 'push',
     'pushkey': 'push',
 }
 
+## files of interrest
+# used to check if the repository has changed looking at mtime and size of
+# theses files.  This should probably be relocated a bit higher in core
+foi = [('spath', None),
+       ('spath', 'phaseroots'), # ! phase can change content at the same size
+      ]
+
 def makebreadcrumb(url, prefix=''):
     '''Return a 'URL breadcrumb' list
 
     A 'URL breadcrumb' is a list of URL-name pairs,
     corresponding to each of the path items on a URL.
@@ -118,14 +125,20 @@  class hgweb(object):
             return repo.filtered(viewconfig)
         else:
             return repo.filtered('served')
 
     def refresh(self, request=None):
-        st = get_stat(self.repo.spath)
-        pst = get_stat(self.repo.spath, 'phaseroots')
-        # changelog mtime and size, phaseroots mtime and size
-        repostate = ((st.st_mtime, st.st_size), (pst.st_mtime, pst.st_size))
+        repostate = []
+        # file of interrests mtime and size
+        for meth, fname in foi:
+            pathfunc = getattr(self.repo, meth)
+            if fname is None:
+                st = get_stat(pathfunc)
+            else:
+                st = get_stat(pathfunc, fname)
+            repostate.append((st.st_mtime, st.st_size))
+        repostate = tuple(repostate)
         # we need to compare file size in addition to mtime to catch
         # changes made less than a second ago
         if repostate != self.repostate:
             r = hg.repository(self.repo.baseui, self.repo.url())
             self.repo = self._getview(r)