Patchwork [2,of,2,stable] scmutil: record all symlinks to a repository in walkrepos

login
register
mail settings
Submitter Enrique A. Tobis
Date Dec. 11, 2014, 3:47 p.m.
Message ID <a2351ea044f117402b17.1418312868@eowyn>
Download mbox | patch
Permalink /patch/7051/
State Rejected
Headers show

Comments

Enrique A. Tobis - Dec. 11, 2014, 3:47 p.m.
# HG changeset patch
# User Enrique A. Tobis <enrique@tobis.com.ar>
# Date 1418275100 18000
#      Thu Dec 11 00:18:20 2014 -0500
# Branch stable
# Node ID a2351ea044f117402b17809050bdc5dc2de29ff4
# Parent  c67b563426866e1c717c6a25b0056bb51cb155e5
scmutil: record all symlinks to a repository in walkrepos

In walkrepos, list all paths that lead to a repository when symlinks are
followed, while still avoiding infinite loops.
Matt Mackall - Dec. 12, 2014, 8:52 p.m.
On Thu, 2014-12-11 at 10:47 -0500, Enrique A. Tobis wrote:
> # HG changeset patch
> # User Enrique A. Tobis <enrique@tobis.com.ar>
> # Date 1418275100 18000
> #      Thu Dec 11 00:18:20 2014 -0500
> # Branch stable
> # Node ID a2351ea044f117402b17809050bdc5dc2de29ff4
> # Parent  c67b563426866e1c717c6a25b0056bb51cb155e5
> scmutil: record all symlinks to a repository in walkrepos
> 
> In walkrepos, list all paths that lead to a repository when symlinks are
> followed, while still avoiding infinite loops.

This patch is too complex and includes unrelated changes that interfere
with its readability. For instance:

> -    def errhandler(err):
> -        if err.filename == path:
> -            raise err

..we used to have this error handling code that was presumably there for
a reason. You removed it with no explanation.

Even if there is a good reason to remove it, it doesn't belong in this
patch.

Relatedly, you've replaced os.walk without implementing its error
handling.

If you really need to rewrite os.walk (I'm not convinced) to get the
behavior you want, you should do your patches like this:

- cleanups
- replace os.walk, no functional changes
- behavior change
Enrique A. Tobis - Dec. 13, 2014, 5:10 a.m.
On Fri, Dec 12, 2014 at 3:52 PM, Matt Mackall <mpm@selenic.com> wrote:
> On Thu, 2014-12-11 at 10:47 -0500, Enrique A. Tobis wrote:
>> # HG changeset patch
>> # User Enrique A. Tobis <enrique@tobis.com.ar>
>> # Date 1418275100 18000
>> #      Thu Dec 11 00:18:20 2014 -0500
>> # Branch stable
>> # Node ID a2351ea044f117402b17809050bdc5dc2de29ff4
>> # Parent  c67b563426866e1c717c6a25b0056bb51cb155e5
>> scmutil: record all symlinks to a repository in walkrepos
>>
>> In walkrepos, list all paths that lead to a repository when symlinks are
>> followed, while still avoiding infinite loops.
>
> This patch is too complex and includes unrelated changes that interfere
> with its readability. For instance:
>
>> -    def errhandler(err):
>> -        if err.filename == path:
>> -            raise err
>
> ..we used to have this error handling code that was presumably there for
> a reason. You removed it with no explanation.
>
> Even if there is a good reason to remove it, it doesn't belong in this
> patch.
>
> Relatedly, you've replaced os.walk without implementing its error
> handling.
>
> If you really need to rewrite os.walk (I'm not convinced) to get the
> behavior you want, you should do your patches like this:
>
> - cleanups
> - replace os.walk, no functional changes
> - behavior change
>
Thank you for your feedback! I'll be more careful with my patches in the future.

The current implementation of scmutil.walkrepo will follow symlinks if
asked, but if it finds a repository more than once, only the first
time it's found is yielded. My intention was to change that behavior
and make it return every occurence, not just the first. That is only
relevant in the context of symlinks, so I also want to avoid infinite
loops. if there is an infinite loop, I think it's reasonable to
traverse it once, but not more.

My idea for implementing this new behavior is to change the meaning of
seen_dirs. Right now it's, approximately, "every directory I've seen
so far since the top-most invocation of walkrepos." I want it to be
"every directory I've traversed to get here starting from the top-most
path." In other words, a stack of the directories traversed. I
couldn't come up with a way to do that using os.walk.

Do you think my change in behavior makes sense? If so, do you think it
can be implemented using os.walk?

Thanks again,
Enrique
Matt Mackall - Dec. 15, 2014, 7:07 p.m.
On Sat, 2014-12-13 at 00:10 -0500, Enrique A. Tobis wrote:
> On Fri, Dec 12, 2014 at 3:52 PM, Matt Mackall <mpm@selenic.com> wrote:
> > On Thu, 2014-12-11 at 10:47 -0500, Enrique A. Tobis wrote:
> >> # HG changeset patch
> >> # User Enrique A. Tobis <enrique@tobis.com.ar>
> >> # Date 1418275100 18000
> >> #      Thu Dec 11 00:18:20 2014 -0500
> >> # Branch stable
> >> # Node ID a2351ea044f117402b17809050bdc5dc2de29ff4
> >> # Parent  c67b563426866e1c717c6a25b0056bb51cb155e5
> >> scmutil: record all symlinks to a repository in walkrepos
> >>
> >> In walkrepos, list all paths that lead to a repository when symlinks are
> >> followed, while still avoiding infinite loops.
> >
> > This patch is too complex and includes unrelated changes that interfere
> > with its readability. For instance:
> >
> >> -    def errhandler(err):
> >> -        if err.filename == path:
> >> -            raise err
> >
> > ..we used to have this error handling code that was presumably there for
> > a reason. You removed it with no explanation.
> >
> > Even if there is a good reason to remove it, it doesn't belong in this
> > patch.
> >
> > Relatedly, you've replaced os.walk without implementing its error
> > handling.
> >
> > If you really need to rewrite os.walk (I'm not convinced) to get the
> > behavior you want, you should do your patches like this:
> >
> > - cleanups
> > - replace os.walk, no functional changes
> > - behavior change
> >
> Thank you for your feedback! I'll be more careful with my patches in the future.
> 
> The current implementation of scmutil.walkrepo will follow symlinks if
> asked, but if it finds a repository more than once, only the first
> time it's found is yielded. My intention was to change that behavior
> and make it return every occurence, not just the first.

Sounds like we probably had that behavior to start with.. and then nixed
it for some reason. So you need to do some archaelogy to figure out why
we did that and either a) convince us to change it back (not very
likely, because it just means someone else will be trying to convince us
to go the other way again next year) or b) convince us to make it an
option.
Pierre-Yves David - Dec. 15, 2014, 7:21 p.m.
On 12/15/2014 11:07 AM, Matt Mackall wrote:
> On Sat, 2014-12-13 at 00:10 -0500, Enrique A. Tobis wrote:
>> On Fri, Dec 12, 2014 at 3:52 PM, Matt Mackall <mpm@selenic.com> wrote:
>>> On Thu, 2014-12-11 at 10:47 -0500, Enrique A. Tobis wrote:
>>>> # HG changeset patch
>>>> # User Enrique A. Tobis <enrique@tobis.com.ar>
>>>> # Date 1418275100 18000
>>>> #      Thu Dec 11 00:18:20 2014 -0500
>>>> # Branch stable
>>>> # Node ID a2351ea044f117402b17809050bdc5dc2de29ff4
>>>> # Parent  c67b563426866e1c717c6a25b0056bb51cb155e5
>>>> scmutil: record all symlinks to a repository in walkrepos
>>>>
>>>> In walkrepos, list all paths that lead to a repository when symlinks are
>>>> followed, while still avoiding infinite loops.
>>>
>>> This patch is too complex and includes unrelated changes that interfere
>>> with its readability. For instance:
>>>
>>>> -    def errhandler(err):
>>>> -        if err.filename == path:
>>>> -            raise err
>>>
>>> ..we used to have this error handling code that was presumably there for
>>> a reason. You removed it with no explanation.
>>>
>>> Even if there is a good reason to remove it, it doesn't belong in this
>>> patch.
>>>
>>> Relatedly, you've replaced os.walk without implementing its error
>>> handling.
>>>
>>> If you really need to rewrite os.walk (I'm not convinced) to get the
>>> behavior you want, you should do your patches like this:
>>>
>>> - cleanups
>>> - replace os.walk, no functional changes
>>> - behavior change
>>>
>> Thank you for your feedback! I'll be more careful with my patches in the future.
>>
>> The current implementation of scmutil.walkrepo will follow symlinks if
>> asked, but if it finds a repository more than once, only the first
>> time it's found is yielded. My intention was to change that behavior
>> and make it return every occurence, not just the first.
>
> Sounds like we probably had that behavior to start with.. and then nixed
> it for some reason. So you need to do some archaelogy to figure out why
> we did that and either a) convince us to change it back (not very
> likely, because it just means someone else will be trying to convince us
> to go the other way again next year) or b) convince us to make it an
> option.

I remember this has been changed around the creation of the clowncopter 
repository. But I cannot remember the original reasoning. The patch was 
probably issued by David Soria Para, Durham Goode or Siddharth Agarwa

>

Patch

diff -r c67b56342686 -r a2351ea044f1 mercurial/scmutil.py
--- a/mercurial/scmutil.py	Wed Dec 10 23:46:47 2014 -0500
+++ b/mercurial/scmutil.py	Thu Dec 11 00:18:20 2014 -0500
@@ -448,9 +448,6 @@ 
 def walkrepos(path, followsym=False, seen_dirs=None, recurse=False):
     '''yield every hg repository under path, always recursively.
     The recurse flag will only control recursion into repo working dirs'''
-    def errhandler(err):
-        if err.filename == path:
-            raise err
     samestat = getattr(os.path, 'samestat', None)
     if followsym and samestat is not None:
         def adddir(dirlst, dirname):
@@ -463,35 +460,33 @@ 
             if not match:
                 dirlst.append(dirstat)
             return not match
+        removedir = lambda dirlst: dirlst.pop()
+        if seen_dirs is None:
+            seen_dirs = []
     else:
         followsym = False
-
-    if (seen_dirs is None) and followsym:
-        seen_dirs = []
-        adddir(seen_dirs, path)
-    for root, dirs, files in os.walk(path, topdown=True, onerror=errhandler):
+        adddir = lambda dirlst, dirname: not os.path.islink(dirname)
+        removedir = lambda dirlst: None
+    def _walkrepos(path, followsym, seen_dirs, recurse):
+        dirs = [dirname for dirname in os.listdir(path)
+                if os.path.isdir(os.path.join(path, dirname))]
         dirs.sort()
         if '.hg' in dirs:
-            yield root # found a repository
-            qroot = os.path.join(root, '.hg', 'patches')
+            yield path
+            qroot = os.path.join(path, '.hg', 'patches')
             if os.path.isdir(os.path.join(qroot, '.hg')):
-                yield qroot # we have a patch queue repo here
+                yield qroot
             if recurse:
-                # avoid recursing inside the .hg directory
                 dirs.remove('.hg')
             else:
-                dirs[:] = [] # don't descend further
-        elif followsym:
-            newdirs = []
-            for d in dirs:
-                fname = os.path.join(root, d)
-                if adddir(seen_dirs, fname):
-                    if os.path.islink(fname):
-                        for hgname in walkrepos(fname, True, seen_dirs):
-                            yield hgname
-                    else:
-                        newdirs.append(d)
-            dirs[:] = newdirs
+                return # don't descend further
+        for dirname in (os.path.join(path, dirname) for dirname in dirs):
+            if adddir(seen_dirs, dirname):
+                for repo in _walkrepos(dirname, followsym, seen_dirs, recurse):
+                    yield repo
+                removedir(seen_dirs)
+    for repo in _walkrepos(path, followsym, seen_dirs, recurse):
+        yield repo
 
 def osrcpath():
     '''return default os-specific hgrc search path'''
diff -r c67b56342686 -r a2351ea044f1 tests/test-hgwebdirsym.t
--- a/tests/test-hgwebdirsym.t	Wed Dec 10 23:46:47 2014 -0500
+++ b/tests/test-hgwebdirsym.t	Thu Dec 11 00:18:20 2014 -0500
@@ -40,6 +40,9 @@ 
   /al/
   /b/
   /c/
+  /circle/al/
+  /circle/b/
+  /circle/c/
   
   $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'al/file/tip/a?style=raw'
   200 Script output follows
@@ -53,28 +56,38 @@ 
   200 Script output follows
   
   c
-
+  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/al/file/tip/a?style=raw'
+  200 Script output follows
+  
+  a
+  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/b/file/tip/b?style=raw'
+  200 Script output follows
+  
+  b
+  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/c/file/tip/c?style=raw'
+  200 Script output follows
+  
+  c
 should fail
 
-  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/al/file/tip/a?style=raw'
+  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/circle/al/file/tip/a?style=raw'
   404 Not Found
   
   
-  error: repository circle/al/file/tip/a not found
+  error: repository circle/circle/al/file/tip/a not found
   [1]
-  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/b/file/tip/a?style=raw'
+  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/circle/b/file/tip/b?style=raw'
   404 Not Found
   
   
-  error: repository circle/b/file/tip/a not found
+  error: repository circle/circle/b/file/tip/b not found
   [1]
-  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/c/file/tip/a?style=raw'
+  $ "$TESTDIR/get-with-headers.py" localhost:$HGPORT 'circle/circle/c/file/tip/c?style=raw'
   404 Not Found
   
   
-  error: repository circle/c/file/tip/a not found
+  error: repository circle/circle/c/file/tip/c not found
   [1]
-
 collections errors
 
   $ cat error-collections.log
diff -r c67b56342686 -r a2351ea044f1 tests/test-walkrepo.py
--- a/tests/test-walkrepo.py	Wed Dec 10 23:46:47 2014 -0500
+++ b/tests/test-walkrepo.py	Thu Dec 11 00:18:20 2014 -0500
@@ -22,22 +22,22 @@ 
 
 def runtest():
     reposet = frozenset(walkrepos('.', followsym=True))
-    if sym and (len(reposet) != 3):
+    if sym and (len(reposet) != 7):
         print "reposet = %r" % (reposet,)
-        print ("Found %d repositories when I should have found 3"
+        print ("Found %d repositories when I should have found 7"
                % (len(reposet),))
     if (not sym) and (len(reposet) != 2):
         print "reposet = %r" % (reposet,)
         print ("Found %d repositories when I should have found 2"
                % (len(reposet),))
     sub1set = frozenset((pjoin('.', 'sub1'),
-                         pjoin('.', 'circle', 'subdir', 'sub1')))
+                         pjoin('.', 'circle', 'circle', 'subdir', 'sub1')))
     if len(sub1set & reposet) != 1:
         print "sub1set = %r" % (sub1set,)
         print "reposet = %r" % (reposet,)
         print "sub1set and reposet should have exactly one path in common."
-    sub2set = frozenset((pjoin('.', 'subsub1'),
-                         pjoin('.', 'subsubdir', 'subsub1')))
+    sub2set = frozenset((pjoin('.', 'sub1'),
+                         pjoin('.', 'circle', 'subdir', 'circle', 'top1')))
     if len(sub2set & reposet) != 1:
         print "sub2set = %r" % (sub2set,)
         print "reposet = %r" % (reposet,)