Patchwork D2938: grep: make grep search on working directory by default

login
register
mail settings
Submitter phabricator
Date March 23, 2018, 8:05 p.m.
Message ID <differential-rev-PHID-DREV-rdutetellksj6ietsxlz-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/29803/
State New
Headers show

Comments

phabricator - March 23, 2018, 8:05 p.m.
sangeet259 created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The earlier behaviour of grep was quite counter intuitive as it searches
  history by default. It throws up matches from files which have been
  deleted and commited out. This patch handles only the default behaviour
  by iterating on the files being tracked by the workingctx are
  not marked for removal

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

AFFECTED FILES
  mercurial/commands.py
  tests/test-grep.t

CHANGE DETAILS




To: sangeet259, #hg-reviewers
Cc: mercurial-devel
phabricator - March 24, 2018, 7:40 a.m.
pulkit added a comment.


  No need to add Edit1, Edit2 in commit message. If something changes from one version to another that deserves mention, include that in commit message as normal.

INLINE COMMENTS

> commands.py:2583
> +    # when nothing is passed in -r or --all
> +    if not bool(opts.get('rev')) and not bool(opts.get('all')):
> +        rev = scmutil.revsingle(repo, opts.get('rev'), None).node()

Nit: no need to add bool() calls.

> commands.py:2584
> +    if not bool(opts.get('rev')) and not bool(opts.get('all')):
> +        rev = scmutil.revsingle(repo, opts.get('rev'), None).node()
> +        m = scmutil.match(repo[rev], pats, opts, default='relglob')

The above if condition wants opts['rev'] to be empty and here you are using it's value.

> test-grep.t:343
>  
> -  $ cd ..
> +Test that grep searches only on working directory 
> +  $ cd ..

For better testing, please add untracked files which can be probable match. Also I suspect there will be existing tests for hg grep which should break because of new behavior.

> test-grep.t:357
> +  $ hg grep 'revsets'
> +  secondfile:None:mercurial revsets are awesome and makes life easier
> +  $ echo "another generic string" > fourthone

what's this None here?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: pulkit, mercurial-devel
phabricator - March 24, 2018, 10:43 a.m.
yuja added a comment.


  Perhaps we can start with adding an experimental option to grep files
  including unchanged ones?
  
  IIUC, the new default behavior is something like `hg grep -r "wdir()" --all-files`,
  which is basically `s/ctx.files()/ctx/`.
  (needless to say `--all-files` is a bad name, and the following patch is just a hack.)
  
    diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
    --- a/mercurial/cmdutil.py
    +++ b/mercurial/cmdutil.py
    @@ -1835,7 +1835,7 @@ def walkchangerevs(repo, match, opts, pr
         if not revs:
             return []
         wanted = set()
    -    slowpath = match.anypats() or (not match.always() and opts.get('removed'))
    +    slowpath = True
         fncache = {}
         change = repo.changectx
     
    @@ -1889,7 +1889,7 @@ def walkchangerevs(repo, match, opts, pr
                     else:
                         self.revs.discard(value)
                         ctx = change(value)
    -                    matches = [f for f in ctx.files() if match(f)]
    +                    matches = [f for f in ctx if match(f)]
                         if matches:
                             fncache[value] = matches
                             self.set.add(value)
    @@ -1939,7 +1939,7 @@ def walkchangerevs(repo, match, opts, pr
                     ctx = change(rev)
                     if not fns:
                         def fns_generator():
    -                        for f in ctx.files():
    +                        for f in ctx:
                                 if match(f):
                                     yield f
                         fns = fns_generator()

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: yuja, pulkit, mercurial-devel
phabricator - March 24, 2018, 11:12 a.m.
yuja added inline comments.

INLINE COMMENTS

> commands.py:2474
> +                return util.binary(flog.read(ctx.filenode(fn)))
> +            except AttributeError:
> +                return util.binary(ctx.filectx(fn).data())

Better to test if ctx is a workingctx (i.e. `ctx.rev() is None`).

Another idea is to make `wctx.filenode()` return wdirid and catch
WdirUnsupported exception
to fall back to the slow path.

> commands.py:2475
> +            except AttributeError:
> +                return util.binary(ctx.filectx(fn).data())
>  

Nit: `fctx.isbinary()` is preferred.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: yuja, pulkit, mercurial-devel
phabricator - March 24, 2018, 12:07 p.m.
av6 added inline comments.

INLINE COMMENTS

> commands.py:2485
> +            fm.data(node=fm.hexfunc(scmutil.binnode(ctx)))
> +            if not bool(opts.get('all')) and not bool(opts.get('rev')):
> +                cols = [

This line looks identical to the one later on, with Pulkit's comment about usage of bool().

> commands.py:2487-2490
>                  ('filename', fn, True),
> -                ('rev', rev, True),
> +                ('rev', rev, False),
>                  ('linenumber', l.linenum, opts.get('line_number')),
>              ]

This block is not indented enough, but see my other comment below.

> commands.py:2494
> +                    ('filename', fn, True),
> +                    ('rev', rev, True),
> +                    ('linenumber', l.linenum, opts.get('line_number')),

Since the difference in both branches for this if-else block seems to be only in this line, and just one value, it probably makes sense to store said value in a variable and use it in place of False/True and reduce duplication.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
phabricator - March 25, 2018, 7:28 a.m.
sangeet259 added a comment.


  @yuja Can you please clarify this a bit more?
  
  > Perhaps we can start with adding an experimental option to grep files
  >  including unchanged ones?"

INLINE COMMENTS

> yuja wrote in commands.py:2474
> Better to test if ctx is a workingctx (i.e. `ctx.rev() is None`).
> 
> Another idea is to make `wctx.filenode()` return wdirid and catch
> WdirUnsupported exception
> to fall back to the slow path.

Sure, pushing the first approach now.
But, I think the second approach fits the "Easier to ask for forgiveness than permission" philosophy of python.

Currently the filenode of workingctx returns this: `return self._fileinfo(path)[0]` . Shall I change that to returning wdirid?

Also can you please highlight the slowpath part? Any reference or links to mailing lists, where I can learn more about it.

> yuja wrote in commands.py:2475
> Nit: `fctx.isbinary()` is preferred.

Done. Any reason why that is better?

> av6 wrote in commands.py:2494
> Since the difference in both branches for this if-else block seems to be only in this line, and just one value, it probably makes sense to store said value in a variable and use it in place of False/True and reduce duplication.

Great. Thanks !

> pulkit wrote in commands.py:2584
> The above if condition wants opts['rev'] to be empty and here you are using it's value.

I could have made the call by simply passing `''` instead of `opts.get('rev')` but I chose this because I am planning to build upon this to handle revisions as well.

> pulkit wrote in test-grep.t:357
> what's this None here?

The None was a result of displaying revision in the ui.write. Removed that. I had forgotten to update this result.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
phabricator - March 25, 2018, 8:36 a.m.
yuja added a comment.


  In https://phab.mercurial-scm.org/D2938#47514, @sangeet259 wrote:
  
  > @yuja Can you please clarify this a bit more?
  >
  > > Perhaps we can start with adding an experimental option to grep files
  > >  including unchanged ones?"
  
  
  This patch appears to do 3 separate things by adding a big "if" branch.
  
  - fix `grep -r 'wdir()'`
  - add mode to include unmodified files
  - change the default
  
  It's probably better to write one (or more) patches for each change, in a way
  not duplicating code for each combination of possible options.

INLINE COMMENTS

> sangeet259 wrote in commands.py:2474
> Sure, pushing the first approach now.
> But, I think the second approach fits the "Easier to ask for forgiveness than permission" philosophy of python.
> 
> Currently the filenode of workingctx returns this: `return self._fileinfo(path)[0]` . Shall I change that to returning wdirid?
> 
> Also can you please highlight the slowpath part? Any reference or links to mailing lists, where I can learn more about it.

> Shall I change that to returning wdirid?

That isn't easy to answer because `wctx.filenode()` can return
another pseudo
hash (e.g. 000000000000000added) if `wctx._manifest` is preloaded.

> highlight the slowpath part

It's quite old, so I have no concrete idea. But maybe reusing filelog objects is the key.

> sangeet259 wrote in commands.py:2475
> Done. Any reason why that is better?

Extensions may override it to provide a faster implementation.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
phabricator - March 25, 2018, 9:57 a.m.
yuja added inline comments.

INLINE COMMENTS

> yuja wrote in commands.py:2474
> > Shall I change that to returning wdirid?
> 
> That isn't easy to answer because `wctx.filenode()` can return
> another pseudo
> hash (e.g. 000000000000000added) if `wctx._manifest` is preloaded.
> 
> > highlight the slowpath part
> 
> It's quite old, so I have no concrete idea. But maybe reusing filelog objects is the key.

I did try that. https://phab.mercurial-scm.org/D2940 .. https://phab.mercurial-scm.org/D2942

No idea if they are good or bad.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
phabricator - April 17, 2018, 3:05 p.m.
sangeet259 marked 2 inline comments as done.
sangeet259 added a comment.


  @yuja
  
  > fix grep -r 'wdir()'
  
  What exactly do you mean by fixing 'grep -r wdir()' ?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
phabricator - April 19, 2018, 12:39 p.m.
yuja added a comment.


  > What exactly do you mean by fixing 'grep -r wdir()' ?
  
  Make it return the same result as
  
    $ hg ci -m 'this was wdir()'
    $ hg grep -r .

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
phabricator - May 24, 2018, 9:14 p.m.
sangeet259 added a comment.


  @yuja 
  So I was working on it, but the way I tried to correct "wdir()" is by handling it as a special case, so introducing an "if" block again similar to this.
  But you were asking to eliminate the if/else block altogether, I wonder if there is way ?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
Yuya Nishihara - May 25, 2018, 1:13 p.m.
>   So I was working on it, but the way I tried to correct "wdir()" is by handling it as a special case, so introducing an "if" block again similar to this.
>   But you were asking to eliminate the if/else block altogether, I wonder if there is way ?

IIRC, my point was to avoid adding *big* if/else branch which will be likely
to introduce untested code paths.

I don't think there would be a way to go without if/else or try/except as
long as we use the low-level filelog API for performance reasons. Perhaps,
we'll have to catch WdirUnsupported error to fall back to the filectx API.

```
try:
    flog.read(fnode)
except error.WdirUnsupported:
    ctx[fn].data()
```
phabricator - May 25, 2018, 1:14 p.m.
yuja added a comment.


  >   So I was working on it, but the way I tried to correct "wdir()" is by handling it as a special case, so introducing an "if" block again similar to this.
  >   But you were asking to eliminate the if/else block altogether, I wonder if there is way ?
  
  IIRC, my point was to avoid adding *big* if/else branch which will be likely
  to introduce untested code paths.
  
  I don't think there would be a way to go without if/else or try/except as
  long as we use the low-level filelog API for performance reasons. Perhaps,
  we'll have to catch WdirUnsupported error to fall back to the filectx API.
  
    try:
        flog.read(fnode)
    except error.WdirUnsupported:
        ctx[fn].data()

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel
phabricator - May 26, 2018, 2:23 p.m.
sangeet259 added a comment.


  > Perhaps, we'll have to catch WdirUnsupported error to fall back to the filectx API.
  
  This seems a good idea.
  Trying it.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2938

To: sangeet259, #hg-reviewers
Cc: av6, yuja, pulkit, mercurial-devel

Patch

diff --git a/tests/test-grep.t b/tests/test-grep.t
--- a/tests/test-grep.t
+++ b/tests/test-grep.t
@@ -340,4 +340,25 @@ 
   $ hg grep "MaCam" --all
   binfile.bin:0:+: Binary file matches
 
-  $ cd ..
+Test that grep searches only on working directory 
+  $ cd ..
+  $ hg init t5
+  $ cd t5
+  $ echo "mercurial revsets are awesome" > firstfile
+  $ hg add firstfile
+  $ hg commit -m  'adds firstfile'
+  $ hg rm firstfile
+  $ hg commit -m 'removes firstfile'
+  $ echo "mercurial revsets are awesome and makes life easier" > secondfile
+  $ echo "some generic text" > thirdfile
+  $ hg add secondfile thirdfile
+  $ hg commit -m 'adds two new files'
+  $ hg grep 'revsets'
+  secondfile:None:mercurial revsets are awesome and makes life easier
+  $ echo "another generic string" > fourthone
+
+Search on added but not commit i.e dirty working directory
+  $ hg add fourthone
+  $ hg grep "generic"
+  fourthone:None:another generic string
+  thirdfile:None:some generic text
diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -2469,16 +2469,22 @@ 
         @util.cachefunc
         def binary():
             flog = getfile(fn)
-            return util.binary(flog.read(ctx.filenode(fn)))
+            try:
+                return util.binary(flog.read(ctx.filenode(fn)))
+            except AttributeError:
+                return util.binary(ctx.filectx(fn).data())
 
         fieldnamemap = {'filename': 'file', 'linenumber': 'line_number'}
         if opts.get('all'):
             iter = difflinestates(pstates, states)
         else:
             iter = [('', l) for l in states]
         for change, l in iter:
             fm.startitem()
-            fm.data(node=fm.hexfunc(ctx.node()))
+            try:
+                fm.data(node=fm.hexfunc(ctx.node()))
+            except TypeError:
+                pass
             cols = [
                 ('filename', fn, True),
                 ('rev', rev, True),
@@ -2568,26 +2574,46 @@ 
 
     ui.pager('grep')
     fm = ui.formatter('grep', opts)
-    for ctx in cmdutil.walkchangerevs(repo, match, opts, prep):
-        rev = ctx.rev()
-        parent = ctx.p1().rev()
-        for fn in sorted(revfiles.get(rev, [])):
-            states = matches[rev][fn]
-            copy = copies.get(rev, {}).get(fn)
-            if fn in skip:
-                if copy:
-                    skip[copy] = True
+    # This if part handles the default situation,\
+    # when nothing is passed in -r or --all
+    if not bool(opts.get('rev')) and not bool(opts.get('all')):
+        rev = scmutil.revsingle(repo, opts.get('rev'), None).node()
+        m = scmutil.match(repo[rev], pats, opts, default='relglob')
+        m.bad = lambda x, y: False
+        ctx = repo[rev]
+        ds = ctx.repo().dirstate
+
+        for fn in ctx.matches(m):
+            if rev is None and ds[fn] == 'r':
                 continue
-            pstates = matches.get(parent, {}).get(copy or fn, [])
-            if pstates or states:
-                r = display(fm, fn, ctx, pstates, states)
+            data = ctx.filectx(fn).data()
+            states = []
+            for lnum, cstart, cend, line in matchlines(data):
+                states.append(linestate(line, lnum, cstart, cend))
+            if states:
+                r = display(fm, fn, ctx, [], states)
                 found = found or r
-                if r and not opts.get('all'):
-                    skip[fn] = True
+    else :
+        for ctx in cmdutil.walkchangerevs(repo, match, opts, prep):
+            rev = ctx.rev()
+            parent = ctx.p1().rev()
+            for fn in sorted(revfiles.get(rev, [])):
+                states = matches[rev][fn]
+                copy = copies.get(rev, {}).get(fn)
+                if fn in skip:
                     if copy:
                         skip[copy] = True
-        del matches[rev]
-        del revfiles[rev]
+                    continue
+                pstates = matches.get(parent, {}).get(copy or fn, [])
+                if pstates or states:
+                    r = display(fm, fn, ctx, pstates, states)
+                    found = found or r
+                    if r and not opts.get('all'):
+                        skip[fn] = True
+                        if copy:
+                            skip[copy] = True
+            del matches[rev]
+            del revfiles[rev]
     fm.end()
 
     return not found