Submitter | Gregory Szorc |
---|---|
Date | July 8, 2017, 11:29 p.m. |
Message ID | <94f98bc84936defadb95.1499556546@ubuntu-vm-main> |
Download | mbox | patch |
Permalink | /patch/22135/ |
State | Accepted |
Headers | show |
Comments
(For Durham) On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote: > # HG changeset patch > # User Gregory Szorc <gregory.szorc@gmail.com> > # Date 1499555309 25200 > # Sat Jul 08 16:08:29 2017 -0700 > # Node ID 94f98bc84936defadb959e31012555dba170d8cd > # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 > dirstate: integrate sparse matcher with _ignore (API) Why does sparse do it this way instead of intersecting the sparse matcher with the user's matcher?
On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: > (For Durham) > > On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote: >> # HG changeset patch >> # User Gregory Szorc <gregory.szorc@gmail.com> >> # Date 1499555309 25200 >> # Sat Jul 08 16:08:29 2017 -0700 >> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >> dirstate: integrate sparse matcher with _ignore (API) > > Why does sparse do it this way instead of intersecting the sparse > matcher with the user's matcher? I'm not sure I understand the question. What is the "user's matcher" here? The ignore matcher? This code produces a matcher that returns true for any file that should be ignored. Since both hgignore files and sparse-ignored files should be ignored, I'm not sure how that could be expressed with intersection of those two matchers?
On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: > On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: >> >> (For Durham) >> >> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc <gregory.szorc@gmail.com> >> wrote: >>> >>> # HG changeset patch >>> # User Gregory Szorc <gregory.szorc@gmail.com> >>> # Date 1499555309 25200 >>> # Sat Jul 08 16:08:29 2017 -0700 >>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >>> dirstate: integrate sparse matcher with _ignore (API) >> >> >> Why does sparse do it this way instead of intersecting the sparse >> matcher with the user's matcher? > > > I'm not sure I understand the question. What is the "user's matcher" here? > The ignore matcher? I mean the matcher the user may have provided on the command line (or match.always() by default), as in "hg status dir/" (where the matcher would be "relpath:dir"). > > This code produces a matcher that returns true for any file that should be > ignored. Since both hgignore files and sparse-ignored files should be > ignored, I'm not sure how that could be expressed with intersection of those > two matchers? For narrowhg, we did it the other way around: filtering in instead of filtering out. So if the narrowspec (like sparse config, IIUC) says to include foo/ and bar/ and the user provides 'glob:*c', we'd intersect that and list *.c files in those two directories (recursively).
On 7/10/17 11:55 AM, Martin von Zweigbergk wrote: > On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: >> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: >>> >>> (For Durham) >>> >>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc <gregory.szorc@gmail.com> >>> wrote: >>>> >>>> # HG changeset patch >>>> # User Gregory Szorc <gregory.szorc@gmail.com> >>>> # Date 1499555309 25200 >>>> # Sat Jul 08 16:08:29 2017 -0700 >>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >>>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >>>> dirstate: integrate sparse matcher with _ignore (API) >>> >>> >>> Why does sparse do it this way instead of intersecting the sparse >>> matcher with the user's matcher? >> >> >> I'm not sure I understand the question. What is the "user's matcher" here? >> The ignore matcher? > > I mean the matcher the user may have provided on the command line (or > match.always() by default), as in "hg status dir/" (where the matcher > would be "relpath:dir"). > >> >> This code produces a matcher that returns true for any file that should be >> ignored. Since both hgignore files and sparse-ignored files should be >> ignored, I'm not sure how that could be expressed with intersection of those >> two matchers? > > For narrowhg, we did it the other way around: filtering in instead of > filtering out. So if the narrowspec (like sparse config, IIUC) says to > include foo/ and bar/ and the user provides 'glob:*c', we'd intersect > that and list *.c files in those two directories (recursively). I'd have to look at the code to be specific, but I think the dirstate ignore logic covers more cases than the user provided matcher logic. I'd be surprised if all commands that hit dirstate.ignore also happened to take patterns at the command level. It just seemed cleaner to have a unified matcher for ignored files at the repo level. The user specific matcher can always be added on top of it later for commands that take patterns.
On Mon, Jul 10, 2017 at 11:58 AM, Durham Goode <durham@fb.com> wrote: > > > On 7/10/17 11:55 AM, Martin von Zweigbergk wrote: >> >> On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: >>> >>> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: >>>> >>>> >>>> (For Durham) >>>> >>>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc <gregory.szorc@gmail.com> >>>> wrote: >>>>> >>>>> >>>>> # HG changeset patch >>>>> # User Gregory Szorc <gregory.szorc@gmail.com> >>>>> # Date 1499555309 25200 >>>>> # Sat Jul 08 16:08:29 2017 -0700 >>>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >>>>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >>>>> dirstate: integrate sparse matcher with _ignore (API) >>>> >>>> >>>> >>>> Why does sparse do it this way instead of intersecting the sparse >>>> matcher with the user's matcher? >>> >>> >>> >>> I'm not sure I understand the question. What is the "user's matcher" >>> here? >>> The ignore matcher? >> >> >> I mean the matcher the user may have provided on the command line (or >> match.always() by default), as in "hg status dir/" (where the matcher >> would be "relpath:dir"). >> >>> >>> This code produces a matcher that returns true for any file that should >>> be >>> ignored. Since both hgignore files and sparse-ignored files should be >>> ignored, I'm not sure how that could be expressed with intersection of >>> those >>> two matchers? >> >> >> For narrowhg, we did it the other way around: filtering in instead of >> filtering out. So if the narrowspec (like sparse config, IIUC) says to >> include foo/ and bar/ and the user provides 'glob:*c', we'd intersect >> that and list *.c files in those two directories (recursively). > > > I'd have to look at the code to be specific, but I think the dirstate ignore > logic covers more cases than the user provided matcher logic. I'd be > surprised if all commands that hit dirstate.ignore also happened to take > patterns at the command level. If they don't, then the sparse matcher can be passed as is. > It just seemed cleaner to have a unified > matcher for ignored files at the repo level. The user specific matcher can > always be added on top of it later for commands that take patterns. For narrow, we have to apply the matcher when walking the manifest too. The user can pass a matcher to e.g. "hg status -c ." or "hg files -r ." and in those cases we need to intersect the narrow matcher with the user-supplied one. It seemed more natural to do the same for dirstate walks. It also seems simpler to control which directories are visited if using a positive matcher than a negative one. For example, let's say the narrow matcher is path:dir/. The narrowhg code will then restrict the walk to visit only the root directory, dir/, and subdirectories of dir/ (both for manifest walks and dirstate walks). I think we can simply make negatematcher's visitdir return False iff the narrow/sparse matcher returns 'all', so it's probably easy to get it to work. It still seems more natural to me to match what should be included.
On Mon, Jul 10, 2017 at 1:04 PM, Martin von Zweigbergk < martinvonz@google.com> wrote: > On Mon, Jul 10, 2017 at 11:58 AM, Durham Goode <durham@fb.com> wrote: > > > > > > On 7/10/17 11:55 AM, Martin von Zweigbergk wrote: > >> > >> On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: > >>> > >>> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: > >>>> > >>>> > >>>> (For Durham) > >>>> > >>>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc < > gregory.szorc@gmail.com> > >>>> wrote: > >>>>> > >>>>> > >>>>> # HG changeset patch > >>>>> # User Gregory Szorc <gregory.szorc@gmail.com> > >>>>> # Date 1499555309 25200 > >>>>> # Sat Jul 08 16:08:29 2017 -0700 > >>>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd > >>>>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 > >>>>> dirstate: integrate sparse matcher with _ignore (API) > >>>> > >>>> > >>>> > >>>> Why does sparse do it this way instead of intersecting the sparse > >>>> matcher with the user's matcher? > >>> > >>> > >>> > >>> I'm not sure I understand the question. What is the "user's matcher" > >>> here? > >>> The ignore matcher? > >> > >> > >> I mean the matcher the user may have provided on the command line (or > >> match.always() by default), as in "hg status dir/" (where the matcher > >> would be "relpath:dir"). > >> > >>> > >>> This code produces a matcher that returns true for any file that should > >>> be > >>> ignored. Since both hgignore files and sparse-ignored files should be > >>> ignored, I'm not sure how that could be expressed with intersection of > >>> those > >>> two matchers? > >> > >> > >> For narrowhg, we did it the other way around: filtering in instead of > >> filtering out. So if the narrowspec (like sparse config, IIUC) says to > >> include foo/ and bar/ and the user provides 'glob:*c', we'd intersect > >> that and list *.c files in those two directories (recursively). > > > > > > I'd have to look at the code to be specific, but I think the dirstate > ignore > > logic covers more cases than the user provided matcher logic. I'd be > > surprised if all commands that hit dirstate.ignore also happened to take > > patterns at the command level. > > If they don't, then the sparse matcher can be passed as is. > > > It just seemed cleaner to have a unified > > matcher for ignored files at the repo level. The user specific matcher > can > > always be added on top of it later for commands that take patterns. > > For narrow, we have to apply the matcher when walking the manifest > too. The user can pass a matcher to e.g. "hg status -c ." or "hg files > -r ." and in those cases we need to intersect the narrow matcher with > the user-supplied one. It seemed more natural to do the same for > dirstate walks. > > It also seems simpler to control which directories are visited if > using a positive matcher than a negative one. For example, let's say > the narrow matcher is path:dir/. The narrowhg code will then restrict > the walk to visit only the root directory, dir/, and subdirectories of > dir/ (both for manifest walks and dirstate walks). I think we can > simply make negatematcher's visitdir return False iff the > narrow/sparse matcher returns 'all', so it's probably easy to get it > to work. It still seems more natural to me to match what should be > included. > I think there's tons of potential to improve how the filtering/matching is performed. At the very least, we'll likely attach a matcher to the repo instance instead of dynamically obtaining one for every call site. I think a lot of the design of the current sparse code is heavily influenced by a) state of the code 3+ years ago b) what's easiest for an extension to do. Now that things are in core, we can and should think about better ways to implement everything.
On 7/10/17 1:04 PM, Martin von Zweigbergk wrote: > On Mon, Jul 10, 2017 at 11:58 AM, Durham Goode <durham@fb.com> wrote: >> >> >> On 7/10/17 11:55 AM, Martin von Zweigbergk wrote: >>> >>> On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: >>>> >>>> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: >>>>> >>>>> >>>>> (For Durham) >>>>> >>>>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc <gregory.szorc@gmail.com> >>>>> wrote: >>>>>> >>>>>> >>>>>> # HG changeset patch >>>>>> # User Gregory Szorc <gregory.szorc@gmail.com> >>>>>> # Date 1499555309 25200 >>>>>> # Sat Jul 08 16:08:29 2017 -0700 >>>>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >>>>>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >>>>>> dirstate: integrate sparse matcher with _ignore (API) >>>>> >>>>> >>>>> >>>>> Why does sparse do it this way instead of intersecting the sparse >>>>> matcher with the user's matcher? >>>> >>>> >>>> >>>> I'm not sure I understand the question. What is the "user's matcher" >>>> here? >>>> The ignore matcher? >>> >>> >>> I mean the matcher the user may have provided on the command line (or >>> match.always() by default), as in "hg status dir/" (where the matcher >>> would be "relpath:dir"). >>> >>>> >>>> This code produces a matcher that returns true for any file that should >>>> be >>>> ignored. Since both hgignore files and sparse-ignored files should be >>>> ignored, I'm not sure how that could be expressed with intersection of >>>> those >>>> two matchers? >>> >>> >>> For narrowhg, we did it the other way around: filtering in instead of >>> filtering out. So if the narrowspec (like sparse config, IIUC) says to >>> include foo/ and bar/ and the user provides 'glob:*c', we'd intersect >>> that and list *.c files in those two directories (recursively). >> >> >> I'd have to look at the code to be specific, but I think the dirstate ignore >> logic covers more cases than the user provided matcher logic. I'd be >> surprised if all commands that hit dirstate.ignore also happened to take >> patterns at the command level. > > If they don't, then the sparse matcher can be passed as is. > >> It just seemed cleaner to have a unified >> matcher for ignored files at the repo level. The user specific matcher can >> always be added on top of it later for commands that take patterns. > > For narrow, we have to apply the matcher when walking the manifest > too. The user can pass a matcher to e.g. "hg status -c ." or "hg files > -r ." and in those cases we need to intersect the narrow matcher with > the user-supplied one. It seemed more natural to do the same for > dirstate walks. > > It also seems simpler to control which directories are visited if > using a positive matcher than a negative one. For example, let's say > the narrow matcher is path:dir/. The narrowhg code will then restrict > the walk to visit only the root directory, dir/, and subdirectories of > dir/ (both for manifest walks and dirstate walks). I think we can > simply make negatematcher's visitdir return False iff the > narrow/sparse matcher returns 'all', so it's probably easy to get it > to work. It still seems more natural to me to match what should be > included. > I don't have a strong opinion either way. When I made sparse, it was specific to the working copy, so it mapped to the ignore matcher very tightly. If that needs to change, that's fine. I just want to avoid duplicating repetitive matcher logic amongst individual commands.
On Mon, Jul 10, 2017 at 2:48 PM, Durham Goode <durham@fb.com> wrote: > > > On 7/10/17 1:04 PM, Martin von Zweigbergk wrote: >> >> On Mon, Jul 10, 2017 at 11:58 AM, Durham Goode <durham@fb.com> wrote: >>> >>> >>> >>> On 7/10/17 11:55 AM, Martin von Zweigbergk wrote: >>>> >>>> >>>> On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: >>>>> >>>>> >>>>> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: >>>>>> >>>>>> >>>>>> >>>>>> (For Durham) >>>>>> >>>>>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc >>>>>> <gregory.szorc@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> # HG changeset patch >>>>>>> # User Gregory Szorc <gregory.szorc@gmail.com> >>>>>>> # Date 1499555309 25200 >>>>>>> # Sat Jul 08 16:08:29 2017 -0700 >>>>>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >>>>>>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >>>>>>> dirstate: integrate sparse matcher with _ignore (API) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Why does sparse do it this way instead of intersecting the sparse >>>>>> matcher with the user's matcher? >>>>> >>>>> >>>>> >>>>> >>>>> I'm not sure I understand the question. What is the "user's matcher" >>>>> here? >>>>> The ignore matcher? >>>> >>>> >>>> >>>> I mean the matcher the user may have provided on the command line (or >>>> match.always() by default), as in "hg status dir/" (where the matcher >>>> would be "relpath:dir"). >>>> >>>>> >>>>> This code produces a matcher that returns true for any file that should >>>>> be >>>>> ignored. Since both hgignore files and sparse-ignored files should be >>>>> ignored, I'm not sure how that could be expressed with intersection of >>>>> those >>>>> two matchers? >>>> >>>> >>>> >>>> For narrowhg, we did it the other way around: filtering in instead of >>>> filtering out. So if the narrowspec (like sparse config, IIUC) says to >>>> include foo/ and bar/ and the user provides 'glob:*c', we'd intersect >>>> that and list *.c files in those two directories (recursively). >>> >>> >>> >>> I'd have to look at the code to be specific, but I think the dirstate >>> ignore >>> logic covers more cases than the user provided matcher logic. I'd be >>> surprised if all commands that hit dirstate.ignore also happened to take >>> patterns at the command level. >> >> >> If they don't, then the sparse matcher can be passed as is. >> >>> It just seemed cleaner to have a unified >>> matcher for ignored files at the repo level. The user specific matcher >>> can >>> always be added on top of it later for commands that take patterns. >> >> >> For narrow, we have to apply the matcher when walking the manifest >> too. The user can pass a matcher to e.g. "hg status -c ." or "hg files >> -r ." and in those cases we need to intersect the narrow matcher with >> the user-supplied one. It seemed more natural to do the same for >> dirstate walks. >> >> It also seems simpler to control which directories are visited if >> using a positive matcher than a negative one. For example, let's say >> the narrow matcher is path:dir/. The narrowhg code will then restrict >> the walk to visit only the root directory, dir/, and subdirectories of >> dir/ (both for manifest walks and dirstate walks). I think we can >> simply make negatematcher's visitdir return False iff the >> narrow/sparse matcher returns 'all', so it's probably easy to get it >> to work. It still seems more natural to me to match what should be >> included. >> > > I don't have a strong opinion either way. When I made sparse, it was > specific to the working copy, so it mapped to the ignore matcher very > tightly. If that needs to change, that's fine. I just tried this. I think the result is cleaner, but there's a functional change that perhaps suggests the reason you modified the ignores instead of modifying the walk: untracked files outside the sparse config are reported as ignored with your version and are not reported at all with my version. That's an interesting aspect that I had not thought about. I had previously considered adding an option to status for narrowhg to make it also list files outside the narrow config. One option would be to not include them by default and list their status as something special (maybe O for outside?) since neither untracked nor ignored is really accurate. They may (kind of) be tracked, just not in the working copy, and with narrowhg with treemanifests we can't even tell if they're in the manifest or not. So, would you be okay with changing the behavior to not report these files as ignored by default and instead adding an option to status (maybe --no-sparse)? For narrow, even committed revisions will be filtered by the narrowspec by default. One can imagine passing --no-sparse (or --no-narrow?) for committed revisions as well. We could then potentially report things as outside the narrowspec and added, removed, modified, or clean, so there would be those three statuses with an extra bit saying that it's also outside the narrowspec (since they're all letters, we could use lowercase 'a', 'r', 'm', 'c', but I'm not sure that's a good idea). We can and should the discuss the specifics later. (When running hg on top of our overlay file system at Google, we definitely do not want to walk the entire working copy (because it has millions of files outside the sparse/narrow config). But that's not a normal hg setup, so it would be our problem to set up an override for that if we decided to report these files as ignored in hg core.) > > I just want to avoid duplicating repetitive matcher logic amongst individual > commands.
On 7/11/17 10:16 AM, Martin von Zweigbergk wrote: > On Mon, Jul 10, 2017 at 2:48 PM, Durham Goode <durham@fb.com> wrote: >> >> >> On 7/10/17 1:04 PM, Martin von Zweigbergk wrote: >>> >>> On Mon, Jul 10, 2017 at 11:58 AM, Durham Goode <durham@fb.com> wrote: >>>> >>>> >>>> >>>> On 7/10/17 11:55 AM, Martin von Zweigbergk wrote: >>>>> >>>>> >>>>> On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: >>>>>> >>>>>> >>>>>> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> (For Durham) >>>>>>> >>>>>>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc >>>>>>> <gregory.szorc@gmail.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> # HG changeset patch >>>>>>>> # User Gregory Szorc <gregory.szorc@gmail.com> >>>>>>>> # Date 1499555309 25200 >>>>>>>> # Sat Jul 08 16:08:29 2017 -0700 >>>>>>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >>>>>>>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >>>>>>>> dirstate: integrate sparse matcher with _ignore (API) >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Why does sparse do it this way instead of intersecting the sparse >>>>>>> matcher with the user's matcher? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I'm not sure I understand the question. What is the "user's matcher" >>>>>> here? >>>>>> The ignore matcher? >>>>> >>>>> >>>>> >>>>> I mean the matcher the user may have provided on the command line (or >>>>> match.always() by default), as in "hg status dir/" (where the matcher >>>>> would be "relpath:dir"). >>>>> >>>>>> >>>>>> This code produces a matcher that returns true for any file that should >>>>>> be >>>>>> ignored. Since both hgignore files and sparse-ignored files should be >>>>>> ignored, I'm not sure how that could be expressed with intersection of >>>>>> those >>>>>> two matchers? >>>>> >>>>> >>>>> >>>>> For narrowhg, we did it the other way around: filtering in instead of >>>>> filtering out. So if the narrowspec (like sparse config, IIUC) says to >>>>> include foo/ and bar/ and the user provides 'glob:*c', we'd intersect >>>>> that and list *.c files in those two directories (recursively). >>>> >>>> >>>> >>>> I'd have to look at the code to be specific, but I think the dirstate >>>> ignore >>>> logic covers more cases than the user provided matcher logic. I'd be >>>> surprised if all commands that hit dirstate.ignore also happened to take >>>> patterns at the command level. >>> >>> >>> If they don't, then the sparse matcher can be passed as is. >>> >>>> It just seemed cleaner to have a unified >>>> matcher for ignored files at the repo level. The user specific matcher >>>> can >>>> always be added on top of it later for commands that take patterns. >>> >>> >>> For narrow, we have to apply the matcher when walking the manifest >>> too. The user can pass a matcher to e.g. "hg status -c ." or "hg files >>> -r ." and in those cases we need to intersect the narrow matcher with >>> the user-supplied one. It seemed more natural to do the same for >>> dirstate walks. >>> >>> It also seems simpler to control which directories are visited if >>> using a positive matcher than a negative one. For example, let's say >>> the narrow matcher is path:dir/. The narrowhg code will then restrict >>> the walk to visit only the root directory, dir/, and subdirectories of >>> dir/ (both for manifest walks and dirstate walks). I think we can >>> simply make negatematcher's visitdir return False iff the >>> narrow/sparse matcher returns 'all', so it's probably easy to get it >>> to work. It still seems more natural to me to match what should be >>> included. >>> >> >> I don't have a strong opinion either way. When I made sparse, it was >> specific to the working copy, so it mapped to the ignore matcher very >> tightly. If that needs to change, that's fine. > > I just tried this. I think the result is cleaner, but there's a > functional change that perhaps suggests the reason you modified the > ignores instead of modifying the walk: untracked files outside the > sparse config are reported as ignored with your version and are not > reported at all with my version. That's an interesting aspect that I > had not thought about. > > I had previously considered adding an option to status for narrowhg to > make it also list files outside the narrow config. One option would be > to not include them by default and list their status as something > special (maybe O for outside?) since neither untracked nor ignored is > really accurate. They may (kind of) be tracked, just not in the > working copy, and with narrowhg with treemanifests we can't even tell > if they're in the manifest or not. > > So, would you be okay with changing the behavior to not report these > files as ignored by default and instead adding an option to status > (maybe --no-sparse)? If that's the only effect, I'm fine with that change. The only time we care about the files outside the sparse checkout is when widening the sparse checkout (it checks if files already exist on disk and prevents the sparse-widening from overwriting them if the content differs), but that might be done via other means.
On Tue, Jul 11, 2017 at 10:23 AM, Durham Goode <durham@fb.com> wrote: > > > On 7/11/17 10:16 AM, Martin von Zweigbergk wrote: >> >> On Mon, Jul 10, 2017 at 2:48 PM, Durham Goode <durham@fb.com> wrote: >>> >>> >>> >>> On 7/10/17 1:04 PM, Martin von Zweigbergk wrote: >>>> >>>> >>>> On Mon, Jul 10, 2017 at 11:58 AM, Durham Goode <durham@fb.com> wrote: >>>>> >>>>> >>>>> >>>>> >>>>> On 7/10/17 11:55 AM, Martin von Zweigbergk wrote: >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Jul 10, 2017 at 11:45 AM, Durham Goode <durham@fb.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/10/17 10:01 AM, Martin von Zweigbergk wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> (For Durham) >>>>>>>> >>>>>>>> On Sat, Jul 8, 2017 at 4:29 PM, Gregory Szorc >>>>>>>> <gregory.szorc@gmail.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> # HG changeset patch >>>>>>>>> # User Gregory Szorc <gregory.szorc@gmail.com> >>>>>>>>> # Date 1499555309 25200 >>>>>>>>> # Sat Jul 08 16:08:29 2017 -0700 >>>>>>>>> # Node ID 94f98bc84936defadb959e31012555dba170d8cd >>>>>>>>> # Parent a2867557f9c2314aeea19a946dfb8e167def4fb8 >>>>>>>>> dirstate: integrate sparse matcher with _ignore (API) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Why does sparse do it this way instead of intersecting the sparse >>>>>>>> matcher with the user's matcher? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> I'm not sure I understand the question. What is the "user's matcher" >>>>>>> here? >>>>>>> The ignore matcher? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I mean the matcher the user may have provided on the command line (or >>>>>> match.always() by default), as in "hg status dir/" (where the matcher >>>>>> would be "relpath:dir"). >>>>>> >>>>>>> >>>>>>> This code produces a matcher that returns true for any file that >>>>>>> should >>>>>>> be >>>>>>> ignored. Since both hgignore files and sparse-ignored files should >>>>>>> be >>>>>>> ignored, I'm not sure how that could be expressed with intersection >>>>>>> of >>>>>>> those >>>>>>> two matchers? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> For narrowhg, we did it the other way around: filtering in instead of >>>>>> filtering out. So if the narrowspec (like sparse config, IIUC) says to >>>>>> include foo/ and bar/ and the user provides 'glob:*c', we'd intersect >>>>>> that and list *.c files in those two directories (recursively). >>>>> >>>>> >>>>> >>>>> >>>>> I'd have to look at the code to be specific, but I think the dirstate >>>>> ignore >>>>> logic covers more cases than the user provided matcher logic. I'd be >>>>> surprised if all commands that hit dirstate.ignore also happened to >>>>> take >>>>> patterns at the command level. >>>> >>>> >>>> >>>> If they don't, then the sparse matcher can be passed as is. >>>> >>>>> It just seemed cleaner to have a unified >>>>> matcher for ignored files at the repo level. The user specific matcher >>>>> can >>>>> always be added on top of it later for commands that take patterns. >>>> >>>> >>>> >>>> For narrow, we have to apply the matcher when walking the manifest >>>> too. The user can pass a matcher to e.g. "hg status -c ." or "hg files >>>> -r ." and in those cases we need to intersect the narrow matcher with >>>> the user-supplied one. It seemed more natural to do the same for >>>> dirstate walks. >>>> >>>> It also seems simpler to control which directories are visited if >>>> using a positive matcher than a negative one. For example, let's say >>>> the narrow matcher is path:dir/. The narrowhg code will then restrict >>>> the walk to visit only the root directory, dir/, and subdirectories of >>>> dir/ (both for manifest walks and dirstate walks). I think we can >>>> simply make negatematcher's visitdir return False iff the >>>> narrow/sparse matcher returns 'all', so it's probably easy to get it >>>> to work. It still seems more natural to me to match what should be >>>> included. >>>> >>> >>> I don't have a strong opinion either way. When I made sparse, it was >>> specific to the working copy, so it mapped to the ignore matcher very >>> tightly. If that needs to change, that's fine. >> >> >> I just tried this. I think the result is cleaner, but there's a >> functional change that perhaps suggests the reason you modified the >> ignores instead of modifying the walk: untracked files outside the >> sparse config are reported as ignored with your version and are not >> reported at all with my version. That's an interesting aspect that I >> had not thought about. >> >> I had previously considered adding an option to status for narrowhg to >> make it also list files outside the narrow config. One option would be >> to not include them by default and list their status as something >> special (maybe O for outside?) since neither untracked nor ignored is >> really accurate. They may (kind of) be tracked, just not in the >> working copy, and with narrowhg with treemanifests we can't even tell >> if they're in the manifest or not. >> >> So, would you be okay with changing the behavior to not report these >> files as ignored by default and instead adding an option to status >> (maybe --no-sparse)? > > > If that's the only effect, I'm fine with that change. The only time we care > about the files outside the sparse checkout is when widening the sparse > checkout (it checks if files already exist on disk and prevents the > sparse-widening from overwriting them if the content differs), but that > might be done via other means. Great. I'll send patches (first a few patches to fix the matchers that are currently broken).
Patch
diff --git a/hgext/sparse.py b/hgext/sparse.py --- a/hgext/sparse.py +++ b/hgext/sparse.py @@ -78,11 +78,9 @@ from mercurial.i18n import _ from mercurial import ( cmdutil, commands, - dirstate, error, extensions, hg, - match as matchmod, registrar, sparse, util, @@ -103,23 +101,6 @@ def extsetup(ui): _setupclone(ui) _setuplog(ui) _setupadd(ui) - _setupdirstate(ui) - -def replacefilecache(cls, propname, replacement): - """Replace a filecache property with a new class. This allows changing the - cache invalidation condition.""" - origcls = cls - assert callable(replacement) - while cls is not object: - if propname in cls.__dict__: - orig = cls.__dict__[propname] - setattr(cls, propname, replacement(orig)) - break - cls = cls.__bases__[0] - - if cls is object: - raise AttributeError(_("type '%s' has no property '%s'") % (origcls, - propname)) def _setuplog(ui): entry = commands.table['^log|history'] @@ -187,42 +168,6 @@ def _setupadd(ui): extensions.wrapcommand(commands.table, 'add', _add) -def _setupdirstate(ui): - """Modify the dirstate to prevent stat'ing excluded files, - and to prevent modifications to files outside the checkout. - """ - - # The atrocity below is needed to wrap dirstate._ignore. It is a cached - # property, which means normal function wrapping doesn't work. - class ignorewrapper(object): - def __init__(self, orig): - self.orig = orig - self.origignore = None - self.func = None - self.sparsematch = None - - def __get__(self, obj, type=None): - origignore = self.orig.__get__(obj) - - sparsematch = obj._sparsematcher - if sparsematch.always(): - return origignore - - if self.sparsematch != sparsematch or self.origignore != origignore: - self.func = matchmod.unionmatcher([ - origignore, matchmod.negatematcher(sparsematch)]) - self.sparsematch = sparsematch - self.origignore = origignore - return self.func - - def __set__(self, obj, value): - return self.orig.__set__(obj, value) - - def __delete__(self, obj): - return self.orig.__delete__(obj) - - replacefilecache(dirstate.dirstate, '_ignore', ignorewrapper) - @command('^debugsparse', [ ('I', 'include', False, _('include files in the sparse checkout')), ('X', 'exclude', False, _('exclude files in the sparse checkout')), diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py --- a/mercurial/dirstate.py +++ b/mercurial/dirstate.py @@ -248,7 +248,7 @@ class dirstate(object): return self._dirs @rootcache('.hgignore') - def _ignore(self): + def _hgignorematch(self): files = self._ignorefiles() if not files: return matchmod.never(self._root, '') @@ -256,6 +256,15 @@ class dirstate(object): pats = ['include:%s' % f for f in files] return matchmod.match(self._root, '', [], pats, warn=self._ui.warn) + @property + def _ignore(self): + sparsematch = self._sparsematcher + if sparsematch.always(): + return self._hgignorematch + + return matchmod.unionmatcher([self._hgignorematch, + matchmod.negatematcher(sparsematch)]) + @propertycache def _slash(self): return self._ui.configbool('ui', 'slash') and pycompat.ossep != '/' @@ -516,7 +525,7 @@ class dirstate(object): for a in ("_map", "_copymap", "_identity", "_filefoldmap", "_dirfoldmap", "_branch", - "_pl", "_dirs", "_ignore", "_nonnormalset", + "_pl", "_dirs", "_hgignorematcher", "_nonnormalset", "_otherparentset"): if a in self.__dict__: delattr(self, a)