Patchwork [1,of,4] schemas: add schemas to repositories

Submitter Durham Goode
Date Aug. 7, 2013, 12:28 a.m.
Message ID <f6fca02b697a67bec6e6.1375835306@dev350.prn1.facebook.com>
Permalink /patch/2011/
State Changes Requested

Comments

Durham Goode - Aug. 7, 2013, 12:28 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1375827328 25200
#      Tue Aug 06 15:15:28 2013 -0700
# Node ID f6fca02b697a67bec6e6ca2c9a48d7c7d7f1d077
# Parent  3e34e7b223d10bbe8814f82d7a1f53575fe09096
schemas: add schemas to repositories

This adds the concept of schemas to repositories. A schema is a key/value pair
indicating alternative means of accessing the remote repository. For example,
the largefiles extension might provide a schema such as:

largefiles http://some.cdn.com/myrepo

When a user interacts with the repository, if the client knows how to handle
the largefiles schema, it will read the url and obtain largefiles from that
url, thus letting you serve large files from a CDN.

The key difference between this and normal hgrc config options is that it is
copied across to the client during clone/pull. So if a client clones from
another client, they will also see the available schemas from the original
server. The logic for clone/pull will happen in a later patch.

This change enables an extension we're working on that will keep all of the
file history on the server, and clients will all have shallow repos.
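
As a rough illustration of the propagation described above (not the code
from this series, since the clone/pull logic is deferred to a later patch),
the following Python 3 sketch models each repository's schemas as a plain
dict. The Repo class, clone_from, and usable_schemas are hypothetical names
introduced only for this example.

```python
# Hypothetical model: each repo holds a dict of schema name -> URL.
class Repo:
    def __init__(self, schemas=None):
        self.schemas = dict(schemas or {})

    def clone_from(self, source):
        # Copy the source's schemas so they survive further cloning.
        self.schemas.update(source.schemas)


def usable_schemas(repo, known):
    # A client only acts on the schemas it knows how to handle.
    return {k: v for k, v in repo.schemas.items() if k in known}


server = Repo({"largefiles": "http://some.cdn.com/myrepo"})
first = Repo()
first.clone_from(server)
second = Repo()
second.clone_from(first)  # cloned from a client, not from the server

print(usable_schemas(second, known={"largefiles"}))
print(usable_schemas(second, known=set()))
```

The second clone never talks to the server, yet it still ends up with the
server's largefiles entry. A client that does not recognise a schema simply
ignores it.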
Angel Ezquerra - Aug. 7, 2013, 6:34 a.m.
Durham,

have you had a look at the projrc extension? You can find it at
http://mercurial.selenic.com/wiki/ProjrcExtension

It seems to cover the use case you describe (if I understood what your
extension does correctly) but it also seems to cover other important
use cases.

Basically, it lets you define a ".projrc" file which contains an hgrc
configuration that is transmitted to the client on clone and pull. The
client can configure which subset of the .projrc configuration it is
willing to accept, on a server-by-server basis, which is meant to make
the extension as safe as such an extension can be (for example, you can
choose to never accept extension or hook configurations).

This extension was developed by Martin Geisler for one of his clients
(Lantiq) and I have been maintaining it. It is bundled with
TortoiseHg.

I wonder if perhaps bundling the projrc extension with mercurial would
not cover your needs?

Now, regarding your patch, if you still want to go ahead with your
"schemas" file, perhaps it would be best if it had a more .hgrc-like
syntax (i.e. "key= value", rather than "key value")?
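
To make the suggestion concrete, here is a minimal Python 3 sketch of how a
"key = value" schemas file might be parsed. It is only an illustration, not
part of the patch: parse_schemas is a hypothetical helper, and skipping "#"
comment lines is an assumption borrowed from hgrc conventions.

```python
def parse_schemas(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    schemas = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so URLs containing '=' stay intact.
        key, sep, value = line.partition("=")
        if sep and key.strip() and value.strip():
            schemas[key.strip()] = value.strip()
    return schemas


print(parse_schemas("largefiles = http://some.cdn.com/myrepo\n"))
```

Splitting on the first "=" keeps query strings in the URL untouched, which
the space-separated form gets for free by splitting on the first space.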

Cheers,

Angel
Durham Goode - Aug. 7, 2013, 10:40 p.m.
I wasn't aware of the projrc extension, but I did discuss that concept
internally and with Matt. I think the problem is that it just isn't secure
enough. If one of our users is dumb and sets up "servers = *, include =
*", they could execute arbitrary code from Bitbucket within our network.

Durham
Angel Ezquerra - Aug. 7, 2013, 11:23 p.m.
I understand your concern, but the extension does not blindly accept new
configurations. In fact, I think it is actually pretty safe, and it has been
developed to be as safe as possible. In particular, by default the extension
requires confirmation whenever the projrc file changes. Currently it does
not show which settings changed, but it could do so. It could also require
explicit confirmation before accepting changes to the "dangerous" sections.

Additionally the extension could be tweaked to be even safer. For example
the extension could be changed so that it would only accept projrc settings
from your internal servers. Another option would be to make it necessary to
explicitly include dangerous settings such as hooks and extensions (i.e.
include = * would only include safe settings).
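
A small Python 3 sketch of that last idea, purely for illustration: this is
not how projrc is implemented, and accepted_sections, the DANGEROUS set, and
the data shapes are all hypothetical. The point is only that a wildcard
would match safe sections, while dangerous ones must be named explicitly.

```python
# Assumed list of dangerous sections, taken from the examples in this message.
DANGEROUS = frozenset({"hooks", "extensions"})


def accepted_sections(incoming, include):
    """Return the sections of `incoming` that the client agrees to accept.

    `incoming` maps section name -> dict of settings.
    `include` is a list of section names; '*' matches only safe sections.
    """
    wildcard = "*" in include
    explicit = set(include) - {"*"}
    result = {}
    for section, settings in incoming.items():
        if section in explicit or (wildcard and section not in DANGEROUS):
            result[section] = settings
    return result


incoming = {
    "ui": {"username": "example"},
    "hooks": {"update": "some-command"},
}
print(accepted_sections(incoming, ["*"]))           # wildcard alone: 'ui' only
print(accepted_sections(incoming, ["*", "hooks"]))  # 'hooks' named explicitly
```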

On the other hand, I feel that your proposed schemas functionality is a bit
too narrow to be included as part of Mercurial core. Distributing a shared
Mercurial config is a common problem. IMHO it would be nice if Mercurial
offered a generic solution to that problem. I don't think your proposal is
such a solution.

Angel
Durham Goode - Aug. 7, 2013, 11:43 p.m.
It's totally possible the schema stuff isn't appropriate for upstream
core. I'm open to that response from the community. Since we only need a
small subset of the projrc functionality, and projrc requires particularly
careful consideration of how we can keep it secure, I'd rather implement
the 30 lines I need and keep the security model super simple. I am a
security dummy and have zero faith in my being able to design a secure
generic solution to this problem.
Angel Ezquerra - Aug. 8, 2013, 6:36 a.m.
On Thu, Aug 8, 2013 at 1:43 AM, Durham Goode <durham@fb.com> wrote:
> It's totally possible the schema stuff isn't appropriate for upstream
> core. I'm open to that response from the community.

Maybe I was too categorical. What I meant to say is that your proposal
seems a bit too narrow in scope. It would be nice if mercurial had a
built-in, secure way to distribute (default) settings.

That being said, to a certain extent this is already the case for many
users, since TortoiseHg bundles the projrc extension.

Cheers,

Angel
Augie Fackler - Aug. 9, 2013, 3:04 p.m.
On Thu, Aug 08, 2013 at 08:36:29AM +0200, Angel Ezquerra wrote:
> Maybe I was too categorical. What I meant to say is that your proposal
> seems a bit too narrow in scope. It would be nice if mercurial had a
> built-in, secure way to distribute (default) settings.

The difficulty here is that many config knobs can lead to arbitrary
code execution (e.g. [alias] stuff).

>
> That being said to a certain extent this is already the case for many
> users, since TortoiseHg bundles the projrc extension.

It's off by default, right?

Angel Ezquerra - Aug. 9, 2013, 4:56 p.m.
On Aug 9, 2013 5:04 PM, "Augie Fackler" <raf@durin42.com> wrote:
> It's off by default, right?
>

As with Mercurial itself, every extension we ship is off by default.

Cheers,

Angel

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -661,6 +661,47 @@ 
             bt[bn] = self._branchtip(heads)
         return bt
 
+    @repofilecache('schemas')
+    def schemas(self):
+        try:
+            file = self.opener('schemas')
+        except IOError, inst:
+            if inst.errno != errno.ENOENT:
+                raise
+            return {}
+
+        try:
+            schemas = {}
+            lines = file.readlines()
+            for line in lines:
+                if line.find(' ') > 0:
+                    k, v = line.split(' ', 1)
+                    schemas[k] = v[:-1]
+
+            return schemas
+        finally:
+            file.close()
+
+    def addschemas(self, schemas):
+        existingschemas = self.schemas
+        existingschemas.update(schemas)
+        try:
+            file = self.opener('schemas', 'w')
+        except IOError, inst:
+            if inst.errno != errno.ENOENT:
+                raise
+            return
+
+        try:
+            serialized = ""
+            for k, v in existingschemas.iteritems():
+                if len(k) > 0 and len(v) > 0:
+                    serialized += "%s %s\n" % (k, v)
+
+            file.write(serialized)
+        finally:
+            file.close()
+
     def lookup(self, key):
         return self[key].node()
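
For readers who want to experiment with the file format outside of
Mercurial, the following standalone Python 3 sketch mirrors the parse and
serialize logic of the patch above on plain strings. It is an approximation
under stated assumptions, not the patch itself: parse_schemas and
serialize_schemas are hypothetical names, splitlines() is used in place of
the patch's v[:-1] slice (which would also drop the last character of a
final line that has no trailing newline), and keys are sorted only to make
the output deterministic.

```python
def parse_schemas(text):
    """Parse 'key value' lines, one schema per line, into a dict."""
    schemas = {}
    for line in text.splitlines():
        # Mirror the patch: require a non-empty key followed by a space.
        if line.find(" ") > 0:
            key, value = line.split(" ", 1)
            schemas[key] = value
    return schemas


def serialize_schemas(schemas):
    """Write one 'key value' line per entry, skipping empty keys or values."""
    return "".join(
        "%s %s\n" % (k, v) for k, v in sorted(schemas.items()) if k and v
    )


text = "largefiles http://some.cdn.com/myrepo\n"
parsed = parse_schemas(text)
print(parsed)
print(serialize_schemas(parsed) == text)
```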