Patchwork [RFC,v8] scmutil: add a simple key-value file helper

login
register
mail settings
Submitter Kostia Balytskyi
Date Dec. 14, 2016, 5:06 p.m.
Message ID <0973a68a3dcdfb7154e8.1481735187@dev1902.lla1.facebook.com>
Download mbox | patch
Permalink /patch/17910/
State Changes Requested
Headers show

Comments

Kostia Balytskyi - Dec. 14, 2016, 5:06 p.m.
# HG changeset patch
# User Kostia Balytskyi <ikostia@fb.com>
# Date 1481734993 28800
#      Wed Dec 14 09:03:13 2016 -0800
# Node ID 0973a68a3dcdfb7154e8ab27b7bb7028e97dd516
# Parent  95b076f5ddee5ab7bd208b8c2f42121ce9907129
scmutil: add a simple key-value file helper

The purpose of the added class is to serve purposes like save files of shelve
or state files of shelve, rebase and histedit. Keys of these files can be
alphanumeric and start with letters, while values must not contain newlines.
Keys which start with an uppercase letter are required, while other keys
are optional.

In light of Mercurial's reluctancy to use Python's json module, this tries
to provide a reasonable alternative for a non-nested named data.
Comparing to current approach of storing state in plain text files, where
semantic meaning of lines of text is only determined by their oreder,
simple key-value file allows for reordering lines and thus helps handle
optional values.

Initial use-case I see for this is obs-shelve's shelve files. Later we
can possibly migrate state files to this approach.

The test is in a new file beause I did not figure out where to put it
within existing test suite. If you give me a better idea, I will gladly
follow it.
Augie Fackler - Dec. 16, 2016, 6:35 p.m.
On Wed, Dec 14, 2016 at 09:06:27AM -0800, Kostia Balytskyi wrote:
> # HG changeset patch
> # User Kostia Balytskyi <ikostia@fb.com>
> # Date 1481734993 28800
> #      Wed Dec 14 09:03:13 2016 -0800
> # Node ID 0973a68a3dcdfb7154e8ab27b7bb7028e97dd516
> # Parent  95b076f5ddee5ab7bd208b8c2f42121ce9907129
> scmutil: add a simple key-value file helper
>

At this point, I think I'd like to see obs-shelve too. Is there a
place I can do that?

>
> diff --git a/mercurial/error.py b/mercurial/error.py
> --- a/mercurial/error.py
> +++ b/mercurial/error.py
> @@ -246,3 +246,6 @@ class UnsupportedBundleSpecification(Exc
>
>  class CorruptedState(Exception):
>      """error raised when a command is not able to read its state from file"""
> +
> +class MissingRequiredKey(Exception):
> +    """error raised when simple key-value file misses a required key"""

I probably would have put this next to the simplekeyvaluefile type,
but I don't actually care. Others may have strong opinions.

> diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
> --- a/mercurial/scmutil.py
> +++ b/mercurial/scmutil.py
> @@ -1571,3 +1571,123 @@ class checkambigatclosing(closewrapbase)
>      def close(self):
>          self._origfh.close()
>          self._checkambig()
> +
> +class simplekeyvaluefile(object):
> +    """A simple file with key=value lines
> +
> +    Keys must be alphanumerics and start with a letter, values must not
> +    contain '\n' characters


> +    >>> contents = {}
> +    >>> class fileobj(object):

This is...a lot of doctests. Can I convince you to make them
unittest.TestCase tests in tests instead? I think that'd be more
maintainable in the long term.

Take a look at test-bdiff.py for a short example, I'd call this
test-simplekeyvaluefile.py.

(I know we typically have a moratorium on new test files, but for a
unittest like this, I think that's strictly more maintainable.)

[elided 60-some lines of doctests]

> +    """
> +
> +    # if KEYS is non-empty, read values are validated against it:
> +    # each key is a tuple (keyname, required)
> +    KEYS = []

It looks like the only use of this is to check for required
keys. Could this instead be required_keys = set(), and then you could
make validate be:

missing = self.required_keys - set(d)
if missing:
  raise MissingRequiredKey('Missing required keys: %s' % ' '.join(missing))

Not a pressing concern though. I'm more worried about the difficulty
of maintaining doctests.

> +
> +    def __init__(self, vfs, path):
> +        self.vfs = vfs
> +        self.path = path
> +
> +    def validate(self, d):
> +        for key, req in self.KEYS:
> +            if req and key not in d:
> +                e = "missing a required key: '%s'" % key
> +                raise error.MissingRequiredKey(e)
> +
> +    def read(self):
> +        lines = self.vfs.readlines(self.path)
> +        try:
> +            d = dict(line[:-1].split('=', 1) for line in lines if line)
> +        except ValueError as e:
> +            raise error.CorruptedState(str(e))
> +        self.validate(d)
> +        return d
> +
> +    def write(self, data):
> +        """Write key=>value mapping to a file
> +        data is a dict. Keys must be alphanumerical and start with a letter.
> +        Values must not contain newline characters."""
> +        lines = []
> +        for k, v in data.items():
> +            if not k[0].isalpha():
> +                e = "keys must start with a letter in a key-value file"
> +                raise error.ProgrammingError(e)
> +            if not k.isalnum():
> +                e = "invalid key name in a simple key-value file"
> +                raise error.ProgrammingError(e)
> +            if '\n' in v:
> +                e = "invalid value in a simple key-value file"
> +                raise error.ProgrammingError(e)
> +            lines.append("%s=%s\n" % (k, v))
> +        with self.vfs(self.path, mode='wb', atomictemp=True) as fp:
> +            fp.write(''.join(lines))

I'm completely fine with this implementation. Seems clear, short, and useful.

> diff --git a/tests/test-doctest.py b/tests/test-doctest.py
> --- a/tests/test-doctest.py
> +++ b/tests/test-doctest.py
> @@ -29,6 +29,7 @@ testmod('mercurial.patch')
>  testmod('mercurial.pathutil')
>  testmod('mercurial.parser')
>  testmod('mercurial.revset')
> +testmod('mercurial.scmutil', optionflags=doctest.ELLIPSIS)
>  testmod('mercurial.store')
>  testmod('mercurial.subrepo')
>  testmod('mercurial.templatefilters')
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Augie Fackler - Dec. 16, 2016, 6:45 p.m.
On Fri, Dec 16, 2016 at 1:35 PM, Augie Fackler <raf@durin42.com> wrote:
>> scmutil: add a simple key-value file helper
>>
>
> At this point, I think I'd like to see obs-shelve too. Is there a
> place I can do that?

(Specifically: before we do a resend of this, I'd like to see the
client, so a link to a pastebin or repo where I could see a client on
this thread would be outstanding. THanks!)
Kostia Balytskyi - Dec. 20, 2016, 5:08 p.m.
On 12/16/2016 6:45 PM, Augie Fackler wrote:
> On Fri, Dec 16, 2016 at 1:35 PM, Augie Fackler <raf@durin42.com> wrote:

>>> scmutil: add a simple key-value file helper

>>>

>> At this point, I think I'd like to see obs-shelve too. Is there a

>> place I can do that?

> (Specifically: before we do a resend of this, I'd like to see the

> client, so a link to a pastebin or repo where I could see a client on

> this thread would be outstanding. THanks!)

Here's a pastebin of my shelve.py: http://pastebin.com/3CYmqf2J
Please note that I haven't worked on storing the actual shelved state in 
this file yet: that is a low-pri for me at the moment, but I want it to 
be available for when I have time.
See obsshelvefile on line 88. The use case in shelve is very primitive - 
I could just store the node in a text file and it would've been ok. But 
I wanted to have a potential for growth and a way to move states of 
other commands as well.

> _______________________________________________

> Mercurial-devel mailing list

> Mercurial-devel@mercurial-scm.org

> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Augie Fackler - Dec. 23, 2016, 3:33 p.m.
> On Dec 20, 2016, at 12:08 PM, Kostia Balytskyi <kobalyts@outlook.com> wrote:
> 
> Here's a pastebin of my shelve.py: http://pastebin.com/3CYmqf2J
> Please note that I haven't worked on storing the actual shelved state in 
> this file yet: that is a low-pri for me at the moment, but I want it to 
> be available for when I have time.
> See obsshelvefile on line 88. The use case in shelve is very primitive - 
> I could just store the node in a text file and it would've been ok. But 
> I wanted to have a potential for growth and a way to move states of 
> other commands as well.

Looks reasonable to me. Let’s iterate on this after January 1st (I’ll be paying less attention to email for roughly the next week, and I suspect others are in the same boat), preferably with the first pass of the shelve code in the series so we’ve got a client for the key value store coming in as we start using it.

Looks like nice infrastructure in general - I wish we had had it when histedit was being born.

Thanks!
Augie

Patch

diff --git a/mercurial/error.py b/mercurial/error.py
--- a/mercurial/error.py
+++ b/mercurial/error.py
@@ -246,3 +246,6 @@  class UnsupportedBundleSpecification(Exc
 
 class CorruptedState(Exception):
     """error raised when a command is not able to read its state from file"""
+
+class MissingRequiredKey(Exception):
+    """error raised when simple key-value file misses a required key"""
diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -1571,3 +1571,123 @@  class checkambigatclosing(closewrapbase)
     def close(self):
         self._origfh.close()
         self._checkambig()
+
+class simplekeyvaluefile(object):
+    """A simple file with key=value lines
+
+    Keys must be alphanumerics and start with a letter, values must not
+    contain '\n' characters
+
+    >>> contents = {}
+    >>> class fileobj(object):
+    ...     def __init__(self, name):
+    ...         self.name = name
+    ...     def __enter__(self):
+    ...         return self
+    ...     def __exit__(self, *args, **kwargs):
+    ...         pass
+    ...     def write(self, text):
+    ...         contents[self.name] = text
+    ...     def read(self):
+    ...         return contents[self.name]
+    >>> class mockvfs(object):
+    ...     def read(self, path):
+    ...         return fileobj(path).read()
+    ...     def readlines(self, path):
+    ...         return fileobj(path).read().split('\\n')
+    ...     def __call__(self, path, mode, atomictemp):
+    ...         return fileobj(path)
+    >>> vfs = mockvfs()
+
+    Basic testing of whether simple key-value file works:
+    >>> d = {'key1': 'value1', 'Key2': 'value2'}
+    >>> simplekeyvaluefile(vfs, 'kvfile').write(d)
+    >>> print sorted(vfs.read('kvfile').split('\\n'))
+    ['', 'Key2=value2', 'key1=value1']
+
+    Testing of whether invalid keys are detected:
+    >>> d = {'0key1': 'value1', 'Key2': 'value2'}
+    >>> simplekeyvaluefile(vfs, 'kvfile').write(d)
+    Traceback (most recent call last):
+        ...
+    ProgrammingError: keys must start with a letter ...
+    >>> d = {'key1@': 'value1', 'Key2': 'value2'}
+    >>> simplekeyvaluefile(vfs, 'kvfile').write(d)
+    Traceback (most recent call last):
+        ...
+    ProgrammingError: invalid key name in a simple key-value file
+
+    Testing of whether invalid values are detected:
+    >>> d = {'key1': 'value1', 'Key2': 'value2\\n'}
+    >>> simplekeyvaluefile(vfs, 'kvfile').write(d)
+    Traceback (most recent call last):
+        ...
+    ProgrammingError: invalid value in a simple key-value file
+
+    Test cases when necessary keys are present
+    >>> d = {'key1': 'value1', 'Key2': 'value2'}
+    >>> simplekeyvaluefile(vfs, 'allkeyshere').write(d)
+    >>> class kvf(simplekeyvaluefile):
+    ...     KEYS = [('key3', False), ('Key2', True)]
+    >>> print sorted(kvf(vfs, 'allkeyshere').read().items())
+    [('Key2', 'value'), ('key1', 'value')]
+
+    Test cases when necessary keys are absent
+    >>> d = {'key1': 'value1', 'Key3': 'value2'}
+    >>> simplekeyvaluefile(vfs, 'missingkeys').write(d)
+    >>> class kvf(simplekeyvaluefile):
+    ...     KEYS = [('key3', False), ('Key2', True)]
+    >>> print sorted(kvf(vfs, 'missingkeys').read().items())
+    Traceback (most recent call last):
+        ...
+    MissingRequiredKey: missing a required key: 'Key2'
+
+    Test cases when file is not really a simple key-value file
+    >>> contents['badfile'] = 'ababagalamaga\\n'
+    >>> simplekeyvaluefile(vfs, 'badfile').read()
+    Traceback (most recent call last):
+        ...
+    CorruptedState: dictionary ... element #0 has length 1; 2 is required
+    """
+
+    # if KEYS is non-empty, read values are validated against it:
+    # each key is a tuple (keyname, required)
+    KEYS = []
+
+    def __init__(self, vfs, path):
+        self.vfs = vfs
+        self.path = path
+
+    def validate(self, d):
+        for key, req in self.KEYS:
+            if req and key not in d:
+                e = "missing a required key: '%s'" % key
+                raise error.MissingRequiredKey(e)
+
+    def read(self):
+        lines = self.vfs.readlines(self.path)
+        try:
+            d = dict(line[:-1].split('=', 1) for line in lines if line)
+        except ValueError as e:
+            raise error.CorruptedState(str(e))
+        self.validate(d)
+        return d
+
+    def write(self, data):
+        """Write key=>value mapping to a file
+        data is a dict. Keys must be alphanumerical and start with a letter.
+        Values must not contain newline characters."""
+        lines = []
+        for k, v in data.items():
+            if not k[0].isalpha():
+                e = "keys must start with a letter in a key-value file"
+                raise error.ProgrammingError(e)
+            if not k.isalnum():
+                e = "invalid key name in a simple key-value file"
+                raise error.ProgrammingError(e)
+            if '\n' in v:
+                e = "invalid value in a simple key-value file"
+                raise error.ProgrammingError(e)
+            lines.append("%s=%s\n" % (k, v))
+        with self.vfs(self.path, mode='wb', atomictemp=True) as fp:
+            fp.write(''.join(lines))
diff --git a/tests/test-doctest.py b/tests/test-doctest.py
--- a/tests/test-doctest.py
+++ b/tests/test-doctest.py
@@ -29,6 +29,7 @@  testmod('mercurial.patch')
 testmod('mercurial.pathutil')
 testmod('mercurial.parser')
 testmod('mercurial.revset')
+testmod('mercurial.scmutil', optionflags=doctest.ELLIPSIS)
 testmod('mercurial.store')
 testmod('mercurial.subrepo')
 testmod('mercurial.templatefilters')