Patchwork [12,of,20] hgweb, paper: add shortlogajax template and use it

Submitter Alexander Plavin
Date Aug. 9, 2013, 6:57 p.m.
Message ID <d29091e49e209d58302e.1376074657@debian-alexander.dolgopa>
Permalink /patch/2118/
State Changes Requested

Comments

Alexander Plavin - Aug. 9, 2013, 6:57 p.m.
# HG changeset patch
# User Alexander Plavin <alexander@plav.in>
# Date 1376061214 -14400
#      Fri Aug 09 19:13:34 2013 +0400
# Node ID d29091e49e209d58302e2835a3cdf63b6df7f8f2
# Parent  d207510e86ed78f44521ce847283ac5abafa9eeb
hgweb, paper: add shortlogajax template and use it
Matt Mackall - Aug. 12, 2013, 6:16 p.m.
On Fri, 2013-08-09 at 22:57 +0400, Alexander Plavin wrote:
> # HG changeset patch
> # User Alexander Plavin <alexander@plav.in>
> # Date 1376061214 -14400
> #      Fri Aug 09 19:13:34 2013 +0400
> # Node ID d29091e49e209d58302e2835a3cdf63b6df7f8f2
> # Parent  d207510e86ed78f44521ce847283ac5abafa9eeb
> hgweb, paper: add shortlogajax template and use it

I get the extremely vague impression that this is the form that's used
by the client to fill in async requests.

If so, the sensible way to do it is to add a new xml style.
Alexander Plavin - Aug. 12, 2013, 6:39 p.m.
12.08.2013, 22:16, "Matt Mackall" <mpm@selenic.com>:

>  On Fri, 2013-08-09 at 22:57 +0400, Alexander Plavin wrote:
>>   # HG changeset patch
>>   # User Alexander Plavin <alexander@plav.in>
>>   # Date 1376061214 -14400
>>   #      Fri Aug 09 19:13:34 2013 +0400
>>   # Node ID d29091e49e209d58302e2835a3cdf63b6df7f8f2
>>   # Parent  d207510e86ed78f44521ce847283ac5abafa9eeb
>>   hgweb, paper: add shortlogajax template and use it
>  I get the extremely vague impression that this is the form that's used
>  by the client to fill in async requests..

You are right.

>  If so, the sensible way to do it is to add a new xml style.

My first idea, and first implementation, was exactly this: a separate style (it still exists somewhere, of course, thanks to evolve :) ). It generated well-structured XML, i.e. all revision elements such as the hash or description were in separate machine-readable XML tags. However, this adds extra difficulty: JavaScript has to render the XML to HTML and add it to the page, so we need two things. One is a separate template for the entries, as the existing server-side one can't be reused for several reasons; the other is a client-side template language and an implementation of it (it doesn't need to be as complex as the server-side one, of course, but it is needed).

On the other hand, the solution implemented here fully reuses the server-side templates for the entries, as they are generated the same way as for the usual shortlog page (this applies to other pages as well), just without the surrounding markup: the complete rendered HTML is sent as a CDATA section, and the JS just has to add it to the page. Thus the only things needed to add a neat ajaxy shortlog to a style (I mean a theme here, like coal and so on) are creating this shortlogajax template (which must always be the same, so maybe it's possible to define it as a default somewhere?) and adding a small piece of JS to the shortlog template that calls the initialization function.
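
A minimal sketch of what a consumer of such a response has to do, written in Python rather than browser JavaScript so it can be run standalone. The sample body is hypothetical; only its shape (an html element holding a CDATA section, plus a lasthash element) follows the shortlogajax template in the patch, and parse_shortlog_response is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, shaped like the shortlogajax template in the patch.
SAMPLE_RESPONSE = """<shortlog>
    <html><![CDATA[<tr><td class="age">2 days ago</td><td>fix &amp; tidy</td></tr>]]></html>
    <lasthash>d207510e86ed</lasthash>
  </shortlog>"""


def parse_shortlog_response(body):
    """Return (html_fragment, last_hash) from a shortlogajax-style response."""
    root = ET.fromstring(body)
    # The parser hands back CDATA content as ordinary text, markup included.
    html_fragment = root.findtext("html")
    last_hash = root.findtext("lasthash")
    return html_fragment, last_hash


fragment, last_hash = parse_shortlog_response(SAMPLE_RESPONSE)
print(last_hash)
print(fragment)
```

The fragment comes back untouched, entity references and all, so the client only needs to append it to the page and remember the hash for the next request.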

Or do you have other suggestions here?

>  --
>  Mathematics is the supreme nostalgia of our time.
Matt Mackall - Aug. 12, 2013, 8:45 p.m.
On Mon, 2013-08-12 at 22:39 +0400, Alexander Plavin wrote:
> 12.08.2013, 22:16, "Matt Mackall" <mpm@selenic.com>:
> 
> >  On Fri, 2013-08-09 at 22:57 +0400, Alexander Plavin wrote:
> >>   # HG changeset patch
> >>   # User Alexander Plavin <alexander@plav.in>
> >>   # Date 1376061214 -14400
> >>   #      Fri Aug 09 19:13:34 2013 +0400
> >>   # Node ID d29091e49e209d58302e2835a3cdf63b6df7f8f2
> >>   # Parent  d207510e86ed78f44521ce847283ac5abafa9eeb
> >>   hgweb, paper: add shortlogajax template and use it
> >  I get the extremely vague impression that this is the form that's used
> >  by the client to fill in async requests..
> 
> You are right.
> 
> >  If so, the sensible way to do it is to add a new xml style.
> 
> The first idea and implementation I thought of was exactly this, a
> separate style (of course it still exists somehow, thanks to evolve :)
> ). It generated a well-structured xml style, i.e. all revision
> elements like hash or description were in separate machine-readable
> xml tags. However, this adds extra difficulty then: javascript will
> need to render this to html and add to this page, thus we need two
> things here. One is a separate template for entries, as the existent
> serverside one can't be reused for several reasons, and other is the
> actual template language and implementation of it (this language
> doesn't need to be as complex as serverside one of course, but it is
> needed).

I see your point. Yes, this does nicely solve the problem of rendering
in a DRY way. Though it seems it could be cleaner still: the client-side
javascript could just request plain-old shortlog and strip off the
wrapping HTML with a tiny bit more client-side code and no changes
server-side.

However, looking forward, we're probably going to want to do this sort
of thing in more than one place, and probably not in ways where the
pre-rendering helps. For instance, hover-over on an annotate line to get
commit info or similar. As this data won't even be available without JS,
there won't be any templates getting duplicated. So, while it may not
make sense for shortlog infinite scroll, in the general case, I think
AJAX features are going to want an XML style.
Alexander Plavin - Aug. 12, 2013, 10:57 p.m.
13.08.2013, 00:45, "Matt Mackall" <mpm@selenic.com>:
> On Mon, 2013-08-12 at 22:39 +0400, Alexander Plavin wrote:
>
>>  12.08.2013, 22:16, "Matt Mackall" <mpm@selenic.com>:
>>>   On Fri, 2013-08-09 at 22:57 +0400, Alexander Plavin wrote:
>>>>    # HG changeset patch
>>>>    # User Alexander Plavin <alexander@plav.in>
>>>>    # Date 1376061214 -14400
>>>>    #      Fri Aug 09 19:13:34 2013 +0400
>>>>    # Node ID d29091e49e209d58302e2835a3cdf63b6df7f8f2
>>>>    # Parent  d207510e86ed78f44521ce847283ac5abafa9eeb
>>>>    hgweb, paper: add shortlogajax template and use it
>>>   I get the extremely vague impression that this is the form that's used
>>>   by the client to fill in async requests..
>>  You are right.
>>>   If so, the sensible way to do it is to add a new xml style.
>>  The first idea and implementation I thought of was exactly this, a
>>  separate style (of course it still exists somehow, thanks to evolve :)
>>  ). It generated a well-structured xml style, i.e. all revision
>>  elements like hash or description were in separate machine-readable
>>  xml tags. However, this adds extra difficulty then: javascript will
>>  need to render this to html and add to this page, thus we need two
>>  things here. One is a separate template for entries, as the existent
>>  serverside one can't be reused for several reasons, and other is the
>>  actual template language and implementation of it (this language
>>  doesn't need to be as complex as serverside one of course, but it is
>>  needed).
>
> I see your point. Yes, this does nicely solve the problem of rendering
> in a DRY way. Though it seems it could be cleaner still: the client-side
> javascript could just request plain-old shortlog and strip off the
> wrapping HTML with a tiny bit more client-side code and no changes
> server-side.

I doubt that would be the best solution. The JS code needs to know the hash or number of the last revision shown, and if we just send the 'plain-old shortlog', we would need to extract this info somehow. Since we can't predict all possible custom user themes (or the motd, by the way, which will be on the page too), heuristics are the most that can be done there. So, as I see it, sending simple XML with the ready-made HTML in it plus a small piece of metadata (hash or number) is better.

>
> However, looking forward, we're probably going to want to do this sort
> of thing in more than one place, and probably not in ways where the
> pre-rendering helps. For instance, hover-over on an annotate line to get
> commit info or similar. As this data won't even be available without JS,
> there won't be any templates getting duplicated. So, while it may not
> make sense for shortlog infinite scroll, in the general case, I think
> AJAX features are going to want an XML style.

Fully agree here, though I haven't thought much about other possible AJAXy features like the annotate one you mentioned. In any case, in my opinion an XML style should be added when it is actually needed (it's almost certain that it will be).

>
> --
> Mathematics is the supreme nostalgia of our time.
Martin Geisler - Aug. 13, 2013, 8:30 a.m.
Alexander Plavin <alexander@plav.in> writes:

> [...] the rendered, complete html code is sent as a CDATA section, and
> js just has to add it to the page.

Please ignore this if it's obvious to you, but remember that you need to
escape "]]>" in CDATA sections:

  http://en.wikipedia.org/wiki/CDATA#Uses_of_CDATA_sections

You will probably have to unescape this again on the client side?

In other words, using CDATA isn't really different from using normal
escaping of "<" and "&", it just feels that way because you (typically)
get to escape fewer characters.
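
One common way to handle "]]>" is to split it across two adjacent CDATA sections. The sketch below illustrates that technique in Python; wrap_cdata is a hypothetical helper and is not something hgweb provides.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Wrap text in a CDATA section, splitting any "]]>" it contains."""
    # "]]>" would end the section early, so close the section after "]]"
    # and reopen a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "<td>a[b[c]]> d</td>"
document = "<html>" + wrap_cdata(payload) + "</html>"

# A conforming parser joins the adjacent sections back together.
assert ET.fromstring(document).text == payload
```

With this particular technique the parser reassembles the original text, so no separate unescaping step is needed on the receiving side.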


Another thought that occurred to me when I read the discussion about
writing an XML style: have you considered writing a JSON style instead?
That might be even more useful for JavaScript code.

Other applications might benefit from such a style too, even on the
command line where I would prefer to parse JSON encoded 'hg log' output
over XML output. That of course depends on whether JSON or XML has
better support in the environment where you work.
Alexander Plavin - Aug. 13, 2013, 8:44 a.m.
13.08.2013, 12:30, "Martin Geisler" <martin@geisler.net>:
> Alexander Plavin <alexander@plav.in> writes:
>
>>  [...] the rendered, complete html code is sent as a CDATA section, and
>>  js just has to add it to the page.
>
> Please ignore this if it's obvious to you, but remember that you need to
> escape "]]>" in CDATA sections:
>
>   http://en.wikipedia.org/wiki/CDATA#Uses_of_CDATA_sections
>
> You will probably have to unescape this again on the client side?
>
> In other words, using CDATA isn't really different from using normal
> escaping of "<" and "&", it just feels that way because you (typically)
> get to escape fewer characters.

Heh, that's true of course, but as I understand it such a character sequence can't occur here. The content of the CDATA section is just HTML markup with correct tags, and all special characters in the text (e.g. the cset description) are escaped.

>
> Another thought that occured to me when I read the discussion about
> writing an XML style: have you considered writing a JSON style instead?
> That might be even more useful for JavaScript code.

In this application (I mean infinite scrolling) a JSON style offers no benefit over an XML one, as we would have to render the templates in JS anyway. The only difference for future uses in JavaScript is using xfr.responseXML vs JSON.parse(xfr.responseText). So these variants are about equivalent and one of them just has to be chosen.
However, I don't know which will be best here, and would like to hear more opinions on this.

>
> Other applications might benefit from such a style too, even on the
> command line where I would prefer to parse JSON encoded 'hg log' output
> over XML output. That of course depends on whether JSON or XML has
> better support in the environment where you work.
>
> --
> Martin Geisler
Laurens Holst - Aug. 13, 2013, 3:44 p.m.
Op 13-08-13 10:44, Alexander Plavin schreef:
>> Another thought that occured to me when I read the discussion about
>> writing an XML style: have you considered writing a JSON style instead?
>> That might be even more useful for JavaScript code.
> In this application (I mean infinite scrolling) json style doesn't give any differences/benefits to xml one, as we would have to render the templates in JS anyway. The only difference for future uses in javascript is using xfr.responsexml vs json.parse(xfr.responsetext).

Not really, if you consider iterating over elements, getting values and 
such as well.

> So, this variants are about equivalent and one of them just has to be chosen.
> However, I don't know what will be best here, and would like to hear more opinions on this.

Personally I would go with JSON as well. Much easier to process.
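
To make the comparison concrete, here is a small Python sketch that extracts the same data from an XML body and from a JSON body. Both payloads are hypothetical; neither is a format hgweb emits.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same two entries.
XML_BODY = """<entries>
  <entry><node>d29091e49e20</node><desc>add shortlogajax</desc></entry>
  <entry><node>d207510e86ed</node><desc>previous change</desc></entry>
</entries>"""

JSON_BODY = """{"entries": [
  {"node": "d29091e49e20", "desc": "add shortlogajax"},
  {"node": "d207510e86ed", "desc": "previous change"}
]}"""

# XML: walk the tree and pull each value out of its child element.
from_xml = [
    {"node": e.findtext("node"), "desc": e.findtext("desc")}
    for e in ET.fromstring(XML_BODY).iter("entry")
]

# JSON: the parsed result is already a list of dictionaries.
from_json = json.loads(JSON_BODY)["entries"]

assert from_xml == from_json
```

The XML side needs an explicit mapping from elements to values, while the JSON side yields native lists and dictionaries directly.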

~Laurens
Matt Mackall - Aug. 14, 2013, 8:31 p.m.
On Tue, 2013-08-13 at 23:24 -0500, Kevin Bullock wrote:
> On 13 Aug 2013, at 3:44 AM, Alexander Plavin wrote:
> 
> > 13.08.2013, 12:30, "Martin Geisler" <martin@geisler.net>:
> >> Another thought that occured to me when I read the discussion about
> >> writing an XML style: have you considered writing a JSON style instead?
> >> That might be even more useful for JavaScript code.
> > 
> > In this application (I mean infinite scrolling) json style doesn't give any differences/benefits to xml one, as we would have to render the templates in JS anyway. The only difference for future uses in javascript is using xfr.responsexml vs json.parse(xfr.responsetext). So, this variants are about equivalent and one of them just has to be chosen.
> > However, I don't know what will be best here, and would like to hear more opinions on this.
> 
> I'd lean towards JSON as well. There are solid, widely-available
> parsers for both, but JSON is generally more compact, and more widely
> used by newer client-side libraries.

Note that both JSON and XML have a serious problem that will need to be
addressed before we can use them: can't pass arbitrary character sets.
Alexander Plavin - Aug. 14, 2013, 8:48 p.m.
15.08.2013, 00:32, "Matt Mackall" <mpm@selenic.com>:
> On Tue, 2013-08-13 at 23:24 -0500, Kevin Bullock wrote:
>
>>  On 13 Aug 2013, at 3:44 AM, Alexander Plavin wrote:
>>>  13.08.2013, 12:30, "Martin Geisler" <martin@geisler.net>:
>>>>  Another thought that occured to me when I read the discussion about
>>>>  writing an XML style: have you considered writing a JSON style instead?
>>>>  That might be even more useful for JavaScript code.
>>>  In this application (I mean infinite scrolling) json style doesn't give any differences/benefits to xml one, as we would have to render the templates in JS anyway. The only difference for future uses in javascript is using xfr.responsexml vs json.parse(xfr.responsetext). So, this variants are about equivalent and one of them just has to be chosen.
>>>  However, I don't know what will be best here, and would like to hear more opinions on this.
>>  I'd lean towards JSON as well. There are solid, widely-available
>>  parsers for both, but JSON is generally more compact, and more widely
>>  used by newer client-side libraries.
>
> Note that both JSON and XML have a serious problem that will need to be
> addressed before we can use them: can't pass arbitrary character sets.

What exactly can't we pass there? According to the JSON RFC, "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)." I found it at http://www.ietf.org/rfc/rfc4627.txt (section 2.5).

>
> --
> Mathematics is the supreme nostalgia of our time.
Matt Mackall - Aug. 15, 2013, 6:28 a.m.
On Thu, 2013-08-15 at 00:48 +0400, Alexander Plavin wrote:
> 
> 15.08.2013, 00:32, "Matt Mackall" <mpm@selenic.com>:
> > On Tue, 2013-08-13 at 23:24 -0500, Kevin Bullock wrote:
> >
> >>  On 13 Aug 2013, at 3:44 AM, Alexander Plavin wrote:
> >>>  13.08.2013, 12:30, "Martin Geisler" <martin@geisler.net>:
> >>>>  Another thought that occured to me when I read the discussion about
> >>>>  writing an XML style: have you considered writing a JSON style instead?
> >>>>  That might be even more useful for JavaScript code.
> >>>  In this application (I mean infinite scrolling) json style doesn't give any differences/benefits to xml one, as we would have to render the templates in JS anyway. The only difference for future uses in javascript is using xfr.responsexml vs json.parse(xfr.responsetext). So, this variants are about equivalent and one of them just has to be chosen.
> >>>  However, I don't know what will be best here, and would like to hear more opinions on this.
> >>  I'd lean towards JSON as well. There are solid, widely-available
> >>  parsers for both, but JSON is generally more compact, and more widely
> >>  used by newer client-side libraries.
> >
> > Note that both JSON and XML have a serious problem that will need to be
> > addressed before we can use them: can't pass arbitrary character sets.
> 
> What exactly can't we pass there? According to the json rfc, "All
> Unicode characters may be placed within the quotation marks except for
> the characters that must be escaped: quotation mark, reverse solidus,
> and the control characters (U+0000 through +001F).". I found it at
> http://www.ietf.org/rfc/rfc4627.txt (2.5).

Eight bit bytes in undeclared encoding. In Mercurial, that's filenames
and file contents. See:

http://mercurial.selenic.com/wiki/GenericTemplatingPlan?highlight=%28utf-8b%29#JSON.2C_XML.2C_and_encoding_troubles

Also, neither JSON nor XML can handle strings with NUL bytes at all.
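
The undeclared-encoding problem can be illustrated with a short Python 3 sketch (Mercurial itself ran on Python 2 at the time, so this is not hgweb code). The filename and the to_json_string helper are made up for the example.

```python
import json

# The same hypothetical filename as raw bytes in two encodings.
name_utf8 = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'
name_latin1 = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'


def to_json_string(raw, assumed_encoding="utf-8"):
    """Decode raw bytes with an assumed encoding, then JSON-encode them."""
    return json.dumps(raw.decode(assumed_encoding))


print(to_json_string(name_utf8))

try:
    to_json_string(name_latin1)
except UnicodeDecodeError as exc:
    # Without a declared encoding there is no reliable way to pick one.
    print("cannot decode:", exc.reason)
```

The bytes have to be decoded before they can become a JSON string, and decoding requires knowing (or guessing) the encoding.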
Martin Geisler - Aug. 15, 2013, 7:45 a.m.
Alexander Plavin <alexander@plav.in> writes:

> 15.08.2013, 00:32, "Matt Mackall" <mpm@selenic.com>:
>> On Tue, 2013-08-13 at 23:24 -0500, Kevin Bullock wrote:
>>
>>> On 13 Aug 2013, at 3:44 AM, Alexander Plavin wrote:
>>>> 13.08.2013, 12:30, "Martin Geisler" <martin@geisler.net>:
>>>>
>>>>> Another thought that occured to me when I read the discussion
>>>>> about writing an XML style: have you considered writing a JSON
>>>>> style instead? That might be even more useful for JavaScript code.
>>>>
>>>> In this application (I mean infinite scrolling) json style doesn't
>>>> give any differences/benefits to xml one, as we would have to
>>>> render the templates in JS anyway. The only difference for future
>>>> uses in javascript is using xfr.responsexml vs
>>>> json.parse(xfr.responsetext). So, this variants are about
>>>> equivalent and one of them just has to be chosen.   However, I
>>>> don't know what will be best here, and would like to hear more
>>>> opinions on this.
>>>
>>> I'd lean towards JSON as well. There are solid, widely-available
>>> parsers for both, but JSON is generally more compact, and more
>>> widely used by newer client-side libraries.
>>
>> Note that both JSON and XML have a serious problem that will need to
>> be addressed before we can use them: can't pass arbitrary character
>> sets.
>
> What exactly can't we pass there? According to the json rfc, "All
> Unicode characters may be placed within the quotation marks except for
> the characters that must be escaped: quotation mark, reverse solidus,
> and the control characters (U+0000 through +001F).". I found it at
> http://www.ietf.org/rfc/rfc4627.txt (2.5).

We don't know the encoding of things like file names and file content.
Hgweb will just dump the raw bytestrings to the output and unless the
encoding matches HGENCODING, filenames will look funny in the browser.

Try it by setting HGENCODING to something like ascii, latin-1, or utf-8
and run hg serve on a repo with non-ASCII characters.

It is only metadata like commit messages and usernames that we try to
decode and store as UTF-8 internally. The encoding.fromlocal function
does this. The encoding.tolocal function converts the other way, from
internal UTF-8 to the local encoding.

Given that encoding.fromlocal always returns a UTF-8 string (or raises
an exception), I believe we can safely use JSON for the strings that are
in the "local" encoding.

Looking in changelog.read shows that user and desc are converted to
local encoding upon read. Looking in context.branch shows that the
branch name is in local encoding too. Tags and bookmarks are also read
with encoding.tolocal.

So a tooltip which shows the username, commit message, bookmarks and
branch names could be implemented. Showing the filenames for a commit is
more problematic since you cannot be sure that they can be JSON encoded.
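
A simplified Python 3 sketch of that round trip follows. The two functions only borrow the names fromlocal and tolocal; they are not Mercurial's implementations, and LOCAL_ENCODING is an assumed stand-in for HGENCODING.

```python
import json

# Stand-in for HGENCODING; an assumption for this sketch.
LOCAL_ENCODING = "cp1252"


def fromlocal(local_bytes):
    """Local encoding -> UTF-8 bytes; raises UnicodeDecodeError on bad input."""
    return local_bytes.decode(LOCAL_ENCODING).encode("utf-8")


def tolocal(utf8_bytes):
    """UTF-8 bytes -> local encoding, replacing unrepresentable characters."""
    return utf8_bytes.decode("utf-8").encode(LOCAL_ENCODING, "replace")


user_local = "Ren\u00e9".encode(LOCAL_ENCODING)
user_utf8 = fromlocal(user_local)

# Known-UTF-8 metadata decodes cleanly and can therefore be JSON-encoded.
print(json.dumps({"user": user_utf8.decode("utf-8")}))
assert tolocal(user_utf8) == user_local
```

Filenames and file contents get no such conversion, which is why they remain the difficult case.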
Alexander Plavin - Aug. 15, 2013, 9:27 a.m.
15.08.2013, 11:45, "Martin Geisler" <martin@geisler.net>:
> Alexander Plavin <alexander@plav.in> writes:
>
>>  15.08.2013, 00:32, "Matt Mackall" <mpm@selenic.com>:
>>>  On Tue, 2013-08-13 at 23:24 -0500, Kevin Bullock wrote:
>>>>  On 13 Aug 2013, at 3:44 AM, Alexander Plavin wrote:
>>>>>  13.08.2013, 12:30, "Martin Geisler" <martin@geisler.net>:
>>>>>>  Another thought that occured to me when I read the discussion
>>>>>>  about writing an XML style: have you considered writing a JSON
>>>>>>  style instead? That might be even more useful for JavaScript code.
>>>>>  In this application (I mean infinite scrolling) json style doesn't
>>>>>  give any differences/benefits to xml one, as we would have to
>>>>>  render the templates in JS anyway. The only difference for future
>>>>>  uses in javascript is using xfr.responsexml vs
>>>>>  json.parse(xfr.responsetext). So, this variants are about
>>>>>  equivalent and one of them just has to be chosen.   However, I
>>>>>  don't know what will be best here, and would like to hear more
>>>>>  opinions on this.
>>>>  I'd lean towards JSON as well. There are solid, widely-available
>>>>  parsers for both, but JSON is generally more compact, and more
>>>>  widely used by newer client-side libraries.
>>>  Note that both JSON and XML have a serious problem that will need to
>>>  be addressed before we can use them: can't pass arbitrary character
>>>  sets.
>>  What exactly can't we pass there? According to the json rfc, "All
>>  Unicode characters may be placed within the quotation marks except for
>>  the characters that must be escaped: quotation mark, reverse solidus,
>>  and the control characters (U+0000 through +001F).". I found it at
>>  http://www.ietf.org/rfc/rfc4627.txt (2.5).
>
> We don't know the encoding of things like file names and file content.
> Hgweb will just dump the raw bytestrings to the output and unless the
> encoding matches HGENCODING, filenames will look funny in the browser.
>
> Try it by setting HGENCODING to something like ascii, latin-1, or utf-8
> and run hg serve on a repo with non-ASCII characters.
>
> It is only meta data like commit messages and usernames that we try to
> decode and store as UTF-8 internally. The encoding.fromlocal function
> does this. The encoding.tolocal convertes the other way, from internal
> UTF-8 to the local encoding.
>
> Give that encoding.fromlocal always return a UTF-8 string (or raises an
> exception), I believe we can safely use JSON for the strings that are in
> the "local" encoding.
>
> Looking in changelog.read shows that user and desc are converted to
> local encoding upon read. Looking in context.branch shows that the
> branch name is in local encoding too. Tags and bookmarks are also read
> with encoding.tolocal.
>
> So a tooltip which shows the username, commit message, bookmarks and
> branch names could be implemented. Showing the filenames for a commit is
> more problematic since you cannot be sure that they can be JSON encoded.

Why can't we encode the characters outside of the JSON-able range, or just strip them? Hgweb already outputs them as strings in HTML pages, and I don't see any reason why a jsonescape filter couldn't be added to handle this correctly.
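
One possible reading of this suggestion, sketched in Python 3: decode with a replacement policy so that undecodable bytes become U+FFFD instead of raising. The function lossy_json_string is hypothetical and is not an existing Mercurial template filter.

```python
import json


def lossy_json_string(raw, encoding="utf-8"):
    """JSON-encode raw bytes, substituting U+FFFD for undecodable bytes."""
    return json.dumps(raw.decode(encoding, errors="replace"))


print(lossy_json_string("caf\u00e9.txt".encode("utf-8")))
print(lossy_json_string("caf\u00e9.txt".encode("latin-1")))
```

The output is always valid JSON, but the conversion is lossy: the original bytes cannot be recovered from a replaced character, so a client could not use such a name to request the file again.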

>
> --
> Martin Geisler
Antoine Pitrou - Aug. 20, 2013, 9:18 a.m.
Martin Geisler <martin <at> geisler.net> writes:
> So a tooltip which shows the username, commit message, bookmarks and
> branch names could be implemented. Showing the filenames for a commit is
> more problematic since you cannot be sure that they can be JSON encoded.

How is it more problematic than representing them under HTML form?
If you're saying hgweb can put undecodable byte sequences in its generated
HTML, well this is ugly as hell ("mojibake").

What you're thinking of as a technical representation issue (i.e. encoding)
is really a visual presentation issue. You want to present data to the user,
but you don't know what the data means - since you refused to infer a
character encoding when saving the filenames. Human beings can read text,
they can't read bytestreams. If Mercurial loses information as to how
to interpret a piece of data, then Mercurial has a problem - not necessarily
JSON or XML ;-)

Regards

Antoine.
Martin Geisler - Aug. 20, 2013, 3:27 p.m.
Antoine Pitrou <solipsis@pitrou.net> writes:

> Martin Geisler <martin <at> geisler.net> writes:
>> So a tooltip which shows the username, commit message, bookmarks and
>> branch names could be implemented. Showing the filenames for a commit
>> is more problematic since you cannot be sure that they can be JSON
>> encoded.
>
> How is it more problematic than representing them under HTML form? If
> you're saying hgweb can put undecodable byte sequences in its
> generated HTML, well this is ugly as hell ("mojibake").

I'm afraid that this is the situation -- like the rest of Mercurial,
hgweb is working on byte strings internally and won't hesitate to output
pages with mixed encodings.

Patches are an easy example: they're not transcoded when you view them
in hgweb, so unless HGENCODING matches the patch encoding, you'll have a
page with mojibake.

> What you're thinking of as a technical representation issue (i.e.
> encoding) is really a visual presentation issue. You want to present
> data to the user, but you don't know what the data means - since you
> refused to infer a character encoding when saving the filenames. Human
> beings can read text, they can't read bytestreams. If Mercurial loses
> information as to how to interpret a piece of data, then Mercurial has
> a problem - not necessarily JSON or XML ;-)

Yeah, it's no secret that I agree with that. Keeping everything as byte
strings internally removes the risk of getting UnicodeErrors at runtime,
but it also makes it difficult to reason about the data later.

Matt often talks about the Makefile problem (silently changing the
filename encoding without changing the Makefile encoding), but when I've
tried to reproduce the problem, I found non-ASCII filenames to be not
portable in the first place: Make could not find the files Mercurial
checked out with non-ASCII characters in their names.

The example repo is here:

  https://bitbucket.org/mg/makefile-problem/

and testing with Windows 7 and GnuWin32 Make 3.81 still fails.

I've added a new commit there with the Makefile encoded in OEM 437
encoding, which makes 'type Makefile' look correct in the Command Prompt.
But make fails with:

  make: *** No rule to make target ``ble.txt', needed by `p`re.txt'.

I don't know if there is an encoding that will let Make work with
non-ASCII characters on Windows?
Laurens Holst - Aug. 20, 2013, 5:02 p.m.
Op 20-08-13 17:27, Martin Geisler schreef:
> Antoine Pitrou <solipsis@pitrou.net> writes:
>
>> Martin Geisler <martin <at> geisler.net> writes:
>>> So a tooltip which shows the username, commit message, bookmarks and
>>> branch names could be implemented. Showing the filenames for a commit
>>> is more problematic since you cannot be sure that they can be JSON
>>> encoded.
>> How is it more problematic than representing them under HTML form? If
>> you're saying hgweb can put undecodable byte sequences in its
>> generated HTML, well this is ugly as hell ("mojibake").
> I'm afraid that this is the situation -- like the rest of Mercurial,
> hgweb is working on byte strings internally and wont hesitate to output
> pages with mixed encodings.
>
> Patches are an easy example: they're not transcoded when you view them
> in hgweb, so unless HGENCODING matches the patch encoding, you'll have a
> page with mojibake.
>
>> What you're thinking of as a technical representation issue (i.e.
>> encoding) is really a visual presentation issue. You want to present
>> data to the user, but you don't know what the data means - since you
>> refused to infer a character encoding when saving the filenames. Human
>> beings can read text, they can't read bytestreams. If Mercurial loses
>> information as to how to interpret a piece of data, then Mercurial has
>> a problem - not necessarily JSON or XML ;-)
> Yeah, it's no secret that I agree with that. Keeping everything as byte
> strings internally removes the risk or getting UnicodeErrors at runtime,
> but it also makes it difficult to reason about the data later.
>
> Matt often talks about the Makefile problem (silently changing the
> filename encoding without changing the Makefile encoding), but when I've
> tried to reproduce the problem, I found non-ASCII filenames to be not
> portable in the first place: Make could not find the files Mercurial
> checked out with non-ASCII characters in their names.
>
> The example repo is here:
>
>    https://bitbucket.org/mg/makefile-problem/
>
> and testing with Window 7 and GnuWin32 Make 3.81 still fails.
>
> I've added a new commit there with the Makefile encoded in OEM 437
> encoding, which make 'type Makefile' look correct in the Command Prompt.
> But make fails with:
>
>    make: *** No rule to make target ``ble.txt', needed by `p`re.txt'.
>
> I don't know if there is an encoding that will let Make work with
> non-ASCII characters on Windows?
>

This is the plan:

http://mercurial.selenic.com/wiki/WindowsUTF8Plan

However nobody’s implemented it yet afaik.

(Btw, Linux and OS X users are mostly already using UTF-8. If the 
WindowsUTF8Plan is implemented Windows users will be as well. So this is 
a problem that in time will mostly solve itself.)

~Laurens
Laurens Holst - Aug. 20, 2013, 5:14 p.m.
Op 14-08-13 22:31, Matt Mackall schreef:
> On Tue, 2013-08-13 at 23:24 -0500, Kevin Bullock wrote:
>> On 13 Aug 2013, at 3:44 AM, Alexander Plavin wrote:
>>
>>> 13.08.2013, 12:30, "Martin Geisler" <martin@geisler.net>:
>>>> Another thought that occured to me when I read the discussion about
>>>> writing an XML style: have you considered writing a JSON style instead?
>>>> That might be even more useful for JavaScript code.
>>> In this application (I mean infinite scrolling) json style doesn't give any differences/benefits to xml one, as we would have to render the templates in JS anyway. The only difference for future uses in javascript is using xfr.responsexml vs json.parse(xfr.responsetext). So, this variants are about equivalent and one of them just has to be chosen.
>>> However, I don't know what will be best here, and would like to hear more opinions on this.
>> I'd lean towards JSON as well. There are solid, widely-available
>> parsers for both, but JSON is generally more compact, and more widely
>> used by newer client-side libraries.
> Note that both JSON and XML have a serious problem that will need to be
> addressed before we can use them: can't pass arbitrary character sets.

JSON specifies that the encoding must be a form of Unicode [1].

XML does not mandate a specific encoding, although it does specify that 
if such encoding is not UTF-8 and there is no external encoding 
information, it must have an <?xml encoding?> declaration [2].

Then, I presume you can specify the encoding as:

     <?xml encoding="unknown-8bit"?>

This is an IANA-registered encoding that's afaik also used by patchbomb.

So that would be a solid reason to favour XML over JSON.

When delivered over HTTP for hgweb consumption (as is the case here), I 
would pass the encoding specified in web.encoding in the Content-Type 
header. Content-Type overrides any encoding specified in the document 
itself, so you don't need to remove the above mentioned encoding 
declaration from the document.
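
A minimal sketch of building such a header in Python. The helper name is made up, and "UTF-8" stands in for whatever web.encoding is configured to; this is not how hgweb assembles its responses.

```python
def content_type_header(media_type, encoding):
    """Build a Content-Type header tuple with an explicit charset."""
    return ("Content-Type", "%s; charset=%s" % (media_type, encoding))


# "UTF-8" stands in for the configured web.encoding value.
header = content_type_header("text/xml", "UTF-8")
print(header)
```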

~Laurens

[1] http://tools.ietf.org/html/rfc4627#section-3
[2] http://www.w3.org/TR/REC-xml/#charencoding

Patch

diff -r d207510e86ed -r d29091e49e20 mercurial/hgweb/webcommands.py
--- a/mercurial/hgweb/webcommands.py	Wed Jul 24 20:02:34 2013 +0400
+++ b/mercurial/hgweb/webcommands.py	Fri Aug 09 19:13:34 2013 +0400
@@ -317,7 +317,11 @@ 
     latestentry = entries[:1]
     oldestentry = entries[-1:]
 
-    return tmpl(shortlog and 'shortlog' or 'changelog', changenav=changenav,
+    template = shortlog and 'shortlog' or 'changelog'
+    if 'ajax' in req.form:
+        template += 'ajax'
+
+    return tmpl(template, changenav=changenav,
                 node=ctx.hex(), rev=pos, changesets=count,
                 entries=entries,
                 latestentry=latestentry, oldestentry=oldestentry,
diff -r d207510e86ed -r d29091e49e20 mercurial/templates/paper/map
--- a/mercurial/templates/paper/map	Wed Jul 24 20:02:34 2013 +0400
+++ b/mercurial/templates/paper/map	Fri Aug 09 19:13:34 2013 +0400
@@ -7,6 +7,10 @@ 
 
 changelog = shortlog.tmpl
 shortlog = shortlog.tmpl
+shortlogajax = '<shortlog>
+    <html><![CDATA[{entries%shortlogentry}]]></html>
+    <lasthash>{oldestentry%"{node|short}"}</lasthash>
+  </shortlog>'
 shortlogentry = shortlogentry.tmpl
 graph = graph.tmpl
 help = help.tmpl
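
The template selection added to webcommands.py can be exercised in isolation with a sketch like the following. pick_template is a hypothetical standalone function, and a plain dict stands in for req.form.

```python
def pick_template(shortlog, form):
    """Mirror the template-name selection added by the patch."""
    template = "shortlog" if shortlog else "changelog"
    # Any request carrying an "ajax" form field gets the ajax variant.
    if "ajax" in form:
        template += "ajax"
    return template


assert pick_template(True, {}) == "shortlog"
assert pick_template(True, {"ajax": ["1"]}) == "shortlogajax"
assert pick_template(False, {"ajax": ["1"]}) == "changelogajax"
```

Note that the map excerpt in the diff shown here defines only shortlogajax, so the last case would need a matching changelogajax entry from somewhere else.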