Submitter | Matt Mackall |
---|---|
Date | May 21, 2013, 3:39 p.m. |
Message ID | <1369150765.29469.20.camel@calx> |
Permalink | /patch/1659/ |
State | Rejected, archived |
Comments
Matt Mackall <mpm@selenic.com> writes:

> On Mon, 2013-05-20 at 19:51 -0300, Wagner Bruna wrote:
>> On 16-05-2013 15:58, Matt Mackall wrote:
>> > On Wed, 2013-05-15 at 21:37 -0500, Kevin Bullock wrote:
>> >> # HG changeset patch
>> >> # User Kevin Bullock <kbullock@ringworld.org>
>> >> # Date 1368671779 18000
>> >> # Wed May 15 21:36:19 2013 -0500
>> >> # Branch stable
>> >> # Node ID fc1c4221dd82de958b9be5f05c57622679625d21
>> >> # Parent 278057693a1ddb93f95fa641e30e7a966ac98434
>> >> i18n: fix untranslated prompts with translated responses (issue3936)
>> >
>> > Queued for stable, thanks.
>> >
>> > I'd like to see a consensus among translators on how to do this.
>>
>> IMHO we should always accept the English keys, even with translated prompts
>> (to help with muscle memory). And, ideally, detect conflicts at build_mo time.
>>
>> > I also think we should unify all the string args used by prompt() into a
>> > single string so that translators have the full context.
>>
>> Seems like a good approach (as long as we ensure there's no paragraph boundary
>> on that single string, of course).
>
> Ok, here's a typical use:
>
>     elif repo.ui.promptchoice(
>         _("local changed %s which remote deleted\n"
>           "use (c)hanged version or (d)elete?") % f,
>         (_("&Changed"), _("&Delete")), 0):
>
> How shall we format this string to include all the components? Perhaps:
>
>     _("local changed %s which remote deleted\n"
>       "use (c)hanged version or (d)elete?"
>       "\f&Changed"
>       "\f&Delete")
>
> I think something like \f or \b (not \t) will work well as a separator
> but I have no idea if our l10n tools will agree with me.

Gettext will warn when it sees an escape sequence other than \n and \t.
It writes:

  warning: internationalized messages should not contain the `\f' escape
  sequence

This was apparently introduced around 2005:

  http://lists.gnu.org/archive/html/bug-gnu-utils/2005-05/msg00065.html

I found another mail that explains the rationale a bit more: POT files
are supposed to contain just \n and xgettext will apparently turn \r\n
into \n in the POT file:

  http://lists.gnu.org/archive/html/bug-gnu-utils/2010-08/msg00021.html

When I quickly tested this, I found that a string with \f wasn't
translated, despite showing up in the hg.pot file.

Would \t not be an easier choice here or are you afraid that it will be
difficult for translators to produce a TAB character in the translation?
I've only used Emacs where the control characters show up as \t which
makes it easy to edit them.
On Tue, 2013-05-21 at 21:50 +0200, Martin Geisler wrote:
> Matt Mackall <mpm@selenic.com> writes:
>
> > On Mon, 2013-05-20 at 19:51 -0300, Wagner Bruna wrote:
> >> On 16-05-2013 15:58, Matt Mackall wrote:
> >> > On Wed, 2013-05-15 at 21:37 -0500, Kevin Bullock wrote:
> >> >> # HG changeset patch
> >> >> # User Kevin Bullock <kbullock@ringworld.org>
> >> >> # Date 1368671779 18000
> >> >> # Wed May 15 21:36:19 2013 -0500
> >> >> # Branch stable
> >> >> # Node ID fc1c4221dd82de958b9be5f05c57622679625d21
> >> >> # Parent 278057693a1ddb93f95fa641e30e7a966ac98434
> >> >> i18n: fix untranslated prompts with translated responses (issue3936)
> >> >
> >> > Queued for stable, thanks.
> >> >
> >> > I'd like to see a consensus among translators on how to do this.
> >>
> >> IMHO we should always accept the English keys, even with translated prompts
> >> (to help with muscle memory). And, ideally, detect conflicts at build_mo time.
> >>
> >> > I also think we should unify all the string args used by prompt() into a
> >> > single string so that translators have the full context.
> >>
> >> Seems like a good approach (as long as we ensure there's no paragraph boundary
> >> on that single string, of course).
> >
> > Ok, here's a typical use:
> >
> >     elif repo.ui.promptchoice(
> >         _("local changed %s which remote deleted\n"
> >           "use (c)hanged version or (d)elete?") % f,
> >         (_("&Changed"), _("&Delete")), 0):
> >
> > How shall we format this string to include all the components? Perhaps:
> >
> >     _("local changed %s which remote deleted\n"
> >       "use (c)hanged version or (d)elete?"
> >       "\f&Changed"
> >       "\f&Delete")
> >
> > I think something like \f or \b (not \t) will work well as a separator
> > but I have no idea if our l10n tools will agree with me.
>
> Gettext will warn when it sees an escape sequence other than \n and \t.
> It writes:
>
>   warning: internationalized messages should not contain the `\f' escape
>   sequence

Bah.

> This was apparently introduced around 2005:
>
>   http://lists.gnu.org/archive/html/bug-gnu-utils/2005-05/msg00065.html
>
> I found another mail that explains the rationale a bit more: POT files
> are supposed to contain just \n and xgettext will apparently turn \r\n
> into \n in the POT file:
>
>   http://lists.gnu.org/archive/html/bug-gnu-utils/2010-08/msg00021.html
>
> When I quickly tested this, I found that a string with \f wasn't
> translated, despite showing up in the hg.pot file.

Huh.

> Would \t not be an easier choice here or are you afraid that it will be
> difficult for translators to produce a TAB character in the translation?

Mostly, I thought there was a possibility we might actually want to use
\t as a real tab, whereas I'm quite sure we'll never want to use \f.
Frankly, my first thought was \0.

I also don't want to preclude using any of the printing ASCII characters
either.

But now I'm reminded that we have Shift-JIS out there, which means we
can't reliably split translated strings on single ASCII bytes anyway.

Perhaps '$$' as the separator?
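The Shift-JIS concern can be made concrete with a small experiment. This is only an illustrative sketch, assuming Python 3 and its built-in `shift_jis` codec; the sample character and the helper name `decodes_cleanly` are chosen here for the demonstration and do not come from the thread. It shows a two-byte character whose second byte is the ASCII backslash, so a byte-level split on that byte cuts the character in half, while splitting already-decoded text on a multi-character separator leaves it intact.

```python
# U+8868 encodes to the two bytes 0x95 0x5C in Shift-JIS; 0x5C is also the
# ASCII code for a backslash. The sample character is an illustration only.
text = "表"
encoded = text.encode("shift_jis")


def decodes_cleanly(chunk):
    """Return True if the byte chunk is valid Shift-JIS on its own."""
    try:
        chunk.decode("shift_jis")
        return True
    except UnicodeDecodeError:
        return False


# A byte-level split that knows nothing about the encoding treats the
# trail byte as a separator and leaves a dangling lead byte behind.
pieces = encoded.split(b"\\")

# Splitting decoded text on a multi-character separator does not touch
# the character itself.
combined = "prompt?$$&" + text
parts = combined.split("$$")

print(encoded, pieces, [decodes_cleanly(p) for p in pieces], parts)
```

Whether a given separator byte can collide this way depends on the encoding in use, which is why a separator that is checked on decoded text, or one that is longer than a single byte, is the more cautious choice.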
On 21-05-2013 16:50, Martin Geisler wrote:
> Matt Mackall <mpm@selenic.com> writes:
>
>> On Mon, 2013-05-20 at 19:51 -0300, Wagner Bruna wrote:
>>> On 16-05-2013 15:58, Matt Mackall wrote:
>>>> On Wed, 2013-05-15 at 21:37 -0500, Kevin Bullock wrote:
>>>>> # HG changeset patch
>>>>> # User Kevin Bullock <kbullock@ringworld.org>
>>>>> # Date 1368671779 18000
>>>>> # Wed May 15 21:36:19 2013 -0500
>>>>> # Branch stable
>>>>> # Node ID fc1c4221dd82de958b9be5f05c57622679625d21
>>>>> # Parent 278057693a1ddb93f95fa641e30e7a966ac98434
>>>>> i18n: fix untranslated prompts with translated responses (issue3936)
>>>>
>>>> Queued for stable, thanks.
>>>>
>>>> I'd like to see a consensus among translators on how to do this.
>>>
>>> IMHO we should always accept the English keys, even with translated prompts
>>> (to help with muscle memory). And, ideally, detect conflicts at build_mo time.
>>>
>>>> I also think we should unify all the string args used by prompt() into a
>>>> single string so that translators have the full context.
>>>
>>> Seems like a good approach (as long as we ensure there's no paragraph boundary
>>> on that single string, of course).
>>
>> Ok, here's a typical use:
>>
>>     elif repo.ui.promptchoice(
>>         _("local changed %s which remote deleted\n"
>>           "use (c)hanged version or (d)elete?") % f,
>>         (_("&Changed"), _("&Delete")), 0):
>>
>> How shall we format this string to include all the components? Perhaps:
>>
>>     _("local changed %s which remote deleted\n"
>>       "use (c)hanged version or (d)elete?"
>>       "\f&Changed"
>>       "\f&Delete")
>>
>> I think something like \f or \b (not \t) will work well as a separator
>> but I have no idea if our l10n tools will agree with me.
>
> Gettext will warn when it sees an escape sequence other than \n and \t.
> It writes:
>
>   warning: internationalized messages should not contain the `\f' escape
>   sequence
>
> This was apparently introduced around 2005:
>
>   http://lists.gnu.org/archive/html/bug-gnu-utils/2005-05/msg00065.html
>
> I found another mail that explains the rationale a bit more: POT files
> are supposed to contain just \n and xgettext will apparently turn \r\n
> into \n in the POT file:
>
>   http://lists.gnu.org/archive/html/bug-gnu-utils/2010-08/msg00021.html
>
> When I quickly tested this, I found that a string with \f wasn't
> translated, despite showing up in the hg.pot file.

Indeed; the manual recommends avoiding any "unusual markup or control
characters" in to-be-translated messages:

  https://www.gnu.org/software/gettext/manual/html_node/Preparing-Strings.html

Perhaps the safest would be using HTML-like markup, since it's briefly
mentioned in the manual. Or even (ab)using a valid conversion
specification ("%*%", for instance), since most translating software
already checks for matching formatting strings (but we should test the
'&' marks anyway, so that extra check may not matter that much).

> Would \t not be an easier choice here or are you afraid that it will be
> difficult for translators to produce a TAB character in the translation?
> I've only used Emacs where the control characters show up as \t which
> makes it easy to edit them.

FWIW, as another datapoint: my old kbabel here colorizes both '\t' and
'<BR>'. msgfmt (0.18.1) accepts unmatching '\t' and '<BR>', but refuses
unmatching "%s", "%*%", etc.

Regards,
Wagner
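The idea of testing the '&' marks and detecting conflicts at build time could look roughly like the following. This is a hedged sketch, not part of Mercurial's build scripts: the function name `check_prompt_translation`, the separator parameter, and the sample strings are all hypothetical, and the separator is passed in explicitly because the choice of separator is still under discussion in this thread.

```python
def check_prompt_translation(msgid, msgstr, sep):
    """Return a list of problems found in a translated prompt string.

    Both strings are expected to hold the prompt text followed by the
    choices, each introduced by `sep` and carrying one '&' key marker.
    """
    problems = []
    src_choices = msgid.split(sep)[1:]
    dst_choices = msgstr.split(sep)[1:]

    # The translation must offer exactly as many choices as the original.
    if len(src_choices) != len(dst_choices):
        problems.append("expected %d choices, found %d"
                        % (len(src_choices), len(dst_choices)))

    keys = []
    for choice in dst_choices:
        pos = choice.find("&")
        if pos == -1 or pos + 1 >= len(choice):
            problems.append("choice %r has no '&' key marker" % choice)
            continue
        keys.append(choice[pos + 1].lower())

    # Two choices sharing a response key would make the prompt ambiguous.
    for key in sorted(set(k for k in keys if keys.count(k) > 1)):
        problems.append("key %r is used by more than one choice" % key)
    return problems


source = "use (c)hanged version or (d)elete?$$&Changed$$&Delete"
conflicting = "made-up translation?$$&Delete$$&Descartar"
print(check_prompt_translation(source, conflicting, "$$"))
```

A check of this kind would run once per message when the catalogs are compiled, so a conflicting translation is reported to the translator rather than surfacing as a confusing prompt at run time.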
Matt Mackall <mpm@selenic.com> writes:

> On Tue, 2013-05-21 at 21:50 +0200, Martin Geisler wrote:
>> Matt Mackall <mpm@selenic.com> writes:
>>
>> > Ok, here's a typical use:
>> >
>> >     elif repo.ui.promptchoice(
>> >         _("local changed %s which remote deleted\n"
>> >           "use (c)hanged version or (d)elete?") % f,
>> >         (_("&Changed"), _("&Delete")), 0):
>> >
>> > How shall we format this string to include all the components? Perhaps:
>> >
>> >     _("local changed %s which remote deleted\n"
>> >       "use (c)hanged version or (d)elete?"
>> >       "\f&Changed"
>> >       "\f&Delete")
>> >
>> > I think something like \f or \b (not \t) will work well as a separator
>> > but I have no idea if our l10n tools will agree with me.
>>
>> Gettext will warn when it sees an escape sequence other than \n and \t.
>> It writes:
>>
>>   warning: internationalized messages should not contain the `\f' escape
>>   sequence
>
> Bah.

Yeah, I don't get it either :)

>>   http://lists.gnu.org/archive/html/bug-gnu-utils/2010-08/msg00021.html
>>
>> When I quickly tested this, I found that a string with \f wasn't
>> translated, despite showing up in the hg.pot file.
>
> Huh.
>
>> Would \t not be an easier choice here or are you afraid that it will
>> be difficult for translators to produce a TAB character in the
>> translation?
>
> Mostly, I thought there was a possibility we might actually want to
> use \t as a real tab, whereas I'm quite sure we'll never want to use
> \f. Frankly, my first thought was \0.

I see, makes sense.

> I also don't want to preclude using any of the printing ASCII
> characters either.
>
> But now I'm reminded that we have Shift-JIS out there, which means we
> can't reliably split translated strings on single ASCII bytes anyway.
> Perhaps '$$' as the separator?

Sounds good to me!
At Tue, 21 May 2013 15:17:55 -0500,
Matt Mackall wrote:
>
> On Tue, 2013-05-21 at 21:50 +0200, Martin Geisler wrote:
> > Matt Mackall <mpm@selenic.com> writes:
> >
> > > On Mon, 2013-05-20 at 19:51 -0300, Wagner Bruna wrote:
> > >> On 16-05-2013 15:58, Matt Mackall wrote:
> > >> > On Wed, 2013-05-15 at 21:37 -0500, Kevin Bullock wrote:
> > >> >> # HG changeset patch
> > >> >> # User Kevin Bullock <kbullock@ringworld.org>
> > >> >> # Date 1368671779 18000
> > >> >> # Wed May 15 21:36:19 2013 -0500
> > >> >> # Branch stable
> > >> >> # Node ID fc1c4221dd82de958b9be5f05c57622679625d21
> > >> >> # Parent 278057693a1ddb93f95fa641e30e7a966ac98434
> > >> >> i18n: fix untranslated prompts with translated responses (issue3936)
> > >> >
> > >> > Queued for stable, thanks.
> > >> >
> > >> > I'd like to see a consensus among translators on how to do this.
> > >>
> > >> IMHO we should always accept the English keys, even with translated prompts
> > >> (to help with muscle memory). And, ideally, detect conflicts at build_mo time.
> > >>
> > >> > I also think we should unify all the string args used by prompt() into a
> > >> > single string so that translators have the full context.
> > >>
> > >> Seems like a good approach (as long as we ensure there's no paragraph boundary
> > >> on that single string, of course).
> > >
> > > Ok, here's a typical use:
> > >
> > >     elif repo.ui.promptchoice(
> > >         _("local changed %s which remote deleted\n"
> > >           "use (c)hanged version or (d)elete?") % f,
> > >         (_("&Changed"), _("&Delete")), 0):
> > >
> > > How shall we format this string to include all the components? Perhaps:
> > >
> > >     _("local changed %s which remote deleted\n"
> > >       "use (c)hanged version or (d)elete?"
> > >       "\f&Changed"
> > >       "\f&Delete")
> > >
> > > I think something like \f or \b (not \t) will work well as a separator
> > > but I have no idea if our l10n tools will agree with me.
> >
> > Gettext will warn when it sees an escape sequence other than \n and \t.
> > It writes:
> >
> >   warning: internationalized messages should not contain the `\f' escape
> >   sequence
>
> Bah.
>
> > This was apparently introduced around 2005:
> >
> >   http://lists.gnu.org/archive/html/bug-gnu-utils/2005-05/msg00065.html
> >
> > I found another mail that explains the rationale a bit more: POT files
> > are supposed to contain just \n and xgettext will apparently turn \r\n
> > into \n in the POT file:
> >
> >   http://lists.gnu.org/archive/html/bug-gnu-utils/2010-08/msg00021.html
> >
> > When I quickly tested this, I found that a string with \f wasn't
> > translated, despite showing up in the hg.pot file.
>
> Huh.
>
> > Would \t not be an easier choice here or are you afraid that it will be
> > difficult for translators to produce a TAB character in the translation?
>
> Mostly, I thought there was a possibility we might actually want to use
> \t as a real tab, whereas I'm quite sure we'll never want to use \f.
> Frankly, my first thought was \0.
>
> I also don't want to preclude using any of the printing ASCII characters
> either.
>
> But now I'm reminded that we have Shift-JIS out there, which means we
> can't reliably split translated strings on single ASCII bytes anyway.

In fact, in the message file for Japanese, the strings for choices are
intentionally left untranslated, because switching keyboard layouts to
answer a question is not friendly for users, as mentioned by Nikolaj in
his reply.

So, splitting on single ASCII bytes will work with Shift-JIS, too.

OK, I understand that you use "Shift-JIS" as a stand-in for encodings
that use normal ASCII bytes (maybe backslash, too) as the 2nd or later
byte of multi-byte characters :-)

> Perhaps '$$' as the separator?

Do you mean a translatable message like the one below?

    _("local changed %s which remote deleted\n"
      "use (c)hanged version or (d)elete?"
      "$$&Changed $$&Delete")

Picking up the single byte following '&', as below, may cause problems
with strings in MBCS, because it breaks MBCS byte sequences.

    resps = [s[s.index('&') + 1].lower() for s in choices]

So, IMHO, a way to delimit "the symbolic letter" is also needed. What
about surrounding it with '&'?

    _("local changed %s which remote deleted\n"
      "use (c)hanged version or (d)elete?"
      "$$&C&hanged $$&D&elete")

----------------------------------------------------------------------
[FUJIWARA Katsunori] foozy@lares.dti.ne.jp
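The byte-sequence problem described above can be reproduced in a few lines. This is a sketch under stated assumptions: it uses Python 3 `bytes` to stand in for the byte strings the quoted list comprehension operates on, the translated choice is made up, and both helper names are hypothetical. The first helper mirrors "take one byte after '&'", the second follows the proposal of surrounding the key with a pair of '&' marks.

```python
def byte_after_marker(choice):
    """Take exactly one byte after the first '&' (the approach quoted above)."""
    pos = choice.index(b"&")
    return choice[pos + 1:pos + 2]


def key_between_markers(choice):
    """Take everything between the first two '&' marks (the '&C&' proposal)."""
    start = choice.index(b"&") + 1
    end = choice.index(b"&", start)
    return choice[start:end]


# Made-up translated choices; the first character needs two bytes in Shift-JIS.
single_marked = "&表示".encode("shift_jis")
double_marked = "&表&示".encode("shift_jis")

half = byte_after_marker(single_marked)    # only the lead byte of the character
whole = key_between_markers(double_marked)  # both bytes of the character

print(half, whole, whole.decode("shift_jis"))
```

With paired markers the key can be any number of bytes long, at the cost of making every choice string slightly noisier for translators who stay within ASCII.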
On Thu, 2013-05-23 at 01:56 +0900, FUJIWARA Katsunori wrote:
> At Tue, 21 May 2013 15:17:55 -0500,
> Matt Mackall wrote:
> >
> > On Tue, 2013-05-21 at 21:50 +0200, Martin Geisler wrote:
> > > Matt Mackall <mpm@selenic.com> writes:
> > >
> > > > On Mon, 2013-05-20 at 19:51 -0300, Wagner Bruna wrote:
> > > >> On 16-05-2013 15:58, Matt Mackall wrote:
> > > >> > On Wed, 2013-05-15 at 21:37 -0500, Kevin Bullock wrote:
> > > >> >> # HG changeset patch
> > > >> >> # User Kevin Bullock <kbullock@ringworld.org>
> > > >> >> # Date 1368671779 18000
> > > >> >> # Wed May 15 21:36:19 2013 -0500
> > > >> >> # Branch stable
> > > >> >> # Node ID fc1c4221dd82de958b9be5f05c57622679625d21
> > > >> >> # Parent 278057693a1ddb93f95fa641e30e7a966ac98434
> > > >> >> i18n: fix untranslated prompts with translated responses (issue3936)
> > > >> >
> > > >> > Queued for stable, thanks.
> > > >> >
> > > >> > I'd like to see a consensus among translators on how to do this.
> > > >>
> > > >> IMHO we should always accept the English keys, even with translated prompts
> > > >> (to help with muscle memory). And, ideally, detect conflicts at build_mo time.
> > > >>
> > > >> > I also think we should unify all the string args used by prompt() into a
> > > >> > single string so that translators have the full context.
> > > >>
> > > >> Seems like a good approach (as long as we ensure there's no paragraph boundary
> > > >> on that single string, of course).
> > > >
> > > > Ok, here's a typical use:
> > > >
> > > >     elif repo.ui.promptchoice(
> > > >         _("local changed %s which remote deleted\n"
> > > >           "use (c)hanged version or (d)elete?") % f,
> > > >         (_("&Changed"), _("&Delete")), 0):
> > > >
> > > > How shall we format this string to include all the components? Perhaps:
> > > >
> > > >     _("local changed %s which remote deleted\n"
> > > >       "use (c)hanged version or (d)elete?"
> > > >       "\f&Changed"
> > > >       "\f&Delete")
> > > >
> > > > I think something like \f or \b (not \t) will work well as a separator
> > > > but I have no idea if our l10n tools will agree with me.
> > >
> > > Gettext will warn when it sees an escape sequence other than \n and \t.
> > > It writes:
> > >
> > >   warning: internationalized messages should not contain the `\f' escape
> > >   sequence
> >
> > Bah.
> >
> > > This was apparently introduced around 2005:
> > >
> > >   http://lists.gnu.org/archive/html/bug-gnu-utils/2005-05/msg00065.html
> > >
> > > I found another mail that explains the rationale a bit more: POT files
> > > are supposed to contain just \n and xgettext will apparently turn \r\n
> > > into \n in the POT file:
> > >
> > >   http://lists.gnu.org/archive/html/bug-gnu-utils/2010-08/msg00021.html
> > >
> > > When I quickly tested this, I found that a string with \f wasn't
> > > translated, despite showing up in the hg.pot file.
> >
> > Huh.
> >
> > > Would \t not be an easier choice here or are you afraid that it will be
> > > difficult for translators to produce a TAB character in the translation?
> >
> > Mostly, I thought there was a possibility we might actually want to use
> > \t as a real tab, whereas I'm quite sure we'll never want to use \f.
> > Frankly, my first thought was \0.
> >
> > I also don't want to preclude using any of the printing ASCII characters
> > either.
> >
> > But now I'm reminded that we have Shift-JIS out there, which means we
> > can't reliably split translated strings on single ASCII bytes anyway.
>
> In fact, in the message file for Japanese, the strings for choices are
> intentionally left untranslated, because switching keyboard layouts to
> answer a question is not friendly for users, as mentioned by Nikolaj in
> his reply.
>
> So, splitting on single ASCII bytes will work with Shift-JIS, too.
>
> OK, I understand that you use "Shift-JIS" as a stand-in for encodings
> that use normal ASCII bytes (maybe backslash, too) as the 2nd or later
> byte of multi-byte characters :-)
>
> > Perhaps '$$' as the separator?
>
> Do you mean a translatable message like the one below?
>
>     _("local changed %s which remote deleted\n"
>       "use (c)hanged version or (d)elete?"
>       "$$&Changed $$&Delete")

Yep. I've pushed a patch based on this.

> Picking up the single byte following '&', as below, may cause problems
> with strings in MBCS, because it breaks MBCS byte sequences.
>
>     resps = [s[s.index('&') + 1].lower() for s in choices]
>
> So, IMHO, a way to delimit "the symbolic letter" is also needed. What
> about surrounding it with '&'?
>
>     _("local changed %s which remote deleted\n"
>       "use (c)hanged version or (d)elete?"
>       "$$&C&hanged $$&D&elete")

I'm tempted to worry about this part of the problem when we encounter it.
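For readers who want to see what a '$$'-separated prompt implies for the consuming code, here is a minimal sketch. It is not the patch that was pushed, whose contents are not shown on this page; the function name `split_prompt` and the file name in the example are hypothetical, and the key extraction deliberately keeps the simple "one character after '&'" rule discussed above.

```python
def split_prompt(prompt, sep="$$"):
    """Split a combined prompt into (message, choices, response keys)."""
    parts = prompt.split(sep)
    msg = parts[0]
    # Choices may carry stray whitespace, as in "$$&Changed $$&Delete".
    choices = [part.strip() for part in parts[1:]]
    # The response key is the character right after the '&' marker.
    keys = [choice[choice.index("&") + 1].lower() for choice in choices]
    return msg, choices, keys


prompt = ("local changed %s which remote deleted\n"
          "use (c)hanged version or (d)elete?"
          "$$&Changed $$&Delete") % "foo.txt"

msg, choices, keys = split_prompt(prompt)
print(msg)
print(choices, keys)
```

Because the message and its choices travel as one string, a translator sees the question and the answers together, which is the context argument made earlier in the thread.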
Patch
diff -r 0ec31231afad mercurial/merge.py
--- a/mercurial/merge.py Fri May 17 17:22:08 2013 -0500
+++ b/mercurial/merge.py Tue May 21 10:37:41 2013 -0500
@@ -365,8 +365,8 @@
                 actions.append((f, "r", None, "remote delete"))
             elif repo.ui.promptchoice(
                 _("local changed %s which remote deleted\n"
-                  "use (c)hanged version or (d)elete?") % f,
-                (_("&Changed"), _("&Delete")), 0):
+                  "use (c)hanged version or (d)elete?"
+                  "\f&Changed\f&Deleted") % f, 0):
                 actions.append((f, "r", None, "prompt delete"))
             else:
                 actions.append((f, "a", None, "prompt keep"))
@@ -375,8 +375,8 @@
                 actions.append((f, "g", (m2.flags(f),), "remote recreating"))
             elif repo.ui.promptchoice(
                 _("remote changed %s which local deleted\n"
-                  "use (c)hanged version or leave (d)eleted?") % f,
-                (_("&Changed"), _("&Deleted")), 0) == 0:
+                  "use (c)hanged version or leave (d)eleted?",
+                  "\f&Changed\f&Deleted") % f, 0) == 0:
                 actions.append((f, "g", (m2.flags(f),), "prompt recreating"))
             else: assert False, m
     return actions
diff -r 0ec31231afad mercurial/ui.py
--- a/mercurial/ui.py Fri May 17 17:22:08 2013 -0500
+++ b/mercurial/ui.py Tue May 21 10:37:41 2013 -0500
@@ -639,13 +639,16 @@
         except EOFError:
             raise util.Abort(_('response expected'))
 
-    def promptchoice(self, msg, choices, default=0):
+    def promptchoice(self, msg, choices=None, default=0):
         """Prompt user with msg, read response, and ensure it matches
         one of the provided choices. The index of the choice is returned.
         choices is a sequence of acceptable responses with the format:
         ('&None', 'E&xec', 'Sym&link') Responses are case insensitive.
         If ui is not interactive, the default is returned.
         """
+        parts = msg.split('\f')
+        if len(parts):
+            msg, choices, default = parts[0], parts[1:], choices
         resps = [s[s.index('&') + 1].lower() for s in choices]
         while True:
             r = self.prompt(msg, resps[default])
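One detail in the ui.py hunk is worth checking by hand: `str.split` always returns at least one element, so the `if len(parts):` guard is true even for callers that pass a plain message plus a separate `choices` sequence, and their choices would be replaced by an empty list. The following is only a sketch of how the unpacking step might distinguish the two calling conventions; the helper name `unpack_prompt` is hypothetical and this is not the code that was eventually committed.

```python
def unpack_prompt(msg, choices=None, default=0, sep="\f"):
    """Return (msg, choices, default) for both old- and new-style calls."""
    parts = msg.split(sep)
    # Only treat the message as combined when a separator was actually found.
    if len(parts) > 1:
        # In the combined form the second positional argument, if given,
        # carries the default index rather than the choices.
        if choices is not None:
            default = choices
        msg, choices = parts[0], parts[1:]
    return msg, choices, default


# A message without the separator still yields one part.
print("plain question?".split("\f"))

# Old-style call: separate choices are left untouched.
print(unpack_prompt("keep it?", ("&Yes", "&No"), 1))

# New-style call: choices come from the message, 1 is the default index.
print(unpack_prompt("keep it?\f&Yes\f&No", 1))
```

The same reasoning applies to any separator, so the guard would need the same care if '$$' were used in place of '\f'.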