From: Drew Adams <drew.adams@oracle.com>
To: Jean-Christophe Helary <jean.christophe.helary@gmail.com>,
	Emacs development discussions <emacs-devel@gnu.org>
Subject: RE: format use inquiry
Date: Sat, 1 Jul 2017 08:43:51 -0700 (PDT)
Message-ID: <cf243644-4d4a-4996-93bb-2fdb4ce87a1a@default>
In-Reply-To: <2CC7167F-53F8-4676-BA51-4C68F25108BC@gmail.com>

> > As for the question of messages that use singular
> > vs plural forms, I'd again point to Common Lisp's
> > `format', which addresses that kind of thing (at
> > least for English).
          ^^^^^^^^^^^
> That -P flag for CL format contributes to producing
> just as ugly code as what we have here.

Code using `format' is ugly - sure.  But it sure is
convenient too.  It's a bit like using Lisp `loop' or
using `find' in UNIX, GNU, etc.  Or using regexps.

For better _and_ worse, `format' is practically a
language unto itself - especially the more complex
`format' of Common Lisp.  Each of `loop', `find', and
regexps is a language, and in each case the result of
using it can be compact but esoteric code.

If you prefer to use conditional code with `concat'
etc. instead of `format', feel free.  I don't see why
existing code that uses `format' should be replaced
with code that does similar conditional processing
that is more explicit and verbose.
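
To make the trade-off concrete, here is a minimal Emacs Lisp
sketch of the explicit, conditional style.  The function name
`my-tries-message' is hypothetical and not part of Emacs; the
sketch only shows what spelling out the singular/plural choice
by hand looks like.

```elisp
;; Hypothetical helper, not part of Emacs: build "N try"/"N tries"
;; with an explicit conditional instead of a CL-style ~P directive.
(defun my-tries-message (n)
  "Return a message such as \"1 try\" or \"7 tries\" for N."
  (concat (number-to-string n)
          " "
          (if (= n 1) "try" "tries")))

(my-tries-message 1) ; "1 try"
(my-tries-message 7) ; "7 tries"
```

The conditional is explicit and easy to read, at the cost of
being longer than a single `format' control string.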

> Code should never be used to create natural language
> strings with syntactic expectation.

In that case, a powerful alternative should be developed
and proposed.  It's clear that `format' and company are
_not_ the right tools for natural-language processing,
including for translation help.  They were not designed
for that.  There is nothing in Emacs Lisp that provides
such a tool, AFAIK.

If we start tearing apart existing Lisp code because it
handles messages, menus, titles, doc, etc. in a way that
does not facilitate natural-language treatment (including
localization), that could just create a mess.  Let simple
sleeping dogs lie.

Instead, someone interested in that aspect of things
would do well to work on creating powerful Emacs-Lisp
constructs that really _do_ facilitate natural-language
treatment.  IOW, try to come up with something that is
a reasonable alternative to the rudimentary constructs
that we use today.

That's probably a hard thing to do, but if that's where
someone's interest is, it could be a worthwhile endeavor.

And with decades of natural language processing and
localization research behind us now, perhaps there is
some existing code out there (perhaps not Emacs Lisp)
that could serve as inspiration, if not as a direct
model.

I don't think anyone would argue that `format' has
what it takes to help with handling multiple languages
cleanly.  I certainly wouldn't.  But it doesn't follow
that we should now try to recode existing uses of it
to recompose text into full words, in the interest of
some potential future localization.

Better would be to start localizing, and make whatever
changes are truly helpful immediately, here or there,
as needed.  IOW, demand-driven change, rather than eager,
proactive change in ways that someone thinks might help.

> Here is the example given in "Common Lisp the Language,
> 2nd Edition"
> (format nil "~D tr~:@P/~D win~:P" 7 1) => "7 tries/1 win"
> (format nil "~D tr~:@P/~D win~:P" 1 0) => "1 try/0 wins"
> (format nil "~D tr~:@P/~D win~:P" 1 3) => "1 try/3 wins"
> 
> How do you think that kind of strings can possibly be localized ?

I said: "at least for English".  I was not proposing
`format', Common Lisp or otherwise, as a tool or
solution for localization.

But if some such code really did need to be localized
then the answer would presumably be to evaluate it, as
you have done, to see the effect/result, and then code
appropriately to produce a corresponding effect/result
for other languages.

Any way you look at it, in such a situation, regardless
of how the first-language treatment is coded, you would
presumably need to translate each of the possible outputs.

As long as there is no such built-in `format' handling
for other languages, their handling would need to be
done using some condition tests (if-then-else etc.).

That part would presumably follow what you would prefer.
But why should it be used also for the English part, if
a simpler expression is available for English?
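
As a rough illustration of such condition tests, here is a
sketch of a French counterpart to the "N tries/M wins" message.
The function name and the French wording are assumptions made
for the example, and it assumes the usual French rule that 0
and 1 take the singular.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes the usual
;; French rule: 0 and 1 take the singular, 2 and up the plural.
(defun my-essais-victoires (tries wins)
  "Return a French counterpart of the \"N tries/M wins\" message."
  (format "%d %s/%d %s"
          tries (if (< tries 2) "essai" "essais")
          wins (if (< wins 2) "victoire" "victoires")))

(my-essais-victoires 7 1) ; "7 essais/1 victoire"
(my-essais-victoires 1 0) ; "1 essai/0 victoire"
(my-essais-victoires 1 3) ; "1 essai/3 victoires"
```

Note that the zero case differs from English ("0 wins" but
"0 victoire"), which is why each possible output has to be
considered per language.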

And clearly such brute manipulation of strings for
natural language is a sledgehammer, whether it uses
`format' or explicit, verbose conditional code.  As you
point out, this is not the way to do natural-language
processing.

IOW, it is not `format' or its use that is the problem.
We don't have natural-language processing constructs
in Emacs Lisp.

In the absence of general natural-language help for
Emacs Lisp, a better short-term treatment might be for
someone to extend `format' (or similar) so that it can
handle this or that other language - IOW, to give
French, etc. a similar advantage wrt composing messages
that handle both singular and plural etc.

No, that would not be the right tool for localization
or general natural-language processing either.  But it
would at least give French etc. the same advantage of
compactness for formatting simple messages.
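
One way such an extension might look, purely as a sketch: a
small helper that picks the singular or plural form from a
per-language rule.  Nothing here exists in Emacs; the names
`my-plural-rules' and `my-format-count' are hypothetical, and
the two-form (singular/plural) model is a simplifying
assumption that does not hold for every language.

```elisp
;; Hypothetical sketch; none of these names exist in Emacs.
(defvar my-plural-rules
  (list (cons 'en (lambda (n) (/= n 1)))  ; English: plural unless 1
        (cons 'fr (lambda (n) (> n 1))))  ; French: plural above 1
  "Alist mapping a language symbol to a plural predicate.
Each predicate takes a count and returns non-nil for plural.")

(defun my-format-count (lang n singular plural)
  "Return \"N SINGULAR\" or \"N PLURAL\" according to LANG's rule."
  (let ((pluralp (cdr (assq lang my-plural-rules))))
    (unless pluralp
      (error "No plural rule for language: %s" lang))
    (format "%d %s" n (if (funcall pluralp n) plural singular))))

(my-format-count 'en 0 "win" "wins")            ; "0 wins"
(my-format-count 'fr 0 "victoire" "victoires")  ; "0 victoire"
```

This keeps simple messages compact for each supported language,
but it is still string composition, not natural-language
processing.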

[One reason that something like `format', instead of
some more systematic, language-oriented construct,
has been used for English messages could be that
English is so irregular.  (1) `format' can be used,
simply.  (2) It's not easy to come up with something
more systematic, for English.  `format' does the simple
job, and it's hard to do a more systematic job.

I expect that it would be simpler to handle
plurality, etc. in a more regular language, such as
French.  In English, there are very few useful rules
to take advantage of, so the treatment by CL `format'
is both (a) rudimentary and (b) about as good as it
can get while remaining simple.]

Anyway, again, I was not proposing Common-Lisp
`format' for any purpose of localization.  I'm a fan
of it for English, for simple things, and I think
Emacs might benefit from it or something similar.

My mention of it was essentially off-topic for your
thread about localization.  (But it could still be
somewhat on-topic for a thread about `format'.)


