Re: per-buffer language environments

From: Eli Zaretskii <eliz@gnu.org>
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Cc: handa@m17n.org, emacs-devel@gnu.org
Subject: Re: per-buffer language environments
Date: Wed, 15 Dec 2010 01:47:46 -0500
Message-ID: <E1PSl9O-0001wu-GB@fencepost.gnu.org>
In-Reply-To: <87hbefr63n.fsf@uwakimon.sk.tsukuba.ac.jp> (stephen@xemacs.org)

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: emacs-devel@gnu.org,
>     handa@m17n.org
> Date: Wed, 15 Dec 2010 13:51:40 +0900
> 
>  > Those are all valid concerns, but they are just the tip of an
>  > iceberg.
> 
> No, they *are* the iceberg, at least as far as the autopilot is
> concerned.  After that, you *must* ask the user.

As long as we agree that there _is_ an iceberg, I won't argue.

>  > There's an almost infinite number of combinations of a language and
>  > the preferred encoding
> 
> Sure, but given a language and the set of encoding features Emacs
> knows how to detect *when reading from a stream*, there remains
> substantial ambiguity.

The emphasis on *reading* takes what I originally wrote out of its
context.  I didn't comment on reading alone; I commented on the entire
issue of coding-systems being tied to the language:

> > * Which coding system to use on writing when the current
> >   buffer contains a character that can't be encoded by
> >   buffer-file-coding-system.
> > 
> > * Which coding systems have higher priority when inserting a
> >   file in the current buffer.
> 
> I could understand how the font selection and the default input method
> are related to the language, but what do encodings have to do with
> that?  The preferred encoding is generally an attribute of a locale,
> not of a language.

If the ambiguity you are talking about is that there are more settings
than just for reading, then I was originally talking about those, too.
If the ambiguity is about something else, please tell me what that is.

> All problems with the language environment that I know
> of stem from its global nature applying to all buffers and the
> application itself, not from appropriate use in a given buffer.

I agree that it would be useful to have a language as a per-buffer
setting.  This discussion is about what that should include.

> IOW, it's just the defects of the POSIX_ME_HARDER locale mirrored
> into Emacs itself.

I also stated quite clearly (I think) that we should distinguish
between the locale and the language, as far as their effects on Emacs
are concerned.

>  > , and it's impossible to fold them all, or even their significant
>  > fraction,
> 
> Of course a significant fraction is possible.  That's precisely what
> the priority lists have been achieving since the early 1990s.

Evidently, your examples try to show that the fraction is not
significant enough.

> If your complaint is that we should do better, "patches welcome" is
> the only thing I can think of to say.

No, I'm saying we shouldn't try to do better _automatically_.  Users
have enough facilities to affect the defaults according to their
specific use-cases.

>  > in a reasonably usable user-level interface.  We shouldn't even
>  > try, IMO; we already have prefer-coding-system
> 
> Huh?  prefer-coding-system has two effects: it promotes a certain
> coding-system to highest priority in its category, and it promotes
> that category to highest priority in case of ambiguity.  IOW, it's a
> user override of the priority setting that comes from the language
> environment.

Exactly my point: the user can override the automated selections if
she needs to.  So the current automation doesn't need to do better.
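
A minimal sketch of such a user-level override, assuming a reasonably
recent Emacs; the choice of utf-8 is only an example, and note that
prefer-coding-system also changes some global defaults (such as the
default coding system for new files) in addition to the detection
priority:

```elisp
;; Illustrative only: promote utf-8 to the top of the priority list
;; that Emacs consults when it has to guess the encoding of text it
;; reads.  utf-8 here is an example choice, not a recommendation.
(prefer-coding-system 'utf-8)

;; Inspect the resulting order.  `coding-system-priority-list'
;; returns the coding systems sorted by their current priority,
;; so the first element is the one tried first in case of ambiguity.
(car (coding-system-priority-list))
```

Evaluating the second form before and after the first one shows what
the override changed relative to the defaults that came from the
language environment.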

> A completely different purpose (handling exceptions)
> from the language environment itself (handling the unmarked case).

Except that set-language-environment calls prefer-coding-system under
the hood to do most of its job...

> Are you sure you have any idea what you're talking about?

I think I do.  I'm not sure we are talking about the same thing,
though.

> That's an honest question; the way you are going, I have to wonder.

Knowing me for as long as you do, I wonder how such a question can be
honest.  But I digress.

> If you say "yes", I'll trust you, but I'd appreciate an explanation
> of what you're talking about that refers to real bugs in the current
> system, rather than general features that offend your sense of
> design.

I wasn't talking about any bugs at all.  Werner suggested adding a new
_feature_; I was talking about what that feature should and shouldn't
include.

> [coding priority settings] are to remove ambiguities like "we have
> EUC, but which one?" and "we have Windows-125x, but which one?" and
> "since ISO-8859-1 allows all 256 bytes, if we want to give priority
> to Chinese or Japanese, that had better come late in the list!"

I don't think I said anything to the contrary.  I would add, though,
that the priority settings also deal with "we have some encoding that
uses 8-bit bytes, but which encoding is that?"

>  > > AFAIK Emacsen use the locale as a heuristic for determining the
>  > > language environment
>  > 
>  > There's no heuristic involved, AFAIR.  Emacs has a database of
>  > languages _and_encodings_ suitable for the known locale names.
> 
> You're confusing "algorithmic" with "non-heuristic".

Please take a look at the database.  I stand by what I wrote: there's
no heuristic anywhere in sight.
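
One way to look at that database from Lisp, as a rough sketch: the
variables locale-language-names and locale-preferred-coding-systems
and the helper locale-name-match live in mule-cmds.el in the Emacs
versions I am aware of, but their exact contents and the shape of the
returned values vary between versions, and the locale string below is
just an illustrative input:

```elisp
;; Illustrative only: look up a locale name in the tables that
;; set-locale-environment consults.  "ja_JP.eucJP" is an example
;; input; substitute the locale you are interested in.
(let ((locale "ja_JP.eucJP"))
  (list
   ;; Language environment (possibly with a coding system) recorded
   ;; for the first regexp that matches the start of LOCALE.
   (locale-name-match locale locale-language-names)
   ;; Preferred coding system recorded for LOCALE, if any.
   (locale-name-match locale locale-preferred-coding-systems)))
```

Both lookups are plain first-match searches over alists of regexps,
with no guessing based on buffer contents.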

> And of course in this case, locale is a heuristic.  *Emacs is a
> multilingual* (well, technically, multiscript) *application*, and any
> setting of the language environment that doesn't take into account the
> current text we're working with is surely heuristic.

If so, it's a heuristic that is external to Emacs.  Emacs just abides
by it, because users expect that.  Anyway, this aspect is entirely
unrelated to the issue at hand.

>  > set-locale-environment uses that database to get the language and the
>  > preferred encoding(s), then calls set-language-environment with the
>  > language, and sets the priorities of the encodings according to the
>  > encoding preferences.
> 
> That's an unnecessary API, ISTM.  (set-language-environment nil)
> should do that.

So we basically agree: the (not entirely complete) equivalence between
these 2 APIs is not TRT and it should go away.  We may disagree about
which API should be dropped and which one retained, but that's just a
naming issue (and maybe a consequence of the fact that you didn't know
about set-locale-environment before).

But this is not the main issue I wanted to discuss.  The main issue is
what constitutes a "language environment" as far as Emacs is
concerned, after we factor out the effects of the locale?  If we are
going to implement per-buffer language environments, we need to decide
that first and foremost.

Perhaps a useful starting point would be to ask: what exactly is a
"language name" string?  Should it specify only a language, or should
it also try to specify the preferred encodings?

> the POSIX_ME_HARDER locale is an abomination in a multilingual
> application and should be buried as deeply as we can manage.  It is,
> of course, a useful heuristic for the user's preferred language
> environment for *scratch*, but that's about as far as we can take that.

I'm not sure it's as black and white as you make it sound.  For
example, users of the same language on GNU/Linux and on MS-Windows
might very well disagree wrt the preferred encodings.  So some
aspects of the locale still affect language-specific choices.  But
again, I think talking about the locale just muddies the waters in
this discussion.
