From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: per-buffer language environments
Date: Wed, 15 Dec 2010 01:47:46 -0500
Message-ID: <E1PSl9O-0001wu-GB@fencepost.gnu.org>
References: <20101211.162503.37993912.wl@gnu.org> <83sjy4t9s0.fsf@gnu.org>
	<20101212.072550.527160732.wl@gnu.org> <tl7oc8q14yv.fsf@m17n.org>
	<838vztj3n0.fsf@gnu.org> <87y67sr3cs.fsf@uwakimon.sk.tsukuba.ac.jp>
	<E1PSWaX-0004Uq-Dj@fencepost.gnu.org>
	<87hbefr63n.fsf@uwakimon.sk.tsukuba.ac.jp>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: lo.gmane.org
X-Trace: dough.gmane.org 1292395690 22972 80.91.229.12 (15 Dec 2010 06:48:10 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Wed, 15 Dec 2010 06:48:10 +0000 (UTC)
Cc: handa@m17n.org, emacs-devel@gnu.org
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Dec 15 07:48:04 2010
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1PSl9d-0002ee-4x
	for ged-emacs-devel@m.gmane.org; Wed, 15 Dec 2010 07:48:01 +0100
Original-Received: from localhost ([127.0.0.1]:44792 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1PSl9b-00017w-P6
	for ged-emacs-devel@m.gmane.org; Wed, 15 Dec 2010 01:47:59 -0500
Original-Received: from [140.186.70.92] (port=49874 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PSl9V-00017q-9r
	for emacs-devel@gnu.org; Wed, 15 Dec 2010 01:47:54 -0500
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1PSl9T-0002IO-QP
	for emacs-devel@gnu.org; Wed, 15 Dec 2010 01:47:53 -0500
Original-Received: from fencepost.gnu.org ([140.186.70.10]:52860)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1PSl9T-0002IK-NY
	for emacs-devel@gnu.org; Wed, 15 Dec 2010 01:47:51 -0500
Original-Received: from eliz by fencepost.gnu.org with local (Exim 4.69)
	(envelope-from <eliz@gnu.org>)
	id 1PSl9O-0001wu-GB; Wed, 15 Dec 2010 01:47:46 -0500
In-reply-to: <87hbefr63n.fsf@uwakimon.sk.tsukuba.ac.jp> (stephen@xemacs.org)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:133709
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/133709>

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: emacs-devel@gnu.org,
>     handa@m17n.org
> Date: Wed, 15 Dec 2010 13:51:40 +0900
> 
>  > Those are all valid concerns, but they are just the tip of an
>  > iceberg.
> 
> No, they *are* the iceberg, at least as far as the autopilot is
> concerned.  After that, you *must* ask the user.

As long as we agree that there _is_ an iceberg, I won't argue.

>  > There's an almost infinite number of combinations of a language and
>  > the preferred encoding
> 
> Sure, but given a language and the set of encoding features Emacs
> knows how to detect *when reading from a stream*, there remains
> substantial ambiguity.

The emphasis on *reading* takes what I originally wrote out of
context.  I didn't comment on reading alone; I commented on the entire
issue of coding systems being tied to the language:

> > * Which coding system to use on writing when the current
> >   buffer contains a character that can't be encoded by
> >   buffer-file-coding-system.
> > 
> > * Which coding systems have higher priority when inserting a
> >   file in the current buffer.
> 
> I could understand how the font selection and the default input method
> are related to the language, but what do encodings have to do with
> that?  The preferred encoding is generally an attribute of a locale,
> not of a language.

If the ambiguity you are talking about is that there are more settings
than just those for reading, then I was originally talking about those,
too.  If the ambiguity is about something else, please say what it is.

> All problems with the language environment that I know
> of stem from its global nature applying to all buffers and the
> application itself, not from appropriate use in a given buffer.

I agree that it would be useful to have the language as a per-buffer
setting.  This discussion is about what that setting should include.

> IOW, it's just the defects of the POSIX_ME_HARDER locale mirrored
> into Emacs itself.

I also stated quite clearly (I think) that we should distinguish
between the locale and the language, as far as their effects on Emacs
are concerned.

>  > , and it's impossible to fold them all, or even their significant
>  > fraction,
> 
> Of course a significant fraction is possible.  That's precisely what
> the priority lists have been achieving since the early 1990s.

Evidently, your examples try to show that the fraction is not
significant enough.

> If your complaint is that we should do better, "patches welcome" is
> the only thing I can think of to say.

No, I'm saying we shouldn't try to do better _automatically_.  Users
have enough facilities to affect the defaults according to their
specific use-cases.

>  > in a reasonably usable user-level interface.  We shouldn't even
>  > try, IMO; we already have prefer-coding-system
> 
> Huh?  prefer-coding-system has two effects: it promotes a certain
> coding-system to highest priority in its category, and it promotes
> that category to highest priority in case of ambiguity.  IOW, it's a
> user override of the priority setting that comes from the language
> environment.

Exactly my point: the user can override the automated selections if
she needs to.  So the current automation doesn't need to do better.
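
For example, a user who wants UTF-8 tried first, whatever the language
environment picked, could put something like this in her init file (a
minimal sketch; the choice of utf-8 is only for illustration):

```elisp
;; Promote utf-8 to the top of the coding-system priorities,
;; overriding the order set up by the language environment.
(prefer-coding-system 'utf-8)

;; Inspect the result: the list is ordered from highest to lowest
;; priority, so utf-8 should now be its first element.
(coding-system-priority-list)
```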

> A completely different purpose (handling exceptions)
> from the language environment itself (handling the unmarked case).

Except that set-language-environment calls prefer-coding-system under
the hood to do most of its job...

> Are you sure you have any idea what you're talking about?

I think I do.  I'm not sure we are talking about the same thing,
though.

> That's an honest question; the way you are going, I have to wonder.

Knowing me for as long as you have, I wonder how such a question can
be honest.  But I digress.

> If you say "yes", I'll trust you, but I'd appreciate an explanation
> of what you're talking about that refers to real bugs in the current
> system, rather than general features that offend your sense of
> design.

I wasn't talking about any bugs at all.  Werner suggested adding a new
_feature_; I was talking about what that feature should and shouldn't
include.

> [coding priority settings] are to remove ambiguities like "we have
> EUC, but which one?" and "we have Windows-125x, but which one?" and
> "since ISO-8859-1 allows all 256 bytes, if we want to give priority
> to Chinese or Japanese, that had better come late in the list!"

I don't think I said anything to the contrary.  I would add, though,
that the priority settings also deal with "we have some encoding that
uses 8-bit bytes, but which encoding is that?"

>  > > AFAIK Emacsen use the locale as a heuristic for determining the
>  > > language environment
>  > 
>  > There's no heuristic involved, AFAIR.  Emacs has a database of
>  > languages _and_encodings_ suitable for the known locale names.
> 
> You're confusing "algorithmic" with "non-heuristic".

Please take a look at the database.  I stand by what I wrote: there's
no heuristic anywhere in sight.
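
One way to look at it from Lisp, as a rough sketch (the locale name
"ja_JP.eucJP" is just an arbitrary example; the two alists and the
lookup function are the ones in mule-cmds.el):

```elisp
;; Look up the language environment recorded for a locale name.
;; The alist keys are regexps matched against the start of the name.
(locale-name-match "ja_JP.eucJP" locale-language-names)

;; Look up the preferred coding system recorded for the same locale
;; name; this returns nil if no entry matches.
(locale-name-match "ja_JP.eucJP" locale-preferred-coding-systems)
```

Both calls are plain table lookups: the same locale name always yields
the same answer.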

> And of course in this case, locale is a heuristic.  *Emacs is a
> multilingual* (well, technically, multiscript) *application*, and any
> setting of the language environment that doesn't take into account the
> current text we're working with is surely heuristic.

If so, it's a heuristic that is external to Emacs.  Emacs just abides
by it, because users expect that.  Anyway, this aspect is entirely
unrelated to the issue at hand.

>  > set-locale-environment uses that database to get the language and the
>  > preferred encoding(s), then calls set-language-environment with the
>  > language, and sets the priorities of the encodings according to the
>  > encoding preferences.
> 
> That's an unnecessary API, ISTM.  (set-language-environment nil)
> should do that.

So we basically agree: the (not entirely complete) equivalence between
these 2 APIs is not TRT and it should go away.  We may disagree about
which API should be dropped and which one retained, but that's just a
naming issue (and maybe a consequence of the fact that you didn't know
about set-locale-environment before).

But this is not the main issue I wanted to discuss.  The main issue
is: what constitutes a "language environment" as far as Emacs is
concerned, after we factor out the effects of the locale?  If we are
going to implement per-buffer language environments, we need to decide
that first and foremost.

Perhaps a useful starting point would be to ask: what exactly is a
"language name" string? should it specify only a language, or should
it also try to specify the preferred encodings?
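
To make the question concrete: as far as I can tell, an existing
language environment entry bundles both kinds of data.  A quick sketch
of how to look ("Japanese" is used only as an example entry):

```elisp
;; Properties stored under one language environment name.
(get-language-info "Japanese" 'input-method)     ; default input method
(get-language-info "Japanese" 'charset)          ; charsets it covers
(get-language-info "Japanese" 'coding-priority)  ; encoding preferences
```

The first two are arguably properties of the language; the last one
is where the encoding preferences come in.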

> the POSIX_ME_HARDER locale is an abomination in a multilingual
> application and should be buried as deeply as we can manage.  It is,
> of course, a useful heuristic for the user's preferred language
> environment for *scratch*, but that's about as far as we can take that.

I'm not sure it's as black and white as you make it sound.  For
example, users of the same language on GNU/Linux and on MS-Windows
might very well disagree wrt the preferred encodings.  So some
aspects of the locale still affect language-specific choices.  But
again, I think talking about the locale just muddies the waters in
this discussion.