From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: per-buffer language environments
Date: Wed, 15 Dec 2010 01:47:46 -0500
To: "Stephen J. Turnbull"
Cc: handa@m17n.org, emacs-devel@gnu.org
In-reply-to: <87hbefr63n.fsf@uwakimon.sk.tsukuba.ac.jp> (stephen@xemacs.org)

> From: "Stephen J. Turnbull"
> Cc: emacs-devel@gnu.org, handa@m17n.org
> Date: Wed, 15 Dec 2010 13:51:40 +0900
>
> > Those are all valid concerns, but they are just the tip of an
> > iceberg.
>
> No, they *are* the iceberg, at least as far as the autopilot is
> concerned. After that, you *must* ask the user.

As long as we agree that there _is_ an iceberg, I won't argue.

> > There's an almost infinite number of combinations of a language and
> > the preferred encoding
>
> Sure, but given a language and the set of encoding features Emacs
> knows how to detect *when reading from a stream*, there remains
> substantial ambiguity.

The emphasis on *reading* takes what I originally wrote out of its
context. I didn't comment on reading alone, I commented on the entire
issue of coding-systems being tied up to the language:

> > * Which coding system to use on writing when the current
> > buffer contains a character that can't be encoded by
> > buffer-file-coding-system.
> >
> > * Which coding systems have higher priority when inserting a
> > file in the current buffer.
>
> I could understand how the font selection and the default input method
> are related to the language, but what do encodings have to do with
> that? The preferred encoding is generally an attribute of a locale,
> not of a language.

If the ambiguity you are talking about is that there are more settings
than just for reading, then I was originally talking about those, too.
If the ambiguity is about something else, please tell me what that is.

> All problems with the language environment that I know
> of stem from its global nature applying to all buffers and the
> application itself, not from appropriate use in a given buffer.

I agree that it would be useful to have a language as a per-buffer
setting. This discussion is about what that should include.

> IOW, it's just the defects of the POSIX_ME_HARDER locale mirrored
> into Emacs itself.

I also stated quite clearly (I think) that we should distinguish
between the locale and the language, as far as their effects on Emacs
are concerned.

> > , and it's impossible to fold them all, or even their significant
> > fraction,
>
> Of course a significant fraction is possible. That's precisely what
> the priority lists have been achieving since the early 1990s.

Evidently, your examples try to show that the fraction is not
significant enough.

> If your complaint is that we should do better, "patches welcome" is
> the only thing I can think of to say.

No, I'm saying we shouldn't try to do better _automatically_. Users
have enough facilities to affect the defaults according to their
specific use-cases.

> > in a reasonably usable user-level interface. We shouldn't even
> > try, IMO; we already have prefer-coding-system
>
> Huh? prefer-coding-system has two effects: it promotes a certain
> coding-system to highest priority in its category, and it promotes
> that category to highest priority in case of ambiguity. IOW, it's a
> user override of the priority setting that comes from the language
> environment.

Exactly my point: the user can override the automated selections if
she needs to. So the current automation doesn't need to do better.

> A completely different purpose (handling exceptions)
> from the language environment itself (handling the unmarked case).

Except that set-language-environment calls prefer-coding-system under
the hood to do most of its job...

> Are you sure you have any idea what you're talking about?

I think I do. I'm not sure we are talking about the same thing,
though.

> That's an honest question; the way you are going, I have to wonder.

Given how long you have known me, I wonder how such a question can be
honest. But I digress.

> If you say "yes", I'll trust you, but I'd appreciate an explanation
> of what you're talking about that refers to real bugs in the current
> system, rather than general features that offend your sense of
> design.

I wasn't talking about any bugs at all. Werner suggested adding a new
_feature_; I was talking about what that feature should and shouldn't
include.

> [coding priority settings] are to remove ambiguities like "we have
> EUC, but which one?" and "we have Windows-125x, but which one?" and
> "since ISO-8859-1 allows all 256 bytes, if we want to give priority
> to Chinese or Japanese, that had better come late in the list!"

I don't think I said anything to the contrary. I would add, though,
that the priority settings also deal with "we have some encoding that
uses 8-bit bytes, but which encoding is that?"

> > > AFAIK Emacsen use the locale as a heuristic for determining the
> > > language environment
> >
> > There's no heuristic involved, AFAIR. Emacs has a database of
> > languages _and_encodings_ suitable for the known locale names.
>
> You're confusing "algorithmic" with "non-heuristic".

Please take a look at the database. I stand by what I wrote: there's
no heuristic anywhere in sight.

> And of course in this case, locale is a heuristic. *Emacs is a
> multilingual* (well, technically, multiscript) *application*, and any
> setting of the language environment that doesn't take into account the
> current text we're working with is surely heuristic.

If so, it's a heuristic that is external to Emacs. Emacs just abides
by it, because users expect that. Anyway, this aspect is entirely
unrelated to the issue at hand.

> > set-locale-environment uses that database to get the language and the
> > preferred encoding(s), then calls set-language-environment with the
> > language, and sets the priorities of the encodings according to the
> > encoding preferences.
>
> That's an unnecessary API, ISTM. (set-language-environment nil)
> should do that.

So we basically agree: the (not entirely complete) equivalence between
these 2 APIs is not TRT and it should go away. We may disagree about
which API should be dropped and which one retained, but that's just a
naming issue (and maybe a consequence of the fact that you didn't know
about set-locale-environment before).

But this is not the main issue I wanted to discuss. The main issue
is: what constitutes a "language environment" as far as Emacs is
concerned, after we factor out the effects of the locale? If we are
going to implement per-buffer language environments, we need to decide
that first and foremost.

Perhaps a useful starting point would be to ask: what exactly is a
"language name" string? Should it specify only a language, or should
it also try to specify the preferred encodings?

> the POSIX_ME_HARDER locale is an abomination in a multilingual
> application and should be buried as deeply as we can manage. It is,
> of course, a useful heuristic for the user's preferred language
> environment for *scratch*, but that's about as far as we can take that.

I'm not sure it's as black and white as you make it sound. For
example, users of the same language on GNU/Linux and on MS-Windows
might very well disagree wrt the preferred encodings. So some aspects
of the locale still affect language-specific choices.

But again, I think talking about the locale just muddies the waters in
this discussion.