From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: What is a preferred charset?
Date: Thu, 22 Nov 2018 17:30:29 +0200
Message-ID: <83in0pgmei.fsf@gnu.org>
References: <87zhu24h0b.fsf@gmx.net> <87r2fe4bru.fsf@gmx.net>
 <83o9aigj29.fsf@gnu.org> <87in0q3va0.fsf@gmx.net>
 <83lg5lhjby.fsf@gnu.org> <87muq1bhvv.fsf@gmx.net>
In-reply-to: <87muq1bhvv.fsf@gmx.net> (message from Stephen Berman on
 Thu, 22 Nov 2018 10:07:00 +0100)
To: Stephen Berman
Cc: schwab@suse.de, emacs-devel@gnu.org

> From: Stephen Berman
> Cc: schwab@suse.de, emacs-devel@gnu.org
> Date: Thu, 22 Nov 2018 10:07:00 +0100
>
> > It is not a question of success or failure: every charset which
> > supports the character "succeeds".  We choose one of them in order to
> > produce the effect (such as select a font for displaying it) that
> > suits best what this particular user in this particular case expects.
> > When text comes from an encoding that specifies its charset (such as
> > Latin-N), we can determine that charset from the encoding; if not, we
> > use the charset-priority order that is determined by the locale, as
> > fallback.
>
> So "preferred charset" means "charset the encoding specifies, if any,
> otherwise the locale-specific highest priority charset"?

Yes, but that's not a useful definition, see below.

> If so, it's still not clear to me why HELLO specifies charsets that
> (at least in some cases, like INVERTED EXCLAMATION MARK) differ from
> the highest priority

Because it wants to demonstrate that Emacs is capable of using mixed
character sets in the same buffer, and still have each one displayed
as it would in its native locale.

> is it because the specified charsets are known to correctly
> display the characters regardless of locale (if that's even possible),
> while it's not known whether the highest priority charset can correctly
> display them?

No, the highest priority charset will also succeed in displaying them.
But HELLO wants each greeting to be a good representative of its
native locale, regardless of the locale in which the Emacs session
showing HELLO runs.

I find the following description useful when thinking about this:
Emacs wants to know the charset of each character to be able to
display it correctly using the proper fonts (and also for a few other
features).  If the text announces its charset via the 'charset' text
property, Emacs uses that; otherwise it guesses using the locale's
defaults as guidelines.  It is similar to what Emacs does when it
needs to guess the encoding of a file.

> In any case, it's ok with me to drop this now, since it's
> become clear to me that "preferred charset" is not a technical term but
> a term of convenience used only by describe-char, and it hasn't bothered
> anyone till now (and I hadn't thought about it till now either).  Thanks
> for the feedback.

Thanks for pointing out how this display might be confusing; I have
now removed the "preferred" part from the display, and added
descriptions of how each attribute of the character is obtained, so
that interested users could drill down.
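
To make the lookup order described above concrete, here is a rough
sketch in Python.  It is only a toy model and not how Emacs implements
it: the names `supports` and `charset_for` are made up for this
illustration, and Python codec names stand in for Emacs charsets on
the assumption that "the codec can encode the character" is a fair
proxy for "the charset supports the character".

```python
from typing import Optional, Sequence


def supports(charset: str, ch: str) -> bool:
    """Toy stand-in: a charset 'supports' ch if the Python codec
    of that name can encode it (an assumption for illustration)."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def charset_for(ch: str,
                declared: Optional[str] = None,
                priority: Sequence[str] = ("utf-8",)) -> Optional[str]:
    """Pick a charset for ch (hypothetical helper).

    1. If the text announces its charset (think of the 'charset'
       text property) and that charset supports ch, use it.
    2. Otherwise walk the locale-derived priority list and take
       the first charset that supports ch.
    """
    if declared is not None and supports(declared, ch):
        return declared
    for cs in priority:
        if supports(cs, ch):
            return cs
    return None


# INVERTED EXCLAMATION MARK, tagged the way a HELLO-like file might tag it:
tagged = charset_for("\u00a1", declared="latin-1",
                     priority=("utf-8", "latin-1"))

# The same character with no announcement: the priority list decides.
guessed = charset_for("\u00a1", priority=("iso8859-2", "utf-8"))
```

In this model both lookups "succeed"; they differ only in which
charset is reported, which is the distinction made above between text
that announces its charset and text for which the locale's defaults
are used as a guess.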