From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel,gmane.emacs.gnus.general
Subject: Re: gnus should accept UTF8 even if UTF-8 is standard
Date: Wed, 22 Oct 2008 11:34:17 +0900
Message-ID: <87fxmptl1y.fsf@xemacs.org>
References: <87wsg2tvcn.fsf@xemacs.org> <20081021062510.GB22593@tomas> <87prlutizh.fsf@xemacs.org> <87k5c2tan9.fsf@xemacs.org>
To: Eli Zaretskii
Cc: rms@gnu.org, ding@gnus.org, emacs-devel@gnu.org, tomas@tuxteam.de, monnier@iro.umontreal.ca, miles@gnu.org

Eli Zaretskii writes:

> > > Perhaps something like `canonicalize-coding-system-name' would be good.
> >
> > That implies that the return value would be a string, not the coding
> > system itself. I suggest we return the coding system (or nil), not
> > just the name.
>
> What I meant is that, instead of returning a _string_, which is the
> name of a coding system, it is better to return a _symbol_ of that
> coding system.

Of course. My point is that the symbol is the name, and therefore
"canonicalize-coding-system-name" is a reasonable name for this
function. If it weren't for the conflict with XEmacs, which still
needs `get-coding-system' to return a coding system object, I'd be
perfectly happy using that.

> > AIUI, the point of the function is to guess what people who don't
> > know what they're doing are trying to express (and to provide some
> > interactive convenience to people who do know what they're doing).
>
> Agreed, but in most cases the argument will be a valid MIME charset.

Except when Richard is typing, and surely we all consider that an
important use case? Aside from Richard's expressed preference for a
harmless convenience, the presence or absence of one or more hyphens
is something the various standards disagree about:

> The case of "UTF8" is an exception.

Well, no, I think it is not. AFAIK only one of "iso-8859-1" and
"iso8859-1" is registered, but Emacs uses the former exclusively, and
X11 only the latter (in XLFDs). Both are acceptable to iconv. (And
the ISO standards actually use "ISO 8859/1" which isn't even
acceptable to glibc iconv!)

> And even in this exceptional case, I understand that "UTF8" came
> from some charset= header. That is why I suggested
> coding-system-for-charset.

Well, the MIME nomenclature is seriously broken. A substantial
minority of the things it denotes "charsets" are not "character sets"
in any sense.

> I don't mind coding-system-for-mime-charset, either, if that was
> your point.

That's the worst of several suggestions, as this mapping is not
limited to MIME charsets, but is useful for coding systems in general,
as the usage of hyphens in their names has no rhyme nor reason. Is it
"KOI8-R" or "KOI-8R"? That one confused me, at least, for a while.

> (In Emacs 23+, the original Mule meaning of "charset" will fade
> out.)

That would be sad. While I agree that UTF-8 will fairly quickly
become universal for current text documents, I don't expect the vast
amount of legacy archives to be converted any time soon (some will be
converted at the time of converting to new media, but human beings
being what they are I expect that for a couple centuries some
bureaucrats will just make bit-level copies ;-). Emacs should be the
premier application for reading those!
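
For illustration only, here is a minimal sketch of the kind of
hyphen-insensitive lookup discussed above. It is written in Python
rather than Emacs Lisp, and nothing in it is an existing Emacs or
XEmacs API: the table KNOWN_CODING_SYSTEMS, the helper _squash, and
the matching rule (ignore case and every character that is not a
letter or digit) are all assumptions made for the example. Like the
function proposed in the thread, it returns a canonical name, or
nothing when no known name matches.

```python
import re

# Hypothetical table of canonical names. A real implementation would
# consult the application's own list of known coding systems.
KNOWN_CODING_SYSTEMS = ["utf-8", "iso-8859-1", "koi8-r", "us-ascii"]


def _squash(name):
    """Lowercase NAME and drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Index the canonical names by their squashed spelling.
_BY_SQUASHED = {_squash(name): name for name in KNOWN_CODING_SYSTEMS}


def canonicalize_coding_system_name(name):
    """Return the canonical name matching NAME, or None if nothing matches."""
    return _BY_SQUASHED.get(_squash(name))


if __name__ == "__main__":
    for spelling in ["UTF8", "iso8859-1", "ISO 8859/1", "KOI-8R", "bogus"]:
        print(spelling, "->", canonicalize_coding_system_name(spelling))
```

One caveat with this approach: two distinct names that differ only in
punctuation would collide in the squashed index, so a real table
would need to be checked for such collisions.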