From: Miles Bader
Newsgroups: gmane.emacs.devel, gmane.emacs.pretest.bugs
Subject: Re: 23.0.60; Defaut encoding for XML files should be undefined (instead of utf-8)
Date: Sat, 16 Feb 2008 08:24:04 +0900
Message-ID: <87k5l5yfaj.fsf@catnip.gol.com>
To: Jason Rumney
Cc: emacs-pretest-bug@gnu.org, Edward O'Connor, emacs-devel@gnu.org

Jason Rumney writes:

> Emacs goes beyond doing the right thing at the moment.
> The right thing
> would be to guide users into using utf-8 by making that the default
> encoding for *new* XML files, and perhaps warning if an existing file
> was detected as non-utf-8 without a charset declaration in the
> header. Forcing users into using utf-8 by ignoring explicit requests to
> save the file as latin-1 and by opening latin-1 encoded files as utf-8
> even when the decoding fails is not the right behaviour. Our users are
> not slaves to specifications.

Perhaps the "best" thing would be to temporarily do a
(prefer-coding-system 'utf-8) when reading XML files without an encoding
header, instead of _forcing_ the coding system to be utf-8.

However, it doesn't look like the current Emacs mechanism for
format-specific coding systems (`auto-coding-functions') explicitly
supports that kind of functionality.

Maybe sgml-xml-auto-coding-function could use whatever lower-level
function does coding-system detection using only the characters in the
buffer (I can't seem to find it, but there must be such a thing...).

-Miles

-- 
Scriptures, n. The sacred books of our holy religion, as distinguished from
the false and profane writings on which all other faiths are based.