From: Kevin Rodgers
Newsgroups: gmane.emacs.devel
Subject: Re: Should `auto-coding-functions' be mode-specific?
Date: Tue, 02 Jan 2007 22:26:22 -0700
References: <87ejqd88iv.fsf@pacem.orebokech.com>
In-Reply-To: <87ejqd88iv.fsf@pacem.orebokech.com>
Xref: news.gmane.org gmane.emacs.devel:64672

Romain Francoise wrote:
> I received a bug report from a Debian user (CC'd) who was surprised
> to see that Emacs 22 opens one of his utf-8-encoded files as ASCII,
> because it contains the following HTML snippet near the top:
>
> |
> |
> |
> |
> |
> |
>
> The file itself is not an HTML file, but Emacs still uses the
> encoding specified in the HTML code to set the encoding.  (This is
> caused by `sgml-html-meta-auto-coding-function', which is present by
> default in the list of `auto-coding-functions' -- the functions are
> tried in the first 1K or last 3K bytes of the buffer.)
>
> I replied that the encoding can be forced using a -*- coding: .. -*-
> cookie, but the submitter argues that the functions to get the
> encoding from the file's contents should only be enabled in modes
> where the content of the buffer is supposed to match -- i.e. don't
> use the META header function in buffers that aren't in html-mode (or
> equivalent).
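
For anyone who wants to sidestep the behaviour described above in the
meantime, here is a minimal sketch of a local workaround.  It assumes
the form goes in a personal init file and is not a change proposed for
Emacs itself; it only uses `remq' and the variable and function named
in the report.

```elisp
;; Local workaround sketch (an assumption, not a change proposed in
;; this thread): stop consulting HTML META tags when guessing a file's
;; encoding.  `remq' returns a copy, so the default list object is not
;; modified in place.
(setq auto-coding-functions
      (remq 'sgml-html-meta-auto-coding-function auto-coding-functions))
```

The drawback is that real HTML files lose META-based detection too,
which is why the question of making the check stricter or
mode-specific still matters.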

The other default element of auto-coding-functions is
sgml-xml-auto-coding-function, which looks for the encoding specified
in the XML declaration but is careful to ensure that the declaration
occurs at the beginning of the buffer (optionally preceded by
whitespace, as allowed by the XML spec).

Shouldn't sgml-html-meta-auto-coding-function ensure that the <meta>
tag occurs within an HTML document, by also matching an appropriate
pattern at the beginning of the buffer?

I know there is more variation in what is allowed at the beginning of
an HTML document compared to an XML document, but I think it would be
an improvement to require either an HTML document type declaration or
an <html> tag (optionally preceded by whitespace):

  (when (re-search-forward "\\`[[:space:]\n]*\\(

to handle invalid HTML (such as Mozilla Firefox bookmark files):

2006-06-02  Juri Linkov

	* international/mule.el (sgml-html-meta-auto-coding-function):
	Remove the condition `(search-forward "

tag before the tag.  The bug reported by the Debian user concerns a
file which is clearly not an HTML file even though it contains a valid
HTML document, because of the text that precedes the markup.

> What do people think?
>
> (See http://bugs.debian.org/404236 for the discussion.)

-- 
Kevin Rodgers
Denver, Colorado, USA
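
One way to sketch the anchored check discussed above is a wrapper that
calls the existing function only when the text starts like an HTML
document.  The names `my-buffer-starts-like-html-p' and
`my-anchored-html-meta-auto-coding-function' and the regexp are
hypothetical, chosen for illustration; they are not code from mule.el.
The sketch relies on the documented convention that functions in
`auto-coding-functions' are called with point at the start of the text
to examine.

```elisp
;; Hypothetical helper and wrapper; the names and the regexp are
;; illustrative assumptions, not code from mule.el.
(defun my-buffer-starts-like-html-p ()
  "Return non-nil if text at point starts with an HTML doctype or <html> tag.
Leading whitespace is skipped and matching ignores case."
  (let ((case-fold-search t))
    (save-match-data
      (looking-at "[ \t\r\n]*<\\(?:!doctype[ \t\r\n]+html\\|html\\)\\b"))))

(defun my-anchored-html-meta-auto-coding-function (size)
  "Like `sgml-html-meta-auto-coding-function', but only for HTML-looking text.
SIZE is passed through unchanged.  Return nil when the text at point
does not begin with an HTML doctype or an <html> tag."
  (when (my-buffer-starts-like-html-p)
    (sgml-html-meta-auto-coding-function size)))
```

A wrapper like this returns nil for a file that has ordinary text
before the markup, as in the Debian report.  It would also skip HTML
that begins with neither marker, which is the trade-off against
handling invalid HTML.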