From: Kevin Rodgers <kevin.d.rodgers@gmail.com>
Subject: Re: Should `auto-coding-functions' be mode-specific?
Date: Tue, 02 Jan 2007 22:26:22 -0700
Message-ID: <enfepu$1ml$1@sea.gmane.org>
In-Reply-To: <87ejqd88iv.fsf@pacem.orebokech.com>

Romain Francoise wrote:
> I received a bug report from a Debian user (CC'd) who was surprised
> to see that Emacs 22 opens one of his utf-8-encoded files as ASCII,
> because it contains the following HTML snippet near the top:
> 
> | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> | <HTML><HEAD>
> | <META http-equiv=Content-Type content="text/html; charset=us-ascii">
> | </HEAD>
> | <BODY>
> | </BODY></HTML>
> 
> The file itself is not an HTML file, but Emacs still uses the
> encoding specified in the HTML code to set the encoding.  (This is
> caused by `sgml-html-meta-auto-coding-function', which is present by
> default in the list of `auto-coding-functions' -- the functions are
> tried in the first 1K or last 3K bytes of the buffer.)
> 
> I replied that the encoding can be forced using a -*- coding: .. -*-
> cookie, but the submitter argues that the functions to get the
> encoding from the file's contents should only be enabled in modes
> where the content of the buffer is supposed to match -- i.e. don't
> use the META header function in buffers that aren't in html-mode (or
> equivalent).

The other default element of auto-coding-functions is
sgml-xml-auto-coding-function, which looks for the encoding specified in
the XML declaration but is careful to ensure that the declaration occurs
at the beginning of the buffer (optionally preceded by whitespace, as
allowed by the XML spec).  Shouldn't sgml-html-meta-auto-coding-function
ensure that the <meta> tag occurs within an HTML document, by also
matching an appropriate pattern at the beginning of the buffer?

I know there is more variation in what is allowed at the beginning of an
HTML document compared to an XML document, but I think it would be an
improvement to require either an HTML document type declaration or an
<html> tag (optionally preceded by whitespace):

   (when (re-search-forward
          "\\`[[:space:]\n]*\\(<!doctype[[:space:]\n]+html\\|<html\\)"
          size t)
     ...)
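
To see what that anchor accepts and rejects, here is a small check, offered
only as a sketch.  It applies the same regexp to sample strings with
`string-match-p', binding `case-fold-search' to t on the assumption that the
real function would match case-insensitively.  The helper name
`my-looks-like-html-p' and the sample strings are made up for illustration.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-looks-like-html-p (text)
  "Return non-nil if TEXT begins with an HTML doctype or an <html> tag.
Leading whitespace is allowed, and matching ignores case."
  (let ((case-fold-search t))
    (string-match-p
     "\\`[[:space:]\n]*\\(<!doctype[[:space:]\n]+html\\|<html\\)"
     text)))

;; Markup at the very start (after whitespace): accepted.
(my-looks-like-html-p
 "\n<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">")

;; Other text before the markup, as in the reported file: rejected.
(my-looks-like-html-p "Some notes\n<HTML><HEAD>")
```

The first call matches at position 0; the second returns nil because the
"\\`" anchor leaves no room for anything but whitespace before the markup.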

Finally, note the following ChangeLog entry, which describes the patch
proposed by Juri in
<URL:http://lists.gnu.org/archive/html/emacs-devel/2005-10/msg00916.html>
to handle invalid HTML (such as Mozilla Firefox bookmark files):

2006-06-02  Juri Linkov  <juri@jurta.org>

	* international/mule.el (sgml-html-meta-auto-coding-function):
	Remove the condition `(search-forward "<html" size t)'.
	Replace `\"' with `[\"']?' in `re-search-forward'.

I agree that Emacs should not be too pedantic about HTML, but I don't
think it's too much to require an <html> tag before the <meta> tag.  The
bug reported by the Debian user concerns a file which is clearly not an
HTML file even though it contains a valid HTML document, because of the
text that precedes the markup.
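
As a rough illustration of that requirement, the sketch below only trusts a
<meta> charset when the text at point starts like an HTML document.  It is
not the code in mule.el: the name `my-html-meta-auto-coding-function', the
simplified <meta> regexp, and the use of `looking-at' (which assumes the
caller has put point at the start of the text to examine) are all
assumptions made for the example.

```elisp
;; Hypothetical sketch; not the function shipped in mule.el.
(defun my-html-meta-auto-coding-function (size)
  "Return the coding system named in an HTML <meta> tag, or nil.
The tag is trusted only if the text at point starts like an HTML
document.  SIZE is the number of characters after point to examine."
  (let ((case-fold-search t)
        (limit (min (+ (point) size) (point-max))))
    (save-excursion
      (when (and
             ;; Guard: a doctype or <html> tag, preceded only by whitespace.
             (looking-at
              "[ \t\r\n]*\\(?:<!doctype[ \t\r\n]+html\\|<html\\)")
             ;; Simplified <meta> pattern; quotes around values are optional.
             (re-search-forward
              (concat "<meta[ \t\r\n]+http-equiv=[\"']?content-type[\"']?"
                      "[ \t\r\n]+content=[\"']?text/[[:alpha:]]+;"
                      "[ \t\r\n]*charset=\\([^\"' \t\r\n>]+\\)")
              limit t))
        (let ((sym (intern (downcase (match-string 1)))))
          ;; Return the symbol only if Emacs knows it as a coding system.
          (and (coding-system-p sym) sym))))))
```

Called at the start of a buffer that begins with the doctype, this returns
the charset named in the <meta> tag; with any non-whitespace text ahead of
the markup it returns nil, so other detection methods get their turn.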

> What do people think?
> 
> (See http://bugs.debian.org/404236 for the discussion.)

-- 
Kevin Rodgers
Denver, Colorado, USA
