unofficial mirror of emacs-devel@gnu.org 
* Handling invalid HTML
From: Juri Linkov @ 2005-10-18  8:06 UTC


The current rules for recognizing HTML files in Emacs are too strict:

1. The valid string delimiter for HTML attribute values is the
quotation character.  However, some HTML files on the Web use
apostrophes, e.g.

<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>

The program that generates such non-standard meta headers is identified
as 'Microsoft DHTML Editing Control' (no surprise).

`sgml-html-meta-auto-coding-function' can't determine the encoding from
such invalid meta headers.  I propose replacing \" with [\"'] in the
regexps in `sgml-html-meta-auto-coding-function' so that it accepts
such invalid HTML.  (The regexps in the other function,
`sgml-xml-auto-coding-function', already match [\"'] for XML files.)
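
To illustrate the proposed change, here is a minimal sketch that
compares a double-quote-only regexp with one that also accepts
apostrophes.  The names `my-meta-strict-re', `my-meta-relaxed-re' and
`my-meta-charset' are made up for this example, and the regexps are
simplified; they are not the ones actually used in
`sgml-html-meta-auto-coding-function'.

```elisp
;; Hypothetical, simplified regexps; NOT the ones in sgml-mode.el.
(defconst my-meta-strict-re
  "<meta[ \t\n]+http-equiv=\"content-type\"[ \t\n]+content=\"text/[a-z]+;[ \t\n]*charset=\\([^\"> \t\n]+\\)\""
  "Matches a meta header only when attribute values use double quotes.")

(defconst my-meta-relaxed-re
  "<meta[ \t\n]+http-equiv=[\"']content-type[\"'][ \t\n]+content=[\"']text/[a-z]+;[ \t\n]*charset=\\([^\"'> \t\n]+\\)[\"']"
  "Like `my-meta-strict-re', but with \\\" replaced by [\\\"'].")

(defun my-meta-charset (regexp string)
  "Return the charset that REGEXP finds in STRING, or nil if none."
  (let ((case-fold-search t))           ; tag and attribute names vary in case
    (and (string-match regexp string)
         (match-string 1 string))))

;; The header quoted above, with apostrophes as delimiters:
(my-meta-charset my-meta-strict-re
                 "<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>")
;; => nil
(my-meta-charset my-meta-relaxed-re
                 "<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>")
;; => "UTF-8"
```

The relaxed regexp still matches headers that use ordinary double
quotes, so files that are detected today should not be affected.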

2. `sgml-html-meta-auto-coding-function' can't determine the encoding
when an HTML file has no `<html>' start tag.  An example of such a file
is the Mozilla Firefox bookmark file.  Sometimes one needs to open this
file in Emacs and use isearch on it, but Emacs can't detect its
encoding.  Perhaps the test `(search-forward "<html" size t)' should be
removed from `sgml-html-meta-auto-coding-function'.
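
The effect of that test can be sketched as follows.  This is only an
illustration under assumptions: `my-html-meta-charset-in-buffer' and its
REQUIRE-HTML-TAG argument are hypothetical, the regexp is simplified,
and the sample text is an abbreviated stand-in for a bookmark file
rather than its exact contents.

```elisp
;; Hypothetical sketch; NOT the real `sgml-html-meta-auto-coding-function'.
(defun my-html-meta-charset-in-buffer (size &optional require-html-tag)
  "Return the charset from a meta header within SIZE chars of point, or nil.
If REQUIRE-HTML-TAG is non-nil, first insist on finding \"<html\"."
  (let ((case-fold-search t)
        (limit (min (point-max) (+ (point) size))))
    (save-excursion
      (when (and (or (not require-html-tag)
                     ;; The test that the message suggests removing.
                     (search-forward "<html" limit t))
                 (re-search-forward
                  "<meta[ \t\n]+http-equiv=[\"']content-type[\"'][ \t\n]+content=[\"']text/[a-z]+;[ \t\n]*charset=\\([^\"'> \t\n]+\\)"
                  limit t))
        (match-string 1)))))

(defun my-bookmark-sample-charset (require-html-tag)
  "Run the sketch on a sample that has a meta header but no <html> tag."
  (with-temp-buffer
    (insert "<!DOCTYPE NETSCAPE-Bookmark-file-1>\n"
            "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=UTF-8\">\n"
            "<TITLE>Bookmarks</TITLE>\n")
    (goto-char (point-min))
    (my-html-meta-charset-in-buffer (buffer-size) require-html-tag)))

(my-bookmark-sample-charset t)    ; => nil, no "<html" in the sample
(my-bookmark-sample-charset nil)  ; => "UTF-8"
```

Dropping the test means that any file with such a meta header near its
beginning gets its encoding from that header, whether or not it is
really HTML, so the search limit matters more once the test is gone.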

3. When visiting a Mozilla Firefox bookmark file, Emacs also fails to
detect the type of the file: it opens it in SGML mode, whereas it is
actually an HTML file.  This problem is caused by the default value of
`magic-mode-alist'.  Maybe the `.html' extension in `auto-mode-alist'
should take precedence over `magic-mode-alist'?
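
In the meantime, a user-level workaround (not the precedence change
suggested above) might be to put an entry for such files in front of the
default ones.  This is a sketch that assumes the bookmark file starts
with a `<!DOCTYPE NETSCAPE-Bookmark-file' declaration, possibly after
some whitespace, and that the entries are matched case-sensitively at
the beginning of the buffer:

```elisp
;; Assumption: the bookmark file begins with this doctype declaration.
;; `add-to-list' puts the new element at the front of the list, so it
;; is tried before the default entries.
(add-to-list 'magic-mode-alist
             '("[ \t\n]*<!DOCTYPE NETSCAPE-Bookmark-file" . html-mode))
```

This only covers the bookmark file; other `.html' files that lack the
expected start tag would still be opened in SGML mode.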

-- 
Juri Linkov
http://www.jurta.org/emacs/
