From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Handling invalid HTML
Date: Tue, 18 Oct 2005 11:06:42 +0300
Organization: JURTA
Message-ID: <87br1ni7gl.fsf@jurta.org>
To: emacs-devel@gnu.org

The current rules for recognizing HTML files in Emacs are too strict:

1. The expected string delimiter for HTML attribute values is the double quotation mark. However, some HTML files on the Web use apostrophes, e.g.

The program that generates such non-standard meta headers is identified as 'Microsoft DHTML Editing Control' (no surprise). `sgml-html-meta-auto-coding-function' can't determine the encoding from such invalid meta headers.

I propose replacing \" with [\"'] in the regexps of `sgml-html-meta-auto-coding-function' so that it accepts such invalid HTML. (The regexps in the other function, `sgml-xml-auto-coding-function', already match [\"'] for XML files.)

2. `sgml-html-meta-auto-coding-function' can't determine the encoding when an HTML file has no `' starting element. One example of such a file is the Mozilla Firefox bookmark file. Sometimes it is necessary to open this file in Emacs and use isearch on it, but Emacs can't detect its encoding. Perhaps the test `(search-forward "