unofficial mirror of emacs-devel@gnu.org 
From: "Richard M. Stallman" <rms@gnu.org>
Subject: [user42@zip.com.au: html-coding.el -- coding system from meta tag]
Date: Wed, 20 Jul 2005 18:08:46 -0400
Message-ID: <E1DvMkA-0000VA-PS@fencepost.gnu.org>

Could people who know more than I about HTML specifications please
look at this, and tell me whether they think it is good to add to Emacs?

------- Start of forwarded message -------
From: Kevin Ryde <user42@zip.com.au>
To: gnu-emacs-sources@gnu.org
Organization: Bah Humbug
Date: Wed, 20 Jul 2005 10:47:38 +1000
Subject: html-coding.el -- coding system from meta tag
Sender: gnu-emacs-sources-bounces+rms=gnu.org@gnu.org
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on monty-python
X-Spam-Level: 
X-Spam-Status: No, hits=0.9 required=5.0 tests=FROM_ENDS_IN_NUMS autolearn=no 
	version=2.63

- --=-=-=

This is a little spot of code for getting the coding system from the
meta tag when visiting an HTML file.

The emacs cvs head already has this feature, so this code is only for
emacs 21.

I'd be surprised if something like this isn't already in some or most
of the heavy duty html/sgml editing/viewing packages, though I
couldn't find the right bits on cursory inspection.  In any case all I
wanted was to see the right chars in a plain old find-file of some
random html.


- --=-=-=
Content-Type: application/emacs-lisp
Content-Disposition: attachment; filename=html-coding.el
Content-Transfer-Encoding: quoted-printable

;;; html-coding.el --- coding system from meta tag when visiting html files.

;; Copyright 2005 Kevin Ryde
;;
;; html-coding.el is free software; you can redistribute it and/or modify it
;; under the terms of the GNU General Public License as published by the
;; Free Software Foundation; either version 2, or (at your option) any later
;; version.
;;
;; html-coding.el is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
;; Public License for more details.
;;
;; You can get a copy of the GNU General Public License online at
;; http://www.gnu.org/licenses/gpl.txt, or you should have one in the file
;; COPYING which comes with GNU Emacs and other GNU programs.  Failing that,
;; write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.


;;; Commentary:

;; This is a spot of code for getting the coding system from a HTML <meta>
;; tag when visiting a .html, .shtml or .htm file.  mm-util.el (from Gnus)
;; is used to map a mime charset name in the html to an emacs coding system.
;;
;; This code is designed for Emacs 21.  The Emacs cvs head (which will be
;; Emacs 22 or whatever) already has this feature (in
;; sgml-html-meta-auto-coding-function), so nothing is done there.

;; If you have a file with a slightly bogus charset name, like "iso8859-1"
;; where it should be "iso-8859-1", you can map to the right one in
;; `mm-charset-synonym-alist', like
;;
;;     (eval-after-load "mm-util"
;;       '(add-to-list 'mm-charset-synonym-alist '(iso8859-1 . iso-8859-1)))
;;
;; But note that the mm-util.el which comes with Emacs 21.4a has a bug that
;; stops this working.  The test (mm-coding-system-p charset) should be
;; (mm-coding-system-p cs), ie. validate the mapped good name, not the bad
;; one.  You can make that change, or it's fixed in the separately packaged
;; Gnus.


;;; Install:

;; Put html-coding.el somewhere in your `load-path', and in your .emacs put
;;
;;     (require 'html-coding)

;;; History:

;; Version 1 - the first version.


;;; Code:

;; emacs 22 `sgml-html-meta-auto-coding-function' does this coding system
;; determination already, skip our code in that case
;;
(unless (fboundp 'sgml-html-meta-auto-coding-function)

  (defun html-coding-system (args)
    "Return the coding system for reading a HTML file, based on the <meta> tag.
If there's no charset in the file, this function checks what other rules say.

This function is for use in `file-coding-system-alist', the ARGS parameter
is a list, the only form handled here is `(insert-file-contents ...)'."
    (or (and (eq (car args) 'insert-file-contents)
             (file-exists-p (cadr args))
             (with-temp-buffer
               (insert-file-contents-literally (cadr args))
               (and (re-search-forward "<meta\\s-[^>]*charset=\\([^\">]+\\)"
                                       ;; first 10 lines, like emacs 22
                                       (save-excursion (forward-line 10)
                                                       (point))
                                       t)
                    (let ((charset (match-string 1)))
                      (require 'mm-util)
                      (or (mm-charset-to-coding-system charset)
                          (progn
                            (message "Unrecognised HTML MIME charset: %s"
                                     charset)
                            nil))))))
        (progn
          (require 'cl)
          (let ((file-coding-system-alist
                 (remove* 'html-coding-system file-coding-system-alist
                          :key 'cdr)))
            (apply 'find-operation-coding-system args)))))

  (modify-coding-system-alist 'file "\\.\\(html\\|shtml\\|htm\\)\\'"
                              'html-coding-system))

(provide 'html-coding)

;;; html-coding.el ends here

- --=-=-=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Gnu-emacs-sources mailing list
Gnu-emacs-sources@gnu.org
http://lists.gnu.org/mailman/listinfo/gnu-emacs-sources

- --=-=-=--
------- End of forwarded message -------
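
For anyone reviewing the matching logic, here is a minimal sketch of just
the <meta> search from the forwarded code, applied to a string instead of
a file so it can be tried in *scratch*.  The function name
`html-coding-demo-meta-charset' is hypothetical and is not part of
html-coding.el; the regexp and the 10-line limit are copied from the
attachment.  It stops at the charset name and does not call mm-util.

```elisp
;; Hypothetical helper for illustration only, not part of html-coding.el.
;; Return the charset named in a <meta> tag found within the first 10
;; lines of STRING, or nil if there is none.
(defun html-coding-demo-meta-charset (string)
  (with-temp-buffer
    (insert string)
    ;; `insert' leaves point at the end, so go back to the start.
    (goto-char (point-min))
    (let ((case-fold-search t)          ; match <META ...> as well
          ;; Same search bound as the forwarded code: first 10 lines.
          (limit (save-excursion (forward-line 10) (point))))
      (when (re-search-forward "<meta\\s-[^>]*charset=\\([^\">]+\\)"
                               limit t)
        (match-string 1)))))

;; Example call on a typical Content-Type meta tag.
(html-coding-demo-meta-charset
 (concat "<html><head>\n"
         "<meta http-equiv=\"Content-Type\""
         " content=\"text/html; charset=iso-8859-1\">\n"
         "</head>"))
```

The returned string is what the forwarded code then hands to
`mm-charset-to-coding-system'.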

