From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kevin Rodgers
Newsgroups: gmane.emacs.bugs
Subject: Re: why not use unicode if html file has charset=utf-8?
Date: Tue, 27 Jul 2004 09:50:56 -0600
Message-ID: <410679E0.8020802@yahoo.com>

Dan Jacobson wrote:

> One would think that if some file.html had
>
> near the top, emacs would show it with the unicode charset.
> Browsers get that right.

I think the first step would be to go from the (MIME) charset attribute
value to an Emacs coding system.  But this particular example (utf-8)
returns 8 alternatives on Emacs 21.3:

(let ((mime-charset 'utf-8) ; more generally: (intern (downcase "UTF-8"))
      (coding-systems '()))
  (mapatoms (lambda (symbol)
              (if (and symbol
                       (coding-system-p symbol)
                       (eq (coding-system-get symbol 'mime-charset)
                           mime-charset))
                  (setq coding-systems (cons symbol coding-systems)))))
  (sort coding-systems 'string-lessp))
=> (mule-utf-8 mule-utf-8-dos mule-utf-8-mac mule-utf-8-unix
    utf-8 utf-8-dos utf-8-mac utf-8-unix)

What's the right way to choose among them?
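
One possible way to narrow that list, offered only as a rough sketch:
pull the charset name out of the text, then look at base coding systems
only, so that the -unix/-dos/-mac variants drop out.  Both function
names below are made up for illustration and are not part of Emacs.  I
am also assuming that (coding-system-list t) leaves out the end-of-line
variants on the Emacs version in use, and that a loose "charset=" regexp
is good enough to find the declaration.

```elisp
;; Hypothetical helpers for illustration; neither is part of Emacs.

(defun my-html-charset-in-string (text)
  "Return the first charset=NAME found in TEXT as a lowercase symbol, or nil."
  (let ((case-fold-search t))           ; match CHARSET=, Charset=, ...
    (when (string-match "charset=[\"']?\\([-_a-z0-9]+\\)" text)
      (intern (downcase (match-string 1 text))))))

(defun my-mime-charset-base-coding-systems (mime-charset)
  "Return base coding systems whose `mime-charset' property is MIME-CHARSET."
  (let ((result '()))
    ;; Assumption: with a non-nil argument, `coding-system-list'
    ;; returns base coding systems only (no -unix/-dos/-mac variants).
    (dolist (cs (coding-system-list t))
      (when (eq (coding-system-get cs 'mime-charset) mime-charset)
        (push cs result)))
    (sort result #'string-lessp)))
```

Something like (my-mime-charset-base-coding-systems
(my-html-charset-in-string "... charset=UTF-8 ...")) would then give a
shorter list to pick from, though it may still hold more than one entry.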

Ah, gnus/mm-util.el has this:

(mm-charset-to-coding-system "UTF-8")
=> utf-8

The next step would be to call set-buffer-file-coding-system; should
that be done via html-mode-hook, or is that too late?  What about using
after-insert-file-functions/after-insert-file-set-buffer-file-coding-system?

-- 
Kevin Rodgers