From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: po file charset via auto-coding-functions
Date: Mon, 24 Oct 2005 10:39:16 +0900
References: <87zmp399ue.fsf@zip.com.au> <87ll0ma3ow.fsf@zip.com.au> <87fyqu9ung.fsf@zip.com.au> <87k6g6e05k.fsf-monnier+emacs@gnu.org>
Cc: user42@zip.com.au, emacs-devel@gnu.org
Original-To: Stefan Monnier
In-reply-to: <87k6g6e05k.fsf-monnier+emacs@gnu.org> (message from Stefan Monnier on Fri, 21 Oct 2005 22:50:12 -0400)
Xref: news.gmane.org gmane.emacs.devel:44670

In article <87k6g6e05k.fsf-monnier+emacs@gnu.org>, Stefan Monnier writes:

>> environment.  Hmmm, it seems that you are right.  There's no
>> way to handle a tared/archived file in a function registered
>> in file-coding-system-alist.

> Provide a file-name-handler for tar files and archives would work
> around that problem.

Maybe, but I'm not sure.  My gut feeling tells me that it's
not easy to set up various handlers for an archive member
already set up in a (narrowed) buffer.  We don't know what
kind of file operation a function in
file-coding-system-alist performs.

By the way, while considering the possibility of using
file-name-handler, I got this idea.

The correct operation in a handler for insert-file-contents
will be to find a buffer pretending to visit the file, and
insert that buffer's contents.  And, for that, we have to
give buffer-file-name (e.g. /home/handa/x.tgz!vi.po) not the
filename itself (e.g.
vi.po) to find-operation-coding-system.  I think such a
change is safe because, at least, all current entries in
file-coding-system-alist check only the tail of a filename.

But, if we have such a change, with a fairly simple change
to po.el, we can fix the current problem.  So, I now propose
the attached change.

---
Kenichi Handa
handa@m17n.org

2005-10-24  Kenichi Handa

        * arc-mode.el (archive-set-buffer-as-visiting-file): Give
        buffer-file-name to find-operation-coding-system.

        * tar-mode.el (tar-extract): Give buffer-file-name to
        find-operation-coding-system.

        * textmodes/po.el (po-find-charset): If there exists a buffer
        visiting filename, check the contents of that buffer.
        (po-find-file-coding-system-guts): Check if there exists a buffer
        visiting filename.

Index: arc-mode.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/arc-mode.el,v
retrieving revision 1.68
diff -c -r1.68 arc-mode.el
*** arc-mode.el 16 Oct 2005 17:05:23 -0000 1.68
--- arc-mode.el 24 Oct 2005 01:33:13 -0000
***************
*** 877,883 ****
                       (let ((file-name-handler-alist
                              '(("" . archive-file-name-handler))))
                         (car (find-operation-coding-system 'insert-file-contents
!                                                           filename t))))))
            (if (and (not coding-system-for-read)
                     (not enable-multibyte-characters))
                (setq coding
--- 877,883 ----
                       (let ((file-name-handler-alist
                              '(("" . archive-file-name-handler))))
                         (car (find-operation-coding-system 'insert-file-contents
!                                                           buffer-file-name t))))))
            (if (and (not coding-system-for-read)
                     (not enable-multibyte-characters))
                (setq coding
Index: tar-mode.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/tar-mode.el,v
retrieving revision 1.103
diff -c -r1.103 tar-mode.el
*** tar-mode.el 22 Oct 2005 01:24:38 -0000 1.103
--- tar-mode.el 24 Oct 2005 01:33:14 -0000
***************
*** 737,743 ****
                            (funcall set-auto-coding-function
                                     name (- (point-max) (point)))))
                       (car (find-operation-coding-system
!                            'insert-file-contents name t))))
              (multibyte enable-multibyte-characters)
              (detected (detect-coding-region
                         (point-min)
--- 737,743 ----
                            (funcall set-auto-coding-function
                                     name (- (point-max) (point)))))
                       (car (find-operation-coding-system
!                            'insert-file-contents buffer-file-name t))))
              (multibyte enable-multibyte-characters)
              (detected (detect-coding-region
                         (point-min)
Index: textmodes/po.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/textmodes/po.el,v
retrieving revision 1.12
diff -c -r1.12 po.el
*** textmodes/po.el 6 Aug 2005 17:41:15 -0000 1.12
--- textmodes/po.el 24 Oct 2005 01:33:14 -0000
***************
*** 44,55 ****
    "Return PO charset value for FILENAME."
    (let ((charset-regexp
           "^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"")
          (short-read nil))
      ;; Try the first 4096 bytes.  In case we cannot find the charset value
      ;; within the first 4096 bytes (the PO file might start with a long
      ;; comment) try the next 4096 bytes repeatedly until we'll know for sure
      ;; we've checked the empty header entry entirely.
!     (while (not (or short-read (re-search-forward "^msgid" nil t)))
        (save-excursion
          (goto-char (point-max))
          (let ((pair (insert-file-contents-literally filename nil
--- 44,59 ----
    "Return PO charset value for FILENAME."
    (let ((charset-regexp
           "^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"")
+         (buf (get-file-buffer filename))
          (short-read nil))
+     (when buf
+       (set-buffer buf)
+       (goto-char (point-min)))
      ;; Try the first 4096 bytes.  In case we cannot find the charset value
      ;; within the first 4096 bytes (the PO file might start with a long
      ;; comment) try the next 4096 bytes repeatedly until we'll know for sure
      ;; we've checked the empty header entry entirely.
!     (while (not (or short-read (re-search-forward "^msgid" nil t) buf))
        (save-excursion
          (goto-char (point-max))
          (let ((pair (insert-file-contents-literally filename nil
***************
*** 57,63 ****
                                                      (1- (+ (point) 4096)))))
            (setq short-read (< (nth 1 pair) 4096)))))
      (cond ((re-search-forward charset-regexp nil t) (match-string 1))
!           (short-read nil)
            ;; We've found the first msgid; maybe, only a part of the msgstr
            ;; value was loaded.  Load the next 1024 bytes; if charset still
            ;; isn't available, give up.
--- 61,67 ----
                                                      (1- (+ (point) 4096)))))
            (setq short-read (< (nth 1 pair) 4096)))))
      (cond ((re-search-forward charset-regexp nil t) (match-string 1))
!           ((or short-read buf) nil)
            ;; We've found the first msgid; maybe, only a part of the msgstr
            ;; value was loaded.  Load the next 1024 bytes; if charset still
            ;; isn't available, give up.
***************
*** 74,80 ****
  Do so according to FILENAME's declared charset."
    (and
     (eq operation 'insert-file-contents)
!    (file-exists-p filename)
     (with-temp-buffer
       (let* ((coding-system-for-read 'no-conversion)
              (charset (or (po-find-charset filename) "ascii"))
--- 78,84 ----
  Do so according to FILENAME's declared charset."
    (and
     (eq operation 'insert-file-contents)
!    (or (get-file-buffer filename) (file-exists-p filename))
     (with-temp-buffer
       (let* ((coding-system-for-read 'no-conversion)
              (charset (or (po-find-charset filename) "ascii"))
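
As a rough illustration of why giving buffer-file-name to
find-operation-coding-system should be safe: an entry whose regexp is
anchored at the tail of the name matches the archive-style name just as
it matches the bare member name.  This is only a sketch.  The regexp is
a simplified assumption, not necessarily the exact pattern registered in
file-coding-system-alist, and `my-po-alist' and `my-po-lookup' are
hypothetical names used only here.

```elisp
;; Hypothetical, simplified stand-in for a file-coding-system-alist
;; entry; the regexp is an assumption, anchored at the end of the name.
(defvar my-po-alist
  '(("\\.po\\'" . po-find-file-coding-system)))

;; Hypothetical helper: return the handler of the first entry whose
;; regexp matches NAME, or nil if no entry matches.
(defun my-po-lookup (name)
  (assoc-default name my-po-alist 'string-match))

;; Both the bare member name and the archive-style buffer-file-name
;; end in ".po", so both lookups select the same entry.
(list (my-po-lookup "vi.po")
      (my-po-lookup "/home/handa/x.tgz!vi.po"))
```

An entry that tested the head of the name, or the whole name, would
behave differently for the two arguments, which is why the safety
argument rests on entries checking only the tail.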