From: Kevin Ryde <user42@zip.com.au>
Subject: po file charset via auto-coding-functions
Date: Fri, 21 Oct 2005 07:06:49 +1000
Message-ID: <87zmp399ue.fsf@zip.com.au>
This is a proposal to get the coding system for a .po file via
auto-coding-functions, instead of the way textmodes/po.el reads the
file explicitly.

2005-10-20  Kevin Ryde  <user42@zip.com.au>

	* international/mule.el (po-content-type-charset-alist): Moved from
	textmodes/po.el, add "CHARSET" which is a placeholder from xgettext.
	(po-auto-coding-function): New function.  This gets the right coding
	system when visiting a .po via archive-mode;
	po-find-file-coding-system only worked on a normal file.  The
	charset= regexp is taken from textmodes/po.el.
	(auto-coding-functions): Use po-auto-coding-function.

	* international/mule-conf.el (file-coding-system-alist): Remove
	po-find-file-coding-system.

	* textmodes/po.el: Remove file, no longer used.

One possible problem is that po files can have more than 1024 bytes of
comments before the header info block. I see that Finsert_file_contents
in fileio.c only grabs 1024 bytes before calling set-auto-coding, but I
can't tell if or when that path is taken. I think a normal visit or an
`archive-extract' has the whole file available, so those cases work.
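
One way to find files that a 1024-byte prefix would affect is to measure
where the charset line ends. The following is a minimal sketch, not part
of the patch: the helper name `my-po-charset-offset' is made up for
illustration, and its regexp is a looser variant of the one used in
po-auto-coding-function.

```emacs-lisp
;; Hypothetical helper, not part of the patch.
(defun my-po-charset-offset (filename)
  "Return the byte offset just past the charset header line in FILENAME.
Return nil if no Content-Type line with a charset= field is found."
  (with-temp-buffer
    ;; Unibyte buffer plus a literal insert: one buffer position per byte.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally filename)
    (goto-char (point-min))
    (when (re-search-forward "^\"Content-Type:.*charset=.*\\\\n\"" nil t)
      ;; Buffer positions start at 1, so subtract 1 to get a byte count.
      (1- (point)))))
```

A result greater than 1024 for some file means its charset line would be
missed by any caller that examines only the first 1024 bytes.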
I used the following bit of code to exercise po-auto-coding-function
on all my .po files. The function prints a message for each bad charset;
the result is a list of the bad files.

    (delq nil
          (mapcar (lambda (filename)
                    (with-temp-buffer
                      (insert-file-contents-literally filename)
                      (goto-char (point-min))
                      (if (po-auto-coding-function (- (point-max) (point-min)))
                          nil
                        filename)))
                  (delete "" (split-string
                              (shell-command-to-string "locate \\*.po") "\n"))))

Among my files I found two unrecognised charsets:

    "TCVN-5712" in gtk 1.2 Vietnamese. Is there a good place to map or
    alias that to `tcvn', which Emacs knows?

    "iso-8859-9e" in gtk 1.2 Azerbaijani Turkish, but I don't know what
    that charset is or is meant to be. glibc iconv doesn't seem to
    recognise it, so presumably it's unused.
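
If an alias turns out to be the right approach for "TCVN-5712", it could
be set up with `define-coding-system-alias'. This is a sketch only. It
assumes `tcvn' names a coding system in the running Emacs, and that
locale-charset-to-coding-system would then find the new name; the latter
has not been verified here.

```emacs-lisp
;; Assumption: `tcvn' names an existing coding system.  Nothing is
;; changed if `tcvn-5712' is already defined or if `tcvn' is missing.
(when (and (coding-system-p 'tcvn)
           (not (coding-system-p 'tcvn-5712)))
  (define-coding-system-alias 'tcvn-5712 'tcvn))
```

An alternative that stays within the patch would be another entry in
po-content-type-charset-alist, at the cost of growing that list for every
such name.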
[-- Attachment #2: mule.el.po-coding.diff --]
[-- Type: text/plain, Size: 3146 bytes --]
*** mule.el.~1.226.~ 2005-09-29 09:23:59.000000000 +1000
--- mule.el 2005-10-21 06:54:07.785993736 +1000
***************
*** 1586,1592 ****
                         (symbol :tag "Coding system"))))
  
  ;; See the bottom of this file for built-in auto coding functions.
! (defcustom auto-coding-functions '(sgml-xml-auto-coding-function
                                     sgml-html-meta-auto-coding-function)
    "A list of functions which attempt to determine a coding system.
  
--- 1586,1593 ----
                         (symbol :tag "Coding system"))))
  
  ;; See the bottom of this file for built-in auto coding functions.
! (defcustom auto-coding-functions '(po-auto-coding-function
!                                    sgml-xml-auto-coding-function
                                     sgml-html-meta-auto-coding-function)
    "A list of functions which attempt to determine a coding system.
  
***************
*** 2203,2208 ****
--- 2204,2254 ----
  
  ;;; Built-in auto-coding-functions:
  
+ (defconst po-content-type-charset-alist
+   '(("ASCII" . undecided)
+     ("ANSI_X3.4-1968" . undecided)
+     ("US-ASCII" . undecided)
+     ;; "charset=CHARSET" is generated by xgettext, and may be present before
+     ;; someone fills in their target charset. `undecided' should be right.
+     ("CHARSET" . undecided))
+   "Alist of coding system versus GNU libc/libiconv canonical charset name.
+ Contains canonical charset names that don't correspond to coding systems.")
+
+ (defun po-auto-coding-function (size)
+   "Determine character encoding of a gettext .po or .pot file.
+ This function is designed for use in `auto-coding-functions'.
+
+ A po file starts with msgid \"\", whose msgstr has header information, in
+ particular \"Content-Type: text/plain; charset=ASCII\\n\" or whatever."
+
+   ;; Skip "#" comment lines and whitespace-only lines, then want
+   ;;     msgid ""
+   ;;     msgstr ""
+   ;; which, up to the next blank line, is the header info.
+   ;;
+   (and (looking-at
+         "\\(#.*\n\\|[ \t\r]*\n\\)*msgid \"\"[ \t\r]*\nmsgstr \"\"[ \t\r]*\n")
+        (let ((limit (+ (point) size)))
+          (save-excursion
+            (goto-char (match-end 0))
+
+            ;; Blank line is the end of the header, stop searching there, or
+            ;; at existing `limit' if the file is a header only.
+            (setq limit (or (save-excursion
+                              (re-search-forward "\n[ \t\r]*\n" limit t))
+                            limit))
+
+            (and (re-search-forward "^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"" limit t)
+                 (let ((charset (match-string 1)))
+
+                   (or (cdr (assoc-string charset po-content-type-charset-alist
+                                          t))
+                       (locale-charset-to-coding-system charset)
+                       (progn
+                         (message "Warning: unknown coding system \"%s\""
+                                  charset)
+                         nil))))))))
+
  (defun sgml-xml-auto-coding-function (size)
    "Determine whether the buffer is XML, and if so, its encoding.
  This function is intended to be added to `auto-coding-functions'."
[-- Attachment #3: mule-conf.el.po-coding.diff --]
[-- Type: text/plain, Size: 493 bytes --]
*** mule-conf.el.~1.82.~ 2005-08-04 06:28:21.000000000 +1000
--- mule-conf.el 2005-10-20 17:45:02.000000000 +1000
***************
*** 502,508 ****
;; the beginning of a doc string, work.
("\\(\\`\\|/\\)loaddefs.el\\'" . (raw-text . raw-text-unix))
("\\.tar\\'" . (no-conversion . no-conversion))
- ( "\\.po[tx]?\\'\\|\\.po\\." . po-find-file-coding-system)
("\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'" . latexenc-find-file-coding-system)
("" . (undecided . nil))))
--- 502,507 ----
[-- Attachment #4: Type: text/plain, Size: 142 bytes --]
_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel