* po file charset via auto-coding-functions
@ 2005-10-20 21:06 Kevin Ryde
2005-10-21 2:18 ` Kenichi Handa
2005-10-21 4:49 ` Richard M. Stallman
0 siblings, 2 replies; 42+ messages in thread
From: Kevin Ryde @ 2005-10-20 21:06 UTC (permalink / raw)
[-- Attachment #1: Type: text/plain, Size: 2060 bytes --]
This is a proposal to get the coding system for a .po file via
auto-coding-functions, instead of the way textmodes/po.el reads the
file explicitly.
2005-10-20 Kevin Ryde <user42@zip.com.au>
* international/mule.el (po-content-type-charset-alist): Moved from
textmodes/po.el, add "CHARSET" which is a placeholder from xgettext.
(po-auto-coding-function): New function. This gets the right coding
system when visiting a .po via archive-mode; po-find-file-coding-system
only worked on a normal file. charset= regexp from textmodes/po.el.
(auto-coding-functions): Use po-auto-coding-function.
* international/mule-conf.el (file-coding-system-alist): Remove
po-find-file-coding-system.
* textmodes/po.el: Remove file, no longer used.
One possible problem is that po files can have more than 1024 bytes of
comments before the header info block. I see fileio.c
Finsert_file_contents only grabs 1024 bytes before calling
set-auto-coding, but I can't tell if/when that happens. I think a
normal visit or an `archive-extract' has the whole file, so they work.
I used the following bit of code to exercise po-auto-coding-function
on all my .po files. The function prints messages about bad charsets;
the result is a list of the bad files.
(delq nil
      (mapcar (lambda (filename)
                (with-temp-buffer
                  (insert-file-contents-literally filename)
                  (goto-char (point-min))
                  (if (po-auto-coding-function (- (point-max) (point-min)))
                      nil
                    filename)))
              (delete "" (split-string
                          (shell-command-to-string "locate \\*.po") "\n"))))
Among my files I found two unrecognised:
"TCVN-5712" in gtk 1.2 Vietnamese. Is there a good place to map or
alias that to `tcvn' which Emacs knows?
"iso-8859-9e" in gtk 1.2 Azerbaijani Turkish, but I don't know what
that charset is or is meant to be. glibc iconv doesn't seem to
recognise it, so presumably it's unused.
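As a rough illustration of the lookup described above, here is a minimal sketch that pulls the charset out of a header and checks it against a small alist. The regexp is the charset= one from textmodes/po.el; the names `my-po-charset-alist', `my-po-header-charset' and `my-po-placeholder-p', and the sample header text, are made up for the example and are not part of the patch.

```elisp
;; Sketch only.  `my-po-charset-alist', `my-po-header-charset' and
;; `my-po-placeholder-p' are hypothetical names; the regexp is the
;; charset= one from textmodes/po.el.
(defvar my-po-charset-alist
  '(("ASCII" . undecided)
    ("CHARSET" . undecided))
  "Charset names that should not be treated as a real coding system.")

(defun my-po-header-charset (text)
  "Return the charset string named in the PO header TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (and (re-search-forward
          "^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\""
          nil t)
         (match-string 1))))

(defun my-po-placeholder-p (charset)
  "Return non-nil if CHARSET is listed in `my-po-charset-alist'.
The lookup ignores case, as the patch does with `assoc-string'."
  (and (assoc-string charset my-po-charset-alist t) t))

;; Extract the charset from a made-up header.
(my-po-header-charset
 "msgid \"\"\nmsgstr \"\"\n\"Content-Type: text/plain; charset=TCVN-5712\\n\"\n")
```

A name that is neither in the alist nor known as a coding system would then fall through to the warning message, which is how the two unrecognised charsets showed up.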
[-- Attachment #2: mule.el.po-coding.diff --]
[-- Type: text/plain, Size: 3146 bytes --]
*** mule.el.~1.226.~ 2005-09-29 09:23:59.000000000 +1000
--- mule.el 2005-10-21 06:54:07.785993736 +1000
***************
*** 1586,1592 ****
(symbol :tag "Coding system"))))
;; See the bottom of this file for built-in auto coding functions.
! (defcustom auto-coding-functions '(sgml-xml-auto-coding-function
sgml-html-meta-auto-coding-function)
"A list of functions which attempt to determine a coding system.
--- 1586,1593 ----
(symbol :tag "Coding system"))))
;; See the bottom of this file for built-in auto coding functions.
! (defcustom auto-coding-functions '(po-auto-coding-function
! sgml-xml-auto-coding-function
sgml-html-meta-auto-coding-function)
"A list of functions which attempt to determine a coding system.
***************
*** 2203,2208 ****
--- 2204,2254 ----
;;; Built-in auto-coding-functions:
+ (defconst po-content-type-charset-alist
+ '(("ASCII" . undecided)
+ ("ANSI_X3.4-1968" . undecided)
+ ("US-ASCII" . undecided)
+ ;; "charset=CHARSET" is generated by xgettext, and may be present before
+ ;; someone fills in their target charset. `undecided' should be right.
+ ("CHARSET" . undecided))
+ "Alist of coding system versus GNU libc/libiconv canonical charset name.
+ Contains canonical charset names that don't correspond to coding systems.")
+
+ (defun po-auto-coding-function (size)
+ "Determine character encoding of a gettext .po or .pot file.
+ This function is designed for use in `auto-coding-functions'.
+
+ A po file starts with msgstr \"\" which has header information, in
+ particular \"Content-Type: text/plain; charset=ASCII\\n\" or whatever."
+
+ ;; Skip "#" comment lines and whitespace-only lines, then want
+ ;; msgid ""
+ ;; msgstr ""
+ ;; which, up to the next blank line, is the header info.
+ ;;
+ (and (looking-at
+ "\\(#.*\n\\|[ \t\r]*\n\\)*msgid \"\"[ \t\r]*\nmsgstr \"\"[ \t\r]*\n")
+ (let ((limit (+ (point) size)))
+ (save-excursion
+ (goto-char (match-end 0))
+
+ ;; Blank line is the end of the header, stop searching there, or
+ ;; at existing `limit' if the file is a header only.
+ (setq limit (or (save-excursion
+ (re-search-forward "\n[ \t\r]*\n" limit t))
+ limit))
+
+ (and (re-search-forward "^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"" limit t)
+ (let ((charset (match-string 1)))
+
+ (or (cdr (assoc-string charset po-content-type-charset-alist
+ t))
+ (locale-charset-to-coding-system charset)
+ (progn
+ (message "Warning: unknown coding system \"%s\""
+ charset)
+ nil))))))))
+
(defun sgml-xml-auto-coding-function (size)
"Determine whether the buffer is XML, and if so, its encoding.
This function is intended to be added to `auto-coding-functions'."
[-- Attachment #3: mule-conf.el.po-coding.diff --]
[-- Type: text/plain, Size: 493 bytes --]
*** mule-conf.el.~1.82.~ 2005-08-04 06:28:21.000000000 +1000
--- mule-conf.el 2005-10-20 17:45:02.000000000 +1000
***************
*** 502,508 ****
;; the beginning of a doc string, work.
("\\(\\`\\|/\\)loaddefs.el\\'" . (raw-text . raw-text-unix))
("\\.tar\\'" . (no-conversion . no-conversion))
- ( "\\.po[tx]?\\'\\|\\.po\\." . po-find-file-coding-system)
("\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'" . latexenc-find-file-coding-system)
("" . (undecided . nil))))
--- 502,507 ----
[-- Attachment #4: Type: text/plain, Size: 142 bytes --]
_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel
* Re: po file charset via auto-coding-functions
2005-10-20 21:06 po file charset via auto-coding-functions Kevin Ryde
@ 2005-10-21 2:18 ` Kenichi Handa
2005-10-21 22:46 ` Kevin Ryde
2005-10-21 4:49 ` Richard M. Stallman
1 sibling, 1 reply; 42+ messages in thread
From: Kenichi Handa @ 2005-10-21 2:18 UTC (permalink / raw)
Cc: emacs-devel
In article <87zmp399ue.fsf@zip.com.au>, Kevin Ryde <user42@zip.com.au> writes:
> This is a proposal to get the coding system for a .po file via
> auto-coding-functions, instead of the way textmodes/po.el reads the
> file explicitly.
I agree that if a .po file can be handled correctly in the
framework of auto-coding-functions, it's a good change.
But, does the current method have any problem that can't be
fixed within po.el? If not, I think we should make such a
change after the current release.
> One possible problem is that po files can have more than 1024 bytes of
> comments before the header info block. I see fileio.c
> Finsert_file_contents only grabs 1024 bytes before calling
> set-auto-coding, but I can't tell if/when that happens. I think a
> normal visit or an `archive-extract' has the whole file, so they work.
Yes.
> "TCVN-5712" in gtk 1.2 vietnamese. Is there a good place to map or
> alias that to `tcvn' which emacs knows?
I've just added this line to lisp/language/vietnamese.el.
(define-coding-system-alias 'tcvn-5712 'vietnamese-tcvn)
> "iso-8859-9e" in gtk 1.2 Azerbaijani turkish, but I don't know what
> that charset is or is meant to be. glibc iconv doesn't seem to
> recognise it, so presumably it's unused.
I have no idea.
---
Kenichi Handa
handa@m17n.org
* Re: po file charset via auto-coding-functions
2005-10-20 21:06 po file charset via auto-coding-functions Kevin Ryde
2005-10-21 2:18 ` Kenichi Handa
@ 2005-10-21 4:49 ` Richard M. Stallman
2005-10-21 21:07 ` Kevin Ryde
1 sibling, 1 reply; 42+ messages in thread
From: Richard M. Stallman @ 2005-10-21 4:49 UTC (permalink / raw)
Cc: emacs-devel
    One possible problem is that po files can have more than 1024 bytes of
    comments before the header info block. I see fileio.c
    Finsert_file_contents only grabs 1024 bytes before calling
    set-auto-coding, but I can't tell if/when that happens.
I think it ALWAYS happens. Every call to Finsert_file_contents will
try to determine the coding system from the first 1k and last 3k of
the file. (Unless it already knows the coding system to use.)
    I think a
    normal visit or an `archive-extract' has the whole file, so they work.
If Finsert_file_contents can't determine the coding system from that
part of the file, then if it is a normal visit (or if the buffer was
previously empty), it will try again after reading all the file. So I
guess this will work ok in that case.
But if it is inserting the file into a buffer that had other text in
it, or if it is inserting just part of the file, I think it won't
work.
To make it work reliably, therefore, I think the
Vset_auto_coding_function function has to look at more of the file.
If it is looking at a po file, it should do that.
This should not be too hard, since it gets the file name as an argument.
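To make the 1k/3k point concrete, here is a small arithmetic sketch of which parts of a file would be examined under that description. The figures 1024 and 3072 simply follow the "first 1k and last 3k" wording above; `my-head-and-tail' and `my-offset-covered-p' are hypothetical helpers, and the offsets are zero-based byte offsets rather than buffer positions.

```elisp
;; Sketch only.  The 1024 and 3072 figures follow the "first 1k and last
;; 3k" description; both helper names are hypothetical.
(defun my-head-and-tail (size)
  "Return ((HEAD-START . HEAD-END) (TAIL-START . TAIL-END)) for SIZE bytes."
  (let* ((head-end (min size 1024))
         (tail-start (max head-end (- size 3072))))
    (list (cons 0 head-end) (cons tail-start size))))

(defun my-offset-covered-p (offset size)
  "Return non-nil if OFFSET (0 <= OFFSET < SIZE) is in the head or tail."
  (let* ((regions (my-head-and-tail size))
         (head (car regions))
         (tail (cadr regions)))
    (or (< offset (cdr head))
        (>= offset (car tail)))))

;; A header starting 1500 bytes into a 10000 byte file is in neither part.
(my-offset-covered-p 1500 10000)
```

Under these assumptions a po header pushed past the first 1024 bytes by comments is only seen if the file is short enough for the tail region to reach it.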
* Re: po file charset via auto-coding-functions
2005-10-21 4:49 ` Richard M. Stallman
@ 2005-10-21 21:07 ` Kevin Ryde
0 siblings, 0 replies; 42+ messages in thread
From: Kevin Ryde @ 2005-10-21 21:07 UTC (permalink / raw)
"Richard M. Stallman" <rms@gnu.org> writes:
>
> I think it ALWAYS happens. Every call to Finsert_file_contents will
> try to determine the coding system from the first 1k and last 3k of
> the file. (Unless it already knows the coding system to use.)
I tried this (in my CVS build, started with -q -no-site-file),
(debug-on-entry 'sgml-html-meta-auto-coding-function)
(find-file "etc/NEWS") ;; the emacs NEWS file
and got
Debugger entered--entering a function:
* sgml-html-meta-auto-coding-function(580713)
byte-code("\212eb\210^H\211A^P@ !)\207" [funcs size] 2)
find-auto-coding("/down/emacs/etc/NEWS" 580713)
set-auto-coding("/down/emacs/etc/NEWS" 580713)
insert-file-contents("/down/emacs/etc/NEWS" t)
i.e. the size presented to the function (in just one call to it) is the
full 580 kbytes (and the current buffer has the full file contents).
So, like I say, I was unsure when this does or doesn't happen.
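The same check can be made without the debugger. The following is a minimal sketch, assuming only that find-auto-coding takes a filename and a size as in the backtrace above; `my-seen-sizes' and `my-log-auto-coding-size' are hypothetical names, and the sample text and file name are made up.

```elisp
;; Sketch only.  `my-seen-sizes' and `my-log-auto-coding-size' are
;; hypothetical names used to record what `find-auto-coding' passes on.
(defvar my-seen-sizes nil
  "Sizes passed to `my-log-auto-coding-size', most recent first.")

(defun my-log-auto-coding-size (size)
  "Record SIZE and return nil so the remaining functions still run."
  (push size my-seen-sizes)
  nil)

(let ((auto-coding-functions
       (cons #'my-log-auto-coding-size auto-coding-functions)))
  (with-temp-buffer
    ;; 13 characters, with nothing that names a coding system.
    (insert "hello, world\n")
    (goto-char (point-min))
    (find-auto-coding "sample.txt" (- (point-max) (point-min)))))
```

Afterwards `my-seen-sizes' holds whatever size the functions were given, which for a real visit shows whether they saw the whole file or only part of it.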
> To make it work reliably, therefore, I think the
> Vset_auto_coding_function function has to look at more of the file.
> If it is looking at a po file, it should do that.
Looking at more than 1k of the file will also be wanted for Project
Gutenberg texts. They have a blurb of about 1k at the start, before the
header information that has the coding system spec.
* Re: po file charset via auto-coding-functions
2005-10-21 2:18 ` Kenichi Handa
@ 2005-10-21 22:46 ` Kevin Ryde
2005-10-22 1:43 ` Kenichi Handa
0 siblings, 1 reply; 42+ messages in thread
From: Kevin Ryde @ 2005-10-21 22:46 UTC (permalink / raw)
Kenichi Handa <handa@m17n.org> writes:
>
> But, does the current method have any problem that can't be
> fixed within po.el?
If you visit a po file from tar-mode or archive-mode (with "RET", i.e.
tar-extract or archive-extract), the coding system specified in the file
doesn't take effect. Sometimes other guessing seems to get the right
answer, but eg. "tcvn" doesn't work from a tar where it does work from
a plain file.
(I hope I'm right that auto-coding-functions is the place to get a
coding system from file contents.)
(Incidentally, latexenc-find-file-coding-system looks like it might be
another candidate for this in the future, except for the way it goes
looking for a top-level file.)
* Re: po file charset via auto-coding-functions
2005-10-21 22:46 ` Kevin Ryde
@ 2005-10-22 1:43 ` Kenichi Handa
2005-10-22 2:01 ` Kevin Ryde
2005-10-22 22:51 ` Kevin Ryde
0 siblings, 2 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-10-22 1:43 UTC (permalink / raw)
Cc: emacs-devel
In article <87ll0ma3ow.fsf@zip.com.au>, Kevin Ryde <user42@zip.com.au> writes:
> If you visit a po file from tar-mode or archive-mode (with "RET", i.e.
> tar-extract or archive-extract), the coding system specified in the file
> doesn't take effect. Sometimes other guessing seems to get the right
> answer, but eg. "tcvn" doesn't work from a tar where it does work from
> a plain file.
It's a bug in tar-mode. Thank you for noticing it. I've
just installed a fix. But, I can't reproduce it in
archive-mode. Please send me a minimal *.zip file that
reproduces that bug.
> (I hope I'm right that auto-coding-functions is the place to get a
> coding system from file contents.)
Yes. But file-coding-system-alist is also a place to
handle it.
And functions in auto-coding-functions are called on any
file, but currently, po-find-file-coding-system is called
only on *.po files. I'm not sure which is better, but at
least there's a difference.
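The difference can be seen in a minimal sketch of the file-coding-system-alist side. Here `my-po-coding-by-name' is a hypothetical stand-in for po-find-file-coding-system, iso-latin-1 is an arbitrary answer, and the file names are made up; the point is only that the function is selected by file name and receives the operation and its arguments, not the file contents.

```elisp
;; Sketch only.  `my-po-coding-by-name' is a hypothetical stand-in for
;; po-find-file-coding-system; iso-latin-1 is an arbitrary choice.
(defvar my-last-arg-list nil
  "Argument list most recently passed to `my-po-coding-by-name'.")

(defun my-po-coding-by-name (arg-list)
  "Record ARG-LIST (the operation and its arguments); return a fixed answer.
Only the file name is available here, not the file contents."
  (setq my-last-arg-list arg-list)
  '(iso-latin-1 . iso-latin-1))

(let ((file-coding-system-alist
       '(("\\.po\\'" . my-po-coding-by-name)
         ("" . (undecided . nil)))))
  ;; The first name matches the .po entry, the second only the catch-all.
  (list (find-operation-coding-system 'insert-file-contents "vi.po")
        (find-operation-coding-system 'insert-file-contents "README")))
```

A function in auto-coding-functions, by contrast, runs for every file and has to decide from the buffer text alone whether it applies.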
---
Kenichi Handa
handa@m17n.org
* Re: po file charset via auto-coding-functions
2005-10-22 1:43 ` Kenichi Handa
@ 2005-10-22 2:01 ` Kevin Ryde
2005-10-22 2:39 ` Kenichi Handa
2005-10-22 22:51 ` Kevin Ryde
1 sibling, 1 reply; 42+ messages in thread
From: Kevin Ryde @ 2005-10-22 2:01 UTC (permalink / raw)
Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 730 bytes --]
Kenichi Handa <handa@m17n.org> writes:
>
> It's a bug in tar-mode. Thank you for noticing it. I've
> just installed a fix. But, I can't reproduce it in
> archive-mode. Please send me a minimal *.zip file that
> reproduces that bug.
Sample x.tar and x.zip below, containing gtk 1.2 vi.po in tcvn-5712.
Visiting from the .zip gives an error,
Opening input file: no such file or directory, /tmp/vi.po
Visiting from the .tar gives no error but leaves C-h C saying
`raw-text-unix' and apparent garbage in the buffer. (I just
cvs-up'ed, not sure if my build is hitting your last change though.)
Be sure not to have a vi.po in the current directory, or the po
functions will open and use that, making it look like they work.
[-- Attachment #2: x.tar --]
[-- Type: application/x-tar, Size: 20480 bytes --]
[-- Attachment #3: x.zip --]
[-- Type: application/zip, Size: 3044 bytes --]
* Re: po file charset via auto-coding-functions
2005-10-22 2:01 ` Kevin Ryde
@ 2005-10-22 2:39 ` Kenichi Handa
2005-10-22 2:50 ` Stefan Monnier
2005-10-22 15:51 ` Richard M. Stallman
0 siblings, 2 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-10-22 2:39 UTC (permalink / raw)
Cc: emacs-devel
In article <87fyqu9ung.fsf@zip.com.au>, Kevin Ryde <user42@zip.com.au> writes:
> Be sure not to have a vi.po in the current directory, or the po
> functions will open and use that, making it look like they work.
Ah! I see. That's why archive-mode succeeded in my
environment. Hmmm, it seems that you are right. There's no
way to handle a tarred/archived file in a function registered
in file-coding-system-alist. So, we surely need your patch
to detect the coding system of a *.po file in such a case.
Richard, shall I install his patch?
---
Kenichi Handa
handa@m17n.org
* Re: po file charset via auto-coding-functions
2005-10-22 2:39 ` Kenichi Handa
@ 2005-10-22 2:50 ` Stefan Monnier
2005-10-22 22:44 ` Kevin Ryde
2005-10-24 1:39 ` Kenichi Handa
2005-10-22 15:51 ` Richard M. Stallman
1 sibling, 2 replies; 42+ messages in thread
From: Stefan Monnier @ 2005-10-22 2:50 UTC (permalink / raw)
Cc: Kevin Ryde, emacs-devel
> environment. Hmmm, it seems that you are right. There's no
> way to handle a tarred/archived file in a function registered
> in file-coding-system-alist.
Providing a file-name-handler for tar files and archives would work
around that problem.
Stefan
* Re: po file charset via auto-coding-functions
2005-10-22 2:39 ` Kenichi Handa
2005-10-22 2:50 ` Stefan Monnier
@ 2005-10-22 15:51 ` Richard M. Stallman
2005-10-24 2:05 ` Kenichi Handa
1 sibling, 1 reply; 42+ messages in thread
From: Richard M. Stallman @ 2005-10-22 15:51 UTC (permalink / raw)
Cc: user42, emacs-devel
    Ah! I see. That's why archive-mode succeeded in my
    environment. Hmmm, it seems that you are right. There's no
    way to handle a tarred/archived file in a function registered
    in file-coding-system-alist.
Could you explain what that means?
I don't entirely understand.
* Re: po file charset via auto-coding-functions
2005-10-22 2:50 ` Stefan Monnier
@ 2005-10-22 22:44 ` Kevin Ryde
2005-10-24 1:39 ` Kenichi Handa
1 sibling, 0 replies; 42+ messages in thread
From: Kevin Ryde @ 2005-10-22 22:44 UTC (permalink / raw)
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
> Providing a file-name-handler for tar files and archives would work
> around that problem.
But I guess there's also jka-compr and the gnus mime part display
using this stuff (or should be using it).
archive-mode does add an archive-file-name-handler while extracting,
but it seems to only pretend the file exists. There must be a good
reason for that, but claiming the file exists makes
po-find-file-coding-system-guts try to open it, which gets an error.
* Re: po file charset via auto-coding-functions
2005-10-22 1:43 ` Kenichi Handa
2005-10-22 2:01 ` Kevin Ryde
@ 2005-10-22 22:51 ` Kevin Ryde
2005-10-24 1:53 ` Kenichi Handa
1 sibling, 1 reply; 42+ messages in thread
From: Kevin Ryde @ 2005-10-22 22:51 UTC (permalink / raw)
Kenichi Handa <handa@m17n.org> writes:
>
> And, functions in auto-coding-functions are called on any
> file, but currently, po-find-file-coding-system is called
> only on *.po files. I'm not sure which is better, but at
> least there's a difference.
I guess if the file contents are unambiguous, or near enough so, then
it may actually be a good thing to ignore the filename.
Maybe the auto-coding-functions could be called with the filename too,
so they could restrict themselves if they felt the need.
find-auto-coding has the filename available when it makes the calls;
alternatively, just document that `filename' is bound, or whatever.
* Re: po file charset via auto-coding-functions
2005-10-22 2:50 ` Stefan Monnier
2005-10-22 22:44 ` Kevin Ryde
@ 2005-10-24 1:39 ` Kenichi Handa
1 sibling, 0 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-10-24 1:39 UTC (permalink / raw)
Cc: user42, emacs-devel
In article <87k6g6e05k.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> environment. Hmmm, it seems that you are right. There's no
>> way to handle a tarred/archived file in a function registered
>> in file-coding-system-alist.
> Providing a file-name-handler for tar files and archives would work
> around that problem.
Maybe, but I'm not sure. My gut feeling is that it's not
easy to set up various handlers for an archive member already
set up in a (narrowed) buffer. We don't know what kind of
file operation a function in file-coding-system-alist
performs.
By the way, while considering the possibility of using
file-name-handler, I got this idea.
The correct operation in a handler for insert-file-contents
would be to find a buffer pretending to visit the file, and
insert that buffer's contents. And, for that, we have to give
buffer-file-name (e.g. "/home/handa/x.tgz!vi.po"), not the
filename itself (e.g. "vi.po"), to
find-operation-coding-system. I think such a change is safe
because, at least, all current entries in
file-coding-system-alist check only the tail of a filename.
But, once we have such a change, a fairly simple change
to po.el fixes the current problem. So, I now propose
the attached change.
---
Kenichi Handa
handa@m17n.org
2005-10-24 Kenichi Handa <handa@m17n.org>
* arc-mode.el (archive-set-buffer-as-visiting-file): Give
buffer-file-name to find-operation-coding-system.
* tar-mode.el (tar-extract): Give buffer-file-name to
find-operation-coding-system.
* textmodes/po.el (po-find-charset): If there exists a buffer
visiting filename, check the contents of that buffer.
(po-find-file-coding-system-guts): Check if there exists a buffer
visiting filename.
Index: arc-mode.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/arc-mode.el,v
retrieving revision 1.68
diff -c -r1.68 arc-mode.el
*** arc-mode.el 16 Oct 2005 17:05:23 -0000 1.68
--- arc-mode.el 24 Oct 2005 01:33:13 -0000
***************
*** 877,883 ****
(let ((file-name-handler-alist
'(("" . archive-file-name-handler))))
(car (find-operation-coding-system 'insert-file-contents
! filename t))))))
(if (and (not coding-system-for-read)
(not enable-multibyte-characters))
(setq coding
--- 877,883 ----
(let ((file-name-handler-alist
'(("" . archive-file-name-handler))))
(car (find-operation-coding-system 'insert-file-contents
! buffer-file-name t))))))
(if (and (not coding-system-for-read)
(not enable-multibyte-characters))
(setq coding
Index: tar-mode.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/tar-mode.el,v
retrieving revision 1.103
diff -c -r1.103 tar-mode.el
*** tar-mode.el 22 Oct 2005 01:24:38 -0000 1.103
--- tar-mode.el 24 Oct 2005 01:33:14 -0000
***************
*** 737,743 ****
(funcall set-auto-coding-function
name (- (point-max) (point)))))
(car (find-operation-coding-system
! 'insert-file-contents name t))))
(multibyte enable-multibyte-characters)
(detected (detect-coding-region
(point-min)
--- 737,743 ----
(funcall set-auto-coding-function
name (- (point-max) (point)))))
(car (find-operation-coding-system
! 'insert-file-contents buffer-file-name t))))
(multibyte enable-multibyte-characters)
(detected (detect-coding-region
(point-min)
Index: textmodes/po.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/textmodes/po.el,v
retrieving revision 1.12
diff -c -r1.12 po.el
*** textmodes/po.el 6 Aug 2005 17:41:15 -0000 1.12
--- textmodes/po.el 24 Oct 2005 01:33:14 -0000
***************
*** 44,55 ****
"Return PO charset value for FILENAME."
(let ((charset-regexp
"^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"")
(short-read nil))
;; Try the first 4096 bytes. In case we cannot find the charset value
;; within the first 4096 bytes (the PO file might start with a long
;; comment) try the next 4096 bytes repeatedly until we'll know for sure
;; we've checked the empty header entry entirely.
! (while (not (or short-read (re-search-forward "^msgid" nil t)))
(save-excursion
(goto-char (point-max))
(let ((pair (insert-file-contents-literally filename nil
--- 44,59 ----
"Return PO charset value for FILENAME."
(let ((charset-regexp
"^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"")
+ (buf (get-file-buffer filename))
(short-read nil))
+ (when buf
+ (set-buffer buf)
+ (goto-char (point-min)))
;; Try the first 4096 bytes. In case we cannot find the charset value
;; within the first 4096 bytes (the PO file might start with a long
;; comment) try the next 4096 bytes repeatedly until we'll know for sure
;; we've checked the empty header entry entirely.
! (while (not (or short-read (re-search-forward "^msgid" nil t) buf))
(save-excursion
(goto-char (point-max))
(let ((pair (insert-file-contents-literally filename nil
***************
*** 57,63 ****
(1- (+ (point) 4096)))))
(setq short-read (< (nth 1 pair) 4096)))))
(cond ((re-search-forward charset-regexp nil t) (match-string 1))
! (short-read nil)
;; We've found the first msgid; maybe, only a part of the msgstr
;; value was loaded. Load the next 1024 bytes; if charset still
;; isn't available, give up.
--- 61,67 ----
(1- (+ (point) 4096)))))
(setq short-read (< (nth 1 pair) 4096)))))
(cond ((re-search-forward charset-regexp nil t) (match-string 1))
! ((or short-read buf) nil)
;; We've found the first msgid; maybe, only a part of the msgstr
;; value was loaded. Load the next 1024 bytes; if charset still
;; isn't available, give up.
***************
*** 74,80 ****
Do so according to FILENAME's declared charset."
(and
(eq operation 'insert-file-contents)
! (file-exists-p filename)
(with-temp-buffer
(let* ((coding-system-for-read 'no-conversion)
(charset (or (po-find-charset filename) "ascii"))
--- 78,84 ----
Do so according to FILENAME's declared charset."
(and
(eq operation 'insert-file-contents)
! (or (get-file-buffer filename) (file-exists-p filename))
(with-temp-buffer
(let* ((coding-system-for-read 'no-conversion)
(charset (or (po-find-charset filename) "ascii"))
* Re: po file charset via auto-coding-functions
2005-10-22 22:51 ` Kevin Ryde
@ 2005-10-24 1:53 ` Kenichi Handa
2005-10-24 2:04 ` Kevin Ryde
0 siblings, 1 reply; 42+ messages in thread
From: Kenichi Handa @ 2005-10-24 1:53 UTC (permalink / raw)
Cc: emacs-devel
In article <87zmp1f9np.fsf@zip.com.au>, Kevin Ryde <user42@zip.com.au> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>
>> And, functions in auto-coding-functions are called on any
>> file, but currently, po-find-file-coding-system is called
>> only on *.po files. I'm not sure which is better, but at
>> least there's a difference.
> I guess if the file contents are unambiguous, or near enough so, then
> it may actually be a good thing to ignore the filename.
In general, I agree. But, for that, a function to detect a
coding system has to check in advance that the contents are
really in the expected format. For instance, in the case of
po-find-file-coding-system, it has to check that the file is
really a PO file. But, it seems that the current po.el
doesn't do that but expects that it is called only on PO
files.
> Maybe the auto-coding-functions could be called with the filename too,
> so they could restrict themselves if they felt the need.
> find-auto-coding has that available for the calls, or document that
> `filename' is bound, or whatever.
Yes, if a function in auto-coding-functions can check a
filename too, the above problem disappears (or at least
Emacs can perform the same operation as now). But, I don't
want to change an API at this moment, especially when we can
fix the problem by the other method (as in my previous
mail).
---
Kenichi Handa
handa@m17n.org
* Re: po file charset via auto-coding-functions
2005-10-24 1:53 ` Kenichi Handa
@ 2005-10-24 2:04 ` Kevin Ryde
2005-10-24 5:19 ` Kenichi Handa
0 siblings, 1 reply; 42+ messages in thread
From: Kevin Ryde @ 2005-10-24 2:04 UTC (permalink / raw)
Cc: emacs-devel
Kenichi Handa <handa@m17n.org> writes:
>
> But, it seems that the current po.el doesn't do that but expects
> that it is called only on PO files.
Yes, that's so. That's why I tightened it up with a looking-at to
demand a msgid "" + msgstr "", then the "Content-Type" in that header
block. It's always possible a non-po file could start that way, but
it should be unlikely.
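A minimal sketch of that guard, using the looking-at regexp from the patch; the wrapper name `my-looks-like-po-p' and the two sample strings are made up for illustration.

```elisp
;; Sketch only.  `my-looks-like-po-p' is a hypothetical wrapper around the
;; `looking-at' regexp from the patch; the sample strings are made up.
(defun my-looks-like-po-p (text)
  "Return non-nil if TEXT starts like a PO file with an empty-msgid header."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (looking-at
     "\\(#.*\n\\|[ \t\r]*\n\\)*msgid \"\"[ \t\r]*\nmsgstr \"\"[ \t\r]*\n")))

;; Comments and blank lines may come first; any other leading text may not.
(list (my-looks-like-po-p "# translator comment\n\nmsgid \"\"\nmsgstr \"\"\n")
      (my-looks-like-po-p "Some other text\nmsgid \"\"\nmsgstr \"\"\n"))
```

Because looking-at is anchored at point, any leading line that is neither a "#" comment nor blank makes the function decline, which is what keeps it from firing on ordinary files.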
> especially when we can fix the problem by the other method (as in my
> previous mail).
I think jka-compr is also affected; try visiting a compressed
vi.po.gz.
* Re: po file charset via auto-coding-functions
2005-10-22 15:51 ` Richard M. Stallman
@ 2005-10-24 2:05 ` Kenichi Handa
2005-10-25 15:59 ` Richard M. Stallman
` (2 more replies)
0 siblings, 3 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-10-24 2:05 UTC (permalink / raw)
Cc: user42, emacs-devel
In article <E1ETLeE-0006XR-KE@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
> Ah! I see. That's why archive-mode succeeded in my
> environment. Hmmm, it seems that you are right. There's no
> way to handle a tarred/archived file in a function registered
> in file-coding-system-alist.
> Could you explain what that means?
> I don't entirely understand.
Let's assume that a tar file x.tar contains a file vi.po.
The current problem is that on extracting vi.po in tar-mode,
a function registered in file-coding-system-alist is called
with a filename "vi.po". So, that function can't check the
contents of that file because it can't read the contents by
insert-file-contents. But, po-find-file-coding-system has
to read the contents to determine a coding system.
The original suggestion by Kevin is to move
po-find-file-coding-system from file-coding-system-alist to
auto-coding-functions because, then,
po-find-file-coding-system can check the file contents even
for an archive file member.
Then, Stefan suggested providing a proper file-name-handler
in tar-mode and arc-mode so that insert-file-contents would
work correctly even for an archive file member.
But, I thought that's not easy, and proposed another, simpler
method in the last mail.
---
Kenichi Handa
handa@m17n.org
* Re: po file charset via auto-coding-functions
2005-10-24 2:04 ` Kevin Ryde
@ 2005-10-24 5:19 ` Kenichi Handa
2005-10-24 14:11 ` Stefan Monnier
2005-10-24 23:35 ` Juri Linkov
0 siblings, 2 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-10-24 5:19 UTC (permalink / raw)
Cc: emacs-devel
In article <87mzkz3c23.fsf@zip.com.au>, Kevin Ryde <user42@zip.com.au> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>
>> But, it seems that the current po.el doesn't do that but expects
>> that it is called only on PO files.
> Yes, that's so. That's why I tightened it up with a looking-at to
> demand a msgid "" + msgstr "", then the "Content-Type" in that header
> block. It's always possible a non-po file could start that way, but
> it should be unlikely.
I see.
>> especially when we can fix the problem by the other method (as in my
>> previous mail).
> I think jka-compr is also affected, try visiting a compressed
> vi.po.gz.
It seems that this additional patch will fix it. But I
admit that this change is rather tricky compared with
the previous ones. :-(
*** jka-compr.el 08 Aug 2005 10:13:24 +0900 1.87
--- jka-compr.el 24 Oct 2005 11:38:27 +0900
***************
*** 500,509 ****
(delete-file local-copy)))
(unless notfound
(decode-coding-inserted-region
(point) (+ (point) size)
! (jka-compr-byte-compiler-base-file-name file)
! visit beg end replace))
(and
visit
--- 500,513 ----
(delete-file local-copy)))
(unless notfound
+ (let ((buffer-file-name
+ (concat file "!"
+ (jka-compr-byte-compiler-base-file-name file))))
+
(decode-coding-inserted-region
(point) (+ (point) size)
! buffer-file-name
! visit beg end replace)))
(and
visit
---
Kenichi Handa
handa@m17n.org
* Re: po file charset via auto-coding-functions
2005-10-24 5:19 ` Kenichi Handa
@ 2005-10-24 14:11 ` Stefan Monnier
2005-10-25 1:03 ` Kenichi Handa
2005-10-24 23:35 ` Juri Linkov
1 sibling, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2005-10-24 14:11 UTC (permalink / raw)
Cc: Kevin Ryde, emacs-devel
>>> especially when we can fix the problem by the other method (as in my
>>> previous mail).
Passing the full file name seems like the right thing to do, indeed.
> It seems that this additional patch will fix it. But I
> admit that this change is rather tricky compared with
> the previous ones. :-(
I don't see what the trick is. So I guess it deserves a comment.
Stefan
* Re: po file charset via auto-coding-functions
2005-10-24 5:19 ` Kenichi Handa
2005-10-24 14:11 ` Stefan Monnier
@ 2005-10-24 23:35 ` Juri Linkov
2005-10-25 6:42 ` Kenichi Handa
2005-10-25 20:27 ` Richard M. Stallman
1 sibling, 2 replies; 42+ messages in thread
From: Juri Linkov @ 2005-10-24 23:35 UTC (permalink / raw)
Cc: user42, emacs-devel
>> I think jka-compr is also affected, try visiting a compressed
>> vi.po.gz.
>
> It seems that this additional patch will fix it. But I
> admit that this change is rather tricky compared with
> the previous ones. :-(
Please also pay attention to the related problem: .tar and .zip don't
support some Emacs features. For example, after (require 'generic-x)
visiting vi.po or vi.po.gz sets Default-Generic mode, but visiting the
vi.po file from x.tar or x.zip doesn't set this mode. Or after
(progn (setq-default save-place t) (require 'saveplace)) visiting
vi.po or vi.po.gz remembers point positions, but visiting them
from x.tar or x.zip doesn't.
These features work by putting the hook in `find-file-hook'.
Calling this hook on a visited archive file seems to work for these
features, but is it the right way to do it?
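To make the question concrete, here is a minimal sketch of running `find-file-hook' by hand in a buffer that holds an extracted archive member. The names `my-visited-members', `my-record-visit' and `my-extract-run-hooks' are hypothetical, and this is not what archive-mode or tar-mode currently do.

```elisp
;; Hypothetical record of buffers in which the hook ran (demo only).
(defvar my-visited-members nil)

(defun my-record-visit ()
  "Stand-in for a feature such as saveplace that uses `find-file-hook'."
  (push (buffer-name) my-visited-members))

(defun my-extract-run-hooks (buffer)
  "Run `find-file-hook' in BUFFER, as happens after a normal visit.
Assumption: BUFFER already holds the extracted member's text."
  (with-current-buffer buffer
    (run-hooks 'find-file-hook)))

;; Usage: pretend a temporary buffer is the extracted member.
(let ((find-file-hook '(my-record-visit)))
  (with-temp-buffer
    (my-extract-run-hooks (current-buffer))))
```

Whether every function on the hook copes with a buffer whose file name is not a real file is exactly the open question.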
--
Juri Linkov
http://www.jurta.org/emacs/
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-10-24 14:11 ` Stefan Monnier
@ 2005-10-25 1:03 ` Kenichi Handa
0 siblings, 0 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-10-25 1:03 UTC (permalink / raw)
Cc: user42, emacs-devel
In article <87k6g3atv0.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
>>>> especially when we can fix the problem by the other method (as in my
>>>> previous mail).
> Passing the full file name seems like the right thing to do, indeed.
What do you mean by "full file name"? Something like
"/home/handa/x.tar!vi.po" which my change passes?
>> It seems that this additional patch will fix it. But I
>> admit that this change is rather tricky compared with
>> the previous ones. :-(
> I don't see what's the trick. So I guess it deserves a comment.
It temporarily binds buffer-file-name to
"/home/handa/vi.po.gz!/home/handa/vi.po" and passes that
name so that a function can find the current buffer by
get-file-buffer. I think such a way of using
buffer-file-name is rather tricky. Anyway, yes, I'll add
more comments if my change is approved.
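A minimal sketch of the trick just described, not the actual jka-compr change: while `buffer-file-name' is let-bound to a pseudo name, `get-file-buffer' called with that same name returns the current buffer. The helper name `my-pseudo-name-finds-buffer-p' and the path are made up for the example.

```elisp
(defun my-pseudo-name-finds-buffer-p (pseudo-name)
  "Return non-nil if binding `buffer-file-name' to PSEUDO-NAME lets
`get-file-buffer' find the current buffer."
  ;; `get-file-buffer' expands its argument, so bind the expanded form.
  (let ((buffer-file-name (expand-file-name pseudo-name)))
    (eq (get-file-buffer pseudo-name) (current-buffer))))

;; Usage: the temporary buffer visits no real file at all.
(with-temp-buffer
  (my-pseudo-name-finds-buffer-p "/home/handa/x.tar!vi.po"))
```

The binding is undone when the `let' exits, so the buffer is only findable under that name for the duration of the call.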
---
Kenichi Handa
handa@m17n.org
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-10-24 23:35 ` Juri Linkov
@ 2005-10-25 6:42 ` Kenichi Handa
2005-10-25 20:27 ` Richard M. Stallman
1 sibling, 0 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-10-25 6:42 UTC (permalink / raw)
Cc: user42, emacs-devel
In article <8764rmqyhx.fsf@jurta.org>, Juri Linkov <juri@jurta.org> writes:
> Please also pay attention to the related problem: .tar and .zip don't
> support some Emacs features. For example, after (require 'generic-x)
> visiting vi.po or vi.po.gz sets Default-Generic mode, but visiting the
> vi.po file from x.tar or x.zip doesn't set this mode. Or after
> (progn (setq-default save-place t) (require 'saveplace)) visiting
> vi.po or vi.po.gz remembers point positions, but visiting them
> from x.tar or x.zip doesn't.
> These features work by putting the hook in `find-file-hook'.
> Calling this hook on a visited archive file seems to work for these
> features, but is it the right way to do it?
I have no idea. As I've never used those modes, to work on
it I would have to start by investigating how they work in the
normal case. I'd like to ask someone else who already knows
that code to work on it. At least, it seems that it
doesn't require "mule" knowledge.
---
Kenichi Handa
handa@m17n.org
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-10-24 2:05 ` Kenichi Handa
@ 2005-10-25 15:59 ` Richard M. Stallman
2005-11-02 10:27 ` Richard Stallman
2005-11-10 2:09 ` Richard Stallman
2 siblings, 0 replies; 42+ messages in thread
From: Richard M. Stallman @ 2005-10-25 15:59 UTC (permalink / raw)
Cc: user42, emacs-devel
The current problem is that on extracting vi.po in tar-mode,
a function registered in file-coding-system-alist is called
with a filename "vi.po". So, that function can't check the
contents of that file because it can't read the contents by
insert-file-contents.
I see.
The correct operation in a handler for insert-file-contents
will be to find a buffer pretending to visit the file, and
insert that buffer contents. And, for that, we have to give
buffer-file-name (e.g. "/home/handa/x.tgz!vi.po") not the
filename itself (e.g. vi.po) to
find-operation-coding-system. I think such a change is safe
because, at least, all current entries in
file-coding-system-alist check only the tail of a filename.
If this means passing "/home/handa/x.tgz!vi.po" as the second argument
to find-operation-coding-system, I think it is not clean. That
argument is documented to be _the same_ in meaning as the argument you
would pass to OPERATION, but this string isn't the same, isn't a valid
argument to insert-file-contents.
If we want to modify the calling convention of
find-operation-coding-system, we should do it in a cleaner way: by
binding a global variable. That variable would be nil, normally.
I think the variable's value should be a buffer name.
If the variable is non-nil, it means "the file's contents
are already in this buffer". And all the functions that handle
find-operation-coding-system for various file types should check
the variable.
I think that is a fairly clean solution.
What do you think?
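A rough sketch of the global-variable convention proposed above. Both `my-file-contents-buffer' and `my-leading-text' are hypothetical names invented for this illustration; no such variable exists in Emacs.

```elisp
(defvar my-file-contents-buffer nil
  "Hypothetical: name of a buffer already holding the file's contents.
Normally nil, meaning the contents must be read from the file itself.")

(defun my-leading-text (filename nchars)
  "Return up to NCHARS characters from the start of FILENAME's contents.
If `my-file-contents-buffer' is non-nil, take them from that buffer
instead of reading FILENAME."
  (if my-file-contents-buffer
      (with-current-buffer my-file-contents-buffer
        (save-restriction
          (widen)
          (buffer-substring-no-properties
           (point-min) (min (point-max) (+ (point-min) nchars)))))
    (with-temp-buffer
      (insert-file-contents-literally filename nil 0 nchars)
      (buffer-string))))

;; Usage: the caller binds the variable around the lookup, so
;; "vi.po" need not exist on disk.
(with-temp-buffer
  (insert "msgid \"\"\n")
  (let ((my-file-contents-buffer (buffer-name)))
    (my-leading-text "vi.po" 5)))
```

Under this convention FILENAME keeps its documented meaning, and only functions that consult the variable see the buffer.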
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-10-24 23:35 ` Juri Linkov
2005-10-25 6:42 ` Kenichi Handa
@ 2005-10-25 20:27 ` Richard M. Stallman
1 sibling, 0 replies; 42+ messages in thread
From: Richard M. Stallman @ 2005-10-25 20:27 UTC (permalink / raw)
Cc: emacs-devel, user42, handa
These features work by putting the hook in `find-file-hook'.
Calling this hook on a visited archive file seems to work for these
features, but is it the right way to do it?
In theory it sounds right. It will probably fix more bugs than it
causes. So let's make that change.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-10-24 2:05 ` Kenichi Handa
2005-10-25 15:59 ` Richard M. Stallman
@ 2005-11-02 10:27 ` Richard Stallman
2005-11-10 2:09 ` Richard Stallman
2 siblings, 0 replies; 42+ messages in thread
From: Richard Stallman @ 2005-11-02 10:27 UTC (permalink / raw)
Cc: user42, emacs-devel
[I sent this message a week ago but did not get a response.
Could we get the discussion moving again?]
The current problem is that on extracting vi.po in tar-mode,
a function registered in file-coding-system-alist is called
with a filename "vi.po". So, that function can't check the
contents of that file because it can't read the contents by
insert-file-contents.
I see.
The correct operation in a handler for insert-file-contents
will be to find a buffer pretending to visit the file, and
insert that buffer contents. And, for that, we have to give
buffer-file-name (e.g. "/home/handa/x.tgz!vi.po") not the
filename itself (e.g. vi.po) to
find-operation-coding-system. I think such a change is safe
because, at least, all current entries in
file-coding-system-alist check only the tail of a filename.
If this means passing "/home/handa/x.tgz!vi.po" as the second argument
to find-operation-coding-system, I think it is not clean. That
argument is documented to be _the same_ in meaning as the argument you
would pass to OPERATION, but this string isn't the same, isn't a valid
argument to insert-file-contents.
If we want to modify the calling convention of
find-operation-coding-system, we should do it in a cleaner way: by
binding a global variable. That variable would be nil, normally.
I think the variable's value should be a buffer name.
If the variable is non-nil, it means "the file's contents
are already in this buffer". And all the functions that handle
find-operation-coding-system for various file types should check
the variable.
I think that is a fairly clean solution.
What do you think?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-10-24 2:05 ` Kenichi Handa
2005-10-25 15:59 ` Richard M. Stallman
2005-11-02 10:27 ` Richard Stallman
@ 2005-11-10 2:09 ` Richard Stallman
2005-11-10 3:49 ` Stefan Monnier
2 siblings, 1 reply; 42+ messages in thread
From: Richard Stallman @ 2005-11-10 2:09 UTC (permalink / raw)
Cc: user42, emacs-devel
[I sent this message twice but did not get a response.
Could we get the discussion moving again?]
The current problem is that on extracting vi.po in tar-mode,
a function registered in file-coding-system-alist is called
with a filename "vi.po". So, that function can't check the
contents of that file because it can't read the contents by
insert-file-contents.
I see.
The correct operation in a handler for insert-file-contents
will be to find a buffer pretending to visit the file, and
insert that buffer contents. And, for that, we have to give
buffer-file-name (e.g. "/home/handa/x.tgz!vi.po") not the
filename itself (e.g. vi.po) to
find-operation-coding-system. I think such a change is safe
because, at least, all current entries in
file-coding-system-alist check only the tail of a filename.
If this means passing "/home/handa/x.tgz!vi.po" as the second argument
to find-operation-coding-system, I think it is not clean. That
argument is documented to be _the same_ in meaning as the argument you
would pass to OPERATION, but this string isn't the same, isn't a valid
argument to insert-file-contents.
If we want to modify the calling convention of
find-operation-coding-system, we should do it in a cleaner way: by
binding a global variable. That variable would be nil, normally.
I think the variable's value should be a buffer name.
If the variable is non-nil, it means "the file's contents
are already in this buffer". And all the functions that handle
find-operation-coding-system for various file types should check
the variable.
I think that is a fairly clean solution.
What do you think?
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-10 2:09 ` Richard Stallman
@ 2005-11-10 3:49 ` Stefan Monnier
2005-11-10 17:49 ` Richard M. Stallman
0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2005-11-10 3:49 UTC (permalink / raw)
Cc: emacs-devel, user42, Kenichi Handa
> If this means passing "/home/handa/x.tgz!vi.po" as the second argument
> to find-operation-coding-system, I think it is not clean. That
> argument is documented to be _the same_ in meaning as the argument you
> would pass to OPERATION, but this string isn't the same, isn't a valid
> argument to insert-file-contents.
IIUC currently the argument passed is "vi.po" but that is not the argument
passed to operation either. So passing "/home/handa/x.tgz!vi.po" instead
won't make it any worse. As a matter of fact I think passing
"/home/handa/x.tgz!vi.po" is more correct: if we assume that
a suitable file-name-handler is installed it's perfectly correct.
Stefan
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-10 3:49 ` Stefan Monnier
@ 2005-11-10 17:49 ` Richard M. Stallman
2005-11-10 18:33 ` Stefan Monnier
0 siblings, 1 reply; 42+ messages in thread
From: Richard M. Stallman @ 2005-11-10 17:49 UTC (permalink / raw)
Cc: emacs-devel, user42, handa
IIUC currently the argument passed is "vi.po" but that is not the argument
passed to operation either.
Perhaps we should fix that.
As a matter of fact I think passing
"/home/handa/x.tgz!vi.po" is more correct: if we assume that
a suitable file-name-handler is installed it's perfectly correct.
But I don't think we should install such a handler, because it would
be an incompatible change in file name syntax. And if we don't
install such a handler, the argument isn't correct at all.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-10 17:49 ` Richard M. Stallman
@ 2005-11-10 18:33 ` Stefan Monnier
2005-11-11 7:42 ` Richard M. Stallman
0 siblings, 1 reply; 42+ messages in thread
From: Stefan Monnier @ 2005-11-10 18:33 UTC (permalink / raw)
Cc: emacs-devel, user42, handa
> IIUC currently the argument passed is "vi.po" but that is not the argument
> passed to operation either.
> Perhaps we should fix that.
> As a matter of fact I think passing
> "/home/handa/x.tgz!vi.po" is more correct: if we assume that
> a suitable file-name-handler is installed it's perfectly correct.
> But I don't think we should install such a handler, because it would
> be an incompatible change in file name syntax.
Then maybe another syntax should be used, but currently that's the syntax
chosen: this is the value used for buffer-file-name. In any case even if
you don't like this syntax, it's not "incompatible" since all it takes is to
make sure the file-name-handler makes expand-file-name,
file-name-nondirectory, and friends work correctly.
Myself, I don't care much about which particular syntax is used but
simply that "/full/pseudoname/to/vi.po" is no worse than "vi.po".
> And if we don't install such a handler, the argument isn't correct at all.
But it currently isn't correct either. We're just passing more info.
Stefan
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-10 18:33 ` Stefan Monnier
@ 2005-11-11 7:42 ` Richard M. Stallman
2005-11-18 13:08 ` Kenichi Handa
0 siblings, 1 reply; 42+ messages in thread
From: Richard M. Stallman @ 2005-11-11 7:42 UTC (permalink / raw)
Cc: emacs-devel, user42, handa
> As a matter of fact I think passing
> "/home/handa/x.tgz!vi.po" is more correct: if we assume that
> a suitable file-name-handler is installed it's perfectly correct.
> But I don't think we should install such a handler, because it would
> be an incompatible change in file name syntax.
Then maybe another syntax should be used, but currently that's the syntax
chosen: this is the value used for buffer-file-name.
Does that file name get used for anything except to appear in the C-x
C-b listing and be helpful for the user? I think it does not.
The previous message said:
The correct operation in a handler for insert-file-contents
will be to find a buffer pretending to visit the file, and
insert that buffer contents. And, for that, we have to give
buffer-file-name (e.g. "/home/handa/x.tgz!vi.po") not the
filename itself (e.g. vi.po) to
find-operation-coding-system.
Can you explain the context of this? Where is insert-file-contents
being called from, and with what file name argument?
How does it relate to this issue?
If it is simply a matter to call find-operation-coding-system here,
in tar-extract, then I agree it is ok to pass buffer-file-name.
    ;; We need to mimic the parts of insert-file-contents
    ;; which determine the coding-system and decode the text.
    (let ((coding
           (or coding-system-for-read
               (and set-auto-coding-function
                    (save-excursion
                      (funcall set-auto-coding-function
                               name (- (point-max) (point)))))
               (car (find-operation-coding-system
                     'insert-file-contents name t))))
But since that isn't a true valid file name, it needs a comment
to explain why calling find-operation-coding-system in a way
contrary to its rules nonetheless gives correct results.
And maybe we need to document this also in the place that defines
find-operation-coding-system and the functions it calls, so that people
will not change them in a way that breaks this code.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-11 7:42 ` Richard M. Stallman
@ 2005-11-18 13:08 ` Kenichi Handa
2005-11-18 17:21 ` Stefan Monnier
2005-11-19 23:27 ` Richard M. Stallman
0 siblings, 2 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-11-18 13:08 UTC (permalink / raw)
Cc: user42, monnier, emacs-devel
Sorry for the late response on this matter. I was just back
from Hanoi; Vietnamese foods were very good. :-)
In article <E1EaTYC-0001cx-Rn@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
>> As a matter of fact I think passing
>> "/home/handa/x.tgz!vi.po" is more correct: if we assume that
>> a suitable file-name-handler is installed it's perfectly correct.
>> But I don't think we should install such a handler, because it would
>> be an incompatible change in file name syntax.
> Then maybe another syntax should be used, but currently that's the syntax
> chosen: this is the value used for buffer-file-name.
> Does that file name get used for anything except to appear in the C-x
> C-b listing and be helpful for the user? I think it does not.
I don't know exactly which command uses it, but it is used
by any operations that call get-file-buffer.
> The previous message said:
> The correct operation in a handler for insert-file-contents
> will be to find a buffer pretending to visit the file, and
> insert that buffer contents. And, for that, we have to give
> buffer-file-name (e.g. "/home/handa/x.tgz!vi.po") not the
> filename itself (e.g. vi.po) to
> find-operation-coding-system.
> Can you explain the context of this? Where is insert-file-contents
> being called from, and with what file name argument?
> How does it relate to this issue?
Here the "handler" means a function registered in
file-coding-system-alist (it's po-find-file-coding-system in
the current case).
> If it is simply a matter to call find-operation-coding-system here,
> in tar-extract, then I agree it is ok to pass buffer-file-name.
Yes, that is what the change I proposed for an archived file
does.
And, for a compressed file, I proposed this change.
*** jka-compr.el 08 Aug 2005 10:13:24 +0900 1.87
--- jka-compr.el 24 Oct 2005 11:38:27 +0900
***************
*** 500,509 ****
(delete-file local-copy)))
(unless notfound
(decode-coding-inserted-region
(point) (+ (point) size)
! (jka-compr-byte-compiler-base-file-name file)
! visit beg end replace))
(and
visit
--- 500,513 ----
(delete-file local-copy)))
(unless notfound
+ (let ((buffer-file-name
+ (concat file "!"
+ (jka-compr-byte-compiler-base-file-name file))))
+
(decode-coding-inserted-region
(point) (+ (point) size)
! buffer-file-name
! visit beg end replace)))
(and
visit
As you see, this binds buffer-file-name temporarily to
something like /home/handa/temp.po.gz!/home/handa/temp.po so
that find-operation-coding-system (called in
decode-coding-inserted-region) can surely find
po-find-file-coding-system to be called, and it can surely
find the current buffer by get-file-buffer.
Do you agree with this change too (of course provided that
more comments are added)?
---
Kenichi Handa
handa@m17n.org
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-18 13:08 ` Kenichi Handa
@ 2005-11-18 17:21 ` Stefan Monnier
2005-11-19 0:30 ` Kenichi Handa
2005-11-20 1:16 ` Juri Linkov
2005-11-19 23:27 ` Richard M. Stallman
1 sibling, 2 replies; 42+ messages in thread
From: Stefan Monnier @ 2005-11-18 17:21 UTC (permalink / raw)
Cc: user42, rms, emacs-devel
> As you see, this binds buffer-file-name temporarily to
> something like /home/handa/temp.po.gz!/home/handa/temp.po so
> that find-operation-coding-system (called in
> decode-coding-inserted-region) can surely find
> po-find-file-coding-system to be called, and it can surely
> find the current buffer by get-file-buffer.
Hmm... that's not what I had understood. The above looks fishy.
Why not pass /home/handa/temp.po to find-operation-coding-system (just as
it is used for auto-mode-alist)?
Stefan
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-18 17:21 ` Stefan Monnier
@ 2005-11-19 0:30 ` Kenichi Handa
2005-11-20 1:16 ` Juri Linkov
1 sibling, 0 replies; 42+ messages in thread
From: Kenichi Handa @ 2005-11-19 0:30 UTC (permalink / raw)
Cc: user42, rms, emacs-devel
In article <87sltt98l6.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> As you see, this binds buffer-file-name temporarily to
>> something like /home/handa/temp.po.gz!/home/handa/temp.po so
>> that find-operation-coding-system (called in
>> decode-coding-inserted-region) can surely find
>> po-find-file-coding-system to be called, and it can surely
>> find the current buffer by get-file-buffer.
> Hmm... that's not what I had understood. The above looks fishy.
I agree, so I wrote it's more tricky.
> Why not pass /home/handa/temp.po to find-operation-coding-system (just as
> it is used for auto-mode-alist)?
Very unlikely but a user may be visiting that file in a
different buffer.
---
Kenichi Handa
handa@m17n.org
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-18 13:08 ` Kenichi Handa
2005-11-18 17:21 ` Stefan Monnier
@ 2005-11-19 23:27 ` Richard M. Stallman
2005-11-20 12:05 ` Kenichi Handa
1 sibling, 1 reply; 42+ messages in thread
From: Richard M. Stallman @ 2005-11-19 23:27 UTC (permalink / raw)
Cc: user42, monnier, emacs-devel
I don't know exactly which command uses it, but it is used
by any operations that call get-file-buffer.
Most callers of get-file-buffer pass a real file name. So unless
someone uses a file whose name resembles that of a tar subfile,
get-file-buffer will never return one of these buffers, and I think
that is the correct result.
Thus, looking at my question:
> Does that file name get used for anything except to appear in the C-x
> C-b listing and be helpful for the user? I think it does not.
I think I was right--it is not used for anything else, or at least,
not unless a problem is occurring.
Therefore, we do not want to install any file name handler for this syntax.
> If it is simply a matter to call find-operation-coding-system here,
> in tar-extract, then I agree it is ok to pass buffer-file-name.
Yes, that is what the change I proposed for an archived file
does.
Ok, please make that change.
+ (let ((buffer-file-name
+ (concat file "!"
+ (jka-compr-byte-compiler-base-file-name file))))
+
Binding buffer-file-name is rather unclean.
And I don't see a reason to do it.
so that find-operation-coding-system (called in
decode-coding-inserted-region) can surely find
po-find-file-coding-system to be called, and it can surely
find the current buffer by get-file-buffer.
decode-coding-inserted-region passes its FILENAME arg to
find-operation-coding-system. So all you need to do is to pass
this funny file name to decode-coding-inserted-region.
There is no need to bind buffer-file-name.
Why are you concerned about whether get-file-buffer can be used with
this funny file name?
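As a small illustration of the point about FILENAME, and of the earlier remark that entries in file-coding-system-alist look only at the tail of the name: a pseudo name such as "/home/handa/x.tar!vi.po" still matches a `.po' entry when handed straight to find-operation-coding-system. The one-entry alist and the helper name `my-coding-for-pseudo-name' are assumptions made for this demo.

```elisp
(defun my-coding-for-pseudo-name (name)
  "Return what `find-operation-coding-system' yields for NAME.
The one-entry `file-coding-system-alist' is an assumption for the demo."
  (let ((file-coding-system-alist '(("\\.po\\'" . utf-8))))
    (find-operation-coding-system 'insert-file-contents name)))

;; Usage: no buffer or file with this name needs to exist, because the
;; entry maps to a plain coding system rather than to a function.
(my-coding-for-pseudo-name "/home/handa/x.tar!vi.po")
```

An entry whose value is a function, such as po-find-file-coding-system, is where the question of locating the contents arises.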
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-18 17:21 ` Stefan Monnier
2005-11-19 0:30 ` Kenichi Handa
@ 2005-11-20 1:16 ` Juri Linkov
2005-11-29 19:13 ` Kevin Rodgers
1 sibling, 1 reply; 42+ messages in thread
From: Juri Linkov @ 2005-11-20 1:16 UTC (permalink / raw)
Cc: emacs-devel, user42, rms, handa
>> As you see, this binds buffer-file-name temporarily to
>> something like /home/handa/temp.po.gz!/home/handa/temp.po so
>> that find-operation-coding-system (called in
>> decode-coding-inserted-region) can surely find
>> po-find-file-coding-system to be called, and it can surely
>> find the current buffer by get-file-buffer.
>
> Hmm... that's not what I had understood. The above looks fishy.
> Why not pass /home/handa/temp.po to find-operation-coding-system (just as
> it is used for auto-mode-alist)?
Wouldn't it be better to use the same syntax for members of
all supported archivers (gzip, zip, tar)?
Currently `buffer-file-name' of the visited archive member
uses three different syntaxes:
"/home/handa/temp.po.gz"
"/home/handa/temp.zip:vi.po"
"/home/handa/temp.tar!vi.po"
After choosing one syntax, `buffer-file-name' would be:
"/home/handa/temp.po.gz!temp.po"
"/home/handa/temp.zip!vi.po"
"/home/handa/temp.tar!vi.po"
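A sketch of what a single "ARCHIVE!MEMBER" convention could look like in code. The helpers `my-archive-member-name' and `my-archive-split-name' are hypothetical and exist nowhere in Emacs; they only show that one syntax can be built and taken apart the same way for every archiver.

```elisp
(defun my-archive-member-name (archive member)
  "Return the pseudo file name for MEMBER inside ARCHIVE."
  (concat archive "!" member))

(defun my-archive-split-name (name)
  "Split pseudo file NAME into (ARCHIVE . MEMBER), or return nil.
The split is made at the last \"!\" in NAME."
  (when (string-match "\\`\\(.*\\)!\\([^!]*\\)\\'" name)
    (cons (match-string 1 name) (match-string 2 name))))

;; Usage: the same call serves gzip, zip and tar alike.
(my-archive-split-name
 (my-archive-member-name "/home/handa/temp.zip" "vi.po"))
```

A real file whose name happens to contain "!" would be ambiguous under this scheme, which any chosen syntax has to deal with.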
--
Juri Linkov
http://www.jurta.org/emacs/
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-19 23:27 ` Richard M. Stallman
@ 2005-11-20 12:05 ` Kenichi Handa
2005-12-28 17:01 ` Richard M. Stallman
0 siblings, 1 reply; 42+ messages in thread
From: Kenichi Handa @ 2005-11-20 12:05 UTC (permalink / raw)
Cc: user42, monnier, emacs-devel
In article <E1Edc6s-0002BI-Ax@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
> Binding buffer-file-name is rather unclean.
> And I don't see a reason to do it.
> so that find-operation-coding-system (called in
> decode-coding-inserted-region) can surely find
> po-find-file-coding-system to be called, and it can surely
> find the current buffer by get-file-buffer.
> decode-coding-inserted-region passes its FILENAME arg to
> find-operation-coding-system. So all you need to do is to pass
> this funny file name to decode-coding-inserted-region.
> There is no need to bind buffer-file-name.
> Why are you concerned about whether get-file-buffer can be used with
> this funny file name?
A function registered in find-operation-coding-system has
to find which buffer is pretending to visit FILENAME if
FILENAME doesn't exist. And, get-file-buffer is the only
(or at least the very natural) way for that.
---
Kenichi Handa
handa@m17n.org
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-20 1:16 ` Juri Linkov
@ 2005-11-29 19:13 ` Kevin Rodgers
2005-11-30 2:45 ` Juri Linkov
2005-11-30 19:01 ` Richard M. Stallman
0 siblings, 2 replies; 42+ messages in thread
From: Kevin Rodgers @ 2005-11-29 19:13 UTC (permalink / raw)
Juri Linkov wrote:
> Wouldn't it be better to use the same syntax for members of
> all supported archivers (gzip, zip, tar)?
>
> Currently `buffer-file-name' of the visited archive member
> uses three different syntaxes:
>
> "/home/handa/temp.po.gz"
> "/home/handa/temp.zip:vi.po"
> "/home/handa/temp.tar!vi.po"
>
> After choosing one syntax, `buffer-file-name' would be:
>
> "/home/handa/temp.po.gz!temp.po"
> "/home/handa/temp.zip!vi.po"
> "/home/handa/temp.tar!vi.po"
Or how about a URL-compatible syntax:
arc://home/handa/temp.po.gz#temp.po
arc://home/handa/temp.zip#vi.po
arc://home/handa/temp.tar#vi.po
--
Kevin Rodgers
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-29 19:13 ` Kevin Rodgers
@ 2005-11-30 2:45 ` Juri Linkov
2005-11-30 19:01 ` Richard M. Stallman
1 sibling, 0 replies; 42+ messages in thread
From: Juri Linkov @ 2005-11-30 2:45 UTC (permalink / raw)
Cc: emacs-devel
>> Wouldn't it be better to use the same syntax for members of
>> all supported archivers (gzip, zip, tar)?
>> Currently `buffer-file-name' of the visited archive member
>> uses three different syntaxes:
>> "/home/handa/temp.po.gz"
>> "/home/handa/temp.zip:vi.po"
>> "/home/handa/temp.tar!vi.po"
>> After choosing one syntax, `buffer-file-name' would be:
>> "/home/handa/temp.po.gz!temp.po"
>> "/home/handa/temp.zip!vi.po"
>> "/home/handa/temp.tar!vi.po"
>
> Or how about a URL-compatible syntax:
>
> arc://home/handa/temp.po.gz#temp.po
> arc://home/handa/temp.zip#vi.po
> arc://home/handa/temp.tar#vi.po
There is no such standard URI scheme as `arc'. But perhaps `#' in
`file' is appropriate to address archive members:
file://home/handa/temp.po.gz#temp.po
file://home/handa/temp.zip#vi.po
file://home/handa/temp.tar#vi.po
--
Juri Linkov
http://www.jurta.org/emacs/
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-29 19:13 ` Kevin Rodgers
2005-11-30 2:45 ` Juri Linkov
@ 2005-11-30 19:01 ` Richard M. Stallman
1 sibling, 0 replies; 42+ messages in thread
From: Richard M. Stallman @ 2005-11-30 19:01 UTC (permalink / raw)
Cc: emacs-devel
Or how about a URL-compatible syntax:
arc://home/handa/temp.po.gz#temp.po
arc://home/handa/temp.zip#vi.po
arc://home/handa/temp.tar#vi.po
URLs are not file names. No thanks.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-11-20 12:05 ` Kenichi Handa
@ 2005-12-28 17:01 ` Richard M. Stallman
2005-12-29 11:47 ` Kenichi Handa
0 siblings, 1 reply; 42+ messages in thread
From: Richard M. Stallman @ 2005-12-28 17:01 UTC (permalink / raw)
Cc: user42, monnier, emacs-devel
Please forgive the delay in my response.
> Binding buffer-file-name is rather unclean.
> And I don't see a reason to do it.
> so that find-operation-coding-system (called in
> decode-coding-inserted-region) can surely find
> po-find-file-coding-system to be called, and it can surely
> find the current buffer by get-file-buffer.
> decode-coding-inserted-region passes its FILENAME arg to
> find-operation-coding-system. So all you need to do is to pass
> this funny file name to decode-coding-inserted-region.
> There is no need to bind buffer-file-name.
> Why are you concerned about whether get-file-buffer can be used with
> this funny file name?
A function registered in find-operation-coding-system has
to find which buffer is pretending to visit FILENAME if
FILENAME doesn't exist. And, get-file-buffer is the only
(or at least the very natural) way for that.
Could you give me an example or two?
Which filename handler function does this?
Looking at an example, I can understand the issue.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: po file charset via auto-coding-functions
2005-12-28 17:01 ` Richard M. Stallman
@ 2005-12-29 11:47 ` Kenichi Handa
2005-12-30 2:18 ` Richard M. Stallman
0 siblings, 1 reply; 42+ messages in thread
From: Kenichi Handa @ 2005-12-29 11:47 UTC (permalink / raw)
Cc: user42, monnier, emacs-devel
In article <E1Ereg8-0000Fq-AH@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
> Please forgive the delay in my response.
>> Binding buffer-file-name is rather unclean.
>> And I don't see a reason to do it.
>> so that find-operation-coding-system (called in
>> decode-coding-inserted-region) can surely find
>> po-find-file-coding-system to be called, and it can surely
>> find the current buffer by get-file-buffer.
>> decode-coding-inserted-region passes its FILENAME arg to
>> find-operation-coding-system. So all you need to do is to pass
>> this funny file name to decode-coding-inserted-region.
>> There is no need to bind buffer-file-name.
>> Why are you concerned about whether get-file-buffer can be used with
>> this funny file name?
> A function registered in find-operation-coding-system has
> to find which buffer is pretending to visit FILENAME if
> FILENAME doesn't exist. And, get-file-buffer is the only
> (or at least the very natural) way for that.
> Could you give me an example or two?
> Which filename handler function does this?
> Looking at an example, I can understand the issue.
Attached is the patch for po.el I posted. This change
utilizes get-file-buffer to check if there is a buffer
visiting (or pretending to visit) FILENAME.
---
Kenichi Handa
handa@m17n.org
*** po.el 08 Aug 2005 10:13:42 +0900 1.12
--- po.el 18 Nov 2005 21:08:50 +0900
***************
*** 44,55 ****
"Return PO charset value for FILENAME."
(let ((charset-regexp
"^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"")
(short-read nil))
;; Try the first 4096 bytes. In case we cannot find the charset value
;; within the first 4096 bytes (the PO file might start with a long
;; comment) try the next 4096 bytes repeatedly until we'll know for sure
;; we've checked the empty header entry entirely.
! (while (not (or short-read (re-search-forward "^msgid" nil t)))
(save-excursion
(goto-char (point-max))
(let ((pair (insert-file-contents-literally filename nil
--- 44,59 ----
"Return PO charset value for FILENAME."
(let ((charset-regexp
"^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\"")
+ (buf (get-file-buffer filename))
(short-read nil))
+ (when buf
+ (set-buffer buf)
+ (goto-char (point-min)))
;; Try the first 4096 bytes. In case we cannot find the charset value
;; within the first 4096 bytes (the PO file might start with a long
;; comment) try the next 4096 bytes repeatedly until we'll know for sure
;; we've checked the empty header entry entirely.
! (while (not (or short-read (re-search-forward "^msgid" nil t) buf))
(save-excursion
(goto-char (point-max))
(let ((pair (insert-file-contents-literally filename nil
***************
*** 57,63 ****
(1- (+ (point) 4096)))))
(setq short-read (< (nth 1 pair) 4096)))))
(cond ((re-search-forward charset-regexp nil t) (match-string 1))
! (short-read nil)
;; We've found the first msgid; maybe, only a part of the msgstr
;; value was loaded. Load the next 1024 bytes; if charset still
;; isn't available, give up.
--- 61,67 ----
(1- (+ (point) 4096)))))
(setq short-read (< (nth 1 pair) 4096)))))
(cond ((re-search-forward charset-regexp nil t) (match-string 1))
! ((or short-read buf) nil)
;; We've found the first msgid; maybe, only a part of the msgstr
;; value was loaded. Load the next 1024 bytes; if charset still
;; isn't available, give up.
***************
*** 74,80 ****
Do so according to FILENAME's declared charset."
(and
(eq operation 'insert-file-contents)
! (file-exists-p filename)
(with-temp-buffer
(let* ((coding-system-for-read 'no-conversion)
(charset (or (po-find-charset filename) "ascii"))
--- 78,84 ----
Do so according to FILENAME's declared charset."
(and
(eq operation 'insert-file-contents)
! (or (get-file-buffer filename) (file-exists-p filename))
(with-temp-buffer
(let* ((coding-system-for-read 'no-conversion)
(charset (or (po-find-charset filename) "ascii"))
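
For reference, a minimal sketch of what the charset regexp in the patch
extracts from a PO header. The helper my-demo-po-charset is hypothetical;
it works on a string rather than a file, so it only illustrates the
matching step and not the chunked reading done by po-find-charset:

```elisp
;; Hypothetical helper, for illustration only; not part of po.el.
(defun my-demo-po-charset (header-text)
  "Return the charset named in HEADER-TEXT, or nil if none is found.
HEADER-TEXT is the beginning of a PO file, given as a string."
  (with-temp-buffer
    (insert header-text)
    (goto-char (point-min))
    ;; Same regexp as in the patch: the header line ends with a
    ;; literal backslash and `n' before the closing double quote.
    (when (re-search-forward
           "^\"Content-Type:[ \t]*text/plain;[ \t]*charset=\\(.*\\)\\\\n\""
           nil t)
      (match-string 1))))

;; Sample header entry, made up for this example.
(defvar my-demo-charset
  (my-demo-po-charset
   (concat "msgid \"\"\n"
           "msgstr \"\"\n"
           "\"Content-Type: text/plain; charset=ISO-8859-1\\n\"\n")))
```

With the sample header above, my-demo-charset is the string "ISO-8859-1".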
* Re: po file charset via auto-coding-functions
2005-12-29 11:47 ` Kenichi Handa
@ 2005-12-30 2:18 ` Richard M. Stallman
2006-01-04 4:37 ` Kenichi Handa
0 siblings, 1 reply; 42+ messages in thread
From: Richard M. Stallman @ 2005-12-30 2:18 UTC (permalink / raw)
Cc: user42, monnier, emacs-devel
> Could you give me an example or two?
> Which filename handler function does this?
> Looking at an example, I can understand the issue.
We are failing to communicate. Your patch is a change in
po-find-charset. po-find-charset is not a filename handler function.
You wrote:
> A function registered in find-operation-coding-system has
> to find which buffer is pretending to visit FILENAME if
> FILENAME doesn't exist. And, get-file-buffer is the only
> (or at least the very natural) way for that.
What's the name of one function registered in find-operation-coding-system
which needs to do this?
Does that function call po-find-charset? If so, could you show me
the sequence of function calls that gets from there to po-find-charset?
* Re: po file charset via auto-coding-functions
2005-12-30 2:18 ` Richard M. Stallman
@ 2006-01-04 4:37 ` Kenichi Handa
0 siblings, 0 replies; 42+ messages in thread
From: Kenichi Handa @ 2006-01-04 4:37 UTC (permalink / raw)
Cc: user42, monnier, emacs-devel
In article <E1Es9qt-000650-Ko@fencepost.gnu.org>, "Richard M. Stallman" <rms@gnu.org> writes:
>> Could you give me an example or two?
>> Which filename handler function does this?
>> Looking at an example, I can understand the issue.
> We are failing to communicate. Your patch is a change in
> po-find-charset. po-find-charset is not a filename handler function.
The patch also contains a change in
po-find-file-coding-system-guts which is called from
po-find-file-coding-system. And po-find-file-coding-system
is a function registered in file-coding-system-alist and
thus is called from find-operation-coding-system. I thought
you meant such a function by "filename handler function" in
the current context.
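
A minimal sketch of that dispatch, assuming a hypothetical handler
my-demo-po-coding and a made-up ".demo-po" suffix (the real entry in
file-coding-system-alist is po-find-file-coding-system):

```elisp
;; Hypothetical stand-in for `po-find-file-coding-system'.
(defun my-demo-po-coding (arg-list)
  "Return a coding system for the operation described by ARG-LIST.
ARG-LIST is (OPERATION FILENAME ...), the arguments that were
given to `find-operation-coding-system'."
  (and (eq (car arg-list) 'insert-file-contents)
       ;; The real function inspects the PO header here.
       'utf-8))

;; When FILENAME matches a regexp in `file-coding-system-alist' and
;; the associated value is a function, that function is called.
(defvar my-demo-dispatch-result
  (let ((file-coding-system-alist
         '(("\\.demo-po\\'" . my-demo-po-coding))))
    (find-operation-coding-system
     'insert-file-contents "/nonexistent/fr.demo-po")))
```

The file does not have to exist for the lookup itself; only the name is
matched against the alist, which is why the handler has to decide on its
own where the contents come from.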
---
Kenichi Handa
handa@m17n.org
Thread overview: 42+ messages
2005-10-20 21:06 po file charset via auto-coding-functions Kevin Ryde
2005-10-21 2:18 ` Kenichi Handa
2005-10-21 22:46 ` Kevin Ryde
2005-10-22 1:43 ` Kenichi Handa
2005-10-22 2:01 ` Kevin Ryde
2005-10-22 2:39 ` Kenichi Handa
2005-10-22 2:50 ` Stefan Monnier
2005-10-22 22:44 ` Kevin Ryde
2005-10-24 1:39 ` Kenichi Handa
2005-10-22 15:51 ` Richard M. Stallman
2005-10-24 2:05 ` Kenichi Handa
2005-10-25 15:59 ` Richard M. Stallman
2005-11-02 10:27 ` Richard Stallman
2005-11-10 2:09 ` Richard Stallman
2005-11-10 3:49 ` Stefan Monnier
2005-11-10 17:49 ` Richard M. Stallman
2005-11-10 18:33 ` Stefan Monnier
2005-11-11 7:42 ` Richard M. Stallman
2005-11-18 13:08 ` Kenichi Handa
2005-11-18 17:21 ` Stefan Monnier
2005-11-19 0:30 ` Kenichi Handa
2005-11-20 1:16 ` Juri Linkov
2005-11-29 19:13 ` Kevin Rodgers
2005-11-30 2:45 ` Juri Linkov
2005-11-30 19:01 ` Richard M. Stallman
2005-11-19 23:27 ` Richard M. Stallman
2005-11-20 12:05 ` Kenichi Handa
2005-12-28 17:01 ` Richard M. Stallman
2005-12-29 11:47 ` Kenichi Handa
2005-12-30 2:18 ` Richard M. Stallman
2006-01-04 4:37 ` Kenichi Handa
2005-10-22 22:51 ` Kevin Ryde
2005-10-24 1:53 ` Kenichi Handa
2005-10-24 2:04 ` Kevin Ryde
2005-10-24 5:19 ` Kenichi Handa
2005-10-24 14:11 ` Stefan Monnier
2005-10-25 1:03 ` Kenichi Handa
2005-10-24 23:35 ` Juri Linkov
2005-10-25 6:42 ` Kenichi Handa
2005-10-25 20:27 ` Richard M. Stallman
2005-10-21 4:49 ` Richard M. Stallman
2005-10-21 21:07 ` Kevin Ryde