unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#5553: 23.1.92; Archives with wrong coding system
From: Juri Linkov @ 2010-02-09 21:19 UTC
  To: 5553

When `archive-mode' is enabled for an archive file with an unknown file
extension, using the rule ("\\(PK00\\)?[P]K\003\004" . archive-mode)
from `magic-fallback-mode-alist', visiting such a file fails with the
args-out-of-range error.

The following patch should fix this bug using the same regexp as in
`magic-fallback-mode-alist' and the same coding system as for archive
file extensions in `auto-coding-alist':

=== modified file 'lisp/international/mule.el'
--- lisp/international/mule.el	2010-02-01 22:57:45 +0000
+++ lisp/international/mule.el	2010-02-09 21:18:51 +0000
@@ -1653,7 +1653,9 @@ (defcustom auto-coding-regexp-alist
     ("\\`\xFE\xFF" . utf-16be-with-signature)
     ("\\`\xFF\xFE" . utf-16le-with-signature)
     ("\\`\xEF\xBB\xBF" . utf-8-with-signature)
-    ("\\`;ELC\024\0\0\0" . emacs-mule)))	; Emacs 20-compiled
+    ("\\`;ELC\024\0\0\0" . emacs-mule)	; Emacs 20-compiled
+    ;; For `archive-mode' in `magic-fallback-mode-alist':
+    ("\\(PK00\\)?[P]K\003\004" . no-conversion-multibyte)))
   "Alist of patterns vs corresponding coding systems.
 Each element looks like (REGEXP . CODING-SYSTEM).
 A file whose first bytes match REGEXP is decoded by CODING-SYSTEM on reading.

-- 
Juri Linkov
http://www.jurta.org/emacs/
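
A rough Python approximation of the header test discussed above, assuming Python's `re' on bytes behaves like the Emacs regexp for this particular pattern: the Emacs group `\\(...\\)' becomes `(...)', and the octal escapes `\003\004' become the bytes 0x03 and 0x04.  `looks_like_zip' is a hypothetical helper, not an Emacs function, and `re.match' (anchored at offset 0) only approximates a match at the beginning of the file.

```python
import re

# Python transcription of the Emacs regexp "\\(PK00\\)?[P]K\003\004".
ARCHIVE_MAGIC = re.compile(rb"(PK00)?[P]K\x03\x04")


def looks_like_zip(head: bytes) -> bool:
    """Return True if HEAD starts with the archive magic (optional PK00 prefix)."""
    # re.match only tries offset 0, so the magic must be at the very start.
    return ARCHIVE_MAGIC.match(head) is not None


print(looks_like_zip(b"PK\x03\x04\x14\x00"))  # True
print(looks_like_zip(b"PK00PK\x03\x04"))      # True
print(looks_like_zip(b"GIF89a"))              # False
```

The point of the patch is that the same bytes decide both the coding system and the mode, so a file matching this test is read with `no-conversion-multibyte' instead of a guessed text encoding.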


* bug#5553: 23.1.92; Archives with wrong coding system
From: Juri Linkov @ 2010-02-09 22:19 UTC
  To: 5553

> When `archive-mode' is enabled for an archive file with an unknown file
> extension, using the rule ("\\(PK00\\)?[P]K\003\004" . archive-mode)
> from `magic-fallback-mode-alist', visiting such a file fails with the
> args-out-of-range error.
>
> The following patch should fix this bug using the same regexp as in
> `magic-fallback-mode-alist' and the same coding system as for archive
> file extensions in `auto-coding-alist':

The same problem also exists for images.  `magic-fallback-mode-alist' contains:

  (image-type-auto-detected-p . image-mode)

but visiting an image file with a non-standard file extension
(i.e. not in `auto-mode-alist') doesn't display it as an image.

The following patch fixes this problem, but is duplicating the image
regexps from `image-type-header-regexps' too ugly?

=== modified file 'lisp/international/mule.el'
--- lisp/international/mule.el	2010-02-09 05:00:56 +0000
+++ lisp/international/mule.el	2010-02-09 22:16:28 +0000
@@ -1655,7 +1655,14 @@ (defcustom auto-coding-regexp-alist
     ("\\`\xEF\xBB\xBF" . utf-8-with-signature)
     ("\\`;ELC\024\0\0\0" . emacs-mule)	; Emacs 20-compiled
     ;; For `archive-mode' in `magic-fallback-mode-alist':
-    ("\\(PK00\\)?[P]K\003\004" . no-conversion-multibyte)))
+    ("\\(PK00\\)?[P]K\003\004" . no-conversion-multibyte)
+    ;; For `image-mode' in `magic-fallback-mode-alist'
+    ;; (regexps duplicated from `image-type-header-regexps'):
+    ("\\`GIF8[79]a"                 . no-conversion) ; gif
+    ("\\`\x89PNG\r\n\x1a\n"         . no-conversion) ; png
+    ("\\`\\(?:MM\0\\*\\|II\\*\0\\)" . no-conversion) ; tiff
+    ("\\`\xff\xd8"                  . no-conversion) ; jpeg
+    ))
   "Alist of patterns vs corresponding coding systems.
 Each element looks like (REGEXP . CODING-SYSTEM).
 A file whose first bytes match REGEXP is decoded by CODING-SYSTEM on reading.

-- 
Juri Linkov
http://www.jurta.org/emacs/
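
For illustration, the lookup performed by the four image entries in the patch above, sketched in Python rather than Emacs Lisp.  `IMAGE_CODING_RULES' and `image_coding' are hypothetical names; `\\`' (beginning of buffer) is transcribed as `\A', `\0' as `\x00', and the real Emacs lookup code is not reproduced here.

```python
import re

# The four header regexps from the patch, as Python byte patterns.
IMAGE_CODING_RULES = [
    (re.compile(rb"\AGIF8[79]a"), "gif"),
    (re.compile(rb"\A\x89PNG\r\n\x1a\n"), "png"),
    (re.compile(rb"\A(?:MM\x00\*|II\*\x00)"), "tiff"),
    (re.compile(rb"\A\xff\xd8"), "jpeg"),
]


def image_coding(head: bytes):
    """Return (image_type, "no-conversion") for a known header, else None."""
    for pattern, image_type in IMAGE_CODING_RULES:
        if pattern.search(head):  # \A keeps the search anchored at offset 0
            return image_type, "no-conversion"
    return None


print(image_coding(b"II*\x00\x08\x00"))  # ('tiff', 'no-conversion')
print(image_coding(b"hello"))            # None
```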


* bug#5553: 23.1.92; Archives with wrong coding system
From: Eli Zaretskii @ 2010-02-09 22:34 UTC
  To: Juri Linkov; +Cc: 5553

> From: Juri Linkov <juri@jurta.org>
> Date: Tue, 09 Feb 2010 23:19:27 +0200
> Cc: 
> 
> When `archive-mode' is enabled for an archive file with an unknown file
> extension, using the rule ("\\(PK00\\)?[P]K\003\004" . archive-mode)
> from `magic-fallback-mode-alist', visiting such a file fails with the
> args-out-of-range error.
> 
> The following patch should fix this bug using the same regexp as in
> `magic-fallback-mode-alist' and the same coding system as for archive
> file extensions in `auto-coding-alist':

Thanks, but please provide a self-contained recipe for reproducing the
problem, starting with "emacs -Q".


* bug#5553: 23.1.92; Archives with wrong coding system
From: Juri Linkov @ 2010-02-10  0:09 UTC
  To: Eli Zaretskii; +Cc: 5553

> Thanks, but please provide a self-contained recipe for reproducing the
> problem, starting with "emacs -Q".

AFAICS, it is not reproducible with "emacs -Q", where archives and
images with non-standard file extensions are visited in the proper modes.

The problem appears with using Unicad (http://code.google.com/p/unicad/).
Basically what it does boils down to the following line:

  (add-to-list 'auto-coding-functions 'unicad-universal-charset-detect)

The rest is just statistical guessing of the coding system based solely
on the content of the file.  For archives and images the guess is wrong,
so after decoding, `magic-fallback-mode-alist' fails to match a mode
regexp at the beginning of the buffer.

So the question is whether we should complement entries in
`magic-fallback-mode-alist' with the corresponding entries in
`auto-coding-regexp-alist' with the same regexps (like we complement
entries in `auto-mode-alist' with entries in `auto-coding-alist')?

Or every function in `auto-coding-functions' that determines a coding system
should somehow take care of exceptions in `magic-fallback-mode-alist'?

-- 
Juri Linkov
http://www.jurta.org/emacs/


* bug#5553: 23.1.92; Archives with wrong coding system
From: Stefan Monnier @ 2010-02-10 20:14 UTC
  To: Juri Linkov; +Cc: 5553

> So the question is whether we should complement entries in
> `magic-fallback-mode-alist' with the corresponding entries in
> `auto-coding-regexp-alist' with the same regexps (like we complement
> entries in `auto-mode-alist' with entries in `auto-coding-alist')?

> Or every function in `auto-coding-functions' that determines a coding system
> should somehow take care of exceptions in `magic-fallback-mode-alist'?

I think that auto-coding-alist should allow mapping not only file-names
but also major modes to coding-systems.  This should hopefully take care
of those issues by mapping image-mode and archive-mode to no-conversion.


        Stefan


* bug#5553: 23.1.92; Archives with wrong coding system
From: Juri Linkov @ 2010-02-10 22:33 UTC
  To: Stefan Monnier; +Cc: 5553

>> So the question is whether we should complement entries in
>> `magic-fallback-mode-alist' with the corresponding entries in
>> `auto-coding-regexp-alist' with the same regexps (like we complement
>> entries in `auto-mode-alist' with entries in `auto-coding-alist')?
>
>> Or every function in `auto-coding-functions' that determines a coding system
>> should somehow take care of exceptions in `magic-fallback-mode-alist'?
>
> I think that auto-coding-alist should allow mapping not only file-names
> but also major modes to coding-systems.  This should hopefully take care
> of those issues by mapping image-mode and archive-mode to no-conversion.

I don't understand how this is possible because currently a coding system
should be recognized before mode is chosen:

1. Recognizing Coding Systems
1.1. coding-system-for-read if non-nil
1.2. auto-coding-alist matching a filename
1.3. auto-coding-regexp-alist matching first bytes
1.4. `-*- coding: -*-' tag
1.5. auto-coding-functions (e.g. unicad-universal-charset-detect)
1.6. file-coding-system-alist matching a filename

2. Choosing Modes
2.1. `-*- mode: -*-' tag
2.2. interpreter-mode-alist
2.3. magic-mode-alist
2.4. auto-mode-alist
2.5. magic-fallback-mode-alist

-- 
Juri Linkov
http://www.jurta.org/emacs/
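
The ordering listed above suggests why the original patch helps: `auto-coding-regexp-alist' (1.3) is consulted before `auto-coding-functions' (1.5), so a matching regexp entry wins before a detector such as Unicad runs, and the mode regexps (2.5) then see undamaged text.  Below is a heavily simplified Python model of steps 1.3, 1.5 and 2.5 only; `detect_coding', `fake_detector', `choose_mode', `visit' and `PY_CODEC' are hypothetical names, Python codecs stand in for Emacs coding systems, and gb18030 is an arbitrary stand-in for a wrong guess.

```python
import re
from typing import Callable, List, Optional, Tuple

# Python codecs standing in for Emacs coding systems (illustrative only).
# latin-1 maps each byte to one character, roughly like a raw-byte decoding.
PY_CODEC = {"no-conversion": "latin-1", "gb18030": "gb18030"}

# Stand-in for magic-fallback-mode-alist; matched against *decoded* text.
MAGIC_FALLBACK = [
    ("\\A\x89PNG\r\n\x1a\n", "image-mode"),
    ("\\A(PK00)?[P]K\x03\x04", "archive-mode"),
]


def fake_detector(head: bytes) -> Optional[str]:
    """Hypothetical statistical detector that always guesses a CJK encoding."""
    return "gb18030"


def detect_coding(head: bytes,
                  regexp_alist: List[Tuple[bytes, str]],
                  coding_functions: List[Callable[[bytes], Optional[str]]]) -> str:
    # Step 1.3: the regexp table is tried first ...
    for regexp, coding in regexp_alist:
        if re.match(regexp, head):
            return coding
    # ... step 1.5: detector functions run only if no regexp matched.
    for fn in coding_functions:
        coding = fn(head)
        if coding:
            return coding
    return "no-conversion"  # simplification: later steps are not modelled


def choose_mode(text: str) -> str:
    # Step 2.5: the fallback mode regexps only see the decoded text.
    for regexp, mode in MAGIC_FALLBACK:
        if re.match(regexp, text):
            return mode
    return "fundamental-mode"


def visit(head: bytes, regexp_alist, coding_functions) -> Tuple[str, str]:
    coding = detect_coding(head, regexp_alist, coding_functions)
    text = head.decode(PY_CODEC[coding], errors="replace")
    return coding, choose_mode(text)


png = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
print(visit(png, [], [fake_detector]))
# ('gb18030', 'fundamental-mode'): bytes 0x89 0x50 decode to one CJK character
print(visit(png, [(rb"\A\x89PNG\r\n\x1a\n", "no-conversion")], [fake_detector]))
# ('no-conversion', 'image-mode')
```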


* bug#5553: 23.1.92; Archives with wrong coding system
From: Stefan Monnier @ 2010-02-11  2:12 UTC
  To: Juri Linkov; +Cc: 5553

>> I think that auto-coding-alist should allow mapping not only file-names
>> but also major modes to coding-systems.  This should hopefully take care
>> of those issues by mapping image-mode and archive-mode to no-conversion.
> I don't understand how this is possible because currently a coding system
> should be recognized before mode is chosen:

This is the reason why my suggestion did not come with a patch ;-)
That said, I don't think it's impossible, but it would indeed require
a reorganization.


        Stefan
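
Stefan's suggestion, sketched under loudly hypothetical assumptions: if the order were reorganized so that the magic-fallback regexps were tried on the raw bytes first, a mode-to-coding table could then select `no-conversion' for `image-mode' and `archive-mode'.  This is not how Emacs 23 works, it ignores that the `-*- mode: -*-' tag, `interpreter-mode-alist', `magic-mode-alist' and `auto-mode-alist' normally take precedence over `magic-fallback-mode-alist', and `visit_mode_first' and `MODE_CODING' are invented names.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical reorganized order: mode regexps run on the raw bytes ...
MAGIC_FALLBACK_RAW: List[Tuple[bytes, str]] = [
    (rb"\A\x89PNG\r\n\x1a\n", "image-mode"),
    (rb"\A(PK00)?[P]K\x03\x04", "archive-mode"),
]

# ... and the chosen mode is then mapped to a coding system.
MODE_CODING: Dict[str, str] = {
    "image-mode": "no-conversion",
    "archive-mode": "no-conversion",
}


def visit_mode_first(head: bytes,
                     guess_coding: Callable[[bytes], Optional[str]]) -> Tuple[str, Optional[str]]:
    mode = "fundamental-mode"
    for regexp, candidate in MAGIC_FALLBACK_RAW:
        if re.match(regexp, head):
            mode = candidate
            break
    # A mode-specific coding system wins; otherwise ask the detector.
    coding = MODE_CODING.get(mode) or guess_coding(head)
    return mode, coding


print(visit_mode_first(b"PK\x03\x04\x14\x00", lambda h: "gb18030"))
# ('archive-mode', 'no-conversion')
print(visit_mode_first(b"plain text", lambda h: "gb18030"))
# ('fundamental-mode', 'gb18030')
```

The trade-off is the one raised in the exchange: mode detection would have to work on undecoded bytes, which is why it needs a reorganization rather than a new table entry.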


Thread overview: 7+ messages
2010-02-09 21:19 bug#5553: 23.1.92; Archives with wrong coding system Juri Linkov
2010-02-09 22:19 ` Juri Linkov
2010-02-09 22:34 ` Eli Zaretskii
2010-02-10  0:09   ` Juri Linkov
2010-02-10 20:14     ` Stefan Monnier
2010-02-10 22:33       ` Juri Linkov
2010-02-11  2:12         ` Stefan Monnier
