From: "Mattias Engdegård" <mattiase@acm.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 40407@debbugs.gnu.org, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Mon, 6 Apr 2020 17:56:27 +0200
Message-ID: <5D4B264A-C43B-4CEE-91DE-760AEBE80671@acm.org>
In-Reply-To: <835zecsnip.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 743 bytes --]
On 6 Apr 2020, at 16:21, Eli Zaretskii <eliz@gnu.org> wrote:
> Kenichi, why was coding-type of UTF-7 systems set to 'utf-8'?
> Wouldn't it be better to set it to 'utf-16'? Or is there some
> subtlety here that we should be aware of? Do you have any comments on
> the patch below?
There is no reason why utf-7 and utf-7-imap should have utf-8 as their coding-type, is there? utf-16 would definitely be wrong, since both UTF-7 variants are encoded in ASCII. What about the patch below instead?
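Being "encoded in ASCII" is not the same as being ASCII-compatible: roughly, an ASCII-compatible coding system must encode ASCII text as the same bytes. A rough empirical check, assuming a running Emacs, is sketched below. The helper `my-ascii-encodes-to-itself-p` is hypothetical (not part of Emacs); it encodes each printable ASCII character on its own and compares the result with the input. For utf-7-imap it should fail at least on `&`, which the patch's own test expects to be written as `&-`.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-ascii-encodes-to-itself-p (coding-system)
  "Return non-nil if each printable ASCII char encodes to itself.
Each character in the range #x20..#x7e is encoded separately with
CODING-SYSTEM.  Hypothetical helper for illustration only."
  (let ((ok t))
    (dolist (c (number-sequence #x20 #x7e) ok)
      (let ((s (string c)))
        ;; Any character that changes on encoding breaks the property.
        (unless (equal (encode-coding-string s coding-system) s)
          (setq ok nil))))))
```

With utf-8 every character maps to itself, whereas utf-7-imap rewrites `&`, so only the former passes.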
By the way, there appears to be another, unrelated bug in utf-7-imap: according to RFC 2060, all C0 control characters must be base64-encoded, but Emacs passes some of them (CR, LF and TAB) through unchanged. Plain UTF-7 (RFC 1642) permits this, but the IMAP variant does not.
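For reference, RFC 2060 (section 5.1.3) describes the IMAP variant roughly as follows: printable US-ASCII characters other than `&` represent themselves, `&` is written as `&-`, and every other character, C0 controls included, goes into an `&...-` run of base64-encoded UTF-16 in which `,` replaces `/` and no `=` padding is used. Below is a minimal sketch of only the encoding direction under those rules. `my-imap-utf7-encode` is a hypothetical name, not Emacs's `utf-7-imap` implementation, and it does no decoding or error handling.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-imap-utf7-encode (string)
  "Encode STRING in IMAP modified UTF-7 (RFC 2060, section 5.1.3).
Hypothetical helper for illustration; not Emacs's `utf-7-imap'."
  (let ((i 0) (n (length string)) (out nil))
    (while (< i n)
      (let ((c (aref string i)))
        (cond
         ;; "&" is the shift character and must be escaped as "&-".
         ((eq c ?&)
          (push "&-" out)
          (setq i (1+ i)))
         ;; Printable US-ASCII (#x20..#x7e) represents itself.
         ((<= #x20 c #x7e)
          (push (string c) out)
          (setq i (1+ i)))
         ;; Anything else, C0 controls included, goes into one base64 run.
         (t
          (let ((j i))
            (while (and (< j n)
                        (not (<= #x20 (aref string j) #x7e)))
              (setq j (1+ j)))
            (let* ((utf16 (encode-coding-string (substring string i j)
                                                'utf-16be))
                   (b64 (base64-encode-string utf16 t)))
              ;; Modified base64: drop "=" padding and use "," for "/".
              (setq b64 (subst-char-in-string
                         ?/ ?, (replace-regexp-in-string "=+\\'" "" b64)))
              (push (concat "&" b64 "-") out))
            (setq i j))))))
    (apply #'concat (nreverse out))))
```

Under this sketch `(my-imap-utf7-encode "a\tb")` returns `"a&AAk-b"`, i.e. the TAB is base64-encoded rather than passed through as reported above.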
[-- Attachment #2: utf-7.diff --]
[-- Type: application/octet-stream, Size: 2474 bytes --]
diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index e6e6135243..c5cfbaeb87 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1511,20 +1511,25 @@ 'iso-safe
(define-coding-system 'utf-7
"UTF-7 encoding of Unicode (RFC 2152)."
- :coding-type 'utf-8
+ :coding-type 'charset
+ :charset-list '(ascii)
:mnemonic ?U
:mime-charset 'utf-7
- :charset-list '(unicode)
:pre-write-conversion 'utf-7-pre-write-conversion
:post-read-conversion 'utf-7-post-read-conversion)
+;; Having `ascii' in :charset-list automatically sets :ascii-compatible-p,
+;; but UTF-7 is not ASCII compatible; disable.
+(coding-system-put 'utf-7 :ascii-compatible-p nil)
(define-coding-system 'utf-7-imap
"UTF-7 encoding of Unicode, IMAP version (RFC 2060)"
- :coding-type 'utf-8
+ :coding-type 'charset
+ :charset-list '(ascii)
:mnemonic ?u
- :charset-list '(unicode)
:pre-write-conversion 'utf-7-imap-pre-write-conversion
:post-read-conversion 'utf-7-imap-post-read-conversion)
+;; See comment for utf-7 above.
+(coding-system-put 'utf-7-imap :ascii-compatible-p nil)
;; Use us-ascii for terminal output if some other coding system is not
;; specified explicitly.
diff --git a/test/lisp/international/mule-tests.el b/test/lisp/international/mule-tests.el
index 91e3c2279f..b5fbb4ab8e 100644
--- a/test/lisp/international/mule-tests.el
+++ b/test/lisp/international/mule-tests.el
@@ -48,6 +48,19 @@ mule-cmds--test-universal-coding-system-argument
(append (kbd "C-x RET c u t f - 8 RET C-u C-u c a b RET") nil)))
(read-string "prompt:")))))
+(ert-deftest mule-utf-7 ()
+ ;; utf-7 and utf-7-imap are not ASCII-compatible.
+ (should-not (coding-system-get 'utf-7 :ascii-compatible-p))
+ (should-not (coding-system-get 'utf-7-imap :ascii-compatible-p))
+ ;; Invariant ASCII subset.
+ (let ((s (apply #'string (append (number-sequence #x20 #x25)
+ (number-sequence #x27 #x7e)))))
+ (should (equal (encode-coding-string s 'utf-7-imap) s))
+ (should (equal (decode-coding-string s 'utf-7-imap) s)))
+ ;; Escaped ampersand.
+ (should (equal (encode-coding-string "a&bcd" 'utf-7-imap) "a&-bcd"))
+ (should (equal (decode-coding-string "a&-bcd" 'utf-7-imap) "a&bcd")))
+
;; Stop "Local Variables" above causing confusion when visiting this file.
\f
[-- Attachment #3: Type: text/plain, Size: 2 bytes --]
Thread overview: 42+ messages
2020-04-03 14:18 bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE Mattias Engdegård
2020-04-03 16:24 ` Eli Zaretskii
2020-04-03 22:32 ` Mattias Engdegård
2020-04-04 9:26 ` Eli Zaretskii
2020-04-04 16:41 ` Mattias Engdegård
2020-04-04 17:22 ` Eli Zaretskii
2020-04-04 17:37 ` Eli Zaretskii
2020-04-04 18:06 ` Mattias Engdegård
2020-04-05 2:37 ` Eli Zaretskii
2020-04-05 3:42 ` Eli Zaretskii
2020-04-05 10:14 ` Mattias Engdegård
2020-04-05 13:28 ` Eli Zaretskii
2020-04-05 13:40 ` Mattias Engdegård
2020-04-04 10:26 ` Eli Zaretskii
2020-04-04 16:55 ` Mattias Engdegård
2020-04-04 17:04 ` Eli Zaretskii
2020-04-04 18:01 ` Mattias Engdegård
2020-04-04 18:25 ` Eli Zaretskii
2020-04-05 10:48 ` Mattias Engdegård
2020-04-05 13:39 ` Eli Zaretskii
2020-04-05 15:03 ` Mattias Engdegård
2020-04-05 15:35 ` Mattias Engdegård
2020-04-05 15:56 ` Eli Zaretskii
2020-04-06 18:13 ` Mattias Engdegård
2020-04-05 16:00 ` Eli Zaretskii
2020-04-06 10:10 ` OGAWA Hirofumi
2020-04-06 14:21 ` Eli Zaretskii
2020-04-06 15:56 ` Mattias Engdegård [this message]
2020-04-06 16:33 ` Eli Zaretskii
2020-04-06 16:55 ` Mattias Engdegård
2020-04-06 17:18 ` Eli Zaretskii
2020-04-06 17:49 ` Mattias Engdegård
2020-04-06 18:20 ` Eli Zaretskii
2020-04-06 18:34 ` OGAWA Hirofumi
2020-04-06 21:57 ` Mattias Engdegård
2020-04-09 11:03 ` Mattias Engdegård
2020-04-09 14:09 ` Kazuhiro Ito
2020-04-09 14:22 ` Mattias Engdegård
2020-04-11 15:09 ` Mattias Engdegård
2020-04-16 13:11 ` handa
2020-04-16 13:44 ` Eli Zaretskii
2020-04-16 13:59 ` Mattias Engdegård
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git