unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Mattias Engdegård" <mattiase@acm.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 40407@debbugs.gnu.org, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Mon, 6 Apr 2020 17:56:27 +0200
Message-ID: <5D4B264A-C43B-4CEE-91DE-760AEBE80671@acm.org>
In-Reply-To: <835zecsnip.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On 6 Apr 2020, at 16:21, Eli Zaretskii <eliz@gnu.org> wrote:

> Kenichi, why was coding-type of UTF-7 systems set to 'utf-8'?
> Wouldn't it be better to set it to 'utf-16'?  Or is there some
> subtlety here that we should be aware of?  Do you have any comments on
> the patch below?

I see no reason why utf-7 and utf-7-imap should have utf-8 as their coding-type; do you? utf-16 would definitely be wrong, since both UTF-7 variants are encoded in ASCII. How about the patch below instead?
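
To illustrate, outside Emacs, the difference between "encoded in ASCII" and "ASCII-compatible", here is a small sketch using Python's built-in utf-7 codec (RFC 2152). The helper ascii_bytes_mean_themselves is a hypothetical name invented for this example, and it is only an informal probe, not Emacs's definition of :ascii-compatible-p.

```python
def ascii_bytes_mean_themselves(data: bytes, codec: str) -> bool:
    """Informal probe (hypothetical helper): does decoding with `codec`
    give the same text as reading each byte as an ASCII character?"""
    return data.decode(codec) == data.decode("ascii")


encoded = "é".encode("utf-7")  # base64 of UTF-16BE 00 E9, framed by "+" and "-"
print(encoded)                          # b'+AOk-'
print(all(b < 0x80 for b in encoded))   # True: the output is pure ASCII
print(ascii_bytes_mean_themselves(encoded, "utf-7"))  # False: 5 bytes, 1 char
print(ascii_bytes_mean_themselves(encoded, "utf-8"))  # True for UTF-8
```

A byte such as "A" in UTF-7 text may belong to a base64 run rather than stand for the letter A, so a coding system can emit nothing but ASCII and still not be ASCII-compatible.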

By the way, there appears to be another, unrelated bug in utf-7-imap: according to RFC 2060, all C0 control characters must be base64-encoded, but Emacs passes some of them (CR, LF and TAB) through unchanged. Plain UTF-7 (RFC 1642) permits this, but the IMAP variant does not.
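
For reference, the IMAP rule can be sketched as follows, assuming RFC 2060's modified UTF-7: printable ASCII 0x20-0x7e other than "&" represents itself, "&" becomes "&-", and every other character, C0 controls included, is encoded as base64 of its UTF-16 form with "," in place of "/", framed by "&" and "-". encode_imap_utf7 is a hypothetical Python helper for illustration, not Emacs code; the last line contrasts it with Python's plain utf-7 codec, which passes TAB through as plain UTF-7 allows.

```python
import base64


def encode_imap_utf7(text: str) -> bytes:
    """Hypothetical sketch of IMAP modified UTF-7 encoding (RFC 2060)."""
    out = []
    pending = []  # run of characters that must go through modified base64

    def flush() -> None:
        # Encode the pending run as UTF-16BE, then modified base64:
        # no "=" padding, "," instead of "/", framed by "&" ... "-".
        if pending:
            utf16 = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(utf16).rstrip(b"=").replace(b"/", b",")
            out.append(b"&" + b64 + b"-")
            pending.clear()

    for ch in text:
        if 0x20 <= ord(ch) <= 0x7E:
            # Printable ASCII represents itself, except "&", which is escaped.
            flush()
            out.append(b"&-" if ch == "&" else ch.encode("ascii"))
        else:
            # C0 controls (CR, LF and TAB too), DEL and all non-ASCII.
            pending.append(ch)
    flush()
    return b"".join(out)


print(encode_imap_utf7("a&bcd"))  # b'a&-bcd'
print(encode_imap_utf7("a\tb"))   # b'a&AAk-b': TAB is base64-encoded
print("a\tb".encode("utf-7"))     # b'a\tb': plain UTF-7 keeps TAB as is
```

The first output matches the "escaped ampersand" case in the test added by the patch; the second shows what RFC 2060 requires for TAB, where Emacs currently emits the raw character.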


[-- Attachment #2: utf-7.diff --]
[-- Type: application/octet-stream, Size: 2474 bytes --]

diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index e6e6135243..c5cfbaeb87 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1511,20 +1511,25 @@ 'iso-safe
 
 (define-coding-system 'utf-7
   "UTF-7 encoding of Unicode (RFC 2152)."
-  :coding-type 'utf-8
+  :coding-type 'charset
+  :charset-list '(ascii)
   :mnemonic ?U
   :mime-charset 'utf-7
-  :charset-list '(unicode)
   :pre-write-conversion 'utf-7-pre-write-conversion
   :post-read-conversion 'utf-7-post-read-conversion)
+;; Having `ascii' in :charset-list automatically sets :ascii-compatible-p,
+;; but UTF-7 is not ASCII compatible; disable.
+(coding-system-put 'utf-7 :ascii-compatible-p nil)
 
 (define-coding-system 'utf-7-imap
   "UTF-7 encoding of Unicode, IMAP version (RFC 2060)"
-  :coding-type 'utf-8
+  :coding-type 'charset
+  :charset-list '(ascii)
   :mnemonic ?u
-  :charset-list '(unicode)
   :pre-write-conversion 'utf-7-imap-pre-write-conversion
   :post-read-conversion 'utf-7-imap-post-read-conversion)
+;; See comment for utf-7 above.
+(coding-system-put 'utf-7-imap :ascii-compatible-p nil)
 
 ;; Use us-ascii for terminal output if some other coding system is not
 ;; specified explicitly.
diff --git a/test/lisp/international/mule-tests.el b/test/lisp/international/mule-tests.el
index 91e3c2279f..b5fbb4ab8e 100644
--- a/test/lisp/international/mule-tests.el
+++ b/test/lisp/international/mule-tests.el
@@ -48,6 +48,19 @@ mule-cmds--test-universal-coding-system-argument
                         (append (kbd "C-x RET c u t f - 8 RET C-u C-u c a b RET") nil)))
                    (read-string "prompt:")))))
 
+(ert-deftest mule-utf-7 ()
+  ;; utf-7 and utf-7-imap are not ASCII-compatible.
+  (should-not (coding-system-get 'utf-7 :ascii-compatible-p))
+  (should-not (coding-system-get 'utf-7-imap :ascii-compatible-p))
+  ;; Invariant ASCII subset.
+  (let ((s (apply #'string (append (number-sequence #x20 #x25)
+                                   (number-sequence #x27 #x7e)))))
+    (should (equal (encode-coding-string s 'utf-7-imap) s))
+    (should (equal (decode-coding-string s 'utf-7-imap) s)))
+  ;; Escaped ampersand.
+  (should (equal (encode-coding-string "a&bcd" 'utf-7-imap) "a&-bcd"))
+  (should (equal (decode-coding-string "a&-bcd" 'utf-7-imap) "a&bcd")))
+
 ;; Stop "Local Variables" above causing confusion when visiting this file.
 \f
 
