From: ynyaaa@gmail.com
To: handa <handa@gnu.org>
Cc: 23814@debbugs.gnu.org
Subject: bug#23814: 24.5; bug of hz coding-system
Date: Fri, 29 Jul 2016 10:05:14 +0900
Message-ID: <871t2dz22d.fsf@gmail.com>
In-Reply-To: <877fdiu3xz.fsf@gmail.com>
handa <handa@gnu.org> writes:
> In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes:
>
>> But I found other bugs in the decoding of the "~" escape.
>> "~~" and "~{!!~}" should be encoded and decoded as below.
>> "~~" -> "~~~~" -> "~~"
>> "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"
>
>> In fact they are encoded properly, but decoded the wrong way.
>> (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
>>>> "~"
>> (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
>>>> #("\x3000" 0 1 (charset chinese-gb2312))
>
> Thank you for finding those bugs. Could you please try the attached
> patch instead?
>
> ---
> K. Handa
> handa@gnu.org
If the text contains unencodable characters, encodable characters may be corrupted as well.
In this example, the second ?\x4E00 character disappears.
(set-language-environment 'Chinese-GB)
(decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
=> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
There are several ways to avoid this behavior:
(a) While decoding, replace "~{...~}" with "\e$A...\e(B"
    and decode with iso-2022-7bit.
(b) As in (a), replace "~{...~}" with "\e$A...\e(B" while decoding,
    insert "\e$)A" at the beginning of the temp buffer,
    and decode with iso-2022-8bit-ss2.
    (8-bit data is then decoded as euc-cn.)
(c) While encoding, use euc-cn instead of iso-2022-7bit
    and translate each run of consecutive 8-bit bytes to 7-bit bytes,
    prefixed by "~{" and followed by "~}".
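To illustrate option (a), here is a minimal sketch. The names
my-hz-shifts-to-iso-2022 and my-hz-decode-gb-spans are hypothetical and
are not part of Emacs or of the patch. The sketch only rewrites the "~{"
and "~}" shifts; it steps over "~~" and "~\n" without processing them and
assumes GB text is valid GB2312, so it is not a complete HZ decoder.

```elisp
;; Hypothetical helpers for illustration only; not part of china-util.el.
(defun my-hz-shifts-to-iso-2022 (hz-string)
  "Return HZ-STRING with \"~{\" and \"~}\" rewritten as ISO-2022 escapes.
Other two-character escapes (\"~~\", \"~\\n\") are left unchanged."
  (with-temp-buffer
    ;; Work on bytes so that the result is a unibyte string.
    (set-buffer-multibyte nil)
    (insert hz-string)
    (goto-char (point-min))
    (while (search-forward "~" nil t)
      (if (/= (following-char) ?{)
          ;; Some other escape such as "~~": step over its second
          ;; character so it cannot be misread as the start of "~{".
          (unless (eobp) (forward-char 1))
        ;; "~{" -> ESC $ A (designate GB2312 to G0).
        (delete-region (1- (point)) (1+ (point)))
        (insert "\e$A")
        ;; Matching "~}" -> ESC ( B (back to ASCII), if there is one.
        (when (search-forward "~}" nil 'move)
          (replace-match "\e(B" t t))))
    (buffer-string)))

(defun my-hz-decode-gb-spans (hz-string)
  "Decode the GB spans of HZ-STRING by going through iso-2022-7bit."
  (decode-coding-string (my-hz-shifts-to-iso-2022 hz-string)
                        'iso-2022-7bit))
```

For example, (my-hz-decode-gb-spans "~{!!~}") should give the single
character U+3000. I have not checked whether this approach alone is
enough to fix the example with the unencodable character.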
By the way, RFC 1843 says:
The escape sequence '~\n' is a line-continuation marker to be
consumed with no output produced.
This form should return "AB", but it does not:
(decode-coding-string "A~\nB" 'hz)
=> "A\nB"
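As a rough sketch of the ASCII-mode behavior I would expect from RFC 1843,
the function below (my-hz-unescape-ascii is a hypothetical name, not an
Emacs function) turns "~~" into "~", drops "~\n" entirely, and leaves
"~{...~}" spans untouched for a later GB2312 decoding step.

```elisp
;; Hypothetical helper for illustration only; not part of china-util.el.
(defun my-hz-unescape-ascii (hz-string)
  "Process the ASCII-mode escapes \"~~\" and \"~\\n\" in HZ-STRING.
Spans of the form \"~{...~}\" are skipped and returned unchanged."
  (with-temp-buffer
    (insert hz-string)
    (goto-char (point-min))
    (while (search-forward "~" nil t)
      (let ((ch (following-char)))
        (cond
         ;; "~\n" is a line continuation: remove both characters.
         ((= ch ?\n)
          (delete-region (1- (point)) (1+ (point))))
         ;; "~~" stands for one literal "~": drop the first one and
         ;; step over the second so it is not treated as an escape.
         ((= ch ?~)
          (delete-char -1)
          (forward-char 1))
         ;; "~{" starts GB text: skip to just after the closing "~}".
         ((= ch ?{)
          (search-forward "~}" nil 'move)))))
    (buffer-string)))
```

With this, (my-hz-unescape-ascii "A~\nB") gives "AB", and
(my-hz-unescape-ascii "~~{!!~~}") gives "~{!!~}" without entering GB mode.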
> diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
> index e531640..9abdae1 100644
> --- a/lisp/language/china-util.el
> +++ b/lisp/language/china-util.el
> @@ -95,7 +95,12 @@ decode-hz-region
> (goto-char (point-min))
> (while (search-forward "~" nil t)
> (setq ch (following-char))
> - (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
> + (if (= ch ?{)
> + (search-forward "~}" nil 'move)
> + (when (or (= ch ?\n) (= ch ?~))
> + (delete-char -1)
> + (put-text-property (point) (1+ (point)) 'hz-decoded t)
> + (forward-char 1))))
>
> ;; "^zW...\n" -> Chinese GB2312
> ;; "~{...~}" -> Chinese GB2312
> @@ -104,6 +109,8 @@ decode-hz-region
> (while (re-search-forward hz/zw-start-gb nil t)
> (setq pos (match-beginning 0)
> ch (char-after pos))
> + (if (and (= ch ?~) (get-text-property pos 'hz-decoded))
> + (forward-char 1)
> ;; Record the first position to start conversion.
> (or beg (setq beg pos))
> (end-of-line)
> @@ -122,9 +129,10 @@ decode-hz-region
> t)
> (delete-char -2))
> (setq end (point))
> - (translate-region pos (point) hz-set-msb-table))))
> + (translate-region pos (point) hz-set-msb-table)))))
> (if beg
> (decode-coding-region beg end 'euc-china)))
> + (remove-text-properties (point-min) (point-max) '(hz-decoded nil))
> (- (point-max) (point-min)))))
>
> ;;;###autoload
> @@ -142,6 +150,7 @@ encode-hz-region
> (save-restriction
> (narrow-to-region beg end)
>
> + (put-text-property beg end 'charset 'chinese-gb2312)
> ;; "~" -> "~~"
> (goto-char (point-min))
> (while (search-forward "~" nil t) (insert ?~))