From mboxrd@z Thu Jan 1 00:00:00 1970
From: ynyaaa@gmail.com
Newsgroups: gmane.emacs.bugs
Subject: bug#23814: 24.5; bug of hz coding-system
Date: Fri, 29 Jul 2016 10:05:14 +0900
Message-ID: <871t2dz22d.fsf@gmail.com>
References: <877fdiu3xz.fsf@gmail.com>
In-Reply-To: <877fdiu3xz.fsf@gmail.com>
To: handa
Cc: 23814@debbugs.gnu.org

handa writes:

> In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes:
>
>> But I found other bugs about decodings of "~" escape.
>> "~~" and "~{!!~}" should be encoded and decoded as below.
>>   "~~" -> "~~~~" -> "~~"
>>   "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"
>
>> In really they are encoded properly, but decoded in wrong way.
>>   (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
>>>> "~"
>>   (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
>>>> #("\x3000" 0 1 (charset chinese-gb2312))
>
> Thank you for finding those bugs.  Could you please try the attached
> patch instead?
>
> ---
> K. Handa
> handa@gnu.org

If the text contains unencodable characters, the encodable characters
may be corrupted as well.  In this example, the second ?\x4E00
character disappears.

  (set-language-environment 'Chinese-GB)
  (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
  => "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"

There are several ways to avoid this behavior.

(a) While decoding, replace "~{...~}" with "\e$A...\e(B" and decode
    with iso-2022-7bit.
(b) As in (a), replace "~{...~}" with "\e$A...\e(B" while decoding,
    insert "\e$)A" at the beginning of the temp buffer, and decode
    with iso-2022-8bit-ss2.  (8-bit data are decoded as euc-cn.)
(c) While encoding, use euc-cn instead of iso-2022-7bit, and translate
    each run of consecutive 8-bit data to 7-bit data prefixed by "~{"
    and followed by "~}".

By the way, RFC 1843 says:

  The escape sequence '~\n' is a line-continuation marker to be
  consumed with no output produced.

So this form should return "AB", but it does not:

  (decode-coding-string "A~\nB" 'hz)
  => "A\nB"

> diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
> index e531640..9abdae1 100644
> --- a/lisp/language/china-util.el
> +++ b/lisp/language/china-util.el
> @@ -95,7 +95,12 @@ decode-hz-region
>          (goto-char (point-min))
>          (while (search-forward "~" nil t)
>            (setq ch (following-char))
> -          (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
> +          (if (= ch ?{)
> +              (search-forward "~}" nil 'move)
> +            (when (or (= ch ?\n) (= ch ?~))
> +              (delete-char -1)
> +              (put-text-property (point) (1+ (point)) 'hz-decoded t)
> +              (forward-char 1))))
>
>          ;; "^zW...\n" -> Chinese GB2312
>          ;; "~{...~}" -> Chinese GB2312
> @@ -104,6 +109,8 @@ decode-hz-region
>            (while (re-search-forward hz/zw-start-gb nil t)
>              (setq pos (match-beginning 0)
>                    ch (char-after pos))
> +            (if (and (= ch ?~) (get-text-property pos 'hz-decoded))
> +                (forward-char 1)
>              ;; Record the first position to start conversion.
>              (or beg (setq beg pos))
>              (end-of-line)
> @@ -122,9 +129,10 @@ decode-hz-region
>                    t)
>                  (delete-char -2))
>                (setq end (point))
> -              (translate-region pos (point) hz-set-msb-table))))
> +              (translate-region pos (point) hz-set-msb-table)))))
>            (if beg
>                (decode-coding-region beg end 'euc-china)))
> +        (remove-text-properties (point-min) (point-max) '(hz-decoded nil))
>          (- (point-max) (point-min)))))
>
>  ;;;###autoload
> @@ -142,6 +150,7 @@ encode-hz-region
>      (save-restriction
>        (narrow-to-region beg end)
>
> +      (put-text-property beg end 'charset 'chinese-gb2312)
>        ;; "~" -> "~~"
>        (goto-char (point-min))
>        (while (search-forward "~" nil t) (insert ?~))
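
For reference, solution (a) could be sketched roughly as follows. This is only an illustration, not a tested patch: the function name `my-hz-decode-sketch' is hypothetical and not part of Emacs, the ZW ("^zW...\n") form is ignored, and it assumes that iso-2022-7bit accepts the "\e$A" designation for GB2312. It also consumes "~\n" as RFC 1843 describes.

```elisp
;; Hypothetical helper (not part of Emacs): sketch of solution (a).
;; Rewrite the HZ escapes as ISO-2022 escapes, then let the
;; iso-2022-7bit decoder do the real work.
(defun my-hz-decode-sketch (string)
  "Decode HZ-encoded STRING via an ISO-2022-7bit intermediate form."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (while (search-forward "~" nil t)
      (let ((ch (following-char)))
        (cond
         ;; "~~" -> "~": drop the second tilde and stay after the first,
         ;; so a following "{" is not taken as the start of GB text.
         ((= ch ?~)
          (delete-char 1))
         ;; "~\n" -> "": line continuation, consumed with no output.
         ((= ch ?\n)
          (delete-char -1)
          (delete-char 1))
         ;; "~{...~}" -> "\e$A...\e(B"
         ((= ch ?{)
          (delete-char -1)
          (delete-char 1)
          (insert "\e$A")
          (when (search-forward "~}" nil 'move)
            (replace-match "\e(B" t t))))))
    (decode-coding-region (point-min) (point-max) 'iso-2022-7bit)
    (buffer-string)))

;; Usage, for the ASCII-only cases discussed in this thread:
;;   (my-hz-decode-sketch "A~\nB")     ; expected "AB"
;;   (my-hz-decode-sketch "~~")        ; expected "~"
;;   (my-hz-decode-sketch "~~{!!~~}")  ; expected "~{!!~}"
```

Because the search always resumes after the character just handled, a "~" produced by "~~" is never examined again, so no text property is needed to mark it as already decoded.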