From: handa
Newsgroups: gmane.emacs.bugs
Subject: bug#23814: 24.5; bug of hz coding-system
Date: Wed, 17 Aug 2016 23:43:13 +0900
Message-ID: <87bn0rjw1q.fsf@gnu.org>
References: <877fdiu3xz.fsf@gmail.com>
In-Reply-To: <87oa4rdhvq.fsf@gmail.com> (ynyaaa@gmail.com)
To: ynyaaa@gmail.com
Cc: 23814@debbugs.gnu.org

In article <87oa4rdhvq.fsf@gmail.com>, ynyaaa@gmail.com writes:

> Hi, I tried new china-util.el. It works very well.

Thank you for testing it.

> I prefer 7bit encoding to use only 7bit data, too.
> As for elisp, "\u12345" is treated as "\u1234\ 5".

Ah, OK. I changed it to encode characters outside the BMP as \UXXXXXXXX.
I've just committed the attached change.

---
K. Handa
handa@gnu.org

2016-08-17  handa

	* lisp/language/china-util.el (decode-hz-region): Pay attention to
	"~~}" sequence at the end of Chinese character range.
	(hz-category-table): New variable.
	(encode-hz-region): Convert non-encodable characters to \u... and
	\U...  Preserve ESC on encoding.  Put `chinese-gb2312' `charset'
	text property in advance to force iso-2022-encoding to select
	chinese-gb2312 designation.

diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
index e531640..6505fb8 100644
--- a/lisp/language/china-util.el
+++ b/lisp/language/china-util.el
@@ -88,43 +88,34 @@ decode-hz-region
       (let (pos ch)
         (narrow-to-region beg end)
 
-        ;; We, at first, convert HZ/ZW to `euc-china',
+        ;; We, at first, convert HZ/ZW to `iso-2022-7bit',
         ;; then decode it.
-        ;; "~\n" -> "\n", "~~" -> "~"
+        ;; "~\n" -> "", "~~" -> "~"
         (goto-char (point-min))
         (while (search-forward "~" nil t)
           (setq ch (following-char))
-          (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
+          (cond ((= ch ?{)
+                 (delete-region (1- (point)) (1+ (point)))
+                 (setq pos (point))
+                 (insert iso2022-gb-designation)
+                 (if (looking-at "\\([!-}][!-~]\\)*")
+                     (goto-char (match-end 0)))
+                 (if (looking-at hz-ascii-designation)
+                     (delete-region (match-beginning 0) (match-end 0)))
+                 (insert iso2022-ascii-designation)
+                 (decode-coding-region pos (point) 'iso-2022-7bit))
+
+                ((= ch ?~)
+                 (delete-char 1))
+
+                ((and (= ch ?\n)
+                      decode-hz-line-continuation)
+                 (delete-region (1- (point)) (1+ (point))))
+
+                (t
+                 (forward-char 1)))))
 
-        ;; "^zW...\n" -> Chinese GB2312
-        ;; "~{...~}" -> Chinese GB2312
-        (goto-char (point-min))
-        (setq beg nil)
-        (while (re-search-forward hz/zw-start-gb nil t)
-          (setq pos (match-beginning 0)
-                ch (char-after pos))
-          ;; Record the first position to start conversion.
-          (or beg (setq beg pos))
-          (end-of-line)
-          (setq end (point))
-          (if (>= ch 128) ; 8bit GB2312
-              nil
-            (goto-char pos)
-            (delete-char 2)
-            (setq end (- end 2))
-            (if (= ch ?z) ; ZW -> euc-china
-                (progn
-                  (translate-region (point) end hz-set-msb-table)
-                  (goto-char end))
-              (if (search-forward hz-ascii-designation
-                                  (if decode-hz-line-continuation nil end)
-                                  t)
-                  (delete-char -2))
-              (setq end (point))
-              (translate-region pos (point) hz-set-msb-table))))
-        (if beg
-            (decode-coding-region beg end 'euc-china)))
       (- (point-max) (point-min)))))
 
 ;;;###autoload
@@ -133,33 +124,57 @@ decode-hz-buffer
   (interactive)
   (decode-hz-region (point-min) (point-max)))
 
+(defvar hz-category-table nil)
+
 ;;;###autoload
 (defun encode-hz-region (beg end)
   "Encode the text in the current region to HZ.
 Return the length of resulting text."
   (interactive "r")
+  (unless hz-category-table
+    (setq hz-category-table (make-category-table))
+    (with-category-table hz-category-table
+      (define-category ?c "hz encodable")
+      (map-charset-chars #'modify-category-entry 'ascii ?c)
+      (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)))
   (save-excursion
     (save-restriction
       (narrow-to-region beg end)
+      (with-category-table hz-category-table
+        ;; ~ -> ~~
+        (goto-char (point-min))
+        (while (search-forward "~" nil t) (insert ?~))
+
+        ;; ESC -> ESC ESC
+        (goto-char (point-min))
+        (while (search-forward "\e" nil t) (insert ?\e))
 
-      ;; "~" -> "~~"
-      (goto-char (point-min))
-      (while (search-forward "~" nil t) (insert ?~))
-
-      ;; Chinese GB2312 -> "~{...~}"
-      (goto-char (point-min))
-      (if (re-search-forward "\\cc" nil t)
-          (let (pos)
-            (goto-char (setq pos (match-beginning 0)))
-            (encode-coding-region pos (point-max) 'iso-2022-7bit)
-            (goto-char pos)
-            (while (search-forward iso2022-gb-designation nil t)
-              (delete-char -3)
-              (insert hz-gb-designation))
-            (goto-char pos)
-            (while (search-forward iso2022-ascii-designation nil t)
-              (delete-char -3)
-              (insert hz-ascii-designation))))
+        ;; Non-ASCII-GB2312 -> \uXXXX
+        (goto-char (point-min))
+        (while (re-search-forward "\\Cc" nil t)
+          (let ((ch (preceding-char)))
+            (delete-char -1)
+            (insert (format (if (< ch #x10000) "\\u%04X" "\\U%08X") ch))))
+
+        ;; Prefer chinese-gb2312 for Chinese characters.
+        (put-text-property (point-min) (point-max) 'charset 'chinese-gb2312)
+        (encode-coding-region (point-min) (point-max) 'iso-2022-7bit)
+
+        ;; ESC $ B ... ESC ( B -> ~{ ... ~}
+        ;; ESC ESC -> ESC
+        (goto-char (point-min))
+        (while (search-forward "\e" nil t)
+          (if (= (following-char) ?\e)
+              ;; ESC ESC -> ESC
+              (delete-char 1)
+            (forward-char -1)
+            (if (looking-at iso2022-gb-designation)
+                (progn
+                  (delete-region (match-beginning 0) (match-end 0))
+                  (insert hz-gb-designation)
+                  (search-forward iso2022-ascii-designation nil 'move)
+                  (delete-region (match-beginning 0) (match-end 0))
+                  (insert hz-ascii-designation))))))
       (- (point-max) (point-min)))))
 
 ;;;###autoload
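
For readers following the \u versus \U point from the quoted message, here is a minimal sketch of why the escape width matters. The helper `my-hz-escape-char' is hypothetical and is not defined in china-util.el; it only mirrors the `format' call that the patch uses in `encode-hz-region'. The Emacs Lisp reader takes exactly four hex digits after \u, so a five-digit code point written with \u would read back as a BMP character followed by a stray digit.

```elisp
;; Hypothetical helper for illustration only; it is not defined in
;; china-util.el.  It mirrors the `format' call used in the patch.
(defun my-hz-escape-char (ch)
  "Return a backslash escape string for character CH.
Use four hex digits for BMP characters and eight otherwise."
  (format (if (< ch #x10000) "\\u%04X" "\\U%08X") ch))

;; The reader consumes exactly four hex digits after \u, so this
;; literal is U+1234 followed by the ASCII digit 5.
(length "\u12345")      ; two characters
(length "\U00012345")   ; one character, U+12345

(my-hz-escape-char #x00e9)    ; BMP: four-digit \u form
(my-hz-escape-char #x12345)   ; outside the BMP: eight-digit \U form
```

Choosing the width from the code point keeps the escaped text unambiguous when it is read back as an Emacs Lisp string, which appears to be the reason the patch switches to \UXXXXXXXX for characters at or above #x10000.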