bug#23814: 24.5; bug of hz coding-system

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#23814: 24.5; bug of hz coding-system
@ 2016-06-21 12:22 ynyaaa
  2016-06-21 12:58 ` Eli Zaretskii
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: ynyaaa @ 2016-06-21 12:22 UTC
  To: 23814


The hz coding-system should encode chinese-gb2312 characters, but
it may fail to encode text that lacks a charset property.

current-language-environment
=>"Japanese"

;; wrong
(encode-coding-string "\x4E00" 'hz)
=>"\e$B0l~}"

;; correct
(encode-coding-string (propertize "\x4E00" 'charset 'chinese-gb2312) 'hz)
=>"~{R;~}"


When the second byte of a chinese-gb2312 character equals ?~,
the hz coding-system may fail to decode it.

(encode-coding-string (propertize "\x670D" 'charset 'chinese-gb2312) 'hz)
=>"~{7~~}"

;; wrong
(decode-coding-string "~{7~~}" 'hz)
=>"\300\267"



In GNU Emacs 24.5.1 (i686-pc-mingw32)
 of 2015-04-11 on LEG570
Windowing system distributor `Microsoft Corp.', version 6.0.6002
Configured using:
 `configure --prefix=/c/usr --host=i686-pc-mingw32'

Important settings:
  value of $LANG: JPN
  locale-coding-system: cp932

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:

Load-path shadows:
None found.

Features:
(network-stream starttls tls mailalias smtpmail auth-source eieio
byte-opt bytecomp byte-compile cl-extra cl-loaddefs cl-lib cconv
eieio-core password-cache rect warnings china-util misearch
multi-isearch pp shadow sort gnus-util mail-extr emacsbug message
format-spec rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils help-mode easymenu advice
help-fns time-date japan-util tooltip electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel dos-w32 ls-lisp w32-common-fns
disp-table w32-win w32-vars tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment lisp-mode prog-mode register page menu-bar
rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax
facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak
czech european ethiopic indian cyrillic chinese case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer nadvice loaddefs button
faces cus-face macroexp files text-properties overlay sha1 md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process w32notify w32 multi-tty emacs)

Memory information:
((conses 8 94845 27098)
 (symbols 32 19573 0)
 (miscs 32 77 279)
 (strings 16 16482 13821)
 (string-bytes 1 462365)
 (vectors 8 12746)
 (vector-slots 4 519456 11240)
 (floats 8 62 556)
 (intervals 28 606 13)
 (buffers 508 18))






* bug#23814: 24.5; bug of hz coding-system
  2016-06-21 12:22 bug#23814: 24.5; bug of hz coding-system ynyaaa
@ 2016-06-21 12:58 ` Eli Zaretskii
  2016-06-22 13:47 ` ynyaaa
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2016-06-21 12:58 UTC
  To: ynyaaa; +Cc: 23814

> From: ynyaaa@gmail.com
> Date: Tue, 21 Jun 2016 21:22:32 +0900
> 
> The hz coding-system should encode chinese-gb2312 characters, but
> it may fail to encode text that lacks a charset property.

This is by design, and mentioned in the doc string of that
coding-system.  Since Emacs is Unicode based, the _only_ way of having
"chinese-gb2312 characters" is by using that text property.

IOW, I don't think this is a bug.






* bug#23814: 24.5; bug of hz coding-system
  2016-06-21 12:22 bug#23814: 24.5; bug of hz coding-system ynyaaa
  2016-06-21 12:58 ` Eli Zaretskii
@ 2016-06-22 13:47 ` ynyaaa
  2016-06-22 15:28   ` Eli Zaretskii
  2016-06-22 17:04 ` ynyaaa
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: ynyaaa @ 2016-06-22 13:47 UTC
  To: Eli Zaretskii; +Cc: 23814

Eli Zaretskii <eliz@gnu.org> writes:

> This is by design, and mentioned in the doc string of that
> coding-system.  Since Emacs is Unicode based, the _only_ way of having
> "chinese-gb2312 characters" is by using that text property.

`encode-hz-region' uses `iso-2022-7bit' coding-system internally,
replacing it with the coding-system below will work.

(define-coding-system 'iso-2022-cn-gb
  "ISO 2022 based 7bit encoding only for Chinese GB2312."
  :coding-type 'iso-2022
  :mnemonic ?C
  :charset-list '(ascii chinese-gb2312)
  :designation [(ascii chinese-gb2312) nil nil nil]
  :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
  )






* bug#23814: 24.5; bug of hz coding-system
  2016-06-22 13:47 ` ynyaaa
@ 2016-06-22 15:28   ` Eli Zaretskii
  0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2016-06-22 15:28 UTC
  To: ynyaaa; +Cc: 23814

> From: ynyaaa@gmail.com
> Cc: 23814@debbugs.gnu.org
> Date: Wed, 22 Jun 2016 22:47:00 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > This is by design, and mentioned in the doc string of that
> > coding-system.  Since Emacs is Unicode based, the _only_ way of having
> > "chinese-gb2312 characters" is by using that text property.
> 
> `encode-hz-region' uses `iso-2022-7bit' coding-system internally,
> replacing it with the coding-system below will work.
> 
> (define-coding-system 'iso-2022-cn-gb
>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
>   :coding-type 'iso-2022
>   :mnemonic ?C
>   :charset-list '(ascii chinese-gb2312)
>   :designation [(ascii chinese-gb2312) nil nil nil]
>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
>   )

What advantages does this change have?







* bug#23814: 24.5; bug of hz coding-system
  2016-06-21 12:22 bug#23814: 24.5; bug of hz coding-system ynyaaa
  2016-06-21 12:58 ` Eli Zaretskii
  2016-06-22 13:47 ` ynyaaa
@ 2016-06-22 17:04 ` ynyaaa
  2016-06-22 17:26   ` Eli Zaretskii
  2016-07-24  8:21 ` ynyaaa
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: ynyaaa @ 2016-06-22 17:04 UTC
  To: Eli Zaretskii; +Cc: 23814

Eli Zaretskii <eliz@gnu.org> writes:

>> `encode-hz-region' uses `iso-2022-7bit' coding-system internally,
>> replacing it with the coding-system below will work.
>> 
>> (define-coding-system 'iso-2022-cn-gb
>>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
>>   :coding-type 'iso-2022
>>   :mnemonic ?C
>>   :charset-list '(ascii chinese-gb2312)
>>   :designation [(ascii chinese-gb2312) nil nil nil]
>>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
>>   )
>
> What advantages does this change have?

`iso-2022-7bit' may encode the same character to various strings,
while `iso-2022-cn-gb' encodes the same character to the same string.

(mapcar (lambda (cs) (encode-coding-string
                      (propertize "\x4e00" 'charset cs)
                      'iso-2022-7bit))
        '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
                         chinese-cns11643-1))
=>("\e$AR;\e(B"
   "\e$B0l\e(B"
   "\e$(Cli\e(B"
   "\e$(GD!\e(B")

(mapcar (lambda (cs) (encode-coding-string
                      (propertize "\x4e00" 'charset cs)
                      'iso-2022-cn-gb))
        '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
                         chinese-cns11643-1))
=>("\e$AR;\e(B"
   "\e$AR;\e(B"
   "\e$AR;\e(B"
   "\e$AR;\e(B")

`encode-hz-region' expects `chinese-gb2312' characters to be encoded
with "\e$A" sequences, and replaces those sequences with "~{".
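The rewriting step described here can be sketched in isolation.  The
helper below is hypothetical (`my-iso2022-gb-to-hz' is not an Emacs
function) and only performs the textual substitution on an already
encoded string, assuming the input contains no literal "~" that would
need escaping.  Its output for a "\e$B" input has the same shape as the
"wrong" result in the original report; whether `encode-hz-region' works
exactly this way internally is an assumption.

```elisp
;; Hypothetical helper, for illustration only: rewrite the ISO-2022
;; designations for GB2312 and ASCII into their HZ counterparts.
(defun my-iso2022-gb-to-hz (string)
  "Replace \"\\e$A\" with \"~{\" and \"\\e(B\" with \"~}\" in STRING."
  (let ((s (replace-regexp-in-string
            (regexp-quote "\e$A") "~{" string t t)))
    (replace-regexp-in-string (regexp-quote "\e(B") "~}" s t t)))

;; GB2312 designation: both escape sequences are rewritten.
(my-iso2022-gb-to-hz "\e$AR;\e(B")   ; => "~{R;~}"
;; JIS X 0208 designation: "\e$B" is left alone, "\e(B" is not.
(my-iso2022-gb-to-hz "\e$B0l\e(B")   ; => "\e$B0l~}"
```

This is why the encoder needs every character to arrive as a "\e$A"
sequence: any other designation passes through the substitution
unchanged and ends up in the HZ output.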






* bug#23814: 24.5; bug of hz coding-system
  2016-06-22 17:04 ` ynyaaa
@ 2016-06-22 17:26   ` Eli Zaretskii
  2016-07-09 11:20     ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2016-06-22 17:26 UTC
  To: ynyaaa, Kenichi Handa; +Cc: 23814

> From: ynyaaa@gmail.com
> Cc: 23814@debbugs.gnu.org
> Date: Thu, 23 Jun 2016 02:04:18 +0900
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> `encode-hz-region' uses `iso-2022-7bit' coding-system internally,
> >> replacing it with the coding-system below will work.
> >> 
> >> (define-coding-system 'iso-2022-cn-gb
> >>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
> >>   :coding-type 'iso-2022
> >>   :mnemonic ?C
> >>   :charset-list '(ascii chinese-gb2312)
> >>   :designation [(ascii chinese-gb2312) nil nil nil]
> >>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
> >>   )
> >
> > What advantages does this change have?
> 
> `iso-2022-7bit' may encode the same character to various strings,
> while `iso-2022-cn-gb' encodes the same character to the same string.
> 
> (mapcar (lambda (cs) (encode-coding-string
>                       (propertize "\x4e00" 'charset cs)
>                       'iso-2022-7bit))
>         '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
>                          chinese-cns11643-1))
> =>("\e$AR;\e(B"
>    "\e$B0l\e(B"
>    "\e$(Cli\e(B"
>    "\e$(GD!\e(B")
> 
> (mapcar (lambda (cs) (encode-coding-string
>                       (propertize "\x4e00" 'charset cs)
>                       'iso-2022-cn-gb))
>         '(chinese-gb2312 japanese-jisx0208 korean-ksc5601
>                          chinese-cns11643-1))
> =>("\e$AR;\e(B"
>    "\e$AR;\e(B"
>    "\e$AR;\e(B"
>    "\e$AR;\e(B")
> 
> `encode-hz-region' expects `chinese-gb2312' characters to be encoded
> with "\e$A" sequences, and replaces those sequences with "~{".

I understand, but as I said, I think this is by design, and should not
be changed.  However, maybe I'm missing something, so I'll CC
Handa-san and ask him to comment on this proposal and the issue in
general.






* bug#23814: 24.5; bug of hz coding-system
  2016-06-22 17:26   ` Eli Zaretskii
@ 2016-07-09 11:20     ` Eli Zaretskii
  2016-07-13 14:12       ` handa
  0 siblings, 1 reply; 16+ messages in thread
From: Eli Zaretskii @ 2016-07-09 11:20 UTC
  To: handa; +Cc: ynyaaa, 23814

Ping!  Could you please comment on this issue?







* bug#23814: 24.5; bug of hz coding-system
  2016-07-09 11:20     ` Eli Zaretskii
@ 2016-07-13 14:12       ` handa
  2016-07-23 17:47         ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: handa @ 2016-07-13 14:12 UTC
  To: Eli Zaretskii; +Cc: ynyaaa, 23814

In article <83d1mngirw.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Ping!  Could you please comment on this issue?

Sorry, I've overlooked that mail.

> > > >> `encode-hz-region' uses `iso-2022-7bit' coding-system internally,
> > > >> replacing it with the coding-system below will work.
> > > >> 
> > > >> (define-coding-system 'iso-2022-cn-gb
> > > >>   "ISO 2022 based 7bit encoding only for Chinese GB2312."
> > > >>   :coding-type 'iso-2022
> > > >>   :mnemonic ?C
> > > >>   :charset-list '(ascii chinese-gb2312)
> > > >>   :designation [(ascii chinese-gb2312) nil nil nil]
> > > >>   :flags '(ascii-at-eol ascii-at-cntl designation 7-bit safe)
> > > >>   )

Right.  But, as there are already so many iso-2022 based coding systems,
I'd like to avoid adding a new one just for encode-hz-region.  I think
the attached patch is sufficient.  Could you please try it?  It also
fixes the problem of incorrect decoding of "~{7~~}".

---
K. Handa
handa@gnu.org

diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
index e531640..9735bd6 100644
--- a/lisp/language/china-util.el
+++ b/lisp/language/china-util.el
@@ -95,7 +95,9 @@ decode-hz-region
 	(goto-char (point-min))
 	(while (search-forward "~" nil t)
 	  (setq ch (following-char))
-	  (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
+          (if (= ch ?{)
+              (search-forward "~}" nil 'move)
+            (if (or (= ch ?\n) (= ch ?~)) (delete-char -1))))
 
 	;; "^zW...\n" -> Chinese GB2312
 	;; "~{...~}"  -> Chinese GB2312
@@ -141,7 +143,7 @@ encode-hz-region
   (save-excursion
     (save-restriction
       (narrow-to-region beg end)
-
+      (put-text-property beg end 'charset 'chinese-gb2312)
       ;; "~" -> "~~"
       (goto-char (point-min))
       (while (search-forward "~" nil t)	(insert ?~))
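To see why "~{7~~}" was decoded incorrectly, the "~" pass of
`decode-hz-region' can be replayed on its own.  The two helpers below
are hypothetical (the `my-hz-' names are not part of Emacs); each one
copies only the loop touched by the first hunk of the diff above, in its
old and patched form respectively, and returns the intermediate buffer
text rather than the decoded result.

```elisp
;; Hypothetical replay of the pre-patch loop: every "~~" is collapsed,
;; even when the first "~" is the second byte of a GB2312 character.
(defun my-hz-old-tilde-pass (string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (ch)
      (while (search-forward "~" nil t)
        (setq ch (following-char))
        (if (or (= ch ?\n) (= ch ?~)) (delete-char -1))))
    (buffer-string)))

;; Hypothetical replay of the patched loop: a "~{" segment is skipped
;; up to its closing "~}" before any unescaping is attempted.
(defun my-hz-new-tilde-pass (string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (ch)
      (while (search-forward "~" nil t)
        (setq ch (following-char))
        (if (= ch ?{)
            (search-forward "~}" nil 'move)
          (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))))
    (buffer-string)))

(my-hz-old-tilde-pass "~{7~~}")   ; => "~{7~}"  (the pair "7~" is broken)
(my-hz-new-tilde-pass "~{7~~}")   ; => "~{7~~}" (segment left intact)
```

In the old pass the GB2312 segment is left with a single byte, which is
consistent with the raw bytes seen in the report.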







* bug#23814: 24.5; bug of hz coding-system
  2016-07-13 14:12       ` handa
@ 2016-07-23 17:47         ` Eli Zaretskii
  0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2016-07-23 17:47 UTC
  To: ynyaaa; +Cc: 23814

Ping!  Could you please try this patch and see if it solves the
problem?







* bug#23814: 24.5; bug of hz coding-system
  2016-06-21 12:22 bug#23814: 24.5; bug of hz coding-system ynyaaa
                   ` (2 preceding siblings ...)
  2016-06-22 17:04 ` ynyaaa
@ 2016-07-24  8:21 ` ynyaaa
  2016-07-26 15:09   ` handa
  2016-07-29  1:05 ` ynyaaa
  2016-08-17  6:33 ` ynyaaa
  5 siblings, 1 reply; 16+ messages in thread
From: ynyaaa @ 2016-07-24  8:21 UTC
  To: Eli Zaretskii; +Cc: 23814

Eli Zaretskii <eliz@gnu.org> writes:

> Ping!  Could you please try this patch and see if it solves the
> problem?

The patch seems to give better results.

But I found other bugs in the decoding of the "~" escape.
"~~" and "~{!!~}" should be encoded and decoded as below.
    "~~" -> "~~~~" -> "~~"
    "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"

In reality they are encoded properly, but decoded in the wrong way.
    (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
    => "~"
    (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
    => #("\x3000" 0 1 (charset chinese-gb2312))

These behaviors are not affected by the patch.
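The expected round trip for the "~" escape alone can be written as a
pair of string functions.  This is only a sketch under assumptions: the
`my-hz-' names are hypothetical, and the functions cover plain ASCII
text in which every "~" is literal, so they do not handle real
"~{...~}" GB2312 segments or "~\n".

```elisp
;; Hypothetical helpers covering only the "~" <-> "~~" escaping layer.
(defun my-hz-escape-tilde (string)
  "Double every \"~\" in STRING."
  (replace-regexp-in-string "~" "~~" string t t))

(defun my-hz-unescape-tilde (string)
  "Collapse each \"~~\" pair in STRING to one \"~\", left to right."
  (replace-regexp-in-string "~~" "~" string t t))

(my-hz-escape-tilde "~~")                              ; => "~~~~"
(my-hz-unescape-tilde (my-hz-escape-tilde "~~"))       ; => "~~"
(my-hz-escape-tilde "~{!!~}")                          ; => "~~{!!~~}"
(my-hz-unescape-tilde (my-hz-escape-tilde "~{!!~}"))   ; => "~{!!~}"
```

The key point is that pairs are consumed left to right without
rescanning the "~" that was just produced, so an escaped "~{" can never
be taken for the start of a GB2312 segment.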







* bug#23814: 24.5; bug of hz coding-system
  2016-07-24  8:21 ` ynyaaa
@ 2016-07-26 15:09   ` handa
  0 siblings, 0 replies; 16+ messages in thread
From: handa @ 2016-07-26 15:09 UTC
  To: ynyaaa; +Cc: 23814

In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes:

> But I found other bugs in the decoding of the "~" escape.
> "~~" and "~{!!~}" should be encoded and decoded as below.
>     "~~" -> "~~~~" -> "~~"
>     "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"
>
> In reality they are encoded properly, but decoded in the wrong way.
>     (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
>     => "~"
>     (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
>     => #("\x3000" 0 1 (charset chinese-gb2312))

Thank you for finding those bugs.  Could you please try the attached
patch instead?

---
K. Handa
handa@gnu.org

diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
index e531640..9abdae1 100644
--- a/lisp/language/china-util.el
+++ b/lisp/language/china-util.el
@@ -95,7 +95,12 @@ decode-hz-region
 	(goto-char (point-min))
 	(while (search-forward "~" nil t)
 	  (setq ch (following-char))
-	  (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
+          (if (= ch ?{)
+              (search-forward "~}" nil 'move)
+            (when (or (= ch ?\n) (= ch ?~))
+              (delete-char -1)
+              (put-text-property (point) (1+ (point)) 'hz-decoded t)
+              (forward-char 1))))
 
 	;; "^zW...\n" -> Chinese GB2312
 	;; "~{...~}"  -> Chinese GB2312
@@ -104,6 +109,8 @@ decode-hz-region
 	(while (re-search-forward hz/zw-start-gb nil t)
 	  (setq pos (match-beginning 0)
 		ch (char-after pos))
+          (if (and (= ch ?~) (get-text-property pos 'hz-decoded))
+              (forward-char 1)
 	  ;; Record the first position to start conversion.
 	  (or beg (setq beg pos))
 	  (end-of-line)
@@ -122,9 +129,10 @@ decode-hz-region
 				  t)
 		  (delete-char -2))
 	      (setq end (point))
-	      (translate-region pos (point) hz-set-msb-table))))
+	      (translate-region pos (point) hz-set-msb-table)))))
 	(if beg
 	    (decode-coding-region beg end 'euc-china)))
+      (remove-text-properties (point-min) (point-max) '(hz-decoded nil))
       (- (point-max) (point-min)))))
 
 ;;;###autoload
@@ -142,6 +150,7 @@ encode-hz-region
     (save-restriction
       (narrow-to-region beg end)
 
+      (put-text-property beg end 'charset 'chinese-gb2312)
       ;; "~" -> "~~"
       (goto-char (point-min))
       (while (search-forward "~" nil t)	(insert ?~))






* bug#23814: 24.5; bug of hz coding-system
  2016-06-21 12:22 bug#23814: 24.5; bug of hz coding-system ynyaaa
                   ` (3 preceding siblings ...)
  2016-07-24  8:21 ` ynyaaa
@ 2016-07-29  1:05 ` ynyaaa
  2016-08-14 11:22   ` handa
  2016-08-17  6:33 ` ynyaaa
  5 siblings, 1 reply; 16+ messages in thread
From: ynyaaa @ 2016-07-29  1:05 UTC
  To: handa; +Cc: 23814

handa <handa@gnu.org> writes:

> In article <87twffigzv.fsf@gmail.com>, ynyaaa@gmail.com writes:
>
>> But I found other bugs in the decoding of the "~" escape.
>> "~~" and "~{!!~}" should be encoded and decoded as below.
>>     "~~" -> "~~~~" -> "~~"
>>     "~{!!~}" -> "~~{!!~~}" -> "~{!!~}"
>
>> In reality they are encoded properly, but decoded in the wrong way.
>>     (decode-coding-string (encode-coding-string "~~" 'hz) 'hz)
>>     => "~"
>>     (decode-coding-string (encode-coding-string "~{!!~}" 'hz) 'hz)
>>     => #("\x3000" 0 1 (charset chinese-gb2312))
>
> Thank you for finding those bugs.  Could you please try the attached
> patch instead?
>
> ---
> K. Handa
> handa@gnu.org

If there are unencodable characters, encodable characters may be broken.
In this example, the second ?\x4E00 character disappears.
    (set-language-environment 'Chinese-GB)
    (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
    => "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"

To avoid this behavior, there are some solutions.
(a) While decoding, replace "~{...~}" with "\e$A...\e(B"
    and decode with iso-2022-7bit.
(b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding
    and insert "\e$)A" at the beginning of the temp buffer
    and decode with iso-2022-8bit-ss2.
    (8bit data are decoded as euc-cn.)
(c) While encoding, use euc-cn instead of iso-2022-7bit
    and translate each consecutive 8bit data to 7bit data
    prefixed by "~{" and postfixed by "~}".


By the way, RFC1843 describes:
    The escape sequence '~\n' is a line-continuation marker to be
    consumed with no output produced.

This form should return "AB".
    (decode-coding-string "A~\nB" 'hz)
    => "A\nB"
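The RFC 1843 rule quoted above can be sketched as a small scanner over
ASCII-mode text.  `my-hz-unescape-ascii' is a hypothetical name, and the
sketch deliberately ignores "~{...~}" segments; it only shows that "~~"
must be consumed as a unit before "~\n" is considered, so that an
escaped tilde followed by a newline is not mistaken for a continuation.

```elisp
;; Hypothetical scanner for the ASCII-mode escapes of HZ:
;;   "~~"  -> "~"
;;   "~\n" -> ""   (line continuation, no output)
(defun my-hz-unescape-ascii (string)
  (let ((i 0)
        (n (length string))
        (out '()))
    (while (< i n)
      (let ((c (aref string i))
            (next (and (< (1+ i) n) (aref string (1+ i)))))
        (cond ((and (= c ?~) (eq next ?~))
               (push ?~ out)
               (setq i (+ i 2)))
              ((and (= c ?~) (eq next ?\n))
               (setq i (+ i 2)))
              (t
               (push c out)
               (setq i (1+ i))))))
    (apply #'string (nreverse out))))

(my-hz-unescape-ascii "A~\nB")    ; => "AB"
(my-hz-unescape-ascii "~~\nB")    ; => "~\nB"
```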

> diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
> index e531640..9abdae1 100644
> --- a/lisp/language/china-util.el
> +++ b/lisp/language/china-util.el
> @@ -95,7 +95,12 @@ decode-hz-region
>  	(goto-char (point-min))
>  	(while (search-forward "~" nil t)
>  	  (setq ch (following-char))
> -	  (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
> +          (if (= ch ?{)
> +              (search-forward "~}" nil 'move)
> +            (when (or (= ch ?\n) (= ch ?~))
> +              (delete-char -1)
> +              (put-text-property (point) (1+ (point)) 'hz-decoded t)
> +              (forward-char 1))))
>  
>  	;; "^zW...\n" -> Chinese GB2312
>  	;; "~{...~}"  -> Chinese GB2312
> @@ -104,6 +109,8 @@ decode-hz-region
>  	(while (re-search-forward hz/zw-start-gb nil t)
>  	  (setq pos (match-beginning 0)
>  		ch (char-after pos))
> +          (if (and (= ch ?~) (get-text-property pos 'hz-decoded))
> +              (forward-char 1)
>  	  ;; Record the first position to start conversion.
>  	  (or beg (setq beg pos))
>  	  (end-of-line)
> @@ -122,9 +129,10 @@ decode-hz-region
>  				  t)
>  		  (delete-char -2))
>  	      (setq end (point))
> -	      (translate-region pos (point) hz-set-msb-table))))
> +	      (translate-region pos (point) hz-set-msb-table)))))
>  	(if beg
>  	    (decode-coding-region beg end 'euc-china)))
> +      (remove-text-properties (point-min) (point-max) '(hz-decoded nil))
>        (- (point-max) (point-min)))))
>  
>  ;;;###autoload
> @@ -142,6 +150,7 @@ encode-hz-region
>      (save-restriction
>        (narrow-to-region beg end)
>  
> +      (put-text-property beg end 'charset 'chinese-gb2312)
>        ;; "~" -> "~~"
>        (goto-char (point-min))
>        (while (search-forward "~" nil t)	(insert ?~))






* bug#23814: 24.5; bug of hz coding-system
  2016-07-29  1:05 ` ynyaaa
@ 2016-08-14 11:22   ` handa
  0 siblings, 0 replies; 16+ messages in thread
From: handa @ 2016-08-14 11:22 UTC
  To: ynyaaa; +Cc: 23814

[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]

Hi, sorry for the late response.  I've just noticed that my reply mail
didn't go out successfully.  I'm trying to re-send it.

I wrote:

> In article <871t2dz22d.fsf@gmail.com>, ynyaaa@gmail.com writes:
> > If there are unencodable characters, encodable characters may be broken.
> > In this example, the second ?\x4E00 character disappears.
> >     (set-language-environment 'Chinese-GB)
> >     (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
> >     => "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
> 
> How to treat unencodable characters when encoding is a difficult problem.
> As HZ is designed for 7-bit environments, I think it's important to stay
> within 7 bits when encoding.  So, the new code uses \uXXXX for those
> characters.  Another way is to use a UTF-8 sequence for them, so that we
> can decode it back.  Which do you think is better?
> 
> > To avoid this behavior, there are some solutions.
> > (a) While decoding, replace "~{...~}" with "\e$A...\e(B"
> >     and decode with iso-2022-7bit.
> > (b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding
> >     and insert "\e$)A" at the beginning of the temp buffer
> >     and decode with iso-2022-8bit-ss2.
> >     (8bit data are decoded as euc-cn.)
> > (c) While encoding, use euc-cn instead of iso-2022-7bit
> >     and translate each consecutive 8bit data to 7bit data
> >     prefixed by "~{" and postfixed by "~}".
> 
> I adopted method (a) for decoding, and fixed bugs in the encoding code.
> 
> > By the way, RFC1843 describes:
> >     The escape sequence '~\n' is a line-continuation marker to be
> >     consumed with no output produced.
> 
> The variable decode-hz-line-continuation controls this feature.  I don't
> remember why the default is nil (i.e. do not decode ~\n), perhaps some
> Chinese people I was discussing with on implementing HZ support
> suggested that.
> 
> Attached is the full china-util.el (not a diff).
> 
> ---
> K. Handa
> handa@gnu.org
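The \uXXXX replacement mentioned in the message above can be sketched
on strings.  This is not the code from the attachment, which works on a
buffer with a category table; the hypothetical helper below instead
asks `encode-char' whether each character belongs to `chinese-gb2312',
which is a swapped-in technique chosen for brevity.

```elisp
;; Hypothetical sketch: keep ASCII and GB2312 characters, and write
;; every other character as a 7-bit "\uXXXX" sequence.
(defun my-hz-escape-unencodable (string)
  (mapconcat (lambda (ch)
               (if (or (< ch 128)
                       ;; `encode-char' returns nil when the charset
                       ;; does not include CH.
                       (encode-char ch 'chinese-gb2312))
                   (string ch)
                 (format "\\u%04X" ch)))
             string ""))

(my-hz-escape-unencodable "A\x4E00")   ; => unchanged, both encodable
(my-hz-escape-unencodable "\xAC00")    ; => "\\uAC00" (Hangul, not GB2312)
```

The result stays within 7 bits once the remaining characters are
encoded, at the cost that a decoder cannot tell such a sequence from a
literal backslash followed by "u" and hex digits.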


[-- Attachment #2: china-util.el --]
[-- Type: application/emacs-lisp, Size: 6915 bytes --]

;;; china-util.el --- utilities for Chinese  -*- coding: utf-8 -*-

;; Copyright (C) 1995, 2001-2016 Free Software Foundation, Inc.
;; Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
;;   2005, 2006, 2007, 2008, 2009, 2010, 2011
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H14PRO021
;; Copyright (C) 2003
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H13PRO009

;; Keywords: mule, multilingual, Chinese

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:

;;; Code:

;; Hz/ZW/EUC-TW encoding stuff

;; HZ is an encoding method for the Chinese character set GB2312 that is
;; widely used on the Internet.  It is very similar to the 7-bit
;; environment of ISO-2022.  The difference is that HZ uses the
;; sequences "~{" and "~}" to designate GB2312 and ASCII respectively;
;; hence, it doesn't use the ESC (0x1B) code.

;; ZW is another encoding method for the Chinese character set GB2312.
;; It encodes Chinese characters line by line, starting each line with
;; the sequence "zW".  Like HZ, it uses only 7-bit codes.

;; EUC-TW is similar to EUC-KS or EUC-JP.  Its main character set is
;; plane 1 of CNS 11643; characters of planes 2 to 7 are accessed with
;; a single shift escape followed by three bytes: the first gives the
;; plane, the second and third the character code.  Note that characters
;; of plane 1 are (redundantly) accessible with a single shift escape
;; also.

;; ISO-2022 escape sequence to designate GB2312.
(defvar iso2022-gb-designation "\e$A")
;; HZ escape sequence to designate GB2312.
(defvar hz-gb-designation "~{")
;; ISO-2022 escape sequence to designate ASCII.
(defvar iso2022-ascii-designation "\e(B")
;; HZ escape sequence to designate ASCII.
(defvar hz-ascii-designation "~}")
;; Regexp of ZW sequence to start GB2312.
(defvar zw-start-gb "^zW")
;; Regexp for start of GB2312 in an encoding mixture of HZ and ZW.
(defvar hz/zw-start-gb
  (concat hz-gb-designation "\\|" zw-start-gb "\\|[^\0-\177]"))

(defvar decode-hz-line-continuation nil
  "Non-nil means honor the line continuation convention of HZ.")

(defconst hz-set-msb-table
  (eval-when-compile
    (let ((chars nil)
	  (i 0))
      (while (< i 33)
	(push i chars)
	(setq i (1+ i)))
      (while (< i 127)
	(push (decode-char 'eight-bit (+ i 128)) chars)
	(setq i (1+ i)))
      (apply 'string (nreverse chars)))))

;;;###autoload
(defun decode-hz-region (beg end)
  "Decode HZ/ZW encoded text in the current region.
Return the length of resulting text."
  (interactive "r")
  (save-excursion
    (save-restriction
      (let (pos ch)
	(narrow-to-region beg end)

	;; We, at first, convert HZ/ZW to `iso-2022-7bit',
	;; then decode it.

	;; "~\n" -> "", "~~" -> "~"
	(goto-char (point-min))
	(while (search-forward "~" nil t)
	  (setq ch (following-char))
	  (cond ((= ch ?{)
		 (delete-region (1- (point)) (1+ (point)))
		 (setq pos (point))
		 (insert iso2022-gb-designation)
		 (if (looking-at "\\([!-}][!-~]\\)*")
		     (goto-char (match-end 0)))
		 (if (looking-at hz-ascii-designation)
		     (delete-region (match-beginning 0) (match-end 0)))
		 (insert iso2022-ascii-designation)
		 (decode-coding-region pos (point) 'iso-2022-7bit))

		((= ch ?~)
		 (delete-char 1))

		((and (= ch ?\n)
		      decode-hz-line-continuation)
		 (delete-region (1- (point)) (1+ (point))))

		(t
		 (forward-char 1)))))

      (- (point-max) (point-min)))))

;;;###autoload
(defun decode-hz-buffer ()
  "Decode HZ/ZW encoded text in the current buffer."
  (interactive)
  (decode-hz-region (point-min) (point-max)))

(defvar hz-category-table nil)

;;;###autoload
(defun encode-hz-region (beg end)
  "Encode the text in the current region to HZ.
Return the length of resulting text."
  (interactive "r")
  (unless hz-category-table
    (setq hz-category-table (make-category-table))
    (with-category-table hz-category-table
      (define-category ?c "hz encodable")
      (map-charset-chars #'modify-category-entry 'ascii ?c)
      (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)))
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (with-category-table hz-category-table
	;; ~ -> ~~
	(goto-char (point-min))
	(while (search-forward "~" nil t) (insert ?~))

	;; ESC -> ESC ESC
	(goto-char (point-min))
	(while (search-forward "\e" nil t) (insert ?\e))

	;; Non-ASCII-GB2312 -> \uXXXX
	(goto-char (point-min))
	(while (re-search-forward "\\Cc" nil t)
	  (let ((ch (preceding-char)))
	    (delete-char -1)
	    (insert (format "\\u%04X" ch))))

	;; Prefer chinese-gb2312 for Chinese characters.
	(put-text-property (point-min) (point-max) 'charset 'chinese-gb2312)
	(encode-coding-region (point-min) (point-max) 'iso-2022-7bit)

	;; ESC $ A ... ESC ( B  -> ~{ ... ~}
	;; ESC ESC -> ESC
	(goto-char (point-min))
	(while (search-forward "\e" nil t)
	  (if (= (following-char) ?\e)
	      ;; ESC ESC -> ESC
	      (delete-char 1)
	    (forward-char -1)
	    (if (looking-at iso2022-gb-designation)
		(progn
		  (delete-region (match-beginning 0) (match-end 0))
		  (insert hz-gb-designation)
		  (search-forward iso2022-ascii-designation nil 'move)
		  (delete-region (match-beginning 0) (match-end 0))
		  (insert hz-ascii-designation))))))
      (- (point-max) (point-min)))))

;;;###autoload
(defun encode-hz-buffer ()
  "Encode the text in the current buffer to HZ."
  (interactive)
  (encode-hz-region (point-min) (point-max)))

;;;###autoload
(defun post-read-decode-hz (len)
  (let ((pos (point))
	(buffer-modified-p (buffer-modified-p))
	last-coding-system-used)
    (prog1
	(decode-hz-region pos (+ pos len))
      (set-buffer-modified-p buffer-modified-p))))

;;;###autoload
(defun pre-write-encode-hz (from to)
  (let ((buf (current-buffer)))
    (set-buffer (generate-new-buffer " *temp*"))
    (if (stringp from)
	(insert from)
      (insert-buffer-substring buf from to))
    (let (last-coding-system-used)
      (encode-hz-region 1 (point-max)))
    nil))
;;
(provide 'china-util)

;;; china-util.el ends here

* bug#23814: 24.5; bug of hz coding-system
  2016-06-21 12:22 bug#23814: 24.5; bug of hz coding-system ynyaaa
                   ` (4 preceding siblings ...)
  2016-07-29  1:05 ` ynyaaa
@ 2016-08-17  6:33 ` ynyaaa
  2016-08-17 14:43   ` handa
  5 siblings, 1 reply; 16+ messages in thread
From: ynyaaa @ 2016-08-17  6:33 UTC (permalink / raw)
  To: handa; +Cc: 23814

Hi, I tried new china-util.el. It works very well.

handa <handa@gnu.org> writes:
> Hi, sorry for the late response.  I've just noticed that my reply mail
> didn't go out successfully.  I'm trying to re-send it.

>> How to treat unencodable characters on encoding is a difficult problem.
>> As HZ is designed for 7-bit environment, I think it's important to keep
>> 7-bit on encoding.  So, the new code uses \uXXXX for those characters.
>> Another way is to use UTF-8 sequence for them, then we can decode it
>> back.  Which do you think is better?

I prefer 7bit encoding to use only 7bit data, too.
As for elisp, "\u12345" is treated as "\u1234\ 5".
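
For reference, a small sketch of how the Emacs Lisp reader treats these
escapes.  The helper name `hz-example-string-chars' is made up for
illustration and is not part of china-util.el.

```elisp
;; "\u" consumes exactly four hex digits and "\U" exactly eight, so a
;; fifth digit after "\u" is read as an ordinary character.
(defun hz-example-string-chars (string)
  "Return the characters of STRING as a list of code points."
  (append string nil))

(hz-example-string-chars "\u12345")     ; two characters: U+1234, then "5"
(hz-example-string-chars "\U00012345")  ; one character: U+12345
(equal "\u12345" "\u1234\ 5")           ; t, "\ " only ends the escape
```

So a character above U+FFFF written with "\\u%04X" would not read back
as the same character.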

* bug#23814: 24.5; bug of hz coding-system
  2016-08-17  6:33 ` ynyaaa
@ 2016-08-17 14:43   ` handa
  2016-08-17 15:28     ` Eli Zaretskii
  0 siblings, 1 reply; 16+ messages in thread
From: handa @ 2016-08-17 14:43 UTC (permalink / raw)
  To: ynyaaa; +Cc: 23814

In article <87oa4rdhvq.fsf@gmail.com>, ynyaaa@gmail.com writes:

> Hi, I tried new china-util.el. It works very well.

Thank you for testing it.

> I prefer 7bit encoding to use only 7bit data, too.
> As for elisp, "\u12345" is treated as "\u1234\ 5".

Ah, ok, I changed to encode characters not in BMP to \UXXXXXXXX.
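
A minimal sketch of that choice, using the same format expression as
the change; `hz-example-escape' is a hypothetical name used only here,
since the real code inlines the expression in `encode-hz-region'.

```elisp
(defun hz-example-escape (ch)
  "Return a 7-bit escape string for character CH.
Characters in the BMP get \\uXXXX, all others \\UXXXXXXXX."
  (format (if (< ch #x10000) "\\u%04X" "\\U%08X") ch))

(hz-example-escape #xD55C)   ; "\\uD55C"
(hz-example-escape #x12345)  ; "\\U00012345"
```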

I've just committed the attached change.

---
K. Handa
handa@gnu.org

2016-08-17  handa  <handa@gnu.org>

	* lisp/language/china-util.el (decode-hz-region): Pay
	attention to "~~}" sequence at the end of Chinese character
	range.
	(hz-category-table): New variable.
	(encode-hz-region): Convert non-encodable characters to
	\u... and \U...  Preserve ESC on encoding.  Put
	`chinese-gb2312' `charset' text property in advance to force
	iso-2022-encoding to select chinese-gb2312 designation.
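
The "~~}" entry can be exercised with something like the following,
assuming an Emacs that includes this change.  `hz-example-round-trip'
is a hypothetical helper, not part of china-util.el.

```elisp
(defun hz-example-round-trip (string)
  "Encode STRING with the `hz' coding system and decode the result."
  (decode-coding-string (encode-coding-string string 'hz) 'hz))

;; U+670D is the character from the original report; the report gives
;; its HZ form as "~{7~~}", where the second byte of the pair is "~".
(encode-coding-string "\u670D" 'hz)

;; With the change applied, this is expected to return t.
(equal (hz-example-round-trip "\u670D") "\u670D")
```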

diff --git a/lisp/language/china-util.el b/lisp/language/china-util.el
index e531640..6505fb8 100644
--- a/lisp/language/china-util.el
+++ b/lisp/language/china-util.el
@@ -88,43 +88,34 @@ decode-hz-region
       (let (pos ch)
 	(narrow-to-region beg end)
 
-	;; We, at first, convert HZ/ZW to `euc-china',
+	;; We, at first, convert HZ/ZW to `iso-2022-7bit',
 	;; then decode it.
 
-	;; "~\n" -> "\n", "~~" -> "~"
+	;; "~\n" -> "", "~~" -> "~"
 	(goto-char (point-min))
 	(while (search-forward "~" nil t)
 	  (setq ch (following-char))
-	  (if (or (= ch ?\n) (= ch ?~)) (delete-char -1)))
+	  (cond ((= ch ?{)
+		 (delete-region (1- (point)) (1+ (point)))
+		 (setq pos (point))
+		 (insert iso2022-gb-designation)
+		 (if (looking-at "\\([!-}][!-~]\\)*")
+		     (goto-char (match-end 0)))
+		 (if (looking-at hz-ascii-designation)
+		     (delete-region (match-beginning 0) (match-end 0)))
+		 (insert iso2022-ascii-designation)
+		 (decode-coding-region pos (point) 'iso-2022-7bit))
+
+		((= ch ?~)
+		 (delete-char 1))
+
+		((and (= ch ?\n)
+		      decode-hz-line-continuation)
+		 (delete-region (1- (point)) (1+ (point))))
+
+		(t
+		 (forward-char 1)))))
 
-	;; "^zW...\n" -> Chinese GB2312
-	;; "~{...~}"  -> Chinese GB2312
-	(goto-char (point-min))
-	(setq beg nil)
-	(while (re-search-forward hz/zw-start-gb nil t)
-	  (setq pos (match-beginning 0)
-		ch (char-after pos))
-	  ;; Record the first position to start conversion.
-	  (or beg (setq beg pos))
-	  (end-of-line)
-	  (setq end (point))
-	  (if (>= ch 128)		; 8bit GB2312
-	      nil
-	    (goto-char pos)
-	    (delete-char 2)
-	    (setq end (- end 2))
-	    (if (= ch ?z)			; ZW -> euc-china
-		(progn
-		  (translate-region (point) end hz-set-msb-table)
-		  (goto-char end))
-	      (if (search-forward hz-ascii-designation
-				  (if decode-hz-line-continuation nil end)
-				  t)
-		  (delete-char -2))
-	      (setq end (point))
-	      (translate-region pos (point) hz-set-msb-table))))
-	(if beg
-	    (decode-coding-region beg end 'euc-china)))
       (- (point-max) (point-min)))))
 
 ;;;###autoload
@@ -133,33 +124,57 @@ decode-hz-buffer
   (interactive)
   (decode-hz-region (point-min) (point-max)))
 
+(defvar hz-category-table nil)
+
 ;;;###autoload
 (defun encode-hz-region (beg end)
   "Encode the text in the current region to HZ.
 Return the length of resulting text."
   (interactive "r")
+  (unless hz-category-table
+    (setq hz-category-table (make-category-table))
+    (with-category-table hz-category-table
+      (define-category ?c "hz encodable")
+      (map-charset-chars #'modify-category-entry 'ascii ?c)
+      (map-charset-chars #'modify-category-entry 'chinese-gb2312 ?c)))
   (save-excursion
     (save-restriction
       (narrow-to-region beg end)
+      (with-category-table hz-category-table
+	;; ~ -> ~~
+	(goto-char (point-min))
+	(while (search-forward "~" nil t) (insert ?~))
+
+	;; ESC -> ESC ESC
+	(goto-char (point-min))
+	(while (search-forward "\e" nil t) (insert ?\e))
 
-      ;; "~" -> "~~"
-      (goto-char (point-min))
-      (while (search-forward "~" nil t)	(insert ?~))
-
-      ;; Chinese GB2312 -> "~{...~}"
-      (goto-char (point-min))
-      (if (re-search-forward "\\cc" nil t)
-	  (let (pos)
-	    (goto-char (setq pos (match-beginning 0)))
-	    (encode-coding-region pos (point-max) 'iso-2022-7bit)
-	    (goto-char pos)
-	    (while (search-forward iso2022-gb-designation nil t)
-	      (delete-char -3)
-	      (insert hz-gb-designation))
-	    (goto-char pos)
-	    (while (search-forward iso2022-ascii-designation nil t)
-	      (delete-char -3)
-	      (insert hz-ascii-designation))))
+	;; Non-ASCII-GB2312 -> \uXXXX
+	(goto-char (point-min))
+	(while (re-search-forward "\\Cc" nil t)
+	  (let ((ch (preceding-char)))
+	    (delete-char -1)
+	    (insert (format (if (< ch #x10000) "\\u%04X" "\\U%08X") ch))))
+
+	;; Prefer chinese-gb2312 for Chinese characters.
+	(put-text-property (point-min) (point-max) 'charset 'chinese-gb2312)
+	(encode-coding-region (point-min) (point-max) 'iso-2022-7bit)
+
+	;; ESC $ A ... ESC ( B  -> ~{ ... ~}
+	;; ESC ESC -> ESC
+	(goto-char (point-min))
+	(while (search-forward "\e" nil t)
+	  (if (= (following-char) ?\e)
+	      ;; ESC ESC -> ESC
+	      (delete-char 1)
+	    (forward-char -1)
+	    (if (looking-at iso2022-gb-designation)
+		(progn
+		  (delete-region (match-beginning 0) (match-end 0))
+		  (insert hz-gb-designation)
+		  (search-forward iso2022-ascii-designation nil 'move)
+		  (delete-region (match-beginning 0) (match-end 0))
+		  (insert hz-ascii-designation))))))
       (- (point-max) (point-min)))))
 
 ;;;###autoload

* bug#23814: 24.5; bug of hz coding-system
  2016-08-17 14:43   ` handa
@ 2016-08-17 15:28     ` Eli Zaretskii
  0 siblings, 0 replies; 16+ messages in thread
From: Eli Zaretskii @ 2016-08-17 15:28 UTC (permalink / raw)
  To: handa; +Cc: ynyaaa, 23814

> From: handa <handa@gnu.org>
> Cc: eliz@gnu.org,  23814@debbugs.gnu.org
> Date: Wed, 17 Aug 2016 23:43:13 +0900
> 
> In article <87oa4rdhvq.fsf@gmail.com>, ynyaaa@gmail.com writes:
> 
> > Hi, I tried new china-util.el. It works very well.
> 
> Thank you for testing it.
> 
> > I prefer 7bit encoding to use only 7bit data, too.
> > As for elisp, "\u12345" is treated as "\u1234\ 5".
> 
> Ah, ok, I changed to encode characters not in BMP to \UXXXXXXXX.
> 
> I've just committed the attached change.

Thanks.  Please close the bug if satisfied with the solution.

Thread overview: 16+ messages
2016-06-21 12:22 bug#23814: 24.5; bug of hz coding-system ynyaaa
2016-06-21 12:58 ` Eli Zaretskii
2016-06-22 13:47 ` ynyaaa
2016-06-22 15:28   ` Eli Zaretskii
2016-06-22 17:04 ` ynyaaa
2016-06-22 17:26   ` Eli Zaretskii
2016-07-09 11:20     ` Eli Zaretskii
2016-07-13 14:12       ` handa
2016-07-23 17:47         ` Eli Zaretskii
2016-07-24  8:21 ` ynyaaa
2016-07-26 15:09   ` handa
2016-07-29  1:05 ` ynyaaa
2016-08-14 11:22   ` handa
2016-08-17  6:33 ` ynyaaa
2016-08-17 14:43   ` handa
2016-08-17 15:28     ` Eli Zaretskii
