Hi, sorry for the late response.  I've just noticed that my reply
mail didn't go out successfully, so I'm re-sending it.

I wrote:

> In article <871t2dz22d.fsf@gmail.com>, ynyaaa@gmail.com writes:
>
> > If there are unencodable characters, encodable characters may be broken.
> > In this example, the second ?\x4E00 character disappears.
>
> > (set-language-environment 'Chinese-GB)
> > (decode-coding-string (encode-coding-string "\x4E00\x00B7\x4E00" 'hz) 'hz)
> >>> "\x4E00\e\x3048\x6070\x70B3\x11213D\300\273"
>
> How to treat unencodable characters when encoding is a difficult problem.
> As HZ is designed for 7-bit environments, I think it's important to keep
> the output 7-bit when encoding.  So, the new code uses \uXXXX for those
> characters.  Another way is to use UTF-8 sequences for them; then we can
> decode them back.  Which do you think is better?
>
> > To avoid this behavior, there are some solutions.
> > (a) While decoding, replace "~{...~}" with "\e$A...\e(B"
> >     and decode with iso-2022-7bit.
> > (b) Like (a), replace "~{...~}" with "\e$A...\e(B" while decoding
> >     and insert "\e$)A" at the beginning of the temp buffer
> >     and decode with iso-2022-8bit-ss2.
> >     (8bit data are decoded as euc-cn.)
> > (c) While encoding, use euc-cn instead of iso-2022-7bit
> >     and translate each consecutive 8bit data to 7bit data
> >     prefixed by "~{" and postfixed by "~}".
>
> I adopted method (a) for decoding, and fixed bugs in the encoding code.
>
> > By the way, RFC1843 describes:
> >   The escape sequence '~\n' is a line-continuation marker to be
> >   consumed with no output produced.
>
> The variable decode-hz-line-continuation controls this feature.  I don't
> remember why the default is nil (i.e. do not decode ~\n); perhaps some of
> the Chinese people I discussed the HZ implementation with suggested that.
>
> Attached is the full china-util.el (not a diff).

---
K. Handa
handa@gnu.org
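
For illustration, the escape rewriting in decoding method (a) quoted above can be sketched roughly as follows. This is not the code in china-util.el: `my-hz-to-iso-2022` is a hypothetical helper name, the sketch assumes well-formed 7-bit HZ input, and it does not handle the ~\n line continuation.

```elisp
;; Sketch only: `my-hz-to-iso-2022' is a hypothetical helper,
;; not a function from china-util.el.
(defun my-hz-to-iso-2022 (string)
  "Return STRING with HZ escapes rewritten as ISO-2022 escapes.
Assumes STRING is well-formed 7-bit HZ text."
  (let ((i 0) (len (length string)) (gb nil) (out '()))
    (while (< i len)
      (let ((c (aref string i))
            (next (and (< (1+ i) len) (aref string (1+ i)))))
        (cond
         ;; In GB mode, "~}" switches back to ASCII.
         ((and gb (eq c ?~) (eq next ?}))
          (push "\e(B" out)
          (setq gb nil i (+ i 2)))
         ;; In GB mode, anything else is a two-byte character;
         ;; copy both bytes so a second byte "~" is not misread.
         (gb
          (push (substring string i (min len (+ i 2))) out)
          (setq i (+ i 2)))
         ;; In ASCII mode, "~{" designates GB2312.
         ((and (eq c ?~) (eq next ?{))
          (push "\e$A" out)
          (setq gb t i (+ i 2)))
         ;; In ASCII mode, "~~" is a literal tilde.
         ((and (eq c ?~) (eq next ?~))
          (push "~" out)
          (setq i (+ i 2)))
         ;; Plain ASCII is copied through unchanged.
         (t
          (push (string c) out)
          (setq i (1+ i))))))
    (apply #'concat (nreverse out))))

;; The rewritten text can then be passed to the existing decoder:
;; (decode-coding-string (my-hz-to-iso-2022 "a~{R;~}b") 'iso-2022-7bit)
```

Scanning two bytes at a time while in GB mode is a design choice of this sketch: a plain search-and-replace for "~}" could match across the boundary between two GB characters, because "~" is a valid second byte.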