unofficial mirror of help-gnu-emacs@gnu.org
* chinese encoded in UTF-8 and XML
@ 2003-09-25 20:05 Knackeback
  2003-09-25 20:32 ` Andreas Prilop
  2003-09-26  2:52 ` Micah Cowan
  0 siblings, 2 replies; 7+ messages in thread
From: Knackeback @ 2003-09-25 20:05 UTC (permalink / raw)


Hi, I wrote an XML file with GNU Emacs 21.2.2, with
Chinese character content encoded in UTF-8.
I wrote something like:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<chinese>撒</chinese>
<chinese>鰓</chinese>
</test>

and then I used "C-x RET f" and chose utf-8.
Then I typed "C-x C-s" to save my file.
I hope this is the right way in Emacs to store the content
as UTF-8 encoded text?!
Now I tried to parse the file with xmllint. xmllint is a
small xml-parser program which comes with libxml2. 
The parser complains that the second "chinese line" is not proper
UTF-8.

==>

uhu:4: error: Input is not proper UTF-8, indicate encoding !
<chinese>鰓</chinese>
         ^
uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F
<chinese>鰓</chinese>

It is interesting that the parser only complains about the second
Chinese line.

I'm anxious to see an explanation!


* Re: chinese encoded in UTF-8 and XML
  2003-09-25 20:05 chinese encoded in UTF-8 and XML Knackeback
@ 2003-09-25 20:32 ` Andreas Prilop
  2003-09-26  2:52 ` Micah Cowan
  1 sibling, 0 replies; 7+ messages in thread
From: Andreas Prilop @ 2003-09-25 20:32 UTC (permalink / raw)


Knackeback <knackeback@randspringer.de> wrote:

> Content-Type: text/plain; charset=big5
> 
> Hi, I wrote an XML file with GNU Emacs 21.2.2, with
> Chinese character content encoded in UTF-8.
> [...]
> I hope this is the right way in Emacs to store the content
> as UTF-8 encoded text?!

Probably not.

> uhu:4: error: Input is not proper UTF-8, indicate encoding !
> uhu:4: error: Bytes: 0xC4 0xCE 0x3C 0x2F

It seems your text was Big5-encoded, not UTF-8-encoded.
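
The diagnosis can be checked from the four bytes xmllint printed. The
following is a minimal Python sketch, not part of the original thread;
the helper name is_valid_utf8 is made up for illustration. It shows
that the bytes fail strict UTF-8 decoding but decode cleanly as Big5,
with the last two bytes being the "</" of the closing tag.

```python
# Bytes quoted in the xmllint error for line 4.
reported = bytes([0xC4, 0xCE, 0x3C, 0x2F])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xC4 opens a two-byte UTF-8 sequence, so the next byte must be a
# continuation byte (0x80-0xBF). 0xCE is not, so strict decoding fails.
utf8_ok = is_valid_utf8(reported)

# Read as Big5, 0xC4 0xCE is one double-byte character, followed by the
# ASCII bytes 0x3C 0x2F, i.e. the "</" of the closing tag.
as_big5 = reported.decode("big5")

print("valid UTF-8:", utf8_ok)
# ascii() keeps the output printable on terminals without CJK support.
print("decoded as Big5:", ascii(as_big5))
```

This only shows that the reported bytes are consistent with Big5; it
does not prove which coding system Emacs actually used when saving.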


* Re: chinese encoded in UTF-8 and XML
  2003-09-25 20:05 chinese encoded in UTF-8 and XML Knackeback
  2003-09-25 20:32 ` Andreas Prilop
@ 2003-09-26  2:52 ` Micah Cowan
  2003-09-26  4:58   ` Miles Bader
                     ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Micah Cowan @ 2003-09-26  2:52 UTC (permalink / raw)


Knackeback <knackeback@randspringer.de> writes:

> Hi, I wrote an XML file with GNU Emacs 21.2.2, with
> Chinese character content encoded in UTF-8.
> I wrote something like:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <test>
> <chinese>撒</chinese>
> <chinese>鰓</chinese>
> </test>
> 
> and then I used "C-x RET f" and chose utf-8.
> Then I typed "C-x C-s" to save my file.
> I hope this is the right way in Emacs to store the content
> as UTF-8 encoded text?!
> Now I tried to parse the file with xmllint. xmllint is a
> small xml-parser program which comes with libxml2. 
> The parser complains that the second "chinese line" is not proper
> UTF-8.
> 
> ==>

From what I can tell, Emacs doesn't have a Chinese input method that
supports Unicode output... :-(  ...I've had similar troubles with
Japanese. I've also noted that, e.g. for Greek, there are input
methods which explicitly support Unicode, and others which do
not.

-Micah


* Re: chinese encoded in UTF-8 and XML
  2003-09-26  2:52 ` Micah Cowan
@ 2003-09-26  4:58   ` Miles Bader
  2003-09-26 14:12     ` James H.Cloos Jr.
       [not found]   ` <mailman.736.1064552317.21628.help-gnu-emacs@gnu.org>
  2003-09-26 16:16   ` Stefan Monnier
  2 siblings, 1 reply; 7+ messages in thread
From: Miles Bader @ 2003-09-26  4:58 UTC (permalink / raw)


Micah Cowan <micah@cowan.name> writes:
> FWICT, Emacs doesn't have a chinese input method which supports
> unicode output... :-( ...I've had similar troubles with Japanese. I've
> also noted that, e.g. for greek, there are input methods which
> explicitly support unicode, and others which do not.

CVS Emacs supports Unicode Japanese fine if you do
`M-x utf-translate-cjk-mode'; not sure about Chinese.

-Miles
-- 
"Whatever you do will be insignificant, but it is very important that
 you do it."  Mahatma Gandhi


* Re: chinese encoded in UTF-8 and XML
       [not found]   ` <mailman.736.1064552317.21628.help-gnu-emacs@gnu.org>
@ 2003-09-26  6:39     ` Gernot Hassenpflug
  0 siblings, 0 replies; 7+ messages in thread
From: Gernot Hassenpflug @ 2003-09-26  6:39 UTC (permalink / raw)


Miles Bader <miles@lsi.nec.co.jp> writes:

> Micah Cowan <micah@cowan.name> writes:
>> FWICT, Emacs doesn't have a chinese input method which supports
>> unicode output... :-( ...I've had similar troubles with Japanese. I've
>> also noted that, e.g. for greek, there are input methods which
>> explicitly support unicode, and others which do not.
>
> CVS emacs supports unicode japanese fine if you do
> `M-x utf-translate-cjk-mode'; not sure about chinese.

Very interesting. So this function was not in 21.3, despite Mule being
absorbed into Emacs. Was coding in iso-2022-jp so prevalent that UTF
was not needed before this? I have been using Emacs 21.3
with Japanese encoding for years now. Is this issue part of the
argument between UTF supporters and those who disagree with the idea?

-- 
G Hassenpflug RASC, Kyoto University


* Re: chinese encoded in UTF-8 and XML
  2003-09-26  4:58   ` Miles Bader
@ 2003-09-26 14:12     ` James H.Cloos Jr.
  0 siblings, 0 replies; 7+ messages in thread
From: James H.Cloos Jr. @ 2003-09-26 14:12 UTC (permalink / raw)
  Cc: help-gnu-emacs

>>>>> "Miles" == Miles Bader <miles@lsi.nec.co.jp> writes:

Miles> CVS emacs supports unicode japanese fine if you do `M-x
Miles> utf-translate-cjk-mode'; not sure about chinese.

I just tested it and it worked flawlessly.

It *might* be better, though, for utf-translate-cjk-mode to add lang
tags so that it can be reversed; right now when the files are read
back in they default to Japanese text (at least with the mostly
default options I currently have).  If at least the runs of CJK text
were encased in lang tags, then a save/kill-buffer/open sequence could
end up with the same glyphs as the input method generated.

The use of Plane 14 tags should be transparent to anything that
doesn't know them....

-JimC


* Re: chinese encoded in UTF-8 and XML
  2003-09-26  2:52 ` Micah Cowan
  2003-09-26  4:58   ` Miles Bader
       [not found]   ` <mailman.736.1064552317.21628.help-gnu-emacs@gnu.org>
@ 2003-09-26 16:16   ` Stefan Monnier
  2 siblings, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2003-09-26 16:16 UTC (permalink / raw)


>> and then I used "C-x RET f" and chose utf-8.
>> Then I typed "C-x C-s" to save my file.
[...]
> FWICT, Emacs doesn't have a chinese input method which supports
> unicode output... :-(  ...I've had similar troubles with

But since he specified utf-8, Emacs should have complained rather than
silently using some other coding system.
Please report the bug with M-x report-emacs-bug.


        Stefan
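
Until Emacs complains on its own, a saved file can be checked for
strict UTF-8 before it is handed to a parser. The following Python
sketch is not from the thread and rests on assumptions: the function
name first_invalid_utf8 is made up, and the sample bytes are a
reconstruction in which the first character is stored as UTF-8 and the
second as the raw bytes 0xC4 0xCE from the xmllint report. What Emacs
really wrote for the first character is not known from the thread.

```python
def first_invalid_utf8(data: bytes):
    """Return (line, offset, next 4 bytes) of the first byte sequence
    that is not strict UTF-8, or None if data is clean.
    Hypothetical helper, for illustration only."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Line numbers are 1-based: count newlines before the bad byte.
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start, data[err.start:err.start + 4]
    return None


# ASSUMPTION: reconstructed file contents, not the poster's real file.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<test>\n"
    b"<chinese>" + "撒".encode("utf-8") + b"</chinese>\n"
    b"<chinese>\xc4\xce</chinese>\n"
    b"</test>\n"
)

result = first_invalid_utf8(sample)
if result is None:
    print("file is valid UTF-8")
else:
    line, offset, raw = result
    hex_bytes = " ".join(f"0x{b:02X}" for b in raw)
    print(f"line {line}, byte offset {offset}: {hex_bytes}")
```

For a real file, the bytes could be read with open(path, "rb").read()
and passed to the same function.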
