emacs thinks UTF-8 can't encode Japanese text?

* emacs thinks UTF-8 can't encode Japanese text?
@ 2005-01-12  6:32 James Ralston
  0 siblings, 0 replies; 3+ messages in thread
From: James Ralston @ 2005-01-12  6:32 UTC


I'm trying to use Emacs 21.3 on Fedora Core 3 to edit files containing
Japanese text encoded with UTF-8.

I've used the same version of Emacs on Fedora Core 2 with no problems.
Everything just works.  My locale is the same on both systems:
en_US.UTF-8.

But on my FC3 system, if I visit a UTF-8 encoded file, the Japanese
characters display as empty boxes.  Also, if I paste Japanese text
into an Emacs window, and try to save the buffer, I receive this
message:

> These default coding systems were tried:
>   utf-8-unix
> However, none of them safely encodes the target text.

This message makes no sense, because UTF-8 encodes everything.

On my FC2 system, here's what "C-u C-x =" says:

>   character: い (0151044, 53796, 0xd224)
>     charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>  code point: 36 36
>      syntax: word
>    category: H:Japanese Hiragana characters of 2-byte character sets  
>              j:Japanese  
>              |:While filling, we can break a line at this character.  
> buffer code: 0x92 0xA4 0xA4
>   file code: 0xE3 0x81 0x84 (encoded by coding system utf-8-unix)
>        font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0

On my FC3 system, here's what "C-u C-x =" on the same character says:

>   character: い (0151044, 53796, 0xd224)
>     charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>  code point: 36 36
>      syntax: word
>    category: H:Japanese Hiragana characters of 2-byte character sets  
>              j:Japanese  
>              |:While filling, we can break a line at this character.  
> buffer code: 0x92 0xA4 0xA4
>   file code: not encodable by coding system utf-8-unix
>        font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0

The only difference is the "file code:" line.  But I don't understand
why Emacs 21.3 on FC3 doesn't think that UTF-8 encodes that character,
because it absolutely does.

The FC3 packager claims that he has no problems:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144707

Does anyone have any ideas?
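
The byte values in the two reports above can be cross-checked outside
Emacs.  The following Python sketch is not part of the original
message; it encodes い (U+3044) with Python's standard "utf-8" and
"euc_jp" codecs and compares the results with the "file code", "code
point", and "buffer code" lines.  Reading the leading 0x92 of the
buffer code as an Emacs-internal charset marker is an assumption, not
something stated in this thread.

```python
# Cross-check the bytes shown by "C-u C-x =" for the character い.
ch = "\u3044"  # HIRAGANA LETTER I

# UTF-8 encoding, to compare with the FC2 "file code:" line.
utf8 = ch.encode("utf-8")

# EUC-JP stores a JIS X 0208 code point with the high bit set on each byte.
euc = ch.encode("euc_jp")

# Clearing the high bit recovers the two-byte JIS X 0208 code point.
jis = bytes(b & 0x7F for b in euc)


def hexbytes(data: bytes) -> str:
    """Format bytes the way the Emacs report does, e.g. 'E3 81 84'."""
    return " ".join(f"{b:02X}" for b in data)


print("utf-8 bytes  :", hexbytes(utf8))  # E3 81 84
print("jisx0208 code:", list(jis))       # [36, 36]
print("euc-jp bytes :", hexbytes(euc))   # A4 A4
```

The UTF-8 bytes match the FC2 "file code" line, the JIS X 0208 values
match "code point: 36 36", and the EUC-JP bytes match the last two
bytes of the "buffer code" line.  That is consistent with the
character being encodable in UTF-8 and with the FC3 report being the
anomaly.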

* Re: emacs thinks UTF-8 can't encode Japanese text?
       [not found] <mailman.12573.1105513276.27204.help-gnu-emacs@gnu.org>
@ 2005-01-13  4:09 ` Edward Casey
  2005-01-13 22:56 ` James Ralston
  1 sibling, 0 replies; 3+ messages in thread
From: Edward Casey @ 2005-01-13  4:09 UTC



"James Ralston" <ralston@pobox.com> wrote in message
news:mailman.12573.1105513276.27204.help-gnu-emacs@gnu.org...
> I'm trying to use Emacs 21.3 on Fedora Core 3 to edit files containing
> Japanese text encoded with UTF-8.
>
> I've used the same version of Emacs on Fedora Core 2 with no problems.
> Everything just works.  My locale is the same on both systems:
> en_US.UTF-8.
>
> But on my FC3 system, if I visit a UTF-8 encoded file, the Japanese
> characters display as empty boxes.  Also, if I paste Japanese text
> into an Emacs window, and try to save the buffer, I receive this
> message:
>
> > These default coding systems were tried:
> >   utf-8-unix
> > However, none of them safely encodes the target text.
>
> This message makes no sense, because UTF-8 encodes everything.
>
> On my FC2 system, here's what "C-u C-x =" says:
>
> >   character: い (0151044, 53796, 0xd224)
> >     charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
> >  code point: 36 36
> >      syntax: word
> >    category: H:Japanese Hiragana characters of 2-byte character sets
> >              j:Japanese
> >              |:While filling, we can break a line at this character.
> > buffer code: 0x92 0xA4 0xA4
> >   file code: 0xE3 0x81 0x84 (encoded by coding system utf-8-unix)
> >        font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0
>
> On my FC3 system, here's what "C-u C-x =" on the same character says:
>
> >   character: い (0151044, 53796, 0xd224)
> >     charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
> >  code point: 36 36
> >      syntax: word
> >    category: H:Japanese Hiragana characters of 2-byte character sets
> >              j:Japanese
> >              |:While filling, we can break a line at this character.
> > buffer code: 0x92 0xA4 0xA4
> >   file code: not encodable by coding system utf-8-unix
> >        font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0
>
> The only difference is the "file code:" line.  But I don't understand
> why Emacs 21.3 on FC3 doesn't think that UTF-8 encodes that character,
> because it absolutely does.
>
> The FC3 packager claims that he has no problems:
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144707
>
> Does anyone have any ideas?

i. I don't have any intelligent ones, but this symptom is exactly the
one I was suffering from when I tried to display glyphs in the Latin
Extended-A range. Since Emacs is too slow running under Cygwin on a
Windows 98 machine, I am using the NT build of 21.3 (actually
i386-mingw.. win98). This needs not only fonts capable of displaying
the foreign language characters but also the code page files in
\windows\system named something like cp_nnnn.nls. I think it was Jason
Rumney who clued me in about this. If Linux uses a similar mechanism,
then you might have to set up something equivalent for Japanese, even
though your locale is US English UTF-8 and you have installed all of
the Red Hat distribution.

ii. Related (maybe) question:
After inserting characters in the Latin Extended-A range (code points
0100 - 017f), I am not able to move to them using C-s or M-%. Are the
search and replace functions supposed to work with all characters that
can be entered into the minibuffer?

Thanks,
Ed

* emacs thinks UTF-8 can't encode Japanese text?
       [not found] <mailman.12573.1105513276.27204.help-gnu-emacs@gnu.org>
  2005-01-13  4:09 ` emacs thinks UTF-8 can't encode Japanese text? Edward Casey
@ 2005-01-13 22:56 ` James Ralston
  1 sibling, 0 replies; 3+ messages in thread
From: James Ralston @ 2005-01-13 22:56 UTC


I posted the following to gnu.emacs.help:

On 2005-01-12 at 01:32-05, James Ralston wrote:

> I'm trying to use Emacs 21.3 on Fedora Core 3 to edit files
> containing Japanese text encoded with UTF-8.
> 
> I've used the same version of Emacs on Fedora Core 2 with no
> problems.  Everything just works.  My locale is the same on both
> systems: en_US.UTF-8.
> 
> But on my FC3 system, if I visit a UTF-8 encoded file, the Japanese
> characters display as empty boxes.  Also, if I paste Japanese text
> into an Emacs window, and try to save the buffer, I receive this
> message:
> 
>> These default coding systems were tried:
>>   utf-8-unix
>> However, none of them safely encodes the target text.
> 
> This message makes no sense, because UTF-8 encodes everything.
> 
> On my FC2 system, here's what "C-u C-x =" says:
> 
>>   character: い (0151044, 53796, 0xd224)
>>     charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>>  code point: 36 36
>>      syntax: word
>>    category: H:Japanese Hiragana characters of 2-byte character sets  
>>              j:Japanese  
>>              |:While filling, we can break a line at this character.  
>> buffer code: 0x92 0xA4 0xA4
>>   file code: 0xE3 0x81 0x84 (encoded by coding system utf-8-unix)
>>        font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0
> 
> On my FC3 system, here's what "C-u C-x =" on the same character says:
> 
>>   character: い (0151044, 53796, 0xd224)
>>     charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>>  code point: 36 36
>>      syntax: word
>>    category: H:Japanese Hiragana characters of 2-byte character sets  
>>              j:Japanese  
>>              |:While filling, we can break a line at this character.  
>> buffer code: 0x92 0xA4 0xA4
>>   file code: not encodable by coding system utf-8-unix
>>        font: -mplus-gothic-medium-R-normal--12-120-75-75-C-120-jisx0208.1990-0
> 
> The only difference is the "file code:" line.  But I don't
> understand why Emacs 21.3 on FC3 doesn't think that UTF-8 encodes
> that character, because it absolutely does.
> 
> The FC3 packager claims that he has no problems:
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=144707
> 
> Does anyone have any ideas?

The more I ponder this, the more I think that this is actually a bug
in Emacs that I've managed to trigger somehow.  Claiming that UTF-8
doesn't encode い is bogus.

I've even gone so far as to trace Emacs while I open a file that
contains Japanese characters, but I didn't detect any glaring
differences between Emacs on FC2 (which works) and Emacs on FC3 (which
doesn't work).

I'm just about out of ideas.  Does anyone else have any?

Thanks,
James
