utf-8 cjk translation bug?

unofficial mirror of emacs-devel@gnu.org 

* utf-8 cjk translation bug?
@ 2003-09-30  8:30 Miles Bader
  2003-09-30  9:50 ` Jason Rumney
  2003-09-30 12:59 ` Kenichi Handa
  0 siblings, 2 replies; 18+ messages in thread
From: Miles Bader @ 2003-09-30  8:30 UTC (permalink / raw)


I have `utf-translate-cjk-mode' enabled.

I have the following string in a buffer:

        ＮＥＣエレクトロニクス(株)

If I write it using, say, the `euc-jp' coding system, no problem.  According
to `C-u C-x =', all the Japanese characters are in the charset
japanese-jisx0208.

However, if I save it using utf-8, I get no complaints, but when I read
it back in, the first 3 characters show up as little boxes.  `C-u C-x ='
shows the boxes as being in charset mule-unicode-e000-ffff; the rest of
the characters are still listed as being in japanese-jisx0208.

I presume this is representable in utf-8, because unicode is supposed to be
able to represent all characters in any component character set
simultaneously, so it would seem to be a bug in utf-translate-cjk-mode.

Any ideas?

Thanks,

-Miles
-- 
Is it true that nothing can be known?  If so how do we know this?  -Woody Allen

* Re: utf-8 cjk translation bug?
  2003-09-30  8:30 utf-8 cjk translation bug? Miles Bader
@ 2003-09-30  9:50 ` Jason Rumney
  2003-09-30 10:05   ` Miles Bader
  2003-09-30 12:59 ` Kenichi Handa
  1 sibling, 1 reply; 18+ messages in thread
From: Jason Rumney @ 2003-09-30  9:50 UTC (permalink / raw)
  Cc: emacs-devel

Miles Bader wrote:

> `C-u C-x =' shows the boxes as being in charset mule-unicode-e000-ffff; the rest of
> the characters are still listed as being in japanese-jisx0208.

I guess you would need a unicode font that includes double-width roman
characters. Or frob utf-translate-cjk-mode to ignore the fact that those
characters are within the representable range of unicode characters and
convert them to jisx0208 anyway.

* Re: utf-8 cjk translation bug?
  2003-09-30  9:50 ` Jason Rumney
@ 2003-09-30 10:05   ` Miles Bader
  0 siblings, 0 replies; 18+ messages in thread
From: Miles Bader @ 2003-09-30 10:05 UTC (permalink / raw)
  Cc: emacs-devel

Jason Rumney <jasonr@gnu.org> writes:
> > `C-u C-x =' shows the boxes as being in charset
> > mule-unicode-e000-ffff; the rest of the characters are still listed
> > as being in japanese-jisx0208.
> 
> I guess you would need a unicode font that includes double-width roman
> characters. Or frob utf-translate-cjk-mode to ignore the fact that those
> characters are within the representable range of unicode characters and
> convert them to jisx0208 anyway.

Why would that be necessary?  The purpose of utf-translate-cjk-mode is
to translate external unicode encodings to/from emacs charsets.  For
instance, after reading the utf-8 file, the katakana characters in the
example I gave are represented in emacs using japanese-jisx0208, and
displayed using a JISX0208.1983 encoded font.  I don't think unicode
fonts come into play at all.

-Miles
-- 
Yo mama's so fat when she gets on an elevator it HAS to go down.

* Re: utf-8 cjk translation bug?
  2003-09-30  8:30 utf-8 cjk translation bug? Miles Bader
  2003-09-30  9:50 ` Jason Rumney
@ 2003-09-30 12:59 ` Kenichi Handa
  2003-10-01 12:44   ` Dave Love
  1 sibling, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2003-09-30 12:59 UTC (permalink / raw)
  Cc: d.love, emacs-devel

In article <buo7k3q3cgu.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
> I have `utf-translate-cjk-mode' enabled.
> I have the following string in a buffer:

>         ＮＥＣエレクトロニクス(株)

> If I write it using say `euc-jp' coding system, no problem.  According
> to `C-u C-x =', all the japanese characters are in the charset
> japanese-jisx0208.

> However, if I save it using utf-8, I get no complaints, but when I read
> it back in, the first 3 characters show up as little boxes.  `C-u C-x ='
> shows the boxes as being in charset mule-unicode-e000-ffff; the rest of
> the characters are still listed as being in japanese-jisx0208.

> I presume this is representable utf-8, because unicode is supposed to be
> able to represent all characters in any component character set
> simultaneously, so it would seem to be a bug in utf-translate-cjk-mode.

The first three letters are "FULL WIDTH LATIN ?? LETTER"
(U+FF??).  Yes, they are representable in utf-8.  But, in
subst-jis.el, we have this code:

(mapc
 (lambda (pair)
   (let ((unicode (car pair))
	 (char (cadr pair)))
     ;; exclude non-CJK components from decode table
     (if (and (>= unicode #x2e80) (<= unicode #xd7a3))
	 (puthash unicode  char ucs-unicode-to-mule-cjk))
     (puthash char unicode ucs-mule-cjk-to-unicode)))

So, #xFF?? are excluded from ucs-unicode-to-mule-cjk, thus
they are not translated to japanese-jisx0208 on decoding.
If you have an ISO10646-1 font that contains full width
glyphs for those characters, you can see correct glyphs.
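
To make the asymmetry concrete, here is a minimal sketch of the same
range test applied to two sample pairs.  The function name
`sketch-build-cjk-tables' and the stand-in symbols used in place of
real Mule characters are hypothetical; the real tables in subst-jis.el
are much larger.  The encode table receives every pair, which would
explain why saving raises no complaint, while the decode table only
receives the pair inside #x2e80..#xd7a3.

```elisp
;; Sketch only: hypothetical names, not the real subst-jis.el data.
(defun sketch-build-cjk-tables (pairs)
  "Return (DECODE . ENCODE) hash tables built from PAIRS.
Each element of PAIRS is a list (UNICODE CHAR)."
  (let ((decode (make-hash-table))   ; role of ucs-unicode-to-mule-cjk
        (encode (make-hash-table)))  ; role of ucs-mule-cjk-to-unicode
    (mapc
     (lambda (pair)
       (let ((unicode (car pair))
             (char (cadr pair)))
         ;; Same range test as the quoted code: decode side is filtered.
         (if (and (>= unicode #x2e80) (<= unicode #xd7a3))
             (puthash unicode char decode))
         ;; Encode side is never filtered.
         (puthash char unicode encode)))
     pairs)
    (cons decode encode)))

(let ((tables (sketch-build-cjk-tables
               '((#xff2e jis-fullwidth-n)     ; U+FF2E, a fullwidth letter
                 (#x30a8 jis-katakana-e)))))  ; U+30A8, a katakana letter
  (list (gethash 'jis-fullwidth-n (cdr tables))  ; encoding knows the letter
        (gethash #xff2e (car tables))            ; decoding does not
        (gethash #x30a8 (car tables))))          ; katakana decodes fine
;; => (65326 nil jis-katakana-e)
```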

I think the reason why they are excluded from the
translation is that they are representable by the charset
mule-unicode-e000-ffff, thus there's no need of translation.
It seems to be a reasonable decision, but considering that
most users don't have an ISO10646-1 font containing those
glyphs, and that those characters can also be regarded as
CJK components (only CJK users use them), I think we had
better not exclude them from the translation.

So, I suggest changing the above line (and similar lines in
the other subst-XXX.el) to:

     (if (>= unicode #x2e80)
	 (puthash unicode  char ucs-unicode-to-mule-cjk))

and modify ccl-decode-mule-utf-8 to check translation also
for those characters.

Dave, what do you think?  Does such a change lead to any
problems?  Is there anything else we should change?

---
Ken'ichi HANDA
handa@m17n.org

* Re: utf-8 cjk translation bug?
  2003-09-30 12:59 ` Kenichi Handa
@ 2003-10-01 12:44   ` Dave Love
  2003-10-02  1:08     ` Kenichi Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Love @ 2003-10-01 12:44 UTC (permalink / raw)
  Cc: emacs-devel, miles

Kenichi Handa <handa@m17n.org> writes:

> So, #xFF?? are excluded from ucs-unicode-to-mule-cjk, thus
> they are not translated to japanese-jisx0208 on decoding.
> If you have a ISO10646-1 font that contains full width
> glyphs for those characters, you can see correct glyphs.

Or you can display them with a jisx font, for instance.

> I think the reason why they are excluded from the
> translation is that they are representable by the charset
> mule-unicode-e000-ffff, thus there's no need of translation.

That was part of the reason for it -- the hash-based translation code
is only relevant because we more-or-less used up the code space for
the BMP.  I also chose the boundaries to avoid breaking the region
between the mule-unicode and CJK charsets.

> It seems to be a reasonable decision, but considering that
> most users don't have an ISO10646-1 font containing those
> glyphs,

I thought they typically did if they had 10646 fonts at all.  Is the
problem that in recent XFree86, for instance, the double-width
characters are in different fonts which have `adstyl' `ja' or `ko'?
As far as I remember, the fontset code doesn't deal with that yet.
(So many special cases, sigh.)

> and that those characters can also be regarded as
> CJK components (only CJK users uses them), I think we had
> better not exclude them from the translation.

I'm not really convinced, but I don't feel strongly about it.  (If the
extra charsets hadn't been added before mule-unicode, we'd just have
covered the BMP with more mule-unicode ones.)

> So, I suggest changing the above line (and similar lines in
> the other subst-XXX.el) to:
>
>      (if (>= unicode #x2e80)
> 	 (puthash unicode  char ucs-unicode-to-mule-cjk))
>
> and modify ccl-decode-mule-utf-8 to check translation also
> for those characters.
>
> Dave, what do you think?  Does such a change leads to any
> problem?

As far as I remember, it includes too much, and you end up displaying
some characters double width that probably shouldn't be, but I don't
remember which.  How about including the ranges of the double-width
Western characters and the high CJK stuff explicitly?  I guess it
doesn't expand the tables greatly.

* Re: utf-8 cjk translation bug?
  2003-10-01 12:44   ` Dave Love
@ 2003-10-02  1:08     ` Kenichi Handa
  2003-10-03 16:04       ` Dave Love
  0 siblings, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2003-10-02  1:08 UTC (permalink / raw)
  Cc: emacs-devel, miles

In article <rzq65j9nn4j.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

>>  I think the reason why they are excluded from the
>>  translation is that they are representable by the charset
>>  mule-unicode-e000-ffff, thus there's no need of translation.

> That was part of the reason for it -- the hash-based translation code
> is only relevant because we more-or-less used up the code space for
> the BMP.  I also chose the boundaries to avoid breaking the region
> between the mule-unicode and CJK charsets.

Sorry, I don't understand the meaning of the last sentence.

>>  It seems to be a reasonable decision, but considering that
>>  most users don't have an ISO10646-1 font containing those
>>  glyphs,

> I thought they typically did if they had 10646 fonts at all.  Is the
> problem that in recent XFree86, for instance, the double-width
> characters are in different fonts which have `adstyl' `ja' or `ko'?

Ah, right, they have double-width glyphs for those chars.
But I think there are still many people who are not using
a recent XFree86, or who have not installed those fonts.

> As far as I remember, the fontset code doesn't deal with that yet.
> (So many special cases, sigh.)

Right.  So, even for XFree86 users, to utilize those fonts,
we need extra work.

>>  and that those characters can also be regarded as
>>  CJK components (only CJK users uses them), I think we had
>>  better not exclude them from the translation.

> I'm not really convinced, but I don't feel strongly about it.  (If the
> extra charsets hadn't been added before mule-unicode, we'd just have
> covered the BMP with more mule-unicode ones.)

And if I had known it would take this long to release the code
that contains the mule-unicode charsets, I'd have implemented a
single 3-dimensional charset that covers almost all Unicode
characters (Charset-ID 159 is not yet used).

>>  So, I suggest changing the above line (and similar lines in
>>  the other subst-XXX.el) to:
>> 
>>       (if (>= unicode #x2e80)
>>  	 (puthash unicode  char ucs-unicode-to-mule-cjk))
>> 
>>  and modify ccl-decode-mule-utf-8 to check translation also
>>  for those characters.
>> 
>>  Dave, what do you think?  Does such a change leads to any
>>  problem?

> As far as I remember, it includes too much, and you end up displaying
> some characters double width that probably shouldn't be, but I don't
> remember which.  How about including the ranges of the double-width
> Western characters and the high CJK stuff explicitly?  I guess it
> doesn't expand the tables greatly.

OK, I've just installed code that includes U+FF00..U+FFEF
in the decode tables.
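
The installed change itself is not shown in this thread.  As a rough
sketch, assuming it simply adds the one extra range to the existing
test, the condition might look like the predicate below.  The name
`sketch-cjk-decode-range-p' is hypothetical, and U+FB01 is only an
arbitrary sample above #xd7a3 that an open-ended
(>= unicode #x2e80) test would also accept.

```elisp
;; Sketch only: hypothetical predicate, not the code that was installed.
(defun sketch-cjk-decode-range-p (unicode)
  "Return non-nil if UNICODE lies in one of the explicitly listed ranges."
  (or (and (>= unicode #x2e80) (<= unicode #xd7a3))    ; original range
      (and (>= unicode #xff00) (<= unicode #xffef))))  ; added range

(mapcar #'sketch-cjk-decode-range-p '(#x30a8 #xff2e #xfb01))
;; => (t t nil)
```

Listing the ranges explicitly keeps everything between #xd7a3 and
#xff00 out of the decode table.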

Now, in utf-translate-cjk-mode:

(decode-coding-string
 (encode-coding-string "ＮＥＣエレクトロニクス(株)" 'utf-8)
 'utf-8)
=> "ＮＥＣエレクトロニクス(株)"
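
The same round trip can be wrapped in a small helper that also reports
the encoded length.  This is a sketch with a hypothetical function
name; it assumes a Unicode-based Emacs, where the fullwidth letters
U+FF2E, U+FF25 and U+FF23 each take three bytes in utf-8.  Under
utf-translate-cjk-mode, which charset comes back still depends on the
decode tables.

```elisp
;; Sketch only: hypothetical helper around the two calls shown above.
(defun sketch-utf-8-round-trip (string)
  "Encode STRING as utf-8, decode it again, and return (BYTES . SAME-P).
BYTES is the encoded length; SAME-P is non-nil if the text survived."
  (let* ((bytes (encode-coding-string string 'utf-8))
         (back (decode-coding-string bytes 'utf-8)))
    (cons (length bytes) (string= string back))))

;; The three fullwidth letters from the report, given by code point.
(sketch-utf-8-round-trip (string #xff2e #xff25 #xff23))
;; => (9 . t)
```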

---
Ken'ichi HANDA
handa@m17n.org

* Re: utf-8 cjk translation bug?
  2003-10-02  1:08     ` Kenichi Handa
@ 2003-10-03 16:04       ` Dave Love
  2003-10-03 16:34         ` Jason Rumney
  2003-10-06 23:53         ` Kenichi Handa
  0 siblings, 2 replies; 18+ messages in thread
From: Dave Love @ 2003-10-03 16:04 UTC (permalink / raw)
  Cc: emacs-devel, miles

Kenichi Handa <handa@m17n.org> writes:

>> I also chose the boundaries to avoid breaking the region
>> between the mule-unicode and CJK charsets.
>
> Sorry, I don't understand the meaning of the last sentence.

mule-unicode-2500-33ff overlaps with one of the CJK blocks.  You want
to avoid translating the part that overlaps to mule-unicode-2500-33ff
so that the block is displayed in a consistent font by default.  Is
that clear?

> Ah, right, they have double-width glyphs for those chars.
> But, I think there are still many those who are not using
> the recent XFree86, or who have not installed those fonts.

I would have expected them to have iso10646 fonts if they are using
utf-8 (for the sake of applications other than Emacs) but maybe that
isn't the case.  You are obviously in a better position than I am to
decide the right thing.

> And if I knew it took that long time to release the code
> that contains mule-unicode charsets, I'd implemented a
> single 3-dimensional charset that covers almost all Unicode
> characters (Charset-ID 159 is not yet used).

I may have the remains of the partial implementation somewhere.  It
almost looks attractive again, as I guess there is no likelihood of
Emacs 22 being released remotely soon...

* Re: utf-8 cjk translation bug?
  2003-10-03 16:04       ` Dave Love
@ 2003-10-03 16:34         ` Jason Rumney
  2003-10-06  2:29           ` Miles Bader
  2003-10-07 11:40           ` Dave Love
  2003-10-06 23:53         ` Kenichi Handa
  1 sibling, 2 replies; 18+ messages in thread
From: Jason Rumney @ 2003-10-03 16:34 UTC (permalink / raw)
  Cc: emacs-devel

Dave Love wrote:

> I would have expected them to have iso10646 fonts if they are using
> utf-8 (for the sake of applications other than Emacs) but maybe that
> isn't the case.

I think the problem is not that they don't have iso10646 fonts, it is
that the iso10646 fonts they do have do not contain any of the double
width characters, including double width roman that is in the 2500-33ff
range. Until Emacs gets a function to query which glyphs a font has (I
see such a function in emacs-unicode-2), then it is safer to use
localized fonts where possible instead of iso10646.

* Re: utf-8 cjk translation bug?
  2003-10-03 16:34         ` Jason Rumney
@ 2003-10-06  2:29           ` Miles Bader
  2003-10-06 20:00             ` Miles Bader
  2003-10-07 11:41             ` Dave Love
  2003-10-07 11:40           ` Dave Love
  1 sibling, 2 replies; 18+ messages in thread
From: Miles Bader @ 2003-10-06  2:29 UTC (permalink / raw)
  Cc: Dave Love, emacs-devel

Jason Rumney <jasonr@gnu.org> writes:
> > I would have expected them to have iso10646 fonts if they are using
> > utf-8 (for the sake of applications other than Emacs) but maybe that
> > isn't the case.
> 
> I think the problem is not that they don't have iso10646 fonts, it is
> that the iso10646 fonts they do have do not contain any of the double
> width characters, including double width roman that is in the
> 2500-33ff range.

Yeah, that's definitely the case, and it's not just a problem with
double-width characters -- the coverage of many iso10646 fonts seems
completely crap.

E.g., see a post by `Danilo Segan' on this list.  It apparently contains
cyrillic characters encoded in UTF-8, which emacs dutifully tries to
render using an iso10646 font, but show up as square boxes on my
system...

Here's the output of `C-u C-x =', in case anyone is interested:

     character: с (01212141, 332897, 0x51461, U+0441)
       charset: mule-unicode-0100-24ff
                (Unicode characters of the range U+0100..U+24FF.)
    code point: 40 97
        syntax: w 	which means: word
      category: y:Cyrillic  
   buffer code: 0x9C 0xF4 0xA8 0xE1
     file code: 0x9C 0xF4 0xA8 0xE1 (encoded by coding system raw-text-unix)
       display: by this font (glyph code)
        -bitstream-bitstream vera sans mono-medium-r-normal--16-122-95-95-c-100-iso10646-1 (0x441)
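
The `code point: 40 97' line can be reproduced by hand, assuming
mule-unicode-0100-24ff is laid out as a 96x96 charset starting at
U+0100 with position codes offset by 32.  That layout is an assumption
here, not something stated in this thread, and the helper name is
hypothetical.  Adding 128 to each code gives 0xA8 0xE1, the last two
bytes of the buffer code shown above.

```elisp
;; Sketch only: assumes a 96x96 layout with position codes from 32.
(defun sketch-mule-unicode-code-point (unicode base)
  "Split UNICODE into two position codes of a 96x96 charset from BASE."
  (let ((offset (- unicode base)))
    (list (+ 32 (/ offset 96))     ; first position code (row)
          (+ 32 (% offset 96)))))  ; second position code (column)

(sketch-mule-unicode-code-point #x441 #x100)
;; => (40 97)
```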

-Miles
-- 
`Suppose Korea goes to the World Cup final against Japan and wins,' Moon said.
`All the past could be forgiven.'   [NYT]

* Re: utf-8 cjk translation bug?
  2003-10-06  2:29           ` Miles Bader
@ 2003-10-06 20:00             ` Miles Bader
  2003-10-06 20:53               ` Jason Rumney
                                 ` (2 more replies)
  2003-10-07 11:41             ` Dave Love
  1 sibling, 3 replies; 18+ messages in thread
From: Miles Bader @ 2003-10-06 20:00 UTC (permalink / raw)


Miles Bader <miles@lsi.nec.co.jp> writes:
> Yeah, that's definitely the case, and it's not just a problem with
> double-width characters -- the coverage of many iso10646 fonts seems
> completely crap.

BTW, does this mean that the new unicode emacs will have problems
rendering many charsets that are currently displayed properly by emacs?

-Miles
-- 
We live, as we dream -- alone....

* Re: utf-8 cjk translation bug?
  2003-10-06 20:00             ` Miles Bader
@ 2003-10-06 20:53               ` Jason Rumney
  2003-10-06 23:18               ` Kenichi Handa
  2003-10-07  9:57               ` Stephen J. Turnbull
  2 siblings, 0 replies; 18+ messages in thread
From: Jason Rumney @ 2003-10-06 20:53 UTC (permalink / raw)
  Cc: emacs-devel

Miles Bader <miles@gnu.org> writes:

> Miles Bader <miles@lsi.nec.co.jp> writes:
> > Yeah, that's definitely the case, and it's not just a problem with
> > double-width characters -- the coverage of many iso10646 fonts seems
> > completely crap.
> 
> BTW, does this mean that the new unicode emacs will have problems
> rendering many charsets that are currently displayed properly by emacs?

I'm sure Handa-san can confirm for sure, but unicode Emacs has a
new function x_get_font_repertory which seems to deal with this
situation.

* Re: utf-8 cjk translation bug?
  2003-10-06 20:00             ` Miles Bader
  2003-10-06 20:53               ` Jason Rumney
@ 2003-10-06 23:18               ` Kenichi Handa
  2003-10-07  9:57               ` Stephen J. Turnbull
  2 siblings, 0 replies; 18+ messages in thread
From: Kenichi Handa @ 2003-10-06 23:18 UTC (permalink / raw)
  Cc: emacs-devel

In article <87u16mf867.fsf@tc-1-100.kawasaki.gol.ne.jp>, Miles Bader <miles@gnu.org> writes:

> Miles Bader <miles@lsi.nec.co.jp> writes:
>>  Yeah, that's definitely the case, and it's not just a problem with
>>  double-width characters -- the coverage of many iso10646 fonts seems
>>  completely crap.

> BTW, does this mean that the new unicode emacs will have problems
> rendering many charsets that are currently displayed properly by emacs?

No.  In emacs-unicode, we can assign multiple fonts for each
script, charset, or a range of character codes, and Emacs
selects one that has a requested glyph and has the highest
priority depending on the current language environment.

---
Ken'ichi HANDA
handa@m17n.org

* Re: utf-8 cjk translation bug?
  2003-10-03 16:04       ` Dave Love
  2003-10-03 16:34         ` Jason Rumney
@ 2003-10-06 23:53         ` Kenichi Handa
  2003-10-10 16:56           ` Dave Love
  1 sibling, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2003-10-06 23:53 UTC (permalink / raw)
  Cc: emacs-devel, miles

In article <rzqzngigvdy.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>>  I also chose the boundaries to avoid breaking the region
>>>  between the mule-unicode and CJK charsets.
>> 
>>  Sorry, I don't understand the meaning of the last sentence.

> mule-unicode-2500-33ff overlaps with one of the CJK blocks.  You want
> to avoid translating the part that overlaps to mule-unicode-2500-33ff
> so that the block is displayed in a consistent font by default.  Is
> that clear?

Yes.  I see.

>>  Ah, right, they have double-width glyphs for those chars.
>>  But, I think there are still many those who are not using
>>  the recent XFree86, or who have not installed those fonts.

> I would have expected them to have iso10646 fonts if they are using
> utf-8 (for the sake of applications other than Emacs) but maybe that
> isn't the case.  You are obviously in a better position than I am to
> decide the right thing.

"using utf-8" means many things.  If they are using a utf-8
locale, I think they surely have those fonts.  But, as far
as I know, ja_JP.UTF-8 is still not that popular in Japan.
And, even in the ja_JP.eucJP locale, people occasionally have to
use utf-8 files for many reasons.

---
Ken'ichi HANDA
handa@m17n.org

* Re: utf-8 cjk translation bug?
  2003-10-06 20:00             ` Miles Bader
  2003-10-06 20:53               ` Jason Rumney
  2003-10-06 23:18               ` Kenichi Handa
@ 2003-10-07  9:57               ` Stephen J. Turnbull
  2 siblings, 0 replies; 18+ messages in thread
From: Stephen J. Turnbull @ 2003-10-07  9:57 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Miles" == Miles Bader <miles@gnu.org> writes:

    Miles> Miles Bader <miles@lsi.nec.co.jp> writes:

    >> Yeah, that's definitely the case, and it's not just a problem
    >> with double-width characters -- the coverage of many iso10646
    >> fonts seems completely crap.

Microsoft Arial, for one of the egregious worst.  :-)  Why should a
Russian font designer be good at designing Sanskrit glyphs?  Who knows
which Thai fonts go well with a given Arabic one?  ISO 10646 fonts
should cover what their designers are good at, no more---and not
necessarily no less.

It's not even obvious that there should be terminal fonts with
universal coverage.  Some Unihan users will surely object to any
given choice of glyphs, for example.

    Miles> BTW, does this mean that the new unicode emacs will have
    Miles> problems rendering many charsets that are currently
    Miles> displayed properly by emacs?

In practice, probably yes ... but that will be a bug easily fixed, and
long before release.  :-)  Post-modern Emacsen will have to have
support for efficiently querying font repertoire (I see emacs-unicode
already has a defined API).  For example, you'll get this for free
with Xft2/fontconfig.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

* Re: utf-8 cjk translation bug?
  2003-10-03 16:34         ` Jason Rumney
  2003-10-06  2:29           ` Miles Bader
@ 2003-10-07 11:40           ` Dave Love
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Love @ 2003-10-07 11:40 UTC (permalink / raw)
  Cc: emacs-devel

Jason Rumney <jasonr@gnu.org> writes:

> I think the problem is not that they don't have iso10646 fonts, it is
> that the iso10646 fonts they do have do not contain any of the double
> width characters,

I meant one with the appropriate repertoire.

> including double width roman that is in the 2500-33ff
> range. Until Emacs gets a function to query which glyphs a font has (I
> see such a function in emacs-unicode-2), then it is safer to use
> localized fonts where possible instead of iso10646.

See TODO, I think.  However, checking the large number of
tiny-repertoire fonts that are randomly encoded as iso10646 that you
might have under XFree86, for instance, seems like bad news.  You at
least have to load them to extract information.

[The issue of which fonts you use is actually orthogonal to which
charsets you decode into, as long as you have the translation tables
available.  The CCL for the font encoding may not exist currently in
cases where it would be useful, but it is trivial to provide.  See
examples which update `font-ccl-encoder-alist'.]

* Re: utf-8 cjk translation bug?
  2003-10-06  2:29           ` Miles Bader
  2003-10-06 20:00             ` Miles Bader
@ 2003-10-07 11:41             ` Dave Love
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Love @ 2003-10-07 11:41 UTC (permalink / raw)
  Cc: emacs-devel, Jason Rumney

Miles Bader <miles@lsi.nec.co.jp> writes:

> Yeah, that's definitely the case, and it's not just a problem with
> double-width characters -- the coverage of many iso10646 fonts seems
> completely crap.

You must have missed my rants on the topic (and Unicode fundamentalism
in general).  Emacs 22 tries to work around the problems with such
fonts.  Presumably the proper solution would be to have low-volume
meta information available about the repertoires.

* Re: utf-8 cjk translation bug?
  2003-10-06 23:53         ` Kenichi Handa
@ 2003-10-10 16:56           ` Dave Love
  2003-10-13 23:55             ` Kenichi Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Love @ 2003-10-10 16:56 UTC (permalink / raw)
  Cc: emacs-devel, miles

Kenichi Handa <handa@m17n.org> writes:

> "using utf-8" means many things.  If they are using utf-8
> locale, I think they surely have those fonts.  But, as far
> as I know, ja_JP.UTF-8 is still not that popular in Japan.
> And, even in ja_JP.eucJP locale, people occasionally have to
> use utf-8 file for many reasons.

Of course, but I assume they don't use Emacs in isolation and they
probably would display such files in other ways sometimes.  Since we
have to make the best of things which we probably can't get obviously
right in all circumstances, I want to understand the context and I
don't mean to argue with what you think is appropriate.

* Re: utf-8 cjk translation bug?
  2003-10-10 16:56           ` Dave Love
@ 2003-10-13 23:55             ` Kenichi Handa
  0 siblings, 0 replies; 18+ messages in thread
From: Kenichi Handa @ 2003-10-13 23:55 UTC (permalink / raw)
  Cc: miles, emacs-devel

In article <rzqzng9vxou.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  "using utf-8" means many things.  If they are using utf-8
>>  locale, I think they surely have those fonts.  But, as far
>>  as I know, ja_JP.UTF-8 is still not that popular in Japan.
>>  And, even in ja_JP.eucJP locale, people occasionally have to
>>  use utf-8 file for many reasons.

> Of course, but I assume they don't use Emacs in isolation and they
> probably would display such files in other ways sometimes.

Sometimes, Emacs is the only way (or, at least, the only way
a user knows) to see files that are encoded in a way that the
user's locale doesn't support.

> Since we have to make the best of things which we probably
> can't get obviously right in all circumstances, I want to
> understand the context and I don't mean to argue with what
> you think is appropriate.

Yes, I understand your intention.

---
Ken'ichi HANDA
handa@m17n.org
