Re: Possible UTF-8 CJK Regressions in Terminal Emulators


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
       [not found]           ` <200403010815.RAA14365@etlken.m17n.org>
@ 2004-03-18 15:34             ` Dave Love
  2004-04-07 12:30               ` Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Love @ 2004-03-18 15:34 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

[I don't know what this has to do with the subject.]

Kenichi Handa <handa@m17n.org> writes:

> In article <buoeksjyee3.fsf@mcspd15.ucom.lsi.nec.co.jp>, Miles Bader <miles@lsi.nec.co.jp> writes:
>
>> Alexander Winston <alexander.winston@comcast.net> writes:
>>>  Okay, back to UTF-8. With regard to CJK being disabled by default, I
>>>  believe that this decision is rather prejudicial to many Asian users.

I don't think so.  There's no reason why you shouldn't define a
language environment corresponding to ja_JP.UTF-8 which turned it on
(not that language environments are the right approach to locale
handling).

Anyway, there seems to be very little interest from users; I don't
recall any of the contributions I expected.  Mostly I've just had
unhelpful remarks from non-CJK users.

>> I've been told that the reason `utf-translate-cjk-mode' is disabled by
>> default is that it consumes some non-trivial amount of memory (and
>> loading time,

Yes.  If I remember correctly, I posted measurements.

>> unless it's dumped I guess).

It doesn't make sense to dump it, the way it works.

> As we have post-read-conversion function for utf-8, it is
> possible to detect untranslated CJK characters and translate
> them.
>
> How about this?
>
> Change utf-translate-cjk-mode to a customizable variable
> utf-translate-cjk which is nil, t, or auto (default).  The
> values nil and t mean the same thing as the current value of
> utf-translate-cjk-mode.  The value `auto' means setting up
> tables for translating CJK characters automatically if
> necessary.
>
> By adding pre-write-conversion function, we can make the
> above work also on writing.  But, in that case, it seems
> difficult to make find-coding-systems-region/string work
> consistently.  To check if a text is encodable by utf-8, we
> must load translation tables.

As far as I remember, that's why I didn't implement that sort of
thing.  post-read-conversion machinery is already there, I think.

[Is this code base ever going to be released so that most users
actually can use it?]


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-03-18 15:34             ` Possible UTF-8 CJK Regressions in Terminal Emulators Dave Love
@ 2004-04-07 12:30               ` Kenichi Handa
  2004-04-08 11:27                 ` Dave Love
  0 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2004-04-07 12:30 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

In article <rzqvfl2yxsg.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
>>  Change utf-translate-cjk-mode to a customizable variable
>>  utf-translate-cjk which is nil, t, or auto (default).  The
>>  values nil and t mean the same thing as the current value of
>>  utf-translate-cjk-mode.  The value `auto' means setting up
>>  tables for translating CJK characters automatically if
>>  necessary.
>> 
>>  By adding pre-write-conversion function, we can make the
>>  above work also on writing.  But, in that case, it seems
>>  difficult to make find-coding-systems-region/string work
>>  consistently.  To check if a text is encodable by utf-8, we
>>  must load translation tables.

> As far as I remember, that's why I didn't implement that sort of
> thing.

Wait!  If utf-translate-cjk-mode can encode all jis, ksc,
big5, and gb to utf-8, we can tell that they can be encoded
by utf-8 without loading tables.  All we have to do is
include those charsets in `safe-charsets' when defining
utf-8.
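
A minimal sketch of that idea, in Python rather than Emacs Lisp: if a
charset is declared safe up front, the encodability check becomes a
membership test on charset names and no translation table has to be
consulted.  The charset names, `SAFE_CHARSETS`, and `is_encodable` are
illustrative assumptions, not Emacs's actual data or API.

```python
# Hypothetical set of charsets declared safe for a utf-8-like coding system.
SAFE_CHARSETS = frozenset({
    "ascii",
    "japanese-jisx0208",
    "korean-ksc5601",
    "chinese-big5-1",
    "chinese-gb2312",
})


def is_encodable(chars, safe_charsets=SAFE_CHARSETS):
    """Return True if every (charset, code) pair uses a charset declared safe.

    Only the charset name is inspected, so no mapping table is loaded.
    """
    return all(charset in safe_charsets for charset, _code in chars)
```

Because the check never looks at the code within a charset, an undefined
code point in a safe charset is still reported as encodable.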

> post-read-conversion machinery is already there, I think.

Yes, utf-8 already has utf-8-post-read-conversion which
composes unencoded raw-bytes into Unicode U+FFFD.

> [Is this code base ever going to be released so that most users
> actually can use it?]

I'd like to ask that too.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-04-07 12:30               ` Kenichi Handa
@ 2004-04-08 11:27                 ` Dave Love
  2004-04-09 11:28                   ` Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Love @ 2004-04-08 11:27 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

Kenichi Handa <handa@m17n.org> writes:

> Wait!  If utf-translate-cjk-mode can encode all jis, ksc,
> big5, and gb to utf-8, 

I don't think that's true (or I think it wasn't when I built the
tables).  Maybe that's not so (now).  Also, the tables are
customizable by design -- for instance, I anticipated people adding
characters from CNS.


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-04-08 11:27                 ` Dave Love
@ 2004-04-09 11:28                   ` Kenichi Handa
  2004-06-07 12:27                     ` Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2004-04-09 11:28 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

In article <rzqu0zu1zh4.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  Wait!  If utf-translate-cjk-mode can encode all jis, ksc,
>>  big5, and gb to utf-8, 

> I don't think that's true (or I think it wasn't when I built the
> tables).  Maybe that's not so (now).  Also, the tables are
> customizable by design -- for instance, I anticipated people adding
> characters from CNS.

I've just checked all subst-*.el.  They all contain full
maps, i.e. all defined characters can be encoded into utf-8.
Of course, a character not defined in the standard (e.g. a
character made by (make-char 'japanese-jisx0208 37 126))
can't be encoded, but I think the benefit of ignoring such
characters outweighs that of correctly reporting that they
can't be encoded into utf-8.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-04-09 11:28                   ` Kenichi Handa
@ 2004-06-07 12:27                     ` Kenichi Handa
  2004-06-07 12:36                       ` Miles Bader
                                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Kenichi Handa @ 2004-06-07 12:27 UTC (permalink / raw)

While fixing a bug in utf-8-post-read-conversion (it could
modify text outside the given range), I remembered this
discussion and did some work.

In article <200404091128.UAA02120@etlken.m17n.org>, Kenichi Handa <handa@m17n.org> writes:

> In article <rzqu0zu1zh4.fsf@albion.dl.ac.uk>, Dave Love
> <d.love@dl.ac.uk> writes:
>>  Kenichi Handa <handa@m17n.org> writes:
>>>   Wait!  If utf-translate-cjk-mode can encode all jis,
>>> ksc, big5, and gb to utf-8,

>>  I don't think that's true (or I think it wasn't when I
>> built the tables).  Maybe that's not so (now).  Also, the
>> tables are customizable by design -- for instance, I
>> anticipated people adding characters from CNS.

> I've just checked all subst-*.el.  They all contain full
> maps, i.e. all defined characters can be encoded into
> utf-8.  Of course, a character not defined in the
> standard (e.g. a character made by (make-char
> 'japanese-jisx0208 37 126)) can't be encoded, but I think
> the benefit of ignoring such characters outweighs that of
> correctly reporting that they can't be encoded into utf-8.

I think I have succeeded in loading subst-*.el not when
utf-translate-cjk-mode is customized to t, but only when
loading them turns out to be necessary while decoding or
encoding utf-8, or while running decode/encode-char.  This
means that we can make the default value of
utf-translate-cjk-mode t without loading subst-*.el at
build time.
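
The on-demand loading described above can be sketched as follows, in
Python rather than Emacs Lisp.  This is an illustration under stated
assumptions, not the installed code: `LazyTable`, `load_cjk_table`, and
`translate` are hypothetical names, the table entry is a placeholder,
and the trigger condition (a code point in the CJK Unified Ideographs
block, U+4E00..U+9FFF) is a simplification.

```python
class LazyTable:
    """Defer building a large lookup table until the first real lookup."""

    def __init__(self, loader):
        self._loader = loader
        self._table = None
        self.load_count = 0  # how many times the loader actually ran

    def lookup(self, key):
        if self._table is None:
            # First lookup that needs the table: pay the loading cost now.
            self._table = self._loader()
            self.load_count += 1
        return self._table.get(key)


def load_cjk_table():
    # Stand-in for reading a large substitution file; the entry is a placeholder.
    return {0x4E00: "translated-U+4E00"}


def translate(code_point, table):
    # Assumed trigger: only the CJK Unified Ideographs block needs the table.
    if 0x4E00 <= code_point <= 0x9FFF:
        return table.lookup(code_point)
    return code_point
```

With this shape, text that contains no CJK code points never triggers
the load, while the first CJK code point pays the cost exactly once.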

I think it's a big improvement, especially for CJK users,
and it improves an existing feature rather than adding a
new one.  If people agree on making utf-translate-cjk-mode
default to t, I'll polish the current working code and
install the changes.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-07 12:27                     ` Kenichi Handa
@ 2004-06-07 12:36                       ` Miles Bader
  2004-06-07 13:00                         ` Kenichi Handa
  2004-06-08 17:56                       ` Dave Love
  2004-06-12  2:41                       ` Kenichi Handa
  2 siblings, 1 reply; 20+ messages in thread
From: Miles Bader @ 2004-06-07 12:36 UTC (permalink / raw)
  Cc: mariano, alexander.winston, d.love, emacs-devel, danilo, monnier,
	miles

On Mon, Jun 07, 2004 at 09:27:36PM +0900, Kenichi Handa wrote:
> I think it's a big improvement, especially for CJK users,
> and it improves an existing feature rather than adding a
> new one.  If people agree on making utf-translate-cjk-mode
> default to t, I'll polish the current working code and
> install the changes.

Absolutely!  Then we can say "utf-8 is (almost) completely supported"...
I think this is a very important thing.

-Miles
-- 
"Though they may have different meanings, the cries of 'Yeeeee-haw!' and
 'Allahu akbar!' are, in spirit, not actually all that different."


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-07 12:36                       ` Miles Bader
@ 2004-06-07 13:00                         ` Kenichi Handa
  2004-06-08 18:02                           ` Dave Love
  0 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2004-06-07 13:00 UTC (permalink / raw)
  Cc: mariano, alexander.winston, d.love, emacs-devel, danilo, monnier,
	miles

In article <20040607123615.GA29450@fencepost>, Miles Bader <miles@gnu.org> writes:

> On Mon, Jun 07, 2004 at 09:27:36PM +0900, Kenichi Handa
> wrote:
>>  I think it's a big improvement, especially for CJK users,
>> and it improves an existing feature rather than adding a
>> new one.  If people agree on making utf-translate-cjk-mode
>> default to t, I'll polish the current working code and
>> install the changes.

> Absolutely!  Then we can say "utf-8 is (almost) completely
> supported"...  I think this is a very important thing.

I think "completely" is still too strong even with the preceding
"(almost)".  Perhaps "utf-8 support is fairly good" or
"Unicode BMP support is fairly good".

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-07 12:27                     ` Kenichi Handa
  2004-06-07 12:36                       ` Miles Bader
@ 2004-06-08 17:56                       ` Dave Love
  2004-06-09  7:24                         ` Kenichi Handa
  2004-06-12  2:41                       ` Kenichi Handa
  2 siblings, 1 reply; 20+ messages in thread
From: Dave Love @ 2004-06-08 17:56 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

Kenichi Handa <handa@m17n.org> writes:

> I think I have succeeded in loading subst-*.el not when
> utf-translate-cjk-mode is customized to t, but only when
> loading them turns out to be necessary while decoding or
> encoding utf-8, or while running decode/encode-char.  This
> means that we can make the default value of
> utf-translate-cjk-mode t without loading subst-*.el at
> build time.

It doesn't fix the potential effects on non-CJK users if decoding a
bit of Unicode text containing such a character will load the large
tables even if they're useless to the user.  Maybe there aren't many
people now with 48MB P133s or old SPARCs like me, in which case it's a
reasonable default, but I suggest an entry in NEWS/PROBLEMS about it.

> I think it's a big improvement especially for CJK users,

I agree it should be on for CJK users anyway.  (I thought it was now
conditional on the language environment.)


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-07 13:00                         ` Kenichi Handa
@ 2004-06-08 18:02                           ` Dave Love
  2004-06-09  7:37                             ` Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Love @ 2004-06-08 18:02 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

Kenichi Handa <handa@m17n.org> writes:

>> Absolutely!  Then we can say "utf-8 is (almost) completely
>> supported"...  I think this is a very important thing.
>
> I think "completely" is still too strong even with the preceding
> "(almost)".

I know what you mean, but I think that's the sort of thing that
encourages the established user confusion over encoding issues.

UTF-8 per se is fully supported up to some limit on the code point.
(I hope that's as large as the Emacs 22 maximum codepoint, but I don't
remember.)  Whether or not valid unicodes can be decoded into a
character Emacs can actually encode/display/input properly is a
different matter, and the feature should affect all relevant CCL
coding systems, especially UTF-16.

> Perhaps "utf-8 support is fairly good" or
> "Unicode BMP support is fairly good".

The latter is much better.  (Exceptions include at least: various
complex scripts, much of the CJK space (little used?), reliable
display of CJK e.g. with XFree86 10646-encoded fonts, locale support
(including customization of the font encodings preferred), and BIDI.)


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-08 17:56                       ` Dave Love
@ 2004-06-09  7:24                         ` Kenichi Handa
  0 siblings, 0 replies; 20+ messages in thread
From: Kenichi Handa @ 2004-06-09  7:24 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

In article <rzqd64agch6.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

> Kenichi Handa <handa@m17n.org> writes:
> > I think I have succeeded in loading subst-*.el not when
> > utf-translate-cjk-mode is customized to t, but only when
> > loading them turns out to be necessary while decoding or
> > encoding utf-8, or while running decode/encode-char.  This
> > means that we can make the default value of
> > utf-translate-cjk-mode t without loading subst-*.el at
> > build time.

> It doesn't fix the potential effects on non-CJK users if decoding a
> bit of Unicode text containing such a character will load the large
> tables even if they're useless to the user.  Maybe there aren't many
> people now with 48MB P133s or old SPARCs like me, in which case it's a
> reasonable default, but I suggest an entry in NEWS/PROBLEMS about it.

I'm going to modify the current entry in NEWS as below.

** The utf-8/16 coding systems have been enhanced.
By default, untranslatable utf-8 sequences are simply composed into
single quasi-characters.  User option `utf-translate-cjk-mode' (it is
turned on by default) arranges to translate many utf-8 CJK character
sequences into real Emacs characters in a similar way to the Mule-UCS
system.  As this loads a fairly large amount of data on demand, people
who are not interested in CJK characters may want to customize it to
nil.  You can augment/amend the CJK translation via hash tables
`ucs-mule-cjk-to-unicode' and `ucs-unicode-to-mule-cjk'.  The utf-8
coding system now also encodes characters from most of Emacs's
one-dimensional internal charsets, specifically the ISO-8859 ones.
The utf-16 coding system is affected similarly.

> > I think it's a big improvement especially for CJK users,

> I agree it should be on for CJK users anyway.  (I thought it was now
> conditional on the language environment.)

It's not.  I think we had better avoid turning on/off a user
option depending on language environment.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-08 18:02                           ` Dave Love
@ 2004-06-09  7:37                             ` Kenichi Handa
  2004-06-09  9:38                               ` Stefan Monnier
  0 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2004-06-09  7:37 UTC (permalink / raw)
  Cc: mariano, alexander.winston, emacs-devel, danilo, monnier, miles

In article <rzq4qpmgc80.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

> Kenichi Handa <handa@m17n.org> writes:
> >> Absolutely!  Then we can say "utf-8 is (almost) completely
> >> supported"...  I think this is a very important thing.
> >
> > I think "completely" is still too strong even with the preceding
> > "(almost)".

> I know what you mean, but I think that's the sort of thing that
> encourages the established user confusion over encoding issues.

> UTF-8 per se is fully supported up to some limit on the code point.
> (I hope that's as large as the Emacs 22 maximum codepoint, but I don't
> remember.)

No, the current support of UTF-8 is limited to U+10FFFF
(the maximum Unicode character).

> Whether or not valid unicodes can be decoded into a
> character Emacs can actually encode/display/input properly is a
> different matter,

Ah, yes.  In that sense, we can say utf-8 encoding/decoding
is completely supported.

> and the feature should affect all relevant CCL
> coding systems, especially UTF-16.

As surrogate pairs were not handled well by the UTF-16
converter, I've just fixed that too (not yet installed; I'm
now adding comments to the code).  Untranslatable characters
are decoded into UTF-8 form, represented by a sequence of
eight-bit-graphic/control characters (the same way as in
UTF-8 decoding, so we can use utf-8-post-read-conversion).
The UTF-16 encoder encodes such a sequence back to the
original UTF-16 form.  So UTF-16 support is now at the same
level as UTF-8.
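
For reference, the surrogate-pair arithmetic that a UTF-16 converter
has to implement can be sketched as below, in Python rather than CCL or
Emacs Lisp.  The arithmetic is the standard UTF-16 definition; the
function names are assumptions for illustration and this is not the
converter code discussed here.

```python
def encode_surrogates(code_point):
    """Split a code point in U+10000..U+10FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    value = code_point - 0x10000           # 20-bit value
    high = 0xD800 + (value >> 10)          # top 10 bits
    low = 0xDC00 + (value & 0x3FF)         # bottom 10 bits
    return high, low


def decode_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

The largest pair, (0xDBFF, 0xDFFF), decodes to U+10FFFF, which is why
that value is the upper limit for anything UTF-16 can represent.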

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-09  7:37                             ` Kenichi Handa
@ 2004-06-09  9:38                               ` Stefan Monnier
  2004-06-10  0:20                                 ` Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Stefan Monnier @ 2004-06-09  9:38 UTC (permalink / raw)
  Cc: mariano, alexander.winston, d.love, emacs-devel, danilo, miles

> As surrogate pairs were not handled well by the UTF-16
> converter, I've just fixed that too (not yet installed; I'm
> now adding comments to the code).  Untranslatable characters
> are decoded into UTF-8 form, represented by a sequence of
> eight-bit-graphic/control characters (the same way as in
> UTF-8 decoding, so we can use utf-8-post-read-conversion).
> The UTF-16 encoder encodes such a sequence back to the
> original UTF-16 form.  So UTF-16 support is now at the same
> level as UTF-8.

Does that mean that some sequences of eight-bit-graphic/control are not
encoded into the corresponding raw bytes?
If so, that makes me a bit uneasy, since those special chars were
introduced specifically to handle things like binary input or
bad-byte-sequences and make sure that we at least preserve the raw bytes in
those cases.


        Stefan


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-09  9:38                               ` Stefan Monnier
@ 2004-06-10  0:20                                 ` Kenichi Handa
  0 siblings, 0 replies; 20+ messages in thread
From: Kenichi Handa @ 2004-06-10  0:20 UTC (permalink / raw)
  Cc: mariano, alexander.winston, d.love, emacs-devel, danilo, miles

In article <jwvu0xlhy6d.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > As surrogate pairs were not handled well by the UTF-16
> > converter, I've just fixed that too (not yet installed; I'm
> > now adding comments to the code).  Untranslatable characters
> > are decoded into UTF-8 form, represented by a sequence of
> > eight-bit-graphic/control characters (the same way as in
> > UTF-8 decoding, so we can use utf-8-post-read-conversion).
> > The UTF-16 encoder encodes such a sequence back to the
> > original UTF-16 form.  So UTF-16 support is now at the same
> > level as UTF-8.

> Does that mean that some sequences of eight-bit-graphic/control are not
> encoded into the corresponding raw bytes?

No.  But that's only the case when we encode modified
text (i.e. eight-bit-graphic/control chars were
added/modified after we decoded a source).

> If so, that makes me a bit uneasy, since those special chars were
> introduced specifically to handle things like binary input or
> bad-byte-sequences and make sure that we at least preserve the raw bytes in
> those cases.

As long as we encode unmodified text that was generated by
decoding a source, we can preserve the byte sequence even if
the original source contains a bad byte sequence (for
UTF-8, I found a case that didn't work as expected and
fixed it).
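
The round-trip property described above (decode, leave the text
unmodified, encode, and get the same bytes back even when the input
contains bad byte sequences) can be illustrated by analogy with
Python's `surrogateescape` error handler.  This is only an analogy:
Emacs uses eight-bit-graphic/control characters for the raw bytes, not
lone surrogates, and the function names here are assumptions.

```python
def decode_preserving(raw: bytes) -> str:
    # Each undecodable byte 0xNN is kept as the lone surrogate U+DCNN.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_preserving(text: str) -> bytes:
    # Lone surrogates U+DCNN are written back as the original byte 0xNN.
    return text.encode("utf-8", errors="surrogateescape")


# Valid UTF-8 for U+4E00 followed by two bytes that are invalid in UTF-8.
raw = b"ok \xe4\xb8\x80 bad \xff\xfe"
text = decode_preserving(raw)
restored = encode_preserving(text)
```

Here `restored` equals `raw` as long as `text` is not edited in between,
which mirrors the condition stated above for unmodified text.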

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-07 12:27                     ` Kenichi Handa
  2004-06-07 12:36                       ` Miles Bader
  2004-06-08 17:56                       ` Dave Love
@ 2004-06-12  2:41                       ` Kenichi Handa
  2004-06-12 13:46                         ` Juanma Barranquero
  2 siblings, 1 reply; 20+ messages in thread
From: Kenichi Handa @ 2004-06-12  2:41 UTC (permalink / raw)
  Cc: mariano, alexander.winston, d.love, emacs-devel, danilo, monnier,
	miles

In article <200406071227.VAA06216@etlken.m17n.org>, Kenichi Handa <handa@m17n.org> writes:
> I think I have succeeded in loading subst-*.el not when
> utf-translate-cjk-mode is customized to t, but only when
> loading them turns out to be necessary while decoding or
> encoding utf-8, or while running decode/encode-char.  This
> means that we can make the default value of
> utf-translate-cjk-mode t without loading subst-*.el at
> build time.

> I think it's a big improvement, especially for CJK users,
> and it improves an existing feature rather than adding a
> new one.  If people agree on making utf-translate-cjk-mode
> default to t, I'll polish the current working code and
> install the changes.

As there seems to be no strong objection, I've just installed
the necessary changes.  I also modified
set-language-environment to reload the substitution tables if
necessary, i.e. when utf-translate-cjk-mode is on and the
tables were already loaded in a different language
environment.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-12  2:41                       ` Kenichi Handa
@ 2004-06-12 13:46                         ` Juanma Barranquero
  2004-06-13  8:42                           ` Kenichi Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Juanma Barranquero @ 2004-06-12 13:46 UTC (permalink / raw)
  Cc: handa

On Sat, 12 Jun 2004 11:41:51 +0900 (JST), Kenichi Handa <handa@m17n.org> wrote:

> As there seems to be no strong objection, I've just installed
> the necessary changes.  I also modified
> set-language-environment to reload the substitution tables if
> necessary, i.e. when utf-translate-cjk-mode is on and the
> tables were already loaded in a different language
> environment.

I'm getting a perhaps related bootstrapping error:

Loading language/chinese (source)...
Loading language/cyrillic (source)...
Loading subst-ksc (source)...
Invalid read syntax: "?"

                                                           /L/e/k/t/u


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-12 13:46                         ` Juanma Barranquero
@ 2004-06-13  8:42                           ` Kenichi Handa
  2004-06-13 11:36                             ` Juanma Barranquero
                                               ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Kenichi Handa @ 2004-06-13  8:42 UTC (permalink / raw)
  Cc: emacs-devel

In article <20040612154533.152E.LEKTU@mi.madritel.es>, Juanma Barranquero <lektu@mi.madritel.es> writes:

> I'm getting a perhaps related bootstrapping error:

> Loading language/chinese (source)...
> Loading language/cyrillic (source)...
> Loading subst-ksc (source)...
> Invalid read syntax: "?"

It seems that my recent change caused this bug, but I can't
reproduce it.  At which bootstrapping stage, does the above
happen?  Before byte-compiling or after byte-compiling?

---
Ken'ichi HANDA
handa@m17n.org


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-13  8:42                           ` Kenichi Handa
@ 2004-06-13 11:36                             ` Juanma Barranquero
  2004-06-13 13:18                             ` Andreas Schwab
  2004-06-13 20:39                             ` Luc Teirlinck
  2 siblings, 0 replies; 20+ messages in thread
From: Juanma Barranquero @ 2004-06-13 11:36 UTC (permalink / raw)
  Cc: Kenichi Handa

On Sun, 13 Jun 2004 17:42:19 +0900 (JST), Kenichi Handa <handa@m17n.org> wrote:

> It seems that my recent change caused this bug, but I can't
> reproduce it.  At which bootstrapping stage, does the above
> happen?  Before byte-compiling or after byte-compiling?

No byte-compiling has taken place.  Right after dumping temacs.bin to
temacs.exe:

Loading loadup.el (source)...
Using load-path (../lisp c:/bin/emacs/HEAD/lisp/emacs-lisp c:/bin/emacs/HEAD/lisp/language c:/bin/emacs/HEAD/lisp/international c:/bin/emacs/HEAD/lisp/textmodes)
Loading emacs-lisp/byte-run (source)...
Loading emacs-lisp/backquote (source)...
Loading subr (source)...
Loading version.el (source)...
Loading widget (source)...
Loading custom (source)...
Loading emacs-lisp/map-ynp (source)...
Loading env (source)...
Loading cus-start (source)...
Note, built-in variable `x-use-underline-position-properties' not bound
Loading international/mule (source)...
Loading international/mule-conf.el (source)...
Loading format (source)...
Loading bindings (source)...
Loading files (source)...
Loading cus-face (source)...
Loading faces (source)...
Lists of integers (garbage collection statistics) are normal output
while building Emacs; they do not indicate a problem.
((86815 . 21581) (5221 . 22) (575 . 122) 322995 14120 (9 . 1) (18 . 0) (6522 . 975))
Loading loaddefs.el (source)...
((106390 . 4237) (7624 . 0) (583 . 114) 1111827 14120 (35 . 33) (18 . 0) (13741 . 56))
Loading simple (source)...
Loading help (source)...
Loading international/mule-cmds (source)...
Loading case-table (source)...
Loading international/utf-8 (source)...
Loading international/utf-16 (source)...
Loading international/characters (source)...
Loading international/latin-1 (source)...
Loading international/latin-2 (source)...
Loading international/latin-3 (source)...
Loading international/latin-4 (source)...
Loading international/latin-5 (source)...
Loading international/latin-8 (source)...
Loading international/latin-9 (source)...
Loading language/chinese (source)...
Loading language/cyrillic (source)...
Loading subst-ksc (source)...
Invalid read syntax: "?"
NMAKE : fatal error U1077: '"C:\bin\emacs\HEAD\src/obj-spd/i386/temacs.exe"' : return code '0xffffffff'
Stop.
NMAKE : fatal error U1077: '"C:\Archivos de programa\Microsoft Visual Studio .NET 2003\VC7\BIN\nmake.exe"' : return code '0x2'
Stop.

                                                           /L/e/k/t/u


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-13  8:42                           ` Kenichi Handa
  2004-06-13 11:36                             ` Juanma Barranquero
@ 2004-06-13 13:18                             ` Andreas Schwab
  2004-06-14  1:05                               ` Kenichi Handa
  2004-06-13 20:39                             ` Luc Teirlinck
  2 siblings, 1 reply; 20+ messages in thread
From: Andreas Schwab @ 2004-06-13 13:18 UTC (permalink / raw)
  Cc: lektu, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> It seems that my recent change caused this bug, but I can't
> reproduce it.  At which bootstrapping stage, does the above
> happen?  Before byte-compiling or after byte-compiling?

It happens before byte-compiling, try removing all *.elc files first.
utf-translate-cjk-load-tables probably shouldn't load the tables during
dumping.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-13  8:42                           ` Kenichi Handa
  2004-06-13 11:36                             ` Juanma Barranquero
  2004-06-13 13:18                             ` Andreas Schwab
@ 2004-06-13 20:39                             ` Luc Teirlinck
  2 siblings, 0 replies; 20+ messages in thread
From: Luc Teirlinck @ 2004-06-13 20:39 UTC (permalink / raw)
  Cc: lektu, emacs-devel

Ken'ichi HANDA wrote:

   It seems that my recent change caused this bug, but I can't
   reproduce it.  At which bootstrapping stage, does the above
   happen?  Before byte-compiling or after byte-compiling?

Juanma already answered that question.

Some extra info that may or may not be useful, but I provide it just
in case.  Bootstrapping works completely fine after emptying
subst-ksc.el and subst-jis.el.  Loading (or byte-compiling) these
files in a running Emacs works fine, but for some strange reason
apparently not at the early stage at which you are trying to load
them.  Just a wild guess: _maybe_ you might be able to reproduce the
problem by running `make maintainer-clean' before bootstrapping.

Sincerely,

Luc.


* Re: Possible UTF-8 CJK Regressions in Terminal Emulators
  2004-06-13 13:18                             ` Andreas Schwab
@ 2004-06-14  1:05                               ` Kenichi Handa
  0 siblings, 0 replies; 20+ messages in thread
From: Kenichi Handa @ 2004-06-14  1:05 UTC (permalink / raw)
  Cc: lektu, emacs-devel

In article <jept83d2ap.fsf@sykes.suse.de>, Andreas Schwab <schwab@suse.de> writes:

> Kenichi Handa <handa@m17n.org> writes:
> > It seems that my recent change caused this bug, but I can't
> > reproduce it.  At which bootstrapping stage, does the above
> > happen?  Before byte-compiling or after byte-compiling?

> It happens before byte-compiling, try removing all *.elc files first.

Thank you for the info.  I found what was wrong and fixed
it.  Please try the latest code.

The cause of the bug was that when cyrillic.el (not
cyrillic.elc) was loaded, code-pages.el was also loaded.
Some characters in that file (an incorrect mapping) triggered
loading of subst-ksc.el, which is encoded in euc-kr, which in
turn is defined in the not-yet-loaded korean.el.  I fixed the
incorrect mapping.

> utf-translate-cjk-load-tables probably shouldn't load the tables during
> dumping.

Yes, but modifying utf-translate-cjk-load-tables not to load
the tables will just hide such bugs as above.

In general, preloaded files encoded in utf-8 should not
contain a Unicode character that will be translated in
utf-translate-cjk-mode because such a character may cause
incorrect behaviour when utf-translate-cjk-mode is turned
off.

---
Ken'ichi HANDA
handa@m17n.org



Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
