unofficial mirror of emacs-devel@gnu.org 
* severe problems with composite characters
From: Werner LEMBERG @ 2003-09-17  5:45 UTC
  Cc: kazu


Kazu Yamamoto reported the following two problems at the end of
July on the emacs-pretest-bug list.  I repeat them here so that more
people read his messages, since there were no replies.  AFAIK, they
are still valid with the current CVS.  Both problems are serious and
affect rendering of Thai, at least in the Mew mail program.


    Werner

======================================================================

string-width() returns a wrong number if its argument string
has composite characters.

Consider the two-byte string 0xcd 0xeb, whose width is one since the
two characters are composed.

On Emacs 20.7 string-width() returns 1.
On Emacs 21.3.50 string-width() returns 2.

======================================================================

Suppose that composite characters are stored in a file with a
multilingual coding system.  An example is TIS-620 characters with
UTF-8 (or ctext).

When Emacs reads the file, the characters are not composed, since
there is no post-conv function associated with the multilingual
coding system.

Is this a bug?


* Re: severe problems with composite characters
From: Kenichi Handa @ 2003-09-17  6:49 UTC
  Cc: kazu, d.love, emacs-devel

In article <20030917.074537.51710930.wl@gnu.org>, Werner LEMBERG <wl@gnu.org> writes:
> ======================================================================

> string-width() returns a wrong number if its argument string
> has composite characters.

> Consider the two-byte string 0xcd 0xeb, whose width is one since the
> two characters are composed.

> On Emacs 20.7 string-width() returns 1.
> On Emacs 21.3.50 string-width() returns 2.

??? I've just confirmed this result with 21.3.50.

(string-width (decode-coding-string "\xcd\xeb" 'thai-tis620)) => 1

Please note that Emacs 21 no longer has composite
characters.  For instance, compose-region doesn't change the
characters in a region into a single composite character;
instead, it just puts the text property `composition' on
them.  The display routine checks this text property and
displays the sequence correctly.

I suspect that you evaluated something like this:

	(string-width "__some_composed_text__")

in the *scratch* buffer.  As the Lisp reader ignores any text
properties when reading a string expression in the *scratch*
buffer, the string given to string-width doesn't have the
`composition' property.

> ======================================================================

> Suppose that composite characters are stored in a file with a
> multilingual coding system.  An example is TIS-620 characters with
> UTF-8 (or ctext).

> When Emacs reads the file, the characters are not composed, since
> there is no post-conv function associated with the multilingual
> coding system.

> Is this a bug?

As such a post-conv function is rather heavy, it is turned
off by default.  If you customize the variable
utf-8-compose-scripts to t, Thai characters should be
composed on decoding.
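
For trying this without the Customize interface, the setting
could go into an init file roughly as below.  This is only a
sketch: it assumes the 21.3.50 development sources discussed
here, where utf-8-compose-scripts is a user option, and it
uses customize-set-variable rather than a bare setq so that
any setter attached to the option would still run.

```elisp
;; Sketch for ~/.emacs.  Assumption: utf-8-compose-scripts exists
;; as a user option in the Emacs being used (21.3.50 CVS here).
(customize-set-variable 'utf-8-compose-scripts t)
```

Files decoded as UTF-8 before the option was set are not
affected retroactively, so it is probably safest to revisit
them after changing the value.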

But I've just found a bug in this facility and installed a
fix.  Please update your working directory and try again.
Don't forget to run "make autoloads" in the "lisp" subdirectory.

---
Ken'ichi HANDA
handa@m17n.org


* Re: severe problems with composite characters
From: Dave Love @ 2003-09-18  9:24 UTC
  Cc: wl, kazu, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <20030917.074537.51710930.wl@gnu.org>, Werner LEMBERG <wl@gnu.org> writes:

[...]

>> Suppose that composite characters are stored in a file with a
>> multilingual coding system.  An example is TIS-620 characters with
>> UTF-8 (or ctext).
>
>> When Emacs reads the file, the characters are not composed, since
>> there is no post-conv function associated with the multilingual
>> coding system.

ctext works for me, the same as iso-2022-7bit (which C-h h uses).
Those coding systems store composition information as escape
sequences, so post-conversion isn't relevant for them (unlike utf-8).

Is the width being wrong a severe problem somehow, or is the severity
mentioned in the subject just the lack of composition?  (I ask partly
because character width is currently somewhat ambiguous in Emacs 22,
e.g. for non-CJK characters from CJK coding systems.)

> But, I've just found a bug in this facility, and installed a
> fix.

What bug?  I can't see a log message.


* Re: severe problems with composite characters
From: Kazu Yamamoto @ 2003-09-19  8:37 UTC
  Cc: wl, d.love, emacs-devel

From: Kenichi Handa <handa@m17n.org>
Subject: Re: severe problems with composite characters

> ??? I've just confirmed this result with 21.3.50.
> 
> (string-width (decode-coding-string "\xcd\xeb" 'thai-tis620)) => 1

This also returns 1 in my environment.

> I suspect that you evaluated something like this:
> 
> 	(string-width "__some_composed_text__")
> 
> in *scratch* buffer.  As the Lisp reader ignores any text
> properties on reading a string expression in *scratch*
> buffer, the string given to string-width doesn't have
> `composition' property.

I tried this in other buffers and this returns 2 again.

Please look at this window dump:

	http://www.mew.org/~kazu/tmp/a.png

--Kazu


* Re: severe problems with composite characters
From: Kenichi Handa @ 2003-09-19 11:06 UTC
  Cc: wl, d.love, emacs-devel

In article <20030919.173740.264807114.kazu@iijlab.net>, Kazu Yamamoto (山本和彦) <kazu@iijlab.net> writes:
>>  I suspect that you evaluated something like this:
>>  
>>  	(string-width "__some_composed_text__")
>>  
>>  in *scratch* buffer.  As the Lisp reader ignores any text
>>  properties on reading a string expression in *scratch*
>>  buffer, the string given to string-width doesn't have
>>  `composition' property.

> I tried this in other buffers and this returns 2 again.

Yes, that is the expected behaviour.  If text is given to
string-width without the `composition' property, there's no
way for the function to know how the text is composed in a
buffer.
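
The difference can be sketched in Lisp as follows.  Note that
strip-props is a hypothetical helper written for this
illustration, not an Emacs function; it mimics what happens
when a string literal is read back from a buffer without its
text properties.  On the 21.3.50 sources discussed in this
thread the two calls are reported to give 1 and 2
respectively; other Emacs versions may behave differently.

```elisp
;; Hypothetical helper (not part of Emacs): return a copy of S
;; with every text property removed.
(defun strip-props (s)
  "Return a copy of string S with all text properties removed."
  (let ((copy (copy-sequence s)))
    (set-text-properties 0 (length copy) nil copy)
    copy))

(let* ((decoded (decode-coding-string "\xcd\xeb" 'thai-tis620))
       (bare (strip-props decoded)))
  (list (string-width decoded)   ; whatever properties the decoder attached
        (string-width bare)))    ; same characters, no text properties
```

The helper copies the string first so that the decoded
original keeps its properties and both widths can be
compared side by side.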

---
Ken'ichi HANDA
handa@m17n.org


* Re: severe problems with composite characters
From: Kenichi Handa @ 2003-09-30 11:22 UTC
  Cc: wl, kazu, emacs-devel

Sorry for this late response.

In article <rzq4qza5u01.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Is the width being wrong a severe problem somehow, or is the severity
> mentioned in the subject just the lack of composition?

The latter.

>>  But, I've just found a bug in this facility, and installed a
>>  fix.

> What bug?  I can't see a log message.

This is the relevant changelog.

2003-09-24  Kenichi Handa  <handa@m17n.org>

	* language/devan-util.el (devanagari-post-read-conversion):
	* language/mlm-util.el (malayalam-post-read-conversion):
	* language/tml-util.el (tamil-post-read-conversion):
	Add autoload cookie.

	* international/utf-8.el (utf-8-post-read-conversion):
	Call post-read-conversion functions for Devanagari, Malayalam,
	and Tamil.

---
Ken'ichi HANDA
handa@m17n.org
