Problem of auto-fill-mode for wide character

unofficial mirror of emacs-devel@gnu.org 

* Problem of auto-fill-mode for wide character
@ 2005-12-17 15:58 Herbert Euler
  2005-12-28  7:46 ` Kenichi Handa
  0 siblings, 1 reply; 7+ messages in thread
From: Herbert Euler @ 2005-12-17 15:58 UTC (permalink / raw)


Hello everyone,

I'm very happy to see that Emacs supports Unicode internally,
but there is a problem in auto-fill-mode with this modification.
I'm going to explain why I think it's because of Unicode support.

In Emacs 21, some wide characters, such as Chinese characters,
are inserted with the command
'encoded-kbd-self-insert-iso2022-8bit'. This differs from ASCII
characters, which are inserted with 'self-insert-command'.
Auto-fill-mode works fine this way (although I don't know the
details): if one types Chinese characters past
'current-fill-column', they are moved to the next line
automatically. Perhaps this is because Chinese characters are
inserted with 'encoded-kbd-self-insert-iso2022-8bit' rather than
'self-insert-command'.

In the current Unicode 2 branch, Chinese characters are inserted
with 'self-insert-command', just like ASCII characters. This makes
auto-fill treat Chinese like languages such as English, since "in
Auto Fill mode, lines are broken automatically _at spaces_ when
they get longer than the desired width". This is good for
languages in which words are separated by spaces, but at least it
is not appropriate for Chinese, because there are _no_ spaces
between Chinese characters. So one can only force Emacs to fill
either by inserting spaces or by pressing M-q to invoke
'fill-paragraph', neither of which is "natural" in Chinese editing.

Is my understanding correct? Could somebody help solve this
problem? Thanks.

Regards,
Guanpeng Xu


* Re: Problem of auto-fill-mode for wide character
  2005-12-17 15:58 Problem of auto-fill-mode for wide character Herbert Euler
@ 2005-12-28  7:46 ` Kenichi Handa
  2005-12-30  2:43   ` Herbert Euler
  2006-01-09  2:56   ` Herbert Euler
  0 siblings, 2 replies; 7+ messages in thread
From: Kenichi Handa @ 2005-12-28  7:46 UTC (permalink / raw)
  Cc: emacs-devel

In article <BAY109-F3569866FC9384654E0EDBBDA3D0@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> Hello everyone,
> I'm very happy to see that Emacs supports Unicode internally,
> but there is a problem in auto-fill-mode with this modification.
> I'm going to explain why I think it's because of Unicode support.
[...]
> In the current Unicode 2 branch, Chinese characters are inserted
> with 'self-insert-command', just like ASCII characters. This makes
> auto-fill treat Chinese like languages such as English, since "in
> Auto Fill mode, lines are broken automatically _at spaces_ when
> they get longer than the desired width". This is good for
> languages in which words are separated by spaces, but at least it
> is not appropriate for Chinese, because there are _no_ spaces
> between Chinese characters. So one can only force Emacs to fill
> either by inserting spaces or by pressing M-q to invoke
> 'fill-paragraph', neither of which is "natural" in Chinese editing.

> Is my understanding correct? Could somebody help solve this
> problem? Thanks.

Thank you for reminding me of this unsolved problem.  The
reason why Chinese characters do not invoke auto-fill is that
they are not yet registered in the char-table auto-fill-chars.
The reason I have not yet done that in the Unicode 2 branch is
that I don't know of any "authoritative" information about
which characters to register.

I've just registered these apparent characters:
  U+3041..U+30FF, U+3400..U+4DB5, U+4E00..U+9FBB, U+F900..U+FAFF,
  U+FF00..U+FF9F, U+20000..U+2FFFF
So, now auto-fill should work for most Han characters.

But, there are many more questionable characters, for instance:
  U+3000..U+303F, U+3200..U+33FF, ...

Do you have some idea about exactly which set of characters
to register in auto-fill-chars?
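
As a rough illustration of what registering these ranges amounts to, here is a minimal sketch in Python (not Emacs Lisp). It models the table as a list of inclusive code point ranges taken from the message above; the names `REGISTERED_RANGES`, `QUESTIONABLE_RANGES`, `in_ranges` and `triggers_auto_fill` are hypothetical and do not exist in Emacs.

```python
# Code point ranges listed in the message above (inclusive on both ends).
REGISTERED_RANGES = [
    (0x3041, 0x30FF),
    (0x3400, 0x4DB5),
    (0x4E00, 0x9FBB),
    (0xF900, 0xFAFF),
    (0xFF00, 0xFF9F),
    (0x20000, 0x2FFFF),
]

# Ranges the message calls questionable; deliberately not registered here.
QUESTIONABLE_RANGES = [
    (0x3000, 0x303F),
    (0x3200, 0x33FF),
]


def in_ranges(ch, ranges):
    """Return True if the single character ch falls in any inclusive range."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ranges)


def triggers_auto_fill(ch):
    """Hypothetical helper: is ch registered as an auto-fill trigger?"""
    return in_ranges(ch, REGISTERED_RANGES)
```

Under this model a Han character such as U+4E2D is a trigger, while the ideographic full stop U+3002 falls in a questionable range and is not.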

---
Kenichi Handa
handa@m17n.org


* Re: Problem of auto-fill-mode for wide character
  2005-12-28  7:46 ` Kenichi Handa
@ 2005-12-30  2:43   ` Herbert Euler
  2006-01-04  4:28     ` Kenichi Handa
  2006-01-09  2:56   ` Herbert Euler
  1 sibling, 1 reply; 7+ messages in thread
From: Herbert Euler @ 2005-12-30  2:43 UTC (permalink / raw)
  Cc: emacs-devel

>From: Kenichi Handa <handa@m17n.org>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Problem of auto-fill-mode for wide character
>Date: Wed, 28 Dec 2005 16:46:27 +0900
>
>Thank you for reminding me of this unsolved problem.  The
>reason why Chinese characters do not invoke auto-fill is that
>they are not yet registered in the char-table auto-fill-chars.
>The reason I have not yet done that in the Unicode 2 branch is
>that I don't know of any "authoritative" information about
>which characters to register.
>
>I've just registered these apparent characters:
>   U+3041..U+30FF, U+3400..U+4DB5, U+4E00..U+9FBB, U+F900..U+FAFF,
>   U+FF00..U+FF9F, U+20000..U+2FFFF
>So, now auto-fill should work for most Han characters.
>
>But, there are many more questionable characters, for instance:
>   U+3000..U+303F, U+3200..U+33FF, ...

In my opinion, this solution is not a practical one. Trying to register
most Chinese, Japanese and Korean characters in auto-fill-chars would
waste a lot of memory, and some characters would probably be forgotten.
For example, in Japanese, Hiragana and Katakana would probably work,
but most Kanji would not. Besides, the policy for filling punctuation
differs between English and Chinese. If a punctuation mark is the last
character of a line but exceeds the fill-column, then in English it is
moved to the next line together with the word it follows, whereas in
Chinese it is left in place (and the following characters are moved to
the next line). I don't know whether this is supported by registering
characters in auto-fill-chars.

>Do you have some idea about exactly which set of characters
>to register in auto-fill-chars?

I don't know the details of how Emacs distinguishes, for auto-fill,
between languages in which words are separated by blanks and those in
which they are not. But if the original design of auto-fill ignored the
latter, a better solution might be to modify the auto-fill mechanism
so that it supports the different notion of filling in such languages.

If the words in a language are not separated by blanks, then any
character other than punctuation should be moved to the next line when
it exceeds the fill-column. Some punctuation marks should be left in
place when they exceed the fill-column; the others should be treated
like ordinary characters.

Regards,
Guanpeng Xu


* Re: Problem of auto-fill-mode for wide character
  2005-12-30  2:43   ` Herbert Euler
@ 2006-01-04  4:28     ` Kenichi Handa
  0 siblings, 0 replies; 7+ messages in thread
From: Kenichi Handa @ 2006-01-04  4:28 UTC (permalink / raw)
  Cc: emacs-devel

In article <BAY110-F80BEFA32ECA37B6399567DA280@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:
[...]
>> I've just registered these apparent characters:
>>    U+3041..U+30FF, U+3400..U+4DB5, U+4E00..U+9FBB, U+F900..U+FAFF,
>>    U+FF00..U+FF9F, U+20000..U+2FFFF
>> So, now auto-fill should work for most Han characters.
>> 
>> But, there are many more questionable characters, for instance:
>>    U+3000..U+303F, U+3200..U+33FF, ...

> In my opinion, this solution is not a practical one. Trying to register
> most Chinese, Japanese and Korean characters in auto-fill-chars would
> waste a lot of memory, and some characters would probably be forgotten.
> For example, in Japanese, Hiragana and Katakana would probably work,
> but most Kanji would not. Besides, the policy for filling punctuation
> differs between English and Chinese. If a punctuation mark is the last
> character of a line but exceeds the fill-column, then in English it is
> moved to the next line together with the word it follows, whereas in
> Chinese it is left in place (and the following characters are moved to
> the next line). I don't know whether this is supported by registering
> characters in auto-fill-chars.

First, a char-table doesn't consume that much space if you
register characters with contiguous codes.  For instance,
registering all Han characters is not a problem.

Next, it seems that you misunderstand the role of
auto-fill-chars.  It is a table for registering the characters
that trigger the auto-fill-function.  How auto-fill-function
fills the line(s) is a different matter.
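
The separation between triggering and filling can be sketched roughly as follows. This is a toy model in Python, not how Emacs implements it: `make_inserter` and `break_at_width` are hypothetical names, the trigger set stands in for auto-fill-chars, and width is counted in characters rather than display columns.

```python
def make_inserter(trigger_chars, fill_function):
    """Return an insert function loosely modelled on self-insert with auto-fill.

    trigger_chars only decides WHEN fill_function is called;
    fill_function alone decides HOW the line is broken.
    """
    def insert(lines, ch):
        lines[-1] += ch
        if ch in trigger_chars:
            fill_function(lines)
        return lines
    return insert


def break_at_width(width):
    """Hypothetical fill function: hard-break the last line at `width` chars."""
    def fill(lines):
        while len(lines[-1]) > width:
            last = lines.pop()
            lines.extend([last[:width], last[width:]])
    return fill


text = "中文字符测试"
unregistered = make_inserter(set(" "), break_at_width(4))
registered = make_inserter(set(" " + text), break_at_width(4))

lines_a, lines_b = [""], [""]
for ch in text:
    unregistered(lines_a, ch)   # Han characters never trigger the fill
    registered(lines_b, ch)     # every Han character triggers the fill
```

With only the space registered, `lines_a` remains one over-long line; once the Han characters are registered as triggers, the same fill function breaks `lines_b` at the width. Changing the trigger set never changes where a line is broken, only whether the fill function runs.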

And, although it's difficult to explain how lines are filled
(the logic is encoded in functions), Emacs does at least give
punctuation special treatment for Chinese and Japanese (e.g.
opening/closing parentheses at the end/beginning of a line).
You'll get a hint if you read lisp/international/kinsoku.el.
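
To make the idea of line-start and line-end prohibitions concrete, here is a minimal Python sketch. It is not a transcription of kinsoku.el: the character sets are illustrative guesses, `find_break` and `fill` are hypothetical names, and width is counted in characters (assumed to be at least 1).

```python
# Illustrative sets only; not copied from kinsoku.el.
NOT_AT_LINE_START = set("，。、；：！？）》」』")
NOT_AT_LINE_END = set("（《「『")


def find_break(text, width):
    """Return the index at which to split text so that no prohibited
    character starts the next line or ends the current one."""
    if len(text) <= width:
        return len(text)
    i = width
    # Let closing punctuation hang past the width instead of starting a line.
    while i < len(text) and text[i] in NOT_AT_LINE_START:
        i += 1
    # Do not leave opening punctuation dangling at the end of the line.
    while i > 1 and text[i - 1] in NOT_AT_LINE_END:
        i -= 1
    return i


def fill(text, width):
    """Split text into lines of roughly `width` characters."""
    lines = []
    while text:
        i = find_break(text, width)
        lines.append(text[:i])
        text = text[i:]
    return lines
```

For example, `fill("你好，世界", 2)` keeps the comma on the first line rather than letting it begin the second, and `fill("我说「好」", 3)` breaks before the opening bracket rather than after it.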

---
Kenichi Handa
handa@m17n.org


* Re: Problem of auto-fill-mode for wide character
  2005-12-28  7:46 ` Kenichi Handa
  2005-12-30  2:43   ` Herbert Euler
@ 2006-01-09  2:56   ` Herbert Euler
  2006-01-10  1:20     ` Kenichi Handa
  1 sibling, 1 reply; 7+ messages in thread
From: Herbert Euler @ 2006-01-09  2:56 UTC (permalink / raw)
  Cc: emacs-devel

I think that for Chinese, all Chinese characters (except punctuation)
should be registered.  And it seems that the following punctuation
marks should be registered as 'kinsoku-bol' (all punctuation referred
to here is the wide form):

	￠￡￥%

Since the following do not appear frequently in Chinese text, I'm
not sure whether they should be registered:

	#$&'*+-/@0123456789

(Is 'kinsoku.el' applied to English text?  I didn't see '%' in
'kinsoku-bol'.)

Well, I'm not very attentive to text-editing conventions, so I may
have missed some characters.

Thank you very much.

Regards,
Guanpeng Xu

_________________________________________________________________
Don't just search. Find. Check out the new MSN Search! 
http://search.msn.com/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Problem of auto-fill-mode for wide character
  2006-01-09  2:56   ` Herbert Euler
@ 2006-01-10  1:20     ` Kenichi Handa
  2006-01-10  1:58       ` Herbert Euler
  0 siblings, 1 reply; 7+ messages in thread
From: Kenichi Handa @ 2006-01-10  1:20 UTC (permalink / raw)
  Cc: emacs-devel

In article <BAY107-F319E013FBCFAC1F1D29DCEDA220@phx.gbl>, "Herbert Euler" <herberteuler@hotmail.com> writes:

> I think that for Chinese, all Chinese characters (except punctuation)
> should be registered.  And it seems that the following punctuation
> marks should be registered as 'kinsoku-bol' (all punctuation referred
> to here is the wide form):

> 	￠￡￥%

At least the last one (FULLWIDTH YEN SIGN) should not be in
kinsoku-bol because we Japanese write a price as "￥１００".
When we write "YEN" at the end of a number, we usually use the
Han character "円", as in "１００円".  I don't know the correct
usage of FULLWIDTH CENT and POUND signs in CJK context, but
as we write both "$100" and "100$", I think cent and pound
are also used before and after numbers.

> Since the following do not appear frequently in Chinese text, I'm
> not sure whether they should be registered:

> 	#$&'*+-/@0123456789

They should not be registered because I think they can
appear both at the beginning and end of line.

> (Is 'kinsoku.el' applied to English text?  I didn't see '%' in
> 'kinsoku-bol'.)

% can appear both at the beginning and end of line too.

---
Kenichi Handa
handa@m17n.org


* Re: Problem of auto-fill-mode for wide character
  2006-01-10  1:20     ` Kenichi Handa
@ 2006-01-10  1:58       ` Herbert Euler
  0 siblings, 0 replies; 7+ messages in thread
From: Herbert Euler @ 2006-01-10  1:58 UTC (permalink / raw)
  Cc: emacs-devel

There are many convention-related issues here. So the best
solution seems to be to create a customization group and let
the user decide which characters should be in 'kinsoku-eol',
which should be in 'kinsoku-bol', and so on. We could still
provide some examples and a default configuration, and perhaps
some language-specific defaults (e.g. a Japanese-specific
configuration) that the user may choose to alter.
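
One possible shape for such a setup, sketched in Python purely for illustration: per-language defaults that a user can adjust. Everything here is hypothetical, including the names `KinsokuConfig`, `DEFAULTS` and `customize` and the sample character sets; none of it reflects an existing Emacs interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KinsokuConfig:
    bol: frozenset = frozenset()  # characters prohibited at beginning of line
    eol: frozenset = frozenset()  # characters prohibited at end of line


# Hypothetical language-specific defaults; the sets are examples only.
DEFAULTS = {
    "Chinese": KinsokuConfig(bol=frozenset("，。）"), eol=frozenset("（")),
    "Japanese": KinsokuConfig(bol=frozenset("、。）"), eol=frozenset("（")),
}


def customize(language, add_bol="", remove_bol="", add_eol="", remove_eol=""):
    """Return the language default with the user's additions and removals."""
    base = DEFAULTS[language]
    return KinsokuConfig(
        bol=(base.bol | set(add_bol)) - set(remove_bol),
        eol=(base.eol | set(add_eol)) - set(remove_eol),
    )
```

A user who wants the fullwidth percent sign kept off the start of a line in Chinese text could then write `customize("Chinese", add_bol="％")` without touching the shared defaults.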

Regards,
Guanpeng Xu

