* More Cyrillic vs UTF-8
@ 2003-04-25 16:35 Simon Josefsson
2003-04-25 22:42 ` Eli Zaretskii
2003-04-26 7:52 ` Kenichi Handa
0 siblings, 2 replies; 25+ messages in thread
From: Simon Josefsson @ 2003-04-25 16:35 UTC (permalink / raw)
(Same configuration as last mail)
Cut'n'paste the following string into a new file and save it:
Горбачев
UTF-8 isn't shown as an option, and indeed selecting UTF-8 destroys
the data. Doesn't Emacs CVS support the entire Unicode repertoire?
(The string above, encoded as shift_jis, is, according to od -x:
0000000 4384 8084 8284 7184 7084 8984 7584 7284)
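For anyone who wants to reproduce the dump without Emacs, here is a minimal Python sketch (not part of the original report). It encodes the same string as Shift_JIS and regroups the bytes into 16-bit little-endian words, which is how `od -x` prints them on x86. The helper name `od_x_words` is made up for illustration.

```python
import struct

TEXT = "Горбачев"

def od_x_words(data):
    """Group bytes into 16-bit little-endian words, as `od -x` shows them on x86."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return ["%04x" % w for w in words]

sjis = TEXT.encode("shift_jis")  # two bytes per character (JIS X 0208 Cyrillic row)
utf8 = TEXT.encode("utf-8")      # also two bytes per character for these letters

print(" ".join(od_x_words(sjis)))
print(len(sjis), len(utf8))
```

The byte pairs appear swapped relative to the Shift_JIS codes (0x8443 prints as 4384) only because of the word-wise little-endian display.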
* Re: More Cyrillic vs UTF-8
2003-04-25 16:35 More Cyrillic vs UTF-8 Simon Josefsson
@ 2003-04-25 22:42 ` Eli Zaretskii
2003-04-26 0:26 ` Simon Josefsson
2003-04-26 7:52 ` Kenichi Handa
1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2003-04-25 22:42 UTC (permalink / raw)
Cc: emacs-devel
> From: Simon Josefsson <jas@extundo.com>
> Date: Fri, 25 Apr 2003 18:35:37 +0200
>
> Doesn't Emacs CVS support the entire Unicode repertoire?
It currently only supports the parts of the BMP whose codepoints are
in the ranges 0000-33ff and e000-ffff. It doesn't support anything
beyond that.
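As a rough illustration of what those two ranges mean in practice, a small Python sketch follows. It only models the range rule quoted above and is not how Emacs itself decides; `SUPPORTED_RANGES` and `is_supported` are names invented for the example.

```python
# Ranges quoted in the message above (inclusive code points).
SUPPORTED_RANGES = [(0x0000, 0x33FF), (0xE000, 0xFFFF)]

def is_supported(ch):
    """True if the character's code point lies in one of the quoted ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SUPPORTED_RANGES)

# Cyrillic, Hiragana, a CJK ideograph, a Hangul syllable, a musical symbol.
for ch in ["Г", "あ", "中", "한", "\U0001D11E"]:
    print("U+%04X" % ord(ch), is_supported(ch))
```

Under this rule Cyrillic and Kana are inside the ranges, while CJK ideographs, Hangul syllables, and everything above U+FFFF are outside.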
* Re: More Cyrillic vs UTF-8
2003-04-25 22:42 ` Eli Zaretskii
@ 2003-04-26 0:26 ` Simon Josefsson
2003-04-26 13:45 ` Richard Stallman
0 siblings, 1 reply; 25+ messages in thread
From: Simon Josefsson @ 2003-04-26 0:26 UTC (permalink / raw)
Cc: emacs-devel
"Eli Zaretskii" <eliz@elta.co.il> writes:
>> From: Simon Josefsson <jas@extundo.com>
>> Date: Fri, 25 Apr 2003 18:35:37 +0200
>>
>> Doesn't Emacs CVS support the entire Unicode repertoire?
>
> It currently only supports the parts of the BMP whose codepoints are
> in the ranges 0000-33ff and e000-ffff. It doesn't support anything
> beyond that.
Could we add that information to the PROBLEMS file?
--- PROBLEMS.~1.147.~ Tue Feb 4 16:44:10 2003
+++ PROBLEMS Sat Apr 26 02:26:21 2003
@@ -27,6 +27,11 @@
mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\
mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1
+* Some Unicode characters are not supported.
+
+Emacs currently only supports the parts of the BMP whose codepoints
+are in the ranges 0000-33ff and e000-ffff.
+
* Problems with file dialogs in Emacs built with Open Motif.
When Emacs 21 is built with Open Motif 2.1, it can happen that the
* Re: More Cyrillic vs UTF-8
2003-04-25 16:35 More Cyrillic vs UTF-8 Simon Josefsson
2003-04-25 22:42 ` Eli Zaretskii
@ 2003-04-26 7:52 ` Kenichi Handa
2003-04-26 11:54 ` Simon Josefsson
1 sibling, 1 reply; 25+ messages in thread
From: Kenichi Handa @ 2003-04-26 7:52 UTC (permalink / raw)
Cc: emacs-devel
In article <ilu4r4m357q.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> (Same configuration as last mail)
> Cut'n'paste the following string into a new file and save it:
> Горбачев
> UTF-8 isn't shown as an option, and indeed selecting UTF-8 destroys
> the data. Doesn't Emacs CVS support the entire Unicode repertoire?
> (The string above, encoded as shift_jis, is, according to od -x:
> 0000000 4384 8084 8284 7184 7084 8984 7584 7284)
Those characters belong to the charset japanese-jisx0208,
and the current Emacs still can't encode them into UTF-8.
How did you get such characters?
---
Ken'ichi HANDA
handa@m17n.org
* Re: More Cyrillic vs UTF-8
2003-04-26 7:52 ` Kenichi Handa
@ 2003-04-26 11:54 ` Simon Josefsson
0 siblings, 0 replies; 25+ messages in thread
From: Simon Josefsson @ 2003-04-26 11:54 UTC (permalink / raw)
Cc: emacs-devel
Kenichi Handa <handa@m17n.org> writes:
> In article <ilu4r4m357q.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> (Same configuration as last mail)
>> Cut'n'paste the following string into a new file and save it:
>
>> Горбачев
>
>> UTF-8 isn't shown as an option, and indeed selecting UTF-8 destroys
>> the data. Doesn't Emacs CVS support the entire Unicode repertoire?
>
>> (The string above, encoded as shift_jis, is, according to od -x:
>> 0000000 4384 8084 8284 7184 7084 8984 7584 7284)
>
> Those characters belong to the charset japanese-jisx0208,
> and the current Emacs still can't encode them into UTF-8.
>
> How did you get such characters?
That may be interesting by itself. Go to
http://www.nns.ru/persons/gorbach.html using galeon (or mozilla, I
think). Cut'n'paste the first word and yank it in Emacs. It looks
single-width in galeon, but when yanked into emacs it becomes double
width. Yanking it into xterm or gnome-terminal doesn't change the
string; it still looks single-width. Save the HTML file and open it in
emacs as a koi8 file (note that emacs doesn't auto-detect it as koi8,
so you have to do that manually), and it is single-width too.
I guess it is the emacs X cut'n'paste code that somehow makes the
string into double width japanese characters.
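A small Python sketch, independent of Emacs, of why the same word can show up narrow or wide. It shows that the string has a one-byte-per-character form in KOI8-R and a two-byte-per-character form in Shift_JIS (JIS X 0208), and that Unicode classifies these letters as East Asian "ambiguous" width. It does not show what the Emacs X selection code actually does; that part remains a guess.

```python
import unicodedata

TEXT = "Горбачев"

koi8 = TEXT.encode("koi8_r")     # one byte per character
sjis = TEXT.encode("shift_jis")  # two bytes per character

# 'A' means "ambiguous": narrow in most contexts, wide in East Asian
# legacy contexts such as a JIS X 0208 font.
widths = {unicodedata.east_asian_width(ch) for ch in TEXT}

print(len(koi8), len(sjis), widths)
```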
* Re: More Cyrillic vs UTF-8
2003-04-26 0:26 ` Simon Josefsson
@ 2003-04-26 13:45 ` Richard Stallman
2003-04-26 14:15 ` Simon Josefsson
0 siblings, 1 reply; 25+ messages in thread
From: Richard Stallman @ 2003-04-26 13:45 UTC (permalink / raw)
Cc: emacs-devel
> Could we add that information to the PROBLEMS file?
>
> --- PROBLEMS.~1.147.~ Tue Feb 4 16:44:10 2003
> +++ PROBLEMS Sat Apr 26 02:26:21 2003
> @@ -27,6 +27,11 @@
> mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\
> mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1
>
> +* Some Unicode characters are not supported.
> +
> +Emacs currently only supports the parts of the BMP whose codepoints
> +are in the ranges 0000-33ff and e000-ffff.
> +
Mentioning this in PROBLEMS seems like a good idea to me, but a useful
entry needs to be stated in terms of what behavior the user sees.
This text doesn't explain the practical consequences; a user would say
"so what does that mean for me?"
* Re: More Cyrillic vs UTF-8
2003-04-26 13:45 ` Richard Stallman
@ 2003-04-26 14:15 ` Simon Josefsson
2003-04-26 20:19 ` Kai Großjohann
2003-04-28 4:37 ` Richard Stallman
0 siblings, 2 replies; 25+ messages in thread
From: Simon Josefsson @ 2003-04-26 14:15 UTC (permalink / raw)
Cc: emacs-devel
Richard Stallman <rms@gnu.org> writes:
> Could we add that information to the PROBLEMS file?
>
> --- PROBLEMS.~1.147.~ Tue Feb 4 16:44:10 2003
> +++ PROBLEMS Sat Apr 26 02:26:21 2003
> @@ -27,6 +27,11 @@
> mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\
> mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1
>
> +* Some Unicode characters are not supported.
> +
> +Emacs currently only supports the parts of the BMP whose codepoints
> +are in the ranges 0000-33ff and e000-ffff.
> +
>
> Mentioning this in PROBLEMS seems like a good idea to me, but a useful
> entry needs to be stated in terms of what behavior the user sees.
> This text doesn't explain the practical consequences; a user would say
> "so what does that mean for me?"
Is this better? This was the behaviour I got when trying to save the
data; I had specified that the coding system for saving should be
utf-8 but when I tried to save the buffer Emacs was unable to encode
the characters and suggested shift_jis (etc) instead.
--- PROBLEMS.~1.147.~ Tue Feb 4 16:44:10 2003
+++ PROBLEMS Sat Apr 26 16:13:07 2003
@@ -27,6 +27,13 @@
mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\
mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1
+* Encoding some characters as Unicode is rejected by Emacs.
+
+Emacs currently only supports the parts of the BMP whose codepoints
+are in the ranges 0000-33ff and e000-ffff. If you try to save a file
+containing characters with code points outside this range, Emacs will
+suggest other compatible coding systems.
+
* Problems with file dialogs in Emacs built with Open Motif.
When Emacs 21 is built with Open Motif 2.1, it can happen that the
* Re: More Cyrillic vs UTF-8
2003-04-26 14:15 ` Simon Josefsson
@ 2003-04-26 20:19 ` Kai Großjohann
2003-04-26 21:16 ` Simon Josefsson
2003-04-28 4:37 ` Richard Stallman
1 sibling, 1 reply; 25+ messages in thread
From: Kai Großjohann @ 2003-04-26 20:19 UTC (permalink / raw)
Simon Josefsson <jas@extundo.com> writes:
> Richard Stallman <rms@gnu.org> writes:
>
>> Mentioning this in PROBLEMS seems like a good idea to me, but a useful
>> entry needs to be stated in terms of what behavior the user sees.
>> This text doesn't explain the practical consequences; a user would say
>> "so what does that mean for me?"
>
> Is this better?
Can you say what characters you're talking about, instead of just the
code points? I guess that most people haven't memorized the Unicode
table (yours truly included ;-).
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: More Cyrillic vs UTF-8
2003-04-26 20:19 ` Kai Großjohann
@ 2003-04-26 21:16 ` Simon Josefsson
2003-04-26 21:29 ` Kai Großjohann
0 siblings, 1 reply; 25+ messages in thread
From: Simon Josefsson @ 2003-04-26 21:16 UTC (permalink / raw)
kai.grossjohann@gmx.net (Kai Großjohann) writes:
> Simon Josefsson <jas@extundo.com> writes:
>
>> Richard Stallman <rms@gnu.org> writes:
>>
>>> Mentioning this in PROBLEMS seems like a good idea to me, but a useful
>>> entry needs to be stated in terms of what behavior the user sees.
>>> This text doesn't explain the practical consequences; a user would say
>>> "so what does that mean for me?"
>>
>> Is this better?
>
> Can you say what characters you're talking about, instead of just the
> code points? I guess that most people haven't memorized the Unicode
> table (yours truly included ;-).
I agree, but I don't know which they are, and maybe the range includes
many different kinds of characters. And as new characters are
added all the time, I fear that both the list of supported characters
and the list of unsupported characters would be too long to be useful.
Hm.
* Re: More Cyrillic vs UTF-8
2003-04-26 21:16 ` Simon Josefsson
@ 2003-04-26 21:29 ` Kai Großjohann
2003-04-26 21:47 ` Simon Josefsson
0 siblings, 1 reply; 25+ messages in thread
From: Kai Großjohann @ 2003-04-26 21:29 UTC (permalink / raw)
Simon Josefsson <jas@extundo.com> writes:
> kai.grossjohann@gmx.net (Kai Großjohann) writes:
>
>> Simon Josefsson <jas@extundo.com> writes:
>>
>>> Richard Stallman <rms@gnu.org> writes:
>>>
>>>> Mentioning this in PROBLEMS seems like a good idea to me, but a useful
>>>> entry needs to be stated in terms of what behavior the user sees.
>>>> This text doesn't explain the practical consequences; a user would say
>>>> "so what does that mean for me?"
>>>
>>> Is this better?
>>
>> Can you say what characters you're talking about, instead of just the
>> code points? I guess that most people haven't memorized the Unicode
>> table (yours truly included ;-).
>
> I agree, but I don't know which they are, and maybe the range includes
> many different kinds of characters. And as new characters are
> added all the time, I fear that both the list of supported characters
> and the list of unsupported characters would be too long to be useful.
> Hm.
Well, isn't Unicode divided into blocks so that one can list the
blocks? Hm. Oh! See http://www.unicode.org/charts/ -- looks quite
promising. Searching for the code blocks there and then giving the
names ought to be useful. WDYT?
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: More Cyrillic vs UTF-8
2003-04-26 21:29 ` Kai Großjohann
@ 2003-04-26 21:47 ` Simon Josefsson
2003-04-27 8:37 ` Kai Großjohann
2003-04-28 4:37 ` Richard Stallman
0 siblings, 2 replies; 25+ messages in thread
From: Simon Josefsson @ 2003-04-26 21:47 UTC (permalink / raw)
kai.grossjohann@gmx.net (Kai Großjohann) writes:
> Simon Josefsson <jas@extundo.com> writes:
>
>> kai.grossjohann@gmx.net (Kai Großjohann) writes:
>>
>>> Simon Josefsson <jas@extundo.com> writes:
>>>
>>>> Richard Stallman <rms@gnu.org> writes:
>>>>
>>>>> Mentioning this in PROBLEMS seems like a good idea to me, but a useful
>>>>> entry needs to be stated in terms of what behavior the user sees.
>>>>> This text doesn't explain the practical consequences; a user would say
>>>>> "so what does that mean for me?"
>>>>
>>>> Is this better?
>>>
>>> Can you say what characters you're talking about, instead of just the
>>> code points? I guess that most people haven't memorized the Unicode
>>> table (yours truly included ;-).
>>
>> I agree, but I don't know which they are, and maybe the range includes
>> many different kinds of characters. And as new characters are
>> added all the time, I fear that both the list of supported characters
>> and the list of unsupported characters would be too long to be useful.
>> Hm.
>
> Well, isn't Unicode divided into blocks so that one can list the
> blocks? Hm. Oh! See http://www.unicode.org/charts/ -- looks quite
> promising. Searching for the code blocks there and then giving the
> names ought to be useful. WDYT?
The compiled list is below. Does it really help anyone to list all of
them?
Supported:
Basic Latin                           Optical Character Recognition
Latin-1 Supplement                    Enclosed Alphanumerics
Latin Extended-A                      Box Drawing
Latin Extended-B                      Block Elements
IPA Extensions                        Geometric Shapes
Spacing Modifier Letters              Miscellaneous Symbols
Combining Diacritical Marks           Dingbats
Greek                                 Miscellaneous Mathematical Symbols-A
Cyrillic                              Supplemental Arrows-A
Cyrillic Supplement                   Braille Patterns
Armenian                              Supplemental Arrows-B
Hebrew                                Miscellaneous Mathematical Symbols-B
Arabic                                Supplemental Mathematical Operators
Syriac                                CJK Radicals Supplement
Thaana                                Kangxi Radicals
Devanagari                            Ideographic Description Characters
Bengali                               CJK Symbols and Punctuation
Gurmukhi                              Hiragana
Gujarati                              Katakana
Oriya                                 Bopomofo
Tamil                                 Hangul Compatibility Jamo
Telugu                                Kanbun
Kannada                               Bopomofo Extended
Malayalam                             Enclosed CJK Letters and Months
Sinhala                               CJK Compatibility
Thai
Lao
Tibetan
Myanmar
Georgian
Hangul Jamo
Ethiopic
Cherokee                              Private Use Area
Unified Canadian Aboriginal Syllabic  CJK Compatibility Ideographs
Ogham                                 Alphabetic Presentation Forms
Runic                                 Arabic Presentation Forms-A
Tagalog                               Variation Selectors
Hanunoo                               Combining Half Marks
Buhid                                 CJK Compatibility Forms
Tagbanwa                              Small Form Variants
Khmer                                 Arabic Presentation Forms-B
Mongolian                             Halfwidth and Fullwidth Forms
Latin Extended Additional             Specials
Greek Extended
General Punctuation
Superscripts and Subscripts
Currency Symbols
Combining Marks for Symbols
Letterlike Symbols
Number Forms
Arrows
Mathematical Operators
Miscellaneous Technical
Control Pictures
Unsupported:
CJK Unified Ideographs Extension A (1.5MB)
CJK Unified Ideographs (5MB)
Yi Syllables
Yi Radicals
Hangul Syllables (7MB)
High Surrogates
Low Surrogates
Old Italic
Gothic
Deseret
Byzantine Musical Symbols
Musical Symbols
Mathematical Alphanumeric Symbols
CJK Unified Ideographs Extension B (13MB)
CJK Compatibility Ideographs Supplement
Tags
Supplementary Private Use Area-A
Supplementary Private Use Area-B
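A list like the one above can be derived mechanically by comparing block ranges with the two supported ranges. The sketch below does that in Python for a hand-picked subset of blocks. The ranges are the ones assigned by the Unicode Standard; `BLOCKS`, `SUPPORTED_RANGES`, and `block_supported` are names invented for the example, and it is not the method used to compile the list in this message.

```python
# Ranges quoted earlier in the thread (inclusive code points).
SUPPORTED_RANGES = [(0x0000, 0x33FF), (0xE000, 0xFFFF)]

# A hand-picked subset of Unicode blocks with their first and last code points.
BLOCKS = {
    "Cyrillic": (0x0400, 0x04FF),
    "CJK Compatibility": (0x3300, 0x33FF),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DBF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Yi Syllables": (0xA000, 0xA48F),
    "Hangul Syllables": (0xAC00, 0xD7AF),
    "Private Use Area": (0xE000, 0xF8FF),
    "Gothic": (0x10330, 0x1034F),
    "Musical Symbols": (0x1D100, 0x1D1FF),
    "Mathematical Alphanumeric Symbols": (0x1D400, 0x1D7FF),
}

def block_supported(first, last):
    """True if the whole block lies inside one supported range."""
    return any(lo <= first and last <= hi for lo, hi in SUPPORTED_RANGES)

supported = [name for name, (a, b) in BLOCKS.items() if block_supported(a, b)]
unsupported = [name for name, (a, b) in BLOCKS.items() if not block_supported(a, b)]

print("Supported:  ", ", ".join(supported))
print("Unsupported:", ", ".join(unsupported))
```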
* Re: More Cyrillic vs UTF-8
2003-04-26 21:47 ` Simon Josefsson
@ 2003-04-27 8:37 ` Kai Großjohann
2003-04-28 12:35 ` Kenichi Handa
2003-04-28 23:38 ` Richard Stallman
2003-04-28 4:37 ` Richard Stallman
1 sibling, 2 replies; 25+ messages in thread
From: Kai Großjohann @ 2003-04-27 8:37 UTC (permalink / raw)
Simon Josefsson <jas@extundo.com> writes:
> Unsupported:
>
> CJK Unified Ideographs Extension A (1.5MB)
> CJK Unified Ideographs (5MB)
> Yi Syllables
> Yi Radicals
> Hangul Syllables (7MB)
> High Surrogates
> Low Surrogates
> Old Italic
> Gothic
> Deseret
> Byzantine Musical Symbols
> Musical Symbols
> Mathematical Alphanumeric Symbols
> CJK Unified Ideographs Extension B (13MB)
> CJK Compatibility Ideographs Supplement
> Tags
> Supplementary Private Use Area-A
> Supplementary Private Use Area-B
It seems that these might be summarized by CJK, Music, Maths, Private
Use Area.
WDYT?
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: More Cyrillic vs UTF-8
2003-04-26 14:15 ` Simon Josefsson
2003-04-26 20:19 ` Kai Großjohann
@ 2003-04-28 4:37 ` Richard Stallman
1 sibling, 0 replies; 25+ messages in thread
From: Richard Stallman @ 2003-04-28 4:37 UTC (permalink / raw)
Cc: emacs-devel
> +* Encoding some characters as Unicode is rejected by Emacs.
> +
> +Emacs currently only supports the parts of the BMP whose codepoints
> +are in the ranges 0000-33ff and e000-ffff. If you try to save a file
> +containing characters with code points outside this range, Emacs will
> +suggest other compatible coding systems.
That is clearer; it's written in terms of behavior the user sees.
I agree with the people who said that the codepoint numbers may not
be clear enough.
* Re: More Cyrillic vs UTF-8
2003-04-26 21:47 ` Simon Josefsson
2003-04-27 8:37 ` Kai Großjohann
@ 2003-04-28 4:37 ` Richard Stallman
1 sibling, 0 replies; 25+ messages in thread
From: Richard Stallman @ 2003-04-28 4:37 UTC (permalink / raw)
Cc: emacs-devel
> The compiled list is below. Does it really help anyone to list all of
> them?
The list of unsupported ones is not too long.
Listing them might be compact and useful.
* Re: More Cyrillic vs UTF-8
2003-04-27 8:37 ` Kai Großjohann
@ 2003-04-28 12:35 ` Kenichi Handa
2003-04-28 23:08 ` Simon Josefsson
` (2 more replies)
2003-04-28 23:38 ` Richard Stallman
1 sibling, 3 replies; 25+ messages in thread
From: Kenichi Handa @ 2003-04-28 12:35 UTC (permalink / raw)
Cc: jas
In article <8465p0l4jp.fsf@lucy.is.informatik.uni-duisburg.de>, kai.grossjohann@gmx.net (Kai Großjohann) writes:
> Simon Josefsson <jas@extundo.com> writes:
>> Unsupported:
>>
>> CJK Unified Ideographs Extension A (1.5MB)
>> CJK Unified Ideographs (5MB)
[...]
>> Supplementary Private Use Area-A
>> Supplementary Private Use Area-B
> It seems that these might be summarized by CJK, Music, Maths, Private
> Use Area.
Private Use Area in U+E000..U+F8FF are supported.
Richard Stallman <rms@gnu.org> writes:
> +* Encoding some characters as Unicode is rejected by Emacs.
> +
> +Emacs currently only supports the parts of the BMP whose codepoints
> +are in the ranges 0000-33ff and e000-ffff. If you try to save a file
> +containing characters with code points outside this range, Emacs will
> +suggest other compatible coding systems.
> That is clearer; it's written in terms of behavior the user sees.
> I agree with the people who said that the codepoint numbers may not
> be clear enough.
Perhaps it is better to mention utf-translate-cjk mode, like this:
* Encoding some characters as Unicode (UTF-8) is rejected by Emacs.
Emacs currently, by default, only supports the parts of the
BMP whose codepoints are in the ranges 0000-33ff and
e000-ffff. This excludes CJK, Yi, Music, and Maths.
If you try to save a file containing characters with code
points outside this range, Emacs will suggest other
compatible coding systems.
By turning Utf-Translate-Cjk mode on, many more CJK
characters are included in the support.
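To make the described behavior concrete, here is a rough Python model of "reject UTF-8, suggest other coding systems". It is only an analogy: it applies the code-point range rule from the entry and then tries a fixed list of Python codecs. It is not Emacs's implementation, and the function names and the candidate list are assumptions made for the example.

```python
# Ranges quoted in the proposed PROBLEMS entry (inclusive code points).
SUPPORTED_RANGES = [(0x0000, 0x33FF), (0xE000, 0xFFFF)]

def can_save_as_utf8(text):
    """True if every code point falls in the ranges quoted in the entry."""
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in SUPPORTED_RANGES) for ch in text
    )

def suggest_codecs(text, candidates):
    """Return the candidate codecs that can encode the whole text."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this codec cannot represent some character
        usable.append(name)
    return usable

CANDIDATES = ["latin-1", "koi8_r", "shift_jis", "euc_jp"]  # arbitrary sample list
text = "漢字"
if not can_save_as_utf8(text):
    print("suggest:", suggest_codecs(text, CANDIDATES))
```

Note that the case that started this thread is not captured by the range rule alone: the Cyrillic letters are inside the ranges, but Emacs held them in the japanese-jisx0208 charset.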
---
Ken'ichi HANDA
handa@m17n.org
* Re: More Cyrillic vs UTF-8
2003-04-28 12:35 ` Kenichi Handa
@ 2003-04-28 23:08 ` Simon Josefsson
2003-04-29 16:51 ` Kai Großjohann
2003-04-29 5:39 ` Richard Stallman
[not found] ` <87llxusaj9.fsf@gnu.org>
2 siblings, 1 reply; 25+ messages in thread
From: Simon Josefsson @ 2003-04-28 23:08 UTC (permalink / raw)
Cc: kai.grossjohann
Kenichi Handa <handa@m17n.org> writes:
> Richard Stallman <rms@gnu.org> writes:
>> +* Encoding some characters as Unicode is rejected by Emacs.
>> +
>> +Emacs currently only supports the parts of the BMP whose codepoints
>> +are in the ranges 0000-33ff and e000-ffff. If you try to save a file
>> +containing characters with code points outside this range, Emacs will
>> +suggest other compatible coding systems.
>
>> That is clearer; it's written in terms of behavior the user sees.
>> I agree with the people who said that the codepoint numbers may not
>> be clear enough.
>
> Perhaps it is better to mention utf-translate-cjk mode, like this:
>
> * Encoding some characters as Unicode (UTF-8) is rejected by Emacs.
>
> Emacs currently, by default, only supports the parts of the
> BMP whose codepoints are in the ranges 0000-33ff and
> e000-ffff. This excludes CJK, Yi, Music, and Maths.
>
> If you try to save a file containing characters with code
> points outside this range, Emacs will suggest other
> compatible coding systems.
>
> By turning Utf-Translate-Cjk mode on, many more CJK
> characters are included in the support.
This looks good.
As for utf-translate-cjk, it does sound like that functionality
should be enabled by default. Is the only problem that loading them
is slow? Perhaps it can be loaded lazily?
* Re: More Cyrillic vs UTF-8
2003-04-27 8:37 ` Kai Großjohann
2003-04-28 12:35 ` Kenichi Handa
@ 2003-04-28 23:38 ` Richard Stallman
2003-04-29 16:17 ` Benjamin Riefenstahl
1 sibling, 1 reply; 25+ messages in thread
From: Richard Stallman @ 2003-04-28 23:38 UTC (permalink / raw)
Cc: emacs-devel
>> Unsupported:
>>
>> CJK Unified Ideographs Extension A (1.5MB)
>> CJK Unified Ideographs (5MB)
>> Yi Syllables
>> Yi Radicals
>> Hangul Syllables (7MB)
>> High Surrogates
>> Low Surrogates
>> Old Italic
>> Gothic
>> Deseret
>> Byzantine Musical Symbols
>> Musical Symbols
>> Mathematical Alphanumeric Symbols
>> CJK Unified Ideographs Extension B (13MB)
>> CJK Compatibility Ideographs Supplement
>> Tags
>> Supplementary Private Use Area-A
>> Supplementary Private Use Area-B
>
> It seems that these might be summarized by CJK, Music, Maths, Private
> Use Area.
I don't know what "Surrogates" are. Also, Old Italic and Gothic do not
fit in that list. What are "Tags"?
Also, I am not sure whether ALL CJK characters are included here.
For instance, are Hangul letters included here?
* Re: More Cyrillic vs UTF-8
2003-04-28 12:35 ` Kenichi Handa
2003-04-28 23:08 ` Simon Josefsson
@ 2003-04-29 5:39 ` Richard Stallman
2003-04-29 13:36 ` Simon Josefsson
[not found] ` <87llxusaj9.fsf@gnu.org>
2 siblings, 1 reply; 25+ messages in thread
From: Richard Stallman @ 2003-04-29 5:39 UTC (permalink / raw)
Cc: jas
> Perhaps it is better to mention utf-translate-cjk mode, like this:
>
> * Encoding some characters as Unicode (UTF-8) is rejected by Emacs.
>
> Emacs currently, by default, only supports the parts of the
> BMP whose codepoints are in the ranges 0000-33ff and
> e000-ffff. This excludes CJK, Yi, Music, and Maths.
>
> If you try to save a file containing characters with code
> points outside this range, Emacs will suggest other
> compatible coding systems.
>
> By turning Utf-Translate-Cjk mode on, many more CJK
> characters are included in the support.
Please install that now, even though it may require a little
further modification.
* Re: More Cyrillic vs UTF-8
2003-04-29 5:39 ` Richard Stallman
@ 2003-04-29 13:36 ` Simon Josefsson
0 siblings, 0 replies; 25+ messages in thread
From: Simon Josefsson @ 2003-04-29 13:36 UTC (permalink / raw)
Cc: Kenichi Handa
Richard Stallman <rms@gnu.org> writes:
> Perhaps it is better to mention utf-translate-cjk mode, like this:
>
> * Encoding some characters as Unicode (UTF-8) is rejected by Emacs.
>
> Emacs currently, by default, only supports the parts of the
> BMP whose codepoints are in the ranges 0000-33ff and
> e000-ffff. This excludes CJK, Yi, Music, and Maths.
>
> If you try to save a file containing characters with code
> points outside this range, Emacs will suggest other
> compatible coding systems.
>
> By turning Utf-Translate-Cjk mode on, many more CJK
> characters are included in the support.
>
> Please install that now, even though it may require a little
> further modification.
I added it. I changed UTF-8 into UTF-8/16, since I assume the same
holds for UTF-16.
* Re: More Cyrillic vs UTF-8
2003-04-28 23:38 ` Richard Stallman
@ 2003-04-29 16:17 ` Benjamin Riefenstahl
2003-04-30 5:43 ` Richard Stallman
0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Riefenstahl @ 2003-04-29 16:17 UTC (permalink / raw)
Cc: Kai Großjohann
Hi Richard,
> > Unsupported:
> >
> > CJK Unified Ideographs Extension A (1.5MB)
> > CJK Unified Ideographs (5MB)
> > Yi Syllables
> > Yi Radicals
> > Hangul Syllables (7MB)
> > High Surrogates
> > Low Surrogates
> > Old Italic
> > Gothic
> > Deseret
> > Byzantine Musical Symbols
> > Musical Symbols
> > Mathematical Alphanumeric Symbols
> > CJK Unified Ideographs Extension B (13MB)
> > CJK Compatibility Ideographs Supplement
> > Tags
> > Supplementary Private Use Area-A
> > Supplementary Private Use Area-B
>
> It seems that these might be summarized by CJK, Music, Maths,
> Private Use Area.
Richard Stallman <rms@gnu.org> writes:
> I don't know what "Surrogates" are. Also, Old Italic and Gothic do
> not fit in that list. What are "Tags"?
"Surrogates" are the codes that are used in UTF-16 to encode
characters with code points above \uFFFF.
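A short worked example of that mechanism in Python, for readers who have not seen it. The arithmetic is the standard UTF-16 rule; the function name `to_surrogates` is invented for the sketch, and U+1D11E (MUSICAL SYMBOL G CLEF) is just a convenient test character from one of the unsupported blocks.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low

high, low = to_surrogates(0x1D11E)
print("%04X %04X" % (high, low))

# Cross-check against Python's own UTF-16 encoder.
expected = chr(0x1D11E).encode("utf-16-be")
print(expected == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF]))
```

The surrogate code points themselves (D800-DFFF) never stand for characters on their own, which is why they appear as blocks in the list above.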
"Tags" are codes used for in-band language tagging.
> Also, I am not sure whether ALL CJK characters are included here.
> For instance, are Hangul letters included here?
Kana, Bopomofo and some CJK compatibility and special symbols are
below \u33FF and/or above \uE000, but the major part of the CJK and
all of Hangul is unsupported.
so long, benny
* Re: More Cyrillic vs UTF-8
2003-04-28 23:08 ` Simon Josefsson
@ 2003-04-29 16:51 ` Kai Großjohann
2003-04-29 20:00 ` Robert J. Chassell
0 siblings, 1 reply; 25+ messages in thread
From: Kai Großjohann @ 2003-04-29 16:51 UTC (permalink / raw)
Simon Josefsson <jas@extundo.com> writes:
> As for utf-translate-cjk, it does sound like that functionality
> should be enabled by default. Is the only problem that loading them
> is slow? Perhaps it can be loaded lazily?
I don't think it can be done lazily. I'm sure that Dave would have
done that, if possible.
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: More Cyrillic vs UTF-8
2003-04-29 16:51 ` Kai Großjohann
@ 2003-04-29 20:00 ` Robert J. Chassell
0 siblings, 0 replies; 25+ messages in thread
From: Robert J. Chassell @ 2003-04-29 20:00 UTC (permalink / raw)
By the way, Sergey Fleytin <fleytin@mail.ru> just posted a message to
the Emacspeak mailing list that he is using a version of Emacspeak
that converts Cyrillic text to spoken Russian.
I don't know how good this is, nor its licenses (I asked him), but
you might want to listen as well as read Russian.
The FTP site is:
ftp://ftp.rakurs.spb.ru/pub/Goga/
Sergey says
I am using emacspeak with a so called 'multilingual server'. It
was written by one of the Russian programmers, who also wrote a
Russian tts engine for it. This server uses freephone&mbrola for
English and ru_tts for Russian. Moreover, that person also
produced a special installation cd-rom called 'slackspeak'. On
that disk one would find pre-installed, ready to use,
speech-enabled linux system. It uses emacspeak as a speech
interface and uses only software synth for output. If the system
fails to recognize your sound card, the output is directed to the
pc-speaker. You can either boot directly from that cd or start it
from a DOS prompt.
--
Robert J. Chassell Rattlesnake Enterprises
http://www.rattlesnake.com GnuPG Key ID: 004B4AC8
http://www.teak.cc bob@rattlesnake.com
* Re: More Cyrillic vs UTF-8
2003-04-29 16:17 ` Benjamin Riefenstahl
@ 2003-04-30 5:43 ` Richard Stallman
2003-04-30 8:01 ` Kai Großjohann
0 siblings, 1 reply; 25+ messages in thread
From: Richard Stallman @ 2003-04-30 5:43 UTC (permalink / raw)
Cc: kai.grossjohann
So, how should we amend the list "CJK, Music, Maths, Private Use
Area"? Is adding "Gothic and Old Italic" enough?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: More Cyrillic vs UTF-8
2003-04-30 5:43 ` Richard Stallman
@ 2003-04-30 8:01 ` Kai Großjohann
0 siblings, 0 replies; 25+ messages in thread
From: Kai Großjohann @ 2003-04-30 8:01 UTC (permalink / raw)
Richard Stallman <rms@gnu.org> writes:
> So, how should we amend the list "CJK, Music, Maths, Private Use
> Area"? Is adding "Gothic and Old Italic" enough?
I guess "CJK, Music, Maths, Private Use Area, Gothic, and Old Italic"
is good enough.
But don't delete the code points -- then Unicode experts know the
full story.
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: More Cyrillic vs UTF-8
[not found] ` <87llxusaj9.fsf@gnu.org>
@ 2003-05-01 11:27 ` Kenichi Handa
0 siblings, 0 replies; 25+ messages in thread
From: Kenichi Handa @ 2003-05-01 11:27 UTC (permalink / raw)
Cc: jas
In article <87llxusaj9.fsf@gnu.org>, Alex Schroeder <alex@gnu.org> writes:
> I'm attaching the real message as a text file encoded using
> iso-2022-jp, and using the MIME type application/octet-stream. You
> will probably have to save the file and open it using Emacs. :(
The problem described in your real message should be fixed
now. Could you please try again with the latest HEAD?
---
Ken'ichi HANDA
handa@m17n.org