Cyrillic vs UTF-8

unofficial mirror of emacs-devel@gnu.org 

* Cyrillic vs UTF-8
@ 2003-04-25 16:12 Simon Josefsson
  2003-04-25 16:40 ` Eli Zaretskii
  2003-04-25 16:54 ` Simon Josefsson
  0 siblings, 2 replies; 55+ messages in thread
From: Simon Josefsson @ 2003-04-25 16:12 UTC (permalink / raw)


$ emacs -q --no-site-file
C-h H   (view HELLO file)
Mark the line with Russian text with mouse
q   (quit HELLO file)
C-x C-f ff RET   (open a new file)
C-y              (yank the text, looks fine in the new buffer)
C-x C-s          (save file, it complains that iso-latin-1 cannot
                  encode the data, and suggests utf-8)
RET              (go with the default utf-8)
C-x C-k          (kill buffer)
C-x C-f ff RET   (open file again)
                 (emacs fails to recognize it as utf-8 and displays gibberish)
C-x C-k          (kill buffer)
C-x RET c utf-8 RET C-x C-f ff RET (open file as utf-8)
                 (emacs recognizes the file as utf-8 but displays empty boxes)

Pressing C-u C-x = on the first empty box (first non-ascii character)
shows:

  character: Р (01212100, 332864, 0x51440)
    charset: mule-unicode-0100-24ff
	     (Unicode characters of the range U+0100..U+24FF.)
 code point: 40 64
     syntax: w 	which means: word
   category: y:Cyrillic  
buffer code: 0x9C 0xF4 0xA8 0xC0
  file code: 0xD0 0xA0 (encoded by coding system mule-utf-8-unix)
    Unicode: 0420
       font: -Adobe-Courier-Medium-R-Normal--17-120-100-100-M-100-ISO10646-1
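
As a quick cross-check of the "file code" and "Unicode" lines above, here is
a minimal sketch in plain Python (outside Emacs; the variable names are
illustrative assumptions).  It decodes the two bytes by hand and with the
standard decoder:

```python
import unicodedata

# Bytes shown as "file code" for the first boxed character.
file_code = bytes([0xD0, 0xA0])

# A two-byte UTF-8 sequence has the form 110xxxxx 10yyyyyy;
# the code point is the five x bits followed by the six y bits.
by_hand = ((file_code[0] & 0x1F) << 6) | (file_code[1] & 0x3F)

# The standard strict decoder should agree.
char = file_code.decode("utf-8")

print(f"U+{by_hand:04X}")             # U+0420
print(unicodedata.name(char))         # CYRILLIC CAPITAL LETTER ER
print(0x0100 <= ord(char) <= 0x24FF)  # True: inside mule-unicode-0100-24ff
```

This suggests the bytes on disk are well-formed UTF-8 and that the empty box
comes from the font lacking the glyph.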

I think there are two problems.  Opening the file the first time
should guess it is a utf-8 file.  Secondly, emacs should be able to
find a font that contains the characters -- I have all font packages
from Debian installed.  The following works fine:

-Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1

In GNU Emacs 21.3.50.12 (i686-pc-linux-gnu)
 of 2003-04-25 on latte.josefsson.org
configured using `configure '--with-gtk''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: en_US.UTF-8
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Recent input:
M-x r e p o r <tab> <return>

Recent messages:
(emacs -q)
Loading tool-bar...done
Loading image...done
Loading tooltip...done
For information about the GNU Project and its goals, type C-h C-p.
Loading emacsbug...done

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: Cyrillic vs UTF-8
  2003-04-25 16:12 Cyrillic vs UTF-8 Simon Josefsson
@ 2003-04-25 16:40 ` Eli Zaretskii
  2003-04-25 17:09   ` Simon Josefsson
  2003-04-25 16:54 ` Simon Josefsson
  1 sibling, 1 reply; 55+ messages in thread
From: Eli Zaretskii @ 2003-04-25 16:40 UTC (permalink / raw)
  Cc: emacs-devel

> From: Simon Josefsson <jas@extundo.com>
> Date: Fri, 25 Apr 2003 18:12:17 +0200
> 
> I think there are two problems.  Opening the file the first time
> should guess it is a utf-8 file.

IIRC, you need to make the priority of utf-8 higher for this to
happen.  Unless that's changed in the current CVS, try evaluating the
following expression:

  (prefer-coding-system 'utf-8)

before you visit a utf-8 encoded file, and see if that helps.  I think
this is because the encoding detection routines cannot distinguish
between Latin-n and utf encoding without some help.

Apologies if the current code base no longer works as I remember.
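
A small illustration of why the detector needs help, written as a Python
sketch rather than Emacs code (the sample text is an assumption): the same
bytes are acceptable in both encodings, so whichever is tried first wins.

```python
# UTF-8 encoding of a short Cyrillic sample (6 letters, 2 bytes each).
data = "Привет".encode("utf-8")

# Every byte value is a valid ISO 8859-1 character, so this never fails;
# it just produces gibberish for UTF-8 input.
as_latin1 = data.decode("latin-1")

# Decoding as UTF-8 recovers the intended text.
as_utf8 = data.decode("utf-8")

print(len(data), len(as_latin1), len(as_utf8))  # 12 12 6
```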


* Re: Cyrillic vs UTF-8
  2003-04-25 16:12 Cyrillic vs UTF-8 Simon Josefsson
  2003-04-25 16:40 ` Eli Zaretskii
@ 2003-04-25 16:54 ` Simon Josefsson
  2003-04-26  3:55   ` Implementing charset-aware X font names [was: Cyrillic vs UTF-8] Stephen J. Turnbull
  2003-04-26  7:59   ` Cyrillic vs UTF-8 Kenichi Handa
  1 sibling, 2 replies; 55+ messages in thread
From: Simon Josefsson @ 2003-04-25 16:54 UTC (permalink / raw)

Simon Josefsson <jas@extundo.com> writes:

> I think there are two problems.  Opening the file the first time
> should guess it is a utf-8 file.  Secondly, emacs should be able to
> find a font that contains the characters -- I have all font packages
> from Debian installed.  The following works fine:
>
> -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1

It seems the second problem was documented in PROBLEMS (see below).
Sorry.

Still, I don't see similar behaviour with, e.g., Mozilla, so wouldn't
it be possible to check which characters exist within the font, and
perhaps change font as appropriate?

It would be nice if there were some more information how to set the
suggested fontset.  Reading the manual I get the impression that
'emacs -fn mule-unicode-...' should work, but it doesn't.  I also
tried setting the emacs.font X resource, but same problem.  Starting
emacs says:

No fonts match `mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1'

I do have GNU unifont (from Debian unstable) installed.

* Characters from the mule-unicode charsets aren't displayed under X.

XFree86 4 contains many fonts in iso10646-1 encoding which have
minimal character repertoires (whereas the encoding is meant to be a
reasonable indication of the repertoire).  Emacs may choose one of
these to display characters from the mule-unicode charsets and then
typically won't be able to find the glyphs to display many characters.
(Check with C-u C-x = .)  To avoid this, you may need to use a fontset
which sets the font for the mule-unicode sets explicitly.  E.g. to use
GNU unifont, include in the fontset spec:

mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,\
mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\
mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1


* Re: Cyrillic vs UTF-8
  2003-04-25 16:40 ` Eli Zaretskii
@ 2003-04-25 17:09   ` Simon Josefsson
  2003-04-25 22:39     ` Eli Zaretskii
                       ` (3 more replies)
  0 siblings, 4 replies; 55+ messages in thread
From: Simon Josefsson @ 2003-04-25 17:09 UTC (permalink / raw)
  Cc: emacs-devel

"Eli Zaretskii" <eliz@elta.co.il> writes:

>> From: Simon Josefsson <jas@extundo.com>
>> Date: Fri, 25 Apr 2003 18:12:17 +0200
>> 
>> I think there are two problems.  Opening the file the first time
>> should guess it is a utf-8 file.
>
> IIRC, you need to make the priority of utf-8 higher for this to
> happen.  Unless that's changed in the current CVS, try evaluating the
> following expression:
>
>   (prefer-coding-system 'utf-8)
>
> before you visit a utf-8 encoded file, and see if that helps.  I think
> this is because the encoding detection routines cannot distinguish
> between Latin-n and utf encoding without some help.

This works, but note that Emacs didn't recognize the file as being in
any encoding without it.  The modeline says '-:--'.

It seems binary is preferred over utf-8 and utf-16-* in
coding-category-list.  This seems extremely conservative.  I guess it
means UTF-8 can never be autodetected by default?  Is the unicode
support so bad it shouldn't even be preferred over binary?  UTF-8 is
well formed and restricted; detecting it properly (even compared to
Latin-n) can be done well enough that failures rarely happen in
practice.

Can't we move binary down below UTF-8 in CVS?  IMHO we should move
UTF-8 earlier still, since determining whether data is UTF-8 or not
can be done with good probability.  Preferring binary over UTF-8 seems
just wrong.
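
To illustrate the claim that UTF-8 is restrictive enough to detect, here is
a rough Python sketch.  It is not the Emacs detector; `looks_like_utf8` is a
hypothetical helper that simply relies on a strict decoder rejecting
malformed sequences.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (pure ASCII also qualifies)."""
    try:
        data.decode("utf-8")  # strict by default: raises on malformed input
    except UnicodeDecodeError:
        return False
    return True


cyrillic_utf8 = "Привет".encode("utf-8")
swedish_latin1 = b"\xe5\xe4\xf6"  # the letters a-ring, a-umlaut, o-umlaut in ISO 8859-1

print(looks_like_utf8(cyrillic_utf8))   # True
print(looks_like_utf8(swedish_latin1))  # False: E5 needs continuation bytes
```

Short Latin-1 texts can still pass by accident (the pair C3 A9 is valid in
both encodings), which is why this is a matter of good probability rather
than certainty.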

There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you
say, but it has been removed both in 21.3 and in CVS.  I thought that
meant UTF-8 was better supported now, but this doesn't seem to be the
case.


* Re: Cyrillic vs UTF-8
  2003-04-25 17:09   ` Simon Josefsson
@ 2003-04-25 22:39     ` Eli Zaretskii
  2003-04-26  8:11     ` Kenichi Handa
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 55+ messages in thread
From: Eli Zaretskii @ 2003-04-25 22:39 UTC (permalink / raw)
  Cc: emacs-devel

> From: Simon Josefsson <jas@extundo.com>
> Date: Fri, 25 Apr 2003 19:09:07 +0200
> 
> There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you
> say, but it has been removed both in 21.3 and in CVS.  I thought that
> meant UTF-8 was better supported now, but this doesn't seem to be the
> case.

"cvs annotate" will show you who removed that entry, and then you can
ask that person for the reasons.


* Implementing charset-aware X font names [was: Cyrillic vs UTF-8]
  2003-04-25 16:54 ` Simon Josefsson
@ 2003-04-26  3:55   ` Stephen J. Turnbull
  2003-04-28 11:09     ` Kenichi Handa
  2003-04-26  7:59   ` Cyrillic vs UTF-8 Kenichi Handa
  1 sibling, 1 reply; 55+ messages in thread
From: Stephen J. Turnbull @ 2003-04-26  3:55 UTC (permalink / raw)


>>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes:

    PROBLEMS> * Characters from the mule-unicode charsets aren't
    PROBLEMS> displayed under X.

    PROBLEMS> XFree86 4 contains many fonts in iso10646-1 encoding
    PROBLEMS> which have minimal character repertoires (whereas the
    PROBLEMS> encoding is meant to be a reasonable indication of the
    PROBLEMS> repertoire).

*sigh*  "iso10646" is not meant to be an indication of repertoire.
See section 13 of the ISO 10646 standard.  It's intended to fix the
ISO 8859 ambiguity.  There is a deficiency in XFree86, but it's not
that the fonts are incomplete (note the word "implicit" in the XLFD
standard, that refers to current national encoding practice at
definition time, not to UCSes); that's gonna happen.  Why should a
Russian font designer provide Thai glyphs?  And what Thai in her right
mind would prefer those over native-designed fonts (without looking at
them)?

Instead, the font names and properties should provide encoding range
specifications instead of the useless "1" (which in ISO 10646-1 is not
an encoding specification, really).  As a first take, I think a
reasonable way to do this would be to specify that for the iso10646
registry the encoding field of an XLFD name should contain a
comma-separated list of Unicode block names, or a comma-separated list
of hex ranges xxxx..yyyy (can't use hyphens for the ranges,
obviously).  As long as the XLFD is otherwise fully-qualified (ie,
contains 14 hyphens), the block name format allows you to query with
"-*-*-*-*-*-*-*-*-*-*-*-*-iso10646-*CYRILLIC*" and guarantee sane
results.  Mostly "*-iso10646-*CYRILLIC*" should work OK, too.  With
the hex range format, the app has to work harder, querying with
"-*-*-*-*-*-*-*-*-*-*-*-*-iso10646-*" and checking for the ranges it
needs.  IIRC, since the actual font loaded is known to the server, you
could even have multiple such aliases, one for each block, and with
languages using multiple blocks (basically, all of them, since
everybody uses ASCII), you'd just want to be careful to query for the
"rare" blocks first.
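
The hex-range variant could be checked on the client side roughly as
follows.  This is only a Python sketch of the proposal, not an existing X11
or Emacs convention: the `xxxx..yyyy` encoding field, the function names, and
the sample font name are all assumptions made for illustration.

```python
def parse_ranges(encoding_field: str) -> list:
    """Parse 'xxxx..yyyy,xxxx..yyyy' into a list of (low, high) integer pairs."""
    ranges = []
    for part in encoding_field.split(","):
        low, high = part.split("..")
        ranges.append((int(low, 16), int(high, 16)))
    return ranges


def font_covers(xlfd: str, char: str) -> bool:
    """Check whether a fully qualified XLFD name advertises coverage of char."""
    fields = xlfd.split("-")
    # A fully qualified name has 14 hyphens, hence 15 fields (the first is empty).
    if len(fields) != 15 or fields[0] != "":
        raise ValueError("not a fully qualified XLFD name")
    registry, encoding = fields[13], fields[14]
    if registry.lower() != "iso10646":
        return False
    code_point = ord(char)
    return any(low <= code_point <= high for low, high in parse_ranges(encoding))


# Made-up font name advertising Basic Latin and Cyrillic only.
name = "-gnu-unifont-medium-r-normal--16-160-75-75-c-80-iso10646-0000..007F,0400..04FF"

print(font_covers(name, "\u0420"))  # True  (Cyrillic)
print(font_covers(name, "\u0e01"))  # False (Thai)
```

The block-name variant would push the same check into the server's pattern
matching instead.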

This would also allow Emacs and other smart apps to create virtual
fonts (ie, in faces) by requesting Ryumin Light for the Han and Kana
blocks and Times-Roman for the Basic Latin and Latin-1 Supplement
blocks, as an alternative to X Font Sets.  (This would be nearly
trivial to implement in XEmacs since we use specifiers to implement
faces, and specifiers already do magic to connect charsets to font
registries.  I suppose it would be more work in GNU Emacs, but I
haven't looked at Emacs's font set code.)

Does this look like something reasonable for Emacs (and XEmacs) to
implement on the client side?  If so, I'll play with it a bit (note
that implementing this server-side is simply a matter of editing
fonts.aliases) and then put it in play with the X11 and XFree86
people.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Cyrillic vs UTF-8
  2003-04-25 16:54 ` Simon Josefsson
  2003-04-26  3:55   ` Implementing charset-aware X font names [was: Cyrillic vs UTF-8] Stephen J. Turnbull
@ 2003-04-26  7:59   ` Kenichi Handa
  2003-04-26 12:14     ` Simon Josefsson
  1 sibling, 1 reply; 55+ messages in thread
From: Kenichi Handa @ 2003-04-26  7:59 UTC (permalink / raw)
  Cc: emacs-devel

In article <iluznme1ps2.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> It would be nice if there were some more information how to set the
> suggested fontset.  Reading the manual I get the impression that
> 'emacs -fn mule-unicode-...' should work, but it doesn't.

From which part of the manual did you get that impression?

> I also tried setting the emacs.font X resource, but same
> problem.  Starting emacs says:

> No fonts match `mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1'

That doesn't work.  Please follow what is described in the
"Defining Fontsets" node of the Emacs info manual.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Cyrillic vs UTF-8
  2003-04-25 17:09   ` Simon Josefsson
  2003-04-25 22:39     ` Eli Zaretskii
@ 2003-04-26  8:11     ` Kenichi Handa
  2003-04-26 12:25       ` Simon Josefsson
                         ` (2 more replies)
  2003-04-26 13:44     ` Richard Stallman
  2003-04-28 21:49     ` Stefan Monnier
  3 siblings, 3 replies; 55+ messages in thread
From: Kenichi Handa @ 2003-04-26  8:11 UTC (permalink / raw)
  Cc: emacs-devel

In article <iluvfx21p3g.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> It seems binary is preferred over utf-8 and utf-16-* in
> coding-category-list.  This seems extremely conservative.  I guess it
> means UTF-8 can never be autodetected by default?  Is the unicode
> support so bad it shouldn't even be preferred over binary?  UTF-8 is
> well formed and restricted; detecting it properly (even compared to
> Latin-n) can be done well enough that failures rarely happen in
> practice.

> Can't we move binary down below UTF-8 in CVS?  IMHO we should move
> UTF-8 earlier still, since determining whether data is UTF-8 or not
> can be done with good probability.  Preferring binary over UTF-8 seems
> just wrong.

Unfortunately, the current Emacs doesn't have a facility to
detect UTF-8 byte sequences.  So, if we give UTF-8 a higher
priority, all files are detected as UTF-8.  :-(

> There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you
> say, but it has been removed both in 21.3 and in CVS.  I thought that
> meant UTF-8 was better supported now, but this doesn't seem to be the
> case.

The UTF-8 support has certainly been improved, but not as much as
you expect.

By the way, all these problems are solved in emacs-unicode.
It's available from CVS server as a branch tag
"emacs-unicode" (see
http://savannah.gnu.org/cvs/?group=emacs).

---
Ken'ichi HANDA
handa@m17n.org


* Re: Cyrillic vs UTF-8
  2003-04-26  7:59   ` Cyrillic vs UTF-8 Kenichi Handa
@ 2003-04-26 12:14     ` Simon Josefsson
  2003-05-01  7:20       ` Kenichi Handa
  0 siblings, 1 reply; 55+ messages in thread
From: Simon Josefsson @ 2003-04-26 12:14 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <iluznme1ps2.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> It would be nice if there were some more information how to set the
>> suggested fontset.  Reading the manual I get the impression that
>> 'emacs -fn mule-unicode-...' should work, but it doesn't.
>
> From which part of the manual did you get that impression?

"Fontsets" together with "Font X".  But I now realize I didn't read it
carefully.

>> I also tried setting the emacs.font X resource, but same
>> problem.  Starting emacs says:
>
>> No fonts match `mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1'
>
> That doesn't work.  Please follow what is described in the
> "Defining Fontsets" node of the Emacs info manual.

It seems to work, thanks.  Wouldn't it be useful to at least be able
to customize the fontset?  Requiring use of X resources to get Unicode
to show up correctly is not user friendly.  I guess these problems
go away when Emacs stops choosing fonts with empty characters, so
perhaps users simply will have to wait for 22.


* Re: Cyrillic vs UTF-8
  2003-04-26  8:11     ` Kenichi Handa
@ 2003-04-26 12:25       ` Simon Josefsson
  2003-04-28  9:18         ` Kenichi Handa
  2003-04-26 16:21       ` Benjamin Riefenstahl
  2003-04-28  4:38       ` Richard Stallman
  2 siblings, 1 reply; 55+ messages in thread
From: Simon Josefsson @ 2003-04-26 12:25 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> Unfortunately, the current Emacs doesn't have a facility to
> detect UTF-8 byte sequences.  So, if we give UTF-8 a higher
> priority, all files are detected as UTF-8.  :-(

I see.  Is this very difficult to solve, or why hasn't it been solved?  The
algorithm to detect UTF-8 is not that complicated.

> By the way, all these problems are solved in emacs-unicode.
> It's available from CVS server as a branch tag
> "emacs-unicode" (see
> http://savannah.gnu.org/cvs/?group=emacs).

I'm trying it, thanks.


* Re: Cyrillic vs UTF-8
  2003-04-25 17:09   ` Simon Josefsson
  2003-04-25 22:39     ` Eli Zaretskii
  2003-04-26  8:11     ` Kenichi Handa
@ 2003-04-26 13:44     ` Richard Stallman
  2003-04-26 14:10       ` Simon Josefsson
  2003-04-28 21:49     ` Stefan Monnier
  3 siblings, 1 reply; 55+ messages in thread
From: Richard Stallman @ 2003-04-26 13:44 UTC (permalink / raw)
  Cc: emacs-devel

    It seems binary is preferred over utf-8 and utf-16-* in
    coding-category-list.  This seems extremely conservative.  I guess it
    means UTF-8 can never be autodetected by default?

That certainly seems undesirable.  Unless there is a specific reason
why it needs to be this way, I agree with you that we should raise
the priority of utf-8 and utf-16.


* Re: Cyrillic vs UTF-8
  2003-04-26 13:44     ` Richard Stallman
@ 2003-04-26 14:10       ` Simon Josefsson
  0 siblings, 0 replies; 55+ messages in thread
From: Simon Josefsson @ 2003-04-26 14:10 UTC (permalink / raw)
  Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     It seems binary is preferred over utf-8 and utf-16-* in
>     coding-category-list.  This seems extremely conservative.  I guess it
>     means UTF-8 can never be autodetected by default?
>
> That certainly seems undesirable.  Unless there is a specific reason
> why it needs to be this way, I agree with you that we should raise
> the priority of utf-8 and utf-16.

Kenichi Handa said that moving utf-8 earlier would make all files be
regarded as UTF-8, so until that is fixed I agree moving it higher up
in the hierarchy is bad.

But I'm not sure I understand the situation completely anyway.  When I
run emacs in a UTF-8 locale (LANG=sv_SE.UTF-8), which I usually do,
the utf-8 coding system _is_ first in coding-category-list, yet I have
no problems reading iso-8859-1 files.  They aren't regarded as UTF-8.


* Re: Cyrillic vs UTF-8
  2003-04-26  8:11     ` Kenichi Handa
  2003-04-26 12:25       ` Simon Josefsson
@ 2003-04-26 16:21       ` Benjamin Riefenstahl
  2003-04-26 16:27         ` Benjamin Riefenstahl
  2003-04-28  4:38       ` Richard Stallman
  2 siblings, 1 reply; 55+ messages in thread
From: Benjamin Riefenstahl @ 2003-04-26 16:21 UTC (permalink / raw)
  Cc: jas

Hi,


Kenichi Handa <handa@m17n.org> writes:
> Unfortunately, the current Emacs doesn't have a facility to detect
> UTF-8 byte sequences.  So, if we give UTF-8 a higher priority, all
> files are detected as UTF-8.  :-(

Hm, I have Emacs 21.2.1 on Windows NT and

 >>>
 Priority order for recognizing coding systems when reading files:
   1. mule-utf-8 (alias: utf-8)
   2. windows-1252 (alias: cp1252)
 <<<

Detecting UTF-8 works fine.  I'm not sure it's completely reliable but
it works for most of my everyday work.  Am I misunderstanding
something?


so long, benny


* Re: Cyrillic vs UTF-8
  2003-04-26 16:21       ` Benjamin Riefenstahl
@ 2003-04-26 16:27         ` Benjamin Riefenstahl
  0 siblings, 0 replies; 55+ messages in thread
From: Benjamin Riefenstahl @ 2003-04-26 16:27 UTC (permalink / raw)
  Cc: jas

Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:
>  >>>
>  Priority order for recognizing coding systems when reading files:
>    1. mule-utf-8 (alias: utf-8)
>    2. windows-1252 (alias: cp1252)
>  <<<
> 
> Detecting UTF-8 works fine.

To clarify: it also correctly detects when a file is *not* UTF-8, falling
back on cp1252.
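
The behaviour described here, trying coding systems in priority order and
falling back when one does not fit, can be modelled in a few lines.  This is
a Python sketch, not how Emacs implements detection; `decode_by_priority` is
a hypothetical helper and the sample bytes are assumptions.

```python
def decode_by_priority(data: bytes, priority):
    """Try each coding system in order; return (name, text) for the first that fits."""
    for name in priority:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("no coding system in the priority list fits the data")


priority = ["utf-8", "cp1252"]

# Well-formed UTF-8 is claimed by the first entry.
print(decode_by_priority("Привет".encode("utf-8"), priority)[0])  # utf-8

# Bytes E5 E4 F6 are not well-formed UTF-8, so the second entry is used.
print(decode_by_priority(b"\xe5\xe4\xf6", priority)[0])           # cp1252
```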


* Re: Cyrillic vs UTF-8
  2003-04-26  8:11     ` Kenichi Handa
  2003-04-26 12:25       ` Simon Josefsson
  2003-04-26 16:21       ` Benjamin Riefenstahl
@ 2003-04-28  4:38       ` Richard Stallman
  2003-05-01  8:27         ` Kenichi Handa
  2 siblings, 1 reply; 55+ messages in thread
From: Richard Stallman @ 2003-04-28  4:38 UTC (permalink / raw)
  Cc: jas

    Unfortunately, the current Emacs doesn't have a facility to
    detect UTF-8 byte sequences.  So, if we give UTF-8 a higher
    priority, all files are detected as UTF-8.  :-(

Is there any easy way to add such detection to the trunk version?
It would not be worth while if it is difficult, but it would be
worth while if it is easy.

    By the way, all these problems are solved in emacs-unicode.

Could you report on what work is needed before we
can release this code?


* Re: Cyrillic vs UTF-8
  2003-04-26 12:25       ` Simon Josefsson
@ 2003-04-28  9:18         ` Kenichi Handa
  2003-04-28 11:11           ` Simon Josefsson
  0 siblings, 1 reply; 55+ messages in thread
From: Kenichi Handa @ 2003-04-28  9:18 UTC (permalink / raw)
  Cc: emacs-devel

In article <ilullxxxx78.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
>>  Unfortunately, the current Emacs doesn't have a facility to
>>  detect UTF-8 byte sequences.  So, if we give UTF-8 a higher
>>  priority, all files are detected as UTF-8.  :-(

> I see.  Is this very difficult to solve, or why hasn't it been solved?  The
> algorithm to detect UTF-8 is not that complicated.

Oops, I'm very sorry; I was wrong.  The current Emacs
contains built-in utf-8 and utf-16 (with BOM) detectors.
So, giving UTF-8 a higher priority should cause no
problem.

Richard Stallman <rms@gnu.org> writes:
>     It seems binary is preferred over utf-8 and utf-16-* in
>     coding-category-list.  This seems extremely conservative.  I guess it
>     means UTF-8 can never be autodetected by default?

> That certainly seems undesirable.  Unless there is a specific reason
> why it needs to be this way, I agree with you that we should raise
> the priority of utf-8 and utf-16.

We can raise the priority of utf-16-le-with-signature and
utf-16-be-with-signature, but can't raise the priority of
utf-16-le, utf-16-be, utf-16 because it's impossible to
distinguish them from binary data.

So, I've just installed these changes.


2003-04-28  Kenichi Handa  <handa@m17n.org>

	* international/mule-cmds.el (reset-language-environment): Raise
	the priority of mule-utf-8, mule-utf-16-be-with-signature and
	mule-utf-16-le-with-signature.

	* international/mule-conf.el: Set coding-category-utf-16-be to
	mule-utf-16-be-with-signature, coding-category-utf-16-le to
	mule-utf-16-le-with-signature.  Raise the priority of
	coding-category-utf-8, coding-category-utf-16-be, and
	coding-category-utf-16-le
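
The distinction drawn above between UTF-16 with and without a signature can
be sketched as follows.  This is illustrative Python, not the Emacs detector;
`sniff_utf16_signature` is a hypothetical helper.

```python
import codecs


def sniff_utf16_signature(data: bytes):
    """Return 'utf-16-be' or 'utf-16-le' if data starts with a BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return None


signed = codecs.BOM_UTF16_LE + "\u0420".encode("utf-16-le")
unsigned = "\u0420".encode("utf-16-le")

print(sniff_utf16_signature(signed))    # utf-16-le
print(sniff_utf16_signature(unsigned))  # None

# Without a signature, almost any even-length byte string decodes without
# error, so a successful decode says nothing about whether the data is text.
print(len(b"\x00\x01\x02\x03".decode("utf-16-le")))  # 2
```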

---
Ken'ichi HANDA
handa@m17n.org


* Re: Implementing charset-aware X font names [was: Cyrillic vs UTF-8]
  2003-04-26  3:55   ` Implementing charset-aware X font names [was: Cyrillic vs UTF-8] Stephen J. Turnbull
@ 2003-04-28 11:09     ` Kenichi Handa
  2003-04-28 12:27       ` Implementing charset-aware X font names Stephen J. Turnbull
  0 siblings, 1 reply; 55+ messages in thread
From: Kenichi Handa @ 2003-04-28 11:09 UTC (permalink / raw)
  Cc: emacs-devel

In article <87ist17vzu.fsf_-_@tleepslib.sk.tsukuba.ac.jp>, "Stephen J. Turnbull" <stephen@xemacs.org> writes:
> Instead, the font names and properties should provide encoding range
> specifications instead of the useless "1" (which in ISO 10646-1 is not
> an encoding specification, really).  As a first take, I think a
> reasonable way to do this would be to specify that for the iso10646
> registry the encoding field of an XLFD name should contain a
> comma-separated list of Unicode block names, or a comma-separated list
> of hex ranges xxxx..yyyy (can't use hyphens for the ranges,
> obviously).

I fully agree with that idea.

[...]
> This would also allow Emacs and other smart apps to create virtual
> fonts (ie, in faces) by requesting Ryumin Light for the Han and Kana
> blocks and Times-Roman for the Basic Latin and Latin-1 Supplement
> blocks, as an alternative to X Font Sets.  (This would be nearly
> trivial to implement in XEmacs since we use specifiers to implement
> faces, and specifiers already do magic to connect charsets to font
> registries.  I suppose it would be more work in GNU Emacs, but I
> haven't looked at Emacs's font set code.)

We connect charsets to font registries via fontsets.  And in
the emacs-unicode version, we have enhanced it so that we
can connect scripts, charsets, range of characters to
multiple font specs.  In addition, in emacs-unicode, we
separate the concept of font encoding and font repertory,
and for *-iso10646-1 fonts, we check the font contents to
get the true repertory as a char table.
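
The idea of mapping a character range to several candidate fonts and
consulting each font's real repertory can be sketched like this.  It is a toy
Python model, not the emacs-unicode implementation; the font names, the
repertories, and `pick_font` are made-up assumptions.

```python
# Made-up repertories: the set of code points each font has glyphs for.
REPERTORY = {
    "courier-iso10646-1": set(range(0x20, 0x100)),
    "fixed-iso10646-1": set(range(0x20, 0x100)) | set(range(0x400, 0x500)),
}

# A fontset entry maps a character range to candidate fonts, tried in order.
FONTSET = [
    ((0x0100, 0x24FF), ["courier-iso10646-1", "fixed-iso10646-1"]),
]


def pick_font(char):
    """Return the first candidate whose repertory contains char, else None."""
    code_point = ord(char)
    for (low, high), candidates in FONTSET:
        if low <= code_point <= high:
            for font in candidates:
                if code_point in REPERTORY[font]:
                    return font
    return None


print(pick_font("\u0420"))  # fixed-iso10646-1
print(pick_font("\u0e01"))  # None
```

Checking the repertory rather than trusting the iso10646-1 encoding name is
what keeps the first candidate from being chosen for a glyph it lacks.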

---
Ken'ichi HANDA
handa@m17n.org


* Re: Cyrillic vs UTF-8
  2003-04-28  9:18         ` Kenichi Handa
@ 2003-04-28 11:11           ` Simon Josefsson
  0 siblings, 0 replies; 55+ messages in thread
From: Simon Josefsson @ 2003-04-28 11:11 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> We can raise the priority of utf-16-le-with-signature and
> utf-16-be-with-signature, but can't raise the priority of
> utf-16-le, utf-16-be, utf-16 because it's impossible to
> distinguish them from binary data.
>
> So, I've just installed these changes.

Thanks!  I don't really care much about UTF-16, and I don't think most
users do either, so this seems like a good solution.


* Re: Implementing charset-aware X font names
  2003-04-28 11:09     ` Kenichi Handa
@ 2003-04-28 12:27       ` Stephen J. Turnbull
  2003-05-01 11:13         ` Kenichi Handa
  0 siblings, 1 reply; 55+ messages in thread
From: Stephen J. Turnbull @ 2003-04-28 12:27 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Kenichi" == Kenichi Handa <handa@m17n.org> writes:

    Kenichi> We connect charsets to font registries via fontsets.  And
    Kenichi> in the emacs-unicode version, we have enhanced it so that
    Kenichi> we can connect scripts, charsets, range of characters to
    Kenichi> multiple font specs.

Is this documented outside of source code?  (Not necessarily as a
formal spec, discussions on emacs-devel would help too.  Also, I can
read Japanese, so mule-ja would be useful if there were discussions
there.)

How does it compare to the specifier interface used by XEmacs?  XEmacs
specifiers allow a face to automatically select the correct font by
X11 font registry, but there must also be a similar mechanism for
Windows, so this must be somewhat more general than "font registry".
Also, specifiers implement both inheritance and repeated queries (eg,
you can have both "arial" and "helvetica" fonts for ascii/latin-1, and
they will be tried in a specified order, usually "last added first",
until the face can be displayed).

    Kenichi> In addition, in emacs-unicode, we separate the concept of
    Kenichi> font encoding and font repertory, and for *-iso10646-1
    Kenichi> fonts, we check the font contents to get the true
    Kenichi> repertory as a char table.

Ah, I'll have to ask Ben if he's handled that in the devel branch.
That sounds like a very good interface, and if possible I'd like to
use it in XEmacs too.  I assume it is exported so Lisp programs can
find out the repertoire?

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.


* Re: Cyrillic vs UTF-8
  2003-04-25 17:09   ` Simon Josefsson
                       ` (2 preceding siblings ...)
  2003-04-26 13:44     ` Richard Stallman
@ 2003-04-28 21:49     ` Stefan Monnier
  2003-04-28 22:29       ` Simon Josefsson
  2003-05-19  0:40       ` Kenichi Handa
  3 siblings, 2 replies; 55+ messages in thread
From: Stefan Monnier @ 2003-04-28 21:49 UTC (permalink / raw)
  Cc: emacs-devel

> Can't we move binary down below UTF-8 in CVS?  IMHO we should move
> UTF-8 earlier still, since determining whether data is UTF-8 or not
> can be done with good probability.  Preferring binary over UTF-8 seems

Agreed, but I think one of the problems is that the preference-ordering
is the same for load-time-detection as it is for save-time-detection,
so if you move utf-8 up for detection you end up saving all new files
in utf-8 which is not OK in non-utf-8 locales.
I suggested introducing a second preference-order, but nothing came
out of it (probably because I didn't code anything up).
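
A second preference order could look roughly like this.  It is a Python
sketch of the idea only; the two lists and both helper functions are
hypothetical, not Emacs variables or functions.

```python
# Hypothetical, separate preference lists (Python codec names).
DETECT_ORDER = ["utf-8", "latin-1"]  # consulted when reading a file
SAVE_ORDER = ["latin-1", "utf-8"]    # consulted when saving a new file


def detect(data: bytes) -> str:
    """Return the first coding system in DETECT_ORDER that decodes data."""
    for name in DETECT_ORDER:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # latin-1 accepts any bytes, so this is reached only if the list changes.
    return "binary"


def choose_for_save(text: str) -> str:
    """Return the first coding system in SAVE_ORDER that can encode text."""
    for name in SAVE_ORDER:
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    raise ValueError("no coding system can encode the text")


print(detect("Привет".encode("utf-8")))       # utf-8
print(choose_for_save("\u00e5\u00e4\u00f6"))  # latin-1
print(choose_for_save("Привет"))              # utf-8
```

With two lists, utf-8 can come first for detection without becoming the
default for new Latin-1 text.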

	Stefan


* Re: Cyrillic vs UTF-8
  2003-04-28 21:49     ` Stefan Monnier
@ 2003-04-28 22:29       ` Simon Josefsson
  2003-04-29 13:49         ` Stefan Monnier
  2003-05-19  0:40       ` Kenichi Handa
  1 sibling, 1 reply; 55+ messages in thread
From: Simon Josefsson @ 2003-04-28 22:29 UTC (permalink / raw)
  Cc: emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> Can't we move binary down below UTF-8 in CVS?  IMHO we should move
>> UTF-8 earlier still, since determining whether data is UTF-8 or not
>> can be done with good probability.  Preferring binary over UTF-8 seems
>
> Agreed, but I think one of the problems is that the preference-ordering
> is the same for load-time-detection as it is for save-time-detection,
> so if you move utf-8 up for detection you end up saving all new files
> in utf-8 which is not OK in non-utf-8 locales.

This sounds serious in theory, but I was unable to make emacs behave
unexpectedly in practice.  Do you have an example?

I tried opening a new file and typing åäö and saving it.  It was saved
(without query) as latin-1 with sv_SE, en_GB, en_US and C locales.
All are what I would expect, and is consistent with what I get for
emacs 21.3. (Of course, this is a western-centric test case, but I
don't know what non-western users expect so I can't really test
anything else.)

Note that iso-8-1 is still preferred over utf-8 with Kenichi's change.

Note also that mule-cmds.el seems to guess the appropriate charset for
most locales, so UTF-8 will never be preferred over the "locale
charset".  A ja_JP user will have a low priority for UTF-8.

* Re: Cyrillic vs UTF-8
  2003-04-28 22:29       ` Simon Josefsson
@ 2003-04-29 13:49         ` Stefan Monnier
  2003-04-29 14:27           ` Simon Josefsson
  2003-04-30  5:43           ` Richard Stallman
  0 siblings, 2 replies; 55+ messages in thread
From: Stefan Monnier @ 2003-04-29 13:49 UTC (permalink / raw)
  Cc: Stefan Monnier

> >> Can't we move binary down below UTF-8 in CVS?  IMHO we should move
> >> UTF-8 earlier still, since determining whether data is UTF-8 or not
> >> can be done with good probability.  Preferring binary over UTF-8 seems
> >
> > Agreed, but I think one of the problems is that the preference-ordering
> > is the same for load-time-detection as it is for save-time-detection,
> > so if you move utf-8 up for detection you end up saving all new files
> > in utf-8 which is not OK in non-utf-8 locales.
> 
> This sounds serious in theory, but I was unable to make emacs behave
> unexpectedly in practice.  Do you have an example?

The problem only appears if you move utf-8 to the first spot.
Moving it to the first spot otherwise makes sense since auto-detection
of utf-8 is about as reliable as it gets.
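
A minimal sketch of the coupling described above, assuming the stock
`prefer-coding-system' command as documented in current Emacs: it moves
a coding system to the front of the detection priority list and, in the
same call, makes it the default for newly created buffers.  This only
illustrates the side effect; it is not the change under discussion.

```elisp
;; Sketch only: one call changes both detection order and save default.

;; Put utf-8 at the front of the priority list used for auto-detection.
(prefer-coding-system 'utf-8)

;; The same call also changed the default coding system for new buffers,
;; which is what a new file gets saved in unless something overrides it.
(default-value 'buffer-file-coding-system)
```

With two separate lists, the first effect could be had without the second.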


	Stefan

* Re: Cyrillic vs UTF-8
  2003-04-29 13:49         ` Stefan Monnier
@ 2003-04-29 14:27           ` Simon Josefsson
  2003-04-30  4:42             ` Stephen J. Turnbull
  2003-04-30  5:43           ` Richard Stallman
  1 sibling, 1 reply; 55+ messages in thread
From: Simon Josefsson @ 2003-04-29 14:27 UTC (permalink / raw)
  Cc: emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> >> Can't we move binary down below UTF-8 in CVS?  IMHO we should move
>> >> UTF-8 earlier still, since determining whether data is UTF-8 or not
>> >> can be done with good probability.  Preferring binary over UTF-8 seems
>> >
>> > Agreed, but I think one of the problems is that the preference-ordering
>> > is the same for load-time-detection as it is for save-time-detection,
>> > so if you move utf-8 up for detection you end up saving all new files
>> > in utf-8 which is not OK in non-utf-8 locales.
>> 
>> This sounds serious in theory, but I was unable to make emacs behave
>> unexpectedly in practice.  Do you have an example?
>
> The problem only appears if you move utf-8 to the first spot.

But utf-8 hasn't been moved first, so this isn't a problem?

I agree it would be useful to be able to configure different loading
and saving time preferences.  Then I would be able to specify that
emacs should try to save data as ascii first, then latin-1, then
latin-9 and then UTF-8, then give up and ask.  On loading, I'd want it
to try latin-9 instead of latin-1 though.  In non-UTF-8 locales, I
think this behaviour is what many Europeans would want.
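
A rough sketch of the save-time half of this idea.  It assumes the
`unencodable-char-position' function available in current Emacs; the
variable `my-save-coding-preference' and the function
`my-first-safe-coding-system' are hypothetical names invented for
illustration, not anything in mule-cmds.el.

```elisp
(require 'cl-lib)

;; Hypothetical variable: a save-time preference order that is kept
;; separate from the priority list used for load-time detection.
(defvar my-save-coding-preference '(us-ascii iso-latin-1 iso-latin-9 utf-8)
  "Coding systems to try, in order, when saving a buffer.")

(defun my-first-safe-coding-system (from to)
  "Return the first coding system able to encode the text FROM..TO.
Candidates are taken in order from `my-save-coding-preference'.
Return nil if none of them can encode the text, in which case the
caller would have to ask the user."
  (cl-find-if (lambda (cs)
                ;; nil means every character in the region is encodable.
                (null (unencodable-char-position from to cs)))
              my-save-coding-preference))
```

For a buffer holding only åäö this would be expected to settle on
iso-latin-1, and for Cyrillic text it would fall through to utf-8.  A
second, independently ordered list could then drive detection on load.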

> Moving it to the first spot otherwise makes sense since
> auto-detection of utf-8 is about as reliable as it gets.

Yup.

* Re: Cyrillic vs UTF-8
  2003-04-29 14:27           ` Simon Josefsson
@ 2003-04-30  4:42             ` Stephen J. Turnbull
  0 siblings, 0 replies; 55+ messages in thread
From: Stephen J. Turnbull @ 2003-04-30  4:42 UTC (permalink / raw)
  Cc: Stefan Monnier

>>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes:

    Simon> I agree it would be useful to be able to configure
    Simon> different loading and saving time preferences.  Then I
    Simon> would be able to specify that emacs should try to save data
    Simon> as ascii first, then latin-1, then latin-9 and then UTF-8,
    Simon> then give up and ask.  On loading, I'd want it to try
    Simon> latin-9 instead of latin-1 though.  In non-UTF-8 locales, I
    Simon> think this behaviour is what many Europeans would want.

latin-unity provides this (but basically only for Latin scripts).  I
don't think it works under GNU Emacs, and it's pretty crufty, but it's
all my code and is assigned to the FSF.  So if somebody wants to port/
improve, I'll be happy to answer questions.  NB: probably requires
Mule-UCS, although I imagine emacs-unicode has the necessary
facilities to build tables.

cvs -d :pserver:cvs@cvs.xemacs.org:/pack/xemacscvs checkout latin-unity

(password is either null or "cvs", I forget).

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

* Re: Cyrillic vs UTF-8
  2003-04-29 13:49         ` Stefan Monnier
  2003-04-29 14:27           ` Simon Josefsson
@ 2003-04-30  5:43           ` Richard Stallman
  1 sibling, 0 replies; 55+ messages in thread
From: Richard Stallman @ 2003-04-30  5:43 UTC (permalink / raw)
  Cc: jas

    > > Agreed, but I think one of the problems is that the preference-ordering
    > > is the same for load-time-detection as it is for save-time-detection,
    > > so if you move utf-8 up for detection you end up saving all new files
    > > in utf-8 which is not OK in non-utf-8 locales.

    The problem only appears if you move utf-8 to the first spot.
    Moving it to the first spot otherwise makes sense since auto-detection
    of utf-8 is about as reliable as it gets.

This suggests we do want to have two separate preferences lists,
and put utf-8 at the top for reading.

* Re: Cyrillic vs UTF-8
  2003-04-26 12:14     ` Simon Josefsson
@ 2003-05-01  7:20       ` Kenichi Handa
  2003-05-01 14:06         ` Alex Schroeder
  2003-05-01 18:03         ` Customizing fontsets (was: Cyrillic vs UTF-8) Oliver Scholz
  0 siblings, 2 replies; 55+ messages in thread
From: Kenichi Handa @ 2003-05-01  7:20 UTC (permalink / raw)
  Cc: emacs-devel

In article <iluptn9xxp0.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> It seems to work, thanks.  Wouldn't it be useful to at least be able
> to customize the fontset?  Requiring use of X resources to get Unicode
> to show up correctly is not user friendly.

Unfortunately, a fontset is not a variable, thus can't be
customized easily.  Another way to modify a fontset is to do
something like this in .emacs.

(set-fontset-font "fontset-default"
		  'mule-unicode-0100-24ff
		  '("gnu-unifont" . "iso10646-1"))
...

---
Ken'ichi HANDA
handa@m17n.org

* Re: Cyrillic vs UTF-8
  2003-04-28  4:38       ` Richard Stallman
@ 2003-05-01  8:27         ` Kenichi Handa
  2003-05-02  7:06           ` Richard Stallman
  2003-05-04 11:04           ` Dave Love
  0 siblings, 2 replies; 55+ messages in thread
From: Kenichi Handa @ 2003-05-01  8:27 UTC (permalink / raw)
  Cc: jas

In article <E19A0Ou-0001sm-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     By the way, all these problems are solved in emacs-unicode.

> Could you report on what work is needed before we
> can release this code?

Dave has compiled the current problems in the file
emacs-unicode/README.unicode.  Some of them (especially
serious ones) are already fixed.

Dave, do you have anything else to add to that file?

I think the most difficult task for releasing that code is
to merge the changes into HEAD.  Emacs-unicode was branched
on 2002-03-01, and since then, there were a lot of changes
in HEAD.

---
Ken'ichi HANDA
handa@m17n.org

--- README.unicode ---
                                            -*-mode: text; coding: latin-1;-*-

Problems, fixmes and other issues in the emacs-unicode branch
-------------------------------------------------------------

Notes by fx to record various things of variable importance.  handa
needs to check them -- don't take too seriously, especially with
regard to completeness.

_Do take seriously that you don't want this branch unless you're
actually working on it; you risk your data by actually using it._  If
you just want to edit Unicode and/or unify iso-8859 et al, see the
existing support and the extra stuff at
<URL:ftp://dlpx1.dl.ac.uk/fx/emacs/Mule>, mostly now in the CVS trunk.
(Editing support is mostly orthogonal to the internal representation.)

 * SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has
   undesirable effects.  E.g.:
   (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil
   (multibyte-string-p (concat [?£])) => nil
   (text-char-description ?£) => "M-#"

	These examples are all fixed by the change of 2002-10-14, but
	there still exist questionable SINGLE_BYTE_CHAR_P in the
	code.

 * Rationalize character syntax and its relationship to the Unicode
   database.  (Applies mainly to symbol and punctuation syntax.)

 * Fontset handling and customization needs work.  We want to relate
   fonts to scripts, probably based on the Unicode blocks.  The
   presence of small-repertoire 10646-encoded fonts in XFree 4 is a
   pain, not currently worked round.

	With the change on 2002-07-26, multiple fonts can be
	specified in a fontset for a specific range of characters.
	Each range can also be specified by script.  Before using
	ISO10646 fonts, Emacs checks their repertories to avoid such
	fonts that don't have a glyph for a specific character.

 * Work is also needed on charset and coding system priorities.

 * The relevant bits of latin1-disp.el need porting (and probably
   re-naming/updating).  See also cyril-util.el.

 * Quail files need more work now the encoding is irrelevant.

 * What to do with the old coding categories stuff?

 * The preferred-coding-system property of charsets should probably be
   junked unless it can be made more useful now.

 * find-multibyte-characters needs looking at.

 * Implement Korean cp949/UHC, BIG5-HKSCS and any other important missing
   charsets.

 * Check up on definition of alternativnyj.

 * Lazy-load tables for unify-charset somehow?

	Actually, Emacs clears out all charset maps and unify-maps just
	before dumping, and they are loaded again on demand by the
	dumped emacs.  But those maps (char tables) generated while
	temacs is running can't be removed from the dumped emacs.

 * Translation tables for {en,de}code currently aren't supported.

	This should be fixed by the changes of 2002-10-14.

 * Defining CCL coding systems currently doesn't work.

	This should be fixed by the changes of 2003-01-30.

 * iso-2022 charsets get unified on i/o.

	With the change on 2003-01-06, decoding routines put a `charset'
	property on decoded text, and the iso-2022 encoder pays attention
	to it.  Thus, for instance, reading and writing by
	iso-2022-7bit preserve the original designation sequences.
	The property name `preferred-charset' may be better?

	We may have to utilize this property to decide a font.

 * Revisit locale processing: look at treating the language and
   charset parts separately.  (Language should affect things like
   spelling and calendar, but that's not a Unicode issue.)

 * Handle Unicode combining characters usefully, e.g. diacritics, and
   handle more scripts specifically (à la Devanagari).  There are
   issues with canonicalization.

 * Bidi is a separate issue with no support currently.

 * We need tabular input methods, e.g. for maths symbols.  (Not
   specific to Unicode.)

 * Need multibyte text in menus, e.g. for the above.  (Not specific to
   Unicode.)

 * There's currently no support for Unicode normalization.

 * Populate char-width-table correctly for Unicode characters and
   worry about what happens when double-width charsets covering
   non-CJK characters are unified.

 * Emacs 20/21 .elc files are currently not loadable.  It may or may
   not be possible to do this properly.

	With the change on 2002-07-24, elc files generated by Emacs
	20.3 and later are correctly loaded (including those
	containing multibyte characters and compressed).  But, elc
	files generated by 20.2 and earlier are still not loadable.
	Is it really worth working on it?

 * Rmail won't work with non-ASCII text.  Encoding issues for Babyl
   files need sorting out, but rms says Babyl will go before this is
   released.

 * Gnus still needs some attention, and we need to get changes
   accepted by Gnus maintainers...

 * There are type errors lurking, e.g. in
   Fcheck_coding_systems_region.  Define ENABLE_CHECKING to find them.

 * You can grep the code for lots of fixmes.

 * Old auto-save files, and similar files, such as Gnus drafts,
   containing non-ASCII characters probably won't be re-read correctly.

* Re: Implementing charset-aware X font names
  2003-04-28 12:27       ` Implementing charset-aware X font names Stephen J. Turnbull
@ 2003-05-01 11:13         ` Kenichi Handa
  2003-05-01 14:14           ` Alex Schroeder
  0 siblings, 1 reply; 55+ messages in thread
From: Kenichi Handa @ 2003-05-01 11:13 UTC (permalink / raw)
  Cc: emacs-devel

In article <87fzo24xj4.fsf@tleepslib.sk.tsukuba.ac.jp>, "Stephen J. Turnbull" <stephen@xemacs.org> writes:
>>>>>>  "Kenichi" == Kenichi Handa <handa@m17n.org> writes:
Kenichi>  We connect charsets to font registries via fontset.  And
Kenichi>  in the emacs-unicode version, we have enhanced it so that
Kenichi>  we can connect scripts, charsets, range of characters to
Kenichi>  multiple font specs.

> Is this documented outside of source code?

It is documented as the docstring of set-fontset-font (as
attached at the tail).  Internally, a fontset is implemented
by a char-table of a special format.

> How does it compare to the specifier interface used by XEmacs?  XEmacs
> specifiers allow a face to automatically select the correct font by
> X11 font registry, but there must also be a similar mechanism for
> Windows, so this must be somewhat more general than "font registry".
> Also, specifiers implement both inheritance and repeated queries (eg,
> you can have both "arial" and "helvetica" fonts for ascii/latin-1, and
> they will be tried in a specified order, usually "last added first",
> until the face can be displayed).

I don't know about "the specifiers interface of XEmacs".  In
Emacs, a face can have an attribute `fontset'.  In that
case, for displaying a non-ASCII character CHAR by that
face, the fontset is looked up.  If a face doesn't have
`fontset' attribute, the default fontset is looked up.  If
multiple font specs are found for CHAR, one font spec is
selected as below:

(1) For each font spec, find the encoding charset (a charset
    that maps a character code to a glyph code).

(2) Sort the font specs by using that encoding charset as a
    key according to the charset priority of the current
    language environment.  So, for instance, in Japanese
    lang. env., most Han characters are displayed by a
    Japanese font.

(3) Select the first font spec whose repertory contains CHAR.

The font spec is merged with font related attributes of the
face, then the best matching font is selected.
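
The three steps can be modelled with a small sketch.  This is a toy
model in plain Lisp, not the C implementation: the plist keys
`:encoding' and `:covers-p' and the function `my-select-font-spec' are
made up for illustration, and the final merge with the face attributes
is left out.

```elisp
(require 'cl-lib)

(defun my-select-font-spec (char specs charset-priority)
  "Pick a font spec for CHAR from SPECS, modelling steps (1) to (3).
Each element of SPECS is a plist with :encoding (a charset symbol) and
:covers-p (a predicate telling whether the font has a glyph for a
character).  CHARSET-PRIORITY is a list of charset symbols, highest
priority first, standing in for the language environment's priority."
  (let* ((rank (lambda (spec)
                 ;; Step (1): look at the spec's encoding charset.
                 ;; Charsets missing from the priority list sort last.
                 (or (cl-position (plist-get spec :encoding) charset-priority)
                     (length charset-priority))))
         ;; Step (2): order the specs by charset priority.
         (sorted (sort (copy-sequence specs)
                       (lambda (a b)
                         (< (funcall rank a) (funcall rank b))))))
    ;; Step (3): first spec whose repertory contains CHAR.
    (cl-find-if (lambda (spec) (funcall (plist-get spec :covers-p) char))
                sorted)))
```

With a priority list that puts a Japanese charset first, a Han character
covered by both a Chinese and a Japanese spec comes out with the
Japanese one, as in the example in step (2).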

Kenichi>  In addition, in emacs-unicode, we separate the concept of
Kenichi>  font encoding and font repertory, and for *-iso10646-1
Kenichi>  fonts, we check the font contents to get the true
Kenichi>  repertory as a char table.

> Ah, I'll have to ask Ben if he's handled that in the devel branch.
> That sounds like a very good interface, and if possible I'd like to
> use it in XEmacs too.  I assume it is exported so Lisp programs can
> find out the repertoire?

Currently no.  As the repertoire is checked automatically in
a fontset handler, for the moment, I see no necessity in
exporting that to Lisp.

---
Ken'ichi HANDA
handa@m17n.org

----------------------------------------------------------------------
set-fontset-font is a built-in function.
(set-fontset-font NAME CHARACTER FONT-SPEC &optional FRAME ADD)

Modify fontset NAME to use FONT-SPEC for CHARACTER.

CHARACTER may be a cons; (FROM . TO), where FROM and TO are
characters.  In that case, use FONT-SPEC for all characters in the
range FROM and TO (inclusive).

CHARACTER may be a script name symbol.  In that case, use FONT-SPEC
for all characters that belong to the script.

CHARACTER may be a charset which has a :code-offset attribute and the
attribute value is greater than the maximum Unicode character
(#x10FFFF).  In that case, use FONT-SPEC for all characters in the
charset.

FONT-SPEC may be:
 * A vector [ FAMILY WEIGHT SLANT WIDTH ADSTYLE REGISTRY ].
   See the documentation of `set-face-attribute' for the detail of
   these vector elements;
 * A cons (FAMILY . REGISTRY), where FAMILY is a font family name and
   REGISTRY is a font registry name;
 * A font name string.

Optional 4th argument FRAME, if non-nil, is a frame.  This argument is
kept for backward compatibility and has no meaning.

Optional 5th argument ADD, if non-nil, specifies how to add FONT-SPEC
to the font specifications for RANGE previously set.  If it is
`prepend', FONT-SPEC is prepended.  If it is `append', FONT-SPEC is
appended.  By default, FONT-SPEC overrides the previous settings.
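
A short usage sketch based only on the docstring above, so it assumes
that signature.  The family names "fixed" and "gnu-unifont" are the ones
mentioned in this thread and may not be installed on a given system.

```elisp
;; Sketch only; fontsets are meaningful on graphical displays.
(when (display-graphic-p)
  ;; CHARACTER given as a (FROM . TO) range: the Cyrillic block
  ;; U+0400..U+04FF.  FONT-SPEC given as (FAMILY . REGISTRY).
  ;; With ADD omitted, this overrides any previous setting for the range.
  (set-fontset-font "fontset-default" '(#x0400 . #x04FF)
                    '("fixed" . "iso10646-1"))
  ;; ADD is `append': keep the spec above and add a second one after it.
  ;; FRAME (4th argument) is passed as nil.
  (set-fontset-font "fontset-default" '(#x0400 . #x04FF)
                    '("gnu-unifont" . "iso10646-1") nil 'append))
```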

* Re: Cyrillic vs UTF-8
  2003-05-01  7:20       ` Kenichi Handa
@ 2003-05-01 14:06         ` Alex Schroeder
  2003-05-01 18:03         ` Customizing fontsets (was: Cyrillic vs UTF-8) Oliver Scholz
  1 sibling, 0 replies; 55+ messages in thread
From: Alex Schroeder @ 2003-05-01 14:06 UTC (permalink / raw)
  Cc: jas

Kenichi Handa <handa@m17n.org> writes:

> In article <iluptn9xxp0.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> It seems to work, thanks.  Wouldn't it be useful to at least be able
>> to customize the fontset?  Requiring use of X resources to get Unicode
>> to show up correctly is not user friendly.
>
> Unfortunately, a fontset is not a variable, thus can't be
> customized easily.  Another way to modify a fontset is to do
> something like this in .emacs.
>
> (set-fontset-font "fontset-default"
> 		  'mule-unicode-0100-24ff
> 		  '("gnu-unifont" . "iso10646-1"))
> ...

Well, we could create a "dummy" option with an interesting :set
property which will then call create-fontset-from-fontset-spec and
friends using its own value (which was customized by the user).

Alex.
-- 
http://www.emacswiki.org/cgi-bin/alex.pl

* Re: Implementing charset-aware X font names
  2003-05-01 11:13         ` Kenichi Handa
@ 2003-05-01 14:14           ` Alex Schroeder
  2003-05-01 23:16             ` Kenichi Handa
  0 siblings, 1 reply; 55+ messages in thread
From: Alex Schroeder @ 2003-05-01 14:14 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

>> Is this documented outside of source code?
>
> It is documented as the docstring of set-fontset-font (as
> attached at the tail).  Internally, a fontset is implemented
> by a char-table of a special format.

I would like to collect some of the stuff from your recent mails into
little articles on the Emacs Wiki (its content is licensed under the
FDL).  Is that ok with you?

Alex.
-- 
http://www.emacswiki.org/cgi-bin/alex.pl

* Customizing fontsets (was: Cyrillic vs UTF-8)
  2003-05-01  7:20       ` Kenichi Handa
  2003-05-01 14:06         ` Alex Schroeder
@ 2003-05-01 18:03         ` Oliver Scholz
  2003-05-02  5:17           ` Customizing fontsets Alex Schroeder
  1 sibling, 1 reply; 55+ messages in thread
From: Oliver Scholz @ 2003-05-01 18:03 UTC (permalink / raw)

Kenichi Handa <handa@m17n.org> writes:

> In article <iluptn9xxp0.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
>> It seems to work, thanks.  Wouldn't it be useful to at least be able
>> to customize the fontset?  Requiring use of X resources to get Unicode
>> to show up correctly is not user friendly.
>
> Unfortunately, a fontset is not a variable, thus can't be
> customized easily.
[...]

But wouldn't it be an option to add a `custom-set-fontsets' besides
`custom-set-faces' and `custom-set-variables'? It would make sense IMO
to treat the short alias names that way. I.e. 
`M-x customize-fontset RET fontset-default RET' could simply work and
it could be consistent with the rest of the customization interface
from the user's point of view.

In fact I started to work on it. (That's the deeper reason for my
patch to `set-fontset-font'.) I am mostly in the state of reading the
code for fontsets and for customize respectively, though.

Currently I wonder what the default values should be that a
"fontset-widget" should present to the user. The return-value of
`fontset-info' is simply too large, I think. It seems weird that a user
should specify a fontset covering two or three charsets in her .emacs
or in .Xresources -- and is confronted with a list of dozens of
charsets then, when she wants to customize it later.

    Oliver
-- 
12 Floréal an 211 de la Révolution
Liberté, Egalité, Fraternité!

* Re: Implementing charset-aware X font names
  2003-05-01 14:14           ` Alex Schroeder
@ 2003-05-01 23:16             ` Kenichi Handa
  0 siblings, 0 replies; 55+ messages in thread
From: Kenichi Handa @ 2003-05-01 23:16 UTC (permalink / raw)
  Cc: emacs-devel

In article <87k7dag3eh.fsf@gnu.org>, Alex Schroeder <alex@gnu.org> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>>  Is this documented outside of source code?
>> 
>>  It is documented as the docstring of set-fontset-font (as
>>  attached at the tail).  Internally, a fontset is implemented
>>  by a char-table of a special format.

> I would like to collect some of the stuff from your recent mails into
> little articles on the Emacs Wiki (its content is licensed under the
> FDL).  Is that ok with you?

Of course, no problem.

---
Ken'ichi HANDA
handa@m17n.org

* Re: Customizing fontsets
  2003-05-01 18:03         ` Customizing fontsets (was: Cyrillic vs UTF-8) Oliver Scholz
@ 2003-05-02  5:17           ` Alex Schroeder
  2003-05-02  6:32             ` Kenichi Handa
  2003-05-03  0:33             ` Oliver Scholz
  0 siblings, 2 replies; 55+ messages in thread
From: Alex Schroeder @ 2003-05-02  5:17 UTC (permalink / raw)
  Cc: emacs-devel

Oliver Scholz <alkibiades@gmx.de> writes:

> But wouldn't it be an option to add a `custom-set-fontsets' besides
> `custom-set-faces' and `custom-set-variables'? 
> In fact I started to work on it.

I am happy to see somebody work on it!

> Currently I wonder what the default values should be that a
> "fontset-widget" should present to the user. The return-value of
> `fontset-info' is simply too large, I think. It seems weird that a user
> should specify a fontset covering two or three charsets in her .emacs
> or in .Xresources -- and is confronted with a list of dozens of
> charsets then, when she wants to customize it later.

I don't understand.  When you run M-x customize-fontset RET
fontset-default RET, you expect to see a widget that explains the
value of "fontset-default", and offer a way to change it.  Thus, all
the info returned by (fontset-info "fontset-default") must be visible
and editable at some point.  Unless you are proposing some sort of
fontset-inheritance mechanism?

Alex.
-- 
http://www.emacswiki.org/cgi-bin/alex.pl

* Re: Customizing fontsets
  2003-05-02  5:17           ` Customizing fontsets Alex Schroeder
@ 2003-05-02  6:32             ` Kenichi Handa
  2003-05-02 13:25               ` Stefan Monnier
  2003-05-03  0:40               ` Oliver Scholz
  2003-05-03  0:33             ` Oliver Scholz
  1 sibling, 2 replies; 55+ messages in thread
From: Kenichi Handa @ 2003-05-02  6:32 UTC (permalink / raw)
  Cc: alkibiades

In article <87llxqorkr.fsf@gnu.org>, Alex Schroeder <alex@gnu.org> writes:
> Oliver Scholz <alkibiades@gmx.de> writes:
>>  But wouldn't it be an option to add a `custom-set-fontsets' besides
>>  `custom-set-faces' and `custom-set-variables'? 
>>  In fact I started to work on it.

> I am happy to see somebody work on it!

Me too!!!

>>  Currently I wonder what the default values should be that a
>>  "fontset-widget" should present to the user. The return-value of
>>  `fontset-info' is simply too large, I think. It seems weird that a user
>>  should specify a fontset covering two or three charsets in her .emacs
>>  or in .Xresources -- and is confronted with a list of dozens of
>>  charsets then, when she wants to customize it later.

> I don't understand.  When you run M-x customize-fontset RET
> fontset-default RET, you expect to see a widget that explains the
> value of "fontset-default", and offer a way to change it.  Thus, all
> the info returned by (fontset-info "fontset-default") must be visible
> and editable at some point.

But, it is true that the value of fontset-info is very hard
to customize.  A fontset is created by new-fontset, and is
modified by the sequence of set-fontset-font.  In the
resulting fontset, the specified data are scattered around
in the char-table of the fontset.

I think the following idea will solve this problem.

The argument FONTLIST of new-fontset is a list of this form:
    ((TARGET . FONT-SPEC) ...)
TARGET is a character, a cons (FROM-CHAR . TO-CHAR), or a
charset.  FONT-SPEC is (FAMILY . REGISTRY) or FONT-NAME.

The function set-fontset-font also takes the arguments
TARGET and FONT-SPEC.

In other words, a fontset can be re-created by the sequence
of (TARGET . FONT-SPEC).  So how about recording that
sequence as a list in each fontset (the order is important).
I think it's far more user-friendly to customize that list than
to customize char-table elements.  In addition, we can use
the normal customization facility for a list (INS, DEL) at
the top level.
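
A sketch of how such a recorded list could be replayed, assuming
entries of the form (TARGET . FONT-SPEC) as described above.  The names
`my-fontset-history' and `my-replay-fontset' are hypothetical; no such
slot exists in a fontset at the time of this discussion.

```elisp
;; Hypothetical record of the calls that built up a fontset, oldest first.
(defvar my-fontset-history
  '((mule-unicode-0100-24ff . ("gnu-unifont" . "iso10646-1")))
  "List of (TARGET . FONT-SPEC) entries, in the order they were applied.")

(defun my-replay-fontset (name history)
  "Apply each (TARGET . FONT-SPEC) entry of HISTORY to fontset NAME.
Entries are applied in list order, so later entries win where targets
overlap; that is why the order of the list matters."
  (dolist (entry history)
    (set-fontset-font name (car entry) (cdr entry))))

;; Usage, on a graphical display:
;; (my-replay-fontset "fontset-default" my-fontset-history)
```

A customization widget would then only need to edit the short list, not
the char-table.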

---
Ken'ichi HANDA
handa@m17n.org

* Re: Cyrillic vs UTF-8
  2003-05-01  8:27         ` Kenichi Handa
@ 2003-05-02  7:06           ` Richard Stallman
  2003-05-02 21:51             ` Eli Zaretskii
  2003-05-04 11:04           ` Dave Love
  1 sibling, 1 reply; 55+ messages in thread
From: Richard Stallman @ 2003-05-02  7:06 UTC (permalink / raw)
  Cc: jas

    Dave has compiled the current problems in the file
    emacs-unicode/README.unicode.  Some of them (especially
    serious ones) are already fixed.

How about if you edit that file, deleting the items that
are already fixed.  That should be an easy job, right?
Then you could post a file that is current.

    I think the most difficult task for releasing that code is
    to merge the changes into HEAD.  Emacs-unicode was branched
    on 2002-03-01, and since then, there were a lot of changes
    in HEAD.

Are you starting to work on this now?

* Re: Customizing fontsets
  2003-05-02  6:32             ` Kenichi Handa
@ 2003-05-02 13:25               ` Stefan Monnier
  2003-05-03  0:40               ` Oliver Scholz
  1 sibling, 0 replies; 55+ messages in thread
From: Stefan Monnier @ 2003-05-02 13:25 UTC (permalink / raw)
  Cc: alkibiades

> In other words, a fontset can be re-created by the sequence
> of (TARGET . FONT-SPEC).  So how about recording that
> sequence as a list in each fontset (the order is important).
> I think it's far more user-friendly to customize that list than
> to customize char-table elements.  In addition, we can use
> the normal customization facility for a list (INS, DEL) at
> the top level.

And then a `deffontset' can be a simple macro that expands to
a `defcustom' with appropriate :get and :set.
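
One possible shape for such a macro, as a sketch only.  The name
`my-deffontset' and its argument list are assumptions, and the :set
function simply replays the list with `set-fontset-font', so it cannot
undo entries that were removed from the list.

```elisp
(defmacro my-deffontset (symbol fontset-name standard doc)
  "Define SYMBOL as a user option holding (TARGET . FONT-SPEC) entries.
STANDARD is the default list and DOC the docstring.  Setting the
option applies every entry to the fontset named FONTSET-NAME."
  `(defcustom ,symbol ,standard
     ,doc
     :type '(repeat (cons (sexp :tag "Target") (sexp :tag "Font spec")))
     ;; Read the option back as an ordinary variable value.
     :get #'default-value
     :set (lambda (sym value)
            (set-default sym value)
            ;; Only touch the fontset where fontsets make sense.
            (when (display-graphic-p)
              (dolist (entry value)
                (set-fontset-font ,fontset-name (car entry) (cdr entry)))))
     :group 'display))

;; Usage:
;; (my-deffontset my-default-fontset-fonts "fontset-default"
;;   '((mule-unicode-0100-24ff . ("gnu-unifont" . "iso10646-1")))
;;   "Extra font specs applied to the default fontset.")
```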


	Stefan

* Re: Cyrillic vs UTF-8
  2003-05-02  7:06           ` Richard Stallman
@ 2003-05-02 21:51             ` Eli Zaretskii
  2003-05-03 13:37               ` Juanma Barranquero
  2003-05-04 13:03               ` Richard Stallman
  0 siblings, 2 replies; 55+ messages in thread
From: Eli Zaretskii @ 2003-05-02 21:51 UTC (permalink / raw)
  Cc: emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Reply-to: rms@gnu.org
> 
>     I think the most difficult task for releasing that code is
>     to merge the changes into HEAD.  Emacs-unicode was branched
>     on 2002-03-01, and since then, there were a lot of changes
>     in HEAD.
> 
> Are you starting to work on this now?

Are you suggesting that the next non-bugfix Emacs release will have
emacs-unicode merged?  I thought we wanted to release the current
trunk first, as it has a lot of useful features and delaying them
(something that's probably unavoidable for such a major change as
Unicode-based Emacs) any more than we already did would be
undesirable.

* Re: Customizing fontsets
  2003-05-02  5:17           ` Customizing fontsets Alex Schroeder
  2003-05-02  6:32             ` Kenichi Handa
@ 2003-05-03  0:33             ` Oliver Scholz
  1 sibling, 0 replies; 55+ messages in thread
From: Oliver Scholz @ 2003-05-03  0:33 UTC (permalink / raw)
  Cc: emacs-devel

Alex Schroeder <alex@gnu.org> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
[...]
>> Currently I wonder what the default values should be that a
>> "fontset-widget" should present to the user. The return-value of
>> `fontset-info' is simply too large, I think. It seems weird that a user
>> should specify a fontset covering two or three charsets in her .emacs
>> or in .Xresources -- and is confronted with a list of dozens of
>> charsets then, when she wants to customize it later.
>
> I don't understand.  When you run M-x customize-fontset RET
> fontset-default RET, you expect to see a widget that explains the
> value of "fontset-default", and offer a way to change it.  Thus, all
> the info returned by (fontset-info "fontset-default") must be visible
> and editable at some point.[...]

To give an example: I have the following in my .emacs (simplified):

(create-fontset-from-fontset-spec "\
-b&h-lucidatypewriter-medium-r-*-*-*-100-*-*-*-*-fontset-egoge,\
mule-unicode-0100-24ff:-*-fixed-medium-r-*-*-*-120-*-*-*-*-iso10646-1")

Now, if I'd do `M-x customize-fontset RET fontset-egoge RET', I'd
expect to see something like this:


Family: [b&h-lucidatypewriter                                     ]

[INS] [DEL] Charset:  [mule-unicode-0100-24ff                     ]
            Family:   [fixed-medium                               ]
            Registry: [iso10646-1                                 ]
[INS]


But if we create this widget based on `fontset-info', I'd see a list
of dozens of charsets and character ranges. Have a look at it with 
`M-x describe-fontset RET fontset-egoge RET'.

I'd say that this would be surprising, if not confusing for users
that are not familiar with Emacs' concepts of charsets and
fontsets.

    Oliver
-- 
Oliver Scholz               14 Floréal an 211 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org

* Re: Customizing fontsets
  2003-05-02  6:32             ` Kenichi Handa
  2003-05-02 13:25               ` Stefan Monnier
@ 2003-05-03  0:40               ` Oliver Scholz
  2003-05-03  1:50                 ` Kenichi Handa
  1 sibling, 1 reply; 55+ messages in thread
From: Oliver Scholz @ 2003-05-03  0:40 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

[...]
> In other words, a fontset can be re-created by the sequence
> of (TARGET . FONT-SPEC).  So how about recording that
> sequence as a list in each fontset (the order is important).
> I think it's far more user-friendly to customize that list than
> to customize char-table elements.  In addition, we can use
> the normal customization facility for a list (INS, DEL) at
> the top level.
[...]

I agree. So that would be done by creating an additional extra slot
in the fontset, right?
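
To make the idea concrete, here is a minimal Emacs Lisp sketch of the
bookkeeping, done at the Lisp level rather than in an extra slot of the
char-table.  The names `my-fontset-specs', `my-fontset-record' and
`my-fontset-replay' are hypothetical and do not exist in Emacs; the
APPLY-FN argument is an assumption too, and in real use it would
presumably be something like `set-fontset-font'.

```elisp
;; Hypothetical sketch: remember, per fontset, the ordered
;; (TARGET . FONT-SPEC) pairs that were used to build it.
(defvar my-fontset-specs nil
  "Alist mapping a fontset name to its ordered (TARGET . FONT-SPEC) list.")

(defun my-fontset-record (fontset target font-spec)
  "Append (TARGET . FONT-SPEC) to the recorded history of FONTSET."
  (let ((entry (assoc fontset my-fontset-specs)))
    (if entry
        ;; The order is important, so append instead of pushing.
        (setcdr entry (append (cdr entry)
                              (list (cons target font-spec))))
      (push (list fontset (cons target font-spec)) my-fontset-specs))))

(defun my-fontset-replay (fontset apply-fn)
  "Call APPLY-FN with FONTSET, TARGET and FONT-SPEC for each recorded pair.
Pairs are visited in the order in which they were recorded."
  (dolist (pair (cdr (assoc fontset my-fontset-specs)))
    (funcall apply-fn fontset (car pair) (cdr pair))))
```

A customization widget would then only have to edit the short recorded
list, e.g. the single `mule-unicode-0100-24ff' entry from my example,
instead of the full char-table.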

    Oliver
-- 
Oliver Scholz               14 Floréal an 211 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org


* Re: Customizing fontsets
  2003-05-03  0:40               ` Oliver Scholz
@ 2003-05-03  1:50                 ` Kenichi Handa
  2003-05-03 12:08                   ` Oliver Scholz
  0 siblings, 1 reply; 55+ messages in thread
From: Kenichi Handa @ 2003-05-03  1:50 UTC (permalink / raw)
  Cc: emacs-devel

In article <87d6j1x79x.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <alkibiades@gmx.de> writes:
> Kenichi Handa <handa@m17n.org> writes:
> [...]
>>  In other words, a fontset can be re-created by the sequence
>>  of (TARGET . FONT-SPEC).  So how about recording that
>>  sequence as a list in each fontset (the order is important).
>>  I think it's far user-friendly to customize that list than
>>  to customize char-table elements.  In addition, we can use
>>  the normal customization facility for a list (INS, DEL) at
>>  the top level.
> [...]

> I agree. So that would be done by creating an additional extra slot
> in the fontset, right?

Yes.  And we need to extract that information from a
fontset.  How about adding the 2nd optional arg
CREATION-HISTORY (?) to fontset-info, and if it is non-nil,
return that list instead.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Customizing fontsets
  2003-05-03  1:50                 ` Kenichi Handa
@ 2003-05-03 12:08                   ` Oliver Scholz
  2003-05-07  1:22                     ` Kenichi Handa
  0 siblings, 1 reply; 55+ messages in thread
From: Oliver Scholz @ 2003-05-03 12:08 UTC (permalink / raw)
  Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <87d6j1x79x.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <alkibiades@gmx.de> writes:
>> Kenichi Handa <handa@m17n.org> writes:
>> [...]
>>>  In other words, a fontset can be re-created by the sequence
>>>  of (TARGET . FONT-SPEC).  So how about recording that
>>>  sequence as a list in each fontset (the order is important).
>>>  I think it's far user-friendly to customize that list than
>>>  to customize char-table elements.  In addition, we can use
>>>  the normal customization facility for a list (INS, DEL) at
>>>  the top level.
>> [...]
>
>> I agree. So that would be done by creating an additional extra slot
>> in the fontset, right?
>
> Yes.  And we need to extract that information from a
> fontset.  How about adding the 2nd optional arg
> CREATION-HISTORY (?) to fontset-info, and if it is non-nil,
> return that list instead.

Why not a separate function?

#define FONTSET_SPEC(fontset)  	XCHAR_TABLE (fontset)->extras[3]

DEFUN ("fontset-spec", Ffontset_spec, Sfontset_spec, 1, 1, 0,
       doc: /* FIXME */)
     (name)
     Lisp_Object name;
{
  Lisp_Object fontset;

  (*check_window_system_func) ();

  fontset = check_fontset_name (name);

  return FONTSET_SPEC (fontset);
}

... etc.

    Oliver
-- 
Oliver Scholz               14 Floréal an 211 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org


* Re: Cyrillic vs UTF-8
  2003-05-02 21:51             ` Eli Zaretskii
@ 2003-05-03 13:37               ` Juanma Barranquero
  2003-05-03 19:04                 ` Eli Zaretskii
  2003-05-04 13:03               ` Richard Stallman
  1 sibling, 1 reply; 55+ messages in thread
From: Juanma Barranquero @ 2003-05-03 13:37 UTC (permalink / raw)
  Cc: emacs-devel

On Sat, 03 May 2003 00:51:07 +0300, "Eli Zaretskii" <eliz@elta.co.il> wrote:

> I thought we wanted to release the current
> trunk first, as it has a lot of useful features and delaying them
> (something that's probably unavoidable for such a major change as
> Unicode-based Emacs) any more than we already did would be
> undesirable.

From an exchange between you and RMS last August:

(you:)

>        v19 - support for X
>        v20 - m17n
>        v21 - new display engine
> 
>     If we follow this, v22 should be the Unicode-based Emacs, not some 
>     intermediate release.

(Richard:)

> I agree.

Not to rehash the discussion again, but IMHO we should branch for a
feature release (21.5, I gather, as there's going to be a bugfix 21.4)
before merging the unicode branch.

                                                           /L/e/k/t/u


* Re: Cyrillic vs UTF-8
  2003-05-03 13:37               ` Juanma Barranquero
@ 2003-05-03 19:04                 ` Eli Zaretskii
  0 siblings, 0 replies; 55+ messages in thread
From: Eli Zaretskii @ 2003-05-03 19:04 UTC (permalink / raw)
  Cc: emacs-devel

> Date: Sat, 03 May 2003 15:37:18 +0200
> From: Juanma Barranquero <lektu@terra.es>
> 
> From an exchange between you and RMS last August:
> 
> (you:)
> 
> >        v19 - support for X
> >        v20 - m17n
> >        v21 - new display engine
> > 
> >     If we follow this, v22 should be the Unicode-based Emacs, not some 
> >     intermediate release.
> 
> (Richard:)
> 
> > I agree.

Thanks.  It's good to know I hold to the same opinions even when I
don't quite remember my old ones ;-)


* Re: Cyrillic vs UTF-8
  2003-05-01  8:27         ` Kenichi Handa
  2003-05-02  7:06           ` Richard Stallman
@ 2003-05-04 11:04           ` Dave Love
  2003-05-04 12:01             ` Simon Josefsson
  2003-05-05  8:47             ` Kenichi Handa
  1 sibling, 2 replies; 55+ messages in thread
From: Dave Love @ 2003-05-04 11:04 UTC (permalink / raw)
  Cc: jas

Kenichi Handa <handa@m17n.org> writes:

> Dave, do you have anything else to add to that file?

Probably yes (if I thought about it) but I haven't been able to do
much work on it for ages.  In several respects it's a bit difficult to
tell what state it's in, since there are serious problems with things
like redisplay which make it essentially unusable.

> I think the most difficult task for releasing that code is
> to merge the changes into HEAD.

Yes, it will be a nightmare and things will get lost.

What is this thread all about?  I've replied to private mail to
correct misconceptions, but it seemed to be nothing to do with
Cyrillic (which I did all the recent work on as far as I know).


* Re: Cyrillic vs UTF-8
  2003-05-04 11:04           ` Dave Love
@ 2003-05-04 12:01             ` Simon Josefsson
  2003-05-04 17:13               ` Dave Love
  2003-05-05  8:47             ` Kenichi Handa
  1 sibling, 1 reply; 55+ messages in thread
From: Simon Josefsson @ 2003-05-04 12:01 UTC (permalink / raw)
  Cc: Kenichi Handa

Dave Love <d.love@dl.ac.uk> writes:

> What is this thread all about?  I've replied to private mail to
> correct misconceptions, but it seemed to be nothing to do with
> Cyrillic (which I did all the recent work on as far as I know).

The original problem was that Cyrillic text (e.g., from the HELLO
file) saved as UTF-8 wasn't auto-detected as UTF-8 when the file was
loaded again.  This has been fixed now.  Another problem was that
Emacs, when asked to load the file as UTF-8, picked a Unicode font
that didn't include these glyphs.  This has not been fixed (although
Stephen seemed to have some ideas).  To work around the problem, users
need to define a fontset and use it.  Doing this is rather
user-unfriendly (X resources or elisp), so I suggested making it
possible to customize fontsets.

My other Cyrillic thread was that (double-width) Cyrillic isn't
possible to save as UTF-8 at all.  This was fixed by adding a PROBLEMS
entry that says not all of Unicode is supported.  While discussing it,
it seemed that the real problem was the cut'n'paste behavior that
generated the double-width Cyrillic in the first place, so there was
some discussion about making Emacs use UTF8_STRING, when available,
instead of COMPOUND_TEXT.

I hope this summarizes the thread.


* Re: Cyrillic vs UTF-8
  2003-05-02 21:51             ` Eli Zaretskii
  2003-05-03 13:37               ` Juanma Barranquero
@ 2003-05-04 13:03               ` Richard Stallman
  1 sibling, 0 replies; 55+ messages in thread
From: Richard Stallman @ 2003-05-04 13:03 UTC (permalink / raw)
  Cc: emacs-devel

    >     I think the most difficult task for releasing that code is
    >     to merge the changes into HEAD.  Emacs-unicode was branched
    >     on 2002-03-01, and since then, there were a lot of changes
    >     in HEAD.
    > 
    > Are you starting to work on this now?

    Are you suggesting that the next non-bugfix Emacs release will have
    emacs-unicode merged?

That is not what I was talking about.  However, this might be a good
idea if the unicode stuff is ready for it soon.


* Re: Cyrillic vs UTF-8
  2003-05-04 12:01             ` Simon Josefsson
@ 2003-05-04 17:13               ` Dave Love
  2003-05-04 18:03                 ` Simon Josefsson
  0 siblings, 1 reply; 55+ messages in thread
From: Dave Love @ 2003-05-04 17:13 UTC (permalink / raw)
  Cc: emacs-devel

Simon Josefsson <jas@extundo.com> writes:

> Another problem was that Emacs,
> when asked to load the file as UTF-8, picked a Unicode font that
> didn't include these glyphs.

I assume that's the general xfree86 4 lossage I mentioned in PROBLEMS.
I can't remember how the font will get chosen by default, but there's
code in cyrillic.el that should allow mule-unicode-0100-24ff
characters to be displayed with an 8859-5 or KOI font.  You can also
change into which Emacs characters utf-8 decodes.

> To work around the problem, users need to
> define a fontset, and use it.

Yes (or purge the unhelpful fonts).  If the combination of the
PROBLEMS entry and the manual isn't good enough, suggestions would be
useful.

> Doing this is rather user unfriendly (X resource, or elisp) so I
> suggested making it possible to customize fontsets.

Yes.  I mostly implemented customizing the default set (which I think
is all that needs customizing) for Emacs 22, but was stymied by the
treatment of the default face somewhere.  I complained about that some
time ago, but it never got resolved and I've not had time to go back
and try to sort it out.  (I think that problem is the same in Emacs 21
and 22, but the fontset mechanism in the latter is different.)

> My other Cyrillic thread was that (double-width) cyrillic

I assume that means the Cyrillic parts of the CJK charsets.

> isn't possible to save as UTF-8 at all.

It's possible if you amend the tables defined in ucs-tables.el or
utf-8.el -- wherever it is now.  I can't remember whether there are
potential problems with that, but I at least thought it wasn't
worthwhile.  If you want to experiment, Mule-UCS has tables with the
non-CJK characters labelled for JISX &c.

> This was fixed by adding a
> PROBLEMS entry that says not all of Unicode is supported.

It looks as though that needs work...

> I hope this summarizes the thread.

Thanks.

[This has got strange recipients because the original mail had
`Mail-Copies-To: nobody'.  As far as I know, that's a non-standard
header for news only, so perhaps there's a Gnus bug there.]


* Re: Cyrillic vs UTF-8
  2003-05-04 17:13               ` Dave Love
@ 2003-05-04 18:03                 ` Simon Josefsson
  0 siblings, 0 replies; 55+ messages in thread
From: Simon Josefsson @ 2003-05-04 18:03 UTC (permalink / raw)
  Cc: Kenichi Handa

Dave Love <d.love@dl.ac.uk> writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> Another problem was that Emacs,
>> when asked to load the file as UTF-8, picked a Unicode font that
>> didn't include these glyphs.
>
> I assume that's the general xfree86 4 lossage I mentioned in PROBLEMS.

Yes.

> I can't remember how the font will get chosen by default, but
> there's code in cyrillic.el that should allow mule-unicode-0100-24ff
> characters to be displayed with an 8859-5 or KOI font.  You can also
> change into which Emacs characters utf-8 decodes.

The remaining problem is that this should happen automatically,
without user configuration.

>> To work around the problem, users need to
>> define a fontset, and use it.
>
> Yes (or purge the unhelpful fonts).

Purging incomplete fonts is not a realistic option.  As (I think it
was) Stephen said, it does not make sense for a font designer working
on, e.g., Cyrillic to include non-Cyrillic glyphs just because he
(rightly) decided to use the ISO 10646 encoding.

>> My other Cyrillic thread was that (double-width) cyrillic
>
> I assume that means the Cyrillic parts of the CJK charsets.

Yes.

>> isn't possible to save as UTF-8 at all.
>
> It's possible if you amend the tables defined in ucs-tables.el or
> utf-8.el -- wherever it is now.  I can't remember whether there are
> potential problems with that, but I at least thought it wasn't
> worthwhile.  If you want to experiment, Mule-UCS has tables with the
> non-CJK characters labelled for JISX &c.

I don't normally use Cyrillic, so I don't care much.

But I do believe that when a user like me (who doesn't normally use
Cyrillic) happens to cut'n'paste a Cyrillic string from another
application, it should Simply Work without requiring the user to
become familiar with Cyrillic usage in Emacs.

> [This has got strange recipients because the original mail had
> `Mail-Copies-To: nobody'.  As far as I know, that's a non-standard
> header for news only, so perhaps there's a Gnus bug there.]

The recipient list looked fine; I read the replies to my messages on
the list, no need to CC me.  But it is a non-standard header, so I
don't expect everyone to support it.


* Re: Cyrillic vs UTF-8
  2003-05-04 11:04           ` Dave Love
  2003-05-04 12:01             ` Simon Josefsson
@ 2003-05-05  8:47             ` Kenichi Handa
  1 sibling, 0 replies; 55+ messages in thread
From: Kenichi Handa @ 2003-05-05  8:47 UTC (permalink / raw)
  Cc: jas

In article <rzq4r4b55xu.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  Dave, do you have anything else to add to that file?

> Probably yes (if I thought about it) but I haven't been able to do
> much work on it for ages.  In several respects it's a bit difficult to
> tell what state it's in, since there are serious problems with things
> like redisplay which make it essentially unusable.

As far as I remember, the redisplay problem is caused by a
bug in the original display routine that is already fixed
in HEAD; thus, once emacs-unicode is merged with HEAD,
the problem will disappear.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Customizing fontsets
  2003-05-03 12:08                   ` Oliver Scholz
@ 2003-05-07  1:22                     ` Kenichi Handa
  0 siblings, 0 replies; 55+ messages in thread
From: Kenichi Handa @ 2003-05-07  1:22 UTC (permalink / raw)
  Cc: emacs-devel

In article <87of2kdyg6.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <alkibiades@gmx.de> writes:
>>  Yes.  And we need to extract that information from a
>>  fontset.  How about adding the 2nd optional arg
>>  CREATION-HISTORY (?) to fontset-info, and if it is non-nil,
>>  return that list instead.

> Why not a separate function?

One reason is that the returned value is very similar to
that of fontset-info.  Another is I just can't think of a
good name for a new function.

> #define FONTSET_SPEC(fontset)  	XCHAR_TABLE (fontset)->extras[3]

> DEFUN ("fontset-spec", Ffontset_spec, Sfontset_spec, 1, 1, 0,
>        doc: /* FIXME */)
>      (name)
>      Lisp_Object name;

It seems that the name fontset-spec is too vague, especially
because we already have fontset-info.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Cyrillic vs UTF-8
  2003-04-28 21:49     ` Stefan Monnier
  2003-04-28 22:29       ` Simon Josefsson
@ 2003-05-19  0:40       ` Kenichi Handa
  2003-05-19  0:52         ` Stefan Monnier
  1 sibling, 1 reply; 55+ messages in thread
From: Kenichi Handa @ 2003-05-19  0:40 UTC (permalink / raw)
  Cc: jas

I'm sorry for the late response on this thread.

In article <200304282149.h3SLnxSU002624@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
> Agreed, but I think one of the problems is that the preference-ordering
> is the same for load-time-detection as it is for save-time-detection,
> so if you move utf-8 up for detection you end up saving all new files
> in utf-8 which is not OK in non-utf-8 locales.
> I suggested introducing a second preference-order, but nothing came
> out of it (probably because I didn't code anything up).

I'd like to avoid, as far as possible, introducing a new
mechanism to control coding systems.  Also, the second
preference order (used for saving) would matter only when
all of the following hold:

(1) The buffer file coding system can't encode the current
    buffer, and
(2) The most preferred coding system can encode the current
    buffer, and
(3) A user doesn't want to use the most preferred one.

Isn't it a very rare case?
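
The three conditions can be written down as a small predicate.  This
is only a sketch: `my-save-order-matters-p' is a hypothetical name,
and CAN-ENCODE-P stands in for whatever check Emacs would really use
to decide whether a coding system can encode the buffer.

```elisp
(defun my-save-order-matters-p (buffer-cs preferred-cs
                                can-encode-p wants-preferred-p)
  "Return non-nil if a separate save-time preference order would matter.
BUFFER-CS is the buffer's file coding system and PREFERRED-CS the most
preferred coding system.  CAN-ENCODE-P is a function of one coding
system that returns non-nil if it can encode the current buffer.
WANTS-PREFERRED-P is non-nil if the user accepts PREFERRED-CS."
  (and (not (funcall can-encode-p buffer-cs))   ; condition (1)
       (funcall can-encode-p preferred-cs)      ; condition (2)
       (not wants-preferred-p)))                ; condition (3)
```

For example, with a latin-1 buffer that now contains Cyrillic, utf-8
as the most preferred coding system, and a user who would rather not
save in utf-8, the predicate returns non-nil; in every other
combination it returns nil.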

---
Ken'ichi HANDA
handa@m17n.org


* Re: Cyrillic vs UTF-8
  2003-05-19  0:40       ` Kenichi Handa
@ 2003-05-19  0:52         ` Stefan Monnier
  2003-05-19  2:31           ` Kenichi Handa
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Monnier @ 2003-05-19  0:52 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> > Agreed, but I think one of the problems is that the preference-ordering
> > is the same for load-time-detection as it is for save-time-detection,
> > so if you move utf-8 up for detection you end up saving all new files
> > in utf-8 which is not OK in non-utf-8 locales.
> > I suggested introducing a second preference-order, but nothing came
> > out of it (probably because I didn't code anything up).
> 
> I'd like to avoid introducing a new mechanism to control a
> coding system as far as possible.  And, the second
> preference-order (used for saving) works only in this case:
> 
> (1) The buffer file coding system can't encode the current
>     buffer, and
> (2) The most preferred coding system can encode the current
>     buffer, and
> (3) A user doesn't want to use the most preferred one.
> 
> Isn't it a very rare case?

Maybe it is.  In my situation, I'd like utf-8 to be at the top
of the preferences w.r.t decoding because it virtually never
guesses wrong.
OTOH, I'm still using a mostly-latin-1 environment, so I'd
still rather avoid utf-8 when I can.  I.e. latin-1 should be at
the top of my preferences w.r.t encoding.

I.e. utf-8 is definitely not my most preferred encoding, but
since Emacs will often mistake a utf-8 text for latin-1 whereas
it virtually never mistakes a latin-1 text for utf-8, I do
put utf-8 as my most preferred encoding (and then try not to
forget to do C-x RET f when saving a new file).
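
The asymmetry can be illustrated with a simplified well-formedness
check.  This is a sketch, not what Emacs's detector does:
`my-utf-8-well-formed-p' is a hypothetical name, and it only checks
the lead-byte/continuation-byte structure (it does not reject overlong
forms or surrogates).  Accented latin-1 text almost always breaks that
structure, which is why it is so rarely mistaken for utf-8.

```elisp
(defun my-utf-8-well-formed-p (bytes)
  "Return non-nil if BYTES, a list of integers 0..255, looks like UTF-8.
Only the lead/continuation byte structure is checked."
  (let ((ok t)
        (need 0))                        ; continuation bytes still expected
    (dolist (b bytes)
      (cond
       ((not ok))                        ; already failed, skip the rest
       ((> need 0)
        (if (= (logand b #xC0) #x80)     ; 10xxxxxx continuation byte
            (setq need (1- need))
          (setq ok nil)))
       ((< b #x80))                      ; plain ASCII
       ((= (logand b #xE0) #xC0) (setq need 1)) ; 110xxxxx
       ((= (logand b #xF0) #xE0) (setq need 2)) ; 1110xxxx
       ((= (logand b #xF8) #xF0) (setq need 3)) ; 11110xxx
       (t (setq ok nil))))               ; stray continuation or bad lead
    (and ok (= need 0))))
```

The bytes 0xD0 0xA0 (the file code of the Cyrillic letter from the
original report) pass the check, whereas latin-1 "caf" followed by
0xE9 fails because 0xE9 would have to be followed by two continuation
bytes.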

	Stefan


* Re: Cyrillic vs UTF-8
  2003-05-19  0:52         ` Stefan Monnier
@ 2003-05-19  2:31           ` Kenichi Handa
  2003-05-19 13:28             ` Stefan Monnier
  0 siblings, 1 reply; 55+ messages in thread
From: Kenichi Handa @ 2003-05-19  2:31 UTC (permalink / raw)
  Cc: jas

In article <200305190052.h4J0qUfa017404@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>  I'd like to avoid introducing a new mechanism to control a
>>  coding system as far as possible.  And, the second
>>  preference-order (used for saving) works only in this case:
>>  
>>  (1) The buffer file coding system can't encode the current
>>      buffer, and
>>  (2) The most preferred coding system can encode the current
>>      buffer, and
>>  (3) A user doesn't want to use the most preferred one.
>>  
>>  Isn't it a very rare case?

> Maybe it is.  In my situation, I'd like utf-8 to be at the top
> of the preferences w.r.t decoding because it virtually never
> guesses wrong.
> OTOH, I'm still using a mostly-latin-1 environment, so I'd
> still rather avoid utf-8 when I can.  I.e. latin-1 should be at
> the top of my preferences w.r.t encoding.

In that case, I think the source of the problem is that the
command prefer-coding-system doesn't satisfy this request of
yours:
   Prefer utf-8 only in automatic detection on reading a
   file, not for the other situations.

(defun prefer-coding-system (coding-system)
  "Add CODING-SYSTEM at the front of the priority list for automatic detection.
This also sets the following coding systems:
  o coding system of a newly created buffer
  o default coding system for subprocess I/O
This also sets the following values:
  o default value used as `file-name-coding-system' for converting file names.
  o default value for the command `set-terminal-coding-system' (not on MSDOS)
  o default value for the command `set-keyboard-coding-system'

How about changing it to skip "This also ..." parts if
called with a prefix argument?

Then, on writing, if buffer-file-coding-system is not
locally bound, default-buffer-file-coding-system is tried
automatically.

And, for the case that buffer-file-coding-system is locally
bound differently from default-buffer-file-coding-system,
but it can't encode the current buffer, we can change
select-safe-coding-system to try
default-buffer-file-coding-system before trying the most
preferred coding system.

That way, I think we can satisfy your request completely.
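
The intended order of attempts could be sketched as follows.  The
function name `my-first-safe-coding-system' and the CAN-ENCODE-P
argument are hypothetical; this is not the actual code of
select-safe-coding-system, only an illustration of trying the
candidates in a fixed order and taking the first one that works.

```elisp
(defun my-first-safe-coding-system (candidates can-encode-p)
  "Return the first element of CANDIDATES accepted by CAN-ENCODE-P, or nil.
CANDIDATES is a list of coding systems in order of preference; nil
elements are skipped.  CAN-ENCODE-P is a function of one coding system."
  (catch 'found
    (dolist (cs candidates)
      (when (and cs (funcall can-encode-p cs))
        (throw 'found cs)))
    nil))
```

With the candidates listed as buffer-file-coding-system, then
default-buffer-file-coding-system, then the most preferred coding
system, the most preferred one is used only when both of the others
fail.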

---
Ken'ichi HANDA
handa@m17n.org


* Re: Cyrillic vs UTF-8
  2003-05-19  2:31           ` Kenichi Handa
@ 2003-05-19 13:28             ` Stefan Monnier
  2003-05-19 13:49               ` Stefan Monnier
  0 siblings, 1 reply; 55+ messages in thread
From: Stefan Monnier @ 2003-05-19 13:28 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> > Maybe it is.  In my situation, I'd like utf-8 to be at the top
> > of the preferences w.r.t decoding because it virtually never
> > guesses wrong.
> > OTOH, I'm still using a mostly-latin-1 environment, so I'd
> > still rather avoid utf-8 when I can.  I.e. latin-1 should be at
> > the top of my preferences w.r.t encoding.
> 
> In that case, I think the source of the problem is that the
> command prefer-coding-system doesn't satisfy this request of
> yours:
>    Prefer utf-8 only in automatic detection on reading a
>    file, not for the other situations.
> 
> (defun prefer-coding-system (coding-system)
>   "Add CODING-SYSTEM at the front of the priority list for automatic detection.
> This also sets the following coding systems:
>   o coding system of a newly created buffer
>   o default coding system for subprocess I/O
> This also sets the following values:
>   o default value used as `file-name-coding-system' for converting file names.
>   o default value for the command `set-terminal-coding-system' (not on MSDOS)
>   o default value for the command `set-keyboard-coding-system'
> 
> How about changing it to skip "This also ..." parts if
> called with a prefix argument?
> 
> Then, on writing, if buffer-file-coding-system is not
> locally bound, default-buffer-file-coding-system is tried
> automatically.
> 
> And, for the case that buffer-file-coding-system is locally
> bound differently from default-buffer-file-coding-system,
> but it can't encode the current buffer, we can change
> select-safe-coding-system to try
> default-buffer-file-coding-system before trying the most
> preferred coding system.
> 
> That way, I think we can satisfy your request completely.

That seems like a cheap way to get what I want indeed.


	Stefan


* Re: Cyrillic vs UTF-8
  2003-05-19 13:28             ` Stefan Monnier
@ 2003-05-19 13:49               ` Stefan Monnier
  0 siblings, 0 replies; 55+ messages in thread
From: Stefan Monnier @ 2003-05-19 13:49 UTC (permalink / raw)
  Cc: monnier+gnu/emacs

> > > Maybe it is.  In my situation, I'd like utf-8 to be at the top
> > > of the preferences w.r.t decoding because it virtually never
> > > guesses wrong.
> > > OTOH, I'm still using a mostly-latin-1 environment, so I'd
> > > still rather avoid utf-8 when I can.  I.e. latin-1 should be at
> > > the top of my preferences w.r.t encoding.
> > 
> > In that case, I think the source of the problem is that the
> > command prefer-coding-system doesn't satisfy this request of
> > yours:
> >    Prefer utf-8 only in automatic detection on reading a
> >    file, not for the other situations.
> > 
> > (defun prefer-coding-system (coding-system)
> >   "Add CODING-SYSTEM at the front of the priority list for automatic detection.
> > This also sets the following coding systems:
> >   o coding system of a newly created buffer
> >   o default coding system for subprocess I/O
> > This also sets the following values:
> >   o default value used as `file-name-coding-system' for converting file names.
> >   o default value for the command `set-terminal-coding-system' (not on MSDOS)
> >   o default value for the command `set-keyboard-coding-system'
> > 
> > How about changing it to skip "This also ..." parts if
> > called with a prefix argument?
> > 
> > Then, on writing, if buffer-file-coding-system is not
> > locally bound, default-buffer-file-coding-system is tried
> > automatically.
> > 
> > And, for the case that buffer-file-coding-system is locally
> > bound differently from default-buffer-file-coding-system,
> > but it can't encode the current buffer, we can change
> > select-safe-coding-system to try
> > default-buffer-file-coding-system before trying the most
> > preferred coding system.
> > 
> > That way, I think we can satisfy your request completely.
> 
> That seems like a cheap way to get what I want indeed.

Actually I don't currently use prefer-coding-system (specifically
because I didn't want to set all those other coding-systems),
instead I use

(when (boundp 'coding-category-utf-8)
  (set-coding-priority '(coding-category-utf-8)))

so I guess the only change that I care about is the part that uses
default-buffer-file-coding-system in preference to the most preferred
coding system (although it does sound paradoxical ;-)

The patch below would work for me; any comments or objections?


	Stefan


Index: mule-cmds.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/international/mule-cmds.el,v
retrieving revision 1.231
diff -u -u -b -r1.231 mule-cmds.el
--- mule-cmds.el	16 May 2003 04:15:20 -0000	1.231
+++ mule-cmds.el	19 May 2003 13:45:16 -0000
@@ -1,5 +1,5 @@
 ;;; mule-cmds.el --- commands for mulitilingual environment
-;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
+;; Copyright (C) 1995, 2003 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
 
@@ -631,7 +631,8 @@
 between FROM and TO are shown in a popup window.  Among them, the most
 proper one is suggested as the default.
 
-The list of `buffer-file-coding-system' of the current buffer and the
+The list of `buffer-file-coding-system' of the current buffer,
+the `default-buffer-file-coding-system', and the
 most preferred coding system (if it corresponds to a MIME charset) is
 treated as the default coding system list.  Among them, the first one
 that safely encodes the text is normally selected silently and
@@ -648,8 +649,8 @@
 list of coding systems to be prepended to the default coding system
 list.  However, if DEFAULT-CODING-SYSTEM is a list and the first
 element is t, the cdr part is used as the defualt coding system list,
-i.e. `buffer-file-coding-system' and the most prepended coding system
-is not used.
+i.e. `buffer-file-coding-system', `default-buffer-file-coding-system',
+and the most preferred coding system are not used.
 
 Optional 4th arg ACCEPT-DEFAULT-P, if non-nil, is a function to
 determine the acceptability of the silently selected coding system.
@@ -679,6 +680,9 @@
 	  (mapcar (function (lambda (x) (cons x (coding-system-base x))))
 		  default-coding-system))
 
+    ;; From now on, the list of defaults is reversed.
+    (setq default-coding-system (nreverse default-coding-system))
+
     (unless no-other-defaults
       ;; If buffer-file-coding-system is not nil nor undecided, append it
       ;; to the defaults.
@@ -686,24 +690,30 @@
 	  (let ((base (coding-system-base buffer-file-coding-system)))
 	    (or (eq base 'undecided)
 		(rassq base default-coding-system)
-		(setq default-coding-system
-		      (append default-coding-system
-			      (list (cons buffer-file-coding-system base)))))))
+		(push (cons buffer-file-coding-system base)
+		      default-coding-system))))
+
+      ;; If default-buffer-file-coding-system is not nil nor undecided,
+      ;; append it to the defaults.
+      (if default-buffer-file-coding-system
+	  (let ((base (coding-system-base default-buffer-file-coding-system)))
+	    (or (eq base 'undecided)
+		(rassq base default-coding-system)
+		(push (cons default-buffer-file-coding-system base)
+		      default-coding-system))))
 
       ;; If the most preferred coding system has the property mime-charset,
       ;; append it to the defaults.
       (let ((tail coding-category-list)
 	    preferred base)
-	(while (and tail
-		    (not (setq preferred (symbol-value (car tail)))))
+	(while (and tail (not (setq preferred (symbol-value (car tail)))))
 	  (setq tail (cdr tail)))
 	(and (coding-system-p preferred)
 	     (setq base (coding-system-base preferred))
 	     (coding-system-get preferred 'mime-charset)
 	     (not (rassq base default-coding-system))
-	     (setq default-coding-system
-		   (append default-coding-system
-			   (list (cons preferred base))))))))
+	     (push (cons preferred base)
+		   default-coding-system)))))
 
   (if select-safe-coding-system-accept-default-p
       (setq accept-default-p select-safe-coding-system-accept-default-p))
@@ -724,7 +734,7 @@
 	      (push (car elt) safe))
 	  (push (car elt) unsafe)))
       (if safe
-	  (setq coding-system (car (last safe)))))
+	  (setq coding-system (car safe))))
 
     ;; If all the defaults failed, ask a user.
     (when (not coding-system)


end of thread, other threads:[~2003-05-19 13:49 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-25 16:12 Cyrillic vs UTF-8 Simon Josefsson
2003-04-25 16:40 ` Eli Zaretskii
2003-04-25 17:09   ` Simon Josefsson
2003-04-25 22:39     ` Eli Zaretskii
2003-04-26  8:11     ` Kenichi Handa
2003-04-26 12:25       ` Simon Josefsson
2003-04-28  9:18         ` Kenichi Handa
2003-04-28 11:11           ` Simon Josefsson
2003-04-26 16:21       ` Benjamin Riefenstahl
2003-04-26 16:27         ` Benjamin Riefenstahl
2003-04-28  4:38       ` Richard Stallman
2003-05-01  8:27         ` Kenichi Handa
2003-05-02  7:06           ` Richard Stallman
2003-05-02 21:51             ` Eli Zaretskii
2003-05-03 13:37               ` Juanma Barranquero
2003-05-03 19:04                 ` Eli Zaretskii
2003-05-04 13:03               ` Richard Stallman
2003-05-04 11:04           ` Dave Love
2003-05-04 12:01             ` Simon Josefsson
2003-05-04 17:13               ` Dave Love
2003-05-04 18:03                 ` Simon Josefsson
2003-05-05  8:47             ` Kenichi Handa
2003-04-26 13:44     ` Richard Stallman
2003-04-26 14:10       ` Simon Josefsson
2003-04-28 21:49     ` Stefan Monnier
2003-04-28 22:29       ` Simon Josefsson
2003-04-29 13:49         ` Stefan Monnier
2003-04-29 14:27           ` Simon Josefsson
2003-04-30  4:42             ` Stephen J. Turnbull
2003-04-30  5:43           ` Richard Stallman
2003-05-19  0:40       ` Kenichi Handa
2003-05-19  0:52         ` Stefan Monnier
2003-05-19  2:31           ` Kenichi Handa
2003-05-19 13:28             ` Stefan Monnier
2003-05-19 13:49               ` Stefan Monnier
2003-04-25 16:54 ` Simon Josefsson
2003-04-26  3:55   ` Implementing charset-aware X font names [was: Cyrillic vs UTF-8] Stephen J. Turnbull
2003-04-28 11:09     ` Kenichi Handa
2003-04-28 12:27       ` Implementing charset-aware X font names Stephen J. Turnbull
2003-05-01 11:13         ` Kenichi Handa
2003-05-01 14:14           ` Alex Schroeder
2003-05-01 23:16             ` Kenichi Handa
2003-04-26  7:59   ` Cyrillic vs UTF-8 Kenichi Handa
2003-04-26 12:14     ` Simon Josefsson
2003-05-01  7:20       ` Kenichi Handa
2003-05-01 14:06         ` Alex Schroeder
2003-05-01 18:03         ` Customizing fontsets (was: Cyrillic vs UTF-8) Oliver Scholz
2003-05-02  5:17           ` Customizing fontsets Alex Schroeder
2003-05-02  6:32             ` Kenichi Handa
2003-05-02 13:25               ` Stefan Monnier
2003-05-03  0:40               ` Oliver Scholz
2003-05-03  1:50                 ` Kenichi Handa
2003-05-03 12:08                   ` Oliver Scholz
2003-05-07  1:22                     ` Kenichi Handa
2003-05-03  0:33             ` Oliver Scholz
