* Cyrillic vs UTF-8
From: Simon Josefsson @ 2003-04-25 16:12 UTC

$ emacs -q --no-site-file
C-h H (view HELLO file)
Mark the line with Russian text with the mouse
q (quit HELLO file)
C-x C-f ff RET (open a new file)
C-y (yank the text; it looks fine in the new buffer)
C-x C-s (save file; Emacs complains that iso-latin-1 cannot encode the data, and suggests utf-8)
RET (go with the default utf-8)
C-x C-k (kill buffer)
C-x C-f ff RET (open file again)
(Emacs fails to recognize it as utf-8 and displays gibberish)
C-x C-k (kill buffer)
C-x RET c utf-8 C-x C-f ff RET (open file as utf-8)
(Emacs recognizes the file as utf-8 but displays empty boxes)

Pressing C-u C-x = on the first empty box (first non-ASCII character) shows:

  character: Р (01212100, 332864, 0x51440)
    charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
 code point: 40 64
     syntax: w which means: word
   category: y:Cyrillic
buffer code: 0x9C 0xF4 0xA8 0xC0
  file code: 0xD0 0xA0 (encoded by coding system mule-utf-8-unix)
    Unicode: 0420
       font: -Adobe-Courier-Medium-R-Normal--17-120-100-100-M-100-ISO10646-1

I think there are two problems. Opening the file the first time should guess it is a utf-8 file. Secondly, Emacs should be able to find a font that contains the characters -- I have all font packages from Debian installed.

The following works fine:

-Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1

In GNU Emacs 21.3.50.12 (i686-pc-linux-gnu) of 2003-04-25 on latte.josefsson.org
configured using `configure '--with-gtk''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: en_US.UTF-8
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Recent input:
M-x r e p o r <tab> <return>

Recent messages:
(emacs -q)
Loading tool-bar...done
Loading image...done
Loading tooltip...done
For information about the GNU Project and its goals, type C-h C-p.
Loading emacsbug...done
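The byte values in the `C-u C-x =` output above can be cross-checked outside Emacs. The following is a minimal Python sketch, not anything taken from Emacs itself: it encodes U+0420 as UTF-8, which should match the two bytes listed under "file code", and then decodes those same bytes as Latin-1 to illustrate the kind of gibberish a wrong guess produces. The variable names are made up for the illustration.

```python
# U+0420 is CYRILLIC CAPITAL LETTER ER, the first non-ASCII
# character in the report above.
text = "\u0420"

# Encode it the way the file was saved (UTF-8).
encoded = text.encode("utf-8")
print(encoded.hex())  # d0a0, i.e. the "file code" 0xD0 0xA0

# Reading the same bytes back under the wrong assumption (Latin-1)
# yields two unrelated characters instead of one Cyrillic letter.
misread = encoded.decode("latin-1")
print([hex(ord(ch)) for ch in misread])

# Reading them back as UTF-8 recovers the original character.
roundtrip = encoded.decode("utf-8")
print(roundtrip == text)
```

Note that every byte sequence is valid Latin-1, which is why a detector needs some other hint before it will prefer UTF-8.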
* Re: Cyrillic vs UTF-8
From: Eli Zaretskii @ 2003-04-25 16:40 UTC
Cc: emacs-devel

> From: Simon Josefsson <jas@extundo.com>
> Date: Fri, 25 Apr 2003 18:12:17 +0200
>
> I think there are two problems. Opening the file the first time
> should guess it is a utf-8 file.

IIRC, you need to make the priority of utf-8 higher for this to happen. Unless that's changed in the current CVS, try evaluating the following expression:

  (prefer-coding-system 'utf-8)

before you visit a utf-8 encoded file, and see if that helps. I think this is because the encoding detection routines cannot distinguish between Latin-n and utf encoding without some help.

Apologies if the current code base no longer works as I remember.
* Re: Cyrillic vs UTF-8
From: Simon Josefsson @ 2003-04-25 17:09 UTC
Cc: emacs-devel

"Eli Zaretskii" <eliz@elta.co.il> writes:

>> From: Simon Josefsson <jas@extundo.com>
>> Date: Fri, 25 Apr 2003 18:12:17 +0200
>>
>> I think there are two problems. Opening the file the first time
>> should guess it is a utf-8 file.
>
> IIRC, you need to make the priority of utf-8 higher for this to
> happen. Unless that's changed in the current CVS, try evaluating the
> following expression:
>
>   (prefer-coding-system 'utf-8)
>
> before you visit a utf-8 encoded file, and see if that helps. I think
> this is because the encoding detection routines cannot distinguish
> between Latin-n and utf encoding without some help.

This works, but note that Emacs didn't recognize the file as being in any encoding without it. The modeline says '-:--'.

It seems binary is preferred over utf-8 and utf-16-* in coding-category-list. This seems extremely conservative. I guess it means UTF-8 can never be autodetected by default? Is the Unicode support so bad it shouldn't even be preferred over binary? UTF-8 is well formed and restricted; detecting it properly (even compared to Latin-n) can be done well enough that failures rarely happen in practice.

Can't we move binary down below UTF-8 in CVS? IMHO we should move UTF-8 earlier still, since determining whether data is UTF-8 or not can be done with good probability. Preferring binary over UTF-8 seems just wrong.

There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you say, but it has been removed both in 21.3 and in CVS. I thought that meant UTF-8 was better supported now, but this doesn't seem to be the case.
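The claim that UTF-8 is "well formed and restricted" can be made concrete. Below is a hedged Python sketch of a well-formedness check; it is an illustration of the general idea only and is not the detector Emacs uses. The function name `is_well_formed_utf8` is hypothetical. It accepts a byte string only if every lead byte is followed by the right number of continuation bytes and the decoded value is neither overlong, a surrogate, nor above U+10FFFF.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                      # plain ASCII
            length, cp, min_cp = 1, lead, 0
        elif 0xC2 <= lead <= 0xDF:           # 2-byte sequence
            length, cp, min_cp = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:           # 3-byte sequence
            length, cp, min_cp = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:           # 4-byte sequence
            length, cp, min_cp = 4, lead & 0x07, 0x10000
        else:                                # stray continuation or invalid lead
            return False
        if i + length > n:                   # sequence truncated at end of data
            return False
        for k in range(1, length):
            cont = data[i + k]
            if cont & 0xC0 != 0x80:          # continuation bytes look like 10xxxxxx
                return False
            cp = (cp << 6) | (cont & 0x3F)
        # Reject overlong forms, surrogates, and values beyond Unicode.
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return False
        i += length
    return True


print(is_well_formed_utf8(b"\xd0\xa0"))              # the bytes from the report
print(is_well_formed_utf8("caf\u00e9".encode("latin-1")))
```

Typical Latin-n text with accented letters tends to fail this check quickly, because a high byte followed by an ASCII byte is not a valid sequence. Pure ASCII passes under every encoding discussed here, so it gives no evidence either way.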
* Re: Cyrillic vs UTF-8
From: Eli Zaretskii @ 2003-04-25 22:39 UTC
Cc: emacs-devel

> From: Simon Josefsson <jas@extundo.com>
> Date: Fri, 25 Apr 2003 19:09:07 +0200
>
> There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you
> say, but it has been removed both in 21.3 and in CVS. I thought that
> meant UTF-8 was better supported now, but this doesn't seem to be the
> case.

"cvs annotate" will show you who removed that entry, and then you can ask that person for the reasons.
* Re: Cyrillic vs UTF-8
From: Kenichi Handa @ 2003-04-26 8:11 UTC
Cc: emacs-devel

In article <iluvfx21p3g.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:

> It seems binary is preferred over utf-8 and utf-16-* in
> coding-category-list. This seems extremely conservative. I guess it
> means UTF-8 can never be autodetected by default? Is the unicode
> support so bad it shouldn't even be preferred over binary? UTF-8 is
> well formed and restricted; detecting it properly (even compared to
> Latin-n) can be done well enough that failures rarely happen in
> practice.

> Can't we move binary down below UTF-8 in CVS? IMHO we should move
> UTF-8 earlier still, since determining whether data is UTF-8 or not
> can be done with good probability. Preferring binary over UTF-8 seems
> just wrong.

Unfortunately, the current Emacs doesn't have a facility to detect UTF-8 byte sequence. So, if we put UTF-8 the higher priority, all files are detected as UTF-8. :-(

> There used to be (in Emacs 21.2) a PROBLEMS entry suggesting what you
> say, but it has been removed both in 21.3 and in CVS. I thought that
> meant UTF-8 was better supported now, but this doesn't seem to be the
> case.

The UTF-8 support was surely improved, but not as much as you expect.

By the way, all these problems are solved in emacs-unicode. It's available from the CVS server under the branch tag "emacs-unicode" (see http://savannah.gnu.org/cvs/?group=emacs).

---
Ken'ichi HANDA
handa@m17n.org
* Re: Cyrillic vs UTF-8
From: Simon Josefsson @ 2003-04-26 12:25 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> Unfortunately, the current Emacs doesn't have a facility to
> detect UTF-8 byte sequence. So, if we put UTF-8 the higher
> priority, all files are detected as UTF-8. :-(

I see. Is this very difficult to solve, or why hasn't it been solved? The algorithm to detect UTF-8 is not that complicated.

> By the way, all these problems are solved in emacs-unicode.
> It's available from CVS server as a branch tag
> "emacs-unicode" (see
> http://savannah.gnu.org/cvs/?group=emacs).

I'm trying it, thanks.
* Re: Cyrillic vs UTF-8
From: Kenichi Handa @ 2003-04-28 9:18 UTC
Cc: emacs-devel

In article <ilullxxxx78.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:

> Kenichi Handa <handa@m17n.org> writes:
>> Unfortunately, the current Emacs doesn't have a facility to
>> detect UTF-8 byte sequence. So, if we put UTF-8 the higher
>> priority, all files are detected as UTF-8. :-(

> I see. Is this very difficult to solve, or why hasn't it? The
> algorithm to detect UTF-8 is not that complicated.

Oops, I'm very sorry, I was wrong. The current Emacs contains built-in utf-8 and utf-16 (with BOM) detectors. So, giving UTF-8 a higher priority should cause no problem.

Richard Stallman <rms@gnu.org> writes:

>     It seems binary is preferred over utf-8 and utf-16-* in
>     coding-category-list. This seems extremely conservative. I guess it
>     means UTF-8 can never be autodetected by default?

> That certainly seems undesirable. Unless there is a specific reason
> why it needs to be this way, I agree with you that we should raise
> the priority of utf-8 and utf-16.

We can raise the priority of utf-16-le-with-signature and utf-16-be-with-signature, but can't raise the priority of utf-16-le, utf-16-be, utf-16 because it's impossible to distinguish them from binary data.

So, I've just installed these changes.

2003-04-28  Kenichi Handa  <handa@m17n.org>

        * international/mule-cmds.el (reset-language-environment): Raise
        the priority of mule-utf-8, mule-utf-16-be-with-signature and
        mule-utf-16-le-with-signature.

        * international/mule-conf.el: Set coding-category-utf-16-be to
        mule-utf-16-be-with-signature, coding-category-utf-16-le to
        mule-utf-16-le-with-signature.  Raise the priority of
        coding-category-utf-8, coding-category-utf-16-be, and
        coding-category-utf-16-le.

---
Ken'ichi HANDA
handa@m17n.org
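The distinction drawn above, that UTF-16 with a signature (BOM) is detectable while bare UTF-16 is not, can be sketched in a few lines of Python. This is an illustration only and says nothing about how the Emacs detectors are written; `sniff_utf16_bom` is a hypothetical helper name. It relies on the standard `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` constants.

```python
import codecs


def sniff_utf16_bom(data: bytes):
    """Return 'utf-16-be' or 'utf-16-le' if data starts with a BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    return None


# With a signature the byte order is unambiguous.
with_bom = codecs.BOM_UTF16_LE + "\u0420".encode("utf-16-le")
print(sniff_utf16_bom(with_bom))

# Without one there is nothing to go on: the two bytes of U+0420 in
# big-endian order also decode without error as little-endian UTF-16,
# just to a different character.
bare = "\u0420".encode("utf-16-be")
print(sniff_utf16_bom(bare))
print(hex(ord(bare.decode("utf-16-le"))))
```

The last line shows why the bare variants stay at low priority: almost any even-length run of bytes decodes as some UTF-16 text, so success proves very little.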
* Re: Cyrillic vs UTF-8
From: Simon Josefsson @ 2003-04-28 11:11 UTC
Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> We can raise the priority of utf-16-le-with-signature and
> utf-16-be-with-signature, but can't raise the priority of
> utf-16-le, utf-16-be, utf-16 because it's impossible to
> distinguish them from binary data.
>
> So, I've just installed these changes.

Thanks! I don't really care much about UTF-16, and I don't think most users do either, so this seems like a good solution.
* Re: Cyrillic vs UTF-8
From: Benjamin Riefenstahl @ 2003-04-26 16:21 UTC
Cc: jas

Hi,

Kenichi Handa <handa@m17n.org> writes:

> Unfortunately, the current Emacs doesn't have a facility to detect
> UTF-8 byte sequence. So, if we put UTF-8 the higher priority, all
> files are detected as UTF-8. :-(

Hm, I have Emacs 21.2.1 on Windows NT and

>>>
Priority order for recognizing coding systems when reading files:
  1. mule-utf-8 (alias: utf-8)
  2. windows-1252 (alias: cp1252)
<<<

Detecting UTF-8 works fine. I'm not sure it's completely reliable, but it works for most of my everyday work.

Am I misunderstanding something?

so long, benny
* Re: Cyrillic vs UTF-8
From: Benjamin Riefenstahl @ 2003-04-26 16:27 UTC
Cc: jas

Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

> >>>
> Priority order for recognizing coding systems when reading files:
>   1. mule-utf-8 (alias: utf-8)
>   2. windows-1252 (alias: cp1252)
> <<<
>
> Detecting UTF-8 works fine.

To clarify: it also correctly detects when a file is *not* UTF-8, falling back on cp1252.
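The behaviour described here, trying UTF-8 first and falling back to cp1252, amounts to walking a priority list and taking the first coding system that accepts the bytes. The Python sketch below illustrates that idea with Python's own codecs standing in for Emacs coding systems; the function name `detect` and the "binary" fallback label are assumptions made for the example, not Emacs behaviour.

```python
def detect(data: bytes, priority=("utf-8", "cp1252")) -> str:
    """Return the first encoding in priority that decodes data without error."""
    for name in priority:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue          # this candidate rejects the bytes, try the next
        return name
    return "binary"           # nothing matched; treat the data as raw bytes


print(detect("\u0420".encode("utf-8")))        # Cyrillic saved as UTF-8
print(detect("caf\u00e9".encode("cp1252")))    # Western European text

# Order matters: an encoding that accepts every byte sequence, such as
# Latin-1, shadows everything listed after it.
print(detect("\u0420".encode("utf-8"), priority=("latin-1", "utf-8")))
```

This only works because the UTF-8 check is strict. A candidate that never rejects anything has to sit at the bottom of the list, which is the situation the thread was worried about.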
* Re: Cyrillic vs UTF-8
From: Richard Stallman @ 2003-04-28 4:38 UTC
Cc: jas

    Unfortunately, the current Emacs doesn't have a facility to
    detect UTF-8 byte sequence. So, if we put UTF-8 the higher
    priority, all files are detected as UTF-8. :-(

Is there any easy way to add such detection to the trunk version? It would not be worth while if it is difficult, but it would be worth while if it is easy.

    By the way, all these problems are solved in emacs-unicode.

Could you report on what work is needed before we can release this code?
* Re: Cyrillic vs UTF-8
From: Kenichi Handa @ 2003-05-01 8:27 UTC
Cc: jas

In article <E19A0Ou-0001sm-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     By the way, all these problems are solved in emacs-unicode.

> Could you report on what work is needed before we
> can release this code?

Dave has compiled the current problems in the file emacs-unicode/README.unicode. Some of them (especially serious ones) are already fixed.

Dave, do you have anything else to add to that file?

I think the most difficult task for releasing that code is to merge the changes into HEAD. Emacs-unicode was branched on 2002-03-01, and since then, there were a lot of changes in HEAD.

---
Ken'ichi HANDA
handa@m17n.org

--- README.unicode ---
                                            -*-mode: text; coding: latin-1;-*-

Problems, fixmes and other issues in the emacs-unicode branch
-------------------------------------------------------------

Notes by fx to record various things of variable importance. handa needs to check them -- don't take too seriously, especially with regard to completeness.

_Do take seriously that you don't want this branch unless you're actually working on it; you risk your data by actually using it._

If you just want to edit Unicode and/or unify iso-8859 et al, see the existing support and the extra stuff at <URL:ftp://dlpx1.dl.ac.uk/fx/emacs/Mule>, mostly now in the CVS trunk. (Editing support is mostly orthogonal to the internal representation.)

* SINGLE_BYTE_CHAR_P returns true for Latin-1 characters, which has
  undesirable effects. E.g.:

    (multibyte-string-p (let ((s "x")) (aset s 0 ?£) s)) => nil
    (multibyte-string-p (concat [?£])) => nil
    (text-char-description ?£) => "M-#"

    These examples are all fixed by the change of 2002-10-14, but
    there still exist questionable uses of SINGLE_BYTE_CHAR_P in the
    code.

* Rationalize character syntax and its relationship to the Unicode
  database. (Applies mainly to symbol and punctuation syntax.)

* Fontset handling and customization needs work. We want to relate
  fonts to scripts, probably based on the Unicode blocks. The
  presence of small-repertoire 10646-encoded fonts in XFree 4 is a
  pain, not currently worked round.

    With the change on 2002-07-26, multiple fonts can be specified in
    a fontset for a specific range of characters. Each range can also
    be specified by script. Before using ISO10646 fonts, Emacs checks
    their repertoires to avoid fonts that don't have a glyph for a
    specific character.

* Work is also needed on charset and coding system priorities.

* The relevant bits of latin1-disp.el need porting (and probably
  re-naming/updating). See also cyril-util.el.

* Quail files need more work now the encoding is irrelevant.

* What to do with the old coding categories stuff?

* The preferred-coding-system property of charsets should probably be
  junked unless it can be made more useful now.

* find-multibyte-characters needs looking at.

* Implement Korean cp949/UHC, BIG5-HKSCS and any other important
  missing charsets.

* Check up on definition of alternativnj.

* Lazy-load tables for unify-charset somehow?

    Actually, Emacs clears out all charset maps and unify-maps just
    before dumping, and they are loaded again on demand by the dumped
    Emacs. But those maps (char tables) generated while temacs is
    running can't be removed from the dumped Emacs.

* Translation tables for {en,de}code currently aren't supported.

    This should be fixed by the changes of 2002-10-14.

* Defining CCL coding systems currently doesn't work.

    This should be fixed by the changes of 2003-01-30.

* iso-2022 charsets get unified on i/o.

    With the change on 2003-01-06, decoding routines put a `charset'
    property on decoded text, and the iso-2022 encoder pays attention
    to it. Thus, for instance, reading and writing by iso-2022-7bit
    preserves the original designation sequences. Would the property
    name `preferred-charset' be better? We may have to use this
    property to decide on a font.

* Revisit locale processing: look at treating the language and
  charset parts separately. (Language should affect things like
  spelling and calendar, but that's not a Unicode issue.)

* Handle Unicode combining characters usefully, e.g. diacritics, and
  handle more scripts specifically (à la Devanagari). There are
  issues with canonicalization.

* Bidi is a separate issue with no support currently.

* We need tabular input methods, e.g. for maths symbols. (Not
  specific to Unicode.)

* Need multibyte text in menus, e.g. for the above. (Not specific to
  Unicode.)

* There's currently no support for Unicode normalization.

* Populate char-width-table correctly for Unicode characters and
  worry about what happens when double-width charsets covering
  non-CJK characters are unified.

* Emacs 20/21 .elc files are currently not loadable. It may or may
  not be possible to do this properly.

    With the change on 2002-07-24, elc files generated by Emacs 20.3
    and later are correctly loaded (including those containing
    multibyte characters and compressed). But elc files generated by
    20.2 and earlier are still not loadable. Is it really worth
    working on it?

* Rmail won't work with non-ASCII text. Encoding issues for Babyl
  files need sorting out, but rms says Babyl will go before this is
  released.

* Gnus still needs some attention, and we need to get changes
  accepted by Gnus maintainers...

* There are type errors lurking, e.g. in
  Fcheck_coding_systems_region. Define ENABLE_CHECKING to find them.

* You can grep the code for lots of fixmes.

* Old auto-save files, and similar files, such as Gnus drafts,
  containing non-ASCII characters probably won't be re-read correctly.
* Re: Cyrillic vs UTF-8
From: Richard Stallman @ 2003-05-02 7:06 UTC
Cc: jas

    Dave has compiled the current problems in the file
    emacs-unicode/README.unicode. Some of them (especially
    serious ones) are already fixed.

How about if you edit that file, deleting the items that are already fixed. That should be an easy job, right? Then you could post a file that is current.

    I think the most difficult task for releasing that code is
    to merge the changes into HEAD. Emacs-unicode was branched
    on 2002-03-01, and since then, there were a lot of changes
    in HEAD.

Are you starting to work on this now?
* Re: Cyrillic vs UTF-8
From: Eli Zaretskii @ 2003-05-02 21:51 UTC
Cc: emacs-devel

> From: Richard Stallman <rms@gnu.org>
> Reply-to: rms@gnu.org
>
>     I think the most difficult task for releasing that code is
>     to merge the changes into HEAD. Emacs-unicode was branched
>     on 2002-03-01, and since then, there were a lot of changes
>     in HEAD.
>
> Are you starting to work on this now?

Are you suggesting that the next non-bugfix Emacs release will have emacs-unicode merged? I thought we wanted to release the current trunk first, as it has a lot of useful features and delaying them (something that's probably unavoidable for such a major change as Unicode-based Emacs) any more than we already did would be undesirable.
* Re: Cyrillic vs UTF-8
From: Juanma Barranquero @ 2003-05-03 13:37 UTC
Cc: emacs-devel

On Sat, 03 May 2003 00:51:07 +0300, "Eli Zaretskii" <eliz@elta.co.il> wrote:

> I thought we wanted to release the current
> trunk first, as it has a lot of useful features and delaying them
> (something that's probably unavoidable for such a major change as
> Unicode-based Emacs) any more than we already did would be
> undesirable.

From an exchange between you and RMS last August:

(you:)

> v19 - support for X
> v20 - m17n
> v21 - new display engine
>
> If we follow this, v22 should be the Unicode-based Emacs, not some
> intermediate release.

(Richard:)

> I agree.

Not to rehash the discussion again, but IMHO we should branch for a feature release (21.5, I gather, as there's going to be a bugfix 21.4) before merging the unicode branch.

/L/e/k/t/u
* Re: Cyrillic vs UTF-8
From: Eli Zaretskii @ 2003-05-03 19:04 UTC
Cc: emacs-devel

> Date: Sat, 03 May 2003 15:37:18 +0200
> From: Juanma Barranquero <lektu@terra.es>
>
> From an exchange between you and RMS in past August:
>
> (you:)
>
> > v19 - support for X
> > v20 - m17n
> > v21 - new display engine
> >
> > If we follow this, v22 should be the Unicode-based Emacs, not some
> > intermediate release.
>
> (Richard:)
>
> > I agree.

Thanks. It's good to know I hold to the same opinions even when I don't quite remember my old ones ;-)
* Re: Cyrillic vs UTF-8
From: Richard Stallman @ 2003-05-04 13:03 UTC
Cc: emacs-devel

    > I think the most difficult task for releasing that code is
    > to merge the changes into HEAD. Emacs-unicode was branched
    > on 2002-03-01, and since then, there were a lot of changes
    > in HEAD.
    >
    > Are you starting to work on this now?

    Are you suggesting that the next non-bugfix Emacs release will have
    emacs-unicode merged?

That is not what I was talking about. However, this might be a good idea if the unicode stuff is ready for it soon.
* Re: Cyrillic vs UTF-8
From: Dave Love @ 2003-05-04 11:04 UTC
Cc: jas

Kenichi Handa <handa@m17n.org> writes:

> Dave, do you have anything else to add to that file?

Probably yes (if I thought about it) but I haven't been able to do much work on it for ages. In several respects it's a bit difficult to tell what state it's in, since there are serious problems with things like redisplay which make it essentially unusable.

> I think the most difficult task for releasing that code is
> to merge the changes into HEAD.

Yes, it will be a nightmare and things will get lost.

What is this thread all about? I've replied to private mail to correct misconceptions, but it seemed to be nothing to do with Cyrillic (which I did all the recent work on as far as I know).
* Re: Cyrillic vs UTF-8
From: Simon Josefsson @ 2003-05-04 12:01 UTC
Cc: Kenichi Handa

Dave Love <d.love@dl.ac.uk> writes:

> What is this thread all about? I've replied to private mail to
> correct misconceptions, but it seemed to be nothing to do with
> Cyrillic (which I did all the recent work on as far as I know).

The original problem was that Cyrillic text (e.g., from the HELLO file) saved as UTF-8 wasn't auto-detected as UTF-8 when loading the file back again. This has been fixed now.

Another problem was that Emacs, when asked to load the file as UTF-8, picked a Unicode font that didn't include these glyphs. This has not been fixed (although Stephen seemed to have some ideas). To work around the problem, users need to define a fontset, and use it. Doing this is rather user unfriendly (X resource, or elisp) so I suggested making it possible to customize fontsets.

My other Cyrillic thread was that (double-width) Cyrillic isn't possible to save as UTF-8 at all. This was fixed by adding a PROBLEMS entry that says not all of Unicode is supported. While discussing it, it seemed like the real problem was the cut'n'paste behavior that generated the double-width Cyrillic in the first place, so there was some discussion about making Emacs use UTF8_STRING, when available, instead of COMPOUND_TEXT.

I hope this summarizes the thread.
* Re: Cyrillic vs UTF-8
From: Dave Love @ 2003-05-04 17:13 UTC
Cc: emacs-devel

Simon Josefsson <jas@extundo.com> writes:

> Another problem was that Emacs,
> when asked to load the file as UTF-8, picked a Unicode font that
> didn't include this glyphs.

I assume that's the general xfree86 4 lossage I mentioned in PROBLEMS. I can't remember how the font will get chosen by default, but there's code in cyrillic.el that should allow mule-unicode-0100-24ff characters to be displayed with an 8859-5 or KOI font. You can also change into which Emacs characters utf-8 decodes.

> To workaround the problem, users need to
> define a fontset, and use it.

Yes (or purge the unhelpful fonts). If the combination of the PROBLEMS entry and the manual aren't good enough, suggestions would be useful.

> Doing this is rather user unfriendly (X resource, or elisp) so I
> suggested making it possible to customize fontsets.

Yes. I mostly implemented customizing the default set (which I think is all that needs customizing) for Emacs 22, but was stymied by the treatment of the default face somewhere. I complained about that some time ago, but it never got resolved and I've not had time to go back and try to sort it out. (I think that problem is the same in Emacs 21 and 22, but the fontset mechanism in the latter is different.)

> My other Cyrillic thread was that (double-width) cyrillic

I assume that means the Cyrillic parts of the CJK charsets.

> isn't possible to save as UTF-8 at all.

It's possible if you amend the tables defined in ucs-tables.el or utf-8.el -- wherever it is now. I can't remember whether there are potential problems with that, but I at least thought it wasn't worthwhile. If you want to experiment, Mule-UCS has tables with the non-CJK characters labelled for JISX &c.

> This was fixed by adding a
> PROBLEMS entry that says not all of Unicode is supported.

It looks as though that needs work...

> I hope this summarizes the thread.

Thanks.

[This has got strange recipients because the original mail had `Mail-Copies-To: nobody'. As far as I know, that's a non-standard header for news only, so perhaps there's a Gnus bug there.]
* Re: Cyrillic vs UTF-8
From: Simon Josefsson @ 2003-05-04 18:03 UTC
Cc: Kenichi Handa

Dave Love <d.love@dl.ac.uk> writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> Another problem was that Emacs,
>> when asked to load the file as UTF-8, picked a Unicode font that
>> didn't include this glyphs.
>
> I assume that's the general xfree86 4 lossage I mentioned in PROBLEMS.

Yes.

> I can't remember how the font will get chosen by default, but
> there's code in cyrillic.el that should allow mule-unicode-0100-24ff
> characters to be displayed with an 8859-5 or KOI font. You can also
> change into which Emacs characters utf-8 decodes.

The remaining problem is that this should happen automatically, without user configuration.

>> To workaround the problem, users need to
>> define a fontset, and use it.
>
> Yes (or purge the unhelpful fonts).

Purging incomplete fonts is not a realistic option. As (I think it was) Stephen said, it does not make sense for a font designer for, e.g., Cyrillic to include non-Cyrillic fonts just because he (rightly) decided to use the iso-10646 encoding.

>> My other Cyrillic thread was that (double-width) cyrillic
>
> I assume that means the Cyrillic parts of the CJK charsets.

Yes.

>> isn't possible to save as UTF-8 at all.
>
> It's possible if you amend the tables defined in ucs-tables.el or
> utf-8.el -- wherever it is now. I can't remember whether there are
> potential problems with that, but I at least thought it wasn't
> worthwhile. If you want to experiment, Mule-UCS has tables with the
> non-CJK characters labelled for JISX &c.

I don't normally use Cyrillic, so I don't care much. But I do believe that when a user like me (who doesn't normally use Cyrillic) happens to cut'n'paste a Cyrillic string from another application, it should Simply Work without requiring the user to become familiar with Cyrillic usage in Emacs.

> [This has got strange recipients because the original mail had
> `Mail-Copies-To: nobody'. As far as I know, that's a non-standard
> header for news only, so perhaps there's a Gnus bug there.]

The recipient list looked fine; I read the replies to my messages on the list, no need to CC me. But it is a non-standard header, so I don't expect everyone to support it.
* Re: Cyrillic vs UTF-8
From: Kenichi Handa @ 2003-05-05 8:47 UTC
Cc: jas

In article <rzq4r4b55xu.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

> Kenichi Handa <handa@m17n.org> writes:
>> Dave, do you have anything else to add to that file?

> Probably yes (if I thought about it) but I haven't been able to do
> much work on it for ages. In several respects it's a bit difficult to
> tell what state it's in, since there are serious problems with things
> like redisplay which make it essentially unusable.

As far as I remember, the redisplay problem is caused by a bug in the original display routine which is already fixed in HEAD. Thus, once emacs-unicode is merged with HEAD, the problem will disappear.

---
Ken'ichi HANDA
handa@m17n.org
* Re: Cyrillic vs UTF-8 2003-04-25 17:09 ` Simon Josefsson 2003-04-25 22:39 ` Eli Zaretskii 2003-04-26 8:11 ` Kenichi Handa @ 2003-04-26 13:44 ` Richard Stallman 2003-04-26 14:10 ` Simon Josefsson 2003-04-28 21:49 ` Stefan Monnier 3 siblings, 1 reply; 55+ messages in thread From: Richard Stallman @ 2003-04-26 13:44 UTC (permalink / raw) Cc: emacs-devel It seems binary is preferred over utf-8 and utf-16-* in coding-category-list. This seems extremely conservative. I guess it means UTF-8 can never be autodetected by default? That certainly seems undesirable. Unless there is a specific reason why it needs to be this way, I agree with you that we should raise the priority of utf-8 and utf-16. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-26 13:44 ` Richard Stallman @ 2003-04-26 14:10 ` Simon Josefsson 0 siblings, 0 replies; 55+ messages in thread From: Simon Josefsson @ 2003-04-26 14:10 UTC (permalink / raw) Cc: emacs-devel Richard Stallman <rms@gnu.org> writes: > It seems binary is preferred over utf-8 and utf-16-* in > coding-category-list. This seems extremely conservative. I guess it > means UTF-8 can never be autodetected by default? > > That certainly seems undesirable. Unless there is a specific reason > why it needs to be this way, I agree with you that we should raise > the priority of utf-8 and utf-16. Kenichi Handa said that moving utf-8 earlier would make all files be regarded as UTF-8, so until that is fixed I agree moving it higher up in the hierarchy is bad. But I'm not sure I understand the situation completely anyway. When I run emacs in a UTF-8 locale (LANG=sv_SE.UTF-8), which I usually do, the utf-8 coding system _is_ first in coding-category-list, yet I have no problems reading iso-8859-1 files. They aren't regarded as UTF-8. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-25 17:09 ` Simon Josefsson ` (2 preceding siblings ...) 2003-04-26 13:44 ` Richard Stallman @ 2003-04-28 21:49 ` Stefan Monnier 2003-04-28 22:29 ` Simon Josefsson 2003-05-19 0:40 ` Kenichi Handa 3 siblings, 2 replies; 55+ messages in thread From: Stefan Monnier @ 2003-04-28 21:49 UTC (permalink / raw) Cc: emacs-devel > Can't we move binary down below UTF-8 in CVS? IMHO we should move > UTF-8 earlier still, since determining whether data is UTF-8 or not > can be done with good probability. Prefering binary over UTF-8 seems Agreed, but I think one of the problems is that the preference-ordering is the same for load-time-detection as it is for save-time-detection, so if you move utf-8 up for detection you end up saving all new files in utf-8 which is not OK in non-utf-8 locales. I suggested introducing a second preference-order, but nothing came out of it (probably because I didn't code anything up). Stefan ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-28 21:49 ` Stefan Monnier @ 2003-04-28 22:29 ` Simon Josefsson 2003-04-29 13:49 ` Stefan Monnier 2003-05-19 0:40 ` Kenichi Handa 1 sibling, 1 reply; 55+ messages in thread From: Simon Josefsson @ 2003-04-28 22:29 UTC (permalink / raw) Cc: emacs-devel "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> Can't we move binary down below UTF-8 in CVS? IMHO we should move >> UTF-8 earlier still, since determining whether data is UTF-8 or not >> can be done with good probability. Prefering binary over UTF-8 seems > > Agreed, but I think one of the problems is that the preference-ordering > is the same for load-time-detection as it is for save-time-detection, > so if you move utf-8 up for detection you end up saving all new files > in utf-8 which is not OK in non-utf-8 locales. This sounds serious in theory, but I was unable to make emacs behave unexpectedly in practice. Do you have an example? I tried opening a new file and typing åäö and saving it. It was saved (without query) as latin-1 with sv_SE, en_GB, en_US and C locales. All are what I would expect, and are consistent with what I get for emacs 21.3. (Of course, this is a western-centric test case, but I don't know what non-western users expect so I can't really test anything else.) Note that iso-8-1 is still preferred over utf-8 with Kenichi's change. Note also that mule-cmds.el seems to guess the appropriate charset for most locales, so UTF-8 will never be preferred over the "locale charset". A ja_JP user will have a low priority for UTF-8. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-28 22:29 ` Simon Josefsson @ 2003-04-29 13:49 ` Stefan Monnier 2003-04-29 14:27 ` Simon Josefsson 2003-04-30 5:43 ` Richard Stallman 0 siblings, 2 replies; 55+ messages in thread From: Stefan Monnier @ 2003-04-29 13:49 UTC (permalink / raw) Cc: Stefan Monnier > >> Can't we move binary down below UTF-8 in CVS? IMHO we should move > >> UTF-8 earlier still, since determining whether data is UTF-8 or not > >> can be done with good probability. Prefering binary over UTF-8 seems > > > > Agreed, but I think one of the problems is that the preference-ordering > > is the same for load-time-detection as it is for save-time-detection, > > so if you move utf-8 up for detection you end up saving all new files > > in utf-8 which is not OK in non-utf-8 locales. > > This sounds serious in theory, but I was unable to make emacs behave > unexpectedly in practice. Do you have an example? The problem only appears if you move utf-8 to the first spot. Moving it to the first spot otherwise makes sense since auto-detection of utf-8 is about as reliable as it gets. Stefan ^ permalink raw reply [flat|nested] 55+ messages in thread
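The reliability claim can be illustrated outside Emacs. The following is a minimal Python sketch, not Emacs code: `looks_like_utf8` is a hypothetical helper that only models strict validation, on the assumption that a detector accepts UTF-8 only when every byte sequence is well formed. Latin-1 text with accented letters almost never passes that test, whereas a Latin-1 decoder accepts any byte string, which is why putting utf-8 first is safe for reading while a latin-1-first order can misread UTF-8 files.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if DATA is well-formed UTF-8 with at least one
    non-ASCII character (pure ASCII is valid in both encodings and so
    gives no evidence either way)."""
    try:
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return any(ord(ch) > 0x7F for ch in text)


# CYRILLIC CAPITAL LETTER ER, U+0420, is the two bytes D0 A0 in UTF-8.
cyrillic_utf8 = "\u0420".encode("utf-8")

# Swedish letters saved as Latin-1: each one is a single high byte.
swedish_latin1 = "åäö".encode("latin-1")

# A Latin-1 decoder never rejects anything, so it cannot rule UTF-8 out.
misread_as_latin1 = cyrillic_utf8.decode("latin-1")

if __name__ == "__main__":
    print(looks_like_utf8(cyrillic_utf8))   # well-formed UTF-8
    print(looks_like_utf8(swedish_latin1))  # not well-formed UTF-8
    print(repr(misread_as_latin1))          # decodes, but to the wrong text
```

The asymmetry is the point: the strict check fails on the Latin-1 bytes, while the Latin-1 decode of the UTF-8 bytes succeeds and silently produces two unrelated characters.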
* Re: Cyrillic vs UTF-8 2003-04-29 13:49 ` Stefan Monnier @ 2003-04-29 14:27 ` Simon Josefsson 2003-04-30 4:42 ` Stephen J. Turnbull 2003-04-30 5:43 ` Richard Stallman 1 sibling, 1 reply; 55+ messages in thread From: Simon Josefsson @ 2003-04-29 14:27 UTC (permalink / raw) Cc: emacs-devel "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> >> Can't we move binary down below UTF-8 in CVS? IMHO we should move >> >> UTF-8 earlier still, since determining whether data is UTF-8 or not >> >> can be done with good probability. Prefering binary over UTF-8 seems >> > >> > Agreed, but I think one of the problems is that the preference-ordering >> > is the same for load-time-detection as it is for save-time-detection, >> > so if you move utf-8 up for detection you end up saving all new files >> > in utf-8 which is not OK in non-utf-8 locales. >> >> This sounds serious in theory, but I was unable to make emacs behave >> unexpectedly in practice. Do you have an example? > > The problem only appears if you move utf-8 to the first spot. But utf-8 hasn't been moved first, so this isn't a problem? I agree it would be useful to be able to configure different loading and saving time preferences. Then I would be able to specify that emacs should try to save data as ascii first, then latin-1, then latin-9 and then UTF-8, then give up and ask. On loading, I'd want it to try latin-9 instead of latin-1 though. In non-UTF-8 locales, I think this behaviour is what many europeans would want. > Moving it to the first spot otherwise makes sense since > auto-detection of utf-8 is about as reliable as it gets. Yup. ^ permalink raw reply [flat|nested] 55+ messages in thread
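The separate load-time and save-time orders discussed here can be modelled in a few lines. This is a Python sketch under stated assumptions, not Emacs behaviour: `SAVE_ORDER` follows the order listed in the message above (ascii, latin-1, latin-9, UTF-8), while `LOAD_ORDER` (utf-8 first, latin-9 as the fallback) is an assumption that combines the suggestions made in this thread. Both names and the two helper functions are hypothetical.

```python
from typing import Optional, Sequence

# Save-time preference: the narrowest encoding that can represent the text.
SAVE_ORDER = ("ascii", "iso-8859-1", "iso-8859-15", "utf-8")

# Load-time preference (assumed): strict UTF-8 first, then latin-9.
LOAD_ORDER = ("utf-8", "iso-8859-15")


def choose_for_saving(text: str,
                      order: Sequence[str] = SAVE_ORDER) -> Optional[str]:
    """Return the first encoding in ORDER that can encode TEXT,
    or None, meaning the caller should give up and ask the user."""
    for name in order:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        return name
    return None


def choose_for_loading(data: bytes,
                       order: Sequence[str] = LOAD_ORDER) -> Optional[str]:
    """Return the first encoding in ORDER that decodes DATA without error."""
    for name in order:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    for sample in ("hello", "åäö", "€", "\u0420"):
        print(repr(sample), "->", choose_for_saving(sample))
```

Keeping the two orders apart means that raising utf-8 for detection does not change which encoding a new Latin-1 buffer is saved in.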
* Re: Cyrillic vs UTF-8 2003-04-29 14:27 ` Simon Josefsson @ 2003-04-30 4:42 ` Stephen J. Turnbull 0 siblings, 0 replies; 55+ messages in thread From: Stephen J. Turnbull @ 2003-04-30 4:42 UTC (permalink / raw) Cc: Stefan Monnier >>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes: Simon> I agree it would be useful to be able to configure Simon> different loading and saving time preferences. Then I Simon> would be able to specify that emacs should try to save data Simon> as ascii first, then latin-1, then latin-9 and then UTF-8, Simon> then give up and ask. On loading, I'd want it to try Simon> latin-9 instead of latin-1 though. In non-UTF-8 locales, I Simon> think this behaviour is what many europeans would want. latin-unity provides this (but basically only for Latin scripts). I don't think it works under GNU Emacs, and it's pretty crufty, but it's all my code and is assigned to the FSF. So if somebody wants to port/ improve, I'll be happy to answer questions. NB: probably requires Mule-UCS, although I imagine emacs-unicode has the necessary facilities to build tables. cvs -d :pserver:cvs@cvs.xemacs.org:/pack/xemacscvs checkout latin-unity (password is either null or "cvs", I forget). -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-29 13:49 ` Stefan Monnier 2003-04-29 14:27 ` Simon Josefsson @ 2003-04-30 5:43 ` Richard Stallman 1 sibling, 0 replies; 55+ messages in thread From: Richard Stallman @ 2003-04-30 5:43 UTC (permalink / raw) Cc: jas > > Agreed, but I think one of the problems is that the preference-ordering > > is the same for load-time-detection as it is for save-time-detection, > > so if you move utf-8 up for detection you end up saving all new files > > in utf-8 which is not OK in non-utf-8 locales. The problem only appears if you move utf-8 to the first spot. Moving it to the first spot otherwise makes sense since auto-detection of utf-8 is about as reliable as it gets. This suggests we do want to have two separate preferences lists, and put utf-8 at the top for reading. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-28 21:49 ` Stefan Monnier 2003-04-28 22:29 ` Simon Josefsson @ 2003-05-19 0:40 ` Kenichi Handa 2003-05-19 0:52 ` Stefan Monnier 1 sibling, 1 reply; 55+ messages in thread From: Kenichi Handa @ 2003-05-19 0:40 UTC (permalink / raw) Cc: jas I'm sorry for the late response on this thread. In article <200304282149.h3SLnxSU002624@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: > Agreed, but I think one of the problems is that the preference-ordering > is the same for load-time-detection as it is for save-time-detection, > so if you move utf-8 up for detection you end up saving all new files > in utf-8 which is not OK in non-utf-8 locales. > I suggested introducing a second preference-order, but nothing came > out of it (probably because I didn't code anything up). I'd like to avoid introducing a new mechanism to control a coding system as far as possible. And, the second preference-order (used for saving) works only in this case: (1) The buffer file coding system can't encode the current buffer, and (2) The most preferred coding system can encode the current buffer, and (3) A user doesn't want to use the most preferred one. Isn't it a very rare case? --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-05-19 0:40 ` Kenichi Handa @ 2003-05-19 0:52 ` Stefan Monnier 2003-05-19 2:31 ` Kenichi Handa 0 siblings, 1 reply; 55+ messages in thread From: Stefan Monnier @ 2003-05-19 0:52 UTC (permalink / raw) Cc: monnier+gnu/emacs > > Agreed, but I think one of the problems is that the preference-ordering > > is the same for load-time-detection as it is for save-time-detection, > > so if you move utf-8 up for detection you end up saving all new files > > in utf-8 which is not OK in non-utf-8 locales. > > I suggested introducing a second preference-order, but nothing came > > out of it (probably because I didn't code anything up). > > I'd like to avoid introducing a new mechanism to control a > coding system as far as possible. And, the second > preference-order (used for saving) works only in this case: > > (1) The buffer file coding system can't encode the current > buffer, and > (2) The most preferred coding system can encode the current > buffer, and > (3) A user doesn't want to use the most preferred one. > > Isn't it a very rare case? Maybe it is. In my situation, I'd like utf-8 to be at the top of the preferences w.r.t decoding because it virtually never guesses wrong. OTOH, I'm still using a mostly-latin-1 environment, so I'd still rather avoid utf-8 when I can. I.e. latin-1 should be at the top of my preferences w.r.t encoding. I.e. utf-8 is definitely not my most preferred encoding, but since Emacs will often mistake a utf-8 text for latin-1 whereas it virtually never mistakes a latin-1 text for utf-8, I do put utf-8 as my most preferred encoding (and then try not to forget to do C-x RET f when saving a new file). Stefan ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-05-19 0:52 ` Stefan Monnier @ 2003-05-19 2:31 ` Kenichi Handa 2003-05-19 13:28 ` Stefan Monnier 0 siblings, 1 reply; 55+ messages in thread From: Kenichi Handa @ 2003-05-19 2:31 UTC (permalink / raw) Cc: jas In article <200305190052.h4J0qUfa017404@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> I'd like to avoid introducing a new mechanism to control a >> coding system as far as possible. And, the second >> preference-order (used for saving) works only in this case: >> >> (1) The buffer file coding system can't encode the current >> buffer, and >> (2) The most preferred coding system can encode the current >> buffer, and >> (3) A user doesn't want to use the most preferred one. >> >> Isn't it a very rare case? > Maybe it is. In my situation, I'd like utf-8 to be at the top > of the preferences w.r.t decoding because it virtually never > guesses wrong. > OTOH, I'm still using a mostly-latin-1 environment, so I'd > still rather avoid utf-8 when I can. I.e. latin-1 should be at > the top of my preferences w.r.t encoding. In that case, I think the source of the problem is that the command prefer-coding-system doesn't satisfy this request of yours: Prefer utf-8 only in automatic detection on reading a file, not for the other situations. (defun prefer-coding-system (coding-system) "Add CODING-SYSTEM at the front of the priority list for automatic detection. This also sets the following coding systems: o coding system of a newly created buffer o default coding system for subprocess I/O This also sets the following values: o default value used as `file-name-coding-system' for converting file names. o default value for the command `set-terminal-coding-system' (not on MSDOS) o default value for the command `set-keyboard-coding-system' How about changing it to skip "This also ..." parts if called with a prefix argument? 
Then, on writing, if buffer-file-coding-system is not locally bound, default-buffer-file-coding-system is tried automatically. And, for the case that buffer-file-coding-system is locally bound differently from default-buffer-file-coding-system, but it can't encode the current buffer, we can change select-safe-coding-system to try default-buffer-file-coding-system before trying the most preferred coding system. That way, I think we can satisfy your request completely. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-05-19 2:31 ` Kenichi Handa @ 2003-05-19 13:28 ` Stefan Monnier 2003-05-19 13:49 ` Stefan Monnier 0 siblings, 1 reply; 55+ messages in thread From: Stefan Monnier @ 2003-05-19 13:28 UTC (permalink / raw) Cc: monnier+gnu/emacs > > Maybe it is. In my situation, I'd like utf-8 to be at the top > > of the preferences w.r.t decoding because it virtually never > > guesses wrong. > > OTOH, I'm still using a mostly-latin-1 environment, so I'd > > still rather avoid utf-8 when I can. I.e. latin-1 should be at > > the top of my preferences w.r.t encoding. > > In that case, I think the source of the problem is that the > command prefer-coding-system doesn't satisfy this request of > yours: > Prefer utf-8 only in automatic detection on reading a > file, not for the other situations. > > (defun prefer-coding-system (coding-system) > "Add CODING-SYSTEM at the front of the priority list for automatic detection. > This also sets the following coding systems: > o coding system of a newly created buffer > o default coding system for subprocess I/O > This also sets the following values: > o default value used as `file-name-coding-system' for converting file names. > o default value for the command `set-terminal-coding-system' (not on MSDOS) > o default value for the command `set-keyboard-coding-system' > > How about changing it to skip "This also ..." parts if > called with a prefix argument? > > Then, on writing, if buffer-file-coding-system is not > locally bound, default-buffer-file-coding-system is tried > automatically. > > And, for the case that buffer-file-coding-system is locally > bound differently from default-buffer-file-coding-system, > but it can'd encode the current buffer, we can change > select-safe-coding-system to try > default-buffer-file-coding-system before trying the most > preferred coding system. > > That way, I think we can satisfy your request completely. That seems like a cheap way to get what I want indeed. 
Stefan
* Re: Cyrillic vs UTF-8 2003-05-19 13:28 ` Stefan Monnier @ 2003-05-19 13:49 ` Stefan Monnier 0 siblings, 0 replies; 55+ messages in thread From: Stefan Monnier @ 2003-05-19 13:49 UTC (permalink / raw) Cc: monnier+gnu/emacs > > > Maybe it is. In my situation, I'd like utf-8 to be at the top > > > of the preferences w.r.t decoding because it virtually never > > > guesses wrong. > > > OTOH, I'm still using a mostly-latin-1 environment, so I'd > > > still rather avoid utf-8 when I can. I.e. latin-1 should be at > > > the top of my preferences w.r.t encoding. > > > > In that case, I think the source of the problem is that the > > command prefer-coding-system doesn't satisfy this request of > > yours: > > Prefer utf-8 only in automatic detection on reading a > > file, not for the other situations. > > > > (defun prefer-coding-system (coding-system) > > "Add CODING-SYSTEM at the front of the priority list for automatic detection. > > This also sets the following coding systems: > > o coding system of a newly created buffer > > o default coding system for subprocess I/O > > This also sets the following values: > > o default value used as `file-name-coding-system' for converting file names. > > o default value for the command `set-terminal-coding-system' (not on MSDOS) > > o default value for the command `set-keyboard-coding-system' > > > > How about changing it to skip "This also ..." parts if > > called with a prefix argument? > > > > Then, on writing, if buffer-file-coding-system is not > > locally bound, default-buffer-file-coding-system is tried > > automatically. > > > > And, for the case that buffer-file-coding-system is locally > > bound differently from default-buffer-file-coding-system, > > but it can'd encode the current buffer, we can change > > select-safe-coding-system to try > > default-buffer-file-coding-system before trying the most > > preferred coding system. > > > > That way, I think we can satisfy your request completely. 
> > That seems like a cheap way to get what I want indeed. Actually I don't currently use prefer-coding-system (specifically because I didn't want to set all those other coding-systems), instead I use (when (boundp 'coding-category-utf-8) (set-coding-priority '(coding-category-utf-8))) so I guess the only change that I care about is the part that uses default-buffer-file-coding-system in preference to the most preferred coding system (although it does sound paradoxical ;-) The patch below would work for me; any comment/objection ? Stefan Index: mule-cmds.el =================================================================== RCS file: /cvsroot/emacs/emacs/lisp/international/mule-cmds.el,v retrieving revision 1.231 diff -u -u -b -r1.231 mule-cmds.el --- mule-cmds.el 16 May 2003 04:15:20 -0000 1.231 +++ mule-cmds.el 19 May 2003 13:45:16 -0000 @@ -1,5 +1,5 @@ ;;; mule-cmds.el --- commands for mulitilingual environment -;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN. +;; Copyright (C) 1995, 2003 Electrotechnical Laboratory, JAPAN. ;; Licensed to the Free Software Foundation. ;; Copyright (C) 2000, 2001, 2002, 2003 Free Software Foundation, Inc. @@ -631,7 +631,8 @@ between FROM and TO are shown in a popup window. Among them, the most proper one is suggested as the default. -The list of `buffer-file-coding-system' of the current buffer and the +The list of `buffer-file-coding-system' of the current buffer, +the `default-buffer-file-coding-system', and the most preferred coding system (if it corresponds to a MIME charset) is treated as the default coding system list. Among them, the first one that safely encodes the text is normally selected silently and @@ -648,8 +649,8 @@ list of coding systems to be prepended to the default coding system list. However, if DEFAULT-CODING-SYSTEM is a list and the first element is t, the cdr part is used as the defualt coding system list, -i.e. `buffer-file-coding-system' and the most prepended coding system -is not used. +i.e. 
`buffer-file-coding-system', `default-buffer-file-coding-system', +and the most preferred coding system are not used. Optional 4th arg ACCEPT-DEFAULT-P, if non-nil, is a function to determine the acceptability of the silently selected coding system. @@ -679,6 +680,9 @@ (mapcar (function (lambda (x) (cons x (coding-system-base x)))) default-coding-system)) + ;; From now on, the list of defaults is reversed. + (setq default-coding-system (nreverse default-coding-system)) + (unless no-other-defaults ;; If buffer-file-coding-system is not nil nor undecided, append it ;; to the defaults. @@ -686,24 +690,30 @@ (let ((base (coding-system-base buffer-file-coding-system))) (or (eq base 'undecided) (rassq base default-coding-system) - (setq default-coding-system - (append default-coding-system - (list (cons buffer-file-coding-system base))))))) + (push (cons buffer-file-coding-system base) + default-coding-system)))) + + ;; If default-buffer-file-coding-system is not nil nor undecided, + ;; append it to the defaults. + (if default-buffer-file-coding-system + (let ((base (coding-system-base default-buffer-file-coding-system))) + (or (eq base 'undecided) + (rassq base default-coding-system) + (push (cons default-buffer-file-coding-system base) + default-coding-system)))) ;; If the most preferred coding system has the property mime-charset, ;; append it to the defaults. 
(let ((tail coding-category-list) preferred base) - (while (and tail - (not (setq preferred (symbol-value (car tail))))) + (while (and tail (not (setq preferred (symbol-value (car tail))))) (setq tail (cdr tail))) (and (coding-system-p preferred) (setq base (coding-system-base preferred)) (coding-system-get preferred 'mime-charset) (not (rassq base default-coding-system)) - (setq default-coding-system - (append default-coding-system - (list (cons preferred base)))))))) + (push (cons preferred base) + default-coding-system))))) (if select-safe-coding-system-accept-default-p (setq accept-default-p select-safe-coding-system-accept-default-p)) @@ -724,7 +734,7 @@ (push (car elt) safe)) (push (car elt) unsafe))) (if safe - (setq coding-system (car (last safe))))) + (setq coding-system (car safe)))) ;; If all the defaults failed, ask a user. (when (not coding-system) ^ permalink raw reply [flat|nested] 55+ messages in thread
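The candidate order that the patch documents in the docstring can be modelled as follows. This Python sketch is a simplification, not a port of the Lisp: it compares codec names directly rather than base coding systems, ignores the MIME-charset check on the most preferred system, and uses Python codecs as a stand-in for Emacs's safe-encoding test. All function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence


def default_candidates(explicit: Sequence[str],
                       buffer_cs: Optional[str],
                       default_buffer_cs: Optional[str],
                       most_preferred: Optional[str],
                       no_other_defaults: bool = False) -> List[str]:
    """Build the ordered list of coding systems to try when saving.

    Explicit defaults come first, then the buffer's own coding system,
    then the global default, then the most preferred one.  Entries that
    are missing, 'undecided', or already present are skipped.
    """
    candidates = list(explicit)
    if not no_other_defaults:
        for cs in (buffer_cs, default_buffer_cs, most_preferred):
            if cs and cs != "undecided" and cs not in candidates:
                candidates.append(cs)
    return candidates


def select_safe(text: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate that can encode TEXT, or None,
    which stands for asking the user."""
    for cs in candidates:
        try:
            text.encode(cs)
        except UnicodeEncodeError:
            continue
        return cs
    return None


if __name__ == "__main__":
    # A buffer visited as iso-8859-5 that now contains Swedish letters,
    # with latin-1 as the global default and utf-8 most preferred.
    with_patch = default_candidates([], "iso-8859-5", "iso-8859-1", "utf-8")
    without_patch = default_candidates([], "iso-8859-5", None, "utf-8")
    print(select_safe("smörgåsbord", with_patch))
    print(select_safe("smörgåsbord", without_patch))
```

In this model, adding the global default to the list is what lets a latin-1 user keep utf-8 as the most preferred system for detection without having new text silently saved as utf-8.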
* Re: Cyrillic vs UTF-8 2003-04-25 16:12 Cyrillic vs UTF-8 Simon Josefsson 2003-04-25 16:40 ` Eli Zaretskii @ 2003-04-25 16:54 ` Simon Josefsson 2003-04-26 3:55 ` Implementing charset-aware X font names [was: Cyrillic vs UTF-8] Stephen J. Turnbull 2003-04-26 7:59 ` Cyrillic vs UTF-8 Kenichi Handa 1 sibling, 2 replies; 55+ messages in thread From: Simon Josefsson @ 2003-04-25 16:54 UTC (permalink / raw) Simon Josefsson <jas@extundo.com> writes: > I think there are two problems. Opening the file the first time > should guess it is a utf-8 file. Secondly, emacs should be able to > find a font that contains the characters -- I have all font packages > from Debian installed. The following works fine: > > -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1 It seems the second problem was documented in PROBLEMS (see below). Sorry. Still, I don't see similar behaviour with, e.g., Mozilla, so wouldn't it be possible to check which characters exist within the font, and perhaps change font as appropriate? It would be nice if there were some more information how to set the suggested fontset. Reading the manual I get the impression that 'emacs -fn mule-unicode-...' should work, but it doesn't. I also tried setting the emacs.font X resource, but same problem. Starting emacs says: No fonts match `mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1' I do have GNU unifont (from Debian unstable) installed. * Characters from the mule-unicode charsets aren't displayed under X. XFree86 4 contains many fonts in iso10646-1 encoding which have minimal character repertoires (whereas the encoding is meant to be a reasonable indication of the repertoire). Emacs may choose one of these to display characters from the mule-unicode charsets and then typically won't be able to find the glyphs to display many characters. (Check with C-u C-x = .) 
To avoid this, you may need to use a fontset which sets the font for the mule-unicode sets explicitly. E.g. to use GNU unifont, include in the fontset spec: mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,\ mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,\ mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1 ^ permalink raw reply [flat|nested] 55+ messages in thread
* Implementing charset-aware X font names [was: Cyrillic vs UTF-8] 2003-04-25 16:54 ` Simon Josefsson @ 2003-04-26 3:55 ` Stephen J. Turnbull 2003-04-28 11:09 ` Kenichi Handa 2003-04-26 7:59 ` Cyrillic vs UTF-8 Kenichi Handa 1 sibling, 1 reply; 55+ messages in thread From: Stephen J. Turnbull @ 2003-04-26 3:55 UTC (permalink / raw) >>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes: PROBLEMS> * Characters from the mule-unicode charsets aren't PROBLEMS> displayed under X. PROBLEMS> XFree86 4 contains many fonts in iso10646-1 encoding PROBLEMS> which have minimal character repertoires (whereas the PROBLEMS> encoding is meant to be a reasonable indication of the PROBLEMS> repertoire). *sigh* "iso10646" is not meant to be an indication of repertoire. See section 13 of the ISO 10646 standard. It's intended to fix the ISO 8859 ambiguity. There is a deficiency in XFree86, but it's not that the fonts are incomplete (note the word "implicit" in the XLFD standard, that refers to current national encoding practice at definition time, not to UCSes); that's gonna happen. Why should a Russian font designer provide Thai glyphs? And what Thai in her right mind would prefer those over native-designed fonts (without looking at them)? Instead, the font names and properties should provide encoding range specifications instead of the useless "1" (which in ISO 10646-1 is not an encoding specification, really). As a first take, I think a reasonable way to do this would be to specify that for the iso10646 registry the encoding field of an XLFD name should contain a comma-separated list of Unicode block names, or a comma-separated list of hex ranges xxxx..yyyy (can't use hyphens for the ranges, obviously). As long as the XLFD is otherwise fully-qualified (ie, contains 14 hyphens), the block name format allows you to query with "-*-*-*-*-*-*-*-*-*-*-*-*-iso10646-*CYRILLIC*" and guarantee sane results. Mostly "*-iso10646-*CYRILLIC*" should work OK, too. 
With the hex range format, the app has to work harder, querying with "-*-*-*-*-*-*-*-*-*-*-*-*-iso10646-*" and checking for the ranges it needs. IIRC, since the actual font loaded is known to the server, you could even have multiple such aliases, one for each block, and with languages using multiple blocks (basically, all of them, since everybody uses ASCII), you'd just want to be careful to query for the "rare" blocks first. This would also allow Emacs and other smart apps to create virtual fonts (ie, in faces) by requesting Ryumin Light for the Han and Kana blocks and Times-Roman for the Basic Latin and Latin-1 Supplement blocks, as an alternative to X Font Sets. (This would be nearly trivial to implement in XEmacs since we use specifiers to implement faces, and specifiers already do magic to connect charsets to font registries. I suppose it would be more work in GNU Emacs, but I haven't looked at Emacs's font set code.) Does this look like something reasonable for Emacs (and XEmacs) to implement on the client side? If so, I'll play with it a bit (note that implementing this server-side is simply a matter of editing fonts.aliases) and then put it in play with the X11 and XFree86 people. -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software. ^ permalink raw reply [flat|nested] 55+ messages in thread
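To make the hex-range variant of the proposal concrete, here is a minimal Python sketch of the client-side check. The font name in the example is invented for illustration, since no existing font uses this encoding-field syntax; `encoding_field`, `parse_ranges` and `covers` are hypothetical helpers.

```python
from typing import List, Tuple


def encoding_field(xlfd: str) -> str:
    """Return the last XLFD field.  Ranges are written with '..' so the
    field itself contains no hyphen and a split on the last '-' is safe."""
    return xlfd.rsplit("-", 1)[1]


def parse_ranges(field: str) -> List[Tuple[int, int]]:
    """Parse 'xxxx..yyyy,xxxx..yyyy' into inclusive (low, high) pairs."""
    ranges = []
    for part in field.split(","):
        low, high = part.split("..")
        ranges.append((int(low, 16), int(high, 16)))
    return ranges


def covers(field: str, text: str) -> bool:
    """True if every character of TEXT lies in one of the ranges."""
    ranges = parse_ranges(field)
    return all(any(low <= ord(ch) <= high for low, high in ranges)
               for ch in text)


# Hypothetical fully qualified name (14 hyphens) advertising
# Basic Latin and the Cyrillic block.
EXAMPLE_XLFD = ("-misc-fixed-medium-r-normal--18-120-100-100-c-90-"
                "iso10646-0020..007e,0400..04ff")

if __name__ == "__main__":
    field = encoding_field(EXAMPLE_XLFD)
    print(parse_ranges(field))
    print(covers(field, "Привет, мир"))
    print(covers("0020..007e", "Привет"))
```

With names like this, an application could list fonts matching the iso10646 registry and pick one whose advertised ranges cover the text, without opening each font to inspect its glyphs.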
* Re: Implementing charset-aware X font names [was: Cyrillic vs UTF-8] 2003-04-26 3:55 ` Implementing charset-aware X font names [was: Cyrillic vs UTF-8] Stephen J. Turnbull @ 2003-04-28 11:09 ` Kenichi Handa 2003-04-28 12:27 ` Implementing charset-aware X font names Stephen J. Turnbull 0 siblings, 1 reply; 55+ messages in thread From: Kenichi Handa @ 2003-04-28 11:09 UTC (permalink / raw) Cc: emacs-devel In article <87ist17vzu.fsf_-_@tleepslib.sk.tsukuba.ac.jp>, "Stephen J. Turnbull" <stephen@xemacs.org> writes: > Instead, the font names and properties should provide encoding range > specifications instead of the useless "1" (which in ISO 10646-1 is not > an encoding specification, really). As a first take, I think a > reasonable way to do this would be to specify that for the iso10646 > registry the encoding field of an XLFD name should contain a > comma-separated list of Unicode block names, or a comma-separated list > of hex ranges xxxx..yyyy (can't use hyphens for the ranges, > obviously). I fully agree with that idea. [...] > This would also allow Emacs and other smart apps to create virtual > fonts (ie, in faces) by requesting Ryumin Light for the Han and Kana > blocks and Times-Roman for the Basic Latin and Latin-1 Supplement > blocks, as an alternative to X Font Sets. (This would be nearly > trivial to implement in XEmacs since we use specifiers to implement > faces, and specifiers already do magic to connect charsets to font > registries. I suppose it would be more work in GNU Emacs, but I > haven't looked at Emacs's font set code.) We connect charsets to font registries via fontset. And in the emacs-unicode version, we have enhanced it so that we can connect scripts, charsets, range of characters to multiple font specs. In addition, in emacs-unicode, we separate the concept of font encoding and font repertory, and for *-iso10646-1 fonts, we check the font contents to get the true repertory as a char table.
--- Ken'ichi HANDA handa@m17n.org
* Re: Implementing charset-aware X font names 2003-04-28 11:09 ` Kenichi Handa @ 2003-04-28 12:27 ` Stephen J. Turnbull 2003-05-01 11:13 ` Kenichi Handa 0 siblings, 1 reply; 55+ messages in thread From: Stephen J. Turnbull @ 2003-04-28 12:27 UTC (permalink / raw) Cc: emacs-devel >>>>> "Kenichi" == Kenichi Handa <handa@m17n.org> writes: Kenichi> We connect charsets to font registries via fontset. And Kenichi> in the emacs-unicode version, we have enhanced it so that Kenichi> we can connect scripts, charsets, range of characters to Kenichi> multiple font specs. Is this documented outside of source code? (Not necessarily as a formal spec, discussions on emacs-devel would help too. Also, I can read Japanese, so mule-ja would be useful if there were discussions there.) How does it compare to the specifier interface used by XEmacs? XEmacs specifiers allow a face to automatically select the correct font by X11 font registry, but there must also be a similar mechanism for Windows, so this must be somewhat more general than "font registry". Also, specifiers implement both inheritance and repeated queries (eg, you can have both "arial" and "helvetica" fonts for ascii/latin-1, and they will be tried in a specified order, usually "last added first", until the face can be displayed). Kenichi> In addition, in emacs-unicode, we separate the concept of Kenichi> font encoding and font repertory, and for *-iso10646-1 Kenichi> fonts, we check the font contents to get the true Kenichi> repertory as a char table. Ah, I'll have to ask Ben if he's handled that in the devel branch. That sounds like a very good interface, and if possible I'd like to use it in XEmacs too. I assume it is exported so Lisp programs can find out the repertoire? -- Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Implementing charset-aware X font names 2003-04-28 12:27 ` Implementing charset-aware X font names Stephen J. Turnbull @ 2003-05-01 11:13 ` Kenichi Handa 2003-05-01 14:14 ` Alex Schroeder 0 siblings, 1 reply; 55+ messages in thread From: Kenichi Handa @ 2003-05-01 11:13 UTC (permalink / raw) Cc: emacs-devel

In article <87fzo24xj4.fsf@tleepslib.sk.tsukuba.ac.jp>, "Stephen J. Turnbull" <stephen@xemacs.org> writes:

>>>>>> "Kenichi" == Kenichi Handa <handa@m17n.org> writes:

Kenichi> We connect charsets to font registries via fontset. And in the emacs-unicode version, we have enhanced it so that we can connect scripts, charsets, range of characters to multiple font specs.

> Is this documented outside of source code?

It is documented as the docstring of set-fontset-font (as attached at the tail). Internally, a fontset is implemented by a char-table of a special format.

> How does it compare to the specifier interface used by XEmacs? XEmacs specifiers allow a face to automatically select the correct font by X11 font registry, but there must also be a similar mechanism for Windows, so this must be somewhat more general than "font registry". Also, specifiers implement both inheritance and repeated queries (eg, you can have both "arial" and "helvetica" fonts for ascii/latin-1, and they will be tried in a specified order, usually "last added first", until the face can be displayed).

I don't know about "the specifiers interface of XEmacs".

In Emacs, a face can have an attribute `fontset'. In that case, for displaying a non-ASCII character CHAR by that face, the fontset is looked up. If a face doesn't have a `fontset' attribute, the default fontset is looked up.

If multiple font specs are found for CHAR, one font spec is selected as below:

(1) For each font spec, find the encoding charset (a charset that maps a character code to a glyph code).

(2) Sort the font specs by using that encoding charset as a key according to the charset priority of the current language environment. So, for instance, in Japanese lang. env., most Han characters are displayed by a Japanese font.

(3) Select the first font spec whose repertory contains CHAR.

The font spec is merged with font-related attributes of the face, then the best matching font is selected.

Kenichi> In addition, in emacs-unicode, we separate the concept of font encoding and font repertory, and for *-iso10646-1 fonts, we check the font contents to get the true repertory as a char table.

> Ah, I'll have to ask Ben if he's handled that in the devel branch. That sounds like a very good interface, and if possible I'd like to use it in XEmacs too. I assume it is exported so Lisp programs can find out the repertoire?

Currently no. As the repertoire is checked automatically in a fontset handler, for the moment, I see no necessity in exporting that to Lisp.

---
Ken'ichi HANDA
handa@m17n.org

----------------------------------------------------------------------
set-fontset-font is a built-in function.
(set-fontset-font NAME CHARACTER FONT-SPEC &optional FRAME ADD)

Modify fontset NAME to use FONT-SPEC for CHARACTER.

CHARACTER may be a cons; (FROM . TO), where FROM and TO are characters. In that case, use FONT-SPEC for all characters in the range FROM and TO (inclusive).

CHARACTER may be a script name symbol. In that case, use FONT-SPEC for all characters that belong to the script.

CHARACTER may be a charset which has a :code-offset attribute and the attribute value is greater than the maximum Unicode character (#x10FFFF). In that case, use FONT-SPEC for all characters in the charset.

FONT-SPEC may be:
 * A vector [ FAMILY WEIGHT SLANT WIDTH ADSTYLE REGISTRY ]. See the documentation of `set-face-attribute' for the detail of these vector elements;
 * A cons (FAMILY . REGISTRY), where FAMILY is a font family name and REGISTRY is a font registry name;
 * A font name string.

Optional 4th argument FRAME, if non-nil, is a frame. This argument is kept for backward compatibility and has no meaning.

Optional 5th argument ADD, if non-nil, specifies how to add FONT-SPEC to the font specifications for RANGE previously set. If it is `prepend', FONT-SPEC is prepended. If it is `append', FONT-SPEC is appended. By default, FONT-SPEC overrides the previous settings.

^ permalink raw reply [flat|nested] 55+ messages in thread
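Steps (1) to (3) in the message above can be modelled in a few lines of Emacs Lisp. This is only a sketch of the ordering logic, not the code Emacs actually runs: the plist keys `:name', `:encoding' and `:repertory', the two function names and the example fonts are all hypothetical, and the final merge with the face attributes is not modelled.

```elisp
;; Hypothetical model: a font spec is a plist with :name, :encoding (a
;; charset symbol) and :repertory (a predicate telling whether the font
;; has a glyph for a character).  None of these names are Emacs APIs.

(defun my-charset-rank (charset priority)
  "Return the position of CHARSET in PRIORITY, or the list length if absent."
  (let ((pos 0) (tail priority))
    (while (and tail (not (eq (car tail) charset)))
      (setq pos (1+ pos) tail (cdr tail)))
    pos))

(defun my-select-font-spec (char specs priority)
  "Pick a spec from SPECS for CHAR, following steps (1) to (3).
PRIORITY is a list of charset symbols, highest priority first."
  (let ((sorted (sort (copy-sequence specs)
                      (lambda (a b)
                        ;; Steps (1) and (2): order by encoding charset priority.
                        (< (my-charset-rank (plist-get a :encoding) priority)
                           (my-charset-rank (plist-get b :encoding) priority)))))
        (found nil))
    ;; Step (3): take the first spec whose repertory contains CHAR.
    (while (and sorted (not found))
      (when (funcall (plist-get (car sorted) :repertory) char)
        (setq found (car sorted)))
      (setq sorted (cdr sorted)))
    found))

;; Two made-up fonts that both cover the Han character U+6F22.
(defvar my-example-specs
  (list (list :name "chinese-font" :encoding 'chinese-gb2312
              :repertory (lambda (c) (>= c #x4E00)))
        (list :name "japanese-font" :encoding 'japanese-jisx0208
              :repertory (lambda (c) (>= c #x4E00)))))

;; With a Japanese-first priority list the Japanese font is chosen.
(plist-get (my-select-font-spec #x6F22 my-example-specs
                                '(japanese-jisx0208 chinese-gb2312))
           :name)
```

Reversing the priority list makes the same call return the Chinese font, which is the behaviour described for Han characters under different language environments.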
* Re: Implementing charset-aware X font names 2003-05-01 11:13 ` Kenichi Handa @ 2003-05-01 14:14 ` Alex Schroeder 2003-05-01 23:16 ` Kenichi Handa 0 siblings, 1 reply; 55+ messages in thread From: Alex Schroeder @ 2003-05-01 14:14 UTC (permalink / raw) Cc: emacs-devel Kenichi Handa <handa@m17n.org> writes: >> Is this documented outside of source code? > > It is documented as the docstring of set-fontset-font (as > attached at the tail). Internally, a fontset is implemented > by a char-table of a special format. I would like to collect some of the stuff from your recent mails into little articles on the Emacs Wiki (its content is licensed under the FDL). Is that ok with you? Alex. -- http://www.emacswiki.org/cgi-bin/alex.pl ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Implementing charset-aware X font names 2003-05-01 14:14 ` Alex Schroeder @ 2003-05-01 23:16 ` Kenichi Handa 0 siblings, 0 replies; 55+ messages in thread From: Kenichi Handa @ 2003-05-01 23:16 UTC (permalink / raw) Cc: emacs-devel In article <87k7dag3eh.fsf@gnu.org>, Alex Schroeder <alex@gnu.org> writes: > Kenichi Handa <handa@m17n.org> writes: >>> Is this documented outside of source code? >> >> It is documented as the docstring of set-fontset-font (as >> attached at the tail). Internally, a fontset is implemented >> by a char-table of a special format. > I would like to collect some of the stuff from your recent mails into > little articles on the Emacs Wiki (its content is licensed under the > FDL). Is that ok with you? Of course, no problem. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-25 16:54 ` Simon Josefsson 2003-04-26 3:55 ` Implementing charset-aware X font names [was: Cyrillic vs UTF-8] Stephen J. Turnbull @ 2003-04-26 7:59 ` Kenichi Handa 2003-04-26 12:14 ` Simon Josefsson 1 sibling, 1 reply; 55+ messages in thread From: Kenichi Handa @ 2003-04-26 7:59 UTC (permalink / raw) Cc: emacs-devel In article <iluznme1ps2.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes: > It would be nice if there were some more information how to set the > suggested fontset. Reading the manual I get the impression that > 'emacs -fn mule-unicode-...' should work, but it doesn't. From which part of the manual did you get that impression? > I also tried setting the emacs.font X resource, but same > problem. Starting emacs says: > No fonts match `mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1' That doesn't work. Please follow what is described in the "Defining Fontsets" node of the Emacs info manual. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-26 7:59 ` Cyrillic vs UTF-8 Kenichi Handa @ 2003-04-26 12:14 ` Simon Josefsson 2003-05-01 7:20 ` Kenichi Handa 0 siblings, 1 reply; 55+ messages in thread From: Simon Josefsson @ 2003-04-26 12:14 UTC (permalink / raw) Cc: emacs-devel Kenichi Handa <handa@m17n.org> writes: > In article <iluznme1ps2.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes: >> It would be nice if there were some more information how to set the >> suggested fontset. Reading the manual I get the impression that >> 'emacs -fn mule-unicode-...' should work, but it doesn't. > > From which part of manual, did you get that impression? "Fontsets" together with "Font X". But I now realize I didn't read it carefully. >> I also tried setting the emacs.font X resource, but same >> problem. Starting emacs says: > >> No fonts match `mule-unicode-2500-33ff:-gnu-unifont-*-iso10646-1,mule-unicode-e000-ffff:-gnu-unifont-*-iso10646-1,mule-unicode-0100-24ff:-gnu-unifont-*-iso10646-1' > > It doesn't work. Please follow what is described in the > "Defining Fontsets" node of Emacs info. It seems to work, thanks. Wouldn't it be useful to at least be able to customize the fontset? Requiring use of X resources to get Unicode to show up correctly is not user friendly. I guess these problems go away when Emacs stops choosing fonts with empty characters, so perhaps users simply will have to wait for 22. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-04-26 12:14 ` Simon Josefsson @ 2003-05-01 7:20 ` Kenichi Handa 2003-05-01 14:06 ` Alex Schroeder 2003-05-01 18:03 ` Customizing fontsets (was: Cyrillic vs UTF-8) Oliver Scholz 0 siblings, 2 replies; 55+ messages in thread From: Kenichi Handa @ 2003-05-01 7:20 UTC (permalink / raw) Cc: emacs-devel In article <iluptn9xxp0.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes: > It seems to work, thanks. Wouldn't it be useful to at least be able > to customize the fontset? Requiring use of X resources to get Unicode > to show up correctly is not user friendly. Unfortunately, a fontset is not a variable, thus can't be customized easily. Another way to modify a fontset is to do something like this in .emacs. (set-fontset-font "fontset-default" 'mule-unicode-0100-24ff '("gnu-unifont" . "iso10646-1")) ... --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Cyrillic vs UTF-8 2003-05-01 7:20 ` Kenichi Handa @ 2003-05-01 14:06 ` Alex Schroeder 2003-05-01 18:03 ` Customizing fontsets (was: Cyrillic vs UTF-8) Oliver Scholz 1 sibling, 0 replies; 55+ messages in thread From: Alex Schroeder @ 2003-05-01 14:06 UTC (permalink / raw) Cc: jas Kenichi Handa <handa@m17n.org> writes: > In article <iluptn9xxp0.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes: >> It seems to work, thanks. Wouldn't it be useful to at least be able >> to customize the fontset? Requiring use of X resources to get Unicode >> to show up correctly is not user friendly. > > Unfortunately, a fontset is not a variable, thus can't be > customized easily. Another way to modify a fontset is to do > something like this in .emacs. > > (set-fontset-font "fontset-default" > 'mule-unicode-0100-24ff > '("gnu-unifont" . "iso10646-1")) > ... Well, we could create a "dummy" option with an interesting :set property which will then call create-fontset-from-fontset-spec and friends using its own value (which was customized by the user). Alex. -- http://www.emacswiki.org/cgi-bin/alex.pl ^ permalink raw reply [flat|nested] 55+ messages in thread
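The "dummy" option idea can be sketched with an ordinary `defcustom' whose :set function pushes the customized value into the fontset. This is a guess at the shape of such an option, not code from the thread: the option name `my-fontset-default-overrides' is hypothetical, it calls `set-fontset-font' directly rather than `create-fontset-from-fontset-spec', and it only touches the fontset when a window system is active.

```elisp
;; Hypothetical user option; the name and the value format are assumptions.
(defcustom my-fontset-default-overrides nil
  "Alist of (TARGET . (FAMILY . REGISTRY)) applied to \"fontset-default\".
TARGET is a charset or script symbol accepted by `set-fontset-font'."
  :type '(repeat (cons (symbol :tag "Charset or script")
                       (cons :tag "Font spec"
                             (string :tag "Family")
                             (string :tag "Registry"))))
  :group 'mule
  :set (lambda (symbol value)
         ;; Always store the value so Customize can show and save it.
         (set-default symbol value)
         ;; Fontsets only exist on graphical displays.
         (when window-system
           (dolist (entry value)
             (set-fontset-font "fontset-default" (car entry) (cdr entry))))))
```

With this in place, `M-x customize-option RET my-fontset-default-overrides RET' would present an editable list, and each saved entry is reapplied at startup through the :set function. Entries removed from the list are not undone in the fontset, which is one reason the thread goes on to discuss storing the creation history in the fontset itself.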
* Customizing fontsets (was: Cyrillic vs UTF-8) 2003-05-01 7:20 ` Kenichi Handa 2003-05-01 14:06 ` Alex Schroeder @ 2003-05-01 18:03 ` Oliver Scholz 2003-05-02 5:17 ` Customizing fontsets Alex Schroeder 1 sibling, 1 reply; 55+ messages in thread From: Oliver Scholz @ 2003-05-01 18:03 UTC (permalink / raw) Kenichi Handa <handa@m17n.org> writes: > In article <iluptn9xxp0.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes: >> It seems to work, thanks. Wouldn't it be useful to at least be able >> to customize the fontset? Requiring use of X resources to get Unicode >> to show up correctly is not user friendly. > > Unfortunately, a fontset is not a variable, thus can't be > customized easily. [...] But wouldn't it be an option to add a `custom-set-fontsets' besides `custom-set-faces' and `custom-set-variables'? It would make sense IMO to treat the short alias names that way. I.e. `M-x customize-fontset RET fontset-default RET' could simply work and it could be consistent with the rest of the customization interface from the user's point of view. In fact I started to work on it. (That's the deeper reason for my patch to `set-fontset-font'.) I am mostly in the state of reading the code for fontsets and for customize respectively, though. Currently I wonder what the default values should be that a "fontset-widget" should present to the user. The return-value of `fontset-info' is simply too large, I think. It seems weird that a user should specify a fontset covering two or three charsets in her .emacs or in .Xresources -- and is then confronted with a list of dozens of charsets when she wants to customize it later. Oliver -- 12 Floréal an 211 de la Révolution Liberté, Egalité, Fraternité! ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Customizing fontsets 2003-05-01 18:03 ` Customizing fontsets (was: Cyrillic vs UTF-8) Oliver Scholz @ 2003-05-02 5:17 ` Alex Schroeder 2003-05-02 6:32 ` Kenichi Handa 2003-05-03 0:33 ` Oliver Scholz 0 siblings, 2 replies; 55+ messages in thread From: Alex Schroeder @ 2003-05-02 5:17 UTC (permalink / raw) Cc: emacs-devel Oliver Scholz <alkibiades@gmx.de> writes: > But wouldn't it be an option to add a `custom-set-fontsets' besides > `custom-set-faces' and `custom-set-variables'? > In fact I started to work on it. I am happy to see somebody work on it! > Currently I wonder what the default values should be that a > "fontset-widget" should present to the user. The return-value of > `fontset-info' is simply to large, I think. It seems weird that a user > should specify a fontset covering two or three charsets in her .emacs > or in .Xresources -- and is confronted with a list of dozens of > charsets then, when she wants to customize it later. I don't understand. When you run M-x customize-fontset RET fontset-default RET, you expect to see a widget that explains the value of "fontset-default", and offer a way to change it. Thus, all the info returned by (fontset-info "fontset-default") must be visible and editable at some point. Unless you are proposing some sort of fontset-inheritance mechanism? Alex. -- http://www.emacswiki.org/cgi-bin/alex.pl ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Customizing fontsets 2003-05-02 5:17 ` Customizing fontsets Alex Schroeder @ 2003-05-02 6:32 ` Kenichi Handa 2003-05-02 13:25 ` Stefan Monnier 2003-05-03 0:40 ` Oliver Scholz 2003-05-03 0:33 ` Oliver Scholz 1 sibling, 2 replies; 55+ messages in thread From: Kenichi Handa @ 2003-05-02 6:32 UTC (permalink / raw) Cc: alkibiades

In article <87llxqorkr.fsf@gnu.org>, Alex Schroeder <alex@gnu.org> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
>> But wouldn't it be an option to add a `custom-set-fontsets' besides `custom-set-faces' and `custom-set-variables'?
>> In fact I started to work on it.

> I am happy to see somebody work on it!

Me too!!!

>> Currently I wonder what the default values should be that a "fontset-widget" should present to the user. The return-value of `fontset-info' is simply too large, I think. It seems weird that a user should specify a fontset covering two or three charsets in her .emacs or in .Xresources -- and is confronted with a list of dozens of charsets then, when she wants to customize it later.

> I don't understand. When you run M-x customize-fontset RET fontset-default RET, you expect to see a widget that explains the value of "fontset-default", and offer a way to change it. Thus, all the info returned by (fontset-info "fontset-default") must be visible and editable at some point.

But, it is true that the value of fontset-info is very hard to customize. A fontset is created by new-fontset, and is modified by a sequence of set-fontset-font calls. In the resulting fontset, the specified data are scattered around in the char-table of the fontset.

I think the following idea will solve this problem.

The argument FONTLIST of new-fontset is a list of this form:

    ((TARGET . FONT-SPEC) ...)

TARGET is a character, a cons (FROM-CHAR . TO-CHAR), or a charset. FONT-SPEC is (FAMILY . REGISTRY) or FONT-NAME.

The function set-fontset-font also takes the arguments TARGET and FONT-SPEC.

In other words, a fontset can be re-created from the sequence of (TARGET . FONT-SPEC). So how about recording that sequence as a list in each fontset (the order is important)? I think it's far more user-friendly to customize that list than to customize char-table elements. In addition, we can use the normal customization facility for a list (INS, DEL) at the top level.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply [flat|nested] 55+ messages in thread
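The proposal to record the (TARGET . FONT-SPEC) sequence can be approximated in Lisp without touching the fontset's char-table. The sketch below keeps the history in a separate alist; every name starting with `my-' is hypothetical, and the proposal in the thread is to store the list inside the fontset itself rather than in a global variable.

```elisp
(defvar my-fontset-history nil
  "Hypothetical alist of (FONTSET-NAME . ENTRIES).
ENTRIES is a list of (TARGET . FONT-SPEC) pairs, oldest first.")

(defun my-fontset-record (name target font-spec)
  "Append (TARGET . FONT-SPEC) to the history kept for fontset NAME.
Return the updated history for NAME."
  (let ((cell (assoc name my-fontset-history)))
    (unless cell
      (setq cell (list name))
      (setq my-fontset-history (cons cell my-fontset-history)))
    ;; Append rather than push: the order of the entries matters.
    (setcdr cell (append (cdr cell) (list (cons target font-spec))))
    (cdr cell)))

(defun my-set-fontset-font (name target font-spec)
  "Record the change, then apply it with `set-fontset-font'.
The fontset is only modified when a window system is active."
  (my-fontset-record name target font-spec)
  (when window-system
    (set-fontset-font name target font-spec)))

(defun my-fontset-creation-history (name)
  "Return the recorded (TARGET . FONT-SPEC) list for fontset NAME."
  (cdr (assoc name my-fontset-history)))
```

A customization widget could then edit the short list returned by `my-fontset-creation-history' instead of the full output of `fontset-info'.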
* Re: Customizing fontsets 2003-05-02 6:32 ` Kenichi Handa @ 2003-05-02 13:25 ` Stefan Monnier 2003-05-03 0:40 ` Oliver Scholz 1 sibling, 0 replies; 55+ messages in thread From: Stefan Monnier @ 2003-05-02 13:25 UTC (permalink / raw) Cc: alkibiades > In other words, a fontset can be re-created by the sequence > of (TARGET . FONT-SPEC). So how about recording that > sequence as a list in each fontset (the order is important). > I think it's far user-friendly to customize that list than > to customize char-table elements. In addition, we can use > the normal customization facility for a list (INS, DEL) at > the top level. And then a `deffontset' can be a simple macro that expands to a `defcustom' with appropriate :get and :set. Stefan ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Customizing fontsets 2003-05-02 6:32 ` Kenichi Handa 2003-05-02 13:25 ` Stefan Monnier @ 2003-05-03 0:40 ` Oliver Scholz 2003-05-03 1:50 ` Kenichi Handa 1 sibling, 1 reply; 55+ messages in thread From: Oliver Scholz @ 2003-05-03 0:40 UTC (permalink / raw) Cc: emacs-devel Kenichi Handa <handa@m17n.org> writes: [...] > In other words, a fontset can be re-created by the sequence > of (TARGET . FONT-SPEC). So how about recording that > sequence as a list in each fontset (the order is important). > I think it's far user-friendly to customize that list than > to customize char-table elements. In addition, we can use > the normal customization facility for a list (INS, DEL) at > the top level. [...] I agree. So that would be done by creating an additional extra slot in the fontset, right? Oliver -- Oliver Scholz 14 Floréal an 211 de la Révolution Taunusstr. 25 Liberté, Egalité, Fraternité! 60329 Frankfurt a. M. http://www.jungdemokratenhessen.de Tel. (069) 97 40 99 42 http://www.jdjl.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Customizing fontsets 2003-05-03 0:40 ` Oliver Scholz @ 2003-05-03 1:50 ` Kenichi Handa 2003-05-03 12:08 ` Oliver Scholz 0 siblings, 1 reply; 55+ messages in thread From: Kenichi Handa @ 2003-05-03 1:50 UTC (permalink / raw) Cc: emacs-devel In article <87d6j1x79x.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <alkibiades@gmx.de> writes: > Kenichi Handa <handa@m17n.org> writes: > [...] >> In other words, a fontset can be re-created by the sequence >> of (TARGET . FONT-SPEC). So how about recording that >> sequence as a list in each fontset (the order is important). >> I think it's far user-friendly to customize that list than >> to customize char-table elements. In addition, we can use >> the normal customization facility for a list (INS, DEL) at >> the top level. > [...] > I agree. So that would be done by creating an additional extra slot > in the fontset, right? Yes. And we need to extract that information from a fontset. How about adding the 2nd optional arg CREATION-HISTORY (?) to fontset-info, and if it is non-nil, return that list instead. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Customizing fontsets 2003-05-03 1:50 ` Kenichi Handa @ 2003-05-03 12:08 ` Oliver Scholz 2003-05-07 1:22 ` Kenichi Handa 0 siblings, 1 reply; 55+ messages in thread From: Oliver Scholz @ 2003-05-03 12:08 UTC (permalink / raw) Cc: emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> In article <87d6j1x79x.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <alkibiades@gmx.de> writes:
>> Kenichi Handa <handa@m17n.org> writes:
>> [...]
>>> In other words, a fontset can be re-created by the sequence of (TARGET . FONT-SPEC). So how about recording that sequence as a list in each fontset (the order is important). I think it's far user-friendly to customize that list than to customize char-table elements. In addition, we can use the normal customization facility for a list (INS, DEL) at the top level.
>> [...]
>
>> I agree. So that would be done by creating an additional extra slot in the fontset, right?
>
> Yes. And we need to extract that information from a fontset. How about adding the 2nd optional arg CREATION-HISTORY (?) to fontset-info, and if it is non-nil, return that list instead.

Why not a separate function?

    #define FONTSET_SPEC(fontset) XCHAR_TABLE (fontset)->extras[3]

    DEFUN ("fontset-spec", Ffontset_spec, Sfontset_spec, 1, 1, 0,
           doc: /* FIXME */)
         (name)
         Lisp_Object name;
    {
      Lisp_Object fontset;

      (*check_window_system_func) ();

      fontset = check_fontset_name (name);
      return FONTSET_SPEC (fontset);
    }

... etc.

    Oliver
-- 
Oliver Scholz               14 Floréal an 211 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Customizing fontsets 2003-05-03 12:08 ` Oliver Scholz @ 2003-05-07 1:22 ` Kenichi Handa 0 siblings, 0 replies; 55+ messages in thread From: Kenichi Handa @ 2003-05-07 1:22 UTC (permalink / raw) Cc: emacs-devel In article <87of2kdyg6.fsf@ID-87814.user.dfncis.de>, Oliver Scholz <alkibiades@gmx.de> writes: >> Yes. And we need to extract that information from a >> fontset. How about adding the 2nd optional arg >> CREATION-HISTORY (?) to fontset-info, and if it is non-nil, >> return that list instead. > Why not a separate function? One reason is that the returned value is very similar to that of fontset-info. Another is I just can't think of a good name for a new function. > #define FONTSET_SPEC(fontset) XCHAR_TABLE (fontset)->extras[3] > DEFUN ("fontset-spec", Ffontset_spec, Sfontset_spec, 1, 1, 0, > doc: /* FIXME */) > (name) > Lisp_Object name; It seems that the name fontset-spec is too vague especially because we already have fontset-info. --- Ken'ichi HANDA handa@m17n.org ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Customizing fontsets 2003-05-02 5:17 ` Customizing fontsets Alex Schroeder 2003-05-02 6:32 ` Kenichi Handa @ 2003-05-03 0:33 ` Oliver Scholz 1 sibling, 0 replies; 55+ messages in thread From: Oliver Scholz @ 2003-05-03 0:33 UTC (permalink / raw) Cc: emacs-devel

Alex Schroeder <alex@gnu.org> writes:

> Oliver Scholz <alkibiades@gmx.de> writes:
[...]
>> Currently I wonder what the default values should be that a "fontset-widget" should present to the user. The return-value of `fontset-info' is simply too large, I think. It seems weird that a user should specify a fontset covering two or three charsets in her .emacs or in .Xresources -- and is confronted with a list of dozens of charsets then, when she wants to customize it later.
>
> I don't understand. When you run M-x customize-fontset RET fontset-default RET, you expect to see a widget that explains the value of "fontset-default", and offer a way to change it. Thus, all the info returned by (fontset-info "fontset-default") must be visible and editable at some point.
[...]

To give an example: I have the following in my .emacs (simplified):

    (create-fontset-from-fontset-spec
     "\
    -b&h-lucidatypewriter-medium-r-*-*-*-100-*-*-*-*-fontset-egoge,\
    mule-unicode-0100-24ff:-*-fixed-medium-r-*-*-*-120-*-*-*-*-iso10646-1")

Now, if I did `M-x customize-fontset RET fontset-egoge RET', I'd expect to see something like this:

    Family: [b&h-lucidatypewriter   ]

    [INS] [DEL] Charset:  [mule-unicode-0100-24ff ]
                Family:   [fixed-medium           ]
                Registry: [iso10646-1             ]
    [INS]

But if we create this widget based on `fontset-info', I'd see a list of dozens of charsets and character ranges. Have a look at it with `M-x describe-fontset RET fontset-egoge RET'. I'd say that this would be surprising, if not confusing, for users who are not familiar with Emacs' concepts of charsets and fontsets.

    Oliver
-- 
Oliver Scholz               14 Floréal an 211 de la Révolution
Taunusstr. 25               Liberté, Egalité, Fraternité!
60329 Frankfurt a. M.       http://www.jungdemokratenhessen.de
Tel. (069) 97 40 99 42      http://www.jdjl.org

^ permalink raw reply [flat|nested] 55+ messages in thread