* Several serious problems
From: Richard Stallman @ 2002-07-22 17:11 UTC
Cc: emacs-devel

I cannot save the file lisp/ChangeLog. It specifies coding system
iso-2022-7bit, but it contains something that cannot be encoded in that
coding system. I don't know any way to find the text that causes the
problem; essentially I am helpless.

Handa-san, would you please clean up whatever is wrong with that file
so that it can save properly once again?

We MUST do something to make it easier for users to cope with such a
situation. We talked about this a few weeks ago but nothing was done.

Perhaps we could add a command which simply scans forward for the next
run of characters that can't be saved in the specified coding system.
The message you get in that situation could tell you about this
command. This would be a powerful solution, since you could easily
find all the problems, not just the first one. Highlighting all of
them would also be a useful thing to do.

This problem prevented me from committing changes to the file from
Emacs. I was able to edit and save the file using
find-file-literally, but when I tried to commit the changes, C-x v v
tried to revisit the file non-literally. I think that is a serious
bug in VC. VC should cope with visiting a file literally.
Andre, would you please fix that?

So I tried typing `cd lisp; cvs commit ChangeLog'. It put me into vi
to ask me to edit a log message. Damn! I killed it, set EDITOR and
VISUAL to `emacs', and tried again. This time it gave me Emacs to
edit with. I deleted all the text, saved the log message file, and
exited Emacs. cvs obnoxiously complained about the empty log message
and asked me what to do. I typed `c RET' meaning "continue". At that
point it never came back to me.

Now the emacs/lisp directory is locked and nobody can do anything in
it any more.

Savannah people, would you please delete the lock?
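The scanning command proposed above can be approximated outside Emacs. What follows is a minimal Python sketch, not Emacs code. It assumes Python codec names as stand-ins for Emacs coding systems (Python has no codec called iso-2022-7bit, so latin-1 is used in the example), and the names can_encode and unencodable_runs are made up for illustration. It reports every run of characters the chosen encoding cannot represent, not just the first one.

```python
def can_encode(ch, encoding):
    """Return True if the single character CH is representable in ENCODING."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def unencodable_runs(text, encoding):
    """Return a list of (start, end) index pairs, one per maximal run of
    characters in TEXT that ENCODING cannot represent (END is exclusive)."""
    runs = []
    start = None
    for i, ch in enumerate(text):
        if can_encode(ch, encoding):
            if start is not None:
                # A run of problem characters just ended.
                runs.append((start, i))
                start = None
        elif start is None:
            # First problem character of a new run.
            start = i
    if start is not None:
        runs.append((start, len(text)))
    return runs


if __name__ == "__main__":
    # Two euro signs and one l-with-stroke, none of which exist in Latin-1.
    sample = "abc\u20ac\u20acdef\u0142"
    print(unencodable_runs(sample, "latin-1"))  # [(3, 5), (8, 9)]
```

Checking one character at a time is slow but keeps the sketch independent of how a particular codec reports error positions. An editor command would jump to (or highlight) each reported run in turn.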
* Re: Several serious problems
From: Andre Spiegel @ 2002-07-22 19:01 UTC
Cc: handa, emacs-devel

> Handa-san, would you please clean up whatever is wrong with that file
> so that it can save properly once again?

When I visit the ChangeLog, Kai's most recent entry from 2002-07-21
displays with a German sharp 's' (ß), but all of his former entries
have a \337 in place of it.
* Re: Several serious problems
From: Andre Spiegel @ 2002-07-22 19:03 UTC
Cc: handa, emacs-devel

> This problem prevented me from committing changes to the file from
> Emacs. I was able to edit and save the file using
> find-file-literally, but when I tried to commit the changes, C-x v v
> tried to revisit the file non-literally. I think that is a serious
> bug in VC. VC should cope with visiting a file literally.
> Andre, would you please fix that?

It is fixed now. I've installed the patch in vc.el, but haven't made
an entry in the ChangeLog yet, since it still seems corrupted. Will
do so after it's been cleaned up.
* Re: Several serious problems
From: Richard Stallman @ 2002-07-23 4:00 UTC
Cc: handa, emacs-devel

Thanks for jumping right on the problem.
* Re: Several serious problems
From: Andreas Schwab @ 2002-07-22 19:03 UTC
Cc: handa, spiegel, savannah-hackers, emacs-devel, dominik

Richard Stallman <rms@gnu.org> writes:

|> I cannot save the file lisp/ChangeLog. It specifies coding system
|> iso-2022-7bit, but it contains something that cannot be encoded in that
|> coding system. I don't know any way to find the text that causes the
|> problem; essentially I am helpless.

It was the last commit by Carsten Dominik which broke the file. I have
now fixed it by visiting as iso-latin-1, fixing the two remaining
iso-2022 encoded characters and then saving it again in the right
encoding.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Several serious problems
From: Richard Stallman @ 2002-07-23 18:58 UTC
Cc: handa, spiegel, savannah-hackers, emacs-devel, dominik

    It was the last commit by Carsten Dominik which broke the file.

Carsten, can you figure out what action it was that broke the file?
Can you find a way to reproduce it (preferably without checking in
the broken version!)? We need to figure this out so we can make
changes to remove the risk users will do this.

Andreas, thanks for fixing the file.
* Re: Several serious problems
From: Andre Spiegel @ 2002-07-22 19:11 UTC
Cc: handa, emacs-devel

> This problem prevented me from committing changes to the file from
> Emacs. I was able to edit and save the file using
> find-file-literally, but when I tried to commit the changes, C-x v v
> tried to revisit the file non-literally. I think that is a serious
> bug in VC. VC should cope with visiting a file literally.
> Andre, would you please fix that?

It is fixed now. I've installed the patch in vc.el, but haven't made
an entry in the ChangeLog yet, since it still seems corrupted. Will
do so after it's been cleaned up.
* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-23 4:42 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> We MUST do something to make it easier for users to cope with such a
> situation. We talked about this a few weeks ago but nothing was done.

Yes, you are right. As I said months ago, I have to fix those files
quite often; users don't know how to do it on their own. Often it
gets even worse: Emacs proposes a "secure" encoding and when users go
for it, all looks well until you want to process such a file with
TeX...

Please add this issue to etc/TODO.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)
* Re: Several serious problems
From: Richard Stallman @ 2002-07-24 3:25 UTC
Cc: handa, emacs-devel

    Often it's getting even worse: Emacs proposes a
    "secure" encoding and when users go for it, all looks well until you
    want to process such a file with TeX...

I am not really sure what that means--would you please explain?
* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-24 4:43 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     Often it's getting even worse: Emacs proposes a
>     "secure" encoding and when users go for it, all looks well until you
>     want to process such a file with TeX...
>
> I am not really sure what that means--would you please explain?

We discussed the issue several times (e.g. under the subject
"lisp/ChangeLog coding system"); here is a good remark by Stephen
J. Turnbull. Yes, that's different from your problem, but it's caused
by the same implementation concept. Enabling unification might cure
most of these problems, so it's very important to release an Emacs
with this feature; all released Emacs 21.x versions destroy user
files at random...

From: "Stephen J. Turnbull" <stephen@xemacs.org>
Subject: Re: lisp/ChangeLog coding system
To: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
Cc: Eli Zaretskii <eliz@is.elta.co.il>, emacs-devel@gnu.org
Date: 29 Apr 2002 20:28:55 +0900

>>>>> "Stefan" == Stefan Monnier <monnier+gnu/emacs@rum.cs.yale.edu> writes:

    >> One aspect is making better guesses about desired coding
    >> systems.

    Stefan> I'm not sure what kind of improvements you're thinking
    Stefan> about.

Well, in the version (mid-January, maybe?) of GNU Emacs I have, when I
tried saving a buffer with mixed ascii, latin-1, and latin-2 in it, it
gave me an abominably long list of coding systems including mule
internal, all the -with-esc systems, and iso-2022-jp-2. But all of
the characters used in the buffer are in ISO-8859-2, it's just Mule
making false distinctions.

At the very least, the defaults in Emacs should be to identify
identical characters (eg, those from the Latin-## subsets) and to
distinguish those where unification is controversial (the Han
ideographs).

    Stefan> non-MIME coding-systems should be in the "unlikely" list, tho.

There is no unique "the unlikely list". For example, if I were
Croatian, I probably would want the buffer described above saved in
ISO-8859-2 without being asked, but a German would probably want to
save it in UTF-8 (or maybe ISO-2022-7 if she were an Emacs developer),
or be queried, defaulting to ISO-8859-2.

And some of the "universal" coding systems (UTF-32, mule internal, all
the -with-esc systems) should probably not even be offered to most
users; they should have to ask for them by name. But people with
special needs should be able to configure them for regular use.

And what's a "non-MIME coding system"? AFAIK MIME has nothing to do
with coding systems except that the notation "the preferred MIME name"
is a useful convention. But KOI8-R and all the Windows-125x sets are
MIME registered.

    Stefan> Looking at the README, I have the impression that most of
    Stefan> the functionality is already part of the Emacs CVS code
    Stefan> (mostly thanks to Dave's ucs-tables.el). Someone should
    Stefan> try and figure out the details.

As for most functionality being in Emacs, yes, that's why I said I'd
help refactor; relative to ucs-tables.el the contribution is all UI.
My duplication[1] of ucs-tables is straightforward, not terribly
efficient code; all the meat is devoted to the question of "how do we
know which coding systems to offer the user". Specifically I address
the issues of preferred unibyte systems and preferred universal
systems described above.

Footnotes:
[1] XEmacs 21.5 has built-in support for Unicode. The UCS tables are
loaded at startup from (a local copy of) the Unicode Consortium
tables, and an API is provided to reload if desirable. The code
predates the release of Emacs 21, and so is different from
ucs-tables.el, unfortunately. The duplicative parts are for 21.4.
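Turnbull's point about per-user preference lists can be illustrated with a short Python sketch. This is not Emacs code, and nothing in it comes from the thread beyond the idea: the function name offered_coding_systems, the two preference lists, and the by_name_only set are all hypothetical. Python codecs stand in for coding systems; since Python strings are Unicode, an ü typed in a Latin-1 context and one typed in a Latin-2 context are already the same character, which is the identification the message asks Emacs to make.

```python
def offered_coding_systems(text, preferred, by_name_only=()):
    """Return the entries of PREFERRED, in order, that can encode all of
    TEXT, skipping those the user wants offered only when named explicitly."""
    offered = []
    for name in preferred:
        if name in by_name_only:
            continue
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        offered.append(name)
    return offered


# Hypothetical per-user preference lists (assumptions for illustration).
CROATIAN = ["iso8859_2", "utf-8", "utf-32"]
GERMAN = ["utf-8", "iso8859_1", "iso8859_2", "utf-32"]

if __name__ == "__main__":
    # u-umlaut and sharp s exist in both Latin-1 and Latin-2;
    # n-acute exists only in Latin-2.
    text = "Gr\u00fc\u00df Gott, dzie\u0144 dobry"
    print(offered_coding_systems(text, CROATIAN, by_name_only={"utf-32"}))
    # ['iso8859_2', 'utf-8']
    print(offered_coding_systems(text, GERMAN, by_name_only={"utf-32"}))
    # ['utf-8', 'iso8859_2']
```

The first entry of each result would be the default offered to that user; the "universal" system is never listed unless the user removes it from by_name_only.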
* Re: Several serious problems
From: Richard Stallman @ 2002-07-25 3:12 UTC
Cc: handa, emacs-devel

    >     Often it's getting even worse: Emacs proposes a
    >     "secure" encoding and when users go for it, all looks well until you
    >     want to process such a file with TeX...
    >
    > I am not really sure what that means--would you please explain?

    We discussed the issue several times (e.g. under the subject
    "lisp/ChangeLog coding system");

I did not recognize the issue because you said "a 'secure' encoding"
and that is not a term we normally use.

    Well, in the version (mid-January, maybe?) of GNU Emacs I have, when I
    tried saving a buffer with mixed ascii, latin-1, and latin-2 in it, it
    gave me an abominably long list of coding systems including mule
    internal, all the -with-esc systems, and iso-2022-jp-2. But all of
    the characters used in the buffer are in ISO-8859-2, it's just Mule
    making false distinctions.

The current development version of Emacs enables
unify-8859-on-encoding-mode; does that solve this problem?
* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-25 3:24 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> I did not recognize the issue because you said "a 'secure' encoding"
> and that is not a term we normally use.

I thought that was the Emacs wording. Sorry.

> The current development version of Emacs enables
> unify-8859-on-encoding-mode; does that solve this problem?

Yes, that helps a lot. It must go into the RC branch, please, to make
it available to the public.
* Re: Several serious problems
From: Richard Stallman @ 2002-07-26 15:35 UTC
Cc: handa, emacs-devel

    Yes, that helps a lot. It must go into the RC branch, please, to make
    it available to the public.

Have we already considered this possibility? I can't remember, but
chances are we would have considered it.

It might depend on too many other changes to be easy to put into RC.
* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-27 3:19 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     Yes, that helps a lot. It must go into the RC branch, please, to make
>     it available to the public.
>
> It might depend on too many other changes to be easy to put into RC.

Since such a patch would prevent file corruption from happening, it's
worth every effort. IIRC, the reason not to install the unification
feature was: "it isn't tested enough". Of course, this argument isn't
valid since we need a solution for a known problem -- users have
already suffered too long.

Without the unification feature I cannot recommend Emacs 21.x to
European users who have to deal with latin1 and latin9 encodings. At
the moment, they are better served using Emacs from the CVS trunk.

Thanks for considering the issue and for your answer!
* Re: Several serious problems
From: Richard Stallman @ 2002-07-29 1:12 UTC
Cc: handa, emacs-devel

Could you make a patch that installs in the RC branch and that works
for you?
* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-29 14:32 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Could you make a patch that installs in the RC branch and that works
> for you?

I fear that's too complicated for me. On 21.1 I installed the files
Dave Love posted; when Dave's enhancements were added to the CVS HEAD
I switched to the CVS HEAD version (and forgot all about the release
branch).

Maybe the one who installed Dave's files on the trunk can do the same
on the release branch? I guess it happened here to the HEAD:

    2001-12-07  Dave Love  <fx@gnu.org>

and later unification was enabled by default.
* Re: Several serious problems
From: Richard Stallman @ 2002-07-30 1:00 UTC
Cc: handa, emacs-devel

    Maybe the one who installed Dave's files on the trunk can do the same
    on the release branch?

I don't know who that was or whether he will do this. Anyone who
would like to make this happen, I invite to work on it.
* Re: Several serious problems
From: Stefan Monnier @ 2002-08-09 7:42 UTC
Cc: rms, handa, emacs-devel

> >     Yes, that helps a lot. It must go into the RC branch, please, to make
> >     it available to the public.
> >
> > It might depend on too many other changes to be easy to put into RC.
>
> Since such a patch would prevent file corruptions from happening it's
> worth all effort. IIRC, the reason not to install the unification
> feature was: "it isn't tested enough". Of course, this argument isn't
> valid since we need a solution for a known problem -- users already
> suffering too long.

ucs-tables is installed in the RC branch and will thus be part of
Emacs-21.3. It is not turned on by default, tho. I think it's safe
to turn on unify-8859-on-encoding-mode (as is done on the trunk), but
I'll let others judge. After all, it's supposed to be a bug-fix
release and this is not quite a bug-fix in that things work as
designed (it's just that the design doesn't do what the user wants).

	Stefan
* Re: Several serious problems
From: Karl Eichwalder @ 2002-08-09 16:08 UTC
Cc: emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> ucs-tables is installed in the RC branch and will thus be part of
> Emacs-21.3.

Since 2002-07-11, great! And it is even mentioned in NEWS. Just
today I started to switch to the RC branch; now I'll use it for my
daily work.
* Re: Several serious problems
From: Richard Stallman @ 2002-08-10 17:16 UTC
Cc: keichwa, handa, emacs-devel

    ucs-tables is installed in the RC branch and will thus be part of
    Emacs-21.3. It is not turned on by default, tho. I think it's safe
    to turn on unify-8859-on-encoding-mode (as is done on the trunk),
    but I'll let others judge.

I think we should try this. File corruption is a bug, and if we can
fix it, we should.

Can you or someone show me precisely what change is needed?
* Re: Several serious problems
From: Stefan Monnier @ 2002-08-12 16:20 UTC
Cc: monnier+gnu/emacs, keichwa, handa, emacs-devel

>     ucs-tables is installed in the RC branch and will thus be part of
>     Emacs-21.3. It is not turned on by default, tho. I think it's safe
>     to turn on unify-8859-on-encoding-mode (as is done on the trunk),
>     but I'll let others judge.
>
> I think we should try this. File corruption is a bug, and if we can
> fix it, we should.
>
> Can you or someone show me precisely what change is needed?

I think we just need to add a call like

    (load "ucs-tables")
    (unify-8859-on-encoding-mode 1)

to startup.el (and add ucs-tables.el to the list of files that are
dumped).

	Stefan
* Re: Several serious problems
From: Richard Stallman @ 2002-08-13 1:48 UTC
Cc: monnier+gnu/emacs, keichwa, handa, emacs-devel

    I think we just need to add a call like

        (load "ucs-tables")
        (unify-8859-on-encoding-mode 1)

    to startup.el (and add ucs-tables.el to the list of files that are
    dumped).

Eli, or someone else, can you try this in RC and see how it works?
* Re: Several serious problems
From: Karl Eichwalder @ 2002-08-15 2:30 UTC
Cc: monnier+gnu/emacs, handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     I think we just need to add a call like
>
>         (load "ucs-tables")
>         (unify-8859-on-encoding-mode 1)
>
>     to startup.el (and add ucs-tables.el to the list of files that are
>     dumped).

Excuse my ignorance: do you really mean startup.el?

> Eli, or someone else, can you try this in RC and see how it works?

ATM, I'm running the appended patch without problems. I guess it's a
known limitation that unification of characters outside the latin-1
set isn't supported by the RC branch?

I can unify a-umlaut from latin-2, but unification does not take place
for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).

Index: src/puresize.h
===================================================================
RCS file: /cvsroot/emacs/emacs/src/puresize.h,v
retrieving revision 1.57.14.1
diff -u -r1.57.14.1 puresize.h
*** src/puresize.h	22 Feb 2002 11:21:04 -0000	1.57.14.1
--- src/puresize.h	15 Aug 2002 02:18:49 -0000
***************
*** 42,48 ****
  #endif
  
  #ifndef BASE_PURESIZE
! #define BASE_PURESIZE (710000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
  #endif
  
  /* Increase BASE_PURESIZE by a ratio depending on the machine's word size.  */
--- 42,48 ----
  #endif
  
  #ifndef BASE_PURESIZE
! #define BASE_PURESIZE (715000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
  #endif
  
  /* Increase BASE_PURESIZE by a ratio depending on the machine's word size.  */
Index: lisp/loadup.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/loadup.el,v
retrieving revision 1.113
diff -u -r1.113 loadup.el
*** lisp/loadup.el	15 Jul 2001 16:15:34 -0000	1.113
--- lisp/loadup.el	15 Aug 2002 02:18:49 -0000
***************
*** 106,111 ****
--- 106,115 ----
  (load "language/tibetan")
  (load "language/vietnamese")
  (load "language/misc-lang")
+ (load "international/ucs-tables")
+ (unify-8859-on-encoding-mode 1)
+ ;; (ucs-unify-8859 'encode-only)
+ (update-coding-systems-internal)
  
  (load "indent")
* Re: Several serious problems
From: Stefan Monnier @ 2002-08-15 2:47 UTC
Cc: rms, monnier+gnu/emacs, handa, emacs-devel

> >     I think we just need to add a call like
> >
> >         (load "ucs-tables")
> >         (unify-8859-on-encoding-mode 1)
> >
> >     to startup.el (and add ucs-tables.el to the list of files that are
> >     dumped).
>
> Excuse my ignorance: do you really mean startup.el?

Sorry, I meant loadup.el, of course.

> > Eli, or someone else, can you try this in RC and see how it works?
>
> ATM, I'm running the appended patch without problems. I guess it's a
> known limitation that unification of characters outside the latin-1
> set isn't supported by the RC branch?

I'm not sure I understand, but I'm pretty sure it's known ;-)
If you mean that a latin-2 char is not the same as a unicode char
(e.g. for searching purposes), then you just need to use
unify-8859-on-decoding-mode as well. This can't be the default
because it has a few undesirable side-effects (harmless for the
typical user, but annoying for people working on some Emacs files
such as ucs-*.el where we do want to be able to talk about the
difference between a latin-2 and a unicode char).

> I can unify a-umlaut from latin-2, but unification does not take place
> for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).

I don't understand what you mean by "the unification does not take
place". Please just explain step by step what you did and what you
expected as if we were terminally stupid (this seems necessary when
discussing such things as unification because it has various
different meanings in the context of the current Mule code and it's
too often difficult to know which one we're talking about).

	Stefan
* Re: Several serious problems
From: Karl Eichwalder @ 2002-08-15 5:31 UTC
Cc: rms, handa, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> I can unify a-umlaut from latin-2; but unification does not take place
>> for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).
>
> I don't understand what you mean "the unification does not take
> place".

Here is a recipe:

Starting from a Latin-1 environment, enter:

    Grüß Gott!
    C-x C-s (buffer is latin-1 encoded)

Switch input encoding: C-x RET C-\ latin-2-prefix RET; enter:

    Dobr'y den ("'y" becomes one char, y with accent)
    C-x C-s (buffer stays latin-1 encoded, okay)

Enter:

    Dzie'n dobry! ("'n" becomes one char, n with accent, not available
    in Latin-1)
    C-x C-s

Emacs proposes iso-8859-2, okay, but I would have preferred UTF-8.

Switch input method: C-x RET C-\ TeX RET; enter:

    \euro ("\euro" becomes one char, the euro symbol, missing from
    latin-2)

Emacs (RC) isn't able to unify the buffer to UTF-8 (it proposes
"x-ctext" etc.); but Emacs (trunk version) can save the buffer UTF-8
encoded. Hope this helps.
* Re: Several serious problems
From: Stefan Monnier @ 2002-08-15 15:30 UTC
Cc: Stefan Monnier, rms, handa, emacs-devel, fx

> >> I can unify a-umlaut from latin-2; but unification does not take place
> >> for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).
> >
> > I don't understand what you mean "the unification does not take
> > place".
[...]
> Emacs (RC) isn't able to unify the buffer to UTF-8 (it proposes
> "x-ctext" etc.); but Emacs (trunk version) can save the buffer UTF-8
> encoded. Hope this helps.

Indeed, the safe-charsets property of the utf-8 coding-system has not
been updated to list the extra charsets it can now encode. In the
trunk utf-8.el says:

    '((safe-charsets
       ascii
       eight-bit-control
       eight-bit-graphic
       latin-iso8859-1
       latin-iso8859-15
       latin-iso8859-14
       latin-iso8859-9
       hebrew-iso8859-8
       greek-iso8859-7
       cyrillic-iso8859-5
       latin-iso8859-4
       latin-iso8859-3
       latin-iso8859-2
       vietnamese-viscii-lower
       vietnamese-viscii-upper
       thai-tis620
       ipa
       ethiopic
       indian-is13194
       katakana-jisx0201
       chinese-sisheng
       lao
       mule-unicode-0100-24ff
       mule-unicode-2500-33ff
       mule-unicode-e000-ffff)

whereas in the RC branch it only says

    '((safe-charsets
       ascii
       eight-bit-control
       eight-bit-graphic
       latin-iso8859-1
       mule-unicode-0100-24ff
       mule-unicode-2500-33ff
       mule-unicode-e000-ffff)

And turning on unify-8859-on-encoding-mode doesn't update the
corresponding info either.

I think Dave or Handa would know better how to fix that (whether
unify-8859-on-encoding-mode should change the safe-charsets or whether
it should simply always include the new charsets and load ucs-tables
when needed, and also which charsets should be added).

Thank you for pointing it out.

	Stefan
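The effect of the two safe-charsets lists above can be modelled as a plain subset test. The following Python sketch is a simplified model under stated assumptions, not how Emacs decides: it assumes a coding system is offered only when every charset present in the buffer appears in its safe-charsets list, and the buffer's charset set shown here is a guess at what the recipe buffer would contain. The names can_offer and recipe_buffer are made up for illustration; the two lists are copied from the message.

```python
# safe-charsets of utf-8 on the trunk, as quoted in the message above.
TRUNK_UTF8_SAFE = frozenset("""
    ascii eight-bit-control eight-bit-graphic
    latin-iso8859-1 latin-iso8859-15 latin-iso8859-14 latin-iso8859-9
    hebrew-iso8859-8 greek-iso8859-7 cyrillic-iso8859-5
    latin-iso8859-4 latin-iso8859-3 latin-iso8859-2
    vietnamese-viscii-lower vietnamese-viscii-upper thai-tis620
    ipa ethiopic indian-is13194 katakana-jisx0201 chinese-sisheng lao
    mule-unicode-0100-24ff mule-unicode-2500-33ff mule-unicode-e000-ffff
""".split())

# safe-charsets of utf-8 on the RC branch, as quoted in the message above.
RC_UTF8_SAFE = frozenset("""
    ascii eight-bit-control eight-bit-graphic latin-iso8859-1
    mule-unicode-0100-24ff mule-unicode-2500-33ff mule-unicode-e000-ffff
""".split())


def can_offer(buffer_charsets, safe_charsets):
    """Simplified model: offer the coding system only if every charset
    used in the buffer is listed as safe for it."""
    return set(buffer_charsets) <= safe_charsets


if __name__ == "__main__":
    # Assumed charsets of the recipe buffer (a guess, not taken from Emacs).
    recipe_buffer = {"ascii", "latin-iso8859-1", "latin-iso8859-2",
                     "mule-unicode-0100-24ff"}
    print(can_offer(recipe_buffer, TRUNK_UTF8_SAFE))  # True
    print(can_offer(recipe_buffer, RC_UTF8_SAFE))     # False
    print(sorted(recipe_buffer - RC_UTF8_SAFE))       # ['latin-iso8859-2']
```

Under this model the RC branch never offers utf-8 for the recipe buffer because latin-iso8859-2 is missing from its list, which matches the behaviour reported in the recipe.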
* Re: Several serious problems
From: Dave Love @ 2002-08-15 17:33 UTC
Cc: Karl Eichwalder, rms, handa, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> Indeed, the safe-charsets property of the utf-8 coding-system has not been
> updated to list the extra charsets it can now encode.

I hope whatever's been changed has been properly tested if it's on the
release branch. Please get handa to check it if he hasn't already.

> I think Dave or Handa would know better how to fix that (whether
> unify-8859-on-encoding-mode should change the safe-charsets or whether
> it should simply always include the new charsets and load ucs-tables
> when needed. And also which charsets should be added).

Whoever changed it should sort it out.

[Actually the stuff on the trunk should really use the encoding
translation table to set `safe-chars', which would need to be
re-registered if it changed, assuming that utf-8.el is how I left it.
However, the default does encode the listed charsets completely and
was unaffected by `unify-8859-on-encoding-mode' -- it deals with more
than 8859 anyhow.]
* Re: Several serious problems 2002-07-22 17:11 Several serious problems Richard Stallman ` (4 preceding siblings ...) 2002-07-23 4:42 ` Karl Eichwalder @ 2002-07-23 13:35 ` Kenichi Handa 2002-07-23 13:52 ` Alan Shutko 2002-07-24 3:25 ` Richard Stallman 2002-08-09 4:41 ` Stefan Monnier 6 siblings, 2 replies; 90+ messages in thread From: Kenichi Handa @ 2002-07-23 13:35 UTC (permalink / raw) Cc: spiegel, savannah-hackers, emacs-devel

In article <200207221711.g6MHBZo02496@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes:
> I cannot save the file lisp/ChangeLog.  It specifies coding system
> iso-2022-7bit, but it contains something that cannot be encoded in that
> coding system.

It seems that this problem was already fixed. As I also found one unnecessary mule-unicode-0100-24ff char, I deleted it.

> I don't know any way to find the text that causes the
> problem; essentially I am helpless.

At least, (find-charset-region 1 (point-max)) will give you some information. If the returned value contains a suspicious charset, we can search for it (if it's not eight-bit-xxx) by:

    (re-search-forward
     (format "[%c-%c]"
             (make-char CHARSET 32 32) (make-char CHARSET 127 127)))

To search for eight-bit-control:

    (re-search-forward "[\200-\237]")

To search for eight-bit-graphic:

    (re-search-forward (string-as-multibyte "[\240-\377]"))

It's not sophisticated. :-(

> We MUST do something to make it easier for users to cope with such a
> situation.  We talked about this a few weeks ago but nothing was done.
> Perhaps we could add a command which simply scans forward for the next
> run of characters that can't be saved in the specified coding system.
> The message you get in that situation could tell you about this
> command.  This would be a powerful solution, since you could easily
> find all the problems, not just the first one.  Highlighting all of
> them would also be a useful thing to do.

Do you mean a command something like this?

(defun check-coding-system-region (from to coding-system &optional max-num)
  "Check if the text after point is encodable by the specified coding system.
When called from a program, takes three arguments:
CODING-SYSTEM, FROM, and TO.  START and END are buffer positions.
Value is a list of positions of characters that are not encodable by
CODING-SYSTEM.

Optional 4th argument MAX-NUM, if non-nil, limits the length of
returned list.  By default, there's no limit."
  (interactive (list (point) (point-max)
                     (read-non-nil-coding-system "Coding-system: ")
                     1))
  (check-coding-system coding-system)
  (or (and coding-system
           (integerp (coding-system-type coding-system)))
      (error "Invalid coding system to check: %s" coding-system))
  (let ((safe-chars (coding-system-get coding-system 'safe-chars))
        (positions)
        (n 0))
    (save-excursion
      (save-restriction
        (narrow-to-region from to)
        (goto-char (point-min))
        (or max-num
            (setq max-num (- (point-max) (point-min))))
        (if (eq safe-chars t)
            (let ((re (string-as-multibyte "[\200-\237\240-\377]")))
              (while (and (< n max-num)
                          (re-search-forward re nil t))
                (setq positions (cons (1- (point)) positions)
                      n (1+ n))))
          (while (and (< n max-num)
                      (re-search-forward "[^\000-\177]" nil t))
            (or (aref safe-chars (preceding-char))
                (setq positions (cons (1- (point)) positions)
                      n (1+ n)))))))
    (if (interactive-p)
        (if (not positions)
            (message "All characters are encodable by %s" coding-system)
          (goto-char (car positions))
          (error "This character can't be encoded by %s" coding-system))
      (setq positions (nreverse positions)))))

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply [flat|nested] 90+ messages in thread
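The first diagnostic step mentioned in this message, listing the charsets present in a buffer, can be sketched as a small command. This is only an illustration: the name `my-report-buffer-charsets` is hypothetical and not part of Emacs, and the sketch relies on nothing beyond `find-charset-region`, which the message itself uses.

```elisp
;; Hypothetical helper, not part of Emacs: report which charsets
;; occur in the current buffer, as a first hint about what a given
;; coding system might fail to encode.
(defun my-report-buffer-charsets ()
  "Show and return the list of charsets found in the current buffer."
  (interactive)
  (let ((charsets (find-charset-region (point-min) (point-max))))
    (message "Charsets in buffer: %s"
             (mapconcat #'symbol-name charsets ", "))
    charsets))
```

A charset in the result that looks out of place for the file's declared coding system is then a candidate for the regexp searches shown in the message.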
* Re: Several serious problems 2002-07-23 13:35 ` Kenichi Handa @ 2002-07-23 13:52 ` Alan Shutko 2002-07-24 3:25 ` Richard Stallman 2002-07-24 3:25 ` Richard Stallman 1 sibling, 1 reply; 90+ messages in thread From: Alan Shutko @ 2002-07-23 13:52 UTC (permalink / raw) Cc: rms, spiegel, savannah-hackers, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > It seem that this problem was already fixed. As I also > found one unnecessary mule-unicode-0100-24ff char, I deleted > it. I took a quick look, and I think these are the commits that didn't make it into the ChangeLog: RCS file: /cvsroot/emacs/emacs/lisp/cus-start.el,v total revisions: 55; selected revisions: 1 description: Add customization information for intrinsics. ---------------------------- revision 1.51 date: 2002/07/22 15:22:49; author: rms; state: Exp; lines: +1 -0 (double-click-fuzz): Added. ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/vc.el,v total revisions: 341; selected revisions: 1 description: ;;; vc.el --- drive a version-control system from within Emacs ---------------------------- revision 1.335 date: 2002/07/22 18:52:04; author: spiegel; state: Exp; lines: +7 -6 (vc-next-action-on-file): Preserve find-file-literally. ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/cal-hebrew.el,v total revisions: 13; selected revisions: 1 description: ---------------------------- revision 1.13 date: 2002/07/22 15:31:13; author: rms; state: Exp; lines: +94 -77 (diary-omer, diary-yahrzeit, diary-rosh-hodesh, diary-parasha, diary-parasha): Add optional MARK parameter, specifying what face or character to use in the calendar display. These will now return (MARK . ENTRY). 
============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/diary-lib.el,v total revisions: 55; selected revisions: 1 description: ---------------------------- revision 1.55 date: 2002/07/22 15:32:00; author: rms; state: Exp; lines: +96 -89 (mark-sexp-diary-entries): Retrieve mark from diary-sexp-entry and pass it to mark-visible-calendar-date. (list-sexp-diary-entries): Update doc string for new docs for .... If diary-sexp-entry returns a cons, only add the text to the diary list. (diary-sexp-entry): Allow sexps to return a cons of the form (MARK . STRING) to specify what face or character mark should be used in the calendar display. (diary-date, diary-block, diary-float, diary-anniversary) (diary-cyclic): Add optional MARK parameter, specifying what face or character to use in the calendar display. These will now return (MARK . ENTRY). (check-calendar-holidays, diary-iso-date) (calendar-holiday-list, diary-french-date, diary-mayan-date) (diary-julian-date, diary-astro-day-number, diary-chinese-date) (diary-islamic-date, list-islamic-diary-entries) (mark-islamic-diary-entries, mark-islamic-calendar-date-pattern) (diary-hebrew-date, diary-omer, diary-yahrzeit, diary-parasha) (diary-rosh-hodesh, list-hebrew-diary-entries) (mark-hebrew-diary-entries, mark-hebrew-calendar-date-pattern) (diary-coptic-date, diary-persian-date, diary-phases-of-moon) (diary-sunrise-sunset, diary-sabbath-candles): Remove interactive flag from autoloads. ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/lunar.el,v total revisions: 18; selected revisions: 1 description: ;;; lunar.el --- calendar functions for phases of the moon. ---------------------------- revision 1.18 date: 2002/07/22 15:30:43; author: rms; state: Exp; lines: +7 -4 (diary-phases-of-moon): Add optional MARK parameter, specifying what face or character to use in the calendar display. 
These will now return (MARK . ENTRY). ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/solar.el,v total revisions: 45; selected revisions: 1 description: ;;; solar.el --- calendar functions for solar events. ---------------------------- revision 1.44 date: 2002/07/22 15:30:24; author: rms; state: Exp; lines: +8 -4 (diary-sabbath-candles): Add optional MARK parameter, specifying what face or character to use in the calendar display. These will now return (MARK . ENTRY). ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/net/browse-url.el,v total revisions: 24; selected revisions: 1 description: ---------------------------- revision 1.23 date: 2002/07/22 15:21:41; author: rms; state: Exp; lines: +7 -3 (browse-url-lynx-input-attempts): Use defcustom. (browse-url-lynx-input-delay): Add custom type and group. ============================================================================= -- Alan Shutko <ats@acm.org> - In a variety of flavors! I failed as a proof-reader for M & M's. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-23 13:52 ` Alan Shutko @ 2002-07-24 3:25 ` Richard Stallman 0 siblings, 0 replies; 90+ messages in thread From: Richard Stallman @ 2002-07-24 3:25 UTC (permalink / raw) Cc: handa, spiegel, savannah-hackers, emacs-devel I took a quick look, and I think these are the commits that didn't make it into the ChangeLog: I think these are all included now, thanks. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-23 13:35 ` Kenichi Handa 2002-07-23 13:52 ` Alan Shutko @ 2002-07-24 3:25 ` Richard Stallman 2002-07-24 4:37 ` Kenichi Handa 1 sibling, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-07-24 3:25 UTC (permalink / raw) Cc: spiegel, savannah-hackers, emacs-devel Do you mean a command something like this? (defun check-coding-system-region (from to coding-system &optional max-num) "Check if the text after point is encodable by the specified coding system. When called from a program, takes three arguments: CODING-SYSTEM, FROM, and TO. START and END are buffer positions. Value is a list of positions of characters that are not encodable by CODING-SYSTEM. Optional 4th argument MAX-NUM, if non-nil, limits the length of returned list. By default, there's no limit." This could do the internals of the job. To be useful, it needs a user interface. How about if you modify it to make overlays to highlight those characters instead of returning a list saying where they are? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-24 3:25 ` Richard Stallman @ 2002-07-24 4:37 ` Kenichi Handa 2002-07-25 3:12 ` Richard Stallman 2002-08-09 7:44 ` Several serious problems Stefan Monnier 0 siblings, 2 replies; 90+ messages in thread From: Kenichi Handa @ 2002-07-24 4:37 UTC (permalink / raw) Cc: spiegel, emacs-devel In article <200207240325.g6O3PdX04898@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes: > Do you mean a command something like this? > (defun check-coding-system-region (from to coding-system &optional max-num) > "Check if the text after point is encodable by the specified coding system. > When called from a program, takes three arguments: > CODING-SYSTEM, FROM, and TO. START and END are buffer positions. > Value is a list of positions of characters that are not encodable by > CODING-SYSTEM. > Optional 4th argument MAX-NUM, if non-nil, limits the length of > returned list. By default, there's no limit." > This could do the internals of the job. To be useful, it needs a user > interface. Ooops, I forgot to include this sentence in the docstring. If an unencodable character is found, move point to that character. So, this function can be used both for an internal job and for an interactive job (to find the next unencodable character). > How about if you modify it to make overlays to highlight those characters > instead of returning a list saying where they are? If the specified coding system is totally inappropriate for the buffer, highlighting them will results in huge amount of overlays and also it takes long time to finish the job. If we limit the number of highlighting, it may give users incorrect information (i.e. non-highlighted characters seems to be encodable). So, I thought just moving point to the next unencodable character is better. --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-24 4:37 ` Kenichi Handa @ 2002-07-25 3:12 ` Richard Stallman 2002-07-25 5:53 ` Miles Bader ` (2 more replies) 2002-08-09 7:44 ` Several serious problems Stefan Monnier 1 sibling, 3 replies; 90+ messages in thread From: Richard Stallman @ 2002-07-25 3:12 UTC (permalink / raw) Cc: spiegel, emacs-devel If the specified coding system is totally inappropriate for the buffer, highlighting them will results in huge amount of overlays and also it takes long time to finish the job. That is true. If we limit the number of highlighting, it may give users incorrect information (i.e. non-highlighted characters seems to be encodable). It could highlight the first N runs of such characters, and display a message saying "Many more unencodable characters found--type WHATEVER to view them". WHATEVER could be the same command with a prefix argument. What do you think of that? ^ permalink raw reply [flat|nested] 90+ messages in thread
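The "highlight the first N, then say there are more" idea can be sketched roughly as follows. This is an assumption-laden illustration, not code from the thread: the command name `my-highlight-unencodable` and the overlay property `my-unencodable` are hypothetical, the default limit of 10 is arbitrary, it highlights single characters rather than runs, and it assumes the built-in `unencodable-char-position` with its COUNT argument is available.

```elisp
;; Hypothetical sketch: highlight at most LIMIT unencodable characters
;; and tell the user when more remain.
(defun my-highlight-unencodable (coding-system &optional limit)
  "Highlight up to LIMIT characters that CODING-SYSTEM cannot encode.
LIMIT defaults to 10; interactively, a prefix argument sets it.
Return the list of highlighted buffer positions."
  (interactive (list (read-non-nil-coding-system "Coding system: ")
                     (and current-prefix-arg
                          (prefix-numeric-value current-prefix-arg))))
  (let* ((limit (or limit 10))
         ;; Ask for one extra position so we can tell whether more remain.
         (found (unencodable-char-position (point-min) (point-max)
                                           coding-system (1+ limit)))
         (more (> (length found) limit))
         (shown (sort (copy-sequence found) #'<)))
    (when more
      ;; Drop the extra position; it was only a probe.
      (setq shown (butlast shown)))
    ;; Clear highlights left over from a previous run.
    (remove-overlays (point-min) (point-max) 'my-unencodable t)
    (dolist (pos shown)
      (let ((ov (make-overlay pos (1+ pos))))
        (overlay-put ov 'my-unencodable t)
        (overlay-put ov 'face 'highlight)))
    (when more
      (message "More unencodable characters found; use a larger prefix argument to view them"))
    shown))
```

Capping the number of overlays keeps the cost bounded even when the chosen coding system is entirely wrong for the buffer, which is the concern raised earlier in the thread.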
* Re: Several serious problems 2002-07-25 3:12 ` Richard Stallman @ 2002-07-25 5:53 ` Miles Bader 2002-07-26 14:29 ` Francesco Potorti` 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa 2 siblings, 0 replies; 90+ messages in thread From: Miles Bader @ 2002-07-25 5:53 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel Richard Stallman <rms@gnu.org> writes: > If we limit the number of highlighting, it may give users > incorrect information (i.e. non-highlighted characters seems to be > encodable). > > It could highlight the first N runs of such characters, and display a > message saying "Many more unencodable characters found--type WHATEVER > to view them". WHATEVER could be the same command with a prefix > argument. I'd like something similar to the way isearch works (when highlighting non-current matches) -- just highlight what's currently displayed and give the user a chance to jump to the next instance. [Maybe it could even use jit-lock-functions or something to allow free movement in the buffer while still using optimizing display] -Miles -- Somebody has to do something, and it's just incredibly pathetic that it has to be us. -- Jerry Garcia ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-25 3:12 ` Richard Stallman 2002-07-25 5:53 ` Miles Bader @ 2002-07-26 14:29 ` Francesco Potorti` 2002-07-27 18:52 ` Richard Stallman 2002-08-09 7:43 ` Stefan Monnier 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa 2 siblings, 2 replies; 90+ messages in thread From: Francesco Potorti` @ 2002-07-26 14:29 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel Recently, a package called buffer-charset.el was posted to gnu.emacs-sources. It uses the machinery of hi-lock to work, and it's wonderfully simple to use: you just do M-x show-buffer-charset-characters (or use `C-x w c' if hi-lock-mode is already active) and you're done. You are asked what charset you want to highlight, and if you don't know you just press TAB and choose from the list. The offending characters are highlighted. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-26 14:29 ` Francesco Potorti` @ 2002-07-27 18:52 ` Richard Stallman 2002-08-09 7:43 ` Stefan Monnier 1 sibling, 0 replies; 90+ messages in thread From: Richard Stallman @ 2002-07-27 18:52 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel Recently, a package called buffer-charset.el was posted to gnu.emacs-sources. It uses the machinery of hi-lock to work, and it's wonderfully simple to use: you just do M-x show-buffer-charset-characters (or use `C-x w c' if hi-lock-mode is already active) and you're done. You are asked what charset you want to highlight, and if you don't know you just press TAB and choose from the list. The offending characters are highlighted. This might be useful for some purposes, but it is not the right interface to be a convenient solution to this particular problem. The user knows that the file can't be encoded in a certain coding system but she does not know which character sets are the problem. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-26 14:29 ` Francesco Potorti` 2002-07-27 18:52 ` Richard Stallman @ 2002-08-09 7:43 ` Stefan Monnier 1 sibling, 0 replies; 90+ messages in thread From: Stefan Monnier @ 2002-08-09 7:43 UTC (permalink / raw) Cc: rms, handa, spiegel, emacs-devel > Recently, a package called buffer-charset.el was posted to > gnu.emacs-sources. It uses the machinery of hi-lock to work, and it's > wonderfully simple to use: you just do M-x > show-buffer-charset-characters (or use `C-x w c' if hi-lock-mode is > already active) and you're done. You are asked what charset you want to > highlight, and if you don't know you just press TAB and choose from the > list. The offending characters are highlighted. Charsets are irrelevant (they're only an obscure internal implementation detail). Users only care about coding-systems. Stefan ^ permalink raw reply [flat|nested] 90+ messages in thread
* unencodable-char-position [Re: Several serious problems] 2002-07-25 3:12 ` Richard Stallman 2002-07-25 5:53 ` Miles Bader 2002-07-26 14:29 ` Francesco Potorti` @ 2002-08-11 1:59 ` Kenichi Handa 2002-08-12 17:06 ` Richard Stallman 2002-08-15 17:51 ` Dave Love 2 siblings, 2 replies; 90+ messages in thread From: Kenichi Handa @ 2002-08-11 1:59 UTC (permalink / raw) Cc: spiegel, emacs-devel, d.love In article <200207250312.g6P3C9J06653@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes: > If the specified coding system is totally inappropriate for > the buffer, highlighting them will results in huge amount of > overlays and also it takes long time to finish the job. > That is true. > If > we limit the number of highlighting, it may give users > incorrect information (i.e. non-highlighted characters seems > to be encodable). > It could highlight the first N runs of such characters, and display a > message saying "Many more unencodable characters found--type WHATEVER > to view them". WHATEVER could be the same command with a prefix > argument. I implemented that and tried on several files. But, it seems that such kind of feature is not that helpful. In the case that the buffer contains many unencodable chars, usually the specified coding system is wrong, and we must use a different coding system. So, it is not that interesting to know where are the other unencodable characters. In the case that the buffer contains a few unencodable chars, as it's seldom that more than one of them appear in one window, highlighting the other unencodable chars is not that useful. By the way, I've just noticed that Dave has already installed the function `unencodable-char-position' in mule-cmds.el and used it in select-safe-coding-system. That function resembles to check-coding-system-region on which we are currently discussing. But, as the docstring says, it's slow. So, I committed these changes. 
(1) Re-implementation of unencodable-char-position in C while adding two optional arguments.

----------------------------------------------------------------------
unencodable-char-position is a built-in function.
(unencodable-char-position START END CODING-SYSTEM &optional COUNT STRING)

Return position of first un-encodable character in a region.
START and END specify the region and CODING-SYSTEM specifies the
encoding to check.  Return nil if CODING-SYSTEM does encode the region.

If optional 4th argument COUNT is non-nil, it specifies at most how
many un-encodable characters to search.  In this case, the value is a
list of positions.

If optional 5th argument STRING is non-nil, it is a string to search
for un-encodable characters.  In that case, START and END are indexes
to the string.
----------------------------------------------------------------------

(2) New function `search-unencodable-char' for interactive use. It utilizes `unencodable-char-position'.

----------------------------------------------------------------------
(search-unencodable-char CODING-SYSTEM)

Search forward from point for a character that is not encodable.
It asks which coding system to check.
If such a character is found, set point after that character.
Otherwise, don't move point.

When called from a program, the value is a position of the found
character, or nil if all characters are encodable.
----------------------------------------------------------------------

It may be good to bind C-x RET s to this command. Could someone make this command more user friendly (e.g. improving messages)? It is also easy to modify this function to highlight a few more (or windowful) unencodable characters if you think that is surely helpful.

(3) Make select-safe-coding-system show (at most 10) unencodable characters for each default coding system tried. Now, if any unencodable chars are found, one can type C-g to cancel further saving.
As C-g doesn't hide *Warning* buffer, one can click on the displayed unencodable chars to jump to the corresponding position in a buffer. --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
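To make the documented calling conventions concrete, here is a minimal sketch of how `unencodable-char-position` might be used from Lisp, both on a buffer and on a string. The two function names are hypothetical and are not the code that was committed; the first only imitates the behaviour described for `search-unencodable-char` (move point after the offending character, or leave point alone).

```elisp
;; Hypothetical sketches built on the documented signature
;; (unencodable-char-position START END CODING-SYSTEM &optional COUNT STRING).

(defun my-next-unencodable-char (coding-system)
  "Move point just after the next character CODING-SYSTEM cannot encode.
Return that character's position, or nil (without moving point) if
everything from point to the end of the buffer is encodable."
  (interactive (list (read-non-nil-coding-system "Coding system: ")))
  (let ((pos (unencodable-char-position (point) (point-max)
                                        coding-system)))
    (if (null pos)
        (progn
          (message "All following characters are encodable by %s"
                   coding-system)
          nil)
      (goto-char (1+ pos))
      pos)))

(defun my-string-encodable-p (string coding-system)
  "Return non-nil if every character of STRING is encodable by CODING-SYSTEM.
Uses the STRING argument, so START and END are string indexes."
  (null (unencodable-char-position 0 (length string)
                                   coding-system nil string)))
```

Calling the first command repeatedly walks through the problem characters one at a time, which matches the observation in this message that the first few hits usually tell the user whether the coding system is simply wrong.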
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa @ 2002-08-12 17:06 ` Richard Stallman 2002-08-12 17:15 ` Stefan Monnier 2002-08-15 17:51 ` Dave Love 1 sibling, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-08-12 17:06 UTC (permalink / raw) Cc: spiegel, emacs-devel, d.love I implemented that and tried on several files. But, it seems that such kind of feature is not that helpful. In the case that the buffer contains many unencodable chars, usually the specified coding system is wrong, and we must use a different coding system. So, it is not that interesting to know where are the other unencodable characters. In the case that the buffer contains a few unencodable chars, as it's seldom that more than one of them appear in one window, highlighting the other unencodable chars is not that useful. These seem like persuasive arguments; it sounds good. How can I make a test case to observe it functioning? I tried but I couldn't get encoding to "fail". ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-12 17:06 ` Richard Stallman @ 2002-08-12 17:15 ` Stefan Monnier 2002-08-13 0:37 ` Kenichi Handa 0 siblings, 1 reply; 90+ messages in thread From: Stefan Monnier @ 2002-08-12 17:15 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel, d.love > I implemented that and tried on several files. But, it > seems that such kind of feature is not that helpful. > > In the case that the buffer contains many unencodable chars, > usually the specified coding system is wrong, and we must > use a different coding system. So, it is not that > interesting to know where are the other unencodable > characters. > > In the case that the buffer contains a few unencodable > chars, as it's seldom that more than one of them appear in > one window, highlighting the other unencodable chars is not > that useful. > > These seem like persuasive arguments; it sounds good. > > How can I make a test case to observe it functioning? > I tried but I couldn't get encoding to "fail". Try to save the HELLO file in utf-8. Stefan ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-12 17:15 ` Stefan Monnier @ 2002-08-13 0:37 ` Kenichi Handa 2002-08-13 22:47 ` Richard Stallman 0 siblings, 1 reply; 90+ messages in thread From: Kenichi Handa @ 2002-08-13 0:37 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel, d.love In article <200208121715.g7CHFrw29709@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> How can I make a test case to observe it functioning? >> I tried but I couldn't get encoding to "fail". > Try to save the HELLO file in utf-8. Yes. For instance: C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-13 0:37 ` Kenichi Handa @ 2002-08-13 22:47 ` Richard Stallman 2002-08-14 0:20 ` Kenichi Handa 0 siblings, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-08-13 22:47 UTC (permalink / raw) Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love Yes. For instance: C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET Yes, that indeed runs the new code. What I tried was C-h h C-x RET c utf-8 RET C-x C-w ~/temp RET. But it "worked"--it saved the file without complaint. Is this a bug? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-13 22:47 ` Richard Stallman @ 2002-08-14 0:20 ` Kenichi Handa 2002-08-14 23:13 ` Richard Stallman 0 siblings, 1 reply; 90+ messages in thread From: Kenichi Handa @ 2002-08-14 0:20 UTC (permalink / raw) Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love In article <200208132247.g7DMlHT07283@wijiji.santafe.edu>, Richard Stallman <rms@gnu.org> writes: > Yes. For instance: > C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET > Yes, that indeed runs the new code. > What I tried was C-h h C-x RET c utf-8 RET C-x C-w ~/temp RET. > But it "worked"--it saved the file without complaint. But, I think it broke some part of the file. > Is this a bug? No, it is an intentional behaviour. C-x RET c _CODING_ RET means that "I'll take all responsibility, so just accept _CODING_, don't make any warnings!". --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-14 0:20 ` Kenichi Handa @ 2002-08-14 23:13 ` Richard Stallman 0 siblings, 0 replies; 90+ messages in thread From: Richard Stallman @ 2002-08-14 23:13 UTC (permalink / raw) Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love No, it is an intentional behaviour. C-x RET c _CODING_ RET means that "I'll take all responsibility, so just accept _CODING_, don't make any warnings!". Thanks. I explained this in the manual. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa 2002-08-12 17:06 ` Richard Stallman @ 2002-08-15 17:51 ` Dave Love 2002-08-19 5:04 ` Kenichi Handa 1 sibling, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-15 17:51 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > I implemented that and tried on several files. But, it > seems that such kind of feature is not that helpful. If I understand what's being talked about, I agree. Normally the first problematic character tells me what's up. > By the way, I've just noticed that Dave has already > installed the function `unencodable-char-position' in > mule-cmds.el and used it in select-safe-coding-system. > > That function resembles to check-coding-system-region on > which we are currently discussing. I'm sorry if that was wrong. I thought it was supposed to have been installed months ago, and I was trying to clear out the Mule changes I've had hanging around after rms was on about it. I thought that was all stuff you approved of, or `obviously right'. > But, as the docstring says, it's slow. [It seemed fast enough for that use since it's only executed occasionally, when there's actually a problem. It was probably developed on a P133.] By the way, aborting in select-safe-coding-system can have bad effects when you're using VC. As far as I remember, it actually loses your edits in some circumstance. I haven't had time to look at the problem. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-15 17:51 ` Dave Love @ 2002-08-19 5:04 ` Kenichi Handa 2002-08-29 22:52 ` Dave Love 0 siblings, 1 reply; 90+ messages in thread From: Kenichi Handa @ 2002-08-19 5:04 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel In article <rzqvg6cm2mq.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes: >> By the way, I've just noticed that Dave has already >> installed the function `unencodable-char-position' in >> mule-cmds.el and used it in select-safe-coding-system. >> >> That function resembles to check-coding-system-region on >> which we are currently discussing. > I'm sorry if that was wrong. I thought it was supposed to have been > installed months ago, and I was trying to clear out the Mule changes > I've had hanging around after rms was on about it. I thought that was > all stuff you approved of, or `obviously right'. You don't have to be sorry. Perhaps, I've overlooked that part when you asked about various changes long ago. >> But, as the docstring says, it's slow. > [It seemed fast enough for that use since it's only executed > occasionally, when there's actually a problem. It was probably > developed on a P133.] Ah, yes. Currently, it is used only interactively, thus the speed is not that problem. But, I'm thinking about using unencodable-char-position to check if default coding systems can encode the region or not in select-safe-coding-system (not yet done). I think such a change makes select-safe-coding-system runs much faster. > By the way, aborting in select-safe-coding-system can have bad effects > when you're using VC. As far as I remember, it actually loses your > edits in some circumstance. I haven't had time to look at the > problem. I noticed that too. But, I also don't have time to fix it for the moment. I've never read the code of vc. --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-19 5:04 ` Kenichi Handa @ 2002-08-29 22:52 ` Dave Love 2002-08-30 6:53 ` Andre Spiegel 0 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-29 22:52 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > > By the way, aborting in select-safe-coding-system can have bad effects > > when you're using VC. As far as I remember, it actually loses your > > edits in some circumstance. I haven't had time to look at the > > problem. > > I noticed that too. But, I also don't have time to fix it > for the moment. I've never read the code of vc. Is someone going to fix this? (I have worked on VC, but ...) ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-29 22:52 ` Dave Love @ 2002-08-30 6:53 ` Andre Spiegel 0 siblings, 0 replies; 90+ messages in thread From: Andre Spiegel @ 2002-08-30 6:53 UTC (permalink / raw) Cc: Kenichi Handa, rms, emacs-devel On Fri, 2002-08-30 at 00:52, Dave Love wrote: > Kenichi Handa <handa@etl.go.jp> writes: > > > > By the way, aborting in select-safe-coding-system can have bad effects > > > when you're using VC. As far as I remember, it actually loses your > > > edits in some circumstance. I haven't had time to look at the > > > problem. > > > > I noticed that too. But, I also don't have time to fix it > > for the moment. I've never read the code of vc. > > Is someone going to fix this? (I have worked on VC, but ...) I will look into it. Can someone give me a more detailed description of the circumstances when the problem arises? Sequence of commands? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-24 4:37 ` Kenichi Handa 2002-07-25 3:12 ` Richard Stallman @ 2002-08-09 7:44 ` Stefan Monnier 2002-08-10 17:16 ` Richard Stallman 2002-08-12 0:26 ` Kenichi Handa 1 sibling, 2 replies; 90+ messages in thread From: Stefan Monnier @ 2002-08-09 7:44 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel > If the specified coding system is totally inappropriate for > the buffer, highlighting them will results in huge amount of > overlays and also it takes long time to finish the job. If That was also my concern, but I heard that Emacs-20 did just that. Stefan ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-09 7:44 ` Several serious problems Stefan Monnier @ 2002-08-10 17:16 ` Richard Stallman 2002-08-12 0:26 ` Kenichi Handa 1 sibling, 0 replies; 90+ messages in thread From: Richard Stallman @ 2002-08-10 17:16 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel > If the specified coding system is totally inappropriate for > the buffer, highlighting them will results in huge amount of > overlays and also it takes long time to finish the job. If That was also my concern, but I heard that Emacs-20 did just that. If empirically it works well enough, there's no reason to object. Did anyone ever try this in Emacs 20 on a substantial file with many unsuitable characters? If not, would you like to try that now and see how bad it was? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-09 7:44 ` Several serious problems Stefan Monnier 2002-08-10 17:16 ` Richard Stallman @ 2002-08-12 0:26 ` Kenichi Handa 1 sibling, 0 replies; 90+ messages in thread From: Kenichi Handa @ 2002-08-12 0:26 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel In article <200208090744.g797irF11925@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> If the specified coding system is totally inappropriate for >> the buffer, highlighting them will results in huge amount of >> overlays and also it takes long time to finish the job. If > That was also my concern, but I heard that Emacs-20 did just that. Emacs 20 highlighted at most 256 such characters. And, in Emacs 20, detecting unencodable characters was easier because there's no coding system that can encode a part of a charset. --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-07-22 17:11 Several serious problems Richard Stallman ` (5 preceding siblings ...) 2002-07-23 13:35 ` Kenichi Handa @ 2002-08-09 4:41 ` Stefan Monnier 2002-08-15 17:23 ` Dave Love 6 siblings, 1 reply; 90+ messages in thread From: Stefan Monnier @ 2002-08-09 4:41 UTC (permalink / raw) Cc: emacs-devel, d.love > I cannot save the file lisp/ChangeLog. It specifies coding system > iso-2022-7bit, but it contains something that cannot be encoded in that > coding system. I don't know any way to find the text that causes the > problem; essentially I am helpless. > > Handa-san, would you please clean up whatever is wrong with that file > so that it can save properly once again? > > We MUST do something to make it easier for users to cope with such a > situation. We talked about this a few weeks ago but nothing was done. Dave Love has code for it (and has posted it here). I can't check it in, so could someone else take care of it ? Stefan "who pleads guilty of delaying this patch" ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-09 4:41 ` Stefan Monnier @ 2002-08-15 17:23 ` Dave Love 0 siblings, 0 replies; 90+ messages in thread From: Dave Love @ 2002-08-15 17:23 UTC (permalink / raw) Cc: Richard Stallman, emacs-devel "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: > > I cannot save the file lisp/ChangeLog. It specifies coding system > > iso-2022-7bit, but it contains something that cannot be encoded in that > > coding system. I don't know any way to find the text that causes the > > problem; essentially I am helpless. > > > > Handa-san, would you please clean up whatever is wrong with that file > > so that it can save properly once again? > > > > We MUST do something to make it easier for users to cope with such a > > situation. We talked about this a few weeks ago but nothing was done. > > Dave Love has code for it (and has posted it here). > I can't check it in, so could someone else take care of it ? > > > Stefan "who pleads guilty of delaying this patch" I don't know what that refers to. I suspect the problem concerns eight-bit-... characters. If you search for them, you have to get the multibyteness of the search string right in a way I always have to look up. [vc-annotate should show you what edit was responsible.] However, I installed code in `select-safe-coding-system' some time ago which should point to the first offending character when selection fails. (As far as I remember, that was supposed to be done long ago, but never was.) If the development source doesn't show you the offending character and advocate C-u C-x =, there's something wrong with that code. ^ permalink raw reply [flat|nested] 90+ messages in thread
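[Editorial illustration, not part of the thread.] The behaviour described in the message above, pointing the user at the first character the chosen coding system cannot encode, can be sketched outside Emacs. This is a minimal sketch only: it assumes Python's built-in codecs as stand-ins for Emacs coding systems, the name `unencodable_positions` is hypothetical, and it is not how `select-safe-coding-system` is implemented. The optional `limit` loosely mirrors the cap on highlighted characters mentioned earlier in the thread.

```python
def unencodable_positions(text, encoding, limit=None):
    """Return the indices of characters in `text` that `encoding` rejects.

    Hypothetical helper for illustration. Each character is encoded on its
    own with strict error handling, so a failure identifies exactly one
    offending position. Scanning stops early once `limit` positions are found.
    """
    positions = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)  # strict mode: raises on unencodable input
        except UnicodeEncodeError:
            positions.append(index)
            if limit is not None and len(positions) >= limit:
                break
    return positions


# U+00E9 is encodable in latin-1; U+0E81 (a Lao letter) is not.
sample = "caf\u00e9 \u0e81"
first_bad = unencodable_positions(sample, "latin-1", limit=1)
print(first_bad)  # [5]
```

Calling it without `limit` lists every problem position rather than only the first, which is the "find all the problems" variant asked for at the start of the thread.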
* Re: Several serious problems @ 2002-08-19 7:48 Kenichi Handa 2002-08-22 17:08 ` Dave Love 2002-08-24 12:11 ` Richard Stallman 0 siblings, 2 replies; 90+ messages in thread From: Kenichi Handa @ 2002-08-19 7:48 UTC (permalink / raw) Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel Dave Love <d.love@dl.ac.uk> writes: > "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> Indeed, the safe-charsets property of the utf-8 coding-system has not been >> updated to list the extra charsets it can now encode. > I hope whatever's been changed has been properly tested if it's on the > release branch. Please get handa to check it if he hasn't already. >> I think Dave or Handa would know better how to fix that (whether >> unify-8859-on-encoding-mode should change the safe-charsets or whether >> it should simply always include the new charsets and load ucs-tables >> when needed. And also which charsets should be added). > Whoever changed it should sort it out. I'm quite confused with the current status of utf-8.el, ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and RC. They differ in many parts (utf-8-subst.el and the necessary change for that in mule.el and ccl.c don't exist in RC). It's IMPOSSIBLE for me to figure out what are the correct behaviour of them. I've thought that the current codes were the same one as what Dave had, but the above statement of Dave's tells that it's not. Could someone tell me why are they different in HEAD and RC, and why are they different from what Dave have written? --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-19 7:48 Kenichi Handa @ 2002-08-22 17:08 ` Dave Love 2002-08-29 13:25 ` Kenichi Handa 2002-08-24 12:11 ` Richard Stallman 1 sibling, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-22 17:08 UTC (permalink / raw) Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > I'm quite confused with the current status of utf-8.el, > ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and > RC. I've been confused too, struggling to maintain several different versions. > It's IMPOSSIBLE for me to figure out what are the correct > behaviour of them. As far as I know, what's installed in the trunk behaves correctly, but I'm not using that code and I don't know if I'd hear about real problems with it (as opposed to imagined problems). It should all be things you have said are OK or I'm sure you will think are OK, but I may have overlooked something. However, it could use work for CJK, in particular; there's a fixme in utf-8, and there could be additional interconversion tables for CJK charsets as well as a way of customizing the character preferences in utf-8-subst.el, and probably other things. > I've thought that the current codes were > the same one as what Dave had, but the above statement of > Dave's tells that it's not. Well, now I check, utf-8.el in the RC branch seems to be as I left it, which is what rms (I think) told me to do. As far as I can tell, its safe-charsets property is correct, and I don't understand what the complaint is about. When I couldn't check, I assumed someone had modified it incorrectly, but there's no sign of that in CVS. > Could someone tell me why are they different in HEAD and RC, > and why are they different from what Dave have written? Most changes aren't in RC since I was only allowed to add (a version of) ucs-tables, not changing the default behaviour, so people could turn on (partial) character translation themselves. 
It doesn't affect utf-8 or any other ccl coding systems because they don't use the translation table (although the useful extra coding systems in code-pages.el aren't included either, so I think only koi, alternativnyj and mac-roman are affected). I think I unilaterally added some other things (a utf-8 language environment and utf-16.el?) since they addressed somewhat misleading entries in PROBLEMS and the arguments against the Unicode support are either demonstrably wrong or spurious IMNSHO. I'm afraid I've had enough of all this, and I doubt it's worth more effort anyhow. Especially after all the FUD about them, the Mule additions probably won't get used much unless they're the default, even by i18n people, unfortunately. It's a pity your good work on Mule 5 is rather wasted. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-22 17:08 ` Dave Love @ 2002-08-29 13:25 ` Kenichi Handa 2002-08-29 17:32 ` Stefan Monnier ` (3 more replies) 0 siblings, 4 replies; 90+ messages in thread From: Kenichi Handa @ 2002-08-29 13:25 UTC (permalink / raw) Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel In article <rzqlm6ybz38.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes: > As far as I know, what's installed in the trunk behaves correctly, but > I'm not using that code Why aren't you using that code? Does it mean that you changed some of them locally? > and I don't know if I'd hear about real > problems with it (as opposed to imagined problems). It should all be > things you have said are OK or I'm sure you will think are OK, but I > may have overlooked something. However, it could use work for CJK, in > particular; there's a fixme in utf-8, and there could be additional > interconversion tables for CJK charsets as well as a way of > customizing the character preferences in utf-8-subst.el, and probably > other things. I noticed those `fixme's. Yes, it is better to solve all of them, but, for the moment, I want to concentrate on fixing the problem of RC. >> I've thought that the current codes were >> the same one as what Dave had, but the above statement of >> Dave's tells that it's not. > Well, now I check, utf-8.el in the RC branch seems to be as I left it, > which is what rms (I think) told me to do. As far as I can tell, its > safe-charsets property is correct, The safe-charsets property of utf-8 in RC is this: ascii eight-bit-control eight-bit-graphic latin-iso8859-1 mule-unicode-0100-24ff mule-unicode-2500-33ff mule-unicode-e000-ffff ethiopic tibetan thai-tis620 katakana-jisx0201 ipa chinese-sisheng lao vietnamese-viscii-lower vietnamese-viscii-upper It doesn't contain latin-iso8859-[23...]. > and I don't understand what the complaint is about. 
When > I couldn't check, I assumed someone had modified it > incorrectly, but there's no sign of that in CVS. The complaint is that the coding-system utf-8 can't encode latin-2 characters in RC even if loadup.el has these lines. (load "international/ucs-tables") (ucs-unify-8859 'encode-only) The reason is, as far as I see, the ccl program `ccl-encode-mule-utf-8' doesn't have this line at the near to head. (translate-character ucs-mule-to-mule-unicode r0 r1)) So, even if we setup the translation table `ucs-mule-to-mule-unicode' at loadup time, it is not used in utf-8. >> Could someone tell me why are they different in HEAD and RC, >> and why are they different from what Dave have written? > Most changes aren't in RC since I was only allowed to add (a version > of) ucs-tables, not changing the default behaviour, so people could > turn on (partial) character translation themselves. It doesn't affect > utf-8 or any other ccl coding systems because they don't use the > translation table (although the useful extra coding systems in > code-pages.el aren't included either, so I think only koi, > alternativnyj and mac-roman are affected). Hmmm, I think I realized the situation of RC. It can unify charsets between iso-8859-X, but utf-8 can't encode iso-8859-X (intentionally), correct? Richard, is it what you asked Dave to install for RC? I think RC should also allow utf-8 to encode 8859-X correctly like in HEAD. I see no harm in it. > I think I unilaterally added some other things (a utf-8 language > environment and utf-16.el?) since they addressed somewhat misleading > entries in PROBLEMS and the arguments against the Unicode support are > either demonstrably wrong or spurious IMNSHO. I don't oppose to that. I found one problem with utf-16. 
It seems that utf-16-le/be can handle 8859-X correctly because of this line in ccl-encode-mule-utf-16-le/be, (translate-character ucs-mule-to-mule-unicode r0 r1) but the safe-charsets property lists only these: ascii eight-bit-control latin-iso8859-1 mule-unicode-0100-24ff mule-unicode-2500-33ff mule-unicode-e000-ffff thus, they can't be regarded as a safe coding system for them. > I'm afraid I've had enough of all this, Yah, you have done the excellent hack! When I implemented translation table stuffs, I didn't expect that it can be used this thoroughly. > and I doubt it's worth more effort anyhow. Especially > after all the FUD about them, the Mule additions probably > won't get used much unless they're the default, even by > i18n people, unfortunately. I thought containing ucs-tables and etc in RC is at least for making unify-on-encoding the default INCLUDING utf-8. --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 13:25 ` Kenichi Handa @ 2002-08-29 17:32 ` Stefan Monnier 2002-08-29 23:15 ` Dave Love 2002-08-30 6:09 ` Richard Stallman 2002-08-29 23:09 ` Dave Love ` (2 subsequent siblings) 3 siblings, 2 replies; 90+ messages in thread From: Stefan Monnier @ 2002-08-29 17:32 UTC (permalink / raw) Cc: d.love, monnier+gnu/emacs, keichwa, rms, emacs-devel > I noticed those `fixme's. Yes, it is better to solve all > of them, but, for the moment, I want to concentrate on > fixing the problem of RC. I think the only "problem" in RC is that latin-N chars cannot be saved to utf-8. > >> I've thought that the current codes were > >> the same one as what Dave had, but the above statement of > >> Dave's tells that it's not. > > > Well, now I check, utf-8.el in the RC branch seems to be as I left it, > > which is what rms (I think) told me to do. As far as I can tell, its > > safe-charsets property is correct, > > The safe-charsets property of utf-8 in RC is this: > > ascii eight-bit-control eight-bit-graphic latin-iso8859-1 > mule-unicode-0100-24ff mule-unicode-2500-33ff > mule-unicode-e000-ffff ethiopic tibetan thai-tis620 > katakana-jisx0201 ipa chinese-sisheng lao > vietnamese-viscii-lower vietnamese-viscii-upper > > It doesn't contain latin-iso8859-[23...]. And it's correct as long as ucs-tables is not loaded. And since RC is "only bug-fixes" it's important that we don't make any change outside of ucs-tables.el except for bug-fixes, so we can't just change the safe-charsets property. I.e. we have to either accept the current situation or else change the safe-charsets property of utf-8 from ucs-tables.el. Unless RMS accepts to make changes to utf-8.el which are not bug-fixes but improvements to the utf-8 support. On the trunk it's easier since we just changed the safe-charsets property directly in utf-8.el and made sure that ucs-tables.el is loaded when necessary. Stefan ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 17:32 ` Stefan Monnier @ 2002-08-29 23:15 ` Dave Love 2002-08-30 14:36 ` Stefan Monnier 2002-08-30 6:09 ` Richard Stallman 1 sibling, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-29 23:15 UTC (permalink / raw) Cc: Kenichi Handa, keichwa, rms, emacs-devel "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: > I think the only "problem" in RC is that latin-N chars cannot > be saved to utf-8. In that case, I wasted considerable time... I know, for instance, that people whinge that keyboard input doesn't conform to the buffer file coding system, and that other coding systems &c are needed -- windows-1252 probably most importantly. > > It doesn't contain latin-iso8859-[23...]. > > And it's correct as long as ucs-tables is not loaded. What handa showed isn't correct. The utf-8 coding system on the RC branch doesn't encode lao, for instance. > And since RC is "only bug-fixes" For some value of `bug fix'... > it's important that we don't make > any change outside of ucs-tables.el except for bug-fixes, so > we can't just change the safe-charsets property. I don't understand. Of course you can't just change safe-charsets -- it has to reflect what the coding system actually encodes. > On the trunk it's easier since we just changed the safe-charsets > property directly in utf-8.el and made sure that ucs-tables.el > is loaded when necessary. Last I looked, it was preloaded. I don't see why it shouldn't be, and it would have been designed to be if I hadn't had to write it just as an add-on initially. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 23:15 ` Dave Love @ 2002-08-30 14:36 ` Stefan Monnier 2002-09-04 17:23 ` Dave Love 0 siblings, 1 reply; 90+ messages in thread From: Stefan Monnier @ 2002-08-30 14:36 UTC (permalink / raw) Cc: Stefan Monnier, Kenichi Handa, keichwa, rms, emacs-devel > > I think the only "problem" in RC is that latin-N chars cannot > > be saved to utf-8. > > In that case, I wasted considerable time... I know, for instance, > that people whinge that keyboard input doesn't conform to the buffer > file coding system, and that other coding systems &c are needed -- > windows-1252 probably most importantly. By "in RC" I meant "in RC as it currently stands", not "in RC before you installed ucs-tables.el". As you know, I'm a big fan of ucs-tables.el. Please don't try and find offense where there isn't, it makes me rather sad. > > > It doesn't contain latin-iso8859-[23...]. > > > > And it's correct as long as ucs-tables is not loaded. > > What handa showed isn't correct. The utf-8 coding system on the RC > branch doesn't encode lao, for instance. I was referring to what's in the utf-8.el file. > > And since RC is "only bug-fixes" > For some value of `bug fix'... Obviously. > > it's important that we don't make > > any change outside of ucs-tables.el except for bug-fixes, so > > we can't just change the safe-charsets property. > > I don't understand. Of course you can't just change safe-charsets -- > it has to reflect what the coding system actually encodes. IIRC, on the trunk you changed utf-8.el directly and simply enforced that ucs-tables.el be loaded when necessary. Stefan ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-30 14:36 ` Stefan Monnier @ 2002-09-04 17:23 ` Dave Love 0 siblings, 0 replies; 90+ messages in thread From: Dave Love @ 2002-09-04 17:23 UTC (permalink / raw) Cc: Kenichi Handa, keichwa, rms, emacs-devel "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: > > In that case, I wasted considerable time... I know, for instance, > > that people whinge that keyboard input doesn't conform to the buffer > > file coding system, and that other coding systems &c are needed -- > > windows-1252 probably most importantly. > > By "in RC" I meant "in RC as it currently stands", not "in RC before you > installed ucs-tables.el". So did I, or at least as it stood a few days ago. I don't understand this (or the rest of the message). It's a non sequitur as far as I can tell. > Please don't try and find offense where there isn't, it makes me > rather sad. I don't know what you mean. I'm just sticking up for a large set of users. However I guess they are likely to find offence if maintainers dismiss -- or appear to -- m17n features they need. As far as I know, my opinions are roughly the same as handa's -- apologies if not -- and he was the one proposing more changes in this case. I'm glad he eventually gets listened to, anyhow. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 17:32 ` Stefan Monnier 2002-08-29 23:15 ` Dave Love @ 2002-08-30 6:09 ` Richard Stallman 2002-08-31 17:30 ` Dave Love 1 sibling, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-08-30 6:09 UTC (permalink / raw) Cc: handa, d.love, monnier+gnu/emacs, keichwa, emacs-devel And since RC is "only bug-fixes" it's important that we don't make any change outside of ucs-tables.el except for bug-fixes, so we can't just change the safe-charsets property. I don't follow the logic here. Why can't we just change the safe-charsets property? Is there some obstacle to doing that? Do you think other things would fail to work if we did? Are other changes needed as well to make it work? Unless RMS accepts to make changes to utf-8.el which are not bug-fixes but improvements to the utf-8 support. If we can't save latin-N characters as utf-8, that is a bug. If the fix is safe and clear, we may as well install it in RC. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-30 6:09 ` Richard Stallman @ 2002-08-31 17:30 ` Dave Love 2002-09-02 0:01 ` Richard Stallman 0 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-31 17:30 UTC (permalink / raw) Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel Richard Stallman <rms@gnu.org> writes: > I don't follow the logic here. Why can't we just change the > safe-charsets property? If you change safe-charsets without changing what the CCL actually encodes, you're just courting data corruption. E.g. find-coding-systems-... will report utf-8 for lao text, but if you encode it, you'll just get U+FFFDs. > If we can't save latin-N characters as utf-8, that is a bug. [You argued against that before.] Why just Latin-N, and why just as utf-8? There shouldn't be anything special about Latin. That version of utf-8.el can't encode cyrillic-iso8859-5, for instance, and the Cyrillic coding systems can't encode the relevant characters from mule-unicode-0100-24ff. Is it also a bug that utf-8 can't encode the CJK space or that the CJK sets can't encode equivalent characters from other sets (which I haven't tried to address and people probably don't care about)? ^ permalink raw reply [flat|nested] 90+ messages in thread
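[Editorial illustration, not part of the thread.] The point made in the message above is that the advertised `safe-charsets` list and the set of charsets the encoder really handles are two separate things, and that widening only the former invites silent data loss. A toy model of that mismatch follows. Everything in it is an assumption for illustration: the class `ToyCodingSystem`, its methods, and the representation of text as (character, charset) pairs are hypothetical and do not reflect Emacs internals or CCL.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the substitute mentioned in the message above


class ToyCodingSystem:
    """Hypothetical model of a coding system with two independent sets."""

    def __init__(self, name, safe_charsets, encodable_charsets):
        self.name = name
        self.safe_charsets = set(safe_charsets)        # what is advertised
        self.encodable_charsets = set(encodable_charsets)  # what really works

    def claims_safe(self, text):
        """True if every charset used by `text` is in the advertised list."""
        return all(charset in self.safe_charsets for _, charset in text)

    def encode(self, text):
        """Encode `text`, substituting U+FFFD for unsupported charsets."""
        return "".join(
            char if charset in self.encodable_charsets else REPLACEMENT
            for char, charset in text
        )


def silently_lossy(coding_system, text):
    """True when the system advertises safety but the output loses data."""
    return (coding_system.claims_safe(text)
            and REPLACEMENT in coding_system.encode(text))


# Text is modelled as (character, charset) pairs.
lao_text = [("\u0e81", "lao"), ("a", "ascii")]

# Advertised list widened to include lao, encoder left unchanged.
mismatched = ToyCodingSystem(
    "utf-8",
    safe_charsets={"ascii", "latin-iso8859-1", "lao"},
    encodable_charsets={"ascii", "latin-iso8859-1"},
)
# Advertised list matches what the encoder handles.
consistent = ToyCodingSystem(
    "utf-8",
    safe_charsets={"ascii", "latin-iso8859-1"},
    encodable_charsets={"ascii", "latin-iso8859-1"},
)

print(silently_lossy(mismatched, lao_text))   # True
print(silently_lossy(consistent, lao_text))   # False
```

In the consistent case the text is simply reported as unsafe, so the user is asked to pick another coding system instead of losing characters. The reverse mismatch (an encoder that handles more than the list advertises) loses no data in this model, but the coding system would not be offered for text it could in fact encode.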
* Re: Several serious problems 2002-08-31 17:30 ` Dave Love @ 2002-09-02 0:01 ` Richard Stallman 2002-09-04 17:15 ` Dave Love 0 siblings, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-09-02 0:01 UTC (permalink / raw) Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel Why just Latin-N, and why just as utf-8? I am talking about that issue because that is the issue someone raised. I don't know what other issue there is. Could you tell us? There shouldn't be anything special about Latin. Latin-N character sets are very important in practice. It is also possible that they are easier to handle than some other character sets (but I don't know whether that is the case here). Those two factors are directly relevant to whether it is worth fixing this case in RC. The factors might be different for another character set. Is it also a bug that utf-8 can't encode the CJK space or that the CJK sets can't encode equivalent characters from other sets (which I haven't tried to address and people probably don't care about)? That is certainly a bug. The question is whether this bug may not be worth fixing in RC. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-02 0:01 ` Richard Stallman @ 2002-09-04 17:15 ` Dave Love 2002-09-08 12:54 ` Richard Stallman 0 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-09-04 17:15 UTC (permalink / raw) Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel Richard Stallman <rms@gnu.org> writes: > Why just Latin-N, and why just as utf-8? > > I am talking about that issue because that is the issue someone > raised. I don't know what other issue there is. Could you tell us? The issue is just the same for the other charsets that have translation tables in the head code, and for other CCL coding systems. For instance, the RC version of mule-utf-8 doesn't translate cyrillic-iso8859-5, and the Cyrillic coding systems don't translate mule-unicode-0100-24ff. > Latin-N character sets are very important in practice. I think the only thing which distinguishes Latin-N is that Latin-1 is (was?) the Internet default and its code points are a Unicode subset. I see no reason to treat, say, Latin-2 as more important than Cyrillic; I guess it has fewer users for a start. I also guess windows-1252 is more widely used than Latin-1, like it or not. > It is also possible that they are easier to handle than some other > character sets (but I don't know whether that is the case here). They're treated identically to the others that ucs-tables handles. You have to work to remove them. (The sets that are handled are just the ones I could conveniently make tables for.) > Is it also a bug that utf-8 can't encode the CJK space or that the CJK > sets can't encode equivalent characters from other sets (which I > haven't tried to address and people probably don't care about)? > > That is certainly a bug. I actually agree with your previous opinion that lack of translations isn't a bug as such, despite what PROBLEMS implied -- the features behave as designed and documented. I definitely don't agree that general lack of unification of Japanese characters is a bug. 
I got detailed information on the problems with jisx mappings to Unicode, and we were asked not to confuse matters by providing jisx0213 tables in Emacs 22, which is designed not to force that. (The jisx0208 that utf-8-subst.el uses is a case in point, but I assume the Mule-UCS table I used is what Japanese linguists agree on.) It's also not clear that one should unify double-width characters with iso8859, for instance. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-04 17:15 ` Dave Love @ 2002-09-08 12:54 ` Richard Stallman 2002-09-12 22:38 ` Dave Love 0 siblings, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-09-08 12:54 UTC (permalink / raw) Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel For instance, the RC version of mule-utf-8 doesn't translate cyrillic-iso8859-5, and the Cyrillic coding systems don't translate mule-unicode-0100-24ff. We could consider adding that support in RC. Is it a safe change? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-08 12:54 ` Richard Stallman @ 2002-09-12 22:38 ` Dave Love 2002-09-13 19:34 ` Richard Stallman 0 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-09-12 22:38 UTC (permalink / raw) Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel Richard Stallman <rms@gnu.org> writes: > For instance, the RC version of mule-utf-8 doesn't translate > cyrillic-iso8859-5, and the Cyrillic coding systems don't translate > mule-unicode-0100-24ff. > > We could consider adding that support in RC. Is it a safe change? It won't break anything if done correctly, but I don't remember how much of a change it is relative to the 21.2 code and I don't know who might have been testing it, if anyone. My Cyrillic changes also filled in the koi8-r and alternativnyj translation tables properly, and that may be mixed up with it. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-12 22:38 ` Dave Love @ 2002-09-13 19:34 ` Richard Stallman 0 siblings, 0 replies; 90+ messages in thread From: Richard Stallman @ 2002-09-13 19:34 UTC (permalink / raw) Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel It won't break anything if done correctly, but I don't remember how much of a change it is relative to the 21.2 code and I don't know who might have been testing it, if anyone. My Cyrillic changes also filled in the koi8-r and alternativnyj translation tables properly, and that may be mixed up with it. If you want to extract the precise changes that would make sense to install in Emacs 21.3, we could possibly do that. Otherwise I guess we have nothing to install. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 13:25 ` Kenichi Handa 2002-08-29 17:32 ` Stefan Monnier @ 2002-08-29 23:09 ` Dave Love 2002-08-30 6:11 ` Richard Stallman 2002-08-29 23:17 ` Dave Love 2002-08-30 6:09 ` Richard Stallman 3 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-29 23:09 UTC (permalink / raw) Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> In article <rzqlm6ybz38.fsf@albion.dl.ac.uk>,
> Dave Love <d.love@dl.ac.uk> writes:
> > As far as I know, what's installed in the trunk behaves correctly, but
> > I'm not using that code
>
> Why aren't you using that code?

I don't want to use an unstable Emacs with all sorts of things I don't
understand.

> I noticed those `fixme's. Yes, it is better to solve all
> of them, but, for the moment, I want to concentrate on
> fixing the problem of RC.

I was trying to sort out RC, but I don't understand this problem.

> The safe-charsets property of utf-8 in RC is this:
>
> ascii eight-bit-control eight-bit-graphic latin-iso8859-1
> mule-unicode-0100-24ff mule-unicode-2500-33ff
> mule-unicode-e000-ffff ethiopic tibetan thai-tis620
> katakana-jisx0201 ipa chinese-sisheng lao
> vietnamese-viscii-lower vietnamese-viscii-upper

I see:

  '((safe-charsets
     ascii
     eight-bit-control
     eight-bit-graphic
     latin-iso8859-1
     mule-unicode-0100-24ff
     mule-unicode-2500-33ff
     mule-unicode-e000-ffff)

in what appears to be revision 1.9.4.2 with sticky tag `EMACS_21_1_RC'.

> It doesn't contain latin-iso8859-[23...].

Indeed.

> The complaint is that the coding-system utf-8 can't encode
> latin-2 characters in RC even if loadup.el has these lines.

Indeed, but the complaint seemed to be that it could encode latin-2 and
safe-charsets didn't say so. That's why I thought someone had changed
it.

> The reason is, as far as I see, the ccl program
> `ccl-encode-mule-utf-8' doesn't have this line at the near
> to head.
>
> (translate-character ucs-mule-to-mule-unicode r0 r1))

Yes.

> So, even if we setup the translation table
> `ucs-mule-to-mule-unicode' at loadup time, it is not used in
> utf-8.

Nor in other CCL coding systems.

> Hmmm, I think I realized the situation of RC. It can unify
> charsets between iso-8859-X, but utf-8 can't encode
> iso-8859-X (intentionally), correct?

Yes.

> Richard, is it what you asked Dave to install for RC?

I'm pretty sure ucs-tables was only allowed to be installed because just
adding the file couldn't break anything.

> I think RC should also allow utf-8 to encode 8859-X
> correctly like in HEAD. I see no harm in it.

I'm sure there's no harm in my Mule changes generally, but that's not
what everyone has been told, unfortunately.

> > I think I unilaterally added some other things (a utf-8 language
> > environment and utf-16.el?) since they addressed somewhat misleading
> > entries in PROBLEMS and the arguments against the Unicode support are
> > either demonstrably wrong or spurious IMNSHO.
>
> I don't oppose to that.

I didn't think you would.

> I found one problem with utf-16.
> It seems that utf-16-le/be can handle 8859-X correctly
> because of this line in ccl-encode-mule-utf-16-le/be,
> (translate-character ucs-mule-to-mule-unicode r0 r1)

I guess that's an error, and I should have taken that out for
consistency with utf-8.

> > I'm afraid I've had enough of all this,
>
> Yah, you have done the excellent hack!

I don't mean anything to do with useful work. It's after being told for
so long it's impossible/broken/not wanted, wasting time, and then having
to sort out the situation in adverse circumstances. It's very
unfortunate not to have an active maintainer for Mule generally.

> When I implemented translation table stuffs, I didn't expect that it
> can be used this thoroughly.

Strange! I thought that was exactly what they were for, and the only
thing that was missing initially to satisfy the complaining Europeans
was char-coding-system-table. The names were even `...-unification-...'
originally.

> I thought containing ucs-tables and etc in RC is at least
> for making unify-on-encoding the default INCLUDING utf-8.

I've no idea. As far as I remember, it was due to pressure from users of
both Latin-1 and Latin-9 who must have actually tried it despite what
they were told. I was surprised it was eventually allowed in.

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 23:09 ` Dave Love @ 2002-08-30 6:11 ` Richard Stallman 2002-09-04 17:21 ` Dave Love 0 siblings, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-08-30 6:11 UTC (permalink / raw) Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel > I think RC should also allow utf-8 to encode 8859-X > correctly like in HEAD. I see no harm in it. I'm sure there's no harm in my Mule changes generally, but that's not what everyone has been told, unfortunately. We would not have installed your changes in the trunk if they were harmful. The issue about RC is not harm, it is risk of bugs. Any change has a risk of bugs, even if it is a great improvement. But the risk is not proportional to the improvement; they depend on different factors. In RC we try to keep this risk down. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-30 6:11 ` Richard Stallman @ 2002-09-04 17:21 ` Dave Love 0 siblings, 0 replies; 90+ messages in thread From: Dave Love @ 2002-09-04 17:21 UTC (permalink / raw) Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel Richard Stallman <rms@gnu.org> writes: > We would not have installed your changes in the trunk if they were > harmful. I was referring to what people have been told about them, including in PROBLEMS. [I'm not sure you'd actually know a priori whether what I installed was harmful; it wasn't properly tested. Obviously I think it's OK modulo the bugs I haven't heard about, but that doesn't mean it couldn't corrupt data.] > The issue about RC is not harm, it is risk of bugs. Any > change has a risk of bugs, even if it is a great improvement. Of course, and I'm surprised at some of what's been added. > But the risk is not proportional to the improvement; they depend on > different factors. In RC we try to keep this risk down. Of course. I happen to be in the best position to evaluate the factors in this case. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 13:25 ` Kenichi Handa 2002-08-29 17:32 ` Stefan Monnier 2002-08-29 23:09 ` Dave Love @ 2002-08-29 23:17 ` Dave Love 2002-08-30 6:11 ` Richard Stallman 2002-08-30 6:09 ` Richard Stallman 3 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-29 23:17 UTC (permalink / raw) Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> The safe-charsets property of utf-8 in RC is this:
>
> ascii eight-bit-control eight-bit-graphic latin-iso8859-1
> mule-unicode-0100-24ff mule-unicode-2500-33ff
> mule-unicode-e000-ffff ethiopic tibetan thai-tis620
> katakana-jisx0201 ipa chinese-sisheng lao
> vietnamese-viscii-lower vietnamese-viscii-upper

I've just realized that you probably used coding-system-get, and
there's a problem with what I installed. I didn't cut out this from
my working version:

*** ucs-tables.el.~1.12.4.1.~ Wed Jul 3 15:38:14 2002
--- ucs-tables.el Thu Aug 29 19:27:15 2002
***************
*** 2443,2453 ****
            (coding-system-put cs 'translation-table-for-input cs)))))
      (optimize-char-table ucs-mule-to-mule-unicode)
      (dolist (c safe-charsets)
!       (aset table (make-char c) t))
!     (coding-system-put 'mule-utf-8 'safe-charsets
!                        (append (coding-system-get 'mule-utf-8 'safe-charsets)
!                                safe-charsets))
!     (register-char-codings 'mule-utf-8 table)))

  (defvar translation-table-for-input (make-translation-table))

--- 2443,2449 ----
            (coding-system-put cs 'translation-table-for-input cs)))))
      (optimize-char-table ucs-mule-to-mule-unicode)
      (dolist (c safe-charsets)
!       (aset table (make-char c) t))))

  (defvar translation-table-for-input (make-translation-table))

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 23:17 ` Dave Love @ 2002-08-30 6:11 ` Richard Stallman 2002-08-31 17:31 ` Dave Love 0 siblings, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-08-30 6:11 UTC (permalink / raw) Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel I've just realized that you probably used coding-system-get, and there's a problem with what I installed. I didn't cut out this from my working version: Is this a change we should install in RC now? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-30 6:11 ` Richard Stallman @ 2002-08-31 17:31 ` Dave Love 2002-09-02 0:01 ` Richard Stallman 0 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-31 17:31 UTC (permalink / raw) Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel Richard Stallman <rms@gnu.org> writes: > I've just realized that you probably used coding-system-get, and > there's a problem with what I installed. I didn't cut out this from > my working version: > > Is this a change we should install in RC now? That depends on whether you include code in utf-8.el that encodes those charsets. If not, you need that change. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-31 17:31 ` Dave Love @ 2002-09-02 0:01 ` Richard Stallman 2002-09-02 1:28 ` Kenichi Handa 0 siblings, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-09-02 0:01 UTC (permalink / raw) Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel That depends on whether you include code in utf-8.el that encodes those charsets. If not, you need that change. In that case, I will install that change presently, and then we can study the question of whether to include the code in utf-8.el instead. What does that code in utf-8.el do, and how safe a change is it? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-02 0:01 ` Richard Stallman @ 2002-09-02 1:28 ` Kenichi Handa 2002-09-05 13:41 ` Dave Love 2002-09-10 16:36 ` Richard Stallman 0 siblings, 2 replies; 90+ messages in thread From: Kenichi Handa @ 2002-09-02 1:28 UTC (permalink / raw) Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel

In article <E17lefC-0003IF-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     That depends on whether you include code in utf-8.el that encodes
>     those charsets. If not, you need that change.

> In that case, I will install that change presently, and then we can
> study the question of whether to include the code in utf-8.el instead.

> What does that code in utf-8.el do, and how safe a change is it?

It defines two CCL codes to decode and encode utf-8 byte
sequence, and makes the coding system mule-utf-8 by using
those CCL codes.

I'll attach the necessary change to enable RC's utf-8 to
encode latin-X plus alpha (e.g. thai). The docstring of
mule-utf-8 may need improvement.

As the change is very small and that code has been in HEAD
for more than one month, I think the change is quite safe.
I recommend to install it in RC.

I also checked the code to some extent by this testsuite.

(dolist (charset (delq 'ascii
                       (delq 'eight-bit-control
                             (delq 'eight-bit-graphic
                                   (coding-system-get 'mule-utf-8
                                                      'safe-charsets)))))
  (let ((dimension (charset-dimension charset))
        str)
    (if (= dimension 1)
        (setq str (string (make-char charset 33) (make-char charset 34)))
      (setq str (string (make-char charset 33 33) (make-char charset 33 34))))
    (or (memq 'mule-utf-8 (find-coding-systems-string str))
        (not (string-match "\357\277\275" ; UTF-8 form of U+FFFD
                           (encode-coding-string str 'mule-utf-8)))
        (error (format "%s is not supported" charset)))))

---
Ken'ichi HANDA
handa@etl.go.jp

*** utf-8.el.~1.9.4.2.~ Tue Jul 23 13:54:13 2002
--- utf-8.el Mon Sep 2 10:28:26 2002
***************
*** 269,275 ****
          (loop
           (if (r5 < 0)
               ((r1 = -1)
!               (read-multibyte-character r0 r1))
             (;; We have already done read-multibyte-character.
              (r0 = r5)
              (r1 = r6)
--- 269,277 ----
          (loop
           (if (r5 < 0)
               ((r1 = -1)
!               (read-multibyte-character r0 r1)
!               (translate-character ucs-mule-to-mule-unicode r0 r1))
!
             (;; We have already done read-multibyte-character.
              (r0 = r5)
              (r1 = r6)
***************
*** 392,397 ****
--- 394,423 ----
    mule-unicode-0100-24ff
    mule-unicode-2500-33ff
    mule-unicode-e000-ffff
+   latin-iso8859-2 (*)
+   latin-iso8859-3 (*)
+   latin-iso8859-4 (*)
+   cyrillic-iso8859-5 (*)
+   arabic-iso8859-6 (*)
+   greek-iso8859-7 (*)
+   hebrew-iso8859-8 (*)
+   latin-iso8859-9 (*)
+   latin-iso8859-14 (*)
+   latin-iso8859-15 (*)
+   chinese-sisheng (*)
+   ethiopic (*)
+   ipa (*)
+   lao (*)
+   katakana-jisx0201 (*)
+   thai-tis620 (*)
+   tibetan (*)
+   vietnamese-viscii-lower (*)
+   vietnamese-viscii-upper (*)
+
+ Among them, the charsets labeled \"(*)\" are supported only on
+ encoding. That means, they are correctly encoded to UTF-8, but are
+ decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or
+ mule-unicode-2500-33ff, not to the original charsets.

  Unicode characters out of the ranges U+0000-U+33FF and U+E200-U+FFFF
  are decoded into sequences of eight-bit-control and eight-bit-graphic
***************
*** 409,415 ****
       latin-iso8859-1
       mule-unicode-0100-24ff
       mule-unicode-2500-33ff
!      mule-unicode-e000-ffff)
      (mime-charset . utf-8)
      (coding-category . coding-category-utf-8)
      (valid-codes (0 . 255))))
--- 435,460 ----
       latin-iso8859-1
       mule-unicode-0100-24ff
       mule-unicode-2500-33ff
!      mule-unicode-e000-ffff
!      latin-iso8859-2
!      latin-iso8859-3
!      latin-iso8859-4
!      cyrillic-iso8859-5
!      arabic-iso8859-6
!      greek-iso8859-7
!      hebrew-iso8859-8
!      latin-iso8859-9
!      latin-iso8859-14
!      latin-iso8859-15
!      chinese-sisheng
!      ethiopic
!      ipa
!      lao
!      katakana-jisx0201
!      thai-tis620
!      tibetan
!      vietnamese-viscii-lower
!      vietnamese-viscii-upper)
      (mime-charset . utf-8)
      (coding-category . coding-category-utf-8)
      (valid-codes (0 . 255))))

^ permalink raw reply [flat|nested] 90+ messages in thread
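The two checks that the test in this message relies on (is every character encodable, and does the encoded output contain the bytes of U+FFFD) can be sketched for a current Emacs roughly as follows. This is a hedged illustration, not part of the patch: it assumes the coding system is available under the name `utf-8` (the thread's `mule-utf-8` is the older name), and the helper names `sketch-first-unencodable` and `sketch-has-replacement-char-p` are hypothetical.

```elisp
;; Sketch only: the helper names below are hypothetical and do not
;; come from utf-8.el or ucs-tables.el.

(defun sketch-first-unencodable (string coding-system)
  "Return the index of the first character in STRING that
CODING-SYSTEM cannot encode, or nil if all characters are encodable."
  ;; With a non-nil fifth argument, START and END are string indexes.
  (unencodable-char-position 0 (length string) coding-system nil string))

(defun sketch-has-replacement-char-p (string coding-system)
  "Return t if encoding STRING with CODING-SYSTEM yields the UTF-8
bytes of U+FFFD.  CODING-SYSTEM is assumed to be a UTF-8 variant."
  (and (string-match-p (regexp-quote "\357\277\275") ; UTF-8 form of U+FFFD
                       (encode-coding-string string coding-system))
       t))
```

A caller would typically try `sketch-first-unencodable` first, since it reports where the problem is, and fall back on the byte check only for coding systems that substitute a replacement character instead of refusing to encode.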
* Re: Several serious problems 2002-09-02 1:28 ` Kenichi Handa @ 2002-09-05 13:41 ` Dave Love 2002-09-05 23:32 ` Kenichi Handa 2002-09-10 16:36 ` Richard Stallman 1 sibling, 1 reply; 90+ messages in thread From: Dave Love @ 2002-09-05 13:41 UTC (permalink / raw) Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > + Among them, the charsets labeled \"(*)\" are supported only on > + encoding. I assume they still are only encodable if unify-8859-on-encoding-mode is on. > That means, they are correctly encoded to UTF-8, but are > + decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or > + mule-unicode-2500-33ff, not to the original charsets. [That's actually customizable through a decoding table, of course.] ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-05 13:41 ` Dave Love @ 2002-09-05 23:32 ` Kenichi Handa 2002-09-06 11:38 ` Robert J. Chassell 2002-09-07 23:19 ` Dave Love 0 siblings, 2 replies; 90+ messages in thread From: Kenichi Handa @ 2002-09-05 23:32 UTC (permalink / raw) Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel In article <rzqy9ag7dux.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes: > Kenichi Handa <handa@etl.go.jp> writes: >> + Among them, the charsets labeled \"(*)\" are supported only on >> + encoding. > I assume they still are only encodable if unify-8859-on-encoding-mode > is on. Yes. But, that mode is on by default in RC too. >> That means, they are correctly encoded to UTF-8, but are >> + decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or >> + mule-unicode-2500-33ff, not to the original charsets. > [That's actually customizable through a decoding table, of course.] How about adding this paragraph? See also the documentations of: `unify-8859-on-decoding-mode', `unify-8859-on-encoding-mode', `utf-8-fragment-on-decoding' to customize the behaviour of this coding system." --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-05 23:32 ` Kenichi Handa @ 2002-09-06 11:38 ` Robert J. Chassell 2002-09-07 23:19 ` Dave Love 1 sibling, 0 replies; 90+ messages in thread From: Robert J. Chassell @ 2002-09-06 11:38 UTC (permalink / raw)

[This started as a question regarding `unify-8859-on-encoding-mode',
but has evolved to a `themes' related question!]

    Yes. But, that mode is on by default in RC too.

How do I determine easily whether unify-8859-on-encoding-mode is on or
off by default in particular instances of Emacs?

Currently, I am running two instances, one a `plain vanilla' Emacs,
and another that loads a 150kb .emacs file. I would like to know
whether `unify-8859-on-encoding-mode' is on or off in my `plain
vanilla' Emacs.

I am not actually trying to track down the code (which I have done
anyhow. Evidently, `ucs-fragment-8859' sets properties to `nil', but
I don't know whether they are changed elsewhere.). Rather I am
looking for a mechanism that reports the complete current status.

The `mule-diag' command does this for other features, and I thought it
might provide the unify status, too, but it does not. (Probably for
the good reason that eventually, unify will always be on.)

Instead, it turns out that I am looking for a reporter that tells me
everything about the current state of a particular instance of Emacs,
including variables and properties; in other words, including the
values of `(mule-diag)', `(describe-bindings)',
`(current-frame-configuration)', `load-path', and so on.

This reporter would be useful for anyone working on themes, since it
would mean you could go back to any number of previous states.

(And yes, the resulting status files will be big, perhaps too big for
any normal use. But right now I am concerned more about the
capability than about optimization. I don't know whether the
capability merits optimization but think it is a simplification worth
providing to moderately knowledgeable hackers.)

--
Robert J. Chassell bob@rattlesnake.com bob@gnu.org
Rattlesnake Enterprises http://www.rattlesnake.com
Free Software Foundation http://www.gnu.org GnuPG Key ID: 004B4AC8

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-05 23:32 ` Kenichi Handa 2002-09-06 11:38 ` Robert J. Chassell @ 2002-09-07 23:19 ` Dave Love 2002-09-09 0:21 ` Richard Stallman 2002-09-26 4:51 ` Kenichi Handa 1 sibling, 2 replies; 90+ messages in thread From: Dave Love @ 2002-09-07 23:19 UTC (permalink / raw) Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > Yes. But, that mode is on by default in RC too. Gosh. However, it appears to be done wrongly. Custom will show it isn't on, and would turn it off if you tried to turn it on. Surely if it's preloaded and meant to be the default, the defcustom initial value should just be changed. > How about adding this paragraph? > > See also the documentations of: > `unify-8859-on-decoding-mode', `unify-8859-on-encoding-mode', > `utf-8-fragment-on-decoding' > to customize the behaviour of this coding system." Fine, but that shouldn't be specific to mule-utf-8. Those variables affect more coding systems, and other CCL ones should use the appropriate translation tables. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-07 23:19 ` Dave Love @ 2002-09-09 0:21 ` Richard Stallman 2002-09-12 22:43 ` Dave Love 2002-09-26 4:51 ` Kenichi Handa 1 sibling, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-09-09 0:21 UTC (permalink / raw) Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel > Yes. But, that mode is on by default in RC too. Gosh. However, it appears to be done wrongly. Custom will show it isn't on, and would turn it off if you tried to turn it on. Surely if it's preloaded and meant to be the default, the defcustom initial value should just be changed. That sounds right to me. Can you send a patch? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-09 0:21 ` Richard Stallman @ 2002-09-12 22:43 ` Dave Love 0 siblings, 0 replies; 90+ messages in thread From: Dave Love @ 2002-09-12 22:43 UTC (permalink / raw) Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel Richard Stallman <rms@gnu.org> writes: > > Yes. But, that mode is on by default in RC too. > > Gosh. However, it appears to be done wrongly. Custom will show it > isn't on, and would turn it off if you tried to turn it on. Surely if > it's preloaded and meant to be the default, the defcustom initial > value should just be changed. > > That sounds right to me. Can you send a patch? I should have said `define-minor-mode', not defcustom. Just change :init-value nil to t and take out the function call from loadup. ^ permalink raw reply [flat|nested] 90+ messages in thread
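The change suggested in this message can be illustrated with a toy global minor mode. This is a minimal sketch under stated assumptions, not the code in ucs-tables.el: `sketch-unify-on-encoding-mode` is a hypothetical stand-in for `unify-8859-on-encoding-mode`, and the real mode has a body that installs translation tables, which is omitted here.

```elisp
;; Sketch only: `sketch-unify-on-encoding-mode' is a hypothetical
;; stand-in for `unify-8859-on-encoding-mode'.
(define-minor-mode sketch-unify-on-encoding-mode
  "Toy global minor mode that is declared to be on by default."
  :global t
  ;; Declaring the default here, rather than calling the mode function
  ;; from loadup, keeps the variable's value equal to the standard
  ;; value that Custom records for it.
  :init-value t)
```

Note that `:init-value t` only sets the mode variable; it does not run the mode function. Whatever setup the mode normally performs when enabled would therefore still have to happen when the file is loaded, which is presumably why the suggestion applies to a preloaded file.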
* Re: Several serious problems 2002-09-07 23:19 ` Dave Love 2002-09-09 0:21 ` Richard Stallman @ 2002-09-26 4:51 ` Kenichi Handa 1 sibling, 0 replies; 90+ messages in thread From: Kenichi Handa @ 2002-09-26 4:51 UTC (permalink / raw) Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel

In article <rzqelc5s7zb.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
>> See also the documentations of:
>> `unify-8859-on-decoding-mode', `unify-8859-on-encoding-mode',
>> `utf-8-fragment-on-decoding'
>> to customize the behaviour of this coding system."

> Fine, but that shouldn't be specific to mule-utf-8. Those variables
> affect more coding systems,

I'm going to introduce a `dependency' property for coding
systems. The value will be a list of symbols whose values
affect the behaviour of the coding system. mule-utf-* can
have this property from the start. For iso-8859-?, we can
add this property in ucs-tables.el. Then,
describe-coding-system can check it and produce a proper
description, something like the one below:

----------------------------------------------------------------------
1 -- iso-latin-1 (alias: iso-8859-1 latin-1)

ISO 2022 based 8-bit encoding for Latin-1 (MIME:ISO-8859-1).

See also the documentation of these customizable variables
which alter the behaviour of this coding system.
  `unify-8859-on-encoding-mode'
  `unify-8859-on-decoding-mode'
[...]
----------------------------------------------------------------------

> and other CCL ones should use the appropriate translation
> tables.

Sure. I'll work on it later.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-09-02 1:28 ` Kenichi Handa 2002-09-05 13:41 ` Dave Love @ 2002-09-10 16:36 ` Richard Stallman 1 sibling, 0 replies; 90+ messages in thread From: Richard Stallman @ 2002-09-10 16:36 UTC (permalink / raw) Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel I'll attach the necessary change to enable RC's utf-8 to encode latin-X plus alpha (e.g. thai). The docstring of mule-utf-8 may need improvement. As the change is very small and that code has been in HEAD for more than one month, I think the change is quite safe. I recommend to install it in RC. Ok, would you please install it when your conference is over? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 13:25 ` Kenichi Handa ` (2 preceding siblings ...) 2002-08-29 23:17 ` Dave Love @ 2002-08-30 6:09 ` Richard Stallman 3 siblings, 0 replies; 90+ messages in thread From: Richard Stallman @ 2002-08-30 6:09 UTC (permalink / raw) Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel Hmmm, I think I realized the situation of RC. It can unify charsets between iso-8859-X, but utf-8 can't encode iso-8859-X (intentionally), correct? Richard, is it what you asked Dave to install for RC? I can't remember after this much time has gone by. Chances are I never knew about this specific issue and that I did not say anything to him about it one way or another, but I can't remember. If you can make this case work with a clean and safe change, please do. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-19 7:48 Kenichi Handa 2002-08-22 17:08 ` Dave Love @ 2002-08-24 12:11 ` Richard Stallman 2002-08-26 13:17 ` Kenichi Handa 1 sibling, 1 reply; 90+ messages in thread From: Richard Stallman @ 2002-08-24 12:11 UTC (permalink / raw) Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel I'm quite confused with the current status of utf-8.el, ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and RC. Do you understand the situation in HEAD? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-24 12:11 ` Richard Stallman @ 2002-08-26 13:17 ` Kenichi Handa 2002-08-26 16:15 ` Stefan Monnier 2002-08-29 23:19 ` Dave Love 0 siblings, 2 replies; 90+ messages in thread From: Kenichi Handa @ 2002-08-26 13:17 UTC (permalink / raw) Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel

In article <200208241211.g7OCBW111768@wijiji.santafe.edu>, Richard Stallman <rms@gnu.org> writes:
>     I'm quite confused with the current status of utf-8.el,
>     ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and
>     RC.

> Do you understand the situation in HEAD?

I don't understand what exactly you mean by "situation".

I don't know if they are the same as what Dave currently
has.

I understand how each function and variable is supposed
to work. And, from reading through the code briefly, I know
that it doesn't do anything definitely wrong.

But, I have not checked if they surely work as
expected. I believe Dave has done it.

And, I don't understand why those many functions/variables
are designed the way they currently are. For instance,

(1) Why does loadup.el have this code:
      (ucs-unify-8859 'encode-only)
    instead of:
      (unify-8859-on-encoding-mode 1)

(2) Why doesn't utf-8-subst.el provide mappings of
    non-Chinese characters for ksc, gb, and jisx charsets?
    The document of utf-8-translate-cjk says as below:
----------------------------------------------------------------------
Whether the `mule-utf-8' coding system should encode many CJK characters.

Enabling this loads tables which enable the coding system to encode
characters in the charsets `korean-ksc5601', `chinese-gb2312' and
`japanese-jisx0208', and to decode the corresponding unicodes into
...
----------------------------------------------------------------------
    but, currently only Chinese characters in those charsets are
    handled.

(3) Why is utf-8-translate-cjk a variable, not a minor-mode
    like unify-8859-on-(de/en)coding-mode? Or, why is the
    latter not a simple variable? By the way, it seems
    that once we customize utf-8-translate-cjk to t,
    customizing it back to nil doesn't cancel the translation.

(4) It seems that the variable name
    utf-8-fragment-on-decoding is not appropriate because it
    is used also in utf-16.el. Perhaps,
    ucs-fragment-on-decoding is better.

(5) It seems that mule-utf-16 can handle the same range of
    characters as mule-utf-8, but `safe-charsets' property
    doesn't contain, for instance, `latin-iso8859-2'.
    Perhaps, this is simply a bug to be fixed easily.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-26 13:17 ` Kenichi Handa @ 2002-08-26 16:15 ` Stefan Monnier 2002-08-29 23:18 ` Dave Love 2002-08-29 23:19 ` Dave Love 1 sibling, 1 reply; 90+ messages in thread From: Stefan Monnier @ 2002-08-26 16:15 UTC (permalink / raw) Cc: rms, d.love, monnier+gnu/emacs, keichwa, emacs-devel > (1) Why does loadup.el has this code: > (ucs-unify-8859 'encode-only) > instead of: > (unify-8859-on-encoding-mode 1) It might have been my "fault". I think it's because I expect(ed) unify-8859-on-encoding-mode to disappear (because there's no benefit in turning it off, except for working around some bugs maybe). Stefan ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-26 16:15 ` Stefan Monnier @ 2002-08-29 23:18 ` Dave Love 2002-08-30 14:36 ` Stefan Monnier 0 siblings, 1 reply; 90+ messages in thread From: Dave Love @ 2002-08-29 23:18 UTC (permalink / raw) Cc: Kenichi Handa, rms, keichwa, emacs-devel "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: > It might have been my "fault". I think it's because I expect(ed) > unify-8859-on-encoding-mode to disappear (because there's no benefit > in turning it off, except for working around some bugs maybe). What bugs? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-29 23:18 ` Dave Love @ 2002-08-30 14:36 ` Stefan Monnier 0 siblings, 0 replies; 90+ messages in thread From: Stefan Monnier @ 2002-08-30 14:36 UTC (permalink / raw) Cc: Stefan Monnier, Kenichi Handa, rms, keichwa, emacs-devel > > It might have been my "fault". I think it's because I expect(ed) > > unify-8859-on-encoding-mode to disappear (because there's no benefit > > in turning it off, except for working around some bugs maybe). > > What bugs? None that I know of. I meant the sentence to mean "to be able to turn it off in case a bug showed up". Stefan ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Several serious problems 2002-08-26 13:17 ` Kenichi Handa 2002-08-26 16:15 ` Stefan Monnier @ 2002-08-29 23:19 ` Dave Love 1 sibling, 0 replies; 90+ messages in thread From: Dave Love @ 2002-08-29 23:19 UTC (permalink / raw) Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> I don't know if they are the same as what Dave currently
> has.

I tried to install all the relevant stuff I had, but for the CVS head,
it's modified versions of what I've actually been using, and is
basically untested. I wanted someone who was actually using that code
base to install it and test it, but no-one could or would -- I can't
remember, but rms leant on me to install it.

> But, I have not checked if they surely work as
> expected. I believe Dave has done it.

Only in more-or-less Emacs 21.2.

> And, I don't understand why those many functions/variables
> are designed as the current way. For instance,
>
> (1) Why does loadup.el has this code:
> (ucs-unify-8859 'encode-only)
> instead of:
> (unify-8859-on-encoding-mode 1)

Indeed. I didn't do that. The obvious thing to do is to change the
default in the defcustom, if ucs-tables is preloaded.

> (2) Why doesn't utf-8-subst.el provide mappings of
> non-Chinese characters for ksc, gb, and jisx charsets?
> The document of utf-8-translate-cjk says as below:
> ----------------------------------------------------------------------
> Whether the `mule-utf-8' coding system should encode many CJK characters.
>
> Enabling this loads tables which enable the coding system to encode
> characters in the charsets `korean-ksc5601', `chinese-gb2312' and
> `japanese-jisx0208', and to decode the corresponding unicodes into
> ...
> ----------------------------------------------------------------------
> but, currently only Chinese characters in those charsets are
> handled.

I didn't realize that. It may be coincidence. What should be
translated is the set of characters

  (japanese-jisx0208 ∪ chinese-gb2312 ∪ korean-ksc5601) \ mule-unicode-2500-33ff

(where ∪ is union and \ is set difference) according to the Mule-UCS
tables -- I just took the relevant codes from there above U+33FF.
Perhaps that isn't how it actually is.

It needs someone with an interest in the CJK range to redo that stuff
anyhow; it shouldn't hardwire Japanese (japanese-jisx0208) as the
preferred set, the sets used should probably be configurable, and it
should allow translating the relevant characters below U+3400. (I
didn't think much about how best to do that without keeping large
tables on the heap that aren't actually used to do the translation.)

> (3) Why is utf-8-translate-cjk a variable, not a minor-mode
> like unify-8859-on-(de/en)coding-mode?

I think because it can't be turned off.

> Or, why the
> latter is not a simple variable? By the way, it seems
> that once we customize utf-8-translate-cjk to t,
> customize it back to nil doesn't cancel the translation.
>
> (4) It seems that the variable name
> utf-8-fragment-on-decoding is not appropriate because it
> is used also in utf-16.el. Perhaps,
> ucs-fragment-on-decoding is better.

Probably. It was defined before I wrote utf-16.el. Much of that stuff
would have been written differently for installation in 21.1, but it
was done during the campaign against anything Unicode-based, so that
users could have it in Emacs 21.2 as conveniently as possible.

> (5) It seems that mule-utf-16 can handle the same range of
> characters as mule-utf-8, but `safe-charsets' property
> doesn't contain, for instance, `latin-iso8859-2'.
> Perhaps, this is simply a bug to be fixed easily.

Yes. The coding system needs to register the relevant translation
table(s) for safe-chars, that would have to be updated in sync with
any changes. I don't know why that didn't get done.

^ permalink raw reply [flat|nested] 90+ messages in thread
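The set expression in this message (the union of the three CJK charsets minus what mule-unicode-2500-33ff already covers) can be sketched on plain lists of code points. This is a toy illustration with assumptions stated loudly: the function name `sketch-cjk-translation-set` is hypothetical, the inputs are ordinary lists rather than real Mule charsets or Mule-UCS tables, and "covered by mule-unicode-2500-33ff" is approximated by the numeric range U+2500..U+33FF.

```elisp
(require 'cl-lib)

;; Sketch only: operates on plain lists of code points, not on real
;; charsets, and approximates the mule-unicode-2500-33ff charset by
;; the numeric range U+2500..U+33FF.
(defun sketch-cjk-translation-set (&rest charset-code-points)
  "Return the union of the lists CHARSET-CODE-POINTS with every
code point in the range U+2500..U+33FF removed."
  (let ((union (delete-dups
                ;; Copy each list so `delete-dups' cannot modify
                ;; the caller's data.
                (apply #'append
                       (mapcar #'copy-sequence charset-code-points)))))
    (cl-remove-if (lambda (cp) (<= #x2500 cp #x33ff)) union)))
```

For example, calling it with the toy lists `'(#x3042 #x4e00)`, `'(#x4e00 #x2500)` and `'(#xac00)` leaves only `#x4e00` and `#xac00`, because `#x3042` and `#x2500` fall inside the excluded range and the duplicate `#x4e00` is kept once.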