* Several serious problems
From: Richard Stallman @ 2002-07-22 17:11 UTC
Cc: emacs-devel

I cannot save the file lisp/ChangeLog. It specifies coding system
iso-2022-7bit, but it contains something that cannot be encoded in that
coding system. I don't know any way to find the text that causes the
problem; essentially I am helpless.

Handa-san, would you please clean up whatever is wrong with that file
so that it can save properly once again?

We MUST do something to make it easier for users to cope with such a
situation. We talked about this a few weeks ago but nothing was done.

Perhaps we could add a command which simply scans forward for the next
run of characters that can't be saved in the specified coding system.
The message you get in that situation could tell you about this
command. This would be a powerful solution, since you could easily
find all the problems, not just the first one. Highlighting all of
them would also be a useful thing to do.

This problem prevented me from committing changes to the file from
Emacs. I was able to edit and save the file using
find-file-literally, but when I tried to commit the changes, C-x v v
tried to revisit the file non-literally. I think that is a serious
bug in VC. VC should cope with visiting a file literally.
Andre, would you please fix that?

So I tried typing `cd lisp; cvs commit ChangeLog'. It put me into vi
to ask me to edit a log message. Damn! I killed it, set EDITOR and
VISUAL to `emacs', and tried again. This time it gave me Emacs to
edit with. I deleted all the text, saved the log message file, and
exited Emacs. cvs obnoxiously complained about the empty log message
and asked me what to do. I typed `c RET' meaning "continue".

At that point it never came back to me. Now the emacs/lisp directory
is locked and nobody can do anything in it any more.

Savannah people, would you please delete the lock?

* Re: Several serious problems
From: Andre Spiegel @ 2002-07-22 19:01 UTC
Cc: handa, emacs-devel

> Handa-san, would you please clean up whatever is wrong with that file
> so that it can save properly once again?

When I visit the ChangeLog, Kai's most recent entry from 2002-07-21
displays with a German sharp 's' (ß), but all of his former entries
have a \337 in place of it.

* Re: Several serious problems
From: Andre Spiegel @ 2002-07-22 19:03 UTC
Cc: handa, emacs-devel

> This problem prevented me from committing changes to the file from
> Emacs. I was able to edit and save the file using
> find-file-literally, but when I tried to commit the changes, C-x v v
> tried to revisit the file non-literally. I think that is a serious
> bug in VC. VC should cope with visiting a file literally.
> Andre, would you please fix that?

It is fixed now. I've installed the patch in vc.el, but haven't made
an entry in the ChangeLog yet, since it still seems corrupted. Will
do so after it's been cleaned up.

* Re: Several serious problems
From: Richard Stallman @ 2002-07-23 4:00 UTC
Cc: handa, emacs-devel

Thanks for jumping right on the problem.

* Re: Several serious problems
From: Andreas Schwab @ 2002-07-22 19:03 UTC
Cc: handa, spiegel, savannah-hackers, emacs-devel, dominik

Richard Stallman <rms@gnu.org> writes:

|> I cannot save the file lisp/ChangeLog. It specifies coding system
|> iso-2022-7bit, but it contains something that cannot be encoded in that
|> coding system. I don't know any way to find the text that causes the
|> problem; essentially I am helpless.

It was the last commit by Carsten Dominik which broke the file. I have
now fixed it by visiting as iso-latin-1, fixing the two remaining
iso-2022 encoded characters and then saving it again in the right
encoding.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: Several serious problems
From: Richard Stallman @ 2002-07-23 18:58 UTC
Cc: handa, spiegel, savannah-hackers, emacs-devel, dominik

    It was the last commit by Carsten Dominik which broke the file.

Carsten, can you figure out what action it was that broke the file?
Can you find a way to reproduce it (preferably without checking in
the broken version!)? We need to figure this out so we can make
changes to remove the risk users will do this.

Andreas, thanks for fixing the file.

* Re: Several serious problems
From: Andre Spiegel @ 2002-07-22 19:11 UTC
Cc: handa, emacs-devel

> This problem prevented me from committing changes to the file from
> Emacs. I was able to edit and save the file using
> find-file-literally, but when I tried to commit the changes, C-x v v
> tried to revisit the file non-literally. I think that is a serious
> bug in VC. VC should cope with visiting a file literally.
> Andre, would you please fix that?

It is fixed now. I've installed the patch in vc.el, but haven't made
an entry in the ChangeLog yet, since it still seems corrupted. Will
do so after it's been cleaned up.

* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-23 4:42 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> We MUST do something to make it easier for users to cope with such a
> situation. We talked about this a few weeks ago but nothing was done.

Yes, you are right. As said months ago I have to fix those files quite
often; users don't know how to do it on their own. Often it's getting
even worse: Emacs proposes a "secure" encoding and when users go for it,
all looks well until you want to process such a file with TeX...

Please add this issue to etc/TODO.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Richard Stallman @ 2002-07-24 3:25 UTC
Cc: handa, emacs-devel

    Often it's getting even worse: Emacs proposes a
    "secure" encoding and when users go for it, all looks well until you
    want to process such a file with TeX...

I am not really sure what that means--would you please explain?

* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-24 4:43 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     Often it's getting even worse: Emacs proposes a
>     "secure" encoding and when users go for it, all looks well until you
>     want to process such a file with TeX...
>
> I am not really sure what that means--would you please explain?

We discussed the issue several times (e.g. under the subject
"lisp/ChangeLog coding system"); here is a good remark by Stephen J.
Turnbull. Yes, that's different from your problem, but it's caused by
the same implementation concept (enabling unification might cure most
of these problems -- thus it's very important to release an Emacs with
this feature, all released Emacs 21.x versions destroy user files at
random...):

From: "Stephen J. Turnbull" <stephen@xemacs.org>
Subject: Re: lisp/ChangeLog coding system
To: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
Cc: Eli Zaretskii <eliz@is.elta.co.il>, emacs-devel@gnu.org
Date: 29 Apr 2002 20:28:55 +0900

>>>>> "Stefan" == Stefan Monnier <monnier+gnu/emacs@rum.cs.yale.edu> writes:

    >> One aspect is making better guesses about desired coding
    >> systems.

    Stefan> I'm not sure what kind of improvements you're thinking
    Stefan> about.

Well, in the version (mid-January, maybe?) of GNU Emacs I have, when I
tried saving a buffer with mixed ascii, latin-1, and latin-2 in it, it
gave me an abominably long list of coding systems including mule
internal, all the -with-esc systems, and iso-2022-jp-2. But all of
the characters used in the buffer are in ISO-8859-2, it's just Mule
making false distinctions.

At the very least, the defaults in Emacs should be to identify
identical characters (eg, those from the Latin-## subsets) and to
distinguish those where unification is controversial (the Han
ideographs).

    Stefan> non-MIME coding-systems should be in the "unlikely" list, tho.

There is no unique "the unlikely list". For example, if I were
Croatian, I probably would want the buffer described above saved in
ISO-8859-2 without being asked, but a German would probably want to
save it in UTF-8 (or maybe ISO-2022-7 if she were an Emacs developer),
or be queried, defaulting to ISO-8859-2. And some of the "universal"
coding systems (UTF-32, mule internal, all the -with-esc systems)
should probably not even be offered to most users; they should have to
ask for them by name. But people with special needs should be able to
configure them for regular use.

And what's a "non-MIME coding system"? AFAIK MIME has nothing to do
with coding systems except that the notation "the preferred MIME name"
is a useful convention. But KOI8-R and all the Windows-125x sets are
MIME registered.

    Stefan> Looking at the README, I have the impression that most of
    Stefan> the functionality is already part of the Emacs CVS code
    Stefan> (mostly thanks to Dave's ucs-tables.el). Someone should
    Stefan> try and figure out the details.

As for most functionality being in Emacs, yes, that's why I said I'd
help refactor; relative to ucs-tables.el the contribution is all UI.
My duplication[1] of ucs-tables is straightforward, not terribly
efficient code; all the meat is devoted to the question of "how do we
know which coding systems to offer the user". Specifically I address
the issues of preferred unibyte systems and preferred universal systems
described above.

Footnotes:
[1] XEmacs 21.5 has built-in support for Unicode. The UCS tables are
loaded at startup from (a local copy of) the Unicode Consortium tables,
and an API is provided to reload if desirable. The code predates the
release of Emacs 21, and so is different from ucs-tables.el,
unfortunately. The duplicative parts are for 21.4.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Richard Stallman @ 2002-07-25 3:12 UTC
Cc: handa, emacs-devel

    > Often it's getting even worse: Emacs proposes a
    > "secure" encoding and when users go for it, all looks well until you
    > want to process such a file with TeX...
    >
    > I am not really sure what that means--would you please explain?

    We discussed the issue several times (e.g. under the subject
    "lisp/ChangeLog coding system");

I did not recognize the issue because you said "a 'secure' encoding"
and that is not a term we normally use.

    Well, in the version (mid-January, maybe?) of GNU Emacs I have, when I
    tried saving a buffer with mixed ascii, latin-1, and latin-2 in it, it
    gave me an abominably long list of coding systems including mule
    internal, all the -with-esc systems, and iso-2022-jp-2. But all of
    the characters used in the buffer are in ISO-8859-2, it's just Mule
    making false distinctions.

The current development version of Emacs enables
unify-8859-on-encoding-mode; does that solve this problem?

* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-25 3:24 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> I did not recognize the issue because you said "a 'secure' encoding"
> and that is not a term we normally use.

I thought that was the Emacs wording. Sorry.

> The current development version of Emacs enables
> unify-8859-on-encoding-mode; does that solve this problem?

Yes, that helps a lot. It must go into the RC branch, please, to make
it available to the public.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Richard Stallman @ 2002-07-26 15:35 UTC
Cc: handa, emacs-devel

    Yes, that helps a lot. It must go into the RC branch, please, to make
    it available to the public.

Have we already considered this possibility? I can't remember, but
chances are we would have considered it.

It might depend on too many other changes to be easy to put into RC.

* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-27 3:19 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     Yes, that helps a lot. It must go into the RC branch, please, to make
>     it available to the public.
>
> It might depend on too many other changes to be easy to put into RC.

Since such a patch would prevent file corruptions from happening it's
worth all effort. IIRC, the reason not to install the unification
feature was: "it isn't tested enough". Of course, this argument isn't
valid since we need a solution for a known problem -- users already
suffering too long.

Without the unification feature I cannot recommend Emacs 21.x to
European users having to deal with latin1 and latin9 encodings. At the
moment, they are better served using Emacs from the CVS trunk.

Thanks for considering the issue and for your answer!

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Richard Stallman @ 2002-07-29 1:12 UTC
Cc: handa, emacs-devel

Could you make a patch that installs in the RC branch and that works
for you?

* Re: Several serious problems
From: Karl Eichwalder @ 2002-07-29 14:32 UTC
Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Could you make a patch that installs in the RC branch and that works
> for you?

I fear that's too complicated for me. On 21.1 I installed the files
Dave Love posted; when Dave's enhancements were added to the CVS HEAD I
switched to the CVS HEAD version (and forgot all about the release
branch).

Maybe the one who installed Dave's files on the trunk can do the same
on the release branch? I guess it happened here to the HEAD:

    2001-12-07  Dave Love  <fx@gnu.org>

and later unification was enabled by default.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Richard Stallman @ 2002-07-30 1:00 UTC
Cc: handa, emacs-devel

    Maybe the one who installed Dave's files on the trunk can do the same
    on the release branch?

I don't know who that was or whether he will do this. Anyone who
would like to make this happen, I invite to work on it.

* Re: Several serious problems
From: Stefan Monnier @ 2002-08-09 7:42 UTC
Cc: rms, handa, emacs-devel

> >     Yes, that helps a lot. It must go into the RC branch, please, to make
> >     it available to the public.
> >
> > It might depend on too many other changes to be easy to put into RC.
>
> Since such a patch would prevent file corruptions from happening it's
> worth all effort. IIRC, the reason not to install the unification
> feature was: "it isn't tested enough". Of course, this argument isn't
> valid since we need a solution for a known problem -- users already
> suffering too long.

ucs-tables is installed in the RC branch and will thus be part of
Emacs-21.3. It is not turned on by default, tho. I think it's safe to
turn on unify-8859-on-encoding-mode (as is done on the trunk), but I'll
let others judge. After all, it's supposed to be a bug-fix release and
this is not quite a bug-fix in that things work as designed (it's just
that the design doesn't do what the user wants).

	Stefan

* Re: Several serious problems
From: Karl Eichwalder @ 2002-08-09 16:08 UTC
Cc: emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> ucs-tables is installed in the RC branch and will thus be part of
> Emacs-21.3.

Since 2002-07-11, great! And it is even mentioned in NEWS. Just today
I started to switch to the RC branch; now I'll use it for my daily
work.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Richard Stallman @ 2002-08-10 17:16 UTC
Cc: keichwa, handa, emacs-devel

    ucs-tables is installed in the RC branch and will thus be part of
    Emacs-21.3. It is not turned on by default, tho. I think it's safe
    to turn on unify-8859-on-encoding-mode (as is done on the trunk),
    but I'll let others judge.

I think we should try this. File corruption is a bug, and if we can
fix it, we should.

Can you or someone show me precisely what change is needed?

* Re: Several serious problems
From: Stefan Monnier @ 2002-08-12 16:20 UTC
Cc: monnier+gnu/emacs, keichwa, handa, emacs-devel

>     ucs-tables is installed in the RC branch and will thus be part of
>     Emacs-21.3. It is not turned on by default, tho. I think it's safe
>     to turn on unify-8859-on-encoding-mode (as is done on the trunk),
>     but I'll let others judge.
>
> I think we should try this. File corruption is a bug, and if we can
> fix it, we should.
>
> Can you or someone show me precisely what change is needed?

I think we just need to add a call like

	(load "ucs-tables")
	(unify-8859-on-encoding-mode 1)

to startup.el (and add ucs-tables.el to the list of files that are
dumped).

	Stefan

* Re: Several serious problems
From: Richard Stallman @ 2002-08-13 1:48 UTC
Cc: monnier+gnu/emacs, keichwa, handa, emacs-devel

    I think we just need to add a call like

	(load "ucs-tables")
	(unify-8859-on-encoding-mode 1)

    to startup.el (and add ucs-tables.el to the list of files that are
    dumped).

Eli, or someone else, can you try this in RC and see how it works?

* Re: Several serious problems
From: Karl Eichwalder @ 2002-08-15 2:30 UTC
Cc: monnier+gnu/emacs, handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     I think we just need to add a call like
>
> 	(load "ucs-tables")
> 	(unify-8859-on-encoding-mode 1)
>
>     to startup.el (and add ucs-tables.el to the list of files that are
>     dumped).

Excuse my ignorance: do you really mean startup.el?

> Eli, or someone else, can you try this in RC and see how it works?

ATM, I'm running the appended patch without problems. I guess it's a
known limitation that unification of characters outside the latin-1
set isn't supported by the RC branch?

I can unify a-umlaut from latin-2; but unification does not take place
for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).

Index: src/puresize.h
===================================================================
RCS file: /cvsroot/emacs/emacs/src/puresize.h,v
retrieving revision 1.57.14.1
diff -u -r1.57.14.1 puresize.h
*** src/puresize.h	22 Feb 2002 11:21:04 -0000	1.57.14.1
--- src/puresize.h	15 Aug 2002 02:18:49 -0000
***************
*** 42,48 ****
  #endif
  #ifndef BASE_PURESIZE
! #define BASE_PURESIZE (710000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
  #endif
  /* Increase BASE_PURESIZE by a ratio depending on the machine's word size. */
--- 42,48 ----
  #endif
  #ifndef BASE_PURESIZE
! #define BASE_PURESIZE (715000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
  #endif
  /* Increase BASE_PURESIZE by a ratio depending on the machine's word size. */
Index: lisp/loadup.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/loadup.el,v
retrieving revision 1.113
diff -u -r1.113 loadup.el
*** lisp/loadup.el	15 Jul 2001 16:15:34 -0000	1.113
--- lisp/loadup.el	15 Aug 2002 02:18:49 -0000
***************
*** 106,111 ****
--- 106,115 ----
  (load "language/tibetan")
  (load "language/vietnamese")
  (load "language/misc-lang")
+ (load "international/ucs-tables")
+ (unify-8859-on-encoding-mode 1)
+ ;; (ucs-unify-8859 'encode-only)
+ (update-coding-systems-internal)
  (load "indent")

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Stefan Monnier @ 2002-08-15 2:47 UTC
Cc: rms, monnier+gnu/emacs, handa, emacs-devel

> >     I think we just need to add a call like
> >
> > 	(load "ucs-tables")
> > 	(unify-8859-on-encoding-mode 1)
> >
> >     to startup.el (and add ucs-tables.el to the list of files that are
> >     dumped).
>
> Excuse my ignorance: do you really mean startup.el?

Sorry, I meant loadup.el, of course.

> > Eli, or someone else, can you try this in RC and see how it works?
>
> ATM, I'm running the appended patch without problems. I guess it's a
> known limitation that unification of characters outside the latin-1
> set isn't supported by the RC branch?

I'm not sure I understand, but I'm pretty sure it's known ;-)
If you mean that a latin-2 char is not the same as a unicode char
(e.g. for searching purposes), then you just need to use
unify-8859-on-decoding-mode as well. This can't be the default because
it has a few undesirable side-effects (harmless for the typical user,
but annoying for people working on some Emacs files such as ucs-*.el
where we do want to be able to talk about the difference between
a latin-2 and a unicode char).

> I can unify a-umlaut from latin-2; but unification does not take place
> for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).

I don't understand what you mean by "the unification does not take
place". Please just explain step by step what you did and what you
expected as if we were terminally stupid (this seems necessary when
discussing such things as unification because it has various different
meanings in the context of the current Mule code and it's too often
difficult to know which one we're talking about).

	Stefan

* Re: Several serious problems
From: Karl Eichwalder @ 2002-08-15 5:31 UTC
Cc: rms, handa, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> I can unify a-umlaut from latin-2; but unification does not take place
>> for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).
>
> I don't understand what you mean by "the unification does not take
> place".

Here is a recipe:

Starting from a Latin-1 environment enter:

    Grüß Gott!
    C-x C-s

(buffer is latin-1 encoded)

Switch input encoding: C-x RET C-\ latin-2-prefix RET; enter:

    Dobr'y den

("'y" becomes one char, y with accent)

    C-x C-s

(buffer stays latin-1 encoded, okay)

Enter:

    Dzie'n dobry!

("'n" becomes one char, n with accent, not available in Latin-1)

    C-x C-s

Emacs proposes iso-8859-2, okay, but I would have preferred UTF-8.

    C-x RET C-\ TeX RET; enter:

    \euro

("\euro" becomes one char, the euro symbol, missing from latin-2)

Emacs (RC) isn't able to unify the buffer to UTF-8 (it proposes
"x-ctext" etc.); but Emacs (trunk version) can save the buffer UTF-8
encoded.

Hope this helps.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

* Re: Several serious problems
From: Stefan Monnier @ 2002-08-15 15:30 UTC
Cc: Stefan Monnier, rms, handa, emacs-devel, fx

> >> I can unify a-umlaut from latin-2; but unification does not take place
> >> for characters like "LATIN SMALL LETTER L WITH STROKE" (x0142).
> >
> > I don't understand what you mean by "the unification does not take
> > place".
[...]
> Emacs (RC) isn't able to unify the buffer to UTF-8 (it proposes
> "x-ctext" etc.); but Emacs (trunk version) can save the buffer UTF-8
> encoded.

Hope this helps. Indeed, the safe-charsets property of the utf-8
coding-system has not been updated to list the extra charsets it can
now encode. In the trunk utf-8.el says:

   '((safe-charsets
      ascii
      eight-bit-control
      eight-bit-graphic
      latin-iso8859-1
      latin-iso8859-15
      latin-iso8859-14
      latin-iso8859-9
      hebrew-iso8859-8
      greek-iso8859-7
      cyrillic-iso8859-5
      latin-iso8859-4
      latin-iso8859-3
      latin-iso8859-2
      vietnamese-viscii-lower
      vietnamese-viscii-upper
      thai-tis620
      ipa
      ethiopic
      indian-is13194
      katakana-jisx0201
      chinese-sisheng
      lao
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
      mule-unicode-e000-ffff)

where in the RC branch it only says

   '((safe-charsets
      ascii
      eight-bit-control
      eight-bit-graphic
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
      mule-unicode-e000-ffff)

And turning on unify-8859-on-encoding-mode doesn't update the
corresponding info either.

I think Dave or Handa would know better how to fix that (whether
unify-8859-on-encoding-mode should change the safe-charsets or whether
it should simply always include the new charsets and load ucs-tables
when needed. And also which charsets should be added).

Thank you for pointing it out.

	Stefan

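The symptom discussed above (a coding system such as UTF-8 not being offered even though it could represent the text) can be probed from Lisp. What follows is only a minimal sketch and not code from the thread: `my-buffer-encodable-p' is a hypothetical helper name, and it assumes an Emacs in which `find-coding-systems-region' and `coding-system-base' are available. How each coding system declares what it can encode (the `safe-charsets' property quoted above, in 21.x) may differ between Emacs versions, so the answer depends on the Emacs that runs it.

```elisp
;; Sketch only.  `my-buffer-encodable-p' is a hypothetical helper, not
;; something from the thread; it assumes `find-coding-systems-region'
;; and `coding-system-base' are available.
(defun my-buffer-encodable-p (coding-system)
  "Return non-nil if CODING-SYSTEM can encode every character in the buffer."
  (let ((candidates (find-coding-systems-region (point-min) (point-max))))
    ;; For pure ASCII text Emacs answers (undecided): any system will do.
    (or (equal candidates '(undecided))
        ;; Otherwise compare base names, so aliases and EOL variants
        ;; such as `utf-8-unix' also match.
        (and (memq (coding-system-base coding-system) candidates) t))))
```

Evaluating `(my-buffer-encodable-p 'utf-8)' in a buffer built with Karl's recipe would show whether the running Emacs considers UTF-8 a safe choice for that text.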
* Re: Several serious problems
From: Dave Love @ 2002-08-15 17:33 UTC
Cc: Karl Eichwalder, rms, handa, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> Indeed, the safe-charsets property of the utf-8 coding-system has not been
> updated to list the extra charsets it can now encode.

I hope whatever's been changed has been properly tested if it's on the
release branch. Please get handa to check it if he hasn't already.

> I think Dave or Handa would know better how to fix that (whether
> unify-8859-on-encoding-mode should change the safe-charsets or whether
> it should simply always include the new charsets and load ucs-tables
> when needed. And also which charsets should be added).

Whoever changed it should sort it out.

[Actually the stuff on the trunk should really use the encoding
translation table to set `safe-chars', which would need to be
re-registered if it changed, assuming that utf-8.el is how I left it.
However, the default does encode the listed charsets completely and
was unaffected by `unify-8859-on-encoding-mode' -- it deals with more
than 8859 anyhow.]

* Re: Several serious problems 2002-07-22 17:11 Several serious problems Richard Stallman ` (4 preceding siblings ...) 2002-07-23 4:42 ` Karl Eichwalder @ 2002-07-23 13:35 ` Kenichi Handa 2002-07-23 13:52 ` Alan Shutko 2002-07-24 3:25 ` Richard Stallman 2002-08-09 4:41 ` Stefan Monnier 6 siblings, 2 replies; 53+ messages in thread From: Kenichi Handa @ 2002-07-23 13:35 UTC (permalink / raw) Cc: spiegel, savannah-hackers, emacs-devel In article <200207221711.g6MHBZo02496@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes: > I cannot save the file lisp/ChangeLog. It specifies coding system > iso-2022-7bit, but it contains something that cannot be encoded in that > coding system. It seems that this problem was already fixed. As I also found one unnecessary mule-unicode-0100-24ff char, I deleted it. > I don't know any way to find the text that causes the > problem; essentially I am helpless. At least, (find-charset-region 1 (point-max)) will give you some information. If the returned value contains a suspicious charset, we can search for it (if it's not eight-bit-xxx) by: (re-search-forward (format "[%c-%c]" (make-char CHARSET 32 32) (make-char CHARSET 127 127))) To search for eight-bit-control: (re-search-forward "[\200-\237]") To search for eight-bit-graphic: (re-search-forward (string-as-multibyte "[\240-\377]")) It's not sophisticated. :-( > We MUST do something to make it easier for users to cope with such a > situation. We talked about this a few weeks ago but nothing was done. > Perhaps we could add a command which simply scans forward for the next > run of characters that can't be saved in the specified coding system. > The message you get in that situation could tell you about this > command. This would be a powerful solution, since you could easily > find all the problems, not just the first one. Highlighting all of > them would also be a useful thing to do. Do you mean a command something like this? 
(defun check-coding-system-region (from to coding-system &optional max-num)
  "Check if the text after point is encodable by the specified coding system.
When called from a program, takes three arguments:
FROM, TO, and CODING-SYSTEM.  FROM and TO are buffer positions.
Value is a list of positions of characters that are not encodable by
CODING-SYSTEM.
Optional 4th argument MAX-NUM, if non-nil, limits the length of
returned list.  By default, there's no limit."
  (interactive
   (list (point) (point-max)
         (read-non-nil-coding-system "Coding-system: ")
         1))
  (check-coding-system coding-system)
  (or (and coding-system
           (integerp (coding-system-type coding-system)))
      (error "Invalid coding system to check: %s" coding-system))
  (let ((safe-chars (coding-system-get coding-system 'safe-chars))
        (positions)
        (n 0))
    (save-excursion
      (save-restriction
        (narrow-to-region from to)
        (goto-char (point-min))
        (or max-num
            (setq max-num (- (point-max) (point-min))))
        (if (eq safe-chars t)
            (let ((re (string-as-multibyte "[\200-\237\240-\377]")))
              (while (and (< n max-num)
                          (re-search-forward re nil t))
                (setq positions (cons (1- (point)) positions)
                      n (1+ n))))
          (while (and (< n max-num)
                      (re-search-forward "[^\000-\177]" nil t))
            (or (aref safe-chars (preceding-char))
                (setq positions (cons (1- (point)) positions)
                      n (1+ n)))))))
    (if (interactive-p)
        (if (not positions)
            (message "All characters are encodable by %s" coding-system)
          (goto-char (car positions))
          (error "This character can't be encoded by %s" coding-system))
      (setq positions (nreverse positions)))))

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-23 13:35 ` Kenichi Handa @ 2002-07-23 13:52 ` Alan Shutko 2002-07-24 3:25 ` Richard Stallman 2002-07-24 3:25 ` Richard Stallman 1 sibling, 1 reply; 53+ messages in thread From: Alan Shutko @ 2002-07-23 13:52 UTC (permalink / raw) Cc: rms, spiegel, savannah-hackers, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > It seem that this problem was already fixed. As I also > found one unnecessary mule-unicode-0100-24ff char, I deleted > it. I took a quick look, and I think these are the commits that didn't make it into the ChangeLog: RCS file: /cvsroot/emacs/emacs/lisp/cus-start.el,v total revisions: 55; selected revisions: 1 description: Add customization information for intrinsics. ---------------------------- revision 1.51 date: 2002/07/22 15:22:49; author: rms; state: Exp; lines: +1 -0 (double-click-fuzz): Added. ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/vc.el,v total revisions: 341; selected revisions: 1 description: ;;; vc.el --- drive a version-control system from within Emacs ---------------------------- revision 1.335 date: 2002/07/22 18:52:04; author: spiegel; state: Exp; lines: +7 -6 (vc-next-action-on-file): Preserve find-file-literally. ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/cal-hebrew.el,v total revisions: 13; selected revisions: 1 description: ---------------------------- revision 1.13 date: 2002/07/22 15:31:13; author: rms; state: Exp; lines: +94 -77 (diary-omer, diary-yahrzeit, diary-rosh-hodesh, diary-parasha, diary-parasha): Add optional MARK parameter, specifying what face or character to use in the calendar display. These will now return (MARK . ENTRY). 
============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/diary-lib.el,v total revisions: 55; selected revisions: 1 description: ---------------------------- revision 1.55 date: 2002/07/22 15:32:00; author: rms; state: Exp; lines: +96 -89 (mark-sexp-diary-entries): Retrieve mark from diary-sexp-entry and pass it to mark-visible-calendar-date. (list-sexp-diary-entries): Update doc string for new docs for .... If diary-sexp-entry returns a cons, only add the text to the diary list. (diary-sexp-entry): Allow sexps to return a cons of the form (MARK . STRING) to specify what face or character mark should be used in the calendar display. (diary-date, diary-block, diary-float, diary-anniversary) (diary-cyclic): Add optional MARK parameter, specifying what face or character to use in the calendar display. These will now return (MARK . ENTRY). (check-calendar-holidays, diary-iso-date) (calendar-holiday-list, diary-french-date, diary-mayan-date) (diary-julian-date, diary-astro-day-number, diary-chinese-date) (diary-islamic-date, list-islamic-diary-entries) (mark-islamic-diary-entries, mark-islamic-calendar-date-pattern) (diary-hebrew-date, diary-omer, diary-yahrzeit, diary-parasha) (diary-rosh-hodesh, list-hebrew-diary-entries) (mark-hebrew-diary-entries, mark-hebrew-calendar-date-pattern) (diary-coptic-date, diary-persian-date, diary-phases-of-moon) (diary-sunrise-sunset, diary-sabbath-candles): Remove interactive flag from autoloads. ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/lunar.el,v total revisions: 18; selected revisions: 1 description: ;;; lunar.el --- calendar functions for phases of the moon. ---------------------------- revision 1.18 date: 2002/07/22 15:30:43; author: rms; state: Exp; lines: +7 -4 (diary-phases-of-moon): Add optional MARK parameter, specifying what face or character to use in the calendar display. 
These will now return (MARK . ENTRY). ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/calendar/solar.el,v total revisions: 45; selected revisions: 1 description: ;;; solar.el --- calendar functions for solar events. ---------------------------- revision 1.44 date: 2002/07/22 15:30:24; author: rms; state: Exp; lines: +8 -4 (diary-sabbath-candles): Add optional MARK parameter, specifying what face or character to use in the calendar display. These will now return (MARK . ENTRY). ============================================================================= RCS file: /cvsroot/emacs/emacs/lisp/net/browse-url.el,v total revisions: 24; selected revisions: 1 description: ---------------------------- revision 1.23 date: 2002/07/22 15:21:41; author: rms; state: Exp; lines: +7 -3 (browse-url-lynx-input-attempts): Use defcustom. (browse-url-lynx-input-delay): Add custom type and group. ============================================================================= -- Alan Shutko <ats@acm.org> - In a variety of flavors! I failed as a proof-reader for M & M's. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-23 13:52 ` Alan Shutko @ 2002-07-24 3:25 ` Richard Stallman 0 siblings, 0 replies; 53+ messages in thread From: Richard Stallman @ 2002-07-24 3:25 UTC (permalink / raw) Cc: handa, spiegel, savannah-hackers, emacs-devel I took a quick look, and I think these are the commits that didn't make it into the ChangeLog: I think these are all included now, thanks. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-23 13:35 ` Kenichi Handa 2002-07-23 13:52 ` Alan Shutko @ 2002-07-24 3:25 ` Richard Stallman 2002-07-24 4:37 ` Kenichi Handa 1 sibling, 1 reply; 53+ messages in thread From: Richard Stallman @ 2002-07-24 3:25 UTC (permalink / raw) Cc: spiegel, savannah-hackers, emacs-devel Do you mean a command something like this? (defun check-coding-system-region (from to coding-system &optional max-num) "Check if the text after point is encodable by the specified coding system. When called from a program, takes three arguments: CODING-SYSTEM, FROM, and TO. START and END are buffer positions. Value is a list of positions of characters that are not encodable by CODING-SYSTEM. Optional 4th argument MAX-NUM, if non-nil, limits the length of returned list. By default, there's no limit." This could do the internals of the job. To be useful, it needs a user interface. How about if you modify it to make overlays to highlight those characters instead of returning a list saying where they are? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-24 3:25 ` Richard Stallman @ 2002-07-24 4:37 ` Kenichi Handa 2002-07-25 3:12 ` Richard Stallman 2002-08-09 7:44 ` Several serious problems Stefan Monnier 0 siblings, 2 replies; 53+ messages in thread From: Kenichi Handa @ 2002-07-24 4:37 UTC (permalink / raw) Cc: spiegel, emacs-devel In article <200207240325.g6O3PdX04898@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes: > Do you mean a command something like this? > (defun check-coding-system-region (from to coding-system &optional max-num) > "Check if the text after point is encodable by the specified coding system. > When called from a program, takes three arguments: > CODING-SYSTEM, FROM, and TO. START and END are buffer positions. > Value is a list of positions of characters that are not encodable by > CODING-SYSTEM. > Optional 4th argument MAX-NUM, if non-nil, limits the length of > returned list. By default, there's no limit." > This could do the internals of the job. To be useful, it needs a user > interface. Ooops, I forgot to include this sentence in the docstring. If an unencodable character is found, move point to that character. So, this function can be used both for an internal job and for an interactive job (to find the next unencodable character). > How about if you modify it to make overlays to highlight those characters > instead of returning a list saying where they are? If the specified coding system is totally inappropriate for the buffer, highlighting them will results in huge amount of overlays and also it takes long time to finish the job. If we limit the number of highlighting, it may give users incorrect information (i.e. non-highlighted characters seems to be encodable). So, I thought just moving point to the next unencodable character is better. --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-24 4:37 ` Kenichi Handa @ 2002-07-25 3:12 ` Richard Stallman 2002-07-25 5:53 ` Miles Bader ` (2 more replies) 2002-08-09 7:44 ` Several serious problems Stefan Monnier 1 sibling, 3 replies; 53+ messages in thread From: Richard Stallman @ 2002-07-25 3:12 UTC (permalink / raw) Cc: spiegel, emacs-devel If the specified coding system is totally inappropriate for the buffer, highlighting them will results in huge amount of overlays and also it takes long time to finish the job. That is true. If we limit the number of highlighting, it may give users incorrect information (i.e. non-highlighted characters seems to be encodable). It could highlight the first N runs of such characters, and display a message saying "Many more unencodable characters found--type WHATEVER to view them". WHATEVER could be the same command with a prefix argument. What do you think of that? ^ permalink raw reply [flat|nested] 53+ messages in thread
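The "highlight the first N, then tell the user there are more" idea proposed above might be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes an Emacs that provides `unencodable-char-position' with a COUNT argument and `remove-overlays', and the names `my-highlight-unencodable' and the overlay property `my-unencodable' are hypothetical. It highlights single characters rather than runs, to keep the sketch short.

```elisp
(defun my-highlight-unencodable (coding-system &optional limit)
  "Highlight up to LIMIT (default 20) characters CODING-SYSTEM cannot encode.
Hypothetical sketch, not part of Emacs.  Return the number of
characters highlighted."
  (interactive (list (read-non-nil-coding-system "Coding system: ")))
  (let* ((limit (or limit 20))
         ;; Ask for one position more than we will show, so that we can
         ;; tell whether further unencodable characters remain.
         (positions (unencodable-char-position
                     (point-min) (point-max) coding-system (1+ limit)))
         (more (> (length positions) limit))
         (shown 0))
    ;; Drop highlights left over from an earlier call.
    (remove-overlays (point-min) (point-max) 'my-unencodable t)
    (dolist (pos positions)
      (when (< shown limit)
        (let ((ov (make-overlay pos (1+ pos))))
          (overlay-put ov 'my-unencodable t)
          (overlay-put ov 'face 'highlight))
        (setq shown (1+ shown))))
    (when more
      (message "More unencodable characters found; call again with a larger LIMIT"))
    shown))
```

Capping the number of overlays addresses the cost concern raised earlier in the thread, at the price that characters beyond the cap are left unmarked, which is why the sketch prints a message in that case.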
* Re: Several serious problems 2002-07-25 3:12 ` Richard Stallman @ 2002-07-25 5:53 ` Miles Bader 2002-07-26 14:29 ` Francesco Potorti` 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa 2 siblings, 0 replies; 53+ messages in thread From: Miles Bader @ 2002-07-25 5:53 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel Richard Stallman <rms@gnu.org> writes: > If we limit the number of highlighting, it may give users > incorrect information (i.e. non-highlighted characters seems to be > encodable). > > It could highlight the first N runs of such characters, and display a > message saying "Many more unencodable characters found--type WHATEVER > to view them". WHATEVER could be the same command with a prefix > argument. I'd like something similar to the way isearch works (when highlighting non-current matches) -- just highlight what's currently displayed and give the user a chance to jump to the next instance. [Maybe it could even use jit-lock-functions or something to allow free movement in the buffer while still using optimizing display] -Miles -- Somebody has to do something, and it's just incredibly pathetic that it has to be us. -- Jerry Garcia ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-25 3:12 ` Richard Stallman 2002-07-25 5:53 ` Miles Bader @ 2002-07-26 14:29 ` Francesco Potorti` 2002-07-27 18:52 ` Richard Stallman 2002-08-09 7:43 ` Stefan Monnier 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa 2 siblings, 2 replies; 53+ messages in thread From: Francesco Potorti` @ 2002-07-26 14:29 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel Recently, a package called buffer-charset.el was posted to gnu.emacs-sources. It uses the machinery of hi-lock to work, and it's wonderfully simple to use: you just do M-x show-buffer-charset-characters (or use `C-x w c' if hi-lock-mode is already active) and you're done. You are asked what charset you want to highlight, and if you don't know you just press TAB and choose from the list. The offending characters are highlighted. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-26 14:29 ` Francesco Potorti` @ 2002-07-27 18:52 ` Richard Stallman 2002-08-09 7:43 ` Stefan Monnier 1 sibling, 0 replies; 53+ messages in thread From: Richard Stallman @ 2002-07-27 18:52 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel Recently, a package called buffer-charset.el was posted to gnu.emacs-sources. It uses the machinery of hi-lock to work, and it's wonderfully simple to use: you just do M-x show-buffer-charset-characters (or use `C-x w c' if hi-lock-mode is already active) and you're done. You are asked what charset you want to highlight, and if you don't know you just press TAB and choose from the list. The offending characters are highlighted. This might be useful for some purposes, but it is not the right interface to be a convenient solution to this particular problem. The user knows that the file can't be encoded in a certain coding system but she does not know which character sets are the problem. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-26 14:29 ` Francesco Potorti` 2002-07-27 18:52 ` Richard Stallman @ 2002-08-09 7:43 ` Stefan Monnier 1 sibling, 0 replies; 53+ messages in thread From: Stefan Monnier @ 2002-08-09 7:43 UTC (permalink / raw) Cc: rms, handa, spiegel, emacs-devel > Recently, a package called buffer-charset.el was posted to > gnu.emacs-sources. It uses the machinery of hi-lock to work, and it's > wonderfully simple to use: you just do M-x > show-buffer-charset-characters (or use `C-x w c' if hi-lock-mode is > already active) and you're done. You are asked what charset you want to > highlight, and if you don't know you just press TAB and choose from the > list. The offending characters are highlighted. Charsets are irrelevant (they're only an obscure internal implementation detail). Users only care about coding-systems. Stefan ^ permalink raw reply [flat|nested] 53+ messages in thread
* unencodable-char-position [Re: Several serious problems] 2002-07-25 3:12 ` Richard Stallman 2002-07-25 5:53 ` Miles Bader 2002-07-26 14:29 ` Francesco Potorti` @ 2002-08-11 1:59 ` Kenichi Handa 2002-08-12 17:06 ` Richard Stallman 2002-08-15 17:51 ` Dave Love 2 siblings, 2 replies; 53+ messages in thread From: Kenichi Handa @ 2002-08-11 1:59 UTC (permalink / raw) Cc: spiegel, emacs-devel, d.love In article <200207250312.g6P3C9J06653@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes: > If the specified coding system is totally inappropriate for > the buffer, highlighting them will results in huge amount of > overlays and also it takes long time to finish the job. > That is true. > If > we limit the number of highlighting, it may give users > incorrect information (i.e. non-highlighted characters seems > to be encodable). > It could highlight the first N runs of such characters, and display a > message saying "Many more unencodable characters found--type WHATEVER > to view them". WHATEVER could be the same command with a prefix > argument. I implemented that and tried on several files. But, it seems that such kind of feature is not that helpful. In the case that the buffer contains many unencodable chars, usually the specified coding system is wrong, and we must use a different coding system. So, it is not that interesting to know where are the other unencodable characters. In the case that the buffer contains a few unencodable chars, as it's seldom that more than one of them appear in one window, highlighting the other unencodable chars is not that useful. By the way, I've just noticed that Dave has already installed the function `unencodable-char-position' in mule-cmds.el and used it in select-safe-coding-system. That function resembles to check-coding-system-region on which we are currently discussing. But, as the docstring says, it's slow. So, I committed these changes.

(1) Re-implementation of unencodable-char-position in C while adding two optional arguments.

----------------------------------------------------------------------
unencodable-char-position is a built-in function.
(unencodable-char-position START END CODING-SYSTEM &optional COUNT STRING)

Return position of first un-encodable character in a region. START and END specify the region and CODING-SYSTEM specifies the encoding to check. Return nil if CODING-SYSTEM does encode the region.

If optional 4th argument COUNT is non-nil, it specifies at most how many un-encodable characters to search. In this case, the value is a list of positions.

If optional 5th argument STRING is non-nil, it is a string to search for un-encodable characters. In that case, START and END are indexes to the string.
----------------------------------------------------------------------

(2) New function `search-unencodable-char' for interactive use. It utilizes `unencodable-char-position'.

----------------------------------------------------------------------
(search-unencodable-char CODING-SYSTEM)

Search forward from point for a character that is not encodable. It asks which coding system to check. If such a character is found, set point after that character. Otherwise, don't move point.

When called from a program, the value is a position of the found character, or nil if all characters are encodable.
----------------------------------------------------------------------

It may be good to bind C-x RET s to this command. Could someone make this command more user friendly (e.g. improving messages)? It is also easy to modify this function to highlight a few more (or windowful) unencodable characters if you think that is surely helpful.

(3) Make select-safe-coding-system to show (at most 10) unencodable characters for each default coding systems tried. Now, if any unencodable chars are found, one can type C-g to cancel further saving. As C-g doesn't hide *Warning* buffer, one can click on the displayed unencodable chars to jump to the corresponding position in a buffer.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply [flat|nested] 53+ messages in thread
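A minimal usage sketch for the COUNT argument described in the message above, assuming an Emacs in which `unencodable-char-position' is built in with the quoted signature. The helper name `my-list-unencodable-chars' is hypothetical and is not part of Emacs or of the committed changes.

```elisp
(defun my-list-unencodable-chars (coding-system &optional count)
  "Return up to COUNT (default 10) unencodable characters as an alist.
Each element is (POSITION . CHAR) for a character in the current
buffer that CODING-SYSTEM cannot encode.  Hypothetical helper,
not part of Emacs."
  ;; With a non-nil COUNT, `unencodable-char-position' returns a list
  ;; of buffer positions (nil if everything is encodable).
  (let ((positions (unencodable-char-position
                    (point-min) (point-max) coding-system (or count 10))))
    (mapcar (lambda (pos) (cons pos (char-after pos)))
            positions)))
```

For instance, evaluating (with-temp-buffer (insert "abc\u03bbd") (my-list-unencodable-chars 'iso-latin-1)) should report only the Greek letter, since Latin-1 has no code for it. Pairing each position with its character is a design choice of this sketch; it makes the result usable for a warning message without revisiting the buffer.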
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa @ 2002-08-12 17:06 ` Richard Stallman 2002-08-12 17:15 ` Stefan Monnier 2002-08-15 17:51 ` Dave Love 1 sibling, 1 reply; 53+ messages in thread From: Richard Stallman @ 2002-08-12 17:06 UTC (permalink / raw) Cc: spiegel, emacs-devel, d.love I implemented that and tried on several files. But, it seems that such kind of feature is not that helpful. In the case that the buffer contains many unencodable chars, usually the specified coding system is wrong, and we must use a different coding system. So, it is not that interesting to know where are the other unencodable characters. In the case that the buffer contains a few unencodable chars, as it's seldom that more than one of them appear in one window, highlighting the other unencodable chars is not that useful. These seem like persuasive arguments; it sounds good. How can I make a test case to observe it functioning? I tried but I couldn't get encoding to "fail". ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-12 17:06 ` Richard Stallman @ 2002-08-12 17:15 ` Stefan Monnier 2002-08-13 0:37 ` Kenichi Handa 0 siblings, 1 reply; 53+ messages in thread From: Stefan Monnier @ 2002-08-12 17:15 UTC (permalink / raw) Cc: handa, spiegel, emacs-devel, d.love > I implemented that and tried on several files. But, it > seems that such kind of feature is not that helpful. > > In the case that the buffer contains many unencodable chars, > usually the specified coding system is wrong, and we must > use a different coding system. So, it is not that > interesting to know where are the other unencodable > characters. > > In the case that the buffer contains a few unencodable > chars, as it's seldom that more than one of them appear in > one window, highlighting the other unencodable chars is not > that useful. > > These seem like persuasive arguments; it sounds good. > > How can I make a test case to observe it functioning? > I tried but I couldn't get encoding to "fail". Try to save the HELLO file in utf-8. Stefan ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-12 17:15 ` Stefan Monnier @ 2002-08-13 0:37 ` Kenichi Handa 2002-08-13 22:47 ` Richard Stallman 0 siblings, 1 reply; 53+ messages in thread From: Kenichi Handa @ 2002-08-13 0:37 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel, d.love In article <200208121715.g7CHFrw29709@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes: >> How can I make a test case to observe it functioning? >> I tried but I couldn't get encoding to "fail". > Try to save the HELLO file in utf-8. Yes. For instance: C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-13 0:37 ` Kenichi Handa @ 2002-08-13 22:47 ` Richard Stallman 2002-08-14 0:20 ` Kenichi Handa 0 siblings, 1 reply; 53+ messages in thread From: Richard Stallman @ 2002-08-13 22:47 UTC (permalink / raw) Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love Yes. For instance: C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET Yes, that indeed runs the new code. What I tried was C-h h C-x RET c utf-8 RET C-x C-w ~/temp RET. But it "worked"--it saved the file without complaint. Is this a bug? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-13 22:47 ` Richard Stallman @ 2002-08-14 0:20 ` Kenichi Handa 2002-08-14 23:13 ` Richard Stallman 0 siblings, 1 reply; 53+ messages in thread From: Kenichi Handa @ 2002-08-14 0:20 UTC (permalink / raw) Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love In article <200208132247.g7DMlHT07283@wijiji.santafe.edu>, Richard Stallman <rms@gnu.org> writes: > Yes. For instance: > C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET > Yes, that indeed runs the new code. > What I tried was C-h h C-x RET c utf-8 RET C-x C-w ~/temp RET. > But it "worked"--it saved the file without complaint. But, I think it broke some part of the file. > Is this a bug? No, it is an intentional behaviour. C-x RET c _CODING_ RET means that "I'll take all responsibility, so just accept _CODING_, don't make any warnings!". --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-14 0:20 ` Kenichi Handa @ 2002-08-14 23:13 ` Richard Stallman 0 siblings, 0 replies; 53+ messages in thread From: Richard Stallman @ 2002-08-14 23:13 UTC (permalink / raw) Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love No, it is an intentional behaviour. C-x RET c _CODING_ RET means that "I'll take all responsibility, so just accept _CODING_, don't make any warnings!". Thanks. I explained this in the manual. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-11 1:59 ` unencodable-char-position [Re: Several serious problems] Kenichi Handa 2002-08-12 17:06 ` Richard Stallman @ 2002-08-15 17:51 ` Dave Love 2002-08-19 5:04 ` Kenichi Handa 1 sibling, 1 reply; 53+ messages in thread From: Dave Love @ 2002-08-15 17:51 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > I implemented that and tried on several files. But, it > seems that such kind of feature is not that helpful. If I understand what's being talked about, I agree. Normally the first problematic character tells me what's up. > By the way, I've just noticed that Dave has already > installed the function `unencodable-char-position' in > mule-cmds.el and used it in select-safe-coding-system. > > That function resembles to check-coding-system-region on > which we are currently discussing. I'm sorry if that was wrong. I thought it was supposed to have been installed months ago, and I was trying to clear out the Mule changes I've had hanging around after rms was on about it. I thought that was all stuff you approved of, or `obviously right'. > But, as the docstring says, it's slow. [It seemed fast enough for that use since it's only executed occasionally, when there's actually a problem. It was probably developed on a P133.] By the way, aborting in select-safe-coding-system can have bad effects when you're using VC. As far as I remember, it actually loses your edits in some circumstance. I haven't had time to look at the problem. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-15 17:51 ` Dave Love @ 2002-08-19 5:04 ` Kenichi Handa 2002-08-29 22:52 ` Dave Love 0 siblings, 1 reply; 53+ messages in thread From: Kenichi Handa @ 2002-08-19 5:04 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel In article <rzqvg6cm2mq.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes: >> By the way, I've just noticed that Dave has already >> installed the function `unencodable-char-position' in >> mule-cmds.el and used it in select-safe-coding-system. >> >> That function resembles to check-coding-system-region on >> which we are currently discussing. > I'm sorry if that was wrong. I thought it was supposed to have been > installed months ago, and I was trying to clear out the Mule changes > I've had hanging around after rms was on about it. I thought that was > all stuff you approved of, or `obviously right'. You don't have to be sorry. Perhaps, I've overlooked that part when you asked about various changes long ago. >> But, as the docstring says, it's slow. > [It seemed fast enough for that use since it's only executed > occasionally, when there's actually a problem. It was probably > developed on a P133.] Ah, yes. Currently, it is used only interactively, thus the speed is not that problem. But, I'm thinking about using unencodable-char-position to check if default coding systems can encode the region or not in select-safe-coding-system (not yet done). I think such a change makes select-safe-coding-system runs much faster. > By the way, aborting in select-safe-coding-system can have bad effects > when you're using VC. As far as I remember, it actually loses your > edits in some circumstance. I haven't had time to look at the > problem. I noticed that too. But, I also don't have time to fix it for the moment. I've never read the code of vc. --- Ken'ichi HANDA handa@etl.go.jp ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-19 5:04 ` Kenichi Handa @ 2002-08-29 22:52 ` Dave Love 2002-08-30 6:53 ` Andre Spiegel 0 siblings, 1 reply; 53+ messages in thread From: Dave Love @ 2002-08-29 22:52 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel Kenichi Handa <handa@etl.go.jp> writes: > > By the way, aborting in select-safe-coding-system can have bad effects > > when you're using VC. As far as I remember, it actually loses your > > edits in some circumstance. I haven't had time to look at the > > problem. > > I noticed that too. But, I also don't have time to fix it > for the moment. I've never read the code of vc. Is someone going to fix this? (I have worked on VC, but ...) ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: unencodable-char-position [Re: Several serious problems] 2002-08-29 22:52 ` Dave Love @ 2002-08-30 6:53 ` Andre Spiegel 0 siblings, 0 replies; 53+ messages in thread From: Andre Spiegel @ 2002-08-30 6:53 UTC (permalink / raw) Cc: Kenichi Handa, rms, emacs-devel On Fri, 2002-08-30 at 00:52, Dave Love wrote: > Kenichi Handa <handa@etl.go.jp> writes: > > > > By the way, aborting in select-safe-coding-system can have bad effects > > > when you're using VC. As far as I remember, it actually loses your > > > edits in some circumstance. I haven't had time to look at the > > > problem. > > > > I noticed that too. But, I also don't have time to fix it > > for the moment. I've never read the code of vc. > > Is someone going to fix this? (I have worked on VC, but ...) I will look into it. Can someone give me a more detailed description of the circumstances when the problem arises? Sequence of commands? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems 2002-07-24 4:37 ` Kenichi Handa 2002-07-25 3:12 ` Richard Stallman @ 2002-08-09 7:44 ` Stefan Monnier 2002-08-10 17:16 ` Richard Stallman 2002-08-12 0:26 ` Kenichi Handa 1 sibling, 2 replies; 53+ messages in thread From: Stefan Monnier @ 2002-08-09 7:44 UTC (permalink / raw) Cc: rms, spiegel, emacs-devel > If the specified coding system is totally inappropriate for > the buffer, highlighting them will results in huge amount of > overlays and also it takes long time to finish the job. If That was also my concern, but I heard that Emacs-20 did just that. Stefan ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Several serious problems
From: Richard Stallman @ 2002-08-10 17:16 UTC
Cc: handa, spiegel, emacs-devel

    > If the specified coding system is totally inappropriate for
    > the buffer, highlighting them will results in huge amount of
    > overlays and also it takes long time to finish the job. If

    That was also my concern, but I heard that Emacs-20 did just that.

If empirically it works well enough, there's no reason to object.
Did anyone ever try this in Emacs 20 on a substantial file with
many unsuitable characters? If not, would you like to try that now
and see how bad it was?

* Re: Several serious problems
From: Kenichi Handa @ 2002-08-12 0:26 UTC
Cc: rms, spiegel, emacs-devel

In article <200208090744.g797irF11925@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> If the specified coding system is totally inappropriate for
>> the buffer, highlighting them will results in huge amount of
>> overlays and also it takes long time to finish the job. If

> That was also my concern, but I heard that Emacs-20 did just that.

Emacs 20 highlighted at most 256 such characters. And, in
Emacs 20, detecting unencodable characters was easier
because there's no coding system that can encode a part of a
charset.

---
Ken'ichi HANDA
handa@etl.go.jp

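The trade-off discussed in this subthread (find every unencodable stretch, but bound the work when the coding system is a poor fit for the whole buffer) can be illustrated outside Emacs. What follows is a minimal Python sketch, not Emacs code: the function names are hypothetical, Python codecs stand in for Emacs coding systems, and the default limit of 256 merely borrows the figure quoted for Emacs 20. It tests each character in isolation, which is an approximation for stateful encodings.

```python
def can_encode(ch: str, encoding: str) -> bool:
    """Return True if the single character `ch` can be encoded in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def unencodable_runs(text: str, encoding: str, limit: int = 256):
    """Return up to `limit` half-open (start, end) index pairs, one per
    run of consecutive characters that `encoding` cannot represent.

    `limit` is an illustrative cap (hypothetical parameter) that keeps the
    number of reported runs bounded for badly mismatched text.
    """
    if limit <= 0:
        return []
    runs = []
    start = None  # index where the current unencodable run began, if any
    for i, ch in enumerate(text):
        if not can_encode(ch, encoding):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
            if len(runs) >= limit:
                return runs
    # Close a run that extends to the end of the text.
    if start is not None and len(runs) < limit:
        runs.append((start, len(text)))
    return runs


if __name__ == "__main__":
    # The euro sign is not representable in Latin-1.
    print(unencodable_runs("ab\u20ac\u20accd\u20ac", "latin-1"))
```

Reporting runs rather than single characters keeps the result small when unencodable characters cluster together, and the cap bounds the result size when they do not.
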
* Re: Several serious problems
From: Stefan Monnier @ 2002-08-09 4:41 UTC
Cc: emacs-devel, d.love

> I cannot save the file lisp/ChangeLog. It specifies coding system
> iso-2022-7bit, but it contains something that cannot be encoded in that
> coding system. I don't know any way to find the text that causes the
> problem; essentially I am helpless.
>
> Handa-san, would you please clean up whatever is wrong with that file
> so that it can save properly once again?
>
> We MUST do something to make it easier for users to cope with such a
> situation. We talked about this a few weeks ago but nothing was done.

Dave Love has code for it (and has posted it here).
I can't check it in, so could someone else take care of it ?

	Stefan "who pleads guilty of delaying this patch"

* Re: Several serious problems
From: Dave Love @ 2002-08-15 17:23 UTC
Cc: Richard Stallman, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> > I cannot save the file lisp/ChangeLog. It specifies coding system
> > iso-2022-7bit, but it contains something that cannot be encoded in that
> > coding system. I don't know any way to find the text that causes the
> > problem; essentially I am helpless.
> >
> > Handa-san, would you please clean up whatever is wrong with that file
> > so that it can save properly once again?
> >
> > We MUST do something to make it easier for users to cope with such a
> > situation. We talked about this a few weeks ago but nothing was done.
>
> Dave Love has code for it (and has posted it here).
> I can't check it in, so could someone else take care of it ?
>
>
> 	Stefan "who pleads guilty of delaying this patch"

I don't know what that refers to.

I suspect the problem concerns eight-bit-... characters. If you search
for them, you have to get the multibyteness of the search string right
in a way I always have to look up. [vc-annotate should show you what
edit was responsible.]

However, I installed code in `select-safe-coding-system' some time ago
which should point to the first offending character when selection
fails. (As far as I remember, that was supposed to be done long ago,
but never was.) If the development source doesn't show you the
offending character and advocate C-u C-x =, there's something wrong
with that code.

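The behaviour described here (stop at the first offending character and tell the user what it is) has a simple analogue outside Emacs. The following is a minimal Python sketch, not the code installed in `select-safe-coding-system`: the function name is hypothetical, a Python codec stands in for the coding system, and the position comes from the `start` attribute of the `UnicodeEncodeError` that Python raises when encoding fails.

```python
def first_unencodable(text: str, encoding: str):
    """Return (index, character, code point label) for the first character
    of `text` that `encoding` cannot represent, or None if all of it encodes.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        # err.start is the index in `text` where encoding first failed.
        ch = text[err.start]
        return err.start, ch, f"U+{ord(ch):04X}"
    return None


if __name__ == "__main__":
    print(first_unencodable("price: 5\u20ac", "latin-1"))
    print(first_unencodable("plain ASCII", "ascii"))
```

Returning the code point alongside the index is meant to play the role that C-u C-x = plays in the message above: once the position is known, the user also sees which character is at fault.
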