Several serious problems

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Several serious problems
@ 2002-07-22 17:11 Richard Stallman
  2002-07-22 19:01 ` Andre Spiegel
                   ` (6 more replies)
  0 siblings, 7 replies; 90+ messages in thread
From: Richard Stallman @ 2002-07-22 17:11 UTC (permalink / raw)
  Cc: emacs-devel

I cannot save the file lisp/ChangeLog.  It specifies coding system
iso-2022-7bit, but it contains something that cannot be encoded in that
coding system.  I don't know any way to find the text that causes the
problem; essentially I am helpless.

Handa-san, would you please clean up whatever is wrong with that file
so that it can save properly once again?

We MUST do something to make it easier for users to cope with such a
situation.  We talked about this a few weeks ago but nothing was done.
Perhaps we could add a command which simply scans forward for the next
run of characters that can't be saved in the specified coding system.
The message you get in that situation could tell you about this
command.  This would be a powerful solution, since you could easily
find all the problems, not just the first one.  Highlighting all of
them would also be a useful thing to do.

This problem prevented me from commiting changes to the file from
Emacs.  I was able to edit and save the file using
find-file-literally, but when I tried to commit the changes, C-x v v
tried to revisit the file non-literally.  I think that is a serious
bug in VC.  VC should cope with visiting a file literally.
Andre, would you please fix that?

So I tried typing `cd lisp; cvs commit ChangeLog'.  It put me into
vi to ask me to edit a log message.  Damn!  I killed it, set EDITOR
and VISUAL to `emacs', and tried again.  This time it gave me Emacs
to edit with.  I deleted all the text, saved the log message file,
and exited Emacs.  cvs obnoxiously complained about the empty log
message and asked me what to do.  I typed `c RET' meaning "continue".

At that point it never came back to me.  Now the emacs/lisp directory
is locked and nobody can do anything in it any more.

Savannah people, would you please delete the lock?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 17:11 Several serious problems Richard Stallman
@ 2002-07-22 19:01 ` Andre Spiegel
  2002-07-22 19:03 ` Andre Spiegel
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 90+ messages in thread
From: Andre Spiegel @ 2002-07-22 19:01 UTC (permalink / raw)
  Cc: handa, emacs-devel

> Handa-san, would you please clean up whatever is wrong with that file
> so that it can save properly once again?

When I visit the ChangeLog, Kai's most recent entry from 2002-07-21
displays with a german sharp 's' (ß), but all of his former entries have
a \337 in place of it.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 17:11 Several serious problems Richard Stallman
  2002-07-22 19:01 ` Andre Spiegel
@ 2002-07-22 19:03 ` Andre Spiegel
  2002-07-23  4:00   ` Richard Stallman
  2002-07-22 19:03 ` Andreas Schwab
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 90+ messages in thread
From: Andre Spiegel @ 2002-07-22 19:03 UTC (permalink / raw)
  Cc: handa, emacs-devel

> This problem prevented me from commiting changes to the file from
> Emacs.  I was able to edit and save the file using
> find-file-literally, but when I tried to commit the changes, C-x v v
> tried to revisit the file non-literally.  I think that is a serious
> bug in VC.  VC should cope with visiting a file literally.
> Andre, would you please fix that?

It is fixed now.  I've installed the patch in vc.el, but haven't made an
entry in the ChangeLog yet, since it still seems corrupted.  Will do so
after it's been cleaned up.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 19:03 ` Andre Spiegel
@ 2002-07-23  4:00   ` Richard Stallman
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-07-23  4:00 UTC (permalink / raw)
  Cc: handa, emacs-devel

Thanks for jumping right on the problem.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 17:11 Several serious problems Richard Stallman
  2002-07-22 19:01 ` Andre Spiegel
  2002-07-22 19:03 ` Andre Spiegel
@ 2002-07-22 19:03 ` Andreas Schwab
  2002-07-23 18:58   ` Richard Stallman
  2002-07-22 19:11 ` Andre Spiegel
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 90+ messages in thread
From: Andreas Schwab @ 2002-07-22 19:03 UTC (permalink / raw)
  Cc: handa, spiegel, savannah-hackers, emacs-devel, dominik

Richard Stallman <rms@gnu.org> writes:

|> I cannot save the file lisp/ChangeLog.  It specifies coding system
|> iso-2022-7bit, but it contains something that cannot be encoded in that
|> coding system.  I don't know any way to find the text that causes the
|> problem; essentially I am helpless.

It was the last commit by Carsten Dominik which broke the file.  I have
now fixed it by visiting as iso-latin-1, fixing the two remaining iso-2022
encoded characters and then saving it again in the right encoding.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 19:03 ` Andreas Schwab
@ 2002-07-23 18:58   ` Richard Stallman
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-07-23 18:58 UTC (permalink / raw)
  Cc: handa, spiegel, savannah-hackers, emacs-devel, dominik

    It was the last commit by Carsten Dominik which broke the file.

Carsten, can you figure out what action it was that broke the file?
Can you find a way to reproduce it (prefereably without checking in
the broken version!)?

We need to figure this out so we can make changes to remove the risk
users will do this.

Andreas, thanks for fixing the file.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 17:11 Several serious problems Richard Stallman
                   ` (2 preceding siblings ...)
  2002-07-22 19:03 ` Andreas Schwab
@ 2002-07-22 19:11 ` Andre Spiegel
  2002-07-23  4:42 ` Karl Eichwalder
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 90+ messages in thread
From: Andre Spiegel @ 2002-07-22 19:11 UTC (permalink / raw)
  Cc: handa, emacs-devel

> This problem prevented me from commiting changes to the file from
> Emacs.  I was able to edit and save the file using
> find-file-literally, but when I tried to commit the changes, C-x v v
> tried to revisit the file non-literally.  I think that is a serious
> bug in VC.  VC should cope with visiting a file literally.
> Andre, would you please fix that?

It is fixed now.  I've installed the patch in vc.el, but haven't made an
entry in the ChangeLog yet, since it still seems corrupted.  Will do so
after it's been cleaned up. 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 17:11 Several serious problems Richard Stallman
                   ` (3 preceding siblings ...)
  2002-07-22 19:11 ` Andre Spiegel
@ 2002-07-23  4:42 ` Karl Eichwalder
  2002-07-24  3:25   ` Richard Stallman
  2002-07-23 13:35 ` Kenichi Handa
  2002-08-09  4:41 ` Stefan Monnier
  6 siblings, 1 reply; 90+ messages in thread
From: Karl Eichwalder @ 2002-07-23  4:42 UTC (permalink / raw)
  Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> We MUST do something to make it easier for users to cope with such a
> situation.  We talked about this a few weeks ago but nothing was done.

Yes, you are right.

As said months ago I hve to fix those files quite often; users don't how
to do it on their own.  Often it's getting even worse: Emacs proposes a
"secure" encoding and when users go for it, all looks well until you
want to process such a file with TeX...

Please add this issue to etc/TODO.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-23  4:42 ` Karl Eichwalder
@ 2002-07-24  3:25   ` Richard Stallman
  2002-07-24  4:43     ` Karl Eichwalder
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-07-24  3:25 UTC (permalink / raw)
  Cc: handa, emacs-devel

      Often it's getting even worse: Emacs proposes a
    "secure" encoding and when users go for it, all looks well until you
    want to process such a file with TeX...

I am not really sure what that means--would you please explain?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-24  3:25   ` Richard Stallman
@ 2002-07-24  4:43     ` Karl Eichwalder
  2002-07-25  3:12       ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Karl Eichwalder @ 2002-07-24  4:43 UTC (permalink / raw)
  Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>       Often it's getting even worse: Emacs proposes a
>     "secure" encoding and when users go for it, all looks well until you
>     want to process such a file with TeX...
>
> I am not really sure what that means--would you please explain?

We discussed the issue several times (e.g. under the subject
"lisp/ChangeLog coding system"); here is a good remark by Stephen
J. Turnbull.  Yes, that's a different from your problem, but it's cause
by the same implementation concept (enabling unification might cure most
of these problems -- thus it's very important to release an Emacs with
this feature, all released Emacs 21.x versions destroy user files at
random...):

From: "Stephen J. Turnbull" <stephen@xemacs.org>
Subject: Re: lisp/ChangeLog coding system
To: "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu>
Cc: Eli Zaretskii <eliz@is.elta.co.il>, emacs-devel@gnu.org
Date: 29 Apr 2002 20:28:55 +0900

>>>>> "Stefan" == Stefan Monnier <monnier+gnu/emacs@rum.cs.yale.edu> writes:

    >> One aspect is making better guesses about desired coding
    >> systems.

    Stefan> I'm not sure what kind of improvements you're thinking
    Stefan> about.

Well, in the version (mid-January, maybe?) of GNU Emacs I have, when I
tried saving a buffer with mixed ascii, latin-1, and latin-2 in it, it
gave me an abominably long list of coding systems including mule
internal, all the -with-esc systems, and iso-2022-jp-2.  But all of
the characters used in the buffer are in ISO-8859-2, it's just Mule
making false distinctions.

At the very least, the defaults in Emacs should be to identify
identical characters (eg, those from the Latin-## subsets) and to
distinguish those where unification is controversial (the Han
ideographs).

    Stefan> non-MIME coding-systems should be in the "unlikely" list, tho.

There is no unique "the unlikely list".

For example, if I were Croatian, I probably would want the buffer
described above saved in ISO-8859-2 without being asked, but a German
would probably want to save it in UTF-8 (or maybe ISO-2022-7 if she
were an Emacs developer), or be queried, defaulting to ISO-8859-2.
And some of the "universal" coding systems (UTF-32, mule internal, all
the -with-esc systems) should probably not even be offered to most
users; they should have to ask for them by name.  But people with
special needs should be able to configure them for regular use.

And what's a "non-MIME coding system"?  AFAIK MIME has nothing to do with
coding systems except that the notation "the preferred MIME name" is a
useful convention.  But KOI8-R and all the Windows-125x sets are MIME
registered.

    Stefan> Looking at the README, I have the impression that most of
    Stefan> the functionality is already part of the Emacs CVS code
    Stefan> (mostly thanks to Dave's ucs-tables.el).  Someone should
    Stefan> try and figure out the details.

As for most functionality being in Emacs, yes, that's why I said I'd
help refactor; relative to ucs-tables.el the contribution is all UI.
My duplication[1] of ucs-tables is straightforward, not terribly
efficient code; all the meat is devoted to the question of "how do we
know which coding systems to offer the user".  Specifically I address
the issues of preferred unibyte systems and preferred universal
systems described above.

Footnotes: 
[1]  XEmacs 21.5 has built-in support for Unicode.  The UCS tables are
loaded at startup from (a local copy of) the Unicode Consortium
tables, and an API is provided to reload if desirable.  The code
predates the release of Emacs 21, and so is different from
ucs-tables.el, unfortunately.  The duplicative parts are for 21.4.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-24  4:43     ` Karl Eichwalder
@ 2002-07-25  3:12       ` Richard Stallman
  2002-07-25  3:24         ` Karl Eichwalder
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-07-25  3:12 UTC (permalink / raw)
  Cc: handa, emacs-devel

    >       Often it's getting even worse: Emacs proposes a
    >     "secure" encoding and when users go for it, all looks well until you
    >     want to process such a file with TeX...
    >
    > I am not really sure what that means--would you please explain?

    We discussed the issue several times (e.g. under the subject
    "lisp/ChangeLog coding system"); 

I did not recognize the issue because you said "a 'secure' encoding"
and that is not a term we normally use.

    Well, in the version (mid-January, maybe?) of GNU Emacs I have, when I
    tried saving a buffer with mixed ascii, latin-1, and latin-2 in it, it
    gave me an abominably long list of coding systems including mule
    internal, all the -with-esc systems, and iso-2022-jp-2.  But all of
    the characters used in the buffer are in ISO-8859-2, it's just Mule
    making false distinctions.

The current development version of Emacs enables
unify-8859-on-encoding-mode; does that solve this problem?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-25  3:12       ` Richard Stallman
@ 2002-07-25  3:24         ` Karl Eichwalder
  2002-07-26 15:35           ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Karl Eichwalder @ 2002-07-25  3:24 UTC (permalink / raw)
  Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> I did not recognize the issue because you said "a 'secure' encoding"
> and that is not a term we normally use.

I thought that is were the Emacs wording.  Sorry.

> The current development version of Emacs enables
> unify-8859-on-encoding-mode; does that solve this problem?

Yes, that helps a lot.  I must go into the RC branch, please, to make
it available to the public.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-25  3:24         ` Karl Eichwalder
@ 2002-07-26 15:35           ` Richard Stallman
  2002-07-27  3:19             ` Karl Eichwalder
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-07-26 15:35 UTC (permalink / raw)
  Cc: handa, emacs-devel

    Yes, that helps a lot.  I must go into the RC branch, please, to make
    it available to the public.

Have we already considered this possibility?  I can't remember,
but chances are we would have considered it.  It might depend
on too many other changes to be easy to put into RC.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-26 15:35           ` Richard Stallman
@ 2002-07-27  3:19             ` Karl Eichwalder
  2002-07-29  1:12               ` Richard Stallman
  2002-08-09  7:42               ` Stefan Monnier
  0 siblings, 2 replies; 90+ messages in thread
From: Karl Eichwalder @ 2002-07-27  3:19 UTC (permalink / raw)
  Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     Yes, that helps a lot.  I must go into the RC branch, please, to make
>     it available to the public.
>
> It might depend on too many other changes to be easy to put into RC.

Since such a patch would prevent file corruptions from happening it's
worth all effort.  IIRC, the reason not to install the unification
feature was: "it isn't tested enough".  Of course, this argument isn't
valid since we need a solution for a known problem -- users already
suffering too long.

Without the unification feature I cannot recommend Emacs 21.x to
european users having to deal with latin1 and latin9 encodings.  At the
moment, they are better served using Emacs from the CVS trunk.

Thanks for considering the issue and for your answer!

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-27  3:19             ` Karl Eichwalder
@ 2002-07-29  1:12               ` Richard Stallman
  2002-07-29 14:32                 ` Karl Eichwalder
  2002-08-09  7:42               ` Stefan Monnier
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-07-29  1:12 UTC (permalink / raw)
  Cc: handa, emacs-devel

Could you make a patch that installs in the RC branch and that works
for you?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-29  1:12               ` Richard Stallman
@ 2002-07-29 14:32                 ` Karl Eichwalder
  2002-07-30  1:00                   ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Karl Eichwalder @ 2002-07-29 14:32 UTC (permalink / raw)
  Cc: handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Could you make a patch that installs in the RC branch and that works
> for you?

I fear that's too complicate for me.

On 21.1 I installed the files Dave Love posted; when Dave's
enhancements were added to the CVS HEAD I switch to the CVS HEAD
version (and forgot all about the release branch).

Maybe the one who installed Dave's files on the trunck can do the same
on the release branch?

I guess it happened here to the HEAD:

    2001-12-07  Dave Love  <fx@gnu.org>

and later unification was enabled by default.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-29 14:32                 ` Karl Eichwalder
@ 2002-07-30  1:00                   ` Richard Stallman
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-07-30  1:00 UTC (permalink / raw)
  Cc: handa, emacs-devel

    Maybe the one who installed Dave's files on the trunck can do the same
    on the release branch?

I don't know who that was or whether he will do this.
Anyone who would like to make this happen, I invite to work on it.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-27  3:19             ` Karl Eichwalder
  2002-07-29  1:12               ` Richard Stallman
@ 2002-08-09  7:42               ` Stefan Monnier
  2002-08-09 16:08                 ` Karl Eichwalder
  2002-08-10 17:16                 ` Richard Stallman
  1 sibling, 2 replies; 90+ messages in thread
From: Stefan Monnier @ 2002-08-09  7:42 UTC (permalink / raw)
  Cc: rms, handa, emacs-devel

> >     Yes, that helps a lot.  I must go into the RC branch, please, to make
> >     it available to the public.
> >
> > It might depend on too many other changes to be easy to put into RC.
> 
> Since such a patch would prevent file corruptions from happening it's
> worth all effort.  IIRC, the reason not to install the unification
> feature was: "it isn't tested enough".  Of course, this argument isn't
> valid since we need a solution for a known problem -- users already
> suffering too long.

ucs-tables is installed in the RC branch and will thus be part of Emacs-21.3.
It is not turned on by default, tho.  I think it's safe to turn on
unify-8859-on-encoding-mode (as is done on the trunk), but I'll let
others judge.
After all, it's supposed to be a bug-fix release and this is
not quite a bug-fix in that things work as designed (it's just
that the design doesn't do what the user wants).


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-09  7:42               ` Stefan Monnier
@ 2002-08-09 16:08                 ` Karl Eichwalder
  2002-08-10 17:16                 ` Richard Stallman
  1 sibling, 0 replies; 90+ messages in thread
From: Karl Eichwalder @ 2002-08-09 16:08 UTC (permalink / raw)
  Cc: emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> ucs-tables is installed in the RC branch and will thus be part of
> Emacs-21.3.

Since 2002-07-11, great!  And it is even mentioned in NEWS.

Just today I started to switch to the RC branch; now I'll use it for my
daily work.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-09  7:42               ` Stefan Monnier
  2002-08-09 16:08                 ` Karl Eichwalder
@ 2002-08-10 17:16                 ` Richard Stallman
  2002-08-12 16:20                   ` Stefan Monnier
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-10 17:16 UTC (permalink / raw)
  Cc: keichwa, handa, emacs-devel

    ucs-tables is installed in the RC branch and will thus be part of Emacs-21.3.
    It is not turned on by default, tho.  I think it's safe to turn on
    unify-8859-on-encoding-mode (as is done on the trunk), but I'll let
    others judge.

I think we should try this.  File corruption is a bug, and if we can
fix it, we should.

Can you or someone show me precisely what change is needed?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-10 17:16                 ` Richard Stallman
@ 2002-08-12 16:20                   ` Stefan Monnier
  2002-08-13  1:48                     ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Stefan Monnier @ 2002-08-12 16:20 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, keichwa, handa, emacs-devel

>     ucs-tables is installed in the RC branch and will thus be part of Emacs-21.3.
>     It is not turned on by default, tho.  I think it's safe to turn on
>     unify-8859-on-encoding-mode (as is done on the trunk), but I'll let
>     others judge.
> 
> I think we should try this.  File corruption is a bug, and if we can
> fix it, we should.
> 
> Can you or someone show me precisely what change is needed?

I think we just need to add a call like

	(load "ucs-tables")
	(unify-8859-on-encoding-mode 1)

to startup.el (and add ucs-tables.el to the list of files that are dumped).


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-12 16:20                   ` Stefan Monnier
@ 2002-08-13  1:48                     ` Richard Stallman
  2002-08-15  2:30                       ` Karl Eichwalder
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-13  1:48 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, keichwa, handa, emacs-devel

    I think we just need to add a call like

	    (load "ucs-tables")
	    (unify-8859-on-encoding-mode 1)

    to startup.el (and add ucs-tables.el to the list of files that are dumped).

Eli, or someone else, can you try this in RC and see how it works?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-13  1:48                     ` Richard Stallman
@ 2002-08-15  2:30                       ` Karl Eichwalder
  2002-08-15  2:47                         ` Stefan Monnier
  0 siblings, 1 reply; 90+ messages in thread
From: Karl Eichwalder @ 2002-08-15  2:30 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, handa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     I think we just need to add a call like
>
> 	    (load "ucs-tables")
> 	    (unify-8859-on-encoding-mode 1)
>
>     to startup.el (and add ucs-tables.el to the list of files that are
>     dumped).

Excuse my ignorance: do you really mean startup.el?
>
> Eli, or someone else, can you try this in RC and see how it works?

ATM, I'm running the appended patch without problems.  I guess, it's a
know limitation that unification of characters different from the
latin-1 set, isn't supported by the RC branch?

I can unify a-umlaut from latin-2; but unification does not take place
for characterslike "LATIN SMALL LETTER L WITH STROKE" (x0142).

Index: src/puresize.h
===================================================================
RCS file: /cvsroot/emacs/emacs/src/puresize.h,v
retrieving revision 1.57.14.1
diff -u -r1.57.14.1 puresize.h
*** src/puresize.h	22 Feb 2002 11:21:04 -0000	1.57.14.1
--- src/puresize.h	15 Aug 2002 02:18:49 -0000
***************
*** 42,48 ****
  #endif
  
  #ifndef BASE_PURESIZE
! #define BASE_PURESIZE (710000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
  #endif
  
  /* Increase BASE_PURESIZE by a ratio depending on the machine's word size.  */
--- 42,48 ----
  #endif
  
  #ifndef BASE_PURESIZE
! #define BASE_PURESIZE (715000 + SYSTEM_PURESIZE_EXTRA + SITELOAD_PURESIZE_EXTRA)
  #endif
  
  /* Increase BASE_PURESIZE by a ratio depending on the machine's word size.  */
Index: lisp/loadup.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/loadup.el,v
retrieving revision 1.113
diff -u -r1.113 loadup.el
*** lisp/loadup.el	15 Jul 2001 16:15:34 -0000	1.113
--- lisp/loadup.el	15 Aug 2002 02:18:49 -0000
***************
*** 106,111 ****
--- 106,115 ----
  (load "language/tibetan")
  (load "language/vietnamese")
  (load "language/misc-lang")
+ (load "international/ucs-tables")
+ (unify-8859-on-encoding-mode 1)
+ ;; (ucs-unify-8859 'encode-only)
+ 
  (update-coding-systems-internal)
  
  (load "indent")

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-15  2:30                       ` Karl Eichwalder
@ 2002-08-15  2:47                         ` Stefan Monnier
  2002-08-15  5:31                           ` Karl Eichwalder
  0 siblings, 1 reply; 90+ messages in thread
From: Stefan Monnier @ 2002-08-15  2:47 UTC (permalink / raw)
  Cc: rms, monnier+gnu/emacs, handa, emacs-devel

> >     I think we just need to add a call like
> >
> > 	    (load "ucs-tables")
> > 	    (unify-8859-on-encoding-mode 1)
> >
> >     to startup.el (and add ucs-tables.el to the list of files that are
> >     dumped).
> 
> Excuse my ignorance: do you really mean startup.el?

Sorry, I meant loadup.el, of course.

> > Eli, or someone else, can you try this in RC and see how it works?
> 
> ATM, I'm running the appended patch without problems.  I guess, it's a
> know limitation that unification of characters different from the
> latin-1 set, isn't supported by the RC branch?

I'm not sure I understand, but I'm pretty sure it's known ;-)
If you mean that a latin-2 char is not the same as a unicode char
(e.g. for searching purposes), then you just need to use
unify-8859-on-decoding-mode as well.  This can't be the default because
it has a few undesirable side-effects
(harmless for the typical user, but annoying for people working on
some Emacs files such as ucs-*.el where we do want to be able
to talk about the difference between a latin-2 and a unicode char).

> I can unify a-umlaut from latin-2; but unification does not take place
> for characterslike "LATIN SMALL LETTER L WITH STROKE" (x0142).

I don't understand what you mean "the unification does not take place".
Please just explain step by step what you did and what you expected
as if we were terminally stupid (this seems necessary when discussing
such things as unification because it has various different meanings in
the context of the current Mule code and it's too often difficult to
know which one we're talking about).

	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-15  2:47                         ` Stefan Monnier
@ 2002-08-15  5:31                           ` Karl Eichwalder
  2002-08-15 15:30                             ` Stefan Monnier
  0 siblings, 1 reply; 90+ messages in thread
From: Karl Eichwalder @ 2002-08-15  5:31 UTC (permalink / raw)
  Cc: rms, handa, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

>> I can unify a-umlaut from latin-2; but unification does not take place
>> for characterslike "LATIN SMALL LETTER L WITH STROKE" (x0142).
>
> I don't understand what you mean "the unification does not take
> place".

Here is a recipe:

Starting from an Latin-1 environment enter:

    Grüß Gott!

C-x C-s
(buffer is latin-1 encoded)

Switch input encoding: C-x RET C-\ latin-2-prefix RET; enter:

    Dobr'y den

("'y" becomes one char, y with accent)

C-x C-s
(buffer stays latin-1 encoded, okay)

Enter:

    Dzie'n dobry!

("'n" becomes one char, n with accent, not available in Latin-1)

C-x C-s
Emacs proposes iso-8859-2, okay, but I would have preferred UTF-8.

C-x RET C-\ TeX RET; enter:

    \euro

("\euro becomes one char, the euro symbol, missing from latin-2)

Emacs (RC) isn't able to unify the buffer to UTF-8 (it proposes
"x-ctext" etc.); but Emacs (trunk version) can save the buffer UTF-8
encoded.  Hope this helps.

-- 
ke@suse.de (work) / keichwa@gmx.net (home):              |
http://www.suse.de/~ke/                                  |      ,__o
Free Translation Project:                                |    _-\_<,
http://www.iro.umontreal.ca/contrib/po/HTML/             |   (*)/'(*)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-15  5:31                           ` Karl Eichwalder
@ 2002-08-15 15:30                             ` Stefan Monnier
  2002-08-15 17:33                               ` Dave Love
  0 siblings, 1 reply; 90+ messages in thread
From: Stefan Monnier @ 2002-08-15 15:30 UTC (permalink / raw)
  Cc: Stefan Monnier, rms, handa, emacs-devel, fx

> >> I can unify a-umlaut from latin-2; but unification does not take place
> >> for characterslike "LATIN SMALL LETTER L WITH STROKE" (x0142).
> >
> > I don't understand what you mean "the unification does not take
> > place".
[...]
> Emacs (RC) isn't able to unify the buffer to UTF-8 (it proposes
> "x-ctext" etc.); but Emacs (trunk version) can save the buffer UTF-8
> encoded.  Hope this helps.

Indeed, the safe-charsets property of the utf-8 coding-system has not been
updated to list the extra charsets it can now encode.
In the trunk utf-8.el says:

 '((safe-charsets
    ascii
    eight-bit-control
    eight-bit-graphic
    latin-iso8859-1
    latin-iso8859-15
    latin-iso8859-14
    latin-iso8859-9
    hebrew-iso8859-8
    greek-iso8859-7
    cyrillic-iso8859-5
    latin-iso8859-4
    latin-iso8859-3
    latin-iso8859-2
    vietnamese-viscii-lower
    vietnamese-viscii-upper
    thai-tis620
    ipa
    ethiopic
    indian-is13194
    katakana-jisx0201
    chinese-sisheng
    lao
    mule-unicode-0100-24ff
    mule-unicode-2500-33ff
    mule-unicode-e000-ffff)

where in the RC branch it only says

 '((safe-charsets
    ascii
    eight-bit-control
    eight-bit-graphic
    latin-iso8859-1
    mule-unicode-0100-24ff
    mule-unicode-2500-33ff
    mule-unicode-e000-ffff)

And turning on unify-8859-on-encoding-mode doesn't update the corresponding
info either.
I think Dave or Handa would now better how to fix that (whether
unify-8859-on-encoding-mode should change the safe-charsets or whether
it should simply always include the new charsets and load ucs-tables
when needed.  And also which charsets should be added).
Thank you for pointing it out.


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-15 15:30                             ` Stefan Monnier
@ 2002-08-15 17:33                               ` Dave Love
  0 siblings, 0 replies; 90+ messages in thread
From: Dave Love @ 2002-08-15 17:33 UTC (permalink / raw)
  Cc: Karl Eichwalder, rms, handa, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> Indeed, the safe-charsets property of the utf-8 coding-system has not been
> updated to list the extra charsets it can now encode.

I hope whatever's been changed has been properly tested if it's on the
release branch.  Please get handa to check it if he hasn't already.

> I think Dave or Handa would now better how to fix that (whether
> unify-8859-on-encoding-mode should change the safe-charsets or whether
> it should simply always include the new charsets and load ucs-tables
> when needed.  And also which charsets should be added).

Whoever changed it should sort it out.

[Actually the stuff on the trunk should really use the encoding
translation table to set `safe-chars', which would need to be
re-registered if it changed, assuming that utf-8.el is how I left it.
However, the default does encode the listed charsets completely and
was unaffected by `unify-8859-on-encoding-mode' -- it deals with more
than 8859 anyhow.]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 17:11 Several serious problems Richard Stallman
                   ` (4 preceding siblings ...)
  2002-07-23  4:42 ` Karl Eichwalder
@ 2002-07-23 13:35 ` Kenichi Handa
  2002-07-23 13:52   ` Alan Shutko
  2002-07-24  3:25   ` Richard Stallman
  2002-08-09  4:41 ` Stefan Monnier
  6 siblings, 2 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-07-23 13:35 UTC (permalink / raw)
  Cc: spiegel, savannah-hackers, emacs-devel

In article <200207221711.g6MHBZo02496@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes:

> I cannot save the file lisp/ChangeLog.  It specifies coding system
> iso-2022-7bit, but it contains something that cannot be encoded in that
> coding system.  

It seem that this problem was already fixed.  As I also
found one unnecessary mule-unicode-0100-24ff char, I deleted
it.

> I don't know any way to find the text that causes the
> problem; essentially I am helpless.

At least, (find-charset-region 1 (point-max)) will give you
some information.  If the returned value contains a
suspicious charset, we can search it (if it's not
eight-bit-xxx) by:
	(re-search-forward "[%c-%c]"
			   (make-char CHARSET 32 32) 
			   (make-char CHARSET 127 127))
To search for eight-bit-control:
	(re-search-forward "[\200-\237]")
To search for eight-bit-graphic:
	(re-search-forward (string-as-multibyte "[\240-\377]"))
It's not sophisticated.  :-(

> We MUST do something to make it easier for users to cope with such a
> situation.  We talked about this a few weeks ago but nothing was done.
> Perhaps we could add a command which simply scans forward for the next
> run of characters that can't be saved in the specified coding system.
> The message you get in that situation could tell you about this
> command.  This would be a powerful solution, since you could easily
> find all the problems, not just the first one.  Highlighting all of
> them would also be a useful thing to do.

Do you mean a command something like this?

(defun check-coding-system-region (from to coding-system &optional max-num)
  "Check if the text after point is encodable by the specified coding system.
When called from a program, takes three arguments:
CODING-SYSTEM, FROM, and TO.  START and END are buffer positions.
Value is a list of positions of characters that are not encodable by
CODING-SYSTEM.
Optional 4th argument MAX-NUM, if non-nil, limits the length of
returned list.  By default, there's no limit."
  (interactive (list (point)
		     (point-max)
		     (read-non-nil-coding-system "Coding-system: ")
		     1))
  (check-coding-system coding-system)
  (or (and coding-system
	   (integerp (coding-system-type coding-system)))
      (error "Invalid coding system to check: %s" coding-system))
  (let ((safe-chars (coding-system-get coding-system 'safe-chars))
	(positions)
	(n 0))
    (save-excursion
      (save-restriction
	(narrow-to-region from to)
	(goto-char (point-min))
	(or max-num
	    (setq max-num (- (point-max) (point-min))))
	(if (eq safe-chars t)
	    (let ((re (string-as-multibyte "[\200-\237\240-\377]")))
	      (while (and (< n max-num) (re-search-forward re nil t))
		(setq positions (cons (1- (point)) positions)
		      n (1+ n))))
	  (while (and (< n max-num) (re-search-forward "[^\000-\177]" nil t))
	    (or (aref safe-chars (preceding-char))
		(setq positions (cons (1- (point)) positions)
		      n (1+ n)))))))
    (if (interactive-p)
	(if (not positions)
	    (message "All characters are encodable by %s" coding-system)
	  (goto-char (car positions))
	  (error "This character can't be encoded by %s" coding-system))
      (setq positions (nreverse positions)))))

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-23 13:35 ` Kenichi Handa
@ 2002-07-23 13:52   ` Alan Shutko
  2002-07-24  3:25     ` Richard Stallman
  2002-07-24  3:25   ` Richard Stallman
  1 sibling, 1 reply; 90+ messages in thread
From: Alan Shutko @ 2002-07-23 13:52 UTC (permalink / raw)
  Cc: rms, spiegel, savannah-hackers, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> It seem that this problem was already fixed.  As I also
> found one unnecessary mule-unicode-0100-24ff char, I deleted
> it.

I took a quick look, and I think these are the commits that didn't
make it into the ChangeLog:

RCS file: /cvsroot/emacs/emacs/lisp/cus-start.el,v
total revisions: 55;	selected revisions: 1
description:
Add customization information for intrinsics.
----------------------------
revision 1.51
date: 2002/07/22 15:22:49;  author: rms;  state: Exp;  lines: +1 -0
(double-click-fuzz): Added.

=============================================================================

RCS file: /cvsroot/emacs/emacs/lisp/vc.el,v
total revisions: 341;	selected revisions: 1
description:
;;; vc.el --- drive a version-control system from within Emacs
----------------------------
revision 1.335
date: 2002/07/22 18:52:04;  author: spiegel;  state: Exp;  lines: +7 -6
(vc-next-action-on-file): Preserve find-file-literally.

=============================================================================

RCS file: /cvsroot/emacs/emacs/lisp/calendar/cal-hebrew.el,v
total revisions: 13;	selected revisions: 1
description:
----------------------------
revision 1.13
date: 2002/07/22 15:31:13;  author: rms;  state: Exp;  lines: +94 -77
(diary-omer, diary-yahrzeit, diary-rosh-hodesh, diary-parasha, diary-parasha):
Add optional MARK parameter, specifying what face or character to use
in the calendar display.  These will now return (MARK . ENTRY).

=============================================================================

RCS file: /cvsroot/emacs/emacs/lisp/calendar/diary-lib.el,v
total revisions: 55;	selected revisions: 1
description:
----------------------------
revision 1.55
date: 2002/07/22 15:32:00;  author: rms;  state: Exp;  lines: +96 -89
(mark-sexp-diary-entries): Retrieve mark
from diary-sexp-entry and pass it to mark-visible-calendar-date.
(list-sexp-diary-entries): Update doc string for new docs for ....
If diary-sexp-entry returns a cons, only add the text to the diary list.
(diary-sexp-entry): Allow sexps to return a cons of the form (MARK
. STRING) to specify what face or character mark should be used in
the calendar display.
(diary-date, diary-block, diary-float, diary-anniversary)
(diary-cyclic): Add optional MARK parameter, specifying what face
or character to use in the calendar display.  These will now
return (MARK . ENTRY).

(check-calendar-holidays, diary-iso-date)
(calendar-holiday-list, diary-french-date, diary-mayan-date)
(diary-julian-date, diary-astro-day-number, diary-chinese-date)
(diary-islamic-date, list-islamic-diary-entries)
(mark-islamic-diary-entries, mark-islamic-calendar-date-pattern)
(diary-hebrew-date, diary-omer, diary-yahrzeit, diary-parasha)
(diary-rosh-hodesh, list-hebrew-diary-entries)
(mark-hebrew-diary-entries, mark-hebrew-calendar-date-pattern)
(diary-coptic-date, diary-persian-date, diary-phases-of-moon)
(diary-sunrise-sunset, diary-sabbath-candles):
Remove interactive flag from autoloads.

=============================================================================

RCS file: /cvsroot/emacs/emacs/lisp/calendar/lunar.el,v
total revisions: 18;	selected revisions: 1
description:
;;; lunar.el --- calendar functions for phases of the moon.
----------------------------
revision 1.18
date: 2002/07/22 15:30:43;  author: rms;  state: Exp;  lines: +7 -4
(diary-phases-of-moon): Add optional MARK
parameter, specifying what face or character to use in the
calendar display.  These will now return (MARK . ENTRY).

=============================================================================

RCS file: /cvsroot/emacs/emacs/lisp/calendar/solar.el,v
total revisions: 45;	selected revisions: 1
description:
;;; solar.el --- calendar functions for solar events.
----------------------------
revision 1.44
date: 2002/07/22 15:30:24;  author: rms;  state: Exp;  lines: +8 -4
(diary-sabbath-candles): Add optional MARK
parameter, specifying what face or character to use in the
calendar display.  These will now return (MARK . ENTRY).

=============================================================================

RCS file: /cvsroot/emacs/emacs/lisp/net/browse-url.el,v
total revisions: 24;	selected revisions: 1
description:
----------------------------
revision 1.23
date: 2002/07/22 15:21:41;  author: rms;  state: Exp;  lines: +7 -3
(browse-url-lynx-input-attempts): Use defcustom.
(browse-url-lynx-input-delay): Add custom type and group.

=============================================================================




-- 
Alan Shutko <ats@acm.org> - In a variety of flavors!
I failed as a proof-reader for M & M's.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-23 13:52   ` Alan Shutko
@ 2002-07-24  3:25     ` Richard Stallman
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-07-24  3:25 UTC (permalink / raw)
  Cc: handa, spiegel, savannah-hackers, emacs-devel

    I took a quick look, and I think these are the commits that didn't
    make it into the ChangeLog:

I think these are all included now, thanks.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-23 13:35 ` Kenichi Handa
  2002-07-23 13:52   ` Alan Shutko
@ 2002-07-24  3:25   ` Richard Stallman
  2002-07-24  4:37     ` Kenichi Handa
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-07-24  3:25 UTC (permalink / raw)
  Cc: spiegel, savannah-hackers, emacs-devel

    Do you mean a command something like this?

    (defun check-coding-system-region (from to coding-system &optional max-num)
      "Check if the text after point is encodable by the specified coding system.
    When called from a program, takes three arguments:
    CODING-SYSTEM, FROM, and TO.  START and END are buffer positions.
    Value is a list of positions of characters that are not encodable by
    CODING-SYSTEM.
    Optional 4th argument MAX-NUM, if non-nil, limits the length of
    returned list.  By default, there's no limit."

This could do the internals of the job.  To be useful, it needs a user
interface.

How about if you modify it to make overlays to highlight those characters
instead of returning a list saying where they are?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-24  3:25   ` Richard Stallman
@ 2002-07-24  4:37     ` Kenichi Handa
  2002-07-25  3:12       ` Richard Stallman
  2002-08-09  7:44       ` Several serious problems Stefan Monnier
  0 siblings, 2 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-07-24  4:37 UTC (permalink / raw)
  Cc: spiegel, emacs-devel

In article <200207240325.g6O3PdX04898@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes:
>     Do you mean a command something like this?
>     (defun check-coding-system-region (from to coding-system &optional max-num)
>       "Check if the text after point is encodable by the specified coding system.
>     When called from a program, takes three arguments:
>     CODING-SYSTEM, FROM, and TO.  START and END are buffer positions.
>     Value is a list of positions of characters that are not encodable by
>     CODING-SYSTEM.
>     Optional 4th argument MAX-NUM, if non-nil, limits the length of
>     returned list.  By default, there's no limit."

> This could do the internals of the job.  To be useful, it needs a user
> interface.

Ooops, I forgot to include this sentence in the docstring.

If an unencodable character is found, move point to that character.

So, this function can be used both for an internal job and
for an interactive job (to find the next unencodable
character).

> How about if you modify it to make overlays to highlight those characters
> instead of returning a list saying where they are?

If the specified coding system is totally inappropriate for
the buffer, highlighting them will results in huge amount of
overlays and also it takes long time to finish the job.  If
we limit the number of highlighting, it may give users
incorrect information (i.e. non-highlighted characters seems
to be encodable).  So, I thought just moving point to the
next unencodable character is better.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-24  4:37     ` Kenichi Handa
@ 2002-07-25  3:12       ` Richard Stallman
  2002-07-25  5:53         ` Miles Bader
                           ` (2 more replies)
  2002-08-09  7:44       ` Several serious problems Stefan Monnier
  1 sibling, 3 replies; 90+ messages in thread
From: Richard Stallman @ 2002-07-25  3:12 UTC (permalink / raw)
  Cc: spiegel, emacs-devel

    If the specified coding system is totally inappropriate for
    the buffer, highlighting them will results in huge amount of
    overlays and also it takes long time to finish the job.

That is true.

      If
    we limit the number of highlighting, it may give users
    incorrect information (i.e. non-highlighted characters seems
    to be encodable).

It could highlight the first N runs of such characters, and display a
message saying "Many more unencodable characters found--type WHATEVER
to view them".  WHATEVER could be the same command with a prefix
argument.

What do you think of that?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-25  3:12       ` Richard Stallman
@ 2002-07-25  5:53         ` Miles Bader
  2002-07-26 14:29         ` Francesco Potorti`
  2002-08-11  1:59         ` unencodable-char-position [Re: Several serious problems] Kenichi Handa
  2 siblings, 0 replies; 90+ messages in thread
From: Miles Bader @ 2002-07-25  5:53 UTC (permalink / raw)
  Cc: handa, spiegel, emacs-devel

Richard Stallman <rms@gnu.org> writes:
>     If we limit the number of highlighting, it may give users
>     incorrect information (i.e. non-highlighted characters seems to be
>     encodable).
> 
> It could highlight the first N runs of such characters, and display a
> message saying "Many more unencodable characters found--type WHATEVER
> to view them".  WHATEVER could be the same command with a prefix
> argument.

I'd like something similar to the way isearch works (when highlighting
non-current matches) -- just highlight what's currently displayed and
give the user a chance to jump to the next instance.  [Maybe it could
even use jit-lock-functions or something to allow free movement in the
buffer while still using optimizing display]

-Miles
-- 
Somebody has to do something, and it's just incredibly pathetic that it
has to be us.  -- Jerry Garcia

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-25  3:12       ` Richard Stallman
  2002-07-25  5:53         ` Miles Bader
@ 2002-07-26 14:29         ` Francesco Potorti`
  2002-07-27 18:52           ` Richard Stallman
  2002-08-09  7:43           ` Stefan Monnier
  2002-08-11  1:59         ` unencodable-char-position [Re: Several serious problems] Kenichi Handa
  2 siblings, 2 replies; 90+ messages in thread
From: Francesco Potorti` @ 2002-07-26 14:29 UTC (permalink / raw)
  Cc: handa, spiegel, emacs-devel

Recently, a package called buffer-charset.el was posted to
gnu.emacs-sources.  It uses the machinery of hi-lock to work, and it's
wonderfully simple to use: you just do M-x
show-buffer-charset-characters (or use `C-x w c' is hi-lock-mode is
already active) and you're done.  You are asked what charset you want to
highlight, and if you don't know you just press TAB and choose from the
list.  The offending characters are highlighted.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-26 14:29         ` Francesco Potorti`
@ 2002-07-27 18:52           ` Richard Stallman
  2002-08-09  7:43           ` Stefan Monnier
  1 sibling, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-07-27 18:52 UTC (permalink / raw)
  Cc: handa, spiegel, emacs-devel

    Recently, a package called buffer-charset.el was posted to
    gnu.emacs-sources.  It uses the machinery of hi-lock to work, and it's
    wonderfully simple to use: you just do M-x
    show-buffer-charset-characters (or use `C-x w c' is hi-lock-mode is
    already active) and you're done.  You are asked what charset you want to
    highlight, and if you don't know you just press TAB and choose from the
    list.  The offending characters are highlighted.

This might be useful for some purposes, but it is not the right
interface to be a convenient solution to this particular problem.  The
user knows that the file can't be encoded in a certain coding system
but she does not know which character sets are the problem.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-26 14:29         ` Francesco Potorti`
  2002-07-27 18:52           ` Richard Stallman
@ 2002-08-09  7:43           ` Stefan Monnier
  1 sibling, 0 replies; 90+ messages in thread
From: Stefan Monnier @ 2002-08-09  7:43 UTC (permalink / raw)
  Cc: rms, handa, spiegel, emacs-devel

> Recently, a package called buffer-charset.el was posted to
> gnu.emacs-sources.  It uses the machinery of hi-lock to work, and it's
> wonderfully simple to use: you just do M-x
> show-buffer-charset-characters (or use `C-x w c' is hi-lock-mode is
> already active) and you're done.  You are asked what charset you want to
> highlight, and if you don't know you just press TAB and choose from the
> list.  The offending characters are highlighted.

Charsets are irrelevant (they're only an obscure internal implementation
detail).  Users only care about coding-systems.


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* unencodable-char-position [Re: Several serious problems]
  2002-07-25  3:12       ` Richard Stallman
  2002-07-25  5:53         ` Miles Bader
  2002-07-26 14:29         ` Francesco Potorti`
@ 2002-08-11  1:59         ` Kenichi Handa
  2002-08-12 17:06           ` Richard Stallman
  2002-08-15 17:51           ` Dave Love
  2 siblings, 2 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-08-11  1:59 UTC (permalink / raw)
  Cc: spiegel, emacs-devel, d.love

In article <200207250312.g6P3C9J06653@aztec.santafe.edu>, Richard Stallman <rms@gnu.org> writes:

>     If the specified coding system is totally inappropriate for
>     the buffer, highlighting them will results in huge amount of
>     overlays and also it takes long time to finish the job.

> That is true.

>       If
>     we limit the number of highlighting, it may give users
>     incorrect information (i.e. non-highlighted characters seems
>     to be encodable).

> It could highlight the first N runs of such characters, and display a
> message saying "Many more unencodable characters found--type WHATEVER
> to view them".  WHATEVER could be the same command with a prefix
> argument.

I implemented that and tried on several files.  But, it
seems that such kind of feature is not that helpful.

In the case that the buffer contains many unencodable chars,
usually the specified coding system is wrong, and we must
use a different coding system.  So, it is not that
interesting to know where are the other unencodable
characters.

In the case that the buffer contains a few unencodable
chars, as it's seldam that more than one of them appear in
one window, highlighting the other unencodable chars is not
that useful.

By the way, I've just noticed that Dave has already
installed the function `unencodable-char-position' in
mule-cmds.el and used it in select-safe-coding-system.

That function resembles to check-coding-system-region on
which we are currently discussing.

But, as the docstring says, it's slow.

So, I commited these changes.

(1) Re-implementation of unencodable-char-position in C
    while adding two optional arguments.
----------------------------------------------------------------------
unencodable-char-position is a built-in function.
(unencodable-char-position START END CODING-SYSTEM &optional COUNT STRING)

Return position of first un-encodable character in a region.
START and END specfiy the region and CODING-SYSTEM specifies the
encoding to check.  Return nil if CODING-SYSTEM does encode the region.

If optional 4th argument COUNT is non-nil, it specifies at most how
many un-encodable characters to search.  In this case, the value is a
list of positions.

If optional 5th argument STRING is non-nil, it is a string to search
for un-encodable characters.  In that case, START and END are indexes
to the string.
----------------------------------------------------------------------

(2) New function `search-unencodable-char' for interactive
    use.  It utilizes `unencodable-char-position'.

----------------------------------------------------------------------
(search-unencodable-char CODING-SYSTEM)

Search forward from point for a character that is not encodable.
It asks which coding system to check.
If such a character is found, set point after that character.
Otherwise, don't move point.

When called from a program, the value is a position of the found character,
or nil if all characters are encodable.
----------------------------------------------------------------------

It may be good to bind C-x RET s to this command.

Could someone make this command more user friendly
(e.g. improving messages)?  It is also easy to modify this
funciton to highlight a few more (or windowful) unencodable
characters if you think that is surely helpful.

(3) Make select-safe-coding-system to show (at most 10)
    unencodable characters for each default coding systems
    tried.

Now, if any unencodable chars are found, one can type C-g to
cancel further saving.  As C-g doesn't hide *Warning*
buffer, one can clik on the displayed unencodable chars to
jump to the corresponding position in a buffer.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-11  1:59         ` unencodable-char-position [Re: Several serious problems] Kenichi Handa
@ 2002-08-12 17:06           ` Richard Stallman
  2002-08-12 17:15             ` Stefan Monnier
  2002-08-15 17:51           ` Dave Love
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-12 17:06 UTC (permalink / raw)
  Cc: spiegel, emacs-devel, d.love

    I implemented that and tried on several files.  But, it
    seems that such kind of feature is not that helpful.

    In the case that the buffer contains many unencodable chars,
    usually the specified coding system is wrong, and we must
    use a different coding system.  So, it is not that
    interesting to know where are the other unencodable
    characters.

    In the case that the buffer contains a few unencodable
    chars, as it's seldam that more than one of them appear in
    one window, highlighting the other unencodable chars is not
    that useful.

These seem like persuasive arguments; it sounds good.

How can I make a test case to observe it functioning?
I tried but I couldn't get encoding to "fail".

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-12 17:06           ` Richard Stallman
@ 2002-08-12 17:15             ` Stefan Monnier
  2002-08-13  0:37               ` Kenichi Handa
  0 siblings, 1 reply; 90+ messages in thread
From: Stefan Monnier @ 2002-08-12 17:15 UTC (permalink / raw)
  Cc: handa, spiegel, emacs-devel, d.love

>     I implemented that and tried on several files.  But, it
>     seems that such kind of feature is not that helpful.
> 
>     In the case that the buffer contains many unencodable chars,
>     usually the specified coding system is wrong, and we must
>     use a different coding system.  So, it is not that
>     interesting to know where are the other unencodable
>     characters.
> 
>     In the case that the buffer contains a few unencodable
>     chars, as it's seldam that more than one of them appear in
>     one window, highlighting the other unencodable chars is not
>     that useful.
> 
> These seem like persuasive arguments; it sounds good.
> 
> How can I make a test case to observe it functioning?
> I tried but I couldn't get encoding to "fail".

Try to save the HELLO file in utf-8.


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-12 17:15             ` Stefan Monnier
@ 2002-08-13  0:37               ` Kenichi Handa
  2002-08-13 22:47                 ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Kenichi Handa @ 2002-08-13  0:37 UTC (permalink / raw)
  Cc: rms, spiegel, emacs-devel, d.love

In article <200208121715.g7CHFrw29709@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>  How can I make a test case to observe it functioning?
>>  I tried but I couldn't get encoding to "fail".

> Try to save the HELLO file in utf-8.

Yes.  For instance:
	C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-13  0:37               ` Kenichi Handa
@ 2002-08-13 22:47                 ` Richard Stallman
  2002-08-14  0:20                   ` Kenichi Handa
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-13 22:47 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love

    Yes.  For instance:
	    C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET

Yes, that indeed runs the new code.

What I tried was C-h h C-x RET c utf-8 RET C-x C-w ~/temp RET.
But it "worked"--it saved the file without complaint.

Is this a bug?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-13 22:47                 ` Richard Stallman
@ 2002-08-14  0:20                   ` Kenichi Handa
  2002-08-14 23:13                     ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Kenichi Handa @ 2002-08-14  0:20 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love

In article <200208132247.g7DMlHT07283@wijiji.santafe.edu>, Richard Stallman <rms@gnu.org> writes:
>     Yes.  For instance:
> 	    C-h h C-x RET f utf-8 RET C-x C-w ~/temp RET

> Yes, that indeed runs the new code.

> What I tried was C-h h C-x RET c utf-8 RET C-x C-w ~/temp RET.
> But it "worked"--it saved the file without complaint.

But, I think it broke some part of the file.

> Is this a bug?

No, it is an intentional behaviour.  C-x RET c _CODING_ RET
means that "I'll take all responsibility, so just accept
_CODING_, don't make any warnings!".

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-14  0:20                   ` Kenichi Handa
@ 2002-08-14 23:13                     ` Richard Stallman
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-08-14 23:13 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, spiegel, emacs-devel, d.love

    No, it is an intentional behaviour.  C-x RET c _CODING_ RET
    means that "I'll take all responsibility, so just accept
    _CODING_, don't make any warnings!".

Thanks.  I explained this in the manual.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-11  1:59         ` unencodable-char-position [Re: Several serious problems] Kenichi Handa
  2002-08-12 17:06           ` Richard Stallman
@ 2002-08-15 17:51           ` Dave Love
  2002-08-19  5:04             ` Kenichi Handa
  1 sibling, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-15 17:51 UTC (permalink / raw)
  Cc: rms, spiegel, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> I implemented that and tried on several files.  But, it
> seems that such kind of feature is not that helpful.

If I understand what's being talked about, I agree.  Normally the
first problematic character tells me what's up.

> By the way, I've just noticed that Dave has already
> installed the function `unencodable-char-position' in
> mule-cmds.el and used it in select-safe-coding-system.
> 
> That function resembles to check-coding-system-region on
> which we are currently discussing.

I'm sorry if that was wrong.  I thought it was supposed to have been
installed months ago, and I was trying to clear out the Mule changes
I've had hanging around after rms was on about it.  I thought that was
all stuff you approved of, or `obviously right'.

> But, as the docstring says, it's slow.

[It seemed fast enough for that use since it's only executed
occasionally, when there's actually a problem.  It was probably
developed on a P133.]

By the way, aborting in select-safe-coding-system can have bad effects
when you're using VC.  As far as I remember, it actually loses your
edits in some circumstance.  I haven't had time to look at the
problem.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-15 17:51           ` Dave Love
@ 2002-08-19  5:04             ` Kenichi Handa
  2002-08-29 22:52               ` Dave Love
  0 siblings, 1 reply; 90+ messages in thread
From: Kenichi Handa @ 2002-08-19  5:04 UTC (permalink / raw)
  Cc: rms, spiegel, emacs-devel

In article <rzqvg6cm2mq.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
>>  By the way, I've just noticed that Dave has already
>>  installed the function `unencodable-char-position' in
>>  mule-cmds.el and used it in select-safe-coding-system.
>>  
>>  That function resembles to check-coding-system-region on
>>  which we are currently discussing.

> I'm sorry if that was wrong.  I thought it was supposed to have been
> installed months ago, and I was trying to clear out the Mule changes
> I've had hanging around after rms was on about it.  I thought that was
> all stuff you approved of, or `obviously right'.

You don't have to be sorry.  Perhaps, I've overlooked that
part when you asked about various changes long ago.

>>  But, as the docstring says, it's slow.

> [It seemed fast enough for that use since it's only executed
> occasionally, when there's actually a problem.  It was probably
> developed on a P133.]

Ah, yes.  Currently, it is used only interactively, thus the
speed is not that problem.  But, I'm thinking about using
unencodable-char-position to check if default coding systems
can encode the region or not in select-safe-coding-system
(not yet done).  I think such a change makes
select-safe-coding-system runs much faster.

> By the way, aborting in select-safe-coding-system can have bad effects
> when you're using VC.  As far as I remember, it actually loses your
> edits in some circumstance.  I haven't had time to look at the
> problem.

I noticed that too.  But, I also don't have time to fix it
for the moment.  I've never read the code of vc.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-19  5:04             ` Kenichi Handa
@ 2002-08-29 22:52               ` Dave Love
  2002-08-30  6:53                 ` Andre Spiegel
  0 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-29 22:52 UTC (permalink / raw)
  Cc: rms, spiegel, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> > By the way, aborting in select-safe-coding-system can have bad effects
> > when you're using VC.  As far as I remember, it actually loses your
> > edits in some circumstance.  I haven't had time to look at the
> > problem.
> 
> I noticed that too.  But, I also don't have time to fix it
> for the moment.  I've never read the code of vc.

Is someone going to fix this?  (I have worked on VC, but ...)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: unencodable-char-position [Re: Several serious problems]
  2002-08-29 22:52               ` Dave Love
@ 2002-08-30  6:53                 ` Andre Spiegel
  0 siblings, 0 replies; 90+ messages in thread
From: Andre Spiegel @ 2002-08-30  6:53 UTC (permalink / raw)
  Cc: Kenichi Handa, rms, emacs-devel

On Fri, 2002-08-30 at 00:52, Dave Love wrote:
> Kenichi Handa <handa@etl.go.jp> writes:
> 
> > > By the way, aborting in select-safe-coding-system can have bad effects
> > > when you're using VC.  As far as I remember, it actually loses your
> > > edits in some circumstance.  I haven't had time to look at the
> > > problem.
> > 
> > I noticed that too.  But, I also don't have time to fix it
> > for the moment.  I've never read the code of vc.
> 
> Is someone going to fix this?  (I have worked on VC, but ...)

I will look into it.  Can someone give me a more detailed description of
the circumstances when the problem arises?  Sequence of commands?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-24  4:37     ` Kenichi Handa
  2002-07-25  3:12       ` Richard Stallman
@ 2002-08-09  7:44       ` Stefan Monnier
  2002-08-10 17:16         ` Richard Stallman
  2002-08-12  0:26         ` Kenichi Handa
  1 sibling, 2 replies; 90+ messages in thread
From: Stefan Monnier @ 2002-08-09  7:44 UTC (permalink / raw)
  Cc: rms, spiegel, emacs-devel

> If the specified coding system is totally inappropriate for
> the buffer, highlighting them will results in huge amount of
> overlays and also it takes long time to finish the job.  If

That was also my concern, but I heard that Emacs-20 did just that.


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-09  7:44       ` Several serious problems Stefan Monnier
@ 2002-08-10 17:16         ` Richard Stallman
  2002-08-12  0:26         ` Kenichi Handa
  1 sibling, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-08-10 17:16 UTC (permalink / raw)
  Cc: handa, spiegel, emacs-devel

    > If the specified coding system is totally inappropriate for
    > the buffer, highlighting them will results in huge amount of
    > overlays and also it takes long time to finish the job.  If

    That was also my concern, but I heard that Emacs-20 did just that.

If empirically it works well enough, there's no reason to object.
Did anyone ever try this in Emacs 20 on a substantial file
with many unsuitable characters?  If not, would you like to try that
now and see how bad it was?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-09  7:44       ` Several serious problems Stefan Monnier
  2002-08-10 17:16         ` Richard Stallman
@ 2002-08-12  0:26         ` Kenichi Handa
  1 sibling, 0 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-08-12  0:26 UTC (permalink / raw)
  Cc: rms, spiegel, emacs-devel

In article <200208090744.g797irF11925@rum.cs.yale.edu>, "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>  If the specified coding system is totally inappropriate for
>>  the buffer, highlighting them will results in huge amount of
>>  overlays and also it takes long time to finish the job.  If

> That was also my concern, but I heard that Emacs-20 did just that.

Emacs 20 highlighted at most 256 such characters.  And, in
Emacs 20, detecting unencodable characters was easier
because there's no coding system that can encode a part of a
charset.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-07-22 17:11 Several serious problems Richard Stallman
                   ` (5 preceding siblings ...)
  2002-07-23 13:35 ` Kenichi Handa
@ 2002-08-09  4:41 ` Stefan Monnier
  2002-08-15 17:23   ` Dave Love
  6 siblings, 1 reply; 90+ messages in thread
From: Stefan Monnier @ 2002-08-09  4:41 UTC (permalink / raw)
  Cc: emacs-devel, d.love

> I cannot save the file lisp/ChangeLog.  It specifies coding system
> iso-2022-7bit, but it contains something that cannot be encoded in that
> coding system.  I don't know any way to find the text that causes the
> problem; essentially I am helpless.
> 
> Handa-san, would you please clean up whatever is wrong with that file
> so that it can save properly once again?
> 
> We MUST do something to make it easier for users to cope with such a
> situation.  We talked about this a few weeks ago but nothing was done.

Dave Love has code for it (and has posted it here).
I can't check it in, so could someone else take care of it ?


	Stefan "who pleads guilty of delaying this patch"

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-09  4:41 ` Stefan Monnier
@ 2002-08-15 17:23   ` Dave Love
  0 siblings, 0 replies; 90+ messages in thread
From: Dave Love @ 2002-08-15 17:23 UTC (permalink / raw)
  Cc: Richard Stallman, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> > I cannot save the file lisp/ChangeLog.  It specifies coding system
> > iso-2022-7bit, but it contains something that cannot be encoded in that
> > coding system.  I don't know any way to find the text that causes the
> > problem; essentially I am helpless.
> > 
> > Handa-san, would you please clean up whatever is wrong with that file
> > so that it can save properly once again?
> > 
> > We MUST do something to make it easier for users to cope with such a
> > situation.  We talked about this a few weeks ago but nothing was done.
> 
> Dave Love has code for it (and has posted it here).
> I can't check it in, so could someone else take care of it ?
> 
> 
> 	Stefan "who pleads guilty of delaying this patch"

I don't know what that refers to.

I suspect the problem concerns eight-bit-... characters.  If you
search for them, you have to get the multibyteness of the search
string right in a way I always have to look up.  [vc-annotate should
show you what edit was responsible.]

However, I installed code in `select-safe-coding-system' some time ago
which should point to the first offending character when selection
fails.  (As far as I remember, that was supposed to be done long ago,
but never was.)  If the development source doesn't show you the
offending character and advocate C-u C-x =, there's something wrong
with that code.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
@ 2002-08-19  7:48 Kenichi Handa
  2002-08-22 17:08 ` Dave Love
  2002-08-24 12:11 ` Richard Stallman
  0 siblings, 2 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-08-19  7:48 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel

Dave Love <d.love@dl.ac.uk> writes:
> "Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:
>>  Indeed, the safe-charsets property of the utf-8 coding-system has not been
>>  updated to list the extra charsets it can now encode.

> I hope whatever's been changed has been properly tested if it's on the
> release branch.  Please get handa to check it if he hasn't already.

>>  I think Dave or Handa would now better how to fix that (whether
>>  unify-8859-on-encoding-mode should change the safe-charsets or whether
>>  it should simply always include the new charsets and load ucs-tables
>>  when needed.  And also which charsets should be added).

> Whoever changed it should sort it out.

I'm quite confused with the current status of utf-8.el,
ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and
RC.

They differ in many parts (utf-8-subst.el and the necessary
change for that in mule.el and ccl.c don't exist in RC).

It's IMPOSSIBLE for me to figure out what are the correct
behaviour of them.  I've thought that the current codes were
the same one as what Dave had, but the above statement of
Dave's tells that it's not.

Could someone tell me why are they different in HEAD and RC,
and why are they different from what Dave have written?

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-19  7:48 Kenichi Handa
@ 2002-08-22 17:08 ` Dave Love
  2002-08-29 13:25   ` Kenichi Handa
  2002-08-24 12:11 ` Richard Stallman
  1 sibling, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-22 17:08 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> I'm quite confused with the current status of utf-8.el,
> ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and
> RC.

I've been confused too, struggling to maintain several different
versions.

> It's IMPOSSIBLE for me to figure out what are the correct
> behaviour of them.

As far as I know, what's installed in the trunk behaves correctly, but
I'm not using that code and I don't know if I'd hear about real
problems with it (as opposed to imagined problems).  It should all be
things you have said are OK or I'm sure you will think are OK, but I
may have overlooked something.  However, it could use work for CJK, in
particular; there's a fixme in utf-8, and there could be additional
interconversion tables for CJK charsets as well as a way of
customizing the character preferences in utf-8-subst.el, and probably
other things.

> I've thought that the current codes were
> the same one as what Dave had, but the above statement of
> Dave's tells that it's not.

Well, now I check, utf-8.el in the RC branch seems to be as I left it,
which is what rms (I think) told me to do.  As far as I can tell, its
safe-charsets property is correct, and I don't understand what the
complaint is about.  When I couldn't check, I assumed someone had
modified it incorrectly, but there's no sign of that in CVS.

> Could someone tell me why are they different in HEAD and RC,
> and why are they different from what Dave have written?

Most changes aren't in RC since I was only allowed to add (a version
of) ucs-tables, not changing the default behaviour, so people could
turn on (partial) character translation themselves.  It doesn't affect
utf-8 or any other ccl coding systems because they don't use the
translation table (although the useful extra coding systems in
code-pages.el aren't included either, so I think only koi,
alternativnyj and mac-roman are affected).

I think I unilaterally added some other things (a utf-8 language
environment and utf-16.el?) since they addressed somewhat misleading
entries in PROBLEMS and the arguments against the Unicode support are
either demonstrably wrong or spurious IMNSHO.

I'm afraid I've had enough of all this, and I doubt it's worth more
effort anyhow.  Especially after all the FUD about them, the Mule
additions probably won't get used much unless they're the default,
even by i18n people, unfortunately.  It's a pity your good work on
Mule 5 is rather wasted.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-22 17:08 ` Dave Love
@ 2002-08-29 13:25   ` Kenichi Handa
  2002-08-29 17:32     ` Stefan Monnier
                       ` (3 more replies)
  0 siblings, 4 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-08-29 13:25 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel

In article <rzqlm6ybz38.fsf@albion.dl.ac.uk>,
  Dave Love <d.love@dl.ac.uk> writes:
> As far as I know, what's installed in the trunk behaves correctly, but
> I'm not using that code

Why aren't you using that code?  Does it mean that you
changed some of them locally?

> and I don't know if I'd hear about real
> problems with it (as opposed to imagined problems).  It should all be
> things you have said are OK or I'm sure you will think are OK, but I
> may have overlooked something.  However, it could use work for CJK, in
> particular; there's a fixme in utf-8, and there could be additional
> interconversion tables for CJK charsets as well as a way of
> customizing the character preferences in utf-8-subst.el, and probably
> other things.

I noticed those `fixme's.   Yes, it is better to solve all
of them, but, for the moment, I want to concentrate on
fixing the problem of RC.

>>  I've thought that the current codes were
>>  the same one as what Dave had, but the above statement of
>>  Dave's tells that it's not.

> Well, now I check, utf-8.el in the RC branch seems to be as I left it,
> which is what rms (I think) told me to do.  As far as I can tell, its
> safe-charsets property is correct,

The safe-charsets property of utf-8 in RC is this:

ascii eight-bit-control eight-bit-graphic latin-iso8859-1
mule-unicode-0100-24ff mule-unicode-2500-33ff
mule-unicode-e000-ffff ethiopic tibetan thai-tis620
katakana-jisx0201 ipa chinese-sisheng lao
vietnamese-viscii-lower vietnamese-viscii-upper

It doesn't contain latin-iso8859-[23...].

> and I don't understand what the complaint is about.  When
> I couldn't check, I assumed someone had modified it
> incorrectly, but there's no sign of that in CVS.

The complaint is that the coding-system utf-8 can't encode
latin-2 characters in RC even if loadup.el has these lines.

(load "international/ucs-tables")
(ucs-unify-8859 'encode-only)

The reason is, as far as I see, the ccl program
`ccl-encode-mule-utf-8' doesn't have this line at the near
to head.

	   (translate-character ucs-mule-to-mule-unicode r0 r1))

So, even if we setup the translation table
`ucs-mule-to-mule-unicode' at loadup time, it is not used in
utf-8.

>>  Could someone tell me why are they different in HEAD and RC,
>>  and why are they different from what Dave have written?

> Most changes aren't in RC since I was only allowed to add (a version
> of) ucs-tables, not changing the default behaviour, so people could
> turn on (partial) character translation themselves.  It doesn't affect
> utf-8 or any other ccl coding systems because they don't use the
> translation table (although the useful extra coding systems in
> code-pages.el aren't included either, so I think only koi,
> alternativnyj and mac-roman are affected).

Hmmm, I think I realized the situation of RC.  It can unify
charsets between iso-8859-X, but utf-8 can't encode
iso-8859-X (intentionally), correct?

Richard, is it what you asked Dave to install for RC?

I think RC should also allow utf-8 to encode 8859-X
correctly like in HEAD.  I see no harm in it.

> I think I unilaterally added some other things (a utf-8 language
> environment and utf-16.el?) since they addressed somewhat misleading
> entries in PROBLEMS and the arguments against the Unicode support are
> either demonstrably wrong or spurious IMNSHO.

I don't oppose to that.  I found one problem with utf-16.
It seems that utf-16-le/be can handle 8859-X correctly
because of this line in ccl-encode-mule-utf-16-le/be,
      (translate-character ucs-mule-to-mule-unicode r0 r1)
but the safe-charsets property lists only these:
      ascii
      eight-bit-control
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
      mule-unicode-e000-ffff
thus, they can't be regarded as a safe coding system for
them.

> I'm afraid I've had enough of all this,

Yah, you have done the excellent hack!  When I implemented
translation table stuffs, I didn't expect that it can be
used this thoroughly.

> and I doubt it's worth more effort anyhow.  Especially
> after all the FUD about them, the Mule additions probably
> won't get used much unless they're the default, even by
> i18n people, unfortunately.

I thought containing ucs-tables and etc in RC is at least
for making unify-on-encoding the default INCLUDING utf-8.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 13:25   ` Kenichi Handa
@ 2002-08-29 17:32     ` Stefan Monnier
  2002-08-29 23:15       ` Dave Love
  2002-08-30  6:09       ` Richard Stallman
  2002-08-29 23:09     ` Dave Love
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 90+ messages in thread
From: Stefan Monnier @ 2002-08-29 17:32 UTC (permalink / raw)
  Cc: d.love, monnier+gnu/emacs, keichwa, rms, emacs-devel

> I noticed those `fixme's.   Yes, it is better to solve all
> of them, but, for the moment, I want to concentrate on
> fixing the problem of RC.

I think the only "problem" in RC is that latin-N chars cannot
be saved to utf-8.

> >>  I've thought that the current codes were
> >>  the same one as what Dave had, but the above statement of
> >>  Dave's tells that it's not.
> 
> > Well, now I check, utf-8.el in the RC branch seems to be as I left it,
> > which is what rms (I think) told me to do.  As far as I can tell, its
> > safe-charsets property is correct,
> 
> The safe-charsets property of utf-8 in RC is this:
> 
> ascii eight-bit-control eight-bit-graphic latin-iso8859-1
> mule-unicode-0100-24ff mule-unicode-2500-33ff
> mule-unicode-e000-ffff ethiopic tibetan thai-tis620
> katakana-jisx0201 ipa chinese-sisheng lao
> vietnamese-viscii-lower vietnamese-viscii-upper
> 
> It doesn't contain latin-iso8859-[23...].

And it's correct as long as ucs-tables is not loaded.
And since RC is "only bug-fixes" it's important that we don't make
any change outside of ucs-tables.el except for bug-fixes, so
we can't just change the safe-charsets property.  I.e.
we have to either accept the current situation or else
change the safe-charsets property of utf-8 from ucs-tables.el.
Unless RMS accepts to make changes to utf-8.el which are not
bug-fixes but improvements to the utf-8 support.

On the trunk it's easier since we just changed the safe-charsets
property directly in utf-8.el and made sure that ucs-tables.el
is loaded when necessary.

	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 17:32     ` Stefan Monnier
@ 2002-08-29 23:15       ` Dave Love
  2002-08-30 14:36         ` Stefan Monnier
  2002-08-30  6:09       ` Richard Stallman
  1 sibling, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-29 23:15 UTC (permalink / raw)
  Cc: Kenichi Handa, keichwa, rms, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> I think the only "problem" in RC is that latin-N chars cannot
> be saved to utf-8.

In that case, I wasted considerable time...  I know, for instance,
that people whinge that keyboard input doesn't conform to the buffer
file coding system, and that other coding systems &c are needed --
windows-1252 probably most importantly.

> > It doesn't contain latin-iso8859-[23...].
> 
> And it's correct as long as ucs-tables is not loaded.

What handa showed isn't correct.  The utf-8 coding system on the RC
branch doesn't encode lao, for instance.

> And since RC is "only bug-fixes"

For some value of `bug fix'...

> it's important that we don't make
> any change outside of ucs-tables.el except for bug-fixes, so
> we can't just change the safe-charsets property.

I don't understand.  Of course you can't just change safe-charsets --
it has to reflect what the coding system actually encodes.

> On the trunk it's easier since we just changed the safe-charsets
> property directly in utf-8.el and made sure that ucs-tables.el
> is loaded when necessary.

Last I looked, it was preloaded.  I don't see why it shouldn't be, and
it would have been designed to be if I hadn't had to write it just as
an add-on initially.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 23:15       ` Dave Love
@ 2002-08-30 14:36         ` Stefan Monnier
  2002-09-04 17:23           ` Dave Love
  0 siblings, 1 reply; 90+ messages in thread
From: Stefan Monnier @ 2002-08-30 14:36 UTC (permalink / raw)
  Cc: Stefan Monnier, Kenichi Handa, keichwa, rms, emacs-devel

> > I think the only "problem" in RC is that latin-N chars cannot
> > be saved to utf-8.
> 
> In that case, I wasted considerable time...  I know, for instance,
> that people whinge that keyboard input doesn't conform to the buffer
> file coding system, and that other coding systems &c are needed --
> windows-1252 probably most importantly.

By "in RC" I meant "in RC as it currently stands", not "in RC before you
installed ucs-tables.el".  As you know, I'm a big fan of ucs-tables.el.
Please don't try and find offense where there isn't, it makes me rather sad.

> > > It doesn't contain latin-iso8859-[23...].
> > 
> > And it's correct as long as ucs-tables is not loaded.
> 
> What handa showed isn't correct.  The utf-8 coding system on the RC
> branch doesn't encode lao, for instance.

I was referring to what's in the utf-8.el file.

> > And since RC is "only bug-fixes"
> For some value of `bug fix'...

Obviously.

> > it's important that we don't make
> > any change outside of ucs-tables.el except for bug-fixes, so
> > we can't just change the safe-charsets property.
> 
> I don't understand.  Of course you can't just change safe-charsets --
> it has to reflect what the coding system actually encodes.

IIRC, on the trunk you changed utf-8.el directly and simply enforced
that ucs-tables.el be loaded when necessary.


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-30 14:36         ` Stefan Monnier
@ 2002-09-04 17:23           ` Dave Love
  0 siblings, 0 replies; 90+ messages in thread
From: Dave Love @ 2002-09-04 17:23 UTC (permalink / raw)
  Cc: Kenichi Handa, keichwa, rms, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> > In that case, I wasted considerable time...  I know, for instance,
> > that people whinge that keyboard input doesn't conform to the buffer
> > file coding system, and that other coding systems &c are needed --
> > windows-1252 probably most importantly.
> 
> By "in RC" I meant "in RC as it currently stands", not "in RC before you
> installed ucs-tables.el".

So did I, or at least as it stood a few days ago.  I don't understand
this (or the rest of the message).  It's a non sequitur as far as I
can tell.

> Please don't try and find offense where there isn't, it makes me
> rather sad.

I don't know what you mean.  I'm just sticking up for a large set of
users.  However I guess they are likely to find offence if maintainers
dismiss -- or appear to -- m17n features they need.

As far as I know, my opinions are roughly the same as handa's --
apologies if not -- and he was the one proposing more changes in this
case.  I'm glad he eventually gets listened to, anyhow.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 17:32     ` Stefan Monnier
  2002-08-29 23:15       ` Dave Love
@ 2002-08-30  6:09       ` Richard Stallman
  2002-08-31 17:30         ` Dave Love
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-30  6:09 UTC (permalink / raw)
  Cc: handa, d.love, monnier+gnu/emacs, keichwa, emacs-devel

    And since RC is "only bug-fixes" it's important that we don't make
    any change outside of ucs-tables.el except for bug-fixes, so
    we can't just change the safe-charsets property.

I don't follow the logic here.  Why can't we just change the
safe-charsets property?  Is there some obstacle to doing that?  Do you
think other things would fail to work if we did?  Are other changes
are needed as well to make it work?

    Unless RMS accepts to make changes to utf-8.el which are not
    bug-fixes but improvements to the utf-8 support.

If we can't save latin-N characters as utf-8, that is a bug.
If the fix is safe and clear, we may as well install it in RC.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-30  6:09       ` Richard Stallman
@ 2002-08-31 17:30         ` Dave Love
  2002-09-02  0:01           ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-31 17:30 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> I don't follow the logic here.  Why can't we just change the
> safe-charsets property?

If you change safe-charsets without changing what the CCL actually
encodes, you're just courting data corruption.
E.g. find-coding-systems-... will report utf-8 for lao text, but if
you encode it, you'll just get U+FFFDs.

> If we can't save latin-N characters as utf-8, that is a bug.

[You argued against that before.]

Why just Latin-N, and why just as utf-8?  There shouldn't be anything
special about Latin.  That version of utf-8.el can't encode
cyrillic-iso8859-5, for instance, and the Cyrillic coding systems
can't encode the relevant characters from mule-unicode-0100-24ff.

Is it also a bug that utf-8 can't encode the CJK space or that the CJK
sets can't encode equivalent characters from other sets (which I
haven't tried to address and people probably don't care about)?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-31 17:30         ` Dave Love
@ 2002-09-02  0:01           ` Richard Stallman
  2002-09-04 17:15             ` Dave Love
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-09-02  0:01 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel

    Why just Latin-N, and why just as utf-8?

I am talking about that issue because that is the issue someone
raised.  I don't know what other issue there is.  Could you tell us?

					      There shouldn't be anything
    special about Latin.

Latin-N character sets are very important in practice.  It is also
possible that they are easier to handle than some other character sets
(but I don't know whether that is the case here).  Those two factors
are directly relevant to whether it is worth fixing this case in RC.
The factors might be different for another character set.

    Is it also a bug that utf-8 can't encode the CJK space or that the CJK
    sets can't encode equivalent characters from other sets (which I
    haven't tried to address and people probably don't care about)?

That is certainly a bug.  The question is whether this bug may not be
worth fixing in RC.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-02  0:01           ` Richard Stallman
@ 2002-09-04 17:15             ` Dave Love
  2002-09-08 12:54               ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-09-04 17:15 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     Why just Latin-N, and why just as utf-8?
> 
> I am talking about that issue because that is the issue someone
> raised.  I don't know what other issue there is.  Could you tell us?

The issue is just the same for the other charsets that have
translation tables in the head code, and for other CCL coding systems.
For instance, the RC version of mule-utf-8 doesn't translate
cyrillic-iso8859-5, and the Cyrillic coding systems don't translate
mule-unicode-0100-24ff.

> Latin-N character sets are very important in practice.

I think the only thing which distinguishes Latin-N is that Latin-1 is
(was?) the Internet default and its code points are a Unicode subset.
I see no reason to treat, say, Latin-2 as more important than
Cyrillic; I guess it has fewer users for a start.  I also guess
windows-1252 is more widely used than Latin-1, like it or not.

> It is also possible that they are easier to handle than some other
> character sets (but I don't know whether that is the case here).

They're treated identically to the others that ucs-tables handles.
You have to work to remove them.  (The sets that are handled are just
the ones I could conveniently make tables for.)

>     Is it also a bug that utf-8 can't encode the CJK space or that the CJK
>     sets can't encode equivalent characters from other sets (which I
>     haven't tried to address and people probably don't care about)?
> 
> That is certainly a bug.

I actually agree with your previous opinion that lack of translations
isn't a bug as such, despite what PROBLEMS implied -- the features
behave as designed and documented.

I definitely don't agree that general lack of unification of Japanese
characters is a bug.  I got detailed information on the problems with
jisx mappings to Unicode, and we were asked not to confuse matters by
providing jisx0213 tables in Emacs 22, which is designed not to force
that.  (The jisx0208 that utf-8-subst.el uses is a case in point, but
I assume the Mule-UCS table I used is what Japanese linguists agree
on.)  It's also not clear that one should unify double-width
characters with iso8859, for instance.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-04 17:15             ` Dave Love
@ 2002-09-08 12:54               ` Richard Stallman
  2002-09-12 22:38                 ` Dave Love
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-09-08 12:54 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel

    For instance, the RC version of mule-utf-8 doesn't translate
    cyrillic-iso8859-5, and the Cyrillic coding systems don't translate
    mule-unicode-0100-24ff.

We could consider adding that support in RC.  Is it a safe change?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-08 12:54               ` Richard Stallman
@ 2002-09-12 22:38                 ` Dave Love
  2002-09-13 19:34                   ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-09-12 22:38 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     For instance, the RC version of mule-utf-8 doesn't translate
>     cyrillic-iso8859-5, and the Cyrillic coding systems don't translate
>     mule-unicode-0100-24ff.
> 
> We could consider adding that support in RC.  Is it a safe change?

It won't break anything if done correctly, but I don't remember how
much of a change it is relative to the 21.2 code and I don't know who
might have been testing it, if anyone.  My Cyrillic changes also
filled in the koi8-r and alternativnj translation tables properly, and
that may be mixed up with it.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-12 22:38                 ` Dave Love
@ 2002-09-13 19:34                   ` Richard Stallman
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-09-13 19:34 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, handa, keichwa, emacs-devel

    It won't break anything if done correctly, but I don't remember how
    much of a change it is relative to the 21.2 code and I don't know who
    might have been testing it, if anyone.  My Cyrillic changes also
    filled in the koi8-r and alternativnj translation tables properly, and
    that may be mixed up with it.

If you want to extract the precise changes that would make sense
to install in Emacs 21.3, we could possibly do that.  Otherwise
I guess we have nothing to install.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 13:25   ` Kenichi Handa
  2002-08-29 17:32     ` Stefan Monnier
@ 2002-08-29 23:09     ` Dave Love
  2002-08-30  6:11       ` Richard Stallman
  2002-08-29 23:17     ` Dave Love
  2002-08-30  6:09     ` Richard Stallman
  3 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-29 23:09 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> In article <rzqlm6ybz38.fsf@albion.dl.ac.uk>,
>   Dave Love <d.love@dl.ac.uk> writes:
> > As far as I know, what's installed in the trunk behaves correctly, but
> > I'm not using that code
> 
> Why aren't you using that code?

I don't want to use an unstable Emacs with all sorts of things I don't
understand.

> I noticed those `fixme's.   Yes, it is better to solve all
> of them, but, for the moment, I want to concentrate on
> fixing the problem of RC.

I was trying to sort out RC, but I don't understand this problem.

> The safe-charsets property of utf-8 in RC is this:
> 
> ascii eight-bit-control eight-bit-graphic latin-iso8859-1
> mule-unicode-0100-24ff mule-unicode-2500-33ff
> mule-unicode-e000-ffff ethiopic tibetan thai-tis620
> katakana-jisx0201 ipa chinese-sisheng lao
> vietnamese-viscii-lower vietnamese-viscii-upper

I see:

 '((safe-charsets
    ascii
    eight-bit-control
    eight-bit-graphic
    latin-iso8859-1
    mule-unicode-0100-24ff
    mule-unicode-2500-33ff
    mule-unicode-e000-ffff)

in what appears to be revision 1.9.4.2 with sticky tag `EMACS_21_1_RC'.

> It doesn't contain latin-iso8859-[23...].

Indeed.

> The complaint is that the coding-system utf-8 can't encode
> latin-2 characters in RC even if loadup.el has these lines.

Indeed, but the complaint seemed to be that it could encode latin-2
and safe-charsets didn't say so.  That's why I thought someone had
changed it.

> The reason is, as far as I see, the ccl program
> `ccl-encode-mule-utf-8' doesn't have this line at the near
> to head.
> 
> 	   (translate-character ucs-mule-to-mule-unicode r0 r1))

Yes.  

> So, even if we setup the translation table
> `ucs-mule-to-mule-unicode' at loadup time, it is not used in
> utf-8.

Nor in other CCL coding systems.

> Hmmm, I think I realized the situation of RC.  It can unify
> charsets between iso-8859-X, but utf-8 can't encode
> iso-8859-X (intentionally), correct?

Yes.

> Richard, is it what you asked Dave to install for RC?

I'm pretty sure ucs-tables was only allowed to be installed because
just adding the file couldn't break anything.

> I think RC should also allow utf-8 to encode 8859-X
> correctly like in HEAD.  I see no harm in it.

I'm sure there's no harm in my Mule changes generally, but that's not
what everyone has been told, unfortunately.  

> > I think I unilaterally added some other things (a utf-8 language
> > environment and utf-16.el?) since they addressed somewhat misleading
> > entries in PROBLEMS and the arguments against the Unicode support are
> > either demonstrably wrong or spurious IMNSHO.
> 
> I don't oppose to that.

I didn't think you would.

> I found one problem with utf-16.
> It seems that utf-16-le/be can handle 8859-X correctly
> because of this line in ccl-encode-mule-utf-16-le/be,
>       (translate-character ucs-mule-to-mule-unicode r0 r1)

I guess that's an error, and I should have taken that out for
consistency with utf-8.

> > I'm afraid I've had enough of all this,
> 
> Yah, you have done the excellent hack!

I don't mean anything to do with useful work.  It's after being told
for so long it's impossible/broken/not wanted, wasting time, and then
having to sort out the situation in adverse circumstances.  It's very
unfortunate not to have an active maintainer for Mule generally.

> When I implemented translation table stuffs, I didn't expect that it
> can be used this thoroughly.

Strange!  I thought that was exactly what they were for, and the only
thing that was missing initially to satisfy the complaining Europeans
was char-coding-system-table.  The names were even
`...-unification-...' originally.

> I thought containing ucs-tables and etc in RC is at least
> for making unify-on-encoding the default INCLUDING utf-8.

I've no idea.  As far as I remember, it was due to pressure from users
of both Latin-1 and Latin-9 who must have actually tried it despite
what they were told.  I was surprised it was eventually allowed in.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 23:09     ` Dave Love
@ 2002-08-30  6:11       ` Richard Stallman
  2002-09-04 17:21         ` Dave Love
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-30  6:11 UTC (permalink / raw)
  Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel

    > I think RC should also allow utf-8 to encode 8859-X
    > correctly like in HEAD.  I see no harm in it.

    I'm sure there's no harm in my Mule changes generally, but that's not
    what everyone has been told, unfortunately.  

We would not have installed your changes in the trunk if they were
harmful.  The issue about RC is not harm, it is risk of bugs.  Any
change has a risk of bugs, even if it is a great improvement.  But the
risk is not proportional to the improvement; they depend on different
factors.  In RC we try to keep this risk down.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-30  6:11       ` Richard Stallman
@ 2002-09-04 17:21         ` Dave Love
  0 siblings, 0 replies; 90+ messages in thread
From: Dave Love @ 2002-09-04 17:21 UTC (permalink / raw)
  Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> We would not have installed your changes in the trunk if they were
> harmful.

I was referring to what people have been told about them, including in
PROBLEMS.

[I'm not sure you'd actually know a priori whether what I installed
was harmful; it wasn't properly tested.  Obviously I think it's OK
modulo the bugs I haven't heard about, but that doesn't mean it
couldn't corrupt data.]

> The issue about RC is not harm, it is risk of bugs.  Any
> change has a risk of bugs, even if it is a great improvement.

Of course, and I'm surprised at some of what's been added.

> But the risk is not proportional to the improvement; they depend on
> different factors.  In RC we try to keep this risk down.

Of course.  I happen to be in the best position to evaluate the
factors in this case.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 13:25   ` Kenichi Handa
  2002-08-29 17:32     ` Stefan Monnier
  2002-08-29 23:09     ` Dave Love
@ 2002-08-29 23:17     ` Dave Love
  2002-08-30  6:11       ` Richard Stallman
  2002-08-30  6:09     ` Richard Stallman
  3 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-29 23:17 UTC (permalink / raw)
  Cc: monnier+gnu/emacs, keichwa, rms, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> The safe-charsets property of utf-8 in RC is this:
> 
> ascii eight-bit-control eight-bit-graphic latin-iso8859-1
> mule-unicode-0100-24ff mule-unicode-2500-33ff
> mule-unicode-e000-ffff ethiopic tibetan thai-tis620
> katakana-jisx0201 ipa chinese-sisheng lao
> vietnamese-viscii-lower vietnamese-viscii-upper

I've just realized that you probably used coding-system-get, and
there's a problem with what I installed.  I didn't cut out this from
my working version:

*** ucs-tables.el.~1.12.4.1.~	Wed Jul  3 15:38:14 2002
--- ucs-tables.el	Thu Aug 29 19:27:15 2002
***************
*** 2443,2453 ****
  	       (coding-system-put cs 'translation-table-for-input cs)))))
      (optimize-char-table ucs-mule-to-mule-unicode)
      (dolist (c safe-charsets)
!       (aset table (make-char c) t))
!     (coding-system-put 'mule-utf-8 'safe-charsets
! 		       (append (coding-system-get 'mule-utf-8 'safe-charsets)
! 			       safe-charsets))
!     (register-char-codings 'mule-utf-8 table)))
  
  (defvar translation-table-for-input (make-translation-table))
  
--- 2443,2449 ----
  	       (coding-system-put cs 'translation-table-for-input cs)))))
      (optimize-char-table ucs-mule-to-mule-unicode)
      (dolist (c safe-charsets)
!       (aset table (make-char c) t))))
  
  (defvar translation-table-for-input (make-translation-table))

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 23:17     ` Dave Love
@ 2002-08-30  6:11       ` Richard Stallman
  2002-08-31 17:31         ` Dave Love
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-30  6:11 UTC (permalink / raw)
  Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel

    I've just realized that you probably used coding-system-get, and
    there's a problem with what I installed.  I didn't cut out this from
    my working version:

Is this a change we should install in RC now?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-30  6:11       ` Richard Stallman
@ 2002-08-31 17:31         ` Dave Love
  2002-09-02  0:01           ` Richard Stallman
  0 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-31 17:31 UTC (permalink / raw)
  Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     I've just realized that you probably used coding-system-get, and
>     there's a problem with what I installed.  I didn't cut out this from
>     my working version:
> 
> Is this a change we should install in RC now?

That depends on whether you include code in utf-8.el that encodes
those charsets.  If not, you need that change.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-31 17:31         ` Dave Love
@ 2002-09-02  0:01           ` Richard Stallman
  2002-09-02  1:28             ` Kenichi Handa
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-09-02  0:01 UTC (permalink / raw)
  Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel

    That depends on whether you include code in utf-8.el that encodes
    those charsets.  If not, you need that change.

In that case, I will install that change presently, and then we can
study the question of whether to include the code in utf-8.el instead.

What does that code in utf-8.el do, and how safe a change is it?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-02  0:01           ` Richard Stallman
@ 2002-09-02  1:28             ` Kenichi Handa
  2002-09-05 13:41               ` Dave Love
  2002-09-10 16:36               ` Richard Stallman
  0 siblings, 2 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-09-02  1:28 UTC (permalink / raw)
  Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel

In article <E17lefC-0003IF-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     That depends on whether you include code in utf-8.el that encodes
>     those charsets.  If not, you need that change.

> In that case, I will install that change presently, and then we can
> study the question of whether to include the code in utf-8.el instead.

> What does that code in utf-8.el do, and how safe a change is it?

It defines two CCL codes to decode and encode utf-8 byte
sequence, and makes the coding system mule-utf-8 by using
those CCL codes.

I'll attach the necessary change to enable RC's utf-8 to
encode latin-X plus alpha (e.g. thai).  The docstring of
mule-utf-8 may need improvement.

As the change is very small and that code has been in HEAD
for more than one month, I think the change is quite safe.
I recommend to install it in RC.

I also checked the code to some extent by this testsuite.

(dolist (charset (delq 'ascii
		       (delq 'eight-bit-control
			     (delq 'eight-bit-graphic
				   (coding-system-get 'mule-utf-8
						      'safe-charsets)))))
  (let ((dimension (charset-dimension charset))
	str)
    (if (= dimension 1)
	(setq str (string (make-char charset 33) (make-char charset 34)))
      (setq str (string (make-char charset 33 33) (make-char charset 33 34))))
    (or (memq 'mule-utf-8 (find-coding-systems-string str))
        (not (string-match "\357\277\275" ; UTF-8 form of U+FFFD
			   (encode-coding-string str 'mule-utf-8)))

	(error (format "%s is not supported" charset)))))

---
Ken'ichi HANDA
handa@etl.go.jp

*** utf-8.el.~1.9.4.2.~	Tue Jul 23 13:54:13 2002
--- utf-8.el	Mon Sep  2 10:28:26 2002
***************
*** 269,275 ****
       (loop
        (if (r5 < 0)
  	  ((r1 = -1)
! 	   (read-multibyte-character r0 r1))
  	(;; We have already done read-multibyte-character.
  	 (r0 = r5)
  	 (r1 = r6)
--- 269,277 ----
       (loop
        (if (r5 < 0)
  	  ((r1 = -1)
! 	   (read-multibyte-character r0 r1)
! 	   (translate-character ucs-mule-to-mule-unicode r0 r1))
! 
  	(;; We have already done read-multibyte-character.
  	 (r0 = r5)
  	 (r1 = r6)
***************
*** 392,397 ****
--- 394,423 ----
     mule-unicode-0100-24ff
     mule-unicode-2500-33ff
     mule-unicode-e000-ffff
+    latin-iso8859-2 (*)
+    latin-iso8859-3 (*)
+    latin-iso8859-4 (*)
+    cyrillic-iso8859-5 (*)
+    arabic-iso8859-6 (*)
+    greek-iso8859-7 (*)
+    hebrew-iso8859-8 (*)
+    latin-iso8859-9 (*)
+    latin-iso8859-14 (*)
+    latin-iso8859-15 (*)
+    chinese-sisheng (*)
+    ethiopic (*)
+    ipa (*)
+    lao (*)
+    katakana-jisx0201 (*)
+    thai-tis620 (*)
+    tibetan (*)
+    vietnamese-viscii-lower (*)
+    vietnamese-viscii-upper (*)
+ 
+ Among them, the charsets labeled \"(*)\" are supported only on
+ encoding.  That means, they are correctly encoded to UTF-8, but are
+ decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or
+ mule-unicode-2500-33ff, not to the original charsets.
  
  Unicode characters out of the ranges U+0000-U+33FF and U+E200-U+FFFF
  are decoded into sequences of eight-bit-control and eight-bit-graphic
***************
*** 409,415 ****
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
!     mule-unicode-e000-ffff)
     (mime-charset . utf-8)
     (coding-category . coding-category-utf-8)
     (valid-codes (0 . 255))))
--- 435,460 ----
      latin-iso8859-1
      mule-unicode-0100-24ff
      mule-unicode-2500-33ff
!     mule-unicode-e000-ffff
!     latin-iso8859-2 
!     latin-iso8859-3 
!     latin-iso8859-4 
!     cyrillic-iso8859-5 
!     arabic-iso8859-6 
!     greek-iso8859-7 
!     hebrew-iso8859-8 
!     latin-iso8859-9 
!     latin-iso8859-14 
!     latin-iso8859-15 
!     chinese-sisheng 
!     ethiopic 
!     ipa 
!     lao 
!     katakana-jisx0201 
!     thai-tis620 
!     tibetan 
!     vietnamese-viscii-lower 
!     vietnamese-viscii-upper)
     (mime-charset . utf-8)
     (coding-category . coding-category-utf-8)
     (valid-codes (0 . 255))))

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-02  1:28             ` Kenichi Handa
@ 2002-09-05 13:41               ` Dave Love
  2002-09-05 23:32                 ` Kenichi Handa
  2002-09-10 16:36               ` Richard Stallman
  1 sibling, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-09-05 13:41 UTC (permalink / raw)
  Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> + Among them, the charsets labeled \"(*)\" are supported only on
> + encoding.

I assume they still are only encodable if unify-8859-on-encoding-mode
is on.

> That means, they are correctly encoded to UTF-8, but are
> + decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or
> + mule-unicode-2500-33ff, not to the original charsets.

[That's actually customizable through a decoding table, of course.]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-05 13:41               ` Dave Love
@ 2002-09-05 23:32                 ` Kenichi Handa
  2002-09-06 11:38                   ` Robert J. Chassell
  2002-09-07 23:19                   ` Dave Love
  0 siblings, 2 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-09-05 23:32 UTC (permalink / raw)
  Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel

In article <rzqy9ag7dux.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Kenichi Handa <handa@etl.go.jp> writes:
>>  + Among them, the charsets labeled \"(*)\" are supported only on
>>  + encoding.

> I assume they still are only encodable if unify-8859-on-encoding-mode
> is on.

Yes.  But, that mode is on by default in RC too.

>>  That means, they are correctly encoded to UTF-8, but are
>>  + decoded back to charsets latin-iso8859-1, mule-unicode-0100-24ff, or
>>  + mule-unicode-2500-33ff, not to the original charsets.

> [That's actually customizable through a decoding table, of course.]

How about adding this paragraph?

See also the documentations of:
  `unify-8859-on-decoding-mode', `unify-8859-on-encoding-mode',
  `utf-8-fragment-on-decoding'
to customize the behaviour of this coding system."

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-05 23:32                 ` Kenichi Handa
@ 2002-09-06 11:38                   ` Robert J. Chassell
  2002-09-07 23:19                   ` Dave Love
  1 sibling, 0 replies; 90+ messages in thread
From: Robert J. Chassell @ 2002-09-06 11:38 UTC (permalink / raw)


[This started as a question regarding `unify-8859-on-encoding-mode', but
has evolved to a `themes' related question!]

   Yes.  But, that mode is on by default in RC too.

How do I determine easily whether unify-8859-on-encoding-mode is on or
off by default in particular instances of Emacs.  Currently, I am
running two instances, one a `plain vanilla' Emacs, and another that
loads a 150kb .emacs file.  I would like to know whether
`unify-8859-on-encoding-mode' is on or off in my `plain vanilla'
Emacs.

I am not actually trying to track down the code (which I have done
anyhow.  Evidentally, `ucs-fragment-8859' sets properties to `nil',
but I don't know whether they are changed elsewhere.).

Rather I am looking for a mechanism that reports the complete current
status.

The `mule-diag' command does this for other features, and I thought
it might provide the unify status, too, but it does not.  (Probably
for the good reason that eventually, unify will always be on.)

Instead, it turns out that I am looking for a reporter that tells me
everything about the current state of a particular instance of Emacs,
including variables and properties; in other words, including the
values of `(mule-diag)', `(describe-bindings)',
`(current-frame-configuration)', `load-path', and so on.

This reporter would be useful for anyone working on themes, since it
would mean you could go back to any number of previous states.

(And yes, the resulting status files will be big, perhaps too big for
any normal use.  But right now I am concerned more about the
capability than about optimization.  I don't know whether the
capability merits optimization but think it is a simplification worth
providing to moderately knowledgeable hackers.)

-- 
    Robert J. Chassell            bob@rattlesnake.com  bob@gnu.org
    Rattlesnake Enterprises       http://www.rattlesnake.com
    Free Software Foundation      http://www.gnu.org   GnuPG Key ID: 004B4AC8

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-05 23:32                 ` Kenichi Handa
  2002-09-06 11:38                   ` Robert J. Chassell
@ 2002-09-07 23:19                   ` Dave Love
  2002-09-09  0:21                     ` Richard Stallman
  2002-09-26  4:51                     ` Kenichi Handa
  1 sibling, 2 replies; 90+ messages in thread
From: Dave Love @ 2002-09-07 23:19 UTC (permalink / raw)
  Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> Yes.  But, that mode is on by default in RC too.

Gosh.  However, it appears to be done wrongly.  Custom will show it
isn't on, and would turn it off if you tried to turn it on.  Surely if
it's preloaded and meant to be the default, the defcustom initial
value should just be changed.

> How about adding this paragraph?
> 
> See also the documentations of:
>   `unify-8859-on-decoding-mode', `unify-8859-on-encoding-mode',
>   `utf-8-fragment-on-decoding'
> to customize the behaviour of this coding system."

Fine, but that shouldn't be specific to mule-utf-8.  Those variables
affect more coding systems, and other CCL ones should use the
appropriate translation tables.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-07 23:19                   ` Dave Love
@ 2002-09-09  0:21                     ` Richard Stallman
  2002-09-12 22:43                       ` Dave Love
  2002-09-26  4:51                     ` Kenichi Handa
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-09-09  0:21 UTC (permalink / raw)
  Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel

    > Yes.  But, that mode is on by default in RC too.

    Gosh.  However, it appears to be done wrongly.  Custom will show it
    isn't on, and would turn it off if you tried to turn it on.  Surely if
    it's preloaded and meant to be the default, the defcustom initial
    value should just be changed.

That sounds right to me.  Can you send a patch?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-09  0:21                     ` Richard Stallman
@ 2002-09-12 22:43                       ` Dave Love
  0 siblings, 0 replies; 90+ messages in thread
From: Dave Love @ 2002-09-12 22:43 UTC (permalink / raw)
  Cc: handa, monnier+gnu/emacs, keichwa, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>     > Yes.  But, that mode is on by default in RC too.
> 
>     Gosh.  However, it appears to be done wrongly.  Custom will show it
>     isn't on, and would turn it off if you tried to turn it on.  Surely if
>     it's preloaded and meant to be the default, the defcustom initial
>     value should just be changed.
> 
> That sounds right to me.  Can you send a patch?

I should have said `define-minor-mode', not defcustom.  Just change
:init-value nil to t and take out the function call from loadup.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-07 23:19                   ` Dave Love
  2002-09-09  0:21                     ` Richard Stallman
@ 2002-09-26  4:51                     ` Kenichi Handa
  1 sibling, 0 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-09-26  4:51 UTC (permalink / raw)
  Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel

In article <rzqelc5s7zb.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
>>  See also the documentations of:
>>    `unify-8859-on-decoding-mode', `unify-8859-on-encoding-mode',
>>    `utf-8-fragment-on-decoding'
>>  to customize the behaviour of this coding system."

> Fine, but that shouldn't be specific to mule-utf-8.  Those variables
> affect more coding systems,

I'm going to introduce `dependency' in coding system
property.  The value will be a list of symbols whose values
affect the behaviour of the coding system.  mule-utf-* can
have this property from the start.  For iso-8859-?, we can
add this property in ucs-tables.el.

Then, descibe-coding-system can check it and produce a
proper descriptions something like below:
----------------------------------------------------------------------
1 -- iso-latin-1 (alias: iso-8859-1 latin-1)

ISO 2022 based 8-bit encoding for Latin-1 (MIME:ISO-8859-1).

See also the documentation of these customizable variables
which alter the behaviour of this coding system.
	`unify-8859-on-encoding-mode'
	`unify-8859-on-decoding-mode'
[...]
----------------------------------------------------------------------

> and other CCL ones should use the appropriate translation
> tables.

Sure.  I'll work on it later.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-09-02  1:28             ` Kenichi Handa
  2002-09-05 13:41               ` Dave Love
@ 2002-09-10 16:36               ` Richard Stallman
  1 sibling, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-09-10 16:36 UTC (permalink / raw)
  Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel

    I'll attach the necessary change to enable RC's utf-8 to
    encode latin-X plus alpha (e.g. thai).  The docstring of
    mule-utf-8 may need improvement.

    As the change is very small and that code has been in HEAD
    for more than one month, I think the change is quite safe.
    I recommend to install it in RC.

Ok, would you please install it when your conference is over?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 13:25   ` Kenichi Handa
                       ` (2 preceding siblings ...)
  2002-08-29 23:17     ` Dave Love
@ 2002-08-30  6:09     ` Richard Stallman
  3 siblings, 0 replies; 90+ messages in thread
From: Richard Stallman @ 2002-08-30  6:09 UTC (permalink / raw)
  Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel

    Hmmm, I think I realized the situation of RC.  It can unify
    charsets between iso-8859-X, but utf-8 can't encode
    iso-8859-X (intentionally), correct?

    Richard, is it what you asked Dave to install for RC?

I can't remember after this much time has gone by.
Chances are I never knew about this specific issue
and that I did not say anything to him about it one way or another,
but I can't remember.

If you can make this case work with a clean and safe change,
please do.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-19  7:48 Kenichi Handa
  2002-08-22 17:08 ` Dave Love
@ 2002-08-24 12:11 ` Richard Stallman
  2002-08-26 13:17   ` Kenichi Handa
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Stallman @ 2002-08-24 12:11 UTC (permalink / raw)
  Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel

    I'm quite confused with the current status of utf-8.el,
    ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and
    RC.

Do you understand the situation in HEAD?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-24 12:11 ` Richard Stallman
@ 2002-08-26 13:17   ` Kenichi Handa
  2002-08-26 16:15     ` Stefan Monnier
  2002-08-29 23:19     ` Dave Love
  0 siblings, 2 replies; 90+ messages in thread
From: Kenichi Handa @ 2002-08-26 13:17 UTC (permalink / raw)
  Cc: d.love, monnier+gnu/emacs, keichwa, emacs-devel

In article <200208241211.g7OCBW111768@wijiji.santafe.edu>, Richard Stallman <rms@gnu.org> writes:
>     I'm quite confused with the current status of utf-8.el,
>     ucs-tables.el, utf-16.el, utf-8-subst.el, etc in HEAD and
>     RC.

> Do you understand the situation in HEAD?

I don't understand what exactly do you mean by "situation".

I don't know if they are the same as what Dave currently
has.

I understand how each functions and variables are supposed
to work.  And, I know that those codes doesn't do definitely
wrong thing by reading through the codes briefly.

But, I have not checked if they surely works as
expected.  I believe Dave has done it.

And, I don't understand why those many functions/variables
are designed as the current way.  For instance,

(1) Why does loadup.el has this code:
	(ucs-unify-8859 'encode-only)
instead of:
	(unify-8859-on-encoding-mode 1)

(2) Why doesn't utf-8-subst.el provide mappings of
    non-Chinese characters for ksc, gb, and jisx charsets?
    The document of utf-8-translate-cjk says as below:
----------------------------------------------------------------------
Whether the `mule-utf-8' coding system should encode many CJK characters.

Enabling this loads tables which enable the coding system to encode
characters in the charsets `korean-ksc5601', `chinese-gb2312' and
`japanese-jisx0208', and to decode the corresponding unicodes into
...
----------------------------------------------------------------------
but, currently only Chinese characters in those charsets are
handled.

(3) Why is utf-8-translate-cjk a variable, not a minor-mode
    like unify-8859-on-(de/en)coding-mode?  Or, why the
    latter is not a simple variable?   By the way, it seems
    that once we customize utf-8-translate-cjk to t,
    customize it back to nil doesn't cancel the translation.

(4) It seems that the variable name
    utf-8-fragment-on-decoding is not appropriate because it
    is used also in utf-18.el.  Perhaps,
    ucs-fragment-on-decoding is better.

(5) It seems that mule-utf-16 can handle the same range of
    characters as mule-utf-8, but `safe-charsets' property
    doesn't contain, for instance, `latin-iso8895-2'.
    Perhaps, this is simply a bug to be fixed easily.

---
Ken'ichi HANDA
handa@etl.go.jp

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-26 13:17   ` Kenichi Handa
@ 2002-08-26 16:15     ` Stefan Monnier
  2002-08-29 23:18       ` Dave Love
  2002-08-29 23:19     ` Dave Love
  1 sibling, 1 reply; 90+ messages in thread
From: Stefan Monnier @ 2002-08-26 16:15 UTC (permalink / raw)
  Cc: rms, d.love, monnier+gnu/emacs, keichwa, emacs-devel

> (1) Why does loadup.el has this code:
> 	(ucs-unify-8859 'encode-only)
> instead of:
> 	(unify-8859-on-encoding-mode 1)

It might have been my "fault".  I think it's because I expect(ed)
unify-8859-on-encoding-mode to disappear (because there's no benefit
in turning it off, except for working around some bugs maybe).


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-26 16:15     ` Stefan Monnier
@ 2002-08-29 23:18       ` Dave Love
  2002-08-30 14:36         ` Stefan Monnier
  0 siblings, 1 reply; 90+ messages in thread
From: Dave Love @ 2002-08-29 23:18 UTC (permalink / raw)
  Cc: Kenichi Handa, rms, keichwa, emacs-devel

"Stefan Monnier" <monnier+gnu/emacs@rum.cs.yale.edu> writes:

> It might have been my "fault".  I think it's because I expect(ed)
> unify-8859-on-encoding-mode to disappear (because there's no benefit
> in turning it off, except for working around some bugs maybe).

What bugs?

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-29 23:18       ` Dave Love
@ 2002-08-30 14:36         ` Stefan Monnier
  0 siblings, 0 replies; 90+ messages in thread
From: Stefan Monnier @ 2002-08-30 14:36 UTC (permalink / raw)
  Cc: Stefan Monnier, Kenichi Handa, rms, keichwa, emacs-devel

> > It might have been my "fault".  I think it's because I expect(ed)
> > unify-8859-on-encoding-mode to disappear (because there's no benefit
> > in turning it off, except for working around some bugs maybe).
> 
> What bugs?

None that I know of.  I meant the sentence to mean "to be able to turn
it off in case a bug showed up".


	Stefan

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: Several serious problems
  2002-08-26 13:17   ` Kenichi Handa
  2002-08-26 16:15     ` Stefan Monnier
@ 2002-08-29 23:19     ` Dave Love
  1 sibling, 0 replies; 90+ messages in thread
From: Dave Love @ 2002-08-29 23:19 UTC (permalink / raw)
  Cc: rms, monnier+gnu/emacs, keichwa, emacs-devel

Kenichi Handa <handa@etl.go.jp> writes:

> I don't know if they are the same as what Dave currently
> has.

I tried to install all the relevant stuff I had, but for the CVS head,
it's modified versions of what I've actually been using, and is
basically untested.  I wanted someone who was actually using that code
base to install it and test it, but no-one could or would -- I can't
remember, but rms leant on me to install it.

> But, I have not checked if they surely works as
> expected.  I believe Dave has done it.

Only in more-or-less Emacs 21.2.

> And, I don't understand why those many functions/variables
> are designed as the current way.  For instance,
> 
> (1) Why does loadup.el has this code:
> 	(ucs-unify-8859 'encode-only)
> instead of:
> 	(unify-8859-on-encoding-mode 1)

Indeed.  I didn't do that.  The obvious thing to do is to change the
default in the defcustom, if ucs-tables is preloaded.

> (2) Why doesn't utf-8-subst.el provide mappings of
>     non-Chinese characters for ksc, gb, and jisx charsets?
>     The document of utf-8-translate-cjk says as below:
> ----------------------------------------------------------------------
> Whether the `mule-utf-8' coding system should encode many CJK characters.
> 
> Enabling this loads tables which enable the coding system to encode
> characters in the charsets `korean-ksc5601', `chinese-gb2312' and
> `japanese-jisx0208', and to decode the corresponding unicodes into
> ...
> ----------------------------------------------------------------------
> but, currently only Chinese characters in those charsets are
> handled.

I didn't realize that.  It may be coincidence.  What should be
translated is the set of characters

(japanese-jisx0208 ∪ chinese-gb2312 ∪ korean-ksc5601) \ mule-unicode-2500-33ff
                   ^                                  ^
                   union                              set difference

according to the Mule-UCS tables -- I just took the relevant codes
from there above U+33FF.  Perhaps that isn't how it actually is.

It needs someone with an interest in the CJK range to redo that stuff
anyhow; it shouldn't hardwire Japanese as the japanese-jisx0208 as the
preferred set, the sets used should probably be configurable, and it
should allow translating the relevant characters below U+3400.  (I
didn't think much about how best to do that without keeping large
tables on the heap that aren't actually used to do the translation.)

> (3) Why is utf-8-translate-cjk a variable, not a minor-mode
>     like unify-8859-on-(de/en)coding-mode?

I think because it can't be turned off.

>     Or, why the
>     latter is not a simple variable?   By the way, it seems
>     that once we customize utf-8-translate-cjk to t,
>     customize it back to nil doesn't cancel the translation.
> 
> (4) It seems that the variable name
>     utf-8-fragment-on-decoding is not appropriate because it
>     is used also in utf-18.el.  Perhaps,
>     ucs-fragment-on-decoding is better.

Probably.  It was defined before I wrote utf-16.el.  Much of that
stuff would have been written differently for installation in 21.1,
but it was done during the campaign against anything Unicode-based, so
that users could have it in Emacs 21.2 as conveniently as possible.

> (5) It seems that mule-utf-16 can handle the same range of
>     characters as mule-utf-8, but `safe-charsets' property
>     doesn't contain, for instance, `latin-iso8895-2'.
>     Perhaps, this is simply a bug to be fixed easily.

Yes.  The coding system needs to register the relevant translation
table(s) for safe-chars, that would have to be updated in sync with
any changes.  I don't know why that didn't get done.

^ permalink raw reply	[flat|nested] 90+ messages in thread

end of thread, other threads:[~2002-09-26  4:51 UTC | newest]

Thread overview: 90+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-07-22 17:11 Several serious problems Richard Stallman
2002-07-22 19:01 ` Andre Spiegel
2002-07-22 19:03 ` Andre Spiegel
2002-07-23  4:00   ` Richard Stallman
2002-07-22 19:03 ` Andreas Schwab
2002-07-23 18:58   ` Richard Stallman
2002-07-22 19:11 ` Andre Spiegel
2002-07-23  4:42 ` Karl Eichwalder
2002-07-24  3:25   ` Richard Stallman
2002-07-24  4:43     ` Karl Eichwalder
2002-07-25  3:12       ` Richard Stallman
2002-07-25  3:24         ` Karl Eichwalder
2002-07-26 15:35           ` Richard Stallman
2002-07-27  3:19             ` Karl Eichwalder
2002-07-29  1:12               ` Richard Stallman
2002-07-29 14:32                 ` Karl Eichwalder
2002-07-30  1:00                   ` Richard Stallman
2002-08-09  7:42               ` Stefan Monnier
2002-08-09 16:08                 ` Karl Eichwalder
2002-08-10 17:16                 ` Richard Stallman
2002-08-12 16:20                   ` Stefan Monnier
2002-08-13  1:48                     ` Richard Stallman
2002-08-15  2:30                       ` Karl Eichwalder
2002-08-15  2:47                         ` Stefan Monnier
2002-08-15  5:31                           ` Karl Eichwalder
2002-08-15 15:30                             ` Stefan Monnier
2002-08-15 17:33                               ` Dave Love
2002-07-23 13:35 ` Kenichi Handa
2002-07-23 13:52   ` Alan Shutko
2002-07-24  3:25     ` Richard Stallman
2002-07-24  3:25   ` Richard Stallman
2002-07-24  4:37     ` Kenichi Handa
2002-07-25  3:12       ` Richard Stallman
2002-07-25  5:53         ` Miles Bader
2002-07-26 14:29         ` Francesco Potorti`
2002-07-27 18:52           ` Richard Stallman
2002-08-09  7:43           ` Stefan Monnier
2002-08-11  1:59         ` unencodable-char-position [Re: Several serious problems] Kenichi Handa
2002-08-12 17:06           ` Richard Stallman
2002-08-12 17:15             ` Stefan Monnier
2002-08-13  0:37               ` Kenichi Handa
2002-08-13 22:47                 ` Richard Stallman
2002-08-14  0:20                   ` Kenichi Handa
2002-08-14 23:13                     ` Richard Stallman
2002-08-15 17:51           ` Dave Love
2002-08-19  5:04             ` Kenichi Handa
2002-08-29 22:52               ` Dave Love
2002-08-30  6:53                 ` Andre Spiegel
2002-08-09  7:44       ` Several serious problems Stefan Monnier
2002-08-10 17:16         ` Richard Stallman
2002-08-12  0:26         ` Kenichi Handa
2002-08-09  4:41 ` Stefan Monnier
2002-08-15 17:23   ` Dave Love
  -- strict thread matches above, loose matches on Subject: below --
2002-08-19  7:48 Kenichi Handa
2002-08-22 17:08 ` Dave Love
2002-08-29 13:25   ` Kenichi Handa
2002-08-29 17:32     ` Stefan Monnier
2002-08-29 23:15       ` Dave Love
2002-08-30 14:36         ` Stefan Monnier
2002-09-04 17:23           ` Dave Love
2002-08-30  6:09       ` Richard Stallman
2002-08-31 17:30         ` Dave Love
2002-09-02  0:01           ` Richard Stallman
2002-09-04 17:15             ` Dave Love
2002-09-08 12:54               ` Richard Stallman
2002-09-12 22:38                 ` Dave Love
2002-09-13 19:34                   ` Richard Stallman
2002-08-29 23:09     ` Dave Love
2002-08-30  6:11       ` Richard Stallman
2002-09-04 17:21         ` Dave Love
2002-08-29 23:17     ` Dave Love
2002-08-30  6:11       ` Richard Stallman
2002-08-31 17:31         ` Dave Love
2002-09-02  0:01           ` Richard Stallman
2002-09-02  1:28             ` Kenichi Handa
2002-09-05 13:41               ` Dave Love
2002-09-05 23:32                 ` Kenichi Handa
2002-09-06 11:38                   ` Robert J. Chassell
2002-09-07 23:19                   ` Dave Love
2002-09-09  0:21                     ` Richard Stallman
2002-09-12 22:43                       ` Dave Love
2002-09-26  4:51                     ` Kenichi Handa
2002-09-10 16:36               ` Richard Stallman
2002-08-30  6:09     ` Richard Stallman
2002-08-24 12:11 ` Richard Stallman
2002-08-26 13:17   ` Kenichi Handa
2002-08-26 16:15     ` Stefan Monnier
2002-08-29 23:18       ` Dave Love
2002-08-30 14:36         ` Stefan Monnier
2002-08-29 23:19     ` Dave Love

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).