8859 unification and Emacs' ChangeLog files

unofficial mirror of emacs-devel@gnu.org 

* 8859 unification and Emacs' ChangeLog files
@ 2003-03-29 17:19 Kai Großjohann
  2003-03-31  0:45 ` Kenichi Handa
  0 siblings, 1 reply; 7+ messages in thread
From: Kai Großjohann @ 2003-03-29 17:19 UTC
  Cc: Simon Krahnke

I've learned the hard way that unify-8859-on-decoding-mode mangles
Emacs' ChangeLog files.  Now Simon (see Cc) has shown me that it is
possible to turn unification off for certain encodings.

Do you think it might be good to turn it off for iso-2022-7bit?
Maybe that would make it possible to turn unify-8859-on-decoding-mode
on by default?

Simon's Lisp was: (coding-system-put 'iso-2022-7bit
'translation-table-for-decode (make-translation-table))
-- 
A preposition is not a good thing to end a sentence with.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 8859 unification and Emacs' ChangeLog files
  2003-03-29 17:19 8859 unification and Emacs' ChangeLog files Kai Großjohann
@ 2003-03-31  0:45 ` Kenichi Handa
  2003-04-03 16:50   ` Kai Großjohann
  0 siblings, 1 reply; 7+ messages in thread
From: Kenichi Handa @ 2003-03-31  0:45 UTC
  Cc: emacs-devel

In article <848yuy3xab.fsf@lucy.is.informatik.uni-duisburg.de>, kai.grossjohann@gmx.net (Kai Großjohann) writes:
> I've learned the hard way that unify-8859-on-decoding-mode mangles
> Emacs' ChangeLog files.  Now Simon (see Cc) has shown me that it is
> possible to turn unification off for certain encodings.

> Do you think it might be good to turn it off for iso-2022-7bit?
[...]
> Simon's Lisp was: (coding-system-put 'iso-2022-7bit
> 'translation-table-for-decode (make-translation-table))

Yes, it works because it overrides the translation table
created by unify-8859-on-decoding-mode.

But it means that a Latin-2 char read by iso-2022-7bit is
different from the same char read by iso-latin-2.  I don't think
such a change is a good idea.

> Maybe that would make it possible to turn unify-8859-on-decoding-mode
> on by default?

It will stop automatic unibyte<->multibyte conversion in any
single-byte language environment (e.g. Latin-X, Greek) except
Latin-1.

I think unify-8859-on-decoding-mode is still only for those
people who know the consequences of the command well.

---
Ken'ichi HANDA
handa@m17n.org

* Re: 8859 unification and Emacs' ChangeLog files
  2003-03-31  0:45 ` Kenichi Handa
@ 2003-04-03 16:50   ` Kai Großjohann
  2003-04-03 18:38     ` Stefan Monnier
  2003-04-03 20:38     ` Jason Rumney
  0 siblings, 2 replies; 7+ messages in thread
From: Kai Großjohann @ 2003-04-03 16:50 UTC

Kenichi Handa <handa@m17n.org> writes:

> In article <848yuy3xab.fsf@lucy.is.informatik.uni-duisburg.de>, kai.grossjohann@gmx.net (Kai Großjohann) writes:
>> I've learned the hard way that unify-8859-on-decoding-mode mangles
>> Emacs' ChangeLog files.  Now Simon (see Cc) has shown me that it is
>> possible to turn unification off for certain encodings.
>
>> Do you think it might be good to turn it off for iso-2022-7bit?
> [...]
>> Simon's Lisp was: (coding-system-put 'iso-2022-7bit
>> 'translation-table-for-decode (make-translation-table))
>
> Yes, it works because it overrides the translation table
> created by unify-8859-on-decoding-mode.
>
> But it means that a Latin-2 char read by iso-2022-7bit is
> different from the same char read by iso-latin-2.  I don't think
> such a change is a good idea.

Well, without unification, a Latin-2 character read from a Latin-1
file is different from the same character read from a Latin-2 file.

>> Maybe that would make it possible to turn unify-8859-on-decoding-mode
>> on by default?
>
> It will stop automatic unibyte<->multibyte conversion in any
> single-byte language environment (e.g. Latin-X, Greek) except
> Latin-1.

What do you mean by "it"?  Do you mean unify-8859-on-decoding-mode
generally, or do you mean unify-8859-on-decoding-mode with Simon's
add-on?

> I think unify-8859-on-decoding-mode is still only for those
> people who know the consequences of the command well.

I know that unify-8859-on-decoding-mode is harmful for Emacs'
ChangeLog files, because they distinguish between the various
iso-8859 charsets.

But if all Emacs developers enable unify-8859-on-decoding-mode, then
this is not a problem anymore.  Or the ChangeLog files could be
stored as UTF-8.

Of course, there might be other files where iso-8859 characters might
be present in different encodings.  Is it right to say that these
files must be encoded in iso-2022 or in emacs-mule, because no other
encodings distinguish between Latin-1 ä and Latin-2 ä, say?

If I understand correctly, Emacs 22 will automatically unify all
iso-8859 charsets, so the same ChangeLog problem will occur there,
right?  Or is Emacs 22 going to enable distinguishing between Latin-1
ä and Latin-2 ä for iso-2022 files somehow?

It seems that Europeans are really interested in more 8859
unification.  Now that unify-8859-on-encoding-mode is on by default,
they are somewhat happier.  But even with this enabled, people in a
Latin-9 locale can't search UTF-8 files well: when they hit the ä key,
they are searching for a Latin-9 ä whereas there is a Latin-1 ä in the
buffer.  Or take me, personally: I use the german-prefix input method
which produces Latin-1 characters, but I'm running in a Latin-9 locale
so that the buffers contain the `wrong' characters.
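The search mismatch can be sketched with a deliberately simplified, hypothetical model in Python (not Emacs's real internal representation): without unification a character is identified by the charset it was read from plus its byte, and with unification it is identified by the Unicode character that byte decodes to. The helper names below are illustrative only.

```python
from typing import Tuple

# Hypothetical, simplified model; Emacs's actual internal
# representation of characters is different.

def char_without_unification(charset: str, byte: int) -> Tuple[str, int]:
    """Identify a character by its source charset plus its byte value."""
    return (charset, byte)


def char_with_unification(charset: str, byte: int) -> str:
    """Identify a character by the Unicode character the byte decodes to."""
    return bytes([byte]).decode(charset)


# "ä" is byte 0xE4 in both Latin-1 and Latin-9.
typed = ("iso-8859-15", 0xE4)     # key press in a Latin-9 locale
in_buffer = ("iso-8859-1", 0xE4)  # Latin-1 "ä" already in the buffer

# Without unification the search target and the buffer text differ...
print(char_without_unification(*typed) == char_without_unification(*in_buffer))
# ...with unification both are the same character, so a search would match.
print(char_with_unification(*typed) == char_with_unification(*in_buffer))
```

In this model the first comparison is false and the second is true, which is the difference unification is meant to remove.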

At the moment, I tell them that I don't turn on
unify-8859-on-decoding-mode because then I can't edit the Emacs
ChangeLog files anymore.

If any of this message sounds weird, that might be because I don't
fully grok the problem; I apologize.  I'm looking at it from a
European point of view, so I might be missing important points.

* Re: 8859 unification and Emacs' ChangeLog files
  2003-04-03 16:50   ` Kai Großjohann
@ 2003-04-03 18:38     ` Stefan Monnier
  2003-04-03 20:38     ` Jason Rumney
  1 sibling, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 2003-04-03 18:38 UTC
  Cc: emacs-devel

> I know that unify-8859-on-decoding-mode is harmful for Emacs'
> ChangeLog files, because they distinguish between the various
> iso-8859 charsets.

I know it changes the byte representation of locations that you didn't
intend to change, but in what way is it harmful, really?

> But if all Emacs developers enable unify-8859-on-decoding-mode, then
> this is not a problem anymore.  Or the ChangeLog files could be
> stored as UTF-8.

If the UTF-8 support is good enough, we should do that.


	Stefan

* Re: 8859 unification and Emacs' ChangeLog files
  2003-04-03 16:50   ` Kai Großjohann
  2003-04-03 18:38     ` Stefan Monnier
@ 2003-04-03 20:38     ` Jason Rumney
  2003-04-03 20:52       ` Jason Rumney
  2003-04-04 12:58       ` Kai Großjohann
  1 sibling, 2 replies; 7+ messages in thread
From: Jason Rumney @ 2003-04-03 20:38 UTC

kai.grossjohann@gmx.net (Kai Großjohann) writes:

> But if all Emacs developers enable unify-8859-on-decoding-mode, then
> this is not a problem anymore.

It is still a problem, because some users will end up saving all
Latin characters unified to Latin-1 while others will save them all
unified to Latin-2, etc., so each checkin will contain a lot of
needlessly changed lines.

>  Or the ChangeLog files could be stored as UTF-8.

This could certainly work, if we think UTF-8 support in Emacs now is
good enough to handle it reliably.

> Of course, there might be other files where iso-8859 characters might
> be present in different encodings.  Is it right to say that these
> files must be encoded in iso-2022 or in emacs-mule, because no other
> encodings distinguish between Latin-1 ä and Latin-2 ä, say?

Yes, many of the files in leim, lisp/international and lisp/language
need to make the distinction between different Latin charsets.
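Why UTF-8 (or any other Unicode encoding) cannot carry this distinction can be shown with a short Python sketch, assuming a standard Python 3 interpreter. Python's codecs stand in for Emacs's coding systems here, and the three charsets are chosen only for illustration.

```python
# Byte 0xE4 means "ä" in Latin-1, Latin-2 and Latin-9.
RAW = b"\xe4"
CODECS = ("iso-8859-1", "iso-8859-2", "iso-8859-15")

decoded = {codec: RAW.decode(codec) for codec in CODECS}
for codec, ch in decoded.items():
    # Show the Unicode code point and the UTF-8 bytes for each source charset.
    print(codec, f"U+{ord(ch):04X}", ch.encode("utf-8"))

# Every charset yields the same code point (U+00E4) and the same UTF-8
# bytes, so a UTF-8 file cannot record which charset an "ä" came from.
distinct_code_points = set(decoded.values())
print(len(distinct_code_points))
```

Each line prints `U+00E4` with the bytes `b'\xc3\xa4'`, and the final count is 1: once the text is in Unicode, the Latin-1 and Latin-2 `ä` are one character.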

> If I understand correctly, Emacs 22 will automatically unify all
> iso-8859 charsets, so the same ChangeLog problem will occur there,
> right?  Or is Emacs 22 going to enable distinguishing between Latin-1
> ä and Latin-2 ä for iso-2022 files somehow?

I think the future plan is to support keeping the distinction
(somehow) even though the internal representation is Unicode.

* Re: 8859 unification and Emacs' ChangeLog files
  2003-04-03 20:38     ` Jason Rumney
@ 2003-04-03 20:52       ` Jason Rumney
  2003-04-04 12:58       ` Kai Großjohann
  1 sibling, 0 replies; 7+ messages in thread
From: Jason Rumney @ 2003-04-03 20:52 UTC

Jason Rumney <jasonr@gnu.org> writes:

> > Of course, there might be other files where iso-8859 characters might
> > be present in different encodings.  Is it right to say that these
> > files must be encoded in iso-2022 or in emacs-mule, because no other
> > encodings distinguish between Latin-1 ä and Latin-2 ä, say?
> 
> Yes, many of the files in leim, lisp/international and lisp/language
> need to make the distinction between different Latin charsets.

I think this points to a need for an easy way to turn off unification
for a file, perhaps with an extension to the "coding:" cookie, or some
similar mechanism (at worst, use the general local-variable
mechanism, but this seems like it might justify something special
that does not give the user the option to ignore it).

* Re: 8859 unification and Emacs' ChangeLog files
  2003-04-03 20:38     ` Jason Rumney
  2003-04-03 20:52       ` Jason Rumney
@ 2003-04-04 12:58       ` Kai Großjohann
  1 sibling, 0 replies; 7+ messages in thread
From: Kai Großjohann @ 2003-04-04 12:58 UTC


Jason Rumney <jasonr@gnu.org> writes:

> kai.grossjohann@gmx.net (Kai Großjohann) writes:
>
>> But if all Emacs developers enable unify-8859-on-decoding-mode, then
>> this is not a problem anymore.
>
> It is still a problem, because some users will end up saving all
> Latin characters unified to Latin-1 while others will save them all
> unified to Latin-2, etc., so each checkin will contain a lot of
> needlessly changed lines.

Oh, I didn't know that the preferred encoding is influenced by the
language environment.  Hm.

>>  Or the ChangeLog files could be stored as UTF-8.
>
> This could certainly work, if we think UTF-8 support in Emacs now is
> good enough to handle it reliably.

OK.

>> Of course, there might be other files where iso-8859 characters might
>> be present in different encodings.  Is it right to say that these
>> files must be encoded in iso-2022 or in emacs-mule, because no other
>> encodings distinguish between Latin-1 ä and Latin-2 ä, say?
>
> Yes, many of the files in leim, lisp/international and lisp/language
> need to make the distinction between different Latin charsets.

Really?  Why?  If *all* of Emacs treats all `ä's as the same, why do
these files then need to make the distinction?

But I guess that even with unify-8859-on-decoding-mode, Emacs may
not treat all `ä's as the same.  Hm.

end of thread

Thread overview: 7+ messages
2003-03-29 17:19 8859 unification and Emacs' ChangeLog files Kai Großjohann
2003-03-31  0:45 ` Kenichi Handa
2003-04-03 16:50   ` Kai Großjohann
2003-04-03 18:38     ` Stefan Monnier
2003-04-03 20:38     ` Jason Rumney
2003-04-03 20:52       ` Jason Rumney
2003-04-04 12:58       ` Kai Großjohann

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
