unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
@ 2011-02-02 10:35 Emmanuel Bigler
  2011-02-02 19:47 ` Eli Zaretskii
  2011-02-02 19:53 ` Stefan Monnier
  0 siblings, 2 replies; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-02 10:35 UTC (permalink / raw)
  To: 7962


----------- bug ?? or feature ?? ----------------------

Hello from France !

I have been using emacs for decades and in a previous version emacs 22
the function capitalize word used to work properly with the European
character set ISO-8859-1

For example
capitalize word on éèçàù used to supply Éèçàù
upcase word on éèçàù used to supply ÉÈÇÀÙ

Now under emacs 23.2.1 the same action does not seem to work, only
the ASCII-7-bit set is properly capitalized or up-cased.


-- 
Emmanuel BIGLER
<bigler@ens2m.fr>




In GNU Emacs 23.2.1 (i386-redhat-linux-gnu, GTK+ Version 2.21.4)
  of 2010-07-08 on x86-05.phx2.fedoraproject.org
Windowing system distributor `Fedora Project', version 11.0.10903000
configured using `configure  '--build=i386-redhat-linux-gnu' 
'--host=i386-redhat-linux-gnu' '--program-prefix=' 
'--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' 
'--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' 
'--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib' 
'--libexecdir=/usr/libexec' '--localstatedir=/var' 
'--sharedstatedir=/var/lib' '--mandir=/usr/share/man' 
'--infodir=/usr/share/info' '--with-dbus' '--with-gif' '--with-jpeg' 
'--with-png' '--with-rsvg' '--with-tiff' '--with-xft' '--with-xpm' 
'--with-x-toolkit=gtk' 'build_alias=i386-redhat-linux-gnu' 
'host_alias=i386-redhat-linux-gnu' 'CFLAGS=-DMAIL_USE_LOCKF -O2 -g -pipe 
-Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
--param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom 
-fasynchronous-unwind-tables -fno-optimize-sibling-calls''

Important settings:
   value of $LC_ALL: fr_FR.ISO8859-1
   value of $LC_COLLATE: fr_FR.ISO8859-1
   value of $LC_CTYPE: fr_FR.ISO8859-1
   value of $LC_MESSAGES: nil
   value of $LC_MONETARY: nil
   value of $LC_NUMERIC: nil
   value of $LC_TIME: nil
   value of $LANG: fr_FR.ISO8859-1
   value of $XMODIFIERS: @im=none
   locale-coding-system: iso-latin-1-unix
   default enable-multibyte-characters: nil

Major mode: Text

Minor modes in effect:
   tooltip-mode: t
   mouse-wheel-mode: t
   tool-bar-mode: t
   menu-bar-mode: t
   file-name-shadow-mode: t
   global-font-lock-mode: t
   font-lock-mode: t
   blink-cursor-mode: t
   auto-encryption-mode: t
   auto-compression-mode: t
   column-number-mode: t
   line-number-mode: t
   transient-mark-mode: t
   view-mode: t

Recent input:
C-x b v 8 <return> é è ç à ù <home> <escape> c <down-mouse-1>
<mouse-movement> <mouse-1> C-x d t m p <return> s <C-home>
<down> <down> <down> <down> v <right> <right> <right>
<right> <right> <right> <right> <right> <right> <right>
<right> <right> <right> <right> <right> C-SPC <end>
<escape> w <help-echo> <help-echo> <help-echo> <help-echo>
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo>
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo>
<help-echo> <menu-bar> <help-menu> <send-emacs-bug-report>

Recent messages:
Loading /usr/share/emacs/site-lisp/site-start.d/php-mode-init.el 
(source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/preview-latex.el 
(source)...done
Loading /usr/share/emacs/site-lisp/site-start.d/rpm-spec-mode-init.el 
(source)...done
Loading lpr...done
Loading /home/bigler/divers.eb/p/webpdebut.el (source)...done
Loading /home/bigler/.emacs_last_dir...done
For information about GNU Emacs and the GNU system, type C-h C-a.
Mark set
View mode: type <help> for help, h for commands, q to quit.
Mark set

Load-path shadows:
~/lisp/font-latex hides /usr/share/emacs/site-lisp/auctex/font-latex
~/emacs/lisp/bookmode hides /usr/share/emacs/site-lisp/maxima/bookmode
~/emacs/lisp/sshell hides /usr/share/emacs/site-lisp/maxima/sshell
/usr/share/emacs/site-lisp/site-start.d/maxima-modes hides 
/usr/share/emacs/site-lisp/maxima/site_start.d/maxima-modes

Features:
(shadow sort mail-extr message idna ecomplete rfc822 mml mml-sec
password-cache mm-decode mm-bodies mm-encode mailcap mail-parse rfc2231
rfc2047 rfc2045 qp ietf-drums mailabbrev nnheader gnus-util netrc
time-date mm-util mail-prsvr gmm-utils mailheader canlock sha1 hex-util
hashcash mail-utils warnings emacsbug mule-util latin1-disp latex
easy-mmode edmacro kmacro tex-style tex advice help-fns advice-preload
dired ispell sendmail ange-ftp regexp-opt comint ring ps-print ps-def
lpr info view server cus-edit easymenu wid-edit cus-start cus-load
preview-latex tex-site auto-loads disp-table tooltip ediff-hook vc-hooks
lisp-float-type mwheel x-win x-dnd font-setting tool-bar dnd fontset
image fringe lisp-mode register page menu-bar rfn-eshadow timer select
scroll-bar mldrag mouse jit-lock font-lock syntax facemenu font-core
frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help
simple abbrev loaddefs button minibuffer faces cus-face files
text-properties overlay md5 base64 format env code-pages mule custom
widget hashtable-print-readable backquote make-network-process dbusbind
system-font-setting font-render-setting gtk x-toolkit x multi-tty emacs)






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-02 10:35 bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Emmanuel Bigler
@ 2011-02-02 19:47 ` Eli Zaretskii
  2011-02-02 19:53 ` Stefan Monnier
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2011-02-02 19:47 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962

> Date: Wed, 02 Feb 2011 11:35:52 +0100
> From: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>
> Cc: 
> 
> I have been using emacs for decades and in a previous version emacs 22
> the function capitalize word used to work properly with the European
> character set ISO-8859-1
> 
> For example
> capitalize word on éèçàù used to supply Éèçàù
> upcase word on éèçàù used to supply ÉÈÇÀÙ
> 
> Now under emacs 23.2.1 the same action does not seem to work, only
> the ASCII-7-bit set is properly capitalized or up-cased.

Works fine for me in the latest pretest of Emacs 23.3.  Can you try
that?  You can find it on ftp://alpha.gnu.org/gnu/emacs/pretest/.







* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-02 10:35 bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Emmanuel Bigler
  2011-02-02 19:47 ` Eli Zaretskii
@ 2011-02-02 19:53 ` Stefan Monnier
  2011-02-03 13:34   ` Emmanuel Bigler
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2011-02-02 19:53 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962

> Hello from France !

Hello from New France,

> I have been using emacs for decades and in a previous version emacs 22
> the function capitalize word used to work properly with the European
> character set ISO-8859-1

You have:

  default enable-multibyte-characters: nil

which is deprecated and means that Emacs should treat your files as
containing bytes rather than chars, so it can't do a good job putting
those bytes in upper or lower case since it doesn't know what they're
meant to represent.
  
I.e. please fix your .emacs so as not to set the default
enable-multibyte-characters to nil.
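
The bytes-versus-characters distinction can be sketched in a few lines of
Emacs Lisp, to be evaluated in *scratch*. The helper name
my-upcase-both-ways is hypothetical; the coding and case functions are
standard. On the Emacs versions I know of, I would expect the :unibyte
result to come back with its accented bytes unchanged, which matches the
reported symptom, but treat that as an expectation rather than a guarantee.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-upcase-both-ways (text)
  "Upcase TEXT once as characters and once as raw ISO-8859-1 bytes."
  (let* ((bytes (encode-coding-string text 'iso-8859-1))    ; unibyte: raw bytes
         (chars (decode-coding-string bytes 'iso-8859-1)))  ; multibyte: characters
    (list :multibyte (upcase chars)
          :unibyte   (upcase bytes))))

;; The :multibyte entry is "ÉÈÇÀÙ"; inspect the :unibyte entry yourself.
(my-upcase-both-ways "éèçàù")
```

The case tables know about the character é, but a bare byte 233 carries no
such meaning.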


        Stefan






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-02 19:53 ` Stefan Monnier
@ 2011-02-03 13:34   ` Emmanuel Bigler
  2011-02-03 17:56     ` Eli Zaretskii
                       ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-03 13:34 UTC (permalink / raw)
  To: Stefan Monnier, 7962


Le 02/02/2011 20:53, Stefan Monnier a écrit :

> Hello from New France,
>
>> I have been using emacs for decades and in a previous version emacs 22
>> the function capitalize word used to work properly with the European
>> character set ISO-8859-1

> You have:
>
>    default enable-multibyte-characters: nil

> which is deprecated and means that Emacs should treat your files as
> containing bytes rather than chars, so it can't do a good job putting
> those bytes in upper or lower case since it doesn't know what they're
> meant to represent.

> I.e. please fix your .emacs so as not to set the default
> enable-multibyte-characters to nil.
>
>
>          Stefan
>


Thanks !

Using (toggle-enable-multibyte-characters nil) solves the problem but
brings in another issue, namely the compatibility of all my previous
documents written in unibyte ISO-8859-1. ... but this is another issue.

--
Emmanuel Bigler






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 13:34   ` Emmanuel Bigler
@ 2011-02-03 17:56     ` Eli Zaretskii
  2011-02-03 18:01     ` Emmanuel Bigler
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2011-02-03 17:56 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962

> Date: Thu, 03 Feb 2011 14:34:31 +0100
> From: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>
> Cc: 
> 
> Using (toggle-enable-multibyte-characters nil) solves the problem but
> brings in another issue, namely the compatibility of all my previous
> documents written in unibyte ISO-8859-1.

What compatibility problems are those?  Files on disk encoded in
ISO-8859-1 are always unibyte.






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 13:34   ` Emmanuel Bigler
  2011-02-03 17:56     ` Eli Zaretskii
@ 2011-02-03 18:01     ` Emmanuel Bigler
  2011-02-03 18:21     ` Lennart Borgman
  2011-02-03 18:49     ` bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Stefan Monnier
  3 siblings, 0 replies; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-03 18:01 UTC (permalink / raw)
  To: 7962, eliz



>> Now under emacs 23.2.1 the same action does not seem to work, only
>> the ASCII-7-bit set is properly capitalized or up-cased.

From: Eli Zaretskii <eliz <at> gnu.org>
>Works fine for me in the latest pretest of Emacs 23.3.  Can you try
>that?  You can find it on ftp://alpha.gnu.org/gnu/emacs/pretest/.

Thanks, I tried 23.2.93.1, freshly compiled, and it behaves like 23.2.1:
I had a unibyte-only buffer, and capitalizing a word inside that
obsolete unibyte buffer of 1-byte ISO-8859-1 characters no longer
works.

When I (toggle-enable-multibyte-characters nil), capitalize-word works
as it should, but only, as far as I've tried, on a freshly typed
sequence of UTF-8 2-byte characters.

So I'll live with the new features ! No problem !

Thanks for your help !
--
Emmanuel






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 13:34   ` Emmanuel Bigler
  2011-02-03 17:56     ` Eli Zaretskii
  2011-02-03 18:01     ` Emmanuel Bigler
@ 2011-02-03 18:21     ` Lennart Borgman
  2011-02-03 18:49       ` Emmanuel Bigler
  2011-02-03 18:49     ` bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Stefan Monnier
  3 siblings, 1 reply; 40+ messages in thread
From: Lennart Borgman @ 2011-02-03 18:21 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962

On Thu, Feb 3, 2011 at 2:34 PM, Emmanuel Bigler
<Emmanuel.Bigler@ens2m.fr> wrote:
>
> Using (toggle-enable-multibyte-characters nil) solves the problem but
> brings in another issue, namely the compatibility of all my previous
> documents written in unibyte ISO-8859-1. ... but this is another issue.

That sounds like a very important issue. Are you filing a bug report for that?






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 18:21     ` Lennart Borgman
@ 2011-02-03 18:49       ` Emmanuel Bigler
  2011-02-03 21:19         ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-03 18:49 UTC (permalink / raw)
  To: Lennart Borgman, Stefan Monnier, 7962

Le 03/02/2011 19:21, Lennart Borgman a écrit :
> On Thu, Feb 3, 2011 at 2:34 PM, Emmanuel Bigler
> <Emmanuel.Bigler@ens2m.fr>  wrote:
>>
>> Using (toggle-enable-multibyte-characters nil) solves the problem but
>> brings in another issue, namely the compatibility of all my previous
>> documents written in unibyte ISO-8859-1. ... but this is another issue.
>
> That sounds like a very important issue. Are you filing a bug report for that?
>


Well, I do not know.

Let's describe the situation differently.

I have a legacy of documents typeset in unibyte and coded as
ISO-8859-1 or iso-8859-15 (which is a minor variant, e.g. the euro
sign...)

Taking into account that I only typeset documents requiring the
iso-8859-1 (or -15) unibyte coding system, I have absolutely no use for
UTF-8 or a 2-byte coding system. But I accept that UTF-8 and 2-byte is
the future and that 1-byte iso-8859 is obsolete.

So far, using emacs up to emacs 22, there was no issue at all. With
emacs 23 I am not sure that I understand what is going on; this is the
reason why I posted this "bug-or-feature" report. As far as I have
tested, a document previously typeset in unibyte iso-8859-1 is
perfectly handled, modified, and saved by the latest emacs 23, and the
document seems to remain "classical-unibyte". So at first glance
there is no compatibility issue so far in emacs 23.

The only loss, in my opinion, is that the functions dealing
with upcasing or downcasing no longer work on unibyte words. This is
not a big issue. I can enter unibyte capital letters with my keyboard.
This did not change so I'm happy. As an old emacs-man, I simply miss
the comfort of capitalize-word on letters with diacritic signs in
unibyte ;-) Who cares ? ;-);-)
Most French people do not even know, and do not care, that
the classical typesetting rules in French state that the first letter
of a word at the beginning of a sentence should be capitalized EVEN if
it bears a diacritic sign, e.g. :

« À partir d'emacs 23, la fonction capitalize-word ne marche plus sur
les mots codés unibyte iso-8859-1 ... » ;-)

However I am reluctant to switch to UTF-8 for all my texts in English,
German, French, even Icelandic etc... i.e. Western European languages,
the only ones I'll ever typeset, which are perfectly handled in
unibyte iso-8859-1. This is another issue and as far as I have
understood, emacs 23 continues to handle my iso-8859 unibyte files as
usual so I do not complain.

Thanks again for your help !

--
Emmanuel






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 13:34   ` Emmanuel Bigler
                       ` (2 preceding siblings ...)
  2011-02-03 18:21     ` Lennart Borgman
@ 2011-02-03 18:49     ` Stefan Monnier
  3 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2011-02-03 18:49 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962

> Using (toggle-enable-multibyte-characters nil) solves the problem but

It doesn't quite fix it, really: it only changes
enable-multibyte-characters in the current buffer, whereas the right fix
is to change the default (aka global) value to t.

Note that this global value is t by default, so something in your
configuration must change it to nil, and it's that thing that you need
to eliminate.  Maybe it's an EMACS_UNIBYTE environment variable, or
something in your .emacs.  Probably a left over from when you switched
to Emacs-20 where the multibyte code (i.e. "that Mule stuff") was still
suffering from various problems.
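
A quick way to check both suspects, as a sketch to evaluate in *scratch*.
The function name my-multibyte-report and the keyword labels are
hypothetical; default-value, enable-multibyte-characters and getenv are
standard.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-multibyte-report ()
  "Return the multibyte settings that could explain a unibyte session."
  (list :default-value  (default-value 'enable-multibyte-characters)
        :current-buffer enable-multibyte-characters
        :EMACS_UNIBYTE  (getenv "EMACS_UNIBYTE")))

(my-multibyte-report)
```

A nil :default-value, or a non-nil :EMACS_UNIBYTE, points at the
configuration rather than at Emacs itself. Comparing against a session
started with "emacs -Q" helps separate the environment from .emacs.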


        Stefan






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 18:49       ` Emmanuel Bigler
@ 2011-02-03 21:19         ` Eli Zaretskii
  2011-02-03 21:39           ` Lennart Borgman
  2011-02-04  8:00           ` bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Emmanuel Bigler
  0 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2011-02-03 21:19 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962

> Date: Thu, 03 Feb 2011 19:49:22 +0100
> From: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>
> Cc: 
> 
> However I am reluctant to switch to UTF-8 for all my texts in English,
> German, French, even Icelandic etc... i.e. Western European languages,
> the only ones I'll ever typeset, which are perfectly handled in
> unibyte iso-8859-1.

You seem to assume that going multibyte requires that your files be
encoded in UTF-8.  That simply isn't true.  The files can continue to be
encoded in ISO-8859-1 or ISO-8859-15; Emacs will decode them into its
internal representation when you visit the file and encode it back
when you save it.  You lose nothing.
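
The decode-on-visit, encode-on-save round trip can be imitated on a string,
as a rough sketch. The helper name my-capitalize-latin1-bytes is
hypothetical, and the real file I/O path does more than this (end-of-line
handling, coding detection), but the principle is the same.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-capitalize-latin1-bytes (bytes)
  "Capitalize ISO-8859-1 encoded BYTES and return ISO-8859-1 bytes again."
  (let* ((chars  (decode-coding-string bytes 'iso-8859-1)) ; as when visiting
         (edited (capitalize chars)))                      ; edit as characters
    (encode-coding-string edited 'iso-8859-1)))            ; as when saving

;; "éèçàù" as five ISO-8859-1 bytes in, five bytes out.
(append (my-capitalize-latin1-bytes
         (unibyte-string #xE9 #xE8 #xE7 #xE0 #xF9))
        nil)
;; => (201 232 231 224 249), i.e. "Éèçàù", still one byte per letter
```

Nothing in this path produces UTF-8: the multibyte form exists only inside
Emacs.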






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 21:19         ` Eli Zaretskii
@ 2011-02-03 21:39           ` Lennart Borgman
  2011-02-03 22:17             ` Sven Joachim
  2011-02-04  8:00           ` bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Emmanuel Bigler
  1 sibling, 1 reply; 40+ messages in thread
From: Lennart Borgman @ 2011-02-03 21:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emmanuel Bigler, 7962

On Thu, Feb 3, 2011 at 10:19 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Thu, 03 Feb 2011 19:49:22 +0100
>> From: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>
>> Cc:
>>
>> However I am reluctant to switch to UTF-8 for all my texts in English,
>> German, French, even Icelandic etc... i.e. Western European languages,
>> the only ones I'll ever typeset, which are perfectly handled in
>> unibyte iso-8859-1.
>
> You seem to assume that going multibyte requires that your files be
> encoded in UTF-8.  That simply isn't true.  The files can continue to be
> encoded in ISO-8859-1 or ISO-8859-15; Emacs will decode them into its
> internal representation when you visit the file and encode it back
> when you save it.  You lose nothing.


But does that require that you mark those files as coded in ISO-8859-1 etc?






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 21:39           ` Lennart Borgman
@ 2011-02-03 22:17             ` Sven Joachim
  2011-02-03 22:22               ` Lennart Borgman
  0 siblings, 1 reply; 40+ messages in thread
From: Sven Joachim @ 2011-02-03 22:17 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962

On 2011-02-03 22:39 +0100, Lennart Borgman wrote:

> On Thu, Feb 3, 2011 at 10:19 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>>> Date: Thu, 03 Feb 2011 19:49:22 +0100
>>> From: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>
>>> Cc:
>>>
>>> However I am reluctant to switch to UTF-8 for all my texts in English,
>>> German, French, even Icelandic etc... i.e. Western European languages,
>>> the only ones I'll ever typeset, which are perfectly handled in
>>> unibyte iso-8859-1.
>>
>> You seem to assume that going multibyte requires that your files be
>> encoded in UTF-8.  That simply isn't true.  The files can continue to be
>> encoded in ISO-8859-1 or ISO-8859-15; Emacs will decode them into its
>> internal representation when you visit the file and encode it back
>> when you save it.  You lose nothing.
>
> But does that require that you mark those files as coded in ISO-8859-1 etc?

For reliability, yes.  Otherwise Emacs cannot really know the encoding
and has to guess if a byte with value 164 means ¤ (ISO-8859-1) or €
(ISO-8859-15), for instance.  Converting the files to UTF-8 solves this
ambiguity, of course.
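
The ambiguity is easy to reproduce, as a small sketch. The helper name
my-decode-byte-164 is hypothetical; the two coding-system names are the
standard Emacs ones.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-decode-byte-164 (coding-system)
  "Decode the single byte 164 according to CODING-SYSTEM."
  (decode-coding-string (unibyte-string 164) coding-system))

(list (my-decode-byte-164 'iso-8859-1)
      (my-decode-byte-164 'iso-8859-15))
;; => ("¤" "€")
```

One way to mark a file is a coding cookie in its first line, for example
"-*- coding: iso-8859-15 -*-" inside a comment, which Emacs consults when
visiting the file instead of guessing.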

Sven






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 22:17             ` Sven Joachim
@ 2011-02-03 22:22               ` Lennart Borgman
  2011-02-03 22:35                 ` Sven Joachim
  2011-02-04 10:44                 ` Andreas Schwab
  0 siblings, 2 replies; 40+ messages in thread
From: Lennart Borgman @ 2011-02-03 22:22 UTC (permalink / raw)
  To: Sven Joachim; +Cc: Emmanuel Bigler, 7962

On Thu, Feb 3, 2011 at 11:17 PM, Sven Joachim <svenjoac@gmx.de> wrote:
>>>>
>>>> However I am reluctant to switch to UTF-8 for all my texts in English,
>>>> German, French, even Icelandic etc... i.e. Western European languages,
>>>> the only ones I'll ever typeset, which are perfectly handled in
>>>> unibyte iso-8859-1.
>>>
>>> You seem to assume that going multibyte requires that your files be
>>> encoded in UTF-8.  That simply isn't true.  The files can continue to be
>>> encoded in ISO-8859-1 or ISO-8859-15; Emacs will decode them into its
>>> internal representation when you visit the file and encode it back
>>> when you save it.  You lose nothing.
>>
>> But does that require that you mark those files as coded in ISO-8859-1 etc?
>
> For reliability, yes.  Otherwise Emacs cannot really know the encoding
> and has to guess if a byte with value 164 means ¤ (ISO-8859-1) or €
> (ISO-8859-15), for instance.  Converting the files to UTF-8 solves this
> ambiguity, of course.


Is it certain that Emacs will always make the same guess in the future
about the encoding of files that have no explicit coding in them? If not,
then perhaps it would be good to add a function to emacs that helps
users add comments about explicit coding to a bunch of files.






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 22:22               ` Lennart Borgman
@ 2011-02-03 22:35                 ` Sven Joachim
  2011-02-04 10:44                 ` Andreas Schwab
  1 sibling, 0 replies; 40+ messages in thread
From: Sven Joachim @ 2011-02-03 22:35 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962

On 2011-02-03 23:22 +0100, Lennart Borgman wrote:

> On Thu, Feb 3, 2011 at 11:17 PM, Sven Joachim <svenjoac@gmx.de> wrote:
>>
>> For reliability, yes.  Otherwise Emacs cannot really know the encoding
>> and has to guess if a byte with value 164 means ¤ (ISO-8859-1) or €
>> (ISO-8859-15), for instance.  Converting the files to UTF-8 solves this
>> ambiguity, of course.
>
>
> Is it certain that Emacs will always make the same guess in the future
> about the encoding of files that have no explicit coding in them?

This is not even the case today: depending on your locale environment,
Emacs will guess differently for ¤/€.

> If not then perhaps it would be good to add a function to emacs that
> helps users add comments about explicit coding to a bunch of files.

IMO offering to convert them to UTF-8 would be more helpful.  The legacy
encodings ought to die some day.
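
Such a conversion could look roughly like the following sketch. The
function name my-recode-latin1-file-to-utf8 is hypothetical and this is
not an existing Emacs command; it assumes the file really is ISO-8859-1,
overwrites it in place, and may normalize line endings, so try it on a
copy first.

```elisp
;; Hypothetical helper, for illustration only.  Overwrites FILE.
(defun my-recode-latin1-file-to-utf8 (file)
  "Rewrite FILE, assumed to be ISO-8859-1, as UTF-8."
  (with-temp-buffer
    ;; Force the decoding instead of letting Emacs guess.
    (let ((coding-system-for-read 'iso-8859-1))
      (insert-file-contents file))
    ;; Force the encoding used when writing the text back.
    (let ((coding-system-for-write 'utf-8))
      (write-region (point-min) (point-max) file))))
```

For ISO-8859-15 files, the same sketch applies with the read coding system
changed accordingly.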

Cheers,
       Sven






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 21:19         ` Eli Zaretskii
  2011-02-03 21:39           ` Lennart Borgman
@ 2011-02-04  8:00           ` Emmanuel Bigler
  2011-02-04  8:09             ` Emmanuel Bigler
  1 sibling, 1 reply; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-04  8:00 UTC (permalink / raw)
  To: Eli Zaretskii, lennart.borgman, monnier, 7962

Le 03/02/2011 22:19, Eli Zaretskii a écrit :
>> Date: Thu, 03 Feb 2011 19:49:22 +0100
>> From: Emmanuel Bigler<Emmanuel.Bigler@ens2m.fr>
>> Cc:
>>
>> However I am reluctant to switch to UTF-8 for all my texts in English,
>> German, French, even Icelandic etc... i.e. Western European languages,
>> the only ones I'll ever typeset, which are perfectly handled in
>> unibyte iso-8859-1.
>
> You seem to assume that going multibyte requires that your files be
> encoded in UTF-8.  That simply isn't true.  The files can continue to be
> encoded in ISO-8859-1 or ISO-8859-15; Emacs will decode them into its
> internal representation when you visit the file and encode it back
> when you save it.  You lose nothing.
>


OK Eli. I'm trying to understand and this is a bit tricky.
Now I'm running into **very** strange things ; I hope somebody can
explain what's going on. This is emacs 23.2.93.1

1/ in an emacs buffer I load a test-file created years ago with a
program, a list of unibyte characters that I display as iso-8859-1. All
characters are displayed properly.

2/ I (toggle-enable-multibyte-characters nil) : unibyte characters
already present in the file do not change and are displayed correctly.
I type a fresh sequence of letters with diacritic signs at the end of
the buffer : OK. I toggle back to unibyte : freshly entered letters
appear as 2-byte sequences. OK

3/ now I cut-paste a line of the old unibyte file and enter a fresh
letter with diacritic sign in the middle of this line ; I'm still in
2-byte coding/display : the letter entered in the middle of unibytes
seems to stay unibyte !!
Is this a bug, a feature, or a misconception / misunderstanding on my
side ??

------------

 From Sven :
 > IMO offering to convert them to UTF-8 would be more helpful.  The
 > legacy encodings ought to die some day.

Exactly like legacy software should die some day ; (ahem) : who said :
"exactly like emacs, an old and obsolete software, that I've been using
for 25 years..." ;-);-)

--

Emmanuel








* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04  8:00           ` bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Emmanuel Bigler
@ 2011-02-04  8:09             ` Emmanuel Bigler
  2011-02-04  8:22               ` bug#7962: 23.2; capitalize / ISO 8859 / UNIBYTE / utf-8 backward compatibility Emmanuel Bigler
  0 siblings, 1 reply; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-04  8:09 UTC (permalink / raw)
  To: Eli Zaretskii, lennart.borgman, monnier, 7962


> 3/ now I cut-paste a line of the old unibyte file and enter a fresh
> letter with diacritic sign in the middle of this line ; I'm still in
> 2-byte coding/display : the letter entered in the middle of unibytes
> seems to stay unibyte !!
> Is this a bug, a feature, or misconceptions / misunderstandings on my
> side ??
>

Ooops ! this was an error on my side, forget about this.

  I simply entered a fresh letter, in unibyte format/display, in the
middle of an old unibyte line of an old file. So it behaves as it
should.  When toggled to 2-byte display, the old line does not change,
but characters entered in multibyte in the middle of it are displayed
as 2-byte ... when toggled *back* to unibyte display.

So ("same player shoot again", sorry) everything works as it should,
i.e. fine, except that capitalize does not work on unibyte letters with
diacritic signs, when displayed as unibyte.

Now the last test is to enter fresh letters in 2-byte, capitalize them,
not switch back to unibyte display, save the file, exit emacs, and
see what happens when re-loaded/displayed as unibyte.

--
E.B.






* bug#7962: 23.2; capitalize / ISO 8859 / UNIBYTE / utf-8 backward compatibility
  2011-02-04  8:09             ` Emmanuel Bigler
@ 2011-02-04  8:22               ` Emmanuel Bigler
  2011-02-04  8:48                 ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-04  8:22 UTC (permalink / raw)
  To: Eli Zaretskii, lennart.borgman, monnier, 7962

Le 04/02/2011 09:09, Emmanuel Bigler a écrit :
> Now the last test is to enter fresh letters in 2-byte, capitalize them,
> not switch back to unibyte display, save the file, exit emacs, and
> see what happens when re-loaded/displayed as unibyte.


I just did this test.

After loading the old unibyte file, toggling 2-byte display on, I
entered freshly typed letters with a diacritic sign. Did not toggle back
to unibyte, and saved buffer - killed emacs.
Re-loaded emacs <myfile> ; note that in my .emacs I have nothing to
force emacs to be unibyte.
The result is that old unibyte letters are displayed correctly, and new
2-byte letters appear as 2-byte. Hence emacs is smart enough to stay
unibyte when it starts reading unibyte code. Nothing is lost : *great*.
However, new letters added at the end of the file in the previous
2-byte session show as 2-byte in unibyte display, and were stored as
2-byte when stored under the "2-byte display" setting. Nothing but
normal, after all.

However, mixing 2-byte and 1-byte code is definitely something annoying to me.

Hence I'll probably stay 100% unibyte until emacs forces me to be
"modern" ;-)

--
Emmanuel









* bug#7962: 23.2; capitalize / ISO 8859 / UNIBYTE / utf-8 backward compatibility
  2011-02-04  8:22               ` bug#7962: 23.2; capitalize / ISO 8859 / UNIBYTE / utf-8 backward compatibility Emmanuel Bigler
@ 2011-02-04  8:48                 ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2011-02-04  8:48 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962

> Date: Fri, 04 Feb 2011 09:22:14 +0100
> From: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>
> 
> Le 04/02/2011 09:09, Emmanuel Bigler a écrit :
> > Now the last test is to enter fresh letters in 2-byte, capitalize them,
> > not switch back to unibyte display, save the file, exit emacs, and
> > see what happens when re-loaded/displayed as unibyte.
> 
> 
> I just did this test.
> 
> After loading the old unibyte file, toggling 2-byte display on, I 
> entered freshly typed letters with a diacritic sign. Did not toggle back 
> to unibyte, and saved buffer - killed emacs.
> Re-loaded emacs <myfile> ; note that in my .emacs I have nothing to 
> force emacs to be unibyte.

If there's nothing in your .emacs to force Emacs to be unibyte, then
what do you mean by "loading the old unibyte file"?  What exactly do
you type to do that?

Btw, perhaps it is better that you use "emacs -Q" for these
experiments, until we figure out what's wrong with editing these files
in multibyte mode.

> However, new letters added at the end of the file in the previous
> 2-byte session show as 2-byte in unibyte display

What do you mean by "unibyte display"?  How did you manage to force
Emacs to display the text as unibyte?  It does not do that by default.

> and were stored as 2-byte when stored under the "2-byte display"
> setting. Nothing but normal, after all.

No, this isn't "normal".

Your procedure for using multibyte Emacs is incorrect.  I will suggest
the correct procedure once I understand how you force the buffer that
visits a file to be unibyte.  There should be no problems with editing
buffers with ISO-8859-1 text in multibyte mode.
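
As background for the "1-byte" versus "2-byte" wording used in this
exchange, here is a minimal Python sketch (Python is used purely for
illustration; none of this is Emacs code, and the helper names
`encoded_sizes` and `misread_utf8_as_latin1` are hypothetical).  It
shows that the same accented letter takes one byte on disk in
ISO-8859-1 and two bytes in UTF-8, and what the UTF-8 bytes look like
if they are later read back as ISO-8859-1.  Whether this is what
happened in the session described above is only an assumption.

```python
def encoded_sizes(text):
    """Return (ISO-8859-1 byte count, UTF-8 byte count) for text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1.

    This mimics a file saved as UTF-8 and reopened as ISO-8859-1:
    each accented letter shows up as two unrelated characters.
    """
    return text.encode("utf-8").decode("iso-8859-1")


if __name__ == "__main__":
    print(encoded_sizes("éèçàù"))
    print(repr(misread_utf8_as_latin1("é")))
```

The on-disk encoding is a separate question from whether an Emacs
buffer is unibyte or multibyte; the sketch covers only the former.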







* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-03 22:22               ` Lennart Borgman
  2011-02-03 22:35                 ` Sven Joachim
@ 2011-02-04 10:44                 ` Andreas Schwab
  2011-02-04 12:07                   ` Lennart Borgman
  1 sibling, 1 reply; 40+ messages in thread
From: Andreas Schwab @ 2011-02-04 10:44 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962, Sven Joachim

Lennart Borgman <lennart.borgman@gmail.com> writes:

> Is it certain that Emacs will always make the same guess in the future
> about the encoding of files that have no explicit coding in them? If not
> then perhaps it would be good to add a function to emacs that helps
> users add comments about explicit coding to a bunch of files.

Just set the coding system priority accordingly.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 10:44                 ` Andreas Schwab
@ 2011-02-04 12:07                   ` Lennart Borgman
  2011-02-04 12:41                     ` Andreas Schwab
  0 siblings, 1 reply; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 12:07 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emmanuel Bigler, 7962, Sven Joachim

On Fri, Feb 4, 2011 at 11:44 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>
>> Is it certain that Emacs will always make the same guess in the future
>> about the encoding of files that have no explicit coding in them? If not
>> then perhaps it would be good to add a function to emacs that helps
>> users add comments about explicit coding to a bunch of files.
>
> Just set the coding system priority accordingly.

I see, but it is not the coding system priority that we want to change
so that utf-8 is first?






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 12:07                   ` Lennart Borgman
@ 2011-02-04 12:41                     ` Andreas Schwab
  2011-02-04 13:02                       ` Lennart Borgman
  0 siblings, 1 reply; 40+ messages in thread
From: Andreas Schwab @ 2011-02-04 12:41 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962, Sven Joachim

Lennart Borgman <lennart.borgman@gmail.com> writes:

> I see, but it is not the coding system priority that we want to change
> so that utf-8 is first?

I don't understand your question.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 12:41                     ` Andreas Schwab
@ 2011-02-04 13:02                       ` Lennart Borgman
  2011-02-04 13:06                         ` Andreas Schwab
  0 siblings, 1 reply; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 13:02 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emmanuel Bigler, 7962, Sven Joachim

On Fri, Feb 4, 2011 at 1:41 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>
>> I see, but it is not the coding system priority that we want to change
>> so that utf-8 is first?
>
> I don't understand your question.

I think it is rather me who does not understand this ;-)

I just suggested that we are moving towards using utf-8 and that the
default coding system lists might change. (So what we were discussing
here earlier is a problem.)






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:02                       ` Lennart Borgman
@ 2011-02-04 13:06                         ` Andreas Schwab
  2011-02-04 13:11                           ` Lennart Borgman
  0 siblings, 1 reply; 40+ messages in thread
From: Andreas Schwab @ 2011-02-04 13:06 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962, Sven Joachim

Lennart Borgman <lennart.borgman@gmail.com> writes:

> I just suggested that we are moving towards using utf-8

Moving?  We are there already.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:06                         ` Andreas Schwab
@ 2011-02-04 13:11                           ` Lennart Borgman
  2011-02-04 13:21                             ` Lennart Borgman
                                               ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 13:11 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emmanuel Bigler, 7962, Sven Joachim

On Fri, Feb 4, 2011 at 2:06 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>
>> I just suggested that we are moving towards using utf-8
>
> Moving?  We are there already.

Are we? I am sure I do not understand this ;-)

I see this:
  buffer-file-coding-system is a variable defined in `C source code'.
  Its value is iso-latin-1-dos






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:11                           ` Lennart Borgman
@ 2011-02-04 13:21                             ` Lennart Borgman
  2011-02-04 13:31                             ` Andreas Schwab
  2011-02-04 14:54                             ` Eli Zaretskii
  2 siblings, 0 replies; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 13:21 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emmanuel Bigler, 7962, Sven Joachim

On Fri, Feb 4, 2011 at 2:11 PM, Lennart Borgman
<lennart.borgman@gmail.com> wrote:
> On Fri, Feb 4, 2011 at 2:06 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Lennart Borgman <lennart.borgman@gmail.com> writes:
>>
>>> I just suggested that we are moving towards using utf-8
>>
>> Moving?  We are there already.
>
> Are we? I am sure I do not understand this ;-)
>
> I see this:
>  buffer-file-coding-system is a variable defined in `C source code'.
>  Its value is iso-latin-1-dos

For "emacs -Q" of course.






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:11                           ` Lennart Borgman
  2011-02-04 13:21                             ` Lennart Borgman
@ 2011-02-04 13:31                             ` Andreas Schwab
  2011-02-04 13:44                               ` Lennart Borgman
  2011-02-04 14:54                             ` Eli Zaretskii
  2 siblings, 1 reply; 40+ messages in thread
From: Andreas Schwab @ 2011-02-04 13:31 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962, Sven Joachim

Lennart Borgman <lennart.borgman@gmail.com> writes:

> I see this:
>   buffer-file-coding-system is a variable defined in `C source code'.
>   Its value is iso-latin-1-dos

What is your issue with that?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:31                             ` Andreas Schwab
@ 2011-02-04 13:44                               ` Lennart Borgman
  2011-02-04 13:58                                 ` Andreas Schwab
  0 siblings, 1 reply; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 13:44 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emmanuel Bigler, 7962, Sven Joachim

On Fri, Feb 4, 2011 at 2:31 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>
>> I see this:
>>   buffer-file-coding-system is a variable defined in `C source code'.
>>   Its value is iso-latin-1-dos
>
> What is your issue with that?

It does not look like utf-8 to me.






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:44                               ` Lennart Borgman
@ 2011-02-04 13:58                                 ` Andreas Schwab
  2011-02-04 14:00                                   ` Lennart Borgman
  0 siblings, 1 reply; 40+ messages in thread
From: Andreas Schwab @ 2011-02-04 13:58 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962, Sven Joachim

Lennart Borgman <lennart.borgman@gmail.com> writes:

> On Fri, Feb 4, 2011 at 2:31 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> Lennart Borgman <lennart.borgman@gmail.com> writes:
>>
>>> I see this:
>>>   buffer-file-coding-system is a variable defined in `C source code'.
>>>   Its value is iso-latin-1-dos
>>
>> What is your issue with that?
>
> It does not look like utf-8 to me.

Why should it be?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:58                                 ` Andreas Schwab
@ 2011-02-04 14:00                                   ` Lennart Borgman
  2011-02-04 14:22                                     ` Andreas Schwab
  0 siblings, 1 reply; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 14:00 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Emmanuel Bigler, 7962, Sven Joachim

On Fri, Feb 4, 2011 at 2:58 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>
>> On Fri, Feb 4, 2011 at 2:31 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>>> Lennart Borgman <lennart.borgman@gmail.com> writes:
>>>
>>>> I see this:
>>>>   buffer-file-coding-system is a variable defined in `C source code'.
>>>>   Its value is iso-latin-1-dos
>>>
>>> What is your issue with that?
>>
>> It does not look like utf-8 to me.
>
> Why should it be?

I thought you said utf-8 was the default, didn't you?






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 14:00                                   ` Lennart Borgman
@ 2011-02-04 14:22                                     ` Andreas Schwab
  0 siblings, 0 replies; 40+ messages in thread
From: Andreas Schwab @ 2011-02-04 14:22 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel Bigler, 7962, Sven Joachim

Lennart Borgman <lennart.borgman@gmail.com> writes:

> I thought you said utf-8 was the default, didn't you?

If your file isn't utf-8, why should it be decoded as utf-8?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 13:11                           ` Lennart Borgman
  2011-02-04 13:21                             ` Lennart Borgman
  2011-02-04 13:31                             ` Andreas Schwab
@ 2011-02-04 14:54                             ` Eli Zaretskii
  2011-02-04 15:00                               ` Lennart Borgman
  2011-02-04 17:08                               ` Emmanuel Bigler
  2 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2011-02-04 14:54 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel.Bigler, 7962, schwab, svenjoac

> From: Lennart Borgman <lennart.borgman@gmail.com>
> Date: Fri, 4 Feb 2011 14:11:36 +0100
> Cc: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>, 7962@debbugs.gnu.org,
> 	Sven Joachim <svenjoac@gmx.de>
> 
> On Fri, Feb 4, 2011 at 2:06 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
> > Lennart Borgman <lennart.borgman@gmail.com> writes:
> >
> >> I just suggested that we are moving towards using utf-8
> >
> > Moving?  We are there already.
> 
> Are we? I am sure I do not understand this ;-)
> 
> I see this:
>   buffer-file-coding-system is a variable defined in `C source code'.
>   Its value is iso-latin-1-dos

See "M-: (coding-system-priority-list) RET".

The highest-priority encoding is set from your locale, but look what
is the next one.







* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 14:54                             ` Eli Zaretskii
@ 2011-02-04 15:00                               ` Lennart Borgman
  2011-02-04 15:20                                 ` Jason Rumney
  2011-02-04 15:36                                 ` Eli Zaretskii
  2011-02-04 17:08                               ` Emmanuel Bigler
  1 sibling, 2 replies; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 15:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emmanuel.Bigler, 7962, schwab, svenjoac

On Fri, Feb 4, 2011 at 3:54 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Lennart Borgman <lennart.borgman@gmail.com>
>> Date: Fri, 4 Feb 2011 14:11:36 +0100
>> Cc: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>, 7962@debbugs.gnu.org,
>>       Sven Joachim <svenjoac@gmx.de>
>>
>> On Fri, Feb 4, 2011 at 2:06 PM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>> > Lennart Borgman <lennart.borgman@gmail.com> writes:
>> >
>> >> I just suggested that we are moving towards using utf-8
>> >
>> > Moving?  We are there already.
>>
>> Are we? I am sure I do not understand this ;-)
>>
>> I see this:
>>   buffer-file-coding-system is a variable defined in `C source code'.
>>   Its value is iso-latin-1-dos
>
> See "M-: (coding-system-priority-list) RET".
>
> The highest-priority encoding is set from your locale, but look what
> is the next one.

Thanks. Yes, this is utf-8.

So what is the recommended way to change to utf-8 as default? Where is
it documented?






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 15:00                               ` Lennart Borgman
@ 2011-02-04 15:20                                 ` Jason Rumney
  2011-02-04 15:36                                 ` Eli Zaretskii
  1 sibling, 0 replies; 40+ messages in thread
From: Jason Rumney @ 2011-02-04 15:20 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel.Bigler, schwab, 7962, svenjoac

Lennart Borgman <lennart.borgman@gmail.com> writes:

> Thanks. Yes, this is utf-8.
>
> So what is the recommended way to change to utf-8 as default? Where is
> it documented?

The recommended way is to change your system locale to a UTF-8 based
one.  Last time I checked, this caused some major problems for cmd.exe
if you attempt it on Windows.






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 15:00                               ` Lennart Borgman
  2011-02-04 15:20                                 ` Jason Rumney
@ 2011-02-04 15:36                                 ` Eli Zaretskii
  2011-02-04 16:36                                   ` Lennart Borgman
  1 sibling, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2011-02-04 15:36 UTC (permalink / raw)
  To: Lennart Borgman; +Cc: Emmanuel.Bigler, 7962, schwab, svenjoac

> From: Lennart Borgman <lennart.borgman@gmail.com>
> Date: Fri, 4 Feb 2011 16:00:49 +0100
> Cc: schwab@linux-m68k.org, Emmanuel.Bigler@ens2m.fr, 7962@debbugs.gnu.org, 
> 	svenjoac@gmx.de
> 
> So what is the recommended way to change to utf-8 as default?

Why would you want to?  What's wrong with the current default?






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 15:36                                 ` Eli Zaretskii
@ 2011-02-04 16:36                                   ` Lennart Borgman
  0 siblings, 0 replies; 40+ messages in thread
From: Lennart Borgman @ 2011-02-04 16:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Emmanuel.Bigler, 7962, schwab, svenjoac

On Fri, Feb 4, 2011 at 4:36 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Lennart Borgman <lennart.borgman@gmail.com>
>> Date: Fri, 4 Feb 2011 16:00:49 +0100
>> Cc: schwab@linux-m68k.org, Emmanuel.Bigler@ens2m.fr, 7962@debbugs.gnu.org,
>>       svenjoac@gmx.de
>>
>> So what is the recommended way to change to utf-8 as default?
>
> Why would you want to?  What's wrong with the current default?

That would avoid having to add utf-8 to every org-mode file I use ;-)






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 14:54                             ` Eli Zaretskii
  2011-02-04 15:00                               ` Lennart Borgman
@ 2011-02-04 17:08                               ` Emmanuel Bigler
  2011-02-04 18:37                                 ` Eli Zaretskii
  2011-02-04 21:34                                 ` Stefan Monnier
  1 sibling, 2 replies; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-04 17:08 UTC (permalink / raw)
  To: Eli Zaretskii, Lennart Borgman, schwab, 7962; +Cc: svenjoac

[-- Attachment #1: Type: text/plain, Size: 3443 bytes --]


>>
>> I see this:
>>    buffer-file-coding-system is a variable defined in `C source code'.
>>    Its value is iso-latin-1-dos
>
> See "M-: (coding-system-priority-list) RET".
>
> The highest-priority encoding is set from your locale, but look what
> is the next one.
>

hello again.
I think I'm starting to understand what is going on.

A long time ago I created a unibyte file containing the 1-byte 
characters I want to test within emacs. The file was created with a 
program over which I have total control byte-by-byte, so I know exactly 
what is inside the file. I have attached the file to this mail; I am not 
sure that this is allowed on the gnu-debug mailing list, but it is a 
simple and very short .txt file that reads as follows (this mail itself 
is typeset and displayed here as iso-8859-1):

------- mytestchars-224-255-iso-8859.txt ---------------------

   224 \340  à   225 \341  á   226 \342  â   227 \343  ã
   228 \344  ä   229 \345  å   230 \346  æ   231 \347  ç
   232 \350  è   233 \351  é   234 \352  ê   235 \353  ë
   236 \354  ì   237 \355  í   238 \356  î   239 \357  ï
   240 \360  ð   241 \361  ñ   242 \362  ò   243 \363  ó
   244 \364  ô   245 \365  õ   246 \366  ö   247 \367  ÷
   248 \370  ø   249 \371  ù   250 \372  ú   251 \373  û
   252 \374  ü   253 \375  ý   254 \376  þ   255 \377  ÿ

éèçàù  < test strings to see how they behave
Éèçàù

----------------------------------------------------------


I started /usr/local/bin/emacs -Q mytestchars-224-255-iso-8859.txt
under emacs  23.2.93.1 (i686-pc-linux-gnu)

The file displays perfectly correctly. (describe-char (point)) gives me 
exactly what I want, i.e. an extended ASCII decimal code between 224 and 255.
Almost all operations (except capitalize, see below) work exactly as I 
wish and exactly like in older emacs versions, no mystery since the 
priority list
M-: (coding-system-priority-list) RET reads as :
(iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 
emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit 
utf-8-auto utf-8-with-signature ...)

Again I'm perfectly happy since I see that iso-latin-1 comes first, but 
is this what I want? Certainly yes;
my locale environment variables look like:
LC_ALL=fr_FR.ISO8859-1
LC_COLLATE=fr_FR.ISO8859-1
LANG=fr_FR.ISO8859-1
GDM_LANG=fr_FR.iso88591
LC_CTYPE=fr_FR.ISO8859-1
XTERM_LOCALE=fr_FR.ISO8859-1

However, in this emacs -Q session, with a correct unibyte display of
a unibyte file, *capitalize does not work*.
At the beginning of this discussion, Sven explained that capitalize 
would only work on 2-byte characters. Which I tested of course, and of 
course it works, but I simply wish I could continue to capitalize M-c 
unibyte words like  in the good old iso-8859 days !!

Additional info : when applying the M-c command to a letter above
decimal ascii 224, nothing happens on the display as reported, *although 
the buffer is marked as being changed.*

Incidentally, in a good ol' xterm window (fitted with gnu readline and
obeying my LOCALE preferences as listed above), M-c works perfectly as
it should, and if I cut-paste from the xterm to the emacs buffer,
everything looks fine & unibyte ... except that I can no longer change
the case of the pasted string with 'capitalize' or a similar 'case'
command.

Bug, or UTF-8 emacs 23.2 feature ?

--
Emmanuel

[-- Attachment #2: mytestchars-224-255-iso-8859.txt --]
[-- Type: text/plain, Size: 480 bytes --]

  224 \340  à   225 \341  á   226 \342  â   227 \343  ã 
  228 \344  ä   229 \345  å   230 \346  æ   231 \347  ç 
  232 \350  è   233 \351  é   234 \352  ê   235 \353  ë 
  236 \354  ì   237 \355  í   238 \356  î   239 \357  ï 
  240 \360  ð   241 \361  ñ   242 \362  ò   243 \363  ó 
  244 \364  ô   245 \365  õ   246 \366  ö   247 \367  ÷ 
  248 \370  ø   249 \371  ù   250 \372  ú   251 \373  û 
  252 \374  ü   253 \375  ý   254 \376  þ   255 \377  ÿ 

éèçàù
Éèçàù


* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 17:08                               ` Emmanuel Bigler
@ 2011-02-04 18:37                                 ` Eli Zaretskii
  2011-02-04 21:34                                 ` Stefan Monnier
  1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2011-02-04 18:37 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962, schwab, svenjoac

> Date: Fri, 04 Feb 2011 18:08:51 +0100
> From: Emmanuel Bigler <Emmanuel.Bigler@ens2m.fr>
> Cc: svenjoac@gmx.de
> 
> However, in this emacs -Q session, with a correct unibyte display of
> a unibyte file, *capitalize does not work*.

I cannot reproduce this.  If I start an Emacs session exactly like you
say, with the file you attached, I get correct display and
capitalization does work for me.

> Additional info : when applying the M-c command to a letter above
> decimal ascii 224, nothing happens on the display as reported, *although 
> the buffer is marked as being changed.*

What does "C-h c M-c" say?

Also, what does Emacs show when you put the cursor on the character
you just applied M-c to, and type "C-u C-x ="?

> Bug, or UTF-8 emacs 23.2 feature ?

Neither.  It's some strange issue specific to your system, I think.






* bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
  2011-02-04 17:08                               ` Emmanuel Bigler
  2011-02-04 18:37                                 ` Eli Zaretskii
@ 2011-02-04 21:34                                 ` Stefan Monnier
  2011-02-05  9:27                                   ` bug#7962: 23.2; capitalize letters ISO-8859-1 [SOLVED, MANY THANKS TO ALL] Emmanuel Bigler
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2011-02-04 21:34 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: schwab, 7962, svenjoac

> I think I'm starting to understand what is going on.

I think you're still confused, tho.

For some reason, you haven't replied to any of my emails, even tho it's
blatantly obvious that your "default enable-multibyte-characters" is the
main culprit (and the "default" part is important here, it means that
it comes from something you've explicitly changed in your Emacs config).

> A long time ago I created a unibyte file containing the 1-byte
> characters I want to test within emacs.

You mean an iso-8859-1 file, then.  A unibyte file only contains bytes,
no chars.

> I started /usr/local/bin/emacs -Q mytestchars-224-255-iso-8859.txt
> under emacs  23.2.93.1 (i686-pc-linux-gnu)

> The file displays perfectly correctly. (describe-char (point)) gives me
> exactly what I want, i.e. an extended ASCII decimal code between 224 and 255.

The code is not very helpful here, since depending on whether the
current buffer is unibyte or multibyte, the 224 or 255 doesn't mean the
same thing.  So the second line "preferred charset:" is more important,
since it should either say "eight-bit" (i.e. a raw byte with no
associated meaning of it representing some kind of character) or
"iso-8859-1".

> Almost all operations (except capitalize, see below) work exactly as I wish

So that leads me to think the buffer is in unibyte mode.
If you started with "emacs -Q", the only explanation is that you have
EMACS_UNIBYTE set in your environment variables.  If that's the case,
then please get rid of it.
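
A quick way to check for a stale setting of this kind, sketched in
Python rather than shell (the function name `unibyte_forced` is
hypothetical, and the sketch assumes that the mere presence of the
variable is what matters, whatever its value):

```python
import os


def unibyte_forced(environ=None):
    """Return True if EMACS_UNIBYTE is present in the environment.

    `environ` defaults to the real process environment; a plain dict
    can be passed instead to try the check without touching it.
    """
    env = os.environ if environ is None else environ
    return "EMACS_UNIBYTE" in env


if __name__ == "__main__":
    if unibyte_forced():
        print("EMACS_UNIBYTE is set; remove it from your shell startup files")
    else:
        print("EMACS_UNIBYTE is not set")
```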

> At the beginning of this discussion, Sven explained that capitalize would
> only work on 2-byte characters.

unibyte-vs-multibyte is not the same as "1-byte char"-vs-"2-byte chars".
It's an issue that's internal to Emacs and that's largely irrelevant to
how Emacs stores chars (e.g. as an array of 32bit integers, or as
a sequence of bytes, with escape sequences to represent more than 256
different values).

> Which I tested of course, and of course it works, but I simply wish
> I could continue to capitalize M-c  unibyte words like  in the good
> old iso-8859 days !!

Most likely you won't tell the difference: the multibyte mode works just
as well for iso-8859 files.  "multibyte-mode" means "we're manipulating
chars", whereas unibyte mode means "we're manipulating bytes", where
bytes are simply numbers between 0 and 255.  Now you tell me: what does
it mean to capitalize the number 224?
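
The same distinction can be sketched outside Emacs.  The following
Python fragment is only an analogy, not a description of Emacs
internals, and the helper names are hypothetical: case conversion on
raw bytes leaves values above 127 alone, because a bare number has no
case, whereas decoding the bytes as ISO-8859-1 first turns them into
characters that can be capitalized.

```python
# The test string "éèçàù" as raw ISO-8859-1 byte values.
RAW = bytes([233, 232, 231, 224, 249])


def capitalize_as_bytes(data):
    """Capitalize raw bytes: only ASCII letters are ever affected."""
    return data.capitalize()


def capitalize_as_chars(data, encoding="iso-8859-1"):
    """Decode to characters, capitalize, and encode back to bytes."""
    return data.decode(encoding).capitalize().encode(encoding)


def upcase_as_chars(data, encoding="iso-8859-1"):
    """Decode to characters, upcase, and encode back to bytes."""
    return data.decode(encoding).upper().encode(encoding)


if __name__ == "__main__":
    print(capitalize_as_bytes(RAW) == RAW)
    print(capitalize_as_chars(RAW).decode("iso-8859-1"))
    print(upcase_as_chars(RAW).decode("iso-8859-1"))
```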


        Stefan






* bug#7962: 23.2; capitalize letters ISO-8859-1 [SOLVED, MANY THANKS TO ALL]
  2011-02-04 21:34                                 ` Stefan Monnier
@ 2011-02-05  9:27                                   ` Emmanuel Bigler
  2011-02-05 16:04                                     ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: Emmanuel Bigler @ 2011-02-05  9:27 UTC (permalink / raw)
  To: Stefan Monnier, Eli Zaretskii, Lennart Borgman, schwab, 7962,
	svenjoac


Le 04/02/2011 22:34, Stefan Monnier a écrit :
> ............
> So that leads me to think the buffer is in unibyte mode.
> If you started with "emacs -Q", the only explanation is that you have
> EMACS_UNIBYTE set in your environment variables.  If that's the case,
> then please get rid of it.

*THANKS, Stefan !*

I had this (!!$£%!µÞЬ¬!!) variable set in my shell environment
variables from older emacs releases.

Getting rid of it solved the problem.
So it was not a bug, but a feature ;-)

I think that we can now say *IN CAPITAL LETTERS* :

*[FEATURE CLOSED]* (since this was not a bug ;-)

And this also solved another problem I had noticed : ispell was
pretending to use 2-byte characters instead of the 1-byte ISO-8859 I
was using.

Thanks to all for your time !

Now the only pending issue is whether I'll convert my files to UTF-8
some day ; ahem... now that my faithful emacs does exactly what I
want, I do not see any reason to do it soon, I like to be obsolete to
some extent ;-)

All the best !

--
Emmanuel






* bug#7962: 23.2; capitalize letters ISO-8859-1 [SOLVED, MANY THANKS TO ALL]
  2011-02-05  9:27                                   ` bug#7962: 23.2; capitalize letters ISO-8859-1 [SOLVED, MANY THANKS TO ALL] Emmanuel Bigler
@ 2011-02-05 16:04                                     ` Stefan Monnier
  0 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2011-02-05 16:04 UTC (permalink / raw)
  To: Emmanuel Bigler; +Cc: 7962-done

>> EMACS_UNIBYTE set in your environment variables.  If that's the case,
> *THANKS, Stefan !*

> I had this (!!$£%!µÞЬ¬!!) variable set in my shell environment
> variables from older emacs releases.

FWIW, I did point you there a dozen emails ago already.
Anyway, thanks for confirming,


        Stefan






end of thread, other threads:[~2011-02-05 16:04 UTC | newest]

Thread overview: 40+ messages
2011-02-02 10:35 bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Emmanuel Bigler
2011-02-02 19:47 ` Eli Zaretskii
2011-02-02 19:53 ` Stefan Monnier
2011-02-03 13:34   ` Emmanuel Bigler
2011-02-03 17:56     ` Eli Zaretskii
2011-02-03 18:01     ` Emmanuel Bigler
2011-02-03 18:21     ` Lennart Borgman
2011-02-03 18:49       ` Emmanuel Bigler
2011-02-03 21:19         ` Eli Zaretskii
2011-02-03 21:39           ` Lennart Borgman
2011-02-03 22:17             ` Sven Joachim
2011-02-03 22:22               ` Lennart Borgman
2011-02-03 22:35                 ` Sven Joachim
2011-02-04 10:44                 ` Andreas Schwab
2011-02-04 12:07                   ` Lennart Borgman
2011-02-04 12:41                     ` Andreas Schwab
2011-02-04 13:02                       ` Lennart Borgman
2011-02-04 13:06                         ` Andreas Schwab
2011-02-04 13:11                           ` Lennart Borgman
2011-02-04 13:21                             ` Lennart Borgman
2011-02-04 13:31                             ` Andreas Schwab
2011-02-04 13:44                               ` Lennart Borgman
2011-02-04 13:58                                 ` Andreas Schwab
2011-02-04 14:00                                   ` Lennart Borgman
2011-02-04 14:22                                     ` Andreas Schwab
2011-02-04 14:54                             ` Eli Zaretskii
2011-02-04 15:00                               ` Lennart Borgman
2011-02-04 15:20                                 ` Jason Rumney
2011-02-04 15:36                                 ` Eli Zaretskii
2011-02-04 16:36                                   ` Lennart Borgman
2011-02-04 17:08                               ` Emmanuel Bigler
2011-02-04 18:37                                 ` Eli Zaretskii
2011-02-04 21:34                                 ` Stefan Monnier
2011-02-05  9:27                                   ` bug#7962: 23.2; capitalize letters ISO-8859-1 [SOLVED, MANY THANKS TO ALL] Emmanuel Bigler
2011-02-05 16:04                                     ` Stefan Monnier
2011-02-04  8:00           ` bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Emmanuel Bigler
2011-02-04  8:09             ` Emmanuel Bigler
2011-02-04  8:22               ` bug#7962: 23.2; capitalize / ISO 8859 / UNIBYTE / utf-8 backward compatibility Emmanuel Bigler
2011-02-04  8:48                 ` Eli Zaretskii
2011-02-03 18:49     ` bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1 Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
