From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Peter Dyballa <Peter_Dyballa@Web.DE>
Newsgroups: gmane.emacs.help
Subject: Re: Ediff problem with accents
Date: Fri, 22 Sep 2006 12:42:39 +0200
Message-ID: <0DC5E148-FB14-4A62-B3BC-CD18F2EBF020@Web.DE>
References: <87lkoosu5e.fsf@mundaneum.mygooglest.com>
	<61CD86F9-90F0-45D9-888B-D344320541A8@Web.DE>
	<873bak2r61.fsf_-_@mundaneum.mygooglest.com>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0 (Apple Message framework v752.2)
Content-Type: text/plain; charset=WINDOWS-1252; delsp=yes; format=flowed
Content-Transfer-Encoding: quoted-printable
X-Trace: sea.gmane.org 1158921802 11436 80.91.229.2 (22 Sep 2006 10:43:22 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Fri, 22 Sep 2006 10:43:22 +0000 (UTC)
Cc: GNU Emacs List <help-gnu-emacs@gnu.org>
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Fri Sep 22 12:43:14 2006
In-Reply-To: <873bak2r61.fsf_-_@mundaneum.mygooglest.com>
X-Image-Url: http://homepage.mac.com/sparifankal/.cv/thumbs/me.thumbnail
Original-To: Sébastien Vauban <ewgeocaufsfb@spammotel.com>
X-Mailer: Apple Mail (2.752.2)
X-Sender: Peter_Dyballa@web.de
Xref: news.gmane.org gmane.emacs.help:37564
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/37564>


On 22.09.2006 at 11:20, Sébastien Vauban wrote:

> Hello Peter,
>
> Sorry for the long delay... but until now it was impossible for me
> to run the tests you asked for.
>
>     Note -- You can copy this mail to gnu.emacs.help. I don't
>             have news access from where I am now...
>
> FYI, I've sanitized my .emacs section about the coding systems
> (you'll see an extract beneath), and I've made a lot of
> comparisons.
>
> I still have the problem, but here follows a deeper insight on
> what I'm experiencing:
>
>     o   if (prefer-coding-system 'iso-latin-9),
>         then I see the following when ediff'ing:
>
>         ----------------------------------------
>         | ^M               |                   |
>         | pr\351sente ^M   | présente          |
>         | ^M               |                   |
>         |                  |                   |
>         |------------------|-------------------|
>         |-0:%%             |-0\--              |  (modeline)
>         ----------------------------------------
>           iso-latin-9?       iso-latin-9-dos

That's correct for the mode line: ISO 8859-15, or ISO Latin-9, encoding
is used. The left buffer is read-only (%%) and the right one unmodified
(--), I suppose. The `\' puzzles me, but it has been such a long time
since I last used GNU Emacs on some MS Losedows that I cannot remember.
The ^M in the left buffer should not appear; probably the encoding of
the right buffer is the correct one, since that buffer also displays
présente the right way.

>
>
>     o   if (prefer-coding-system 'utf-8),
>         then I see the following when ediff'ing:
>
>         ----------------------------------------
>         | ^M               |                   |
>         | pr\351sente ^M   | présente          |
>         | ^M               |                   |
>         |                  |                   |
>         |------------------|-------------------|
>         |-u:%%             |-1\--              |
>         ----------------------------------------
>           utf-8?             iso-latin-1-dos

Again, the mode lines are right. The left buffer would need to be read
with a -dos coding system to make the ^M disappear. And since \351 is
the single Latin byte E9, which is not valid UTF-8, it takes a Latin one
such as iso-latin-1-dos rather than utf-8-dos to make pr\351sente appear
as présente. The prefer-coding-system function allows the use of
"extensions" like -dos, -mac, -unix to specify the preferred encoding
exactly.
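A minimal sketch of what the -dos and -unix variants do with the same bytes, to evaluate in *scratch* of a recent GNU Emacs; the variable my-sample and its text are made up for illustration:

```elisp
;; Hypothetical sample: Latin-9 bytes with DOS (CR LF) line ends.
(setq my-sample "pr\351sente\r\nfin\r\n")

;; The -dos variant turns each CR LF into a plain newline, so no ^M:
;; the result is "présente" and "fin" on separate lines.
(decode-coding-string my-sample 'iso-latin-9-dos)

;; The -unix variant leaves each CR in place; in a buffer it shows as ^M.
(decode-coding-string my-sample 'iso-latin-9-unix)

;; The end-of-line type of a coding system: 0 = unix, 1 = dos, 2 = mac.
(coding-system-eol-type 'iso-latin-9-dos)   ; => 1
```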

>
>
>     o   if I don't set any preferred coding system (commented line),
>         then I see the following when ediff'ing:
>
>         ----------------------------------------
>         | ^M               |                   |
>         | pr\351sente ^M   | présente          |
>         | ^M               |                   |
>         |                  |                   |
>         |------------------|-------------------|
>         |-1\%%             |-1\--              |
>         ----------------------------------------
>           iso-latin-1-dos?   iso-latin-1-dos

Here the left buffer is certainly not -dos; otherwise the ^M would not
appear there.

>
> To indicate the coding system under the window, I used
>
>     M-x describe-coding-system RET RET
>
> but, for the base version, it states "not set locally, use the
> default"; that's why I wrote the default coding system for new
> files and put a question mark after it (because I'm not sure
> this is the correct way to do it).

There are no local settings in the file (see below), so some default is
assumed, which the *Help* buffer should describe, for example:

	Coding system for saving this buffer:
	  0 -- iso-latin-9-unix

	Default coding system (for new files):
	  u -- mule-utf-8-unix

	Coding system for keyboard input:
	  nil
	Coding system for terminal output:
	  u -- mule-utf-8 (alias: utf-8)

	Defaults for subprocess I/O:
	  decoding: u -- mule-utf-8 (alias: utf-8)

	  encoding: u -- mule-utf-8 (alias: utf-8)

	Priority order for recognizing coding systems when reading files:
	  .
	  .
	  .
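To get the same information for one particular buffer, for example the left Ediff buffer, each form of this sketch can be evaluated there with M-: (eval-expression); it only reads standard variables and functions:

```elisp
;; Coding system this buffer was decoded with (and will be saved with).
buffer-file-coding-system

;; Its end-of-line part: 0 = unix (LF), 1 = dos (CR LF), 2 = mac (CR);
;; a vector means the end-of-line type is still undecided.
(coding-system-eol-type buffer-file-coding-system)

;; The mode-line strings this Emacs uses for unix, dos and mac line ends,
;; which may explain the `\' seen in the mode line.
(list eol-mnemonic-unix eol-mnemonic-dos eol-mnemonic-mac)
```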


The prefer-coding-system setting also affects your old files: they can
now be interpreted differently than when they were created and saved.
You could stick with iso-latin-9-dos to have the € and keep your old
files unchanged. Every new (and old) file will have some extra ^M bytes,
but at least new and old ones will be treated the same way. (Conversion
could be done on the command line with recode or iconv, or, more
time-consuming, with GNU Emacs: Options menu -> Mule -> Set Coding
Systems.)
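For the Emacs route, a minimal sketch of a hypothetical helper (my-recode-file is not part of GNU Emacs) that reads a file with one coding system and writes it back with another. It overwrites the file in place, so keep a backup:

```elisp
(defun my-recode-file (file from to)
  "Rewrite FILE in place: decode it with FROM, encode it again with TO.
Hypothetical helper; FROM and TO are coding systems such as
`iso-latin-9-dos' and `utf-8-unix'."
  (with-temp-buffer
    ;; Decode the file's bytes with FROM instead of auto-detection.
    (let ((coding-system-for-read from))
      (insert-file-contents file))
    ;; Encode the buffer text with TO when writing it back.
    (let ((coding-system-for-write to))
      (write-region (point-min) (point-max) file))))

;; Usage (hypothetical file name):
;; (my-recode-file "~/notes.txt" 'iso-latin-9-dos 'utf-8-unix)
```

If a converted file carries a coding: cookie, that cookie has to be updated as well.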

>
> So, you can see that, whatever I do, I can't compare my buffers
> in a normal way... I'm completely lost...

Try: (prefer-coding-system 'iso-latin-9-dos).

You can also make several of these calls, each with a different
encoding. GNU Emacs will then first choose from this list and only
afterwards try to find another encoding.
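A sketch of several such calls in ~/.emacs. Each call puts its coding system at the front of the priority list and makes it the default for new files, so the call that runs last wins; utf-8-dos is only an example of a second choice:

```elisp
;; Evaluated in this order, iso-latin-9-dos ends up with the highest
;; priority and becomes the default for new files; utf-8-dos comes next.
(prefer-coding-system 'utf-8-dos)
(prefer-coding-system 'iso-latin-9-dos)
```

One caution, not part of the advice above: almost any byte sequence is acceptable as Latin-9, so a Latin coding system ranked above utf-8 may also be chosen for files that are really UTF-8.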

>
> PS- As promised, an extract of my .emacs config file:
>
> ,----[ my Emacs Init File ]
> |
> | (message "26 International Character Set Support...")
> |
> | ;; default input method for multilingual text
> | (setq default-input-method "latin-9-prefix")

I do not use any input method: my keyboard creates (composes) é when I
press the dead key ´ first and then e. This also works for some other
accented characters. Actually, I think I have never used any Emacs input
method. 20 years ago I had DEC or Sun keyboards with a Compose key; now
the X server allows other characters to be typed with Alt or Shift-Alt
pressed ...

> |
> | ;; if you want to use UTF-8 on Emacs 21.3, install Mule-UCS
> | (GNUEmacs
> |     (try-require 'un-define))

This was necessary with GNU Emacs 20. The recent 21.x versions have the
MULE UTF-8 support more or less built in. It could be that this line
causes a lot of your trouble. (A good way to test the built-in
capabilities is to launch GNU Emacs with -Q, so that no site-specific or
user-specific initialisation files are used; on a 21.3 release the
equivalent is -q --no-site-file. And it might perform better ...)

> |
> | (add-to-list 'file-coding-system-alist
> |              '("\\.owl\\'" utf-8 . utf-8))

This obviously only affects .owl files.

> | ;; In GNU Emacs, when you specify the coding explicitly in the file, that
> | ;; overrides `file-coding-system-alist'. Not in XEmacs?
> |
> | ;; ;; default coding system (for new files)
> | (GNUEmacs
> |     (prefer-coding-system 'utf-8))

You might consider adding -dos, but it's more important that you
understand that this change will make a lot of your old files unusable.
In UTF-8 only the 7-bit ASCII range is encoded with one octet. All 8-bit
characters from the ISO Latin encodings are encoded with two octets (or
even three, as for the €). Your é is encoded as C3 A9 (the € as the
well-known E2 82 AC). If GNU Emacs only sees E9 (or A4 for the €), it
will make mistakes! If you switch to UTF-8, you would need to convert
all text files first, or preserve their old encodings by adding a header
line like this as the first line:

	 -*- mode: Text; coding: iso-8859-15; -*-

The mode part is not necessary (it could also be tex or latex), but
coding *is*. The other option is 'local variables' in the file's footer:

	%%% Local Variables:
	%%% coding: iso-8859-15
	%%% mode: tex
	%%% End:

and you might need to tell GNU Emacs that these local variables are
'safe'.
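To check the byte values quoted above, a small sketch for *scratch* in a recent GNU Emacs; my-bytes is a made-up helper name:

```elisp
(defun my-bytes (string coding)
  "Return the bytes of STRING encoded with CODING, as hex strings."
  (mapcar (lambda (byte) (format "%02X" byte))
          (append (encode-coding-string string coding) nil)))

(my-bytes "é" 'iso-latin-9)   ; => ("E9")
(my-bytes "é" 'utf-8)         ; => ("C3" "A9")
(my-bytes "€" 'iso-latin-9)   ; => ("A4")
(my-bytes "€" 'utf-8)         ; => ("E2" "82" "AC")

;; Decoding the Latin byte E9 (octal \351) with the right and the wrong
;; coding system: Latin-9 gives "présente", while UTF-8 cannot decode the
;; lone E9 and leaves it as a raw byte, displayed as \351.
(decode-coding-string "pr\351sente" 'iso-latin-9)
(decode-coding-string "pr\351sente" 'utf-8)
```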

> |
> | (GNUEmacs
> |     ;; to copy and paste outside Emacs
> |     (set-clipboard-coding-system 'iso-latin-9))  ;; aka iso-8859-15

This depends on the windowing system you use. Nowadays, I think, most of
them use UTF-8 ...

> |
> | ;; unify the Latin-N charsets, so that Emacs knows that the é in Latin-9
> | ;; (with the euro) is the same as the é in Latin-1 (without the euro)
> | ;; [avoid the small accentuated characters]
> | (when (try-require 'ucs-tables)
> |     (unify-8859-on-encoding-mode 1)  ;; harmless
> |     (unify-8859-on-decoding-mode 1)) ;; may unexpectedly change files if they
> |                                      ;; contain different Latin-N charsets
> |                                      ;; which should not be unified

I use these two in GNU Emacs 21.3.50 without the MULE/ucs clause ...

> |
> | (when window-system
> |   ;; functions for dealing with char tables
> |   (require 'disp-table))

This might have been useful in GNU Emacs 20 and earlier. I never used
it, except perhaps for european-display or the like. I also avoid
set-language-environment: it is close to obsolete, to put it politely.

> |
> | (XEmacs
> |     (require 'iso-syntax))

I'm not really an XEmacs user, but I think this is also something =20
from the past, 20th century or before.


Perhaps GNU Emacs 22.0.50 would serve you better. Both GNU Emacsen,
22.0.50 and 21.3, work better and set themselves up better internally
when they can read environment variables like LC_CTYPE, LANG, or LC_ALL
that tell them in which environment they are running. Then you only need
to specify the exceptions to this general rule. In my environment I have
LC_CTYPE=de_DE.UTF-8 ...
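A sketch for checking, with M-: or in *scratch*, which of these variables Emacs actually saw and what it derived from them (LC_ALL overrides LC_CTYPE, which overrides LANG):

```elisp
;; The locale variables as Emacs sees them (nil when unset).
(list (getenv "LC_ALL") (getenv "LC_CTYPE") (getenv "LANG"))

;; What Emacs derived from them at startup.
locale-coding-system          ; coding system for locale-dependent I/O
current-language-environment  ; a name such as "German" or "English"
```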

--
Greetings

   Pete

A morning without coffee is like something without something else.