From: Peter Dyballa
Newsgroups: gmane.emacs.help
Subject: Re: Ediff problem with accents
Date: Fri, 22 Sep 2006 12:42:39 +0200
Message-ID: <0DC5E148-FB14-4A62-B3BC-CD18F2EBF020@Web.DE>
References: <87lkoosu5e.fsf@mundaneum.mygooglest.com> <61CD86F9-90F0-45D9-888B-D344320541A8@Web.DE> <873bak2r61.fsf_-_@mundaneum.mygooglest.com>
Cc: GNU Emacs List
To: Sébastien Vauban

On 22.09.2006 at 11:20, Sébastien Vauban wrote:

> Hello Peter,
>
> Sorry for the long delay... but it was impossible for me to make
> the wished tests until now.
>
> Note -- You can copy this mail to gnus.emacs.help. I don't
> have news access from where I am now...
>
> FYI, I've sanitized my .emacs section about the coding systems
> (you'll see an extract beneath), and I've made a lot of
> comparisons.
>
> I still have the problem, but here follows a deeper insight on
> what I'm experiencing:
>
>   o if (prefer-coding-system 'iso-latin-9),
>     then I see the following when ediff'ing:
>
>     ----------------------------------------
>     | ^M               |                   |
>     | pr\351sente ^M   | présente          |
>     | ^M               |                   |
>     |                  |                   |
>     |------------------|-------------------|
>     |-0:%%             |-0\--              |   (modeline)
>     ----------------------------------------
>       iso-latin-9?       iso-latin-9-dos

That's correct for the mode line: ISO 8859-15, a.k.a. ISO Latin-9,
encoding is used. The left buffer is read-only, and the right one is
unmodified? The "\" puzzles me, but it has been so long since I used
GNU Emacs on some MS Losedows that I cannot remember. The ^M in the
left buffer should not appear. Probably the right-hand buffer's
encoding is the correct one, since it also displays "présente"
properly.

>
>   o if (prefer-coding-system 'utf-8),
>     then I see the following when ediff'ing:
>
>     ----------------------------------------
>     | ^M               |                   |
>     | pr\351sente ^M   | présente          |
>     | ^M               |                   |
>     |                  |                   |
>     |------------------|-------------------|
>     |-u:%%             |-1\--              |
>     ----------------------------------------
>       utf-8?             iso-latin-1-dos

Again, the mode lines are right, and the left buffer needs to be
specified as utf-8-dos to make the ^M disappear and make pr\351sente
appear correctly. The prefer-coding-system function accepts the
end-of-line "extensions" -dos, -mac, and -unix, so you can specify
the preferred encoding exactly.

>
>   o if I don't set any preferred coding system (commented line),
>     then I see the following when ediff'ing:
>
>     ----------------------------------------
>     | ^M               |                   |
>     | pr\351sente ^M   | présente          |
>     | ^M               |                   |
>     |                  |                   |
>     |------------------|-------------------|
>     |-1\%%             |-1\--              |
>     ----------------------------------------
>       iso-latin-1-dos?   iso-latin-1-dos

Here the left buffer is certainly not -dos; otherwise the ^M would
not appear there.

>
> To indicate the coding system under the window, I used
>
>   M-x describe-coding-system RET RET
>
> but, for the base version, it states "not set locally, use the
> default"; that's why I wrote the default coding system for new
> files and put a interrogation mark after (because I'm not sure
> this is the correct way to do).
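Regarding the describe-coding-system question: you can also query directly which coding system a buffer actually uses, together with its end-of-line convention. What follows is a minimal sketch that assumes a reasonably recent GNU Emacs. The name `my-describe-eol` is a hypothetical helper. `buffer-file-coding-system` and `coding-system-eol-type` are standard Emacs Lisp. Run it with M-x my-describe-eol from within each of the compared buffers:

```elisp
;; Hypothetical helper (not part of Emacs): report the file coding
;; system of a buffer and the end-of-line convention it implies.
(defun my-describe-eol (&optional buffer)
  "Show the file coding system of BUFFER and its end-of-line type.
BUFFER defaults to the current buffer.  Returns the message string."
  (interactive)
  (with-current-buffer (or buffer (current-buffer))
    (let* ((cs buffer-file-coding-system)
           ;; `coding-system-eol-type' returns 0 (unix), 1 (dos) or
           ;; 2 (mac); a vector means the type is detected on reading.
           (eol (coding-system-eol-type cs)))
      (message "%s: %s" cs
               (cond ((eq eol 0) "unix (LF only, a CR shows as ^M)")
                     ((eq eol 1) "dos (CRLF)")
                     ((eq eol 2) "mac (CR)")
                     (t "end-of-line type not yet decided"))))))
```

If the left buffer reports a unix end-of-line type, that would be consistent with the ^M characters shown in it, because a -unix coding system keeps carriage returns as ordinary characters.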
There are no local settings in the file (see below), so some default
is assumed, which the *Help* buffer should describe:

  Coding system for saving this buffer:
    0 -- iso-latin-9-unix
  Default coding system (for new files):
    u -- mule-utf-8-unix
  Coding system for keyboard input:
    nil
  Coding system for terminal output:
    u -- mule-utf-8 (alias: utf-8)
  Defaults for subprocess I/O:
    decoding: u -- mule-utf-8 (alias: utf-8)
    encoding: u -- mule-utf-8 (alias: utf-8)

  Priority order for recognizing coding systems when reading files:
  . . .

The prefer-coding-system setting also affects your old files: they
can now be interpreted differently from the way they were when they
were created and saved. You could stick with iso-latin-9-dos to keep
the € and leave your old files unchanged. Every new (and old) file
will have some extra ^M bytes, but at least new and old files will be
treated the same. (Conversion can be done on the command line with
recode or iconv, or, more laboriously, in GNU Emacs via Options menu
-> Mule -> Set Coding Systems.)

>
> So, you can see that, whatever I do, I can't compare my buffers
> in a normal way... I'm completely lost...

Try:

  (prefer-coding-system 'iso-latin-9-dos)

You can also make several of these calls, each with a different
encoding. GNU Emacs will then choose from this list first (the last
call gets the highest priority) before trying other encodings.

>
> PS- As promised, an extract of my .emacs config file:
>
> ,----[ my Emacs Init File ]
> |
> | (message "26 International Character Set Support...")
> |
> | ;; default input method for multilingual text
> | (setq default-input-method "latin-9-prefix")

I do not use any input method: my keyboard produces é when I press
the dead key ´ first and then e. This works for some other accented
characters too. Actually, I think I have never used any Emacs input
method.
Twenty years ago I had DEC or Sun keyboards with a Compose key; now
the X server lets me type other characters with Alt or Shift-Alt
held down ...

> |
> | ;; if you want to use UTF-8 on Emacs 21.3, install Mule-UCS
> | (GNUEmacs
> |   (try-require 'un-define))

This was necessary with GNU Emacs 20; the recent 21.x versions have
MULE more or less built in. This line could be the cause of a lot of
your trouble. (A good way to test the built-in capabilities is to
launch GNU Emacs with -Q, so that no site- or user-specific
initialisation files are loaded. And it might perform better ...)

> |
> | (add-to-list 'file-coding-system-alist
> |              '("\\.owl\\'" utf-8 . utf-8))

This obviously only affects .owl files.

> | ;; In GNU Emacs, when you specify the coding explicitly in the file, that
> | ;; overrides `file-coding-system-alist'. Not in XEmacs?
> |
> | ;; ;; default coding system (for new files)
> | (GNUEmacs
> |   (prefer-coding-system 'utf-8))

You might consider adding -dos, but it is more important to
understand that this change will make a lot of your old files
unusable. In UTF-8 only the 7-bit ASCII range is encoded in a single
octet. All 8-bit characters from the ISO Latin encodings are encoded
in two octets (or even three, for example the €). Your é is encoded
as C3 A9 (and € as the well-known E2 82 AC). If GNU Emacs only sees
E9 (or A4 for €), it will make mistakes! If you switch to UTF-8, you
need to convert all text files first, or preserve their old encodings
by adding a header line like this as the first line:

  -*- mode: Text; coding: iso-8859-15; -*-

The mode part is not necessary (it could also be tex or latex), but
coding *is*. The other option is a 'local variables' block in the
file's footer:

  %%% Local Variables:
  %%% coding: iso-8859-15
  %%% mode: tex
  %%% End:

You might also need to tell GNU Emacs that these local variables are
'safe'.
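You can check the byte sequences mentioned above from within Emacs itself. What follows is a minimal sketch that assumes a current GNU Emacs (it has not been checked against 21.3). The name `my-show-bytes` is a hypothetical helper. `encode-coding-string` and `mapconcat` are standard Emacs Lisp:

```elisp
;; Hypothetical helper (not part of Emacs): show the octets that
;; STRING becomes when it is encoded with coding system CODING.
(defun my-show-bytes (string coding)
  "Return the bytes of STRING encoded with CODING, as hex pairs."
  ;; `encode-coding-string' returns a unibyte string, so each of
  ;; its elements is a raw byte value between 0 and 255.
  (mapconcat (lambda (byte) (format "%02X" byte))
             (encode-coding-string string coding)
             " "))

(my-show-bytes "é" 'iso-latin-9)   ; => "E9"
(my-show-bytes "é" 'utf-8)         ; => "C3 A9"
(my-show-bytes "€" 'iso-latin-9)   ; => "A4"
(my-show-bytes "€" 'utf-8)         ; => "E2 82 AC"
```

Decoding works the other way round. A lone E9 byte is not a valid UTF-8 sequence, so a Latin-9 file read as utf-8 keeps it as a raw byte, which Emacs displays in octal as \351. That matches the pr\351sente seen in the utf-8 case above.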
> |
> | (GNUEmacs
> |   ;; to copy and paste outside Emacs
> |   (set-clipboard-coding-system 'iso-latin-9)) ;; aka iso-8859-15

This depends on the windowing system you use. Nowadays, I think,
most of them use UTF-8 ...

> |
> | ;; unify the Latin-N charsets, so that Emacs knows that the é in Latin-9
> | ;; (with the euro) is the same as the é in Latin-1 (without the euro)
> | ;; [avoid the small accentuated characters]
> | (when (try-require 'ucs-tables)
> |   (unify-8859-on-encoding-mode 1) ;; harmless
> |   (unify-8859-on-decoding-mode 1)) ;; may unexpectedly change files if they
> |                                    ;; contain different Latin-N charsets
> |                                    ;; which should not be unified

I use these two in GNU Emacs 21.3.50 without the MULE/ucs clause ...

> |
> | (when window-system
> |   ;; functions for dealing with char tables
> |   (require 'disp-table))

This might have been useful in GNU Emacs 20 and before. I never used
it, except maybe for european-display or similar. I also avoid
set-language-environment: it is close to obsolete, to put it
politely.

> |
> | (XEmacs
> |   (require 'iso-syntax))

I'm not really an XEmacs user, but I think this is also something
from the past, the 20th century or before.

GNU Emacs 22.0.50 might serve you better. Both GNU Emacsen, 22.0.50
and 21.3, work better and configure themselves better internally
when they can read environment variables like LC_CTYPE, LANG, or
LC_ALL that describe the environment they are running in. Then you
only need to specify exceptions to this general rule. I have
LC_CTYPE=de_DE.UTF-8 in my environment ...

-- 
Greetings

  Pete

A morning without coffee is like something without something else.