* Re: Ediff problem with accents
From: Peter Dyballa @ 2006-09-22 10:42 UTC
Cc: GNU Emacs List
On 22.09.2006 at 11:20, Sébastien Vauban wrote:
> Hello Peter,
>
> Sorry for the long delay... but it was impossible for me to run
> the tests I wanted until now.
>
> Note -- You can copy this mail to gnu.emacs.help. I don't
> have news access from where I am now...
>
> FYI, I've sanitized my .emacs section about the coding systems
> (you'll see an extract beneath), and I've made a lot of
> comparisons.
>
> I still have the problem, but here follows a deeper insight on
> what I'm experiencing:
>
> o if (prefer-coding-system 'iso-latin-9),
> then I see the following when ediff'ing:
>
> ----------------------------------------
> | ^M               |                   |
> | pr\351sente ^M   | présente          |
> | ^M               |                   |
> |                  |                   |
> |------------------|-------------------|
> |-0:%%             |-0\--              | (modeline)
> ----------------------------------------
>   iso-latin-9?       iso-latin-9-dos
That's correct for the mode line: ISO 8859-15, also known as ISO
Latin-9, is the encoding in use. The left buffer is read-only, and
the right one is unchanged? The ``\´´ puzzles me, but it has been
such a long time since I used GNU Emacs on some MS Losedows that I
cannot remember. The ^M in the left buffer should not appear;
probably the encoding of the right buffer is the correct one, since
it also displays présente the right way.
>
>
> o if (prefer-coding-system 'utf-8),
> then I see the following when ediff'ing:
>
> ----------------------------------------
> | ^M               |                   |
> | pr\351sente ^M   | présente          |
> | ^M               |                   |
> |                  |                   |
> |------------------|-------------------|
> |-u:%%             |-1\--              |
> ----------------------------------------
>   utf-8?             iso-latin-1-dos
Again, the mode lines are right, and the left buffer needs to be
specified as utf-8-dos to make the ^M disappear and make pr\351sente
appear correctly. The prefer-coding-system function accepts
"extensions" like -dos, -mac, or -unix to specify the preferred
encoding exactly.
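
To illustrate the suffixes, here is a minimal sketch. The choice of
utf-8-dos is only an example, and coding-system-eol-type is used
merely to show what the suffix encodes; it is not part of
Sébastien's setup:

```elisp
;; Prefer UTF-8 with DOS (CR LF) line endings when Emacs has to guess.
(prefer-coding-system 'utf-8-dos)

;; The suffix selects the end-of-line convention.  The function below
;; returns 0 for unix (LF), 1 for dos (CR LF) and 2 for mac (CR).
(coding-system-eol-type 'utf-8-unix)
(coding-system-eol-type 'utf-8-dos)
(coding-system-eol-type 'utf-8-mac)
```

Each form can be evaluated with C-x C-e in the *scratch* buffer.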
>
>
> o if I don't set any preferred coding system (commented line),
> then I see the following when ediff'ing:
>
> ----------------------------------------
> | ^M               |                   |
> | pr\351sente ^M   | présente          |
> | ^M               |                   |
> |                  |                   |
> |------------------|-------------------|
> |-1\%%             |-1\--              |
> ----------------------------------------
>   iso-latin-1-dos?   iso-latin-1-dos
Here certainly the left buffer is not -dos – otherwise the ^M would
not appear there.
>
> To indicate the coding system under the window, I used
>
> M-x describe-coding-system RET RET
>
> but, for the base version, it states "not set locally, use the
> default"; that's why I wrote the default coding system for new
> files and put a question mark after it (because I'm not sure
> this is the correct way to do it).
There are no local settings in the file (see below), so some default
is assumed that the *Help* buffer should describe:
  Coding system for saving this buffer:
    0 -- iso-latin-9-unix
  Default coding system (for new files):
    u -- mule-utf-8-unix
  Coding system for keyboard input:
    nil
  Coding system for terminal output:
    u -- mule-utf-8 (alias: utf-8)
  Defaults for subprocess I/O:
    decoding: u -- mule-utf-8 (alias: utf-8)
    encoding: u -- mule-utf-8 (alias: utf-8)
  Priority order for recognizing coding systems when reading files:
    .
    .
    .
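
If the *Help* output leaves doubt about which buffer uses what, the
buffer-local variable buffer-file-coding-system can be inspected
directly. A minimal sketch; the function name my-show-buffer-coding
is made up for this example:

```elisp
(defun my-show-buffer-coding (&optional buffer)
  "Show the coding system BUFFER (default: current buffer) is saved with."
  (interactive)
  (with-current-buffer (or buffer (current-buffer))
    ;; `buffer-file-coding-system' is buffer-local; it includes the
    ;; end-of-line variant (-unix, -dos, -mac) once that is known.
    (message "%s: %s" (buffer-name) buffer-file-coding-system)))
```

Calling M-x my-show-buffer-coding in each of the two Ediff buffers
should show whether they really differ in encoding or line endings.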
The prefer-coding-system setting also affects your old files: they
can now be interpreted differently than when they were created and
saved. You could continue to stick with iso-latin-9-dos to have the €
and keep your old files unchanged. Every new (and old) file will have
some extra ^M bytes, but at least new and old ones will be treated
equally. (Conversion could be done on the command line (recode,
iconv) or, more time-consuming, with GNU Emacs: Options menu -> Mule ->
Set Coding Systems.)
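
For the GNU Emacs route, the conversion can also be scripted. This is
only a sketch under my own assumptions: the function name
my-recode-file and the file name in the usage comment are
hypothetical, and the file is overwritten in place, so keep a backup:

```elisp
(defun my-recode-file (file from to)
  "Rewrite FILE in place, decoding it as FROM and encoding it as TO.
FROM and TO are coding system symbols such as `iso-latin-9-dos'."
  (with-temp-buffer
    ;; Force the decoding instead of letting Emacs guess.
    (let ((coding-system-for-read from))
      (insert-file-contents file))
    ;; Force the encoding (and line-ending style) used for writing.
    (let ((coding-system-for-write to))
      (write-region (point-min) (point-max) file))))

;; Hypothetical usage: Latin-9 with CR LF to UTF-8 with LF.
;; (my-recode-file "~/old-notes.txt" 'iso-latin-9-dos 'utf-8-unix)
```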
>
> So, you can see that, whatever I do, I can't compare my buffers
> in a normal way... I'm completely lost...
Try: (prefer-coding-system 'iso-latin-9-dos).
You can also use several of these calls, each with a different
encoding. GNU Emacs will then try the encodings from this list first
and only afterwards look for another encoding.
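
A minimal sketch of such a sequence; the two encodings are just the
ones discussed in this thread. Each call puts its coding system at
the front of the priority list, so the last call names the first
choice:

```elisp
;; Evaluated in this order, iso-latin-9-dos ends up ahead of utf-8-dos.
(prefer-coding-system 'utf-8-dos)        ; tried second
(prefer-coding-system 'iso-latin-9-dos)  ; tried first
```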
>
> PS- As promised, an extract of my .emacs config file:
>
> ,----[ my Emacs Init File ]
> |
> | (message "26 International Character Set Support...")
> |
> | ;; default input method for multilingual text
> | (setq default-input-method "latin-9-prefix")
I do not use any input method: my keyboard composes é by pressing
the dead key ´ first and then the e. This also works for some other
accented characters. Actually, I think I have never used any Emacs
input method. 20 years ago I had DEC or Sun keyboards with a Compose
key; now the X server lets me enter other characters with alt or
shift-alt pressed ...
> |
> | ;; if you want to use UTF-8 on Emacs 21.3, install Mule-UCS
> | (GNUEmacs
> | (try-require 'un-define))
This was necessary with GNU Emacs 20. The recent versions 21.x have
MULE somehow built-in. Could be that this line causes a lot of your
trouble. (A good way to test the built-in capabilities is to launch
GNU Emacs with -Q: no site or user specific initialisation files are
used. And it might perform better ...)
> |
> | (add-to-list 'file-coding-system-alist
> | '("\\.owl\\'" utf-8 . utf-8))
This obviously only affects .owl files.
> | ;; In GNU Emacs, when you specify the coding explicitly in the file,
> | ;; that overrides `file-coding-system-alist'. Not in XEmacs?
> |
> | ;; ;; default coding system (for new files)
> | (GNUEmacs
> | (prefer-coding-system 'utf-8))
You might consider adding -dos, but it's more important that you
understand that this change will make a lot of your old files
unusable. In UTF-8 only the 7-bit ASCII range is encoded as one
octet. All 8-bit characters from the ISO Latin encodings are encoded
as two octets (or even three, for example the €). Your é is encoded
as C3 A9 (€ as the well-known E2 82 AC). If GNU Emacs only sees E9
(or A4 for €), it will make mistakes! If you switch to UTF-8 you
would need to convert all text files first, or declare their old
encoding by adding a header line like this as the first line:
-*- mode: Text; coding: iso-8859-15; -*-
The mode part is not necessary (could also be tex or latex), but
coding *is*. The other option is 'local variables' in the file's footer:
%%% Local Variables:
%%% coding: iso-8859-15
%%% mode: tex
%%% End:
You might also need to teach GNU Emacs that these local variables are 'safe'.
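
The octet values mentioned above can be checked from within Emacs. A
minimal sketch, assuming a reasonably recent GNU Emacs and that this
snippet itself is read in the right encoding; the helper name
my-hex-octets is made up for this example:

```elisp
(defun my-hex-octets (string coding-system)
  "Return the octets of STRING encoded in CODING-SYSTEM, as hex text."
  (mapconcat (lambda (octet) (format "%02X" octet))
             (encode-coding-string string coding-system)
             " "))

(my-hex-octets "é" 'iso-latin-1)  ; "E9"
(my-hex-octets "é" 'utf-8)        ; "C3 A9"
(my-hex-octets "€" 'iso-latin-9)  ; "A4"
(my-hex-octets "€" 'utf-8)        ; "E2 82 AC"
```

A file holding a lone E9 is therefore valid Latin-1 or Latin-9, but
not valid UTF-8.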
> |
> | (GNUEmacs
> | ;; to copy and paste outside Emacs
> | (set-clipboard-coding-system 'iso-latin-9)) ;; aka iso-8859-15
This depends on the windowing system you use. Now, I think, most will
use UTF-8 ...
> |
> | ;; unify the Latin-N charsets, so that Emacs knows that the é in Latin-9
> | ;; (with the euro) is the same as the é in Latin-1 (without the euro)
> | ;; [avoid the small accentuated characters]
> | (when (try-require 'ucs-tables)
> |   (unify-8859-on-encoding-mode 1)  ;; harmless
> |   (unify-8859-on-decoding-mode 1)) ;; may unexpectedly change files if they
> |                                    ;; contain different Latin-N charsets
> |                                    ;; which should not be unified
I use these two in GNU Emacs 21.3.50 without the MULE/ucs clause ...
> |
> | (when window-system
> | ;; functions for dealing with char tables
> | (require 'disp-table))
This might have been useful in GNU Emacs 20 and before. I never used
it, except for european-display or such, maybe. And I also avoid
set-language-environment: this is close to obsolete, to put it politely.
> |
> | (XEmacs
> | (require 'iso-syntax))
I'm not really an XEmacs user, but I think this is also something
from the past, 20th century or before.

It could be that GNU Emacs 22.0.50 serves you better. Both GNU
Emacsen, 22.0.50 and 21.3, work better and configure themselves
better internally when they can read environment variables like
LC_CTYPE, LANG, or LC_ALL that explain in which environment they are
running. Then you only need to specify exceptions from this general
rule. I have in my environment LC_CTYPE=de_DE.UTF-8 ...
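
To see what a running Emacs picked up from the environment, something
like the following sketch can be evaluated in *scratch*. It only
reads values and changes nothing:

```elisp
;; What the environment tells Emacs (nil means the variable is unset).
(mapcar (lambda (var) (cons var (getenv var)))
        '("LC_ALL" "LC_CTYPE" "LANG"))

;; What Emacs derived from it at startup.
(list locale-coding-system current-language-environment)
```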
--
Greetings
Pete
A morning without coffee is like something without something else.