* Ediff problem with accents
@ 2006-09-12 20:32 Sébastien Vauban
From: Sébastien Vauban @ 2006-09-12 20:32 UTC
Hi,
Since a recent new PC installation, I've had a weird problem
whose cause I can't work out. It concerns the behavior of
ediff.
I've always used ediff without any problem. That great tool
has always helped me re-read the changes I've made before
committing them with a sensible log comment.
Now I can't really use it anymore, as it sees every accent as
a difference between the source and the modified file...
Here's an example:
,----[ Source buffer ]
| Ce document présente le détail des modifications apportées...
| Avant.
|
|----[ Modified buffer ]
| Ce document présente le détail des modifications apportées...
| Maintenant.
`----
Needless to say, it becomes very difficult to distinguish
the real modifications (`Avant' -> `Maintenant') from the "false
positives" (`présente' -> `présente').
Note that this problem only occurs with ediff, not at all with
diff (which only spots the real modifications).
Any help?
Thank you very much,
Seb
--
Sébastien Vauban
* Re: Ediff problem with accents
From: Peter Dyballa @ 2006-09-12 21:36 UTC
Cc: help-gnu-emacs
On 12.09.2006 at 20:32, Sébastien Vauban wrote:
> ,----[ Source buffer ]
> | Ce document présente le détail des modifications apportées...
> | Avant.
> |
> |----[ Modified buffer ]
> | Ce document présente le détail des modifications apportées...
> | Maintenant.
> `----
The upper buffer's content is evidently displayed in an ISO Latin
encoding, presumably ISO 8859-1 or ISO 8859-15; the lower one shows
UTF-8 content in that same ISO Latin encoding (in UTF-8, é is
C3 A9, which reads as Ã© in ISO Latin). But it seems more likely
that (almost) the same UTF-8 content is displayed once as UTF-8
(correct) and once as ISO Latin (incorrect).
To make both buffers appear (in) the same (encoding) you should put
into your .emacs file:
(prefer-coding-system 'utf-8-unix)
or set environment variables like LC_CTYPE or LANG to a value with
UTF-8 in it, like mine: de_DE.UTF-8. This setting will make GNU Emacs
use UTF-8 automatically.
At least it works for my buffers with UTF-8 content in ediff ...
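
The mismatch described above can be reproduced with a small sketch.
It is an illustration only, not code from this thread, and it assumes
a reasonably recent GNU Emacs where the expression is evaluated as
ordinary multibyte text, for example in the *scratch* buffer:

```elisp
;; Encode the word as UTF-8, then (wrongly) decode those octets as
;; ISO Latin-1, which is what a buffer with the wrong coding system does.
(decode-coding-string
 (encode-coding-string "présente" 'utf-8)
 'iso-latin-1)
;; => "prÃ©sente"
```

Each é becomes the two characters Ã©, so a character-wise comparison
flags every accented word as changed.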
--
Greetings
Pete
"Let's face it; we don't want a free market economy either."
James Farley, president, Coca-Cola Export Corp., 1959
* Re: Ediff problem with accents
From: Sébastien Vauban @ 2006-09-22 10:14 UTC
Hello Peter,
Sorry for the long delay... but I was unable to run the tests I
wanted until now.
FYI, I've cleaned up the section of my .emacs about coding systems
(an extract is below), and I've made a lot of
comparisons.
I still have the problem, but here is a closer look at
what I'm experiencing:
o if (prefer-coding-system 'iso-latin-9),
then I see the following when ediff'ing:
----------------------------------------
| ^M | |
| pr\351sente ^M | présente |
| ^M | |
| | |
|------------------|-------------------|
|-0:%% |-0\-- | (modeline)
----------------------------------------
iso-latin-9? iso-latin-9-dos
o if (prefer-coding-system 'utf-8),
then I see the following when ediff'ing:
----------------------------------------
| ^M | |
| pr\351sente ^M | présente |
| ^M | |
| | |
|------------------|-------------------|
|-u:%% |-1\-- |
----------------------------------------
utf-8? iso-latin-1-dos
o if I don't set any preferred coding system (commented line),
then I see the following when ediff'ing:
----------------------------------------
| ^M | |
| pr\351sente ^M | présente |
| ^M | |
| | |
|------------------|-------------------|
|-1\%% |-1\-- |
----------------------------------------
iso-latin-1-dos? iso-latin-1-dos
To find the coding system shown under each window, I used
M-x describe-coding-system RET RET
but, for the base version, it states "not set locally, use the
default"; that's why I wrote down the default coding system for new
files and put a question mark after it (because I'm not sure
this is the right way to do it).
So, you can see that, whatever I do, I can't compare my buffers
in a normal way... I'm completely lost...
THANK YOU very much for any help you can give me,
Seb
PS- As promised, an extract of my .emacs config file:
,----[ my Emacs Init File ]
|
| (message "26 International Character Set Support...")
|
| ;; default input method for multilingual text
| (setq default-input-method "latin-9-prefix")
|
| ;; if you want to use UTF-8 on Emacs 21.3, install Mule-UCS
| (GNUEmacs
| (try-require 'un-define))
|
| (add-to-list 'file-coding-system-alist
| '("\\.owl\\'" utf-8 . utf-8))
| ;; In GNU Emacs, when you specify the coding explicitly in the file, that
| ;; overrides `file-coding-system-alist'. Not in XEmacs?
|
| ;; ;; default coding system (for new files)
| (GNUEmacs
| (prefer-coding-system 'utf-8))
|
| (GNUEmacs
| ;; to copy and paste outside Emacs
| (set-clipboard-coding-system 'iso-latin-9)) ;; aka iso-8859-15
|
| ;; unify the Latin-N charsets, so that Emacs knows that the é in Latin-9
| ;; (with the euro) is the same as the é in Latin-1 (without the euro)
| ;; [avoid the small accented characters]
| (when (try-require 'ucs-tables)
| (unify-8859-on-encoding-mode 1) ;; harmless
| (unify-8859-on-decoding-mode 1)) ;; may unexpectedly change files if they
| ;; contain different Latin-N charsets
| ;; which should not be unified
|
| (when window-system
| ;; functions for dealing with char tables
| (require 'disp-table))
|
| (XEmacs
| (require 'iso-syntax))
|
| (message "26 International Character Set Support... Done")
`----
--
Sébastien Vauban
* Re: Ediff problem with accents
From: Peter Dyballa @ 2006-09-22 10:42 UTC
Cc: GNU Emacs List
On 22.09.2006 at 11:20, Sébastien Vauban wrote:
> Hello Peter,
>
> Sorry for the long delay... but it was impossible for me to make
> the wished tests until now.
>
> Note -- You can copy this mail to gnus.emacs.help. I don't
> have news access from where I am now...
>
> FYI, I've sanitized my .emacs section about the coding systems
> (you'll see an extract beneath), and I've made a lot of
> comparisons.
>
> I still have the problem, but here follows a deeper insight on
> what I'm experiencing:
>
> o if (prefer-coding-system 'iso-latin-9),
> then I see the following when ediff'ing:
>
> ----------------------------------------
> | ^M | |
> | pr\351sente ^M | présente |
> | ^M | |
> | | |
> |------------------|-------------------|
> |-0:%% |-0\-- | (modeline)
> ----------------------------------------
> iso-latin-9? iso-latin-9-dos
That's correct for the mode line: the ISO 8859-15 (ISO Latin-9)
encoding is used. The left buffer is read-only, and the right one is
unmodified? The ``\´´ puzzles me, but it has been so long since I
used GNU Emacs on some MS Losedows that I cannot remember. The ^M in
the left buffer should not appear; probably the correct encoding is
the one used by the right buffer, which also displays présente
correctly.
>
>
> o if (prefer-coding-system 'utf-8),
> then I see the following when ediff'ing:
>
> ----------------------------------------
> | ^M | |
> | pr\351sente ^M | présente |
> | ^M | |
> | | |
> |------------------|-------------------|
> |-u:%% |-1\-- |
> ----------------------------------------
> utf-8? iso-latin-1-dos
Again, the mode lines are right, and the left buffer needs to be
specified as utf-8-dos to make the ^M disappear and make pr\351sente
appear correctly. The prefer-coding-system function accepts
"extensions" like -dos, -mac, -unix to specify the preferred
encoding exactly.
>
>
> o if I don't set any preferred coding system (commented line),
> then I see the following when ediff'ing:
>
> ----------------------------------------
> | ^M | |
> | pr\351sente ^M | présente |
> | ^M | |
> | | |
> |------------------|-------------------|
> |-1\%% |-1\-- |
> ----------------------------------------
> iso-latin-1-dos? iso-latin-1-dos
Here the left buffer is certainly not -dos; otherwise the ^M would
not appear there.
>
> To indicate the coding system under the window, I used
>
> M-x describe-coding-system RET RET
>
> but, for the base version, it states "not set locally, use the
> default"; that's why I wrote the default coding system for new
> files and put a interrogation mark after (because I'm not sure
> this is the correct way to do).
There are no local settings in the file (see below), so a default
is assumed, which the *Help* buffer should describe:
  Coding system for saving this buffer:
    0 -- iso-latin-9-unix
  Default coding system (for new files):
    u -- mule-utf-8-unix
  Coding system for keyboard input:
    nil
  Coding system for terminal output:
    u -- mule-utf-8 (alias: utf-8)
  Defaults for subprocess I/O:
    decoding: u -- mule-utf-8 (alias: utf-8)
    encoding: u -- mule-utf-8 (alias: utf-8)
  Priority order for recognizing coding systems when reading files:
    .
    .
    .
The prefer-coding-system setting also affects your old files: they
can now be interpreted differently than when they were created and
saved. You could continue to stick with iso-latin-9-dos to have the €
and keep your old files unchanged. Every new (and old) file will have
some extra ^M bytes, but at least new and old ones will be treated
equally. (Conversion could be done on the command line (recode,
iconv) or, more time-consumingly, with GNU Emacs: Options menu ->
Mule -> Set Coding Systems.)
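
As a rough sketch of the conversion inside Emacs: the helper below is
hypothetical (the name my-convert-latin9-to-utf-8 and both arguments
are made up for illustration, not part of Emacs or of this thread).
It assumes the source file really is ISO Latin-9 with DOS line ends,
and writes a separate target file so the original stays untouched:

```elisp
;; Hypothetical helper; the name and arguments are illustrative only.
(defun my-convert-latin9-to-utf-8 (source target)
  "Read SOURCE as ISO Latin-9 with DOS line ends; write TARGET as UTF-8."
  (with-temp-buffer
    ;; Force the decoding instead of letting Emacs guess it.
    (let ((coding-system-for-read 'iso-latin-9-dos))
      (insert-file-contents source))
    ;; Force the encoding, with Unix line ends, when writing.
    (let ((coding-system-for-write 'utf-8-unix))
      (write-region (point-min) (point-max) target))))
```

If the assumed source encoding is wrong, the result will be garbled,
so it is worth checking one converted file before converting many.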
>
> So, you can see that, whatever I do, I can't compare my buffers
> in a normal way... I'm completely lost...
Try: (prefer-coding-system 'iso-latin-9-dos).
You can also use several of these calls, each with a different
encoding. GNU Emacs will then choose from this list first and only
afterwards try to find another encoding.
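
A minimal sketch of such a list for an init file follows. The choice
and order of encodings are only an assumption for illustration. Each
call puts its coding system at the front of the detection priority,
so the call made last is tried first:

```elisp
;; Illustrative init-file fragment; adapt the encodings to your files.
(prefer-coding-system 'utf-8)            ; fallback candidate
(prefer-coding-system 'iso-latin-9-dos)  ; called last, so tried first
```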
>
> PS- As promised, an extract of my .emacs config file:
>
> ,----[ my Emacs Init File ]
> |
> | (message "26 International Character Set Support...")
> |
> | ;; default input method for multilingual text
> | (setq default-input-method "latin-9-prefix")
I do not use any input method: my keyboard composes é when I press
the dead key ´ first and then the e. This also works for some
other accented characters. Actually, I think I have never used any
Emacs input method. 20 years ago I had DEC or Sun keyboards with a
Compose key; now the X server lets me enter other characters with alt
or shift-alt pressed ...
> |
> | ;; if you want to use UTF-8 on Emacs 21.3, install Mule-UCS
> | (GNUEmacs
> | (try-require 'un-define))
This was necessary with GNU Emacs 20. The recent 21.x versions have
MULE more or less built in. It could be that this line causes a lot
of your trouble. (A good way to test the built-in capabilities is to
launch GNU Emacs with -Q: no site- or user-specific initialisation
files are used. And it might perform better ...)
> |
> | (add-to-list 'file-coding-system-alist
> | '("\\.owl\\'" utf-8 . utf-8))
This obviously only affects .owl files.
> | ;; In GNU Emacs, when you specify the coding explicitly in the file, that
> | ;; overrides `file-coding-system-alist'. Not in XEmacs?
> |
> | ;; ;; default coding system (for new files)
> | (GNUEmacs
> | (prefer-coding-system 'utf-8))
You might consider adding -dos, but it's more important that you
understand that this change will make a lot of your old files
unusable. In UTF-8 only the 7-bit ASCII range is encoded in one
octet. All 8-bit characters from the ISO Latin encodings are encoded
in two octets (or even three, for example the €). Your é is encoded
as C3 A9 (€ as the well-known E2 82 AC). If GNU Emacs only sees E9
(or A4 for €), it will make mistakes! If you switch to UTF-8 you
would need to convert all text files first, or preserve their old
encoding by adding a header like this as the first line:
-*- mode: Text; coding: iso-8859-15; -*-
The mode part is not necessary (it could also be tex or latex), but
coding *is*. The other option is 'local variables' in the file's footer:
%%% Local Variables:
%%% coding: iso-8859-15
%%% mode: tex
%%% End:
You might also need to teach GNU Emacs that these local variables are
'safe'.
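
The octet values quoted above can be checked with a short sketch. The
helper name my-octets is hypothetical (it is not an Emacs function),
and the results shown assume a reasonably recent GNU Emacs:

```elisp
;; Hypothetical helper: octets of STRING in CODING-SYSTEM, as hex strings.
(defun my-octets (string coding-system)
  "Return the octets of STRING encoded in CODING-SYSTEM as hex strings."
  (mapcar (lambda (octet) (format "%02X" octet))
          (encode-coding-string string coding-system)))

(my-octets "é" 'iso-8859-15) ; => ("E9")
(my-octets "é" 'utf-8)       ; => ("C3" "A9")
(my-octets "€" 'iso-8859-15) ; => ("A4")
(my-octets "€" 'utf-8)       ; => ("E2" "82" "AC")
(format "%o" #xE9)           ; => "351", the octal escape in pr\351sente
```

The last line shows why a lone E9 octet turns up as \351: it is the
same value written in octal, displayed raw because it is not valid
UTF-8 on its own.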
> |
> | (GNUEmacs
> | ;; to copy and paste outside Emacs
> | (set-clipboard-coding-system 'iso-latin-9)) ;; aka iso-8859-15
This depends on the windowing system you use. Now, I think, most will
use UTF-8 ...
> |
> | ;; unify the Latin-N charsets, so that Emacs knows that the é in Latin-9
> | ;; (with the euro) is the same as the é in Latin-1 (without the euro)
> | ;; [avoid the small accentuated characters]
> | (when (try-require 'ucs-tables)
> | (unify-8859-on-encoding-mode 1) ;; harmless
> | (unify-8859-on-decoding-mode 1)) ;; may unexpectedly change files if they
> | ;; contain different Latin-N charsets
> | ;; which should not be unified
I use these two in GNU Emacs 21.3.50 without the MULE/ucs clause ...
> |
> | (when window-system
> | ;; functions for dealing with char tables
> | (require 'disp-table))
This might have been useful in GNU Emacs 20 and before. I never used
it, except maybe for european-display or the like. And I also avoid
set-language-environment: this is close to obsolete, to put it
politely.
> |
> | (XEmacs
> | (require 'iso-syntax))
I'm not really an XEmacs user, but I think this is also something
from the past, 20th century or before.
It could be that GNU Emacs 22.0.50 would serve you better. Both GNU
Emacsen, 22.0.50 and 21.3, work better and configure themselves
better internally when they can read environment variables like
LC_CTYPE, LANG, or LC_ALL that tell them which environment they are
running in. Then you only need to specify exceptions from this
general rule. I have in my environment LC_CTYPE=de_DE.UTF-8 ...
--
Greetings
Pete
A morning without coffee is like something without something else.