unofficial mirror of help-gnu-emacs@gnu.org
* garbage chars when pasting French chars into emacs
@ 2012-02-01 20:41 ken
  2012-02-01 21:23 ` Eli Zaretskii
  2012-02-01 21:29 ` garbage chars when pasting French chars into emacs Philipp Haselwarter
  0 siblings, 2 replies; 9+ messages in thread
From: ken @ 2012-02-01 20:41 UTC (permalink / raw)
  To: GNU Emacs List

Just to be comprehensive I'll state at the outset that I'm using Linux 
(CentOS 5.7), so this is the environment emacs is working in.  From a 
shell I get this:

$ set|grep -i lang
LANG=en_US.UTF-8

Now I pull up a webpage with some French on it: 
<http://www.wikilivres.info/wiki/Maurice_Merleau-Ponty>.  Examining the 
source code of this page, I see at the top:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

So this page is presented in UTF-8.

Firefox is also set to present pages in UTF-8: View -> Character 
Encoding -> UTF-8

But when I copy and paste the text from "Francais" to "invisible, 1964)" 
inclusive, many of the characters aren't rendered correctly; I get 
"garbage" characters in their stead, e.g., the second-to-last line 
appears something like this:

     * L^[$(B!G^[$(C)+^[(Bil et l^[$(B!G^[(Besprit, Gallimard, 1960

Other lines are improperly rendered also.

I'd like to fix this.  And if possible understand why this doesn't work, 
so I might be able to diagnose these problems for myself.

BTW, I'm using GNU Emacs 21.4.1 (i686-redhat-linux-gnu, X toolkit, Xaw3d 
scroll bars) of 2011-04-28 on builder10.centos.org

Yes, it's an older version, but it's the latest from the CentOS 5.7 
distribution.  (Blame Red Hat.)


Thanks for your help.




* Re: garbage chars when pasting French chars into emacs
  2012-02-01 20:41 garbage chars when pasting French chars into emacs ken
@ 2012-02-01 21:23 ` Eli Zaretskii
  2012-02-02  2:39   ` ken
  2012-02-01 21:29 ` garbage chars when pasting French chars into emacs Philipp Haselwarter
  1 sibling, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2012-02-01 21:23 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Wed, 01 Feb 2012 15:41:42 -0500
> From: ken <gebser@mousecar.com>
> 
> But when I copy and paste the text from "Francais" to "invisible, 1964)" 
> inclusive, many of the characters aren't rendered correctly; I get 
> "garbage" characters in their stead, e.g., the second-to-last line 
> appears something like this:
> 
>      * L^[$(B!G^[$(C)+^[(Bil et l^[$(B!G^[(Besprit, Gallimard, 1960
> 
> Other lines are improperly rendered also.
> 
> I'd like to fix this.  And if possible understand why this doesn't work, 
> so I might be able to diagnose these problems for myself.

What is your value of selection-coding-system?  Try setting it to
something like ctext-with-extensions.
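
A minimal sketch of checking and changing that value, assuming an X
session; evaluate each form with C-x C-e in *scratch*.  The coding
system name is the one suggested above; whether it helps depends on
what the other application offers on the selection.

```elisp
;; Show the current value in the echo area
;; (C-h v selection-coding-system gives the same information).
(message "selection-coding-system: %S" selection-coding-system)

;; Decode X selections as compound text with extended segments.
(set-selection-coding-system 'ctext-with-extensions)
```

The setting lasts only for the current session unless it is also put
in the init file.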





* Re: garbage chars when pasting French chars into emacs
  2012-02-01 20:41 garbage chars when pasting French chars into emacs ken
  2012-02-01 21:23 ` Eli Zaretskii
@ 2012-02-01 21:29 ` Philipp Haselwarter
  1 sibling, 0 replies; 9+ messages in thread
From: Philipp Haselwarter @ 2012-02-01 21:29 UTC (permalink / raw)
  To: help-gnu-emacs

Works perfectly for me (GNU Emacs 24).

Do you use emacs in a terminal or in graphic mode?
Is building a newer emacs an option? If it's a single user workstation
or if you're the only emacs user it shouldn't be hard.

-- 
Philipp Haselwarter





* Re: garbage chars when pasting French chars into emacs
  2012-02-01 21:23 ` Eli Zaretskii
@ 2012-02-02  2:39   ` ken
  2012-02-02  3:55     ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: ken @ 2012-02-02  2:39 UTC (permalink / raw)
  To: GNU Emacs List


On 02/01/2012 04:23 PM Eli Zaretskii wrote:
> What is your value of selection-coding-system?  Try setting it to
> something like ctext-with-extensions.

Thanks, Eli,

Immediately prior to doing the copy-and-paste I ran all of these:

(set-language-environment               'UTF-8)
(set-default-coding-systems             'utf-8)
(setq file-name-coding-system           'utf-8)
(setq default-buffer-file-coding-system 'utf-8)
(setq coding-system-for-write           'utf-8)
(set-keyboard-coding-system             'utf-8)
(set-terminal-coding-system             'utf-8)
(set-clipboard-coding-system            'utf-8)
(set-selection-coding-system            'utf-8)
(prefer-coding-system                   'utf-8)
(modify-coding-system-alist 'process "\\*shell\\*\\'" 'utf-8-unix)

Following your advice, I ran

(set-selection-coding-system 'ctext-with-extensions)

and then did the same copy-and-paste again.  This got more of the 
characters correct, but not all of them.  So we're a lot closer....  Got 
another suggestion?






* Re: garbage chars when pasting French chars into emacs
  2012-02-02  2:39   ` ken
@ 2012-02-02  3:55     ` Eli Zaretskii
  2012-02-02 20:00       ` ken
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2012-02-02  3:55 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Wed, 01 Feb 2012 21:39:22 -0500
> From: ken <gebser@mousecar.com>
> 
> > What is your value of selection-coding-system?  Try setting it to
> > something like ctext-with-extensions.
> 
> Thanks, Eli,
> 
> Immediately prior to doing the copy-and-paste I ran all of these:
> 
> (set-language-environment               'UTF-8)
> (set-default-coding-systems             'utf-8)
> (setq file-name-coding-system           'utf-8)
> (setq default-buffer-file-coding-system 'utf-8)
> (setq coding-system-for-write           'utf-8)
> (set-keyboard-coding-system             'utf-8)
> (set-terminal-coding-system             'utf-8)
> (set-clipboard-coding-system            'utf-8)
> (set-selection-coding-system            'utf-8)
> (prefer-coding-system                   'utf-8)
> (modify-coding-system-alist 'process "\\*shell\\*\\'" 'utf-8-unix)

Not a good idea, I'm afraid: the UTF-8 support in Emacs 21 left a lot
to be desired.

> Following your advice, I ran
> 
> (set-selection-coding-system 'ctext-with-extensions)
> 
> and then did the same copy-and-paste again.  This got more of the 
> characters correct, but not all of them.

Can you show the original text, and then what you have after pasting?
I need to see which characters aren't pasted correctly.




* Re: garbage chars when pasting French chars into emacs
  2012-02-02  3:55     ` Eli Zaretskii
@ 2012-02-02 20:00       ` ken
  2012-02-03  7:31         ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: ken @ 2012-02-02 20:00 UTC (permalink / raw)
  To: GNU Emacs List

On 02/01/2012 10:55 PM Eli Zaretskii wrote:
>> Following your advice, I ran
>>
>> (set-selection-coding-system 'ctext-with-extensions)
>>
>> and then did the same copy-and-paste again.  This got more of the 
>> characters correct, but not all of them.

Looking again, I see my eyes must have been malfunctioning, for the text 
pasted into emacs is rendered correctly.  However, when I try to save 
the text, I'm presented with the minibuffer message "Select coding 
system (default iso-2022-jp-2): ".  At the same time a second buffer 
opens under the first giving the options for coding system:

===================================================================
These default coding systems were tried:
   mule-utf-8-unix iso-latin-1
However, none of them safely encodes the target text.

Select one of the following safe coding systems:
   iso-2022-jp-2 x-ctext iso-2022-7bit raw-text emacs-mule no-conversion
   ctext-no-compositions iso-2022-8bit-ss2 iso-2022-7bit-lock
   iso-2022-7bit-ss2 tibetan-iso-8bit-with-esc thai-tis620-with-esc
   lao-with-esc korean-iso-8bit-with-esc hebrew-iso-8bit-with-esc
   greek-iso-8bit-with-esc iso-latin-9-with-esc iso-latin-8-with-esc
   iso-latin-5-with-esc iso-latin-4-with-esc iso-latin-3-with-esc
   iso-latin-2-with-esc iso-latin-1-with-esc
   in-is13194-devanagari-with-esc cyrillic-iso-8bit-with-esc
   chinese-iso-8bit-with-esc japanese-iso-8bit-with-esc
===================================================================

Entering "utf-8" into the minibuffer, of course, doesn't work.  Frankly, 
I'd like to save as utf-8, to avoid having problems with this same file 
later when I add more text to it.

> Can you show the original text, and then what you have after pasting?
> I need to see which characters aren't pasted correctly.

The original text must have been edited out somewhere in the thread. 
It's at <http://www.wikilivres.info/wiki/Maurice_Merleau-Ponty>, from 
"Francais" through the end of the unordered list, ie, the line ending 
with "et l’invisible, 1964".

Here it is:

==========================================================
Français

     * La Structure du comportement, 1942
     * La Phénoménologie de la perception, 1945
     * Humanisme et terreur, 1947
     * Sens et non-sens, 1948
     * Les Sciences de l’homme et la phénoménologie
     * Les Relations avec autrui chez l’enfant
     * Éloge de la philosophie, leçon inaugurale faite au collège de 
France, le jeudi 15 janvier 1953
     * Signes, 1960
     * L’œil et l’esprit, Gallimard, 1960
     * Le visible et l’invisible, 1964
==========================================================

Thanks again for your help.




* Re: garbage chars when pasting French chars into emacs
  2012-02-02 20:00       ` ken
@ 2012-02-03  7:31         ` Eli Zaretskii
  2012-02-06 18:01           ` different distro [was: Re: garbage chars when pasting French chars into emacs] ken
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2012-02-03  7:31 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Thu, 02 Feb 2012 15:00:37 -0500
> From: ken <gebser@mousecar.com>
> 
> >> Following your advice, I ran
> >>
> >> (set-selection-coding-system 'ctext-with-extensions)
> >>
> >> and then did the same copy-and-paste again.  This got more of the 
> >> characters correct, but not all of them.
> 
> Looking again, I see my eyes must have been malfunctioning, for the text 
> pasted into emacs is rendered correctly.  However, when I try to save 
> the text, I'm presented with the minibuffer message "Select coding 
> system (default iso-2022-jp-2): ".  At the same time a second buffer 
> opens under the first giving the options for coding system:
> 
> ===================================================================
> These default coding systems were tried:
>    mule-utf-8-unix iso-latin-1
> However, none of them safely encodes the target text.
> 
> Select one of the following safe coding systems:
>    iso-2022-jp-2 x-ctext iso-2022-7bit raw-text emacs-mule no-conversion
>    ctext-no-compositions iso-2022-8bit-ss2 iso-2022-7bit-lock
>    iso-2022-7bit-ss2 tibetan-iso-8bit-with-esc thai-tis620-with-esc
>    lao-with-esc korean-iso-8bit-with-esc hebrew-iso-8bit-with-esc
>    greek-iso-8bit-with-esc iso-latin-9-with-esc iso-latin-8-with-esc
>    iso-latin-5-with-esc iso-latin-4-with-esc iso-latin-3-with-esc
>    iso-latin-2-with-esc iso-latin-1-with-esc
>    in-is13194-devanagari-with-esc cyrillic-iso-8bit-with-esc
>    chinese-iso-8bit-with-esc japanese-iso-8bit-with-esc
> ===================================================================
> 
> Entering "utf-8" into the minibuffer, of course, doesn't work.  Frankly, 
> I'd like to save into utf-8

You can't, not with Emacs 21.  In that version, the same character in
different character sets was treated as 2 different characters.  Also,
the mule-utf-8 character set didn't include the Latin-1 characters.
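
The garbled line in the first message suggests which character sets
were involved: ESC $ ( B designates JIS X 0208 and ESC $ ( C designates
KS C 5601, so the apostrophe and the "oe" ligature appear to have
arrived as Japanese and Korean charset characters.  A sketch, assuming
a recent Emacs (23 or later, not Emacs 21) in which such characters are
unified with Unicode; the sample string is copied from that line:

```elisp
;; The garbled line from the first message; \e stands for ESC (shown as ^[).
(let ((garbled "L\e$(B!G\e$(C)+\e(Bil et l\e$(B!G\e(Besprit, Gallimard, 1960"))
  ;; iso-2022-jp-2 understands both the JIS X 0208 and the
  ;; KS C 5601 designations used in the sample.
  (decode-coding-string garbled 'iso-2022-jp-2))
```

Evaluated there, the form should return the readable line from the
original list, and a buffer holding it can then be saved as UTF-8.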

The only suggestion I have is to try iso-latin-1-with-esc (you will
see above that this is one of the possibilities suggested by Emacs),
it should at least produce a Latin-1 encoded file, which will be
easier on you later.
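
A minimal sketch of that workaround, assuming the pasted text is in the
current buffer and the buffer already visits a file:

```elisp
;; Interactive equivalent: C-x RET f iso-latin-1-with-esc RET, then C-x C-s.
(set-buffer-file-coding-system 'iso-latin-1-with-esc)
(save-buffer)
```

Characters outside Latin-1 would presumably still be written with
escape sequences, so this is a stopgap rather than a substitute for
UTF-8.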

You really need to upgrade your Emacs, if you want to use UTF-8.




* different distro [was: Re: garbage chars when pasting French chars into emacs]
  2012-02-03  7:31         ` Eli Zaretskii
@ 2012-02-06 18:01           ` ken
  2012-02-06 20:15             ` Peter Dyballa
  0 siblings, 1 reply; 9+ messages in thread
From: ken @ 2012-02-06 18:01 UTC (permalink / raw)
  To: GNU Emacs List

On 02/03/2012 02:31 AM Eli Zaretskii wrote:
> 
> You can't, not with Emacs 21.  In that version, the same character in
> different character sets was treated as 2 different characters.  Also,
> the mule-utf-8 character set didn't include the Latin-1 characters.
> 
> The only suggestion I have is to try iso-latin-1-with-esc (you will
> see above that this is one of the possibilities suggested by Emacs),
> it should at least produce a Latin-1 encoded file, which will be
> easier on you later.
> 
> You really need to upgrade your Emacs, if you want to use UTF-8.

Thanks, Eli, for all the good advice.

I was using v.22 previously, but was hoping to stay with the standard 
(CentOS/Red Hat) distribution.  This situation and your insight have 
convinced me I've got to use a more recent emacs.  I'm thinking now of 
doing an edgier distro, some rpm/yum based thing like Fedora. 
Suggestions appreciated.

Thanks again to all for the great tips.






* Re: different distro [was: Re: garbage chars when pasting French chars into emacs]
  2012-02-06 18:01           ` different distro [was: Re: garbage chars when pasting French chars into emacs] ken
@ 2012-02-06 20:15             ` Peter Dyballa
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Dyballa @ 2012-02-06 20:15 UTC (permalink / raw)
  To: gebser; +Cc: GNU Emacs List


On 6 Feb 2012, at 19:01, ken wrote:

> I'm thinking now of doing an edgier distro, some rpm/yum based thing like Fedora. Suggestions appreciated.

Fedora is a good choice – although I was bugged at first when copy & paste like yours also inserted "control chars".

--
Greetings

  Pete

Encryption, n.:
	A powerful algorithmic encoding technique employed in the creation of computer manuals.




