all messages for Emacs-related lists mirrored at yhetil.org
* emacs and UTF-8
@ 2005-04-28 17:44 knubee
  2005-04-28 20:05 ` Peter Dyballa
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: knubee @ 2005-04-28 17:44 UTC (permalink / raw)


I'm trying to find a summary explanation of the different ways Emacs
handles UTF-8. There seem to be many different possible settings
(set-language-environment, set-default-coding-systems, and so on),
and there seem to be distinctions between reading, displaying, writing,
etc. I have not been able to find anything that explains these
distinctions. So far, I haven't found a single variable that makes
everything within Emacs read, write, and display UTF-8. Is there such
a variable?

At the moment I am able to write and display Swedish characters within
Emacs. However:

- if I reply to an email, the cited version will be full of junk
characters instead of the Swedish characters (which display fine in the
original email I received). My mail application is VM, if that makes
any difference.

- if I run a Scheme program on a data file that includes Swedish
characters, the returned values will include junk characters for the
Swedish ones (and again, in this case, I am able to open the file and
see the Swedish characters perfectly).

Aside from solving these *particular* problems I would like to find
some good *model* for how Emacs handles UTF-8.

I'm running Emacs 21.3 under Debian, and below are some settings from
my .emacs file.

(set-language-environment 'UTF-8)
(set-default-coding-systems             'utf-8)
(setq file-name-coding-system           'utf-8)
(setq default-buffer-file-coding-system 'utf-8)
(setq coding-system-for-write           'utf-8)
(set-keyboard-coding-system             'utf-8)
(set-terminal-coding-system             'utf-8)
(set-clipboard-coding-system            'utf-8)
(set-selection-coding-system            'utf-8)

thanks for any help.


* Re: emacs and UTF-8
  2005-04-28 17:44 emacs and UTF-8 knubee
@ 2005-04-28 20:05 ` Peter Dyballa
  2005-04-28 20:09 ` Peter Dyballa
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Peter Dyballa @ 2005-04-28 20:05 UTC (permalink / raw)
  Cc: help-gnu-emacs


On 28.04.2005 at 19:44, knubee wrote:

> I'm running Emacs 21.3 under Debian, and below are some settings from
> .emacs file.
>
> (set-language-environment 'UTF-8)
> (set-default-coding-systems             'utf-8)
> (setq file-name-coding-system           'utf-8)
> (setq default-buffer-file-coding-system 'utf-8)
> (setq coding-system-for-write           'utf-8)
> (set-keyboard-coding-system             'utf-8)
> (set-terminal-coding-system             'utf-8)
> (set-clipboard-coding-system            'utf-8)
> (set-selection-coding-system            'utf-8)

In case Debian uses UTF-8 for file names too: keep that entry! All
the others can go to /dev/null, *if* you set environment variables
similar to these:

     LANG=de_DE.UTF-8
     LC_CTYPE=de_DE.UTF-8

GNU Emacs learns a lot from these settings. If something fails, then it
could be that you need (set-language-environment 'UTF-8) too.
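
To check what Emacs actually picked up from the environment, something
like the following could be evaluated in the *scratch* buffer. It is only
a sketch: demo-report-coding-setup is a made-up name, and the values
reported will depend on the Emacs version and the locale in use.

```elisp
;; Hypothetical helper (not part of Emacs): collect the settings that
;; Emacs derives from the locale, so they can be inspected in one place.
(defun demo-report-coding-setup ()
  "Return an alist describing the coding setup Emacs is currently using."
  (list (cons 'LANG (getenv "LANG"))
        (cons 'LC_CTYPE (getenv "LC_CTYPE"))
        (cons 'language-environment current-language-environment)
        (cons 'locale-coding-system locale-coding-system)
        (cons 'file-name-coding-system file-name-coding-system)
        (cons 'keyboard-coding-system (keyboard-coding-system))
        (cons 'terminal-coding-system (terminal-coding-system))))

;; Print one "name: value" line per entry in the echo area / *Messages*.
(dolist (entry (demo-report-coding-setup))
  (message "%s: %s" (car entry) (cdr entry)))
```

If the locale was picked up, the language environment and the locale
coding system would be expected to mention UTF-8 without any help from
.emacs.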

--
Greetings

   Pete

A child of five could understand this!  Fetch me a child of five.


* Re: emacs and UTF-8
  2005-04-28 17:44 emacs and UTF-8 knubee
  2005-04-28 20:05 ` Peter Dyballa
@ 2005-04-28 20:09 ` Peter Dyballa
       [not found] ` <mailman.3358.1114718800.2895.help-gnu-emacs@gnu.org>
  2005-04-28 22:54 ` Pascal Bourguignon
  3 siblings, 0 replies; 7+ messages in thread
From: Peter Dyballa @ 2005-04-28 20:09 UTC (permalink / raw)
  Cc: help-gnu-emacs

I forgot: you also need fontsets to display those more than 64K glyphs:

     (create-fontset-from-fontset-spec
      "-monotype-courier new-medium-r-*-*-10-*-*-*-*-*-fontset-10pt_monotype_courier"
      t 'noerror)
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'latin-iso8859-1 '("courier new" . "iso8859-1"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'latin-iso8859-2 '("courier new" . "iso8859-2"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'latin-iso8859-3 '("courier new" . "iso8859-3"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'latin-iso8859-4 '("courier new" . "iso8859-4"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'cyrillic-iso8859-5 '("courier new" . "iso8859-5"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'arabic-iso8859-6 '("courier new" . "iso8859-6"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'greek-iso8859-7 '("courier new" . "iso8859-7"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'hebrew-iso8859-8 '("courier new" . "iso8859-8"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'latin-iso8859-9 '("courier new" . "iso8859-9"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'latin-iso8859-15 '("courier new" . "iso8859-15"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'mule-unicode-0100-24ff '("courier new" . "iso10646-1"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'mule-unicode-2500-33ff '("courier new" . "iso10646-1"))
     (set-fontset-font "fontset-10pt_monotype_courier"
                       'mule-unicode-e000-ffff '("courier new" . "iso10646-1"))

Just one example.

--
Greetings

   Pete

What has been understood no longer exists.    (Paul Eluard)


* Re: emacs and UTF-8
       [not found] ` <mailman.3358.1114718800.2895.help-gnu-emacs@gnu.org>
@ 2005-04-28 21:33   ` knubee
  2005-04-28 22:22     ` Peter Dyballa
  0 siblings, 1 reply; 7+ messages in thread
From: knubee @ 2005-04-28 21:33 UTC (permalink / raw)


Hi Peter,

Thanks for the response. I already have the fonts and the localisations
for the operating system.

> In case debian uses UTF-8 too for file names: keep that entry!

Which one? (In fact, I think I appropriated many of those settings from
one of your earlier postings on this topic :-))

As I say, there are only a very few situations where this problem crops
up for me.

Your comment about emacs inheriting from the os settings gave me an
idea, so I ran my Scheme program outside of emacs (guile from the
command line). Same result for the file with Swedish characters. I can
see all the Swedish text just fine (within emacs and in a terminal).
But I didn't actually create the file, so then I had the idea to go in
and overwrite a couple of the Swedish characters myself and then resave
the file. And now the results of the Scheme program display correctly!
It just seems odd to me that this file otherwise displays correctly
without being modified by me, but works for this specialized case
only after being changed and saved.

Similarly, I was not so precise before about the email situation. In
fact, when I hit reply, it is only the "reply to" address and subject
that is transformed into something like this:

To: =?iso-8859-1?Q?Bj=  ...
Subject: Re: =?iso-8859-1?Q? ...

The rest of the cited email displays just fine. I wonder why it should
choke only on the contents of those fields?

cheers, k.


* Re: emacs and UTF-8
  2005-04-28 21:33   ` knubee
@ 2005-04-28 22:22     ` Peter Dyballa
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Dyballa @ 2005-04-28 22:22 UTC (permalink / raw)
  Cc: help-gnu-emacs


On 28.04.2005 at 23:33, knubee wrote:

>> In case debian uses UTF-8 too for file names: keep that entry!
>
> Which one? (In fact, I think I appropriated many of those settings from
> one of your earlier postings on this topic :-))
>

(setq file-name-coding-system           'utf-8)
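
Taken together with the earlier advice, a pared-down .emacs might look
like the sketch below. It assumes the locale (LANG, LC_CTYPE) is already
a UTF-8 one; the prefer-coding-system line is an addition that was not
mentioned in this thread.

```elisp
;; Keep: file names on disk are assumed to be UTF-8.
(setq file-name-coding-system 'utf-8)

;; Fallback only, in case the locale is not picked up automatically.
(set-language-environment "UTF-8")

;; Assumption, not from the thread: try UTF-8 first when Emacs has to
;; guess the encoding of a file it reads.
(prefer-coding-system 'utf-8)

;; Deliberately left out: (setq coding-system-for-write 'utf-8).
;; That variable is meant to be bound with `let' around a single
;; operation, because a global value overrides every other choice.
```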


On your Scheme experience: it could be that the Swedish text was
correctly saved as UTF-8 the whole time, since you made Emacs act like
that, and only Guile, running in the incorrect environment, was unable
to detect and interpret the UTF-8 text. In the corrected environment
Guile could then do it right.
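
How the same bytes can look fine in one program and like junk in another
can be sketched in Emacs Lisp. This is only an illustration, assuming a
recent Emacs with Unicode internals (not the 21.3 discussed here);
demo-utf8-vs-latin1 is a made-up name, and nothing here shows what Guile
actually did with the file.

```elisp
;; Hypothetical helper (not part of Emacs): encode TEXT as UTF-8, then
;; decode the resulting bytes under two different assumptions.
(defun demo-utf8-vs-latin1 (text)
  "Return (CHARS BYTES AS-UTF-8 AS-LATIN-1) for TEXT."
  (let ((bytes (encode-coding-string text 'utf-8)))
    (list (length text)                            ; number of characters
          (length bytes)                           ; number of bytes on disk
          (decode-coding-string bytes 'utf-8)      ; the right guess
          (decode-coding-string bytes 'iso-8859-1)))) ; the wrong guess

;; U+00E5, U+00E4, U+00F6 (a-ring, a-umlaut, o-umlaut):
;; 3 characters, but 6 bytes once encoded as UTF-8.
(demo-utf8-vs-latin1 (string #xe5 #xe4 #xf6))
```

The third element is the original text again, while the fourth has two
junk characters for every Swedish one, which is the typical symptom of
UTF-8 bytes being read as Latin-1.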

These eMail subjects "=?iso-8859-1?Q?Bj=" come from RFC 2047, which
describes how such non-7-bit text is encoded in headers when it is
transferred over the internet. Since mail is still a 7-bit application,
because nobody can guarantee that there is absolutely no node that
only understands 7 bit, such encoding has to be done. Our eMail, when
it leaves the MUA, is encoded, for example as base64 (Emacs can do that
too, C-h a base64 RET), and decoded when it arrives.

Although I have saved eMails that contain lines like these

From: =?iso-8859-15?Q?S=E9bastien?= Kirche
Content-Type: text/plain; charset=iso-8859-15

To: Diskussionsliste 
=?iso-8859-1?Q?f=FCr_Mitglieder_von_DANTE_e=2EV=2E?=
Mail-Followup-To: Diskussionsliste 
=?iso-8859-1?Q?f=FCr_Mitglieder_von_DAN?=
	=?iso-8859-1?Q?TE_e=2EV=2E?= <dante-ev@dante.de>
Content-Type: text/plain; charset=iso-8859-1
Reply-To: 
=?iso-8859-1?q?Diskussionsliste_f=FCr_Mitglieder_von_DANTE_e=2EV=2E?=
List-Id: 
=?iso-8859-1?q?Diskussionsliste_f=FCr_Mitglieder_von_DANTE_e=2EV=2E?=

my Mac OS X eMail programme, just Mail(.app), hides that "header" and
decodes the =XY codes into the correct glyphs according to the given
charset (XY is just the hex value of the code of that glyph in
that encoding). I think you only have to adjust some settings of your
Mail User Agent. Or switch to another one!
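
For illustration, Emacs itself can decode such a header with the rfc2047
library that ships with Gnus. The sketch below only shows the decoding
step; demo-decode-header is a made-up wrapper, and whether or how VM
uses this library is not something this thread establishes.

```elisp
(require 'rfc2047)  ; part of Gnus, distributed with Emacs

;; Hypothetical wrapper (not part of Emacs or VM): decode the
;; "=?charset?Q?...?=" encoded words found in a raw header value.
(defun demo-decode-header (raw)
  "Return RAW with its RFC 2047 encoded words decoded."
  (rfc2047-decode-string raw))

;; In Q encoding, =FC is byte 0xFC (u-umlaut in ISO-8859-1), =2E is a
;; full stop, and an underscore stands for a space.
(demo-decode-header "=?iso-8859-1?Q?f=FCr_Mitglieder_von_DANTE_e=2EV=2E?=")
```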

--
Greetings

   Pete

Got Mole problems?
Call Avogadro 6.02 x 10^23


* Re: emacs and UTF-8
  2005-04-28 17:44 emacs and UTF-8 knubee
                   ` (2 preceding siblings ...)
       [not found] ` <mailman.3358.1114718800.2895.help-gnu-emacs@gnu.org>
@ 2005-04-28 22:54 ` Pascal Bourguignon
  2005-04-30 17:05   ` knubee
  3 siblings, 1 reply; 7+ messages in thread
From: Pascal Bourguignon @ 2005-04-28 22:54 UTC (permalink / raw)


"knubee" <knubee@gmail.com> writes:

> I'm trying to find a summary explanation of the different ways Emacs
> handles UTF-8. There seem to be many different possible settings
> (set-language-environment, set-default-coding-systems, and so on),
> and there seem to be distinctions between reading, displaying, writing,
> etc. I have not been able to find anything that explains these
> distinctions. So far, I haven't found a single variable that makes
> everything within Emacs read, write, and display UTF-8. Is there such
> a variable?
>
> At the moment I am able to write and display Swedish characters within
> Emacs. However:
>
> - if I reply to an email, the cited version will be full of junk
> characters instead of the Swedish characters (which display fine in the
> original email I received). My mail application is VM, if that makes
> any difference.
>
> - if I run a Scheme program on a data file that includes Swedish
> characters, the returned values will include junk characters for the
> Swedish ones (and again, in this case, I am able to open the file and
> see the Swedish characters perfectly).
>
> Aside from solving these *particular* problems I would like to find
> some good *model* for how Emacs handles UTF-8.
>
> I'm running Emacs 21.3 under Debian, and below are some settings from
> my .emacs file.
>
> (set-language-environment 'UTF-8)
> (set-default-coding-systems             'utf-8)
> (setq file-name-coding-system           'utf-8)
> (setq default-buffer-file-coding-system 'utf-8)
> (setq coding-system-for-write           'utf-8)
> (set-keyboard-coding-system             'utf-8)
> (set-terminal-coding-system             'utf-8)
> (set-clipboard-coding-system            'utf-8)
> (set-selection-coding-system            'utf-8)
>
> thanks for any help.

VM communicates with sendmail using the process coding system:

(setq default-process-coding-system '(utf-8 . utf-8))
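
The value is a cons cell: the car decodes what a subprocess sends to
Emacs, and the cdr encodes what Emacs sends to the subprocess. A sketch
of both the global default and a one-off override follows;
demo-run-utf8 is a hypothetical helper, and the example call assumes a
POSIX shell with printf is available.

```elisp
;; (DECODING . ENCODING), used for subprocess I/O by default.
(setq default-process-coding-system '(utf-8 . utf-8))

;; Hypothetical helper: force UTF-8 for one command only, by binding
;; the override variables with `let' instead of setting them globally.
(defun demo-run-utf8 (command)
  "Run shell COMMAND and return its output decoded as UTF-8."
  (let ((coding-system-for-read 'utf-8)
        (coding-system-for-write 'utf-8))
    (shell-command-to-string command)))

(demo-run-utf8 "printf 'hej'")
```

Binding the variables with let keeps the override local to the one
call, so other processes keep whatever coding system they would
normally get.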

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

This is a signature virus.  Add me to your signature and help me to live


* Re: emacs and UTF-8
  2005-04-28 22:54 ` Pascal Bourguignon
@ 2005-04-30 17:05   ` knubee
  0 siblings, 0 replies; 7+ messages in thread
From: knubee @ 2005-04-30 17:05 UTC (permalink / raw)


Peter and Pascal,

Thanks to you both for the tips about the email question. Life is a bit
too frantic for me to pursue this in depth at the moment, but soon ...

Peter: about Scheme. I tried several Scheme implementations (from
within emacs and via a terminal) to solve the problem -- and I didn't
change any environment variables between the tests and resaving the
file (and then success).  So, I don't think it was that -- nor the
particular Scheme implementation. Just another mystery ... :-)

k
