unofficial mirror of help-gnu-emacs@gnu.org
* utf8 char display in buffer
@ 2009-06-08 18:33 ken
  0 siblings, 0 replies; 32+ messages in thread
From: ken @ 2009-06-08 18:33 UTC (permalink / raw)
  To: GNU Emacs List


Hey, group,

I already use a few utf8 characters in emacs (and in web pages), but
recently needed to use a couple more.  One is an 'a' with a horizontal
line above it, the other an 'i' with a vertical line above it.  How do I
input these into a buffer?


tia,
ken

-- 
"To make an apple pie from scratch,
first create the universe."
        -- Carl Sagan






* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
@ 2009-06-08 19:10 ` Teemu Likonen
  2009-06-08 19:52 ` Xah Lee
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Teemu Likonen @ 2009-06-08 19:10 UTC (permalink / raw)
  To: gebser; +Cc: GNU Emacs List

On 2009-06-08 14:33 (-0400), ken wrote:

> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more. One is an 'a' with a horizontal
> line above it, the other an 'i' with a vertical line above it. How do
> I input these into a buffer?

Some keyboards (Finnish, for example) can produce those characters
(semi-)directly but through Emacs's input methods it's possible with
just basic Ascii keys. For example, turn on "TeX" input method (C-x RET
C-\ TeX RET) and type \=a for "ā" and \=i for "ī". You can also use
"ucs" input method and type Unicode code points directly: type u0101 for
"ā" and u012b for "ī".

There are probably some language-specific input methods too which may
have even easier ways for inputting these characters.
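
If you end up using the same input method all the time, it can be set up
from .emacs instead of typing C-x RET C-\ each session. A minimal sketch,
assuming the "TeX" input method mentioned above and assuming you only
want it in text-mode buffers (the hook choice is just an illustration):

```elisp
;; Sketch only: make "TeX" the method that a plain C-\ toggles.
(setq default-input-method "TeX")

;; Assumption: switch it on automatically in text-mode buffers.
(add-hook 'text-mode-hook
          (lambda ()
            (set-input-method "TeX")))
```

With this in place, \=a and \=i should give "ā" and "ī" as described,
and C-\ turns the input method off again.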





* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
  2009-06-08 19:10 ` utf8 char display in buffer Teemu Likonen
@ 2009-06-08 19:52 ` Xah Lee
  2009-06-09 10:52   ` ken
  2009-06-08 20:43 ` B. T. Raven
  2009-06-11 12:03 ` Teemu Likonen
  3 siblings, 1 reply; 32+ messages in thread
From: Xah Lee @ 2009-06-08 19:52 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 8, 11:33 am, ken <geb...@mousecar.com> wrote:
> Hey, group,
>
> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more.  One is an 'a' with a horizontal
> line above it, the other an 'i' with a vertical line above it.  How do I
> input these into a buffer?

i define keys to insert unicode chars that i frequently use. e.g.

(global-set-key (kbd "<kp-6>") "→")
(global-set-key (kbd "M-i a") "α")
(global-set-key (kbd "M-i b") "β")
(global-set-key (kbd "M-i t") "θ")

you can also insert unicode by its hex value. Alt+x ucs-insert.
There are few other ways...
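
For a character needed over and over, the hex value can also be wrapped
in a small command. This is only a sketch: the function name and the
C-c m binding are made up for illustration, and it assumes decode-char
can map the code point in your Emacs version.

```elisp
;; Hypothetical helper: insert U+0101 (a with macron) by code point.
(defun my-insert-a-macron ()
  "Insert the character with Unicode code point U+0101."
  (interactive)
  (insert (decode-char 'ucs #x0101)))

;; Assumed free key; pick whatever suits you.
(global-set-key (kbd "C-c m") 'my-insert-a-macron)
```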

some more tips here

• Emacs and Unicode Tips
  http://xahlee.org/emacs/emacs_n_unicode.html

  Xah
∑ http://xahlee.org/

* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
  2009-06-08 19:10 ` utf8 char display in buffer Teemu Likonen
  2009-06-08 19:52 ` Xah Lee
@ 2009-06-08 20:43 ` B. T. Raven
  2009-06-08 20:49   ` B. T. Raven
                     ` (2 more replies)
  2009-06-11 12:03 ` Teemu Likonen
  3 siblings, 3 replies; 32+ messages in thread
From: B. T. Raven @ 2009-06-08 20:43 UTC (permalink / raw)
  To: help-gnu-emacs

ken wrote:
> Hey, group,
> 
> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more.  One is an 'a' with a horizontal
> line above it, the other an 'i' with a vertical line above it.  How do I
> input these into a buffer?
> 
> 
> tia,
> ken
> 

C-x RET C-\ latin-4-postfix RET

then a, e, i, o, or u followed by a hyphen produces the vowel with a macron

If you don't want all these then you could just put something like this 
in .emacs

(global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
(global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))

assuming you have these C-c combos free.

Ed



* Re: utf8 char display in buffer
  2009-06-08 20:43 ` B. T. Raven
@ 2009-06-08 20:49   ` B. T. Raven
  2009-06-08 22:49     ` ken
  2009-06-09 10:24   ` ken
       [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 32+ messages in thread
From: B. T. Raven @ 2009-06-08 20:49 UTC (permalink / raw)
  To: help-gnu-emacs

B. T. Raven wrote:
> ken wrote:
>> Hey, group,
>>
>> I already use a few utf8 characters in emacs (and in web pages), but
>> recently needed to use a couple more.  One is an 'a' with a horizontal
>> line above it, the other an 'i' with a vertical line above it.  How do I
>> input these into a buffer?
>>
>>
>> tia,
>> ken
>>

Oops, I see you said i with VERTICAL line. What is that character?
Any of these?  í ï î ì If so substitute for i with macron below.

> 
> C-x ret C-\ latin-4-postfix
> 
> then a,e,i,o,u followed by hyphen generate macroned vowels
> 
> If you don't want all these then you could just put something like this 
> in .emacs
> 
> (global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
> (global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))
> 
> assuming you have these C-c combos free.
> 
> Ed



* Re: utf8 char display in buffer
  2009-06-08 20:49   ` B. T. Raven
@ 2009-06-08 22:49     ` ken
  0 siblings, 0 replies; 32+ messages in thread
From: ken @ 2009-06-08 22:49 UTC (permalink / raw)
  To: GNU Emacs List


On 06/08/2009 04:49 PM B. T. Raven wrote:
> B. T. Raven wrote:
>> ken wrote:
>>> Hey, group,
>>>
>>> I already use a few utf8 characters in emacs (and in web pages), but
>>> recently needed to use a couple more.  One is an 'a' with a horizontal
>>> line above it, the other an 'i' with a vertical line above it.  How do I
>>> input these into a buffer?
>>>
>>>
>>> tia,
>>> ken
>>>
> 
> Oops, I see you said i with VERTICAL line. What is that character?
> Any of these?  í ï î ì If so substitute for i with macron below.
> 
>>....

The Oops is mine.  I meant to say "horizontal" for both.  So your
previous email did it all for me.

Thanks.






* Re: utf8 char display in buffer
  2009-06-08 20:43 ` B. T. Raven
  2009-06-08 20:49   ` B. T. Raven
@ 2009-06-09 10:24   ` ken
       [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 32+ messages in thread
From: ken @ 2009-06-09 10:24 UTC (permalink / raw)
  To: GNU Emacs List


On 06/08/2009 04:43 PM B. T. Raven wrote:
> ken wrote:
>> Hey, group,
>>
>> I already use a few utf8 characters in emacs (and in web pages), but
>> recently needed to use a couple more.  One is an 'a' with a horizontal
>> line above it, the other an 'i' with a horizontal line above it.  How do I
>> input these into a buffer?
>>
>>
>> tia,
>> ken
>>
> 
> C-x ret C-\ latin-4-postfix
> 
> then a,e,i,o,u followed by hyphen generate macroned vowels
> 
> If you don't want all these then you could just put something like this
> in .emacs
> 
> (global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
> (global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))
> 
> assuming you have these C-c combos free.
> 
> Ed

Fantastic!  But... when I save and close the buffer and then open it up
again, in place of the beautiful and correct characters, there are
little boxes.

I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.

The fact that these non-English characters display properly in the
buffer initially tells me that I have the requisite fonts installed.  So
what little connection is emacs not making (and how do I tell it to make
that connection)?

Thanks, all.





* Re: utf8 char display in buffer
  2009-06-08 19:52 ` Xah Lee
@ 2009-06-09 10:52   ` ken
  0 siblings, 0 replies; 32+ messages in thread
From: ken @ 2009-06-09 10:52 UTC (permalink / raw)
  Cc: help-gnu-emacs


On 06/08/2009 03:52 PM Xah Lee wrote:
>> ....
> 
> i define keys to insert unicode chars that i frequently use. e.g.
> 
> (global-set-key (kbd "<kp-6>") "→")
> (global-set-key (kbd "M-i a") "α")
> (global-set-key (kbd "M-i b") "β")
> (global-set-key (kbd "M-i t") "θ")

It's probably just me, but with so many foreign characters in use,
remembering all the key mappings becomes more than my little brain
can manage.  So I prefer to create a menu of character entities.
html-helper-mode (i.e., not html-mode) already has such a menu which
I've added to using "(mapcar 'html-helper-add-tag ...".  This menu
allows me to look up a 'character' which I can't remember *and* gives me
a reminder of what its key combo is.  My (too old) version of emacs,
however, doesn't have a "character entities" menu for regular (non-html)
buffers.  I've already got too much on my plate for the moment, so this
isn't a project for me right now.  But later....


> ....
> 
> some more tips here
> 
> • Emacs and Unicode Tips
>   http://xahlee.org/emacs/emacs_n_unicode.html
> 
> ....

Nice web page.  (Bookmarked.)  Thanks.






* Re: utf8 char display in buffer
       [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
@ 2009-06-09 13:03     ` B. T. Raven
  2009-06-09 14:51       ` ken
       [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 32+ messages in thread
From: B. T. Raven @ 2009-06-09 13:03 UTC (permalink / raw)
  To: help-gnu-emacs

ken wrote:
> On 06/08/2009 04:43 PM B. T. Raven wrote:
>> ken wrote:
>>> Hey, group,
>>>
>>> I already use a few utf8 characters in emacs (and in web pages), but
>>> recently needed to use a couple more.  One is an 'a' with a horizontal
>>> line above it, the other an 'i' with a horizontal line above it.  How do I
>>> input these into a buffer?
>>>
>>>
>>> tia,
>>> ken
>>>
>> C-x ret C-\ latin-4-postfix
>>
>> then a,e,i,o,u followed by hyphen generate macroned vowels
>>
>> If you don't want all these then you could just put something like this
>> in .emacs
>>
>> (global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
>> (global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))
>>
>> assuming you have these C-c combos free.
>>
>> Ed
> 
> Fantastic!  But... when I save and close the buffer and then open it up
> again, in place of the beautiful and correct characters, there are
> little boxes.

After you see them correctly in the buffer, do:

C-x RET c utf-8 RET

then

C-x C-s

Now next time you load that file it should appear correctly.
ā  and ī are not in iso-8859-1 and so you must use a more comprehensive 
coding system.
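
The same thing can be done from Lisp. A rough sketch, with a made-up
command name, that marks the current buffer as utf-8 and then saves it
(this is what C-x RET f utf-8 RET followed by C-x C-s would do by hand):

```elisp
;; Hypothetical helper: re-save the current buffer encoded as utf-8.
(defun my-save-buffer-as-utf-8 ()
  "Set this buffer's file coding system to utf-8, then save it."
  (interactive)
  (set-buffer-file-coding-system 'utf-8)
  (save-buffer))
```

Afterwards C-h C RET should list utf-8 as the coding system for saving
the buffer.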

> 
> I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
> no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.
> 
> The fact that these non-English characters display properly in the
> buffer initially tells me that I have the requisite fonts installed.  So
> what little connection is emacs not making (and how do I tell it to make
> that connection)?

If you use utf-8 a lot you can put ;; -*- coding: utf-8[;] -*- into the
first line of the file. I don't know whether that semicolon in brackets is
needed or not.
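
As far as I know the trailing semicolon is optional when the line sets
only one variable. An alternative to tagging every file is to tell Emacs
which coding system to try for a whole class of files. A sketch only,
assuming the files in question end in .htm or .html:

```elisp
;; Assumption: every *.htm / *.html file you visit is utf-8.
;; Adds an entry to file-coding-system-alist for matching file names.
(modify-coding-system-alist 'file "\\.html?\\'" 'utf-8)
```

A coding: cookie inside a particular file should still take precedence
over this default.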

> 
> Thanks, all.
> 
> 



* Re: utf8 char display in buffer
  2009-06-09 13:03     ` B. T. Raven
@ 2009-06-09 14:51       ` ken
       [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 32+ messages in thread
From: ken @ 2009-06-09 14:51 UTC (permalink / raw)
  To: GNU Emacs List

On 06/09/2009 09:03 AM B. T. Raven wrote:
> ken wrote:
>> On 06/08/2009 04:43 PM B. T. Raven wrote:
>>> ken wrote:
>>>> ....
>>>>
>>> C-x ret C-\ latin-4-postfix
>>>
>>> then a,e,i,o,u followed by hyphen generate macroned vowels
>>>
>>> ....
>>
>> Fantastic!  But... when I save and close the buffer and then open it up
>> again, in place of the beautiful and correct characters, there are
>> little boxes.
> 
> After you see them correctly in the buffer do:
> 
> C-x ret c utf-8
> 
> then
> 
> C-x C-s
> 
> Now next time you load that file it should appear correctly.
> ā  and ī are not in iso-8859-1 and so you must use a more comprehensive
> coding system.

Hmmm... it doesn't.  Doing everything just as you say above, I still get
the little boxes in place of the non-English characters.

When after reloading the buffer, I run "describe-coding-system" on this
buffer, I get:

=============================================
Coding system for saving this buffer:
  u -- mule-utf-8-unix
Default coding system (for new files):
  u -- mule-utf-8 (alias: utf-8)
Coding system for keyboard input:
  nil
Coding system for terminal output:
  0 -- iso-latin-9 (alias: iso-8859-15 latin-9 latin-0)
Defaults for subprocess I/O:
  decoding: u -- mule-utf-8 (alias: utf-8)
  encoding: u -- mule-utf-8 (alias: utf-8)

Priority order for recognizing coding systems when reading files:
  1. mule-utf-8 (alias: utf-8)
  2. iso-latin-1 (alias: iso-8859-1 latin-1)
  3. iso-2022-jp (alias: junet)
  4. iso-2022-7bit
  5. iso-2022-7bit-lock (alias: iso-2022-int-1)
  6. iso-2022-8bit-ss2
  7. emacs-mule
  8. raw-text
  9. japanese-shift-jis (alias: shift_jis sjis)
  10. chinese-big5 (alias: big5 cn-big5)
  11. no-conversion (alias: binary)

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.

  The followings are decoded correctly but recognized as iso-2022-7bit-lock:
    iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
    iso-2022-jp-2 iso-2022-kr

....
==================================================================

I don't know... does utf-8 or mule-utf-8 contain latin-4, greek, and/or
German characters?  (This file has some of each.)


>>
>> I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
>> no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.
>>
>> The fact that these non-English characters display properly in the
>> buffer initially tells me that I have the requisite fonts installed.  So
>> what little connection is emacs not making (and how do I tell it to make
>> that connection)?
> 
> If you use utf-8 a lot you can put ;; -*- coding: utf-8[;] -*- into the
> first line of the file. I don't know whether that sem in brackets is
> needed or not.

Sorry, I should have mentioned that I have this (with the semi-colon) at
the top of the file.

Let me also say that, though the little boxes appear in the emacs
buffer, the proper non-English characters appear when the file is loaded
into firefox.  (Yeah, this emacs file is an HTML page.)



> 
>>
>> Thanks, all.
>>
>>






* Re: utf8 char display in buffer
       [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
@ 2009-06-10  1:34         ` B. T. Raven
  2009-06-10 14:03           ` Lewis Perin
  0 siblings, 1 reply; 32+ messages in thread
From: B. T. Raven @ 2009-06-10  1:34 UTC (permalink / raw)
  To: help-gnu-emacs

ken wrote:
> On 06/09/2009 09:03 AM B. T. Raven wrote:
>> ken wrote:
>>> On 06/08/2009 04:43 PM B. T. Raven wrote:
>>>> ken wrote:
>>>>> ....
>>>>>
>>>> C-x ret C-\ latin-4-postfix
>>>>
>>>> then a,e,i,o,u followed by hyphen generate macroned vowels
>>>>
>>>> ....
>>> Fantastic!  But... when I save and close the buffer and then open it up
>>> again, in place of the beautiful and correct characters, there are
>>> little boxes.
>> After you see them correctly in the buffer do:
>>
>> C-x ret c utf-8
>>
>> then
>>
>> C-x C-s
>>
>> Now next time you load that file it should appear correctly.
>> ā  and ī are not in iso-8859-1 and so you must use a more comprehensive
>> coding system.
> 
> Hmmm... it doesn't.  Doing everything just as you say above, I still get
> the little boxes in place of the non-English characters.
> 
> When after reloading the buffer, I run "describe-coding-system" on this
> buffer, I get:
> 
> =============================================
> Coding system for saving this buffer:
>   u -- mule-utf-8-unix
> Default coding system (for new files):
>   u -- mule-utf-8 (alias: utf-8)
> Coding system for keyboard input:
>   nil
> Coding system for terminal output:
>   0 -- iso-latin-9 (alias: iso-8859-15 latin-9 latin-0)
> Defaults for subprocess I/O:
>   decoding: u -- mule-utf-8 (alias: utf-8)
>   encoding: u -- mule-utf-8 (alias: utf-8)
> 
> Priority order for recognizing coding systems when reading files:
>   1. mule-utf-8 (alias: utf-8)
>   2. iso-latin-1 (alias: iso-8859-1 latin-1)
>   3. iso-2022-jp (alias: junet)
>   4. iso-2022-7bit
>   5. iso-2022-7bit-lock (alias: iso-2022-int-1)
>   6. iso-2022-8bit-ss2
>   7. emacs-mule
>   8. raw-text
>   9. japanese-shift-jis (alias: shift_jis sjis)
>   10. chinese-big5 (alias: big5 cn-big5)
>   11. no-conversion (alias: binary)
> 
>   Other coding systems cannot be distinguished automatically
>   from these, and therefore cannot be recognized automatically
>   with the present coding system priorities.
> 
>   The followings are decoded correctly but recognized as iso-2022-7bit-lock:
>     iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
>     iso-2022-jp-2 iso-2022-kr
> 
> ....
> ==================================================================
> 
> I don't know... does utf-8 or mule-utf-8 contain latin-4, greek, and/or
> German characters?  (This file has some of each.)
> 
> 
>>> I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
>>> no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.
>>>
>>> The fact that these non-English characters display properly in the
>>> buffer initially tells me that I have the requisite fonts installed.  So
>>> what little connection is emacs not making (and how do I tell it to make
>>> that connection)?
>> If you use utf-8 a lot you can put ;; -*- coding: utf-8[;] -*- into the
>> first line of the file. I don't know whether that sem in brackets is
>> needed or not.
> 
> Sorry, I should have mentioned that I have this (with the semi-colon) at
> the top of the file.
> 
> Let me also say that, though the little boxes appear in the emacs
> buffer, the proper non-English characters appear when the file is loaded
> into firefox.  (Yeah, this emacs file is an HTML page.)
> 
> 
> 
>>> Thanks, all.

Don't know. Your problem has just escalated above my pay grade. I don't
know what it means that the files display okay in FF. I just loaded my
.emacs into the browser and it looks fine (it has many exotic non-Latin-1
characters in it). You are using GUI Emacs and not a terminal, right? You
could try these settings from my ver 22 .emacs, just for fun:

   (set-language-environment               'UTF-8)
   (set-default-coding-systems             'utf-8)
   (setq file-name-coding-system           'utf-8)
   (setq default-buffer-file-coding-system 'utf-8)
   (setq coding-system-for-write           'utf-8)
   (set-keyboard-coding-system             'utf-8)
   (set-terminal-coding-system             'utf-8)
   (set-clipboard-coding-system            'utf-8)
   (set-selection-coding-system            'utf-8)
   (prefer-coding-system                   'utf-8)
   (modify-coding-system-alist 'process "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)


and try C-x RET c utf-8 RET
C-x C-f

to open the file.



or install version 23.x w32 binary into a different directory from here

http://alpha.gnu.org/gnu/emacs/pretest/windows/


I don't think you need a .emacs with ver 23 in dealing with utf-8 since 
its internal representation is unicode.



* Re: utf8 char display in buffer
  2009-06-10  1:34         ` B. T. Raven
@ 2009-06-10 14:03           ` Lewis Perin
  2009-06-11  3:21             ` B. T. Raven
  0 siblings, 1 reply; 32+ messages in thread
From: Lewis Perin @ 2009-06-10 14:03 UTC (permalink / raw)
  To: help-gnu-emacs

I've been following this thread closely because I have the original
poster's problem, only the characters that give me trouble are some -
not many, actually - Chinese characters, e.g. ni3, the normal second
person pronoun.  And, as with the original poster, the troublesome
characters, when copied and pasted to other applications from Emacs,
display perfectly.

"B. T. Raven" <nihil@nihilo.net> writes:

> [...]
>    (set-language-environment               'UTF-8)
>          (set-default-coding-systems             'utf-8)
>          (setq file-name-coding-system           'utf-8)
>          (setq default-buffer-file-coding-system 'utf-8)
>          (setq coding-system-for-write           'utf-8)
>          (set-keyboard-coding-system             'utf-8)
>          (set-terminal-coding-system          'utf-8)
>          (set-clipboard-coding-system            'utf-8)
>          (set-selection-coding-system            'utf-8)
>          (prefer-coding-system                   'utf-8)
>          (modify-coding-system-alist 'process
> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
> 
> 
> and try C-x ret c utf-8
> C-x C-f
> 
> to open the file.

I tried this, but it didn't help.  Emacs 22.3 / Win32.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
  2009-06-10 14:03           ` Lewis Perin
@ 2009-06-11  3:21             ` B. T. Raven
  2009-06-12 14:54               ` ken
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 32+ messages in thread
From: B. T. Raven @ 2009-06-11  3:21 UTC (permalink / raw)
  To: help-gnu-emacs

Lewis Perin wrote:
> I've been following this thread closely because I have the original
> poster's problem, only the characters that give me trouble are some -
> not many, actually - Chinese characters, e.g. ni3, the normal second
> person pronoun.  And, as with the original poster, the troublesome
> characters, when copied and pasted to other applications from Emacs,
> display perfectly.
> 
> "B. T. Raven" <nihil@nihilo.net> writes:
> 
>> [...]
>>    (set-language-environment               'UTF-8)
>>          (set-default-coding-systems             'utf-8)
>>          (setq file-name-coding-system           'utf-8)
>>          (setq default-buffer-file-coding-system 'utf-8)
>>          (setq coding-system-for-write           'utf-8)
>>          (set-keyboard-coding-system             'utf-8)
>>          (set-terminal-coding-system          'utf-8)
>>          (set-clipboard-coding-system            'utf-8)
>>          (set-selection-coding-system            'utf-8)
>>          (prefer-coding-system                   'utf-8)
>>          (modify-coding-system-alist 'process
>> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
>>
>>
>> and try C-x ret c utf-8
>> C-x C-f
>>
>> to open the file.
> 
> I tried this, but it didn't help.  Emacs 22.3 / Win32.

Even on Emacs 23 although I see the characters in the buffer, I can't 
save the following as utf-8:

nǐ hǎo 你 好
u+4f60 and u+597d

Or at least not so as to be readable with 22.3. Both versions are using 
Arial Unicode MS.

Why is that?


> 
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
                   ` (2 preceding siblings ...)
  2009-06-08 20:43 ` B. T. Raven
@ 2009-06-11 12:03 ` Teemu Likonen
  3 siblings, 0 replies; 32+ messages in thread
From: Teemu Likonen @ 2009-06-11 12:03 UTC (permalink / raw)
  To: help-gnu-emacs

On 2009-06-08 14:33 (-0400), ken wrote:

> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more. One is an 'a' with a horizontal
> line above it, the other an 'i' with a [horizontal] line above it. How
> do I input these into a buffer?

Let’s add one more nice way to insert Unicode chars: “rfc1345” input
method. It’s an input method for Unicode characters using mnemonics.
Examples:

    &a- = ā
    &i- = ī
    &W* = Ω
    &"6 = “
    &"9 = ”

For more info: C-h I rfc1345 RET



* Re: utf8 char display in buffer
  2009-06-11  3:21             ` B. T. Raven
@ 2009-06-12 14:54               ` ken
  2009-06-13  3:30                 ` Eli Zaretskii
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 32+ messages in thread
From: ken @ 2009-06-12 14:54 UTC (permalink / raw)
  To: GNU Emacs List

Ed,

Thanks for distributing.


Everyone responding to this thread,

Please either CC me when posting about this issue or else edit the "To"
field so that your response comes to the whole list.  I'd like to get
everyone's input.  Thanks.


Lewis,

Thanks for posting.  It's lonely out there when you're the only one with
a particular problem.  To make sure we're suffering the same
cyber-indignity, here's the scenario as I see it (from an older version
of emacs running on Linux):

0) Some others and I want to include some non-English characters in
a file being edited in emacs. Problems arise, however:

1) In a buffer which is already utf-8 encoded, I set the appropriate
input method, type in the desired characters. They display just peachy
and there is happiness in EmacsLand.

2) I save the buffer to a file, then close the buffer.

3) I visit the same file (i.e., load it again into emacs). Because it
has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
utf-8 encoded. This is confirmed by the presence of a 'u' as the second
character in the status bar.

4) The text in the buffer displays fine, except that in place of each of
those non-English characters is a little empty box. With the cursor on
one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
=", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
(While, in emacs the character after "Char:" is a little box, if I load
this same file into Firefox, that same character appears as it should,
as an 'a' with a horizontal bar above it. How it appears in your email
client will depend upon your email client.)

A) The fact that, as described in (4), the characters display correctly
in Firefox, but not in emacs indicates that emacs is not drawing on the
needed character set. Yet, the fact that in (1) the characters initially
display correctly (when first input) indicates that the needed character
set is present on the system and emacs can find it and has permission to
access it. Further, we would think that emacs would throw out an error
message if either of these conditions were not met... and it doesn't. We
can only assume that, when visiting and then decoding a file and pulling
into a buffer for display, emacs is not even asking for the proper
character set when encountering a non-English character. This is where I
would start to look for the error.

B) It would be helpful if the code which decodes a file and renders it
into the buffer display would throw an error message when it encounters
a character it doesn't know how to display, i.e., when a little box
character is displayed. After all,
isn't it an error when a little box is displayed in lieu of the correct
character? Possible error messages would be something like: "decoding
process can't find /path/to/charset.file" or "decoding process doesn't
have requisite permission to read /path/to/charset.file" or "invalid
character: [hex/decimal value]" or other.
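
Until something like that exists, the character under the cursor can at
least be inspected by hand. A small sketch (the command name is made
up) that reports the internal code and the charset emacs has assigned
to the character at point, which is the information that determines
which font gets asked for:

```elisp
;; Hypothetical helper: show code and charset of the character at point.
(defun my-char-at-point-info ()
  "Report the internal code and charset of the character at point."
  (interactive)
  (let ((ch (char-after)))
    (if ch
        (message "char: %c  code: #x%x  charset: %s"
                 ch ch (char-charset ch))
      (message "No character at point"))))
```

Comparing the charset reported for a freshly typed character with the
one reported after reloading the file may show where the two cases
diverge. C-u C-x = gives a fuller report, including the font in use.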


On 06/10/2009 11:21 PM B. T. Raven wrote:
> Lewis Perin wrote:
>> I've been following this thread closely because I have the original
>> poster's problem, only the characters that give me trouble are some -
>> not many, actually - Chinese characters, e.g. ni3, the normal second
>> person pronoun.  And, as with the original poster, the troublesome
>> characters, when copied and pasted to other applications from Emacs,
>> display perfectly.
>>
>> "B. T. Raven" <nihil@nihilo.net> writes:
>>
>>> [...]
>>>    (set-language-environment               'UTF-8)
>>>          (set-default-coding-systems             'utf-8)
>>>          (setq file-name-coding-system           'utf-8)
>>>          (setq default-buffer-file-coding-system 'utf-8)
>>>          (setq coding-system-for-write           'utf-8)
>>>          (set-keyboard-coding-system             'utf-8)
>>>          (set-terminal-coding-system          'utf-8)
>>>          (set-clipboard-coding-system            'utf-8)
>>>          (set-selection-coding-system            'utf-8)
>>>          (prefer-coding-system                   'utf-8)
>>>          (modify-coding-system-alist 'process
>>> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
>>>
>>>
>>> and try C-x ret c utf-8
>>> C-x C-f
>>>
>>> to open the file.
>>
>> I tried this, but it didn't help.  Emacs 22.3 / Win32.
> 
> Even on Emacs 23 although I see the characters in the buffer, I can't
> save the following as utf-8:
> 
> nǐ hǎo 你 好
> u+4f60 and u+597d
> 
> Or at least not so as to be readable with 22.3. Both versions are using
> Arial Unicode MS.
> 
> Why is that?
> 
> 
>>
>> /Lew
>> ---
>> Lew Perin / perin@acm.org
>> http://www.panix.com/~perin/babelcarp.html





* Re: utf8 char display in buffer
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
@ 2009-06-12 15:39                 ` Lewis Perin
  2009-06-12 16:48                   ` B. T. Raven
  2009-06-12 17:27                 ` Xah Lee
  1 sibling, 1 reply; 32+ messages in thread
From: Lewis Perin @ 2009-06-12 15:39 UTC (permalink / raw)
  To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> [...]
> Lewis,
> 
> Thanks for posting.  It's lonely out there when you're the only one with
> a particular problem.

The few, the proud...

> To make sure we're suffering the same cyber-indignity, here's the
> scenario as I see it (from an older version of emacs running on
> Linux):
> 
> 0) Some others and myself want to include some non-English characters in
> a file being edited in emacs. Problems arise, however:
> 
> 1) In a buffer which is already utf-8 encoded, I set the appropriate
> input method, type in the desired characters. They display just peachy
> and there is happiness in EmacsLand.
> 
> 2) I save the buffer to a file, then close the buffer.
> 
> 3) I visit the same file (i.e., load it again into emacs). Because it
> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
> character in the status bar.

I haven't been inserting that special first line.

> 4) The text in the buffer displays fine, except that in place of each of
> those non-English characters is a little empty box. With the cursor on
> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
> (While, in emacs the character after "Char:" is a little box, if I load
> this same file into Firefox, that same character appears as it should,
> as an 'a' with a horizontal bar above it. How it appears in your email
> client will depend upon your email client.)

My situation differs in that most of the non-ASCII characters (Chinese
in my case) come through just fine.  But the ones that don't have
those irritating boxes in place of the correct glyphs.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 15:39                 ` Lewis Perin
@ 2009-06-12 16:48                   ` B. T. Raven
  2009-06-12 17:45                     ` Lewis Perin
  2009-06-12 17:53                     ` Xah Lee
  0 siblings, 2 replies; 32+ messages in thread
From: B. T. Raven @ 2009-06-12 16:48 UTC (permalink / raw)
  To: help-gnu-emacs

Lewis Perin wrote:
> ken <gebser@mousecar.com> writes:
> 
>> [...]
>> Lewis,
>>
>> Thanks for posting.  It's lonely out there when you're the only one with
>> a particular problem.
> 
> The few, the proud...
> 
>> To make sure we're suffering the same cyber-indignity, here's the
>> scenario as I see it (from an older version of emacs running on
>> Linux):
>>
>> 0) Some others and myself want to include some non-English characters in
>> a file being edited in emacs. Problems arise, however:
>>
>> 1) In a buffer which is already utf-8 encoded, I set the appropriate
>> input method, type in the desired characters. They display just peachy
>> and there is happiness in EmacsLand.
>>
>> 2) I save the buffer to a file, then close the buffer.
>>
>> 3) I visit the same file (i.e., load it again into emacs). Because it
>> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
>> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
>> character in the status bar.
> 
> I haven't been inserting that special first line.
> 
>> 4) The text in the buffer displays fine, except that in place of each of
>> those non-English characters is a little empty box. With the cursor on
>> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
>> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
>> (While, in emacs the character after "Char:" is a little box, if I load
>> this same file into Firefox, that same character appears as it should,
>> as an 'a' with a horizontal bar above it. How it appears in your email
>> client will depend upon your email client.)
> 
> My situation differs in that most of the non-ASCII characters (Chinese
> in my case) come through just fine.  But the ones that don't have
> those irritating boxes in place of the correct glyphs.
> 
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://www.panix.com/~perin/babelcarp.html

I wouldn't be surprised if the gaps and overlaps in the CJK ranges of
glyphs were so complicated that many characters from the following
encodings are not included in utf-8, especially if they are not
precomposed. Try some of these encodings to see if some of the empty
boxes are resolved into characters:

            chinese-big5
            chinese-hz
            chinese-iso-7bit
            chinese-iso-8bit
            chinese-iso-8bit-with-esc
            cn-big5
            cn-gb
            cn-gb-2312
            iso-2022-cjk
            iso-2022-cn
            iso-2022-cn-ext



Also it might help to install a fontset rather than depending on a 
single font to represent all these characters. Unfortunately I can't 
help with that. I am on w32 and I don't even know whether fontsets can 
be used in Emacs on that build.

Ed





^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
  2009-06-12 15:39                 ` Lewis Perin
@ 2009-06-12 17:27                 ` Xah Lee
  2009-06-12 19:30                   ` Lewis Perin
                                     ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Xah Lee @ 2009-06-12 17:27 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
> B) It would be helpful if the code which does the decoding of a file and
> renders it into the buffer display, if that part of it would throw an
> error message when it encounters a character it doesn't know how to
> display, i.e., when a little box character is displayed. After all,
> isn't it an error when a little box is displayed in lieu of the correct
> character? Possible error messages would be something like: "decoding
> process can't find /path/to/charset.file" or "decoding process doesn't
> have requisite permission to read /path/to/charset.file" or "invalid
> character: [hex/decimal value]" or other.

some thought process in the above is not correct.

In general, a program just reads a text file as a byte stream and
uses an encoding scheme to interpret it; the program has little way
to determine if the encoding is correct. Theoretically, it could check
for common phrases, but that is generally not done by the software we
use daily. (some programs do scan the text to guess an encoding, but
the guess is not always correct)

here's some general technical issues and experiences about using
foreign chars:

• the software needs to know what encoding & char set is used in order
to interpret the binary stream. If you don't specifically set it,
typically it assumes ascii or some iso latin char set. (for software in
the USA anyway)

• today's software generally doesn't contain any extra heuristics to
check if the encoding used is actually correct. There is no technical
way to check that in general. It can only be heuristics, i.e. guesses.
e.g. browsers will often guess when reading a page that doesn't have
encoding info.

• even when the encoding is correct, the software needs all the proper
fonts to display it. Or, it relies on some font-replacement technology,
e.g. when it finds a char which the current font doesn't have, it uses
another font for that char. (in the case of Chinese, this often
results in ugly text of mixed char styles, some appearing thin, some
thick, some squarely (like sans-serif), some calligraphic, some
bitmapped) Windows and OS X both have font-replacement technology,
as do all the major browsers for both os x and windows. This font
replacement technology, however, is not perfect. So, sometimes you'll
see squares or question marks here or there, especially for chars
that are not widely used (e.g. math symbols in unicode, double right
arrow, tech symbols such as Apple's command key and option key, triple
asterisk, etc.).

• when writing a file, the software needs to use an encoding to write
it. Just like reading, if you haven't explicitly set it, typically it
uses ascii or some iso latin char set, in most western lang countries.

• when you use a software to open a text but with wrong encoding info,
the result is gibberish.
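
A minimal sketch of the last few points, in Python rather than emacs (an illustration only; the sample strings and the helper name is_valid_utf8 are made up for this example). It shows the same bytes read under the right and the wrong encoding, and that a wrong guess is only sometimes detectable:

```python
# Sketch: the same bytes read under different encodings.
text = "ā ī"                    # the two characters from this thread
raw = text.encode("utf-8")      # the bytes as they would sit on disk

# Correct encoding: round-trips cleanly.
as_utf8 = raw.decode("utf-8")

# Wrong encoding: Latin-1 maps every possible byte to some character,
# so the decoder cannot notice anything is wrong. The result is gibberish.
as_latin1 = raw.decode("latin-1")


def is_valid_utf8(data):
    """Return True if data decodes as UTF-8 without error (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(repr(as_utf8))
print(repr(as_latin1))
# Some mismatches can be caught: the Latin-1 byte for "é" is not valid UTF-8.
print(is_valid_utf8("é".encode("latin-1")))
```

So a decoder can sometimes say "these bytes are not valid in this encoding", but when the wrong encoding accepts every byte sequence (as Latin-1 does), only a heuristic can tell that the result is nonsense.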

the above applies not just to emacs, but to all apps. Some of this
commentary is based on my experiences with browsers, web pages, word
processors, online forums, mailing lists, email apps, instant messaging
chat apps, etc, on both mac and windows.

technically, the issues involved are char set, encoding, and font. (the
concepts of char set and encoding are independent but are often mixed
together in a spec, esp. earlier ones).

i use mixed chinese & english in single file often and in both mac os
x and windows. They work well. On the mac, my emacs is version 22.x.
On win, it is emacs23. My encoding in emacs is set to utf-8.

I've written a lot about these issues; the following docs might be
helpful.

• Emacs and Unicode Tips
  http://xahlee.org/emacs/emacs_n_unicode.html

• Unicode Characters Example
  http://xahlee.org/Periodic_dosage_dir/t1/20040505_unicode.html

• the Journey of a Foreign Character thru Internet
  http://xahlee.org/Periodic_dosage_dir/t2/non-ascii_journey.html

• Converting a File's Encoding with Python
  http://xahlee.org/perl-python/charset_encoding.html

• Character Sets and Encoding in HTML
  http://xahlee.org/js/html_chars.html

• The Complexity And Tedium of Software Engineering (parts about
unicode problem with unison and emacs)
  http://xahlee.org/UnixResource_dir/writ/programer_frustration.html

• Mac and Windows File Conversion (parts about unicode filename
issues)
  http://xahlee.org/mswin/mac_windows_file_conv.html

• Windows Font and Unicode
  http://xahlee.org/mswin/windows_font_unicode.html

the above articles contain tens of links to Wikipedia in appropriate
places. Wikipedia has massive info in digestible form about these
issues; one can spend a month on the above foreign char issues ...

for some examples of mixed chinese & english text i work with, see:

• Chinese Core Simplified Chars
  http://xahlee.org/lojban/simplified_chars.html

• Ethology, Ethnology, and Lyrics
  http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html

  Xah
∑ http://xahlee.org/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 16:48                   ` B. T. Raven
@ 2009-06-12 17:45                     ` Lewis Perin
  2009-06-12 17:53                     ` Xah Lee
  1 sibling, 0 replies; 32+ messages in thread
From: Lewis Perin @ 2009-06-12 17:45 UTC (permalink / raw)
  To: help-gnu-emacs

"B. T. Raven" <nihil@nihilo.net> writes:

> Lewis Perin wrote:
> > ken <gebser@mousecar.com> writes:
> >
> >> [...]
> >> Lewis,
> >>
> >> Thanks for posting.  It's lonely out there when you're the only one with
> >> a particular problem.
> > The few, the proud...
> >
> >> To make sure we're suffering the same cyber-indignity, here's the
> >> scenario as I see it (from an older version of emacs running on
> >> Linux):
> >>
> >> 0) Some others and myself want to include some non-English characters in
> >> a file being edited in emacs. Problems arise, however:
> >>
> >> 1) In a buffer which is already utf-8 encoded, I set the appropriate
> >> input method, type in the desired characters. They display just peachy
> >> and there is happiness in EmacsLand.
> >>
> >> 2) I save the buffer to a file, then close the buffer.
> >>
> >> 3) I visit the same file (i.e., load it again into emacs). Because it
> >> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
> >> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
> >> character in the status bar.
> > I haven't been inserting that special first line.
> >
> >> 4) The text in the buffer displays fine, except that in place of each of
> >> those non-English characters is a little empty box. With the cursor on
> >> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
> >> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
> >> (While, in emacs the character after "Char:" is a little box, if I load
> >> this same file into Firefox, that same character appears as it should,
> >> as an 'a' with a horizontal bar above it. How it appears in your email
> >> client will depend upon your email client.)
> > My situation differs in that most of the non-ASCII characters
> > (Chinese
> > in my case) come through just fine.  But the ones that don't have
> > those irritating boxes in place of the correct glyphs.
> 
> I wouldn't be surprised if the gaps and overlaps in the CJK ranges of
> glyphs weren't so complicated that many characters from the following
> encodings may not be included in utf-8,

Sorry, I'm not sure what you mean by "may not be included in utf-8":
do you mean utf-8 the standard, or do you mean Emacs's implementation
of it?  The characters I'm talking about are definitely in Unicode.

> especially if they are not precomposed.

This I don't really understand, either, I'm afraid.  Might this
explain why I can see the glyph for ni3 when I'm composing Chinese in
Emacs using the chinese-tonepy-punct input method but can't see it
when the saved file is read by Emacs?

> Try some of these encodings to see if some of the empty boxes are
> resolved into characters:
> [...]
>             cn-gb-2312

I created a little file with my bête noire character using that
encoding and saved it.  Reverting the file with that encoding, I did
see all the characters.
 
> Also it might help to install a fontset rather than depending on a
> single font to represent all these characters. Unfortunately I can't
> help with that. I am on w32 and I don't even know whether fontsets can
> be used in Emacs on that build.

Windows R Us, too.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 16:48                   ` B. T. Raven
  2009-06-12 17:45                     ` Lewis Perin
@ 2009-06-12 17:53                     ` Xah Lee
  2009-06-12 20:59                       ` Lennart Borgman
                                         ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Xah Lee @ 2009-06-12 17:53 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
> B) It would be helpful if the code which does the decoding of a file and
> renders it into the buffer display, if that part of it would throw an
> error message when it encounters a character it doesn't know how to
> display, i.e., when a little box character is displayed. After all,
> isn't it an error when a little box is displayed in lieu of the correct
> character? Possible error messages would be something like: "decoding
> process can't find /path/to/charset.file" or "decoding process doesn't
> have requisite permission to read /path/to/charset.file" or "invalid
> character: [hex/decimal value]" or other.

some thought process in the above is not correct.

In general, a program just reads a text file as a byte stream and
uses an encoding scheme to interpret it; the program has little way to
determine if the encoding is correct. Theoretically, it could check
for common phrases, but that is generally not done by the software we
use daily. (some programs do scan the text to guess an encoding, but
the guess is not always correct)

here's some general technical issues and experiences about using
foreign chars:

• the software needs to know what encoding & char set is used in order
to interpret the binary stream. If you don't specifically set it,
typically it assumes ascii or some iso latin char set. (for software in
the USA anyway)

• today's software generally doesn't contain any extra heuristics to
check if the encoding used is actually correct. There is no technical
way to check that in general. It can only be heuristics, i.e. guesses.
e.g. browsers will often guess when reading a page that doesn't have
encoding info.

• even when the encoding is correct, the software needs all the proper
fonts to display it. Or, it relies on some font-replacement technology,
e.g. when it finds a char which the current font doesn't have, it uses
another font for that char. (in the case of Chinese, this often
results in ugly text of mixed char styles, some appearing thin, some
thick, some squarely (like sans-serif), some calligraphic, some
bitmapped) Windows and OS X both have font-replacement technology, as
do all the major browsers for both os x and windows. This font
replacement technology, however, is not perfect. So, sometimes you'll
see squares or question marks here or there, especially for chars
that are not widely used (e.g. math symbols in unicode, double right
arrow, tech symbols such as Apple's command key and option key, triple
asterisk, etc.).

• when writing a file, the software needs to use an encoding to write
it. Just like reading, if you haven't explicitly set it, typically it
uses ascii or some iso latin char set, in most western lang countries.

• when you use a software to open a text but with wrong encoding info,
the result is gibberish.

the above applies not just to emacs, but to all apps. Some of this
commentary is based on my experiences with browsers, web pages, word
processors, online forums, mailing lists, email apps, instant messaging
chat apps, etc, on both mac and windows.

technically, the issues involved are char set, encoding, and font. (the
concepts of char set and encoding are independent but are often mixed
together in a spec, esp. earlier ones).

i use mixed chinese & english in single file often and in both mac os
x and windows. They work well. On the mac, my emacs is version 22.x.
On win, it is emacs23. My encoding in emacs is set to utf-8.

I've written a lot about these issues; the following docs might be
helpful.

• Emacs and Unicode Tips
  http://xahlee.org/emacs/emacs_n_unicode.html

• Unicode Characters Example
  http://xahlee.org/Periodic_dosage_dir/t1/20040505_unicode.html

• the Journey of a Foreign Character thru Internet
  http://xahlee.org/Periodic_dosage_dir/t2/non-ascii_journey.html

• Converting a File's Encoding with Python
  http://xahlee.org/perl-python/charset_encoding.html

• Character Sets and Encoding in HTML
  http://xahlee.org/js/html_chars.html

• The Complexity And Tedium of Software Engineering (parts about
unicode problem with unison and emacs)
  http://xahlee.org/UnixResource_dir/writ/programer_frustration.html

• Mac and Windows File Conversion (parts about unicode filename
issues)
  http://xahlee.org/mswin/mac_windows_file_conv.html

• Windows Font and Unicode
  http://xahlee.org/mswin/windows_font_unicode.html

the above articles contain tens of links to Wikipedia in appropriate
places. Wikipedia has massive info in digestible form about these
issues; one can spend a month on the above foreign char issues ...

for some examples of mixed chinese & english text i work with, see:

• Chinese Core Simplified Chars
  http://xahlee.org/lojban/simplified_chars.html

• Ethology, Ethnology, and Lyrics
  http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html

  Xah
∑ http://xahlee.org/

☄


On Jun 12, 9:48 am, "B. T. Raven" <ni...@nihilo.net> wrote:

> I wouldn't be surprised if the gaps and overlaps in the CJK ranges of
> glyphs weren't so complicated that many characters from the following
> encodings may not be included in utf-8, especially if they are not
> precomposed. Try some of these encodings to see if some of the empty
> boxes are resolved into characters:
>
>             chinese-big5
>             chinese-hz
>             chinese-iso-7bit
>             chinese-iso-8bit
>             chinese-iso-8bit-with-esc
>             cn-big5
>             cn-gb
>             cn-gb-2312
>             iso-2022-cjk
>             iso-2022-cn
>             iso-2022-cn-ext

the char sets of most chinese encodings are subsets of, or identical
to, unicode's char set.

In particular, the current, most widely used chinese charset, GB
18030, actually is just another encoding of unicode's char set.

see http://en.wikipedia.org/wiki/GB_18030

Note also, that means china's GB 18030 contains the entirety of the
traditional chars in unicode too. (though, i don't know how big5
relates to unicode )
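
One quick way to see how these encodings relate to unicode is to round-trip a character through several of them. A minimal sketch in Python, not emacs (the codec names are Python's, which differ from emacs's coding-system names; the helper names roundtrip and can_encode and the sample characters are picked for illustration only):

```python
# Sketch: one character, several Chinese encodings, one unicode code point.
CODECS = ("gb2312", "gb18030", "big5", "utf-8")


def roundtrip(ch, codec):
    """Encode ch with codec, decode it back; return (hex bytes, code point)."""
    raw = ch.encode(codec)
    back = raw.decode(codec)
    return raw.hex(), ord(back)


def can_encode(ch, codec):
    """Return True if codec has a byte sequence for ch (hypothetical helper)."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in CODECS:
    hex_bytes, cp = roundtrip("你", codec)   # ni3
    print(f"{codec:8} {hex_bytes:8} -> U+{cp:04X}")

# GB 18030 can encode any unicode code point; the older GB 2312 cannot.
print(can_encode("𝄞", "gb18030"), can_encode("𝄞", "gb2312"))
```

The bytes differ per encoding, but each one decodes to the same code point, which suggests that a box in place of ni3 is more likely a font or decoding problem than a character missing from unicode.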

is the list you gave above from emacs? emacs's list always seems
strange to me... haven't really looked into it. maybe emacs's list
really encompasses every encoding that has ever existed, but it also
could be just screwed up like many open source things. For example, it
invents its own names by mixing up char set encoding with the concept
of EOL convention.

btw, who actually coded the low-level char encoding support in emacs?
e.g. especially unicode, since it came after the time when richard
stallman was still doing the bulk of emacs. That person should be
admired. lol.

  Xah
∑ http://xahlee.org/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 17:27                 ` Xah Lee
@ 2009-06-12 19:30                   ` Lewis Perin
  2009-06-12 19:43                     ` Xah Lee
  2009-06-12 20:56                   ` B. T. Raven
  2009-06-13 20:35                   ` Lewis Perin
  2 siblings, 1 reply; 32+ messages in thread
From: Lewis Perin @ 2009-06-12 19:30 UTC (permalink / raw)
  To: help-gnu-emacs

Xah Lee <xahlee@gmail.com> writes:

> [...]
> i use mixed chinese & english in single file often and in both mac os
> x and windows. They work well. On the mac, my emacs is version 22.x.
> On win, it is emacs23. My encoding in emacs is set to utf-8.
> 
> I've wrote a lot about these issues, the following docs might be
> helpful.
> [...]

I'll assume you have no trouble with ni3, the normal second person
pronoun, and have a look at your collected works.  Thanks!

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 19:30                   ` Lewis Perin
@ 2009-06-12 19:43                     ` Xah Lee
  0 siblings, 0 replies; 32+ messages in thread
From: Xah Lee @ 2009-06-12 19:43 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 12, 12:30 pm, Lewis Perin <pe...@panix.com> wrote:
> I'll assume you have no trouble with ni3, the normal second person
> pronoun, and have a look at your collected works.  Thanks!

yeah, no prob with ni3 hao3 你好. This is written in emacs then pasted
to google groups.

 Xah


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 17:27                 ` Xah Lee
  2009-06-12 19:30                   ` Lewis Perin
@ 2009-06-12 20:56                   ` B. T. Raven
  2009-06-13 16:16                     ` Xah Lee
  2009-06-13 20:35                   ` Lewis Perin
  2 siblings, 1 reply; 32+ messages in thread
From: B. T. Raven @ 2009-06-12 20:56 UTC (permalink / raw)
  To: help-gnu-emacs

Xah Lee wrote:
> On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
>> B) It would be helpful if the code which does the decoding of a file and
>> renders it into the buffer display, if that part of it would throw an
>> error message when it encounters a character it doesn't know how to
>> display, i.e., when a little box character is displayed. After all,
>> isn't it an error when a little box is displayed in lieu of the correct
>> character? Possible error messages would be something like: "decoding
>> process can't find /path/to/charset.file" or "decoding process doesn't
>> have requisite permission to read /path/to/charset.file" or "invalid
>> character: [hex/decimal value]" or other.
> 
> some thought process in the above is not correct.
> 
> In general, a program just read a text file as a byte stream, and
> using a encoding scheme to interprete it, the program has little way
> to determine if the encoding is correct. Theoretically, it could check
> with command phrases but that is generally not done by the software we
> use daily. (some program does scan text guess a encoding, but not
> always correct)
> 
> here's some general technical issues and experiences about using
> foreign chars:
> 
> • the software needs to know what encoding & char set is used in order
> to interprete the binary stream. If you don't specifically set it,
> typically it assumes ascii or some iso latin char set. (of software in
> USA anyway)
> 
> • today's software generally don't contain any extra heuistics to
> check if the encoding used is actually correct. There is no technical
> way to check that in general. It can be only heuristics, i.e. guesses.
> e.g. browsers will often guess when reading a page that doesn't have
> encoding info.
> 
> • even when the encoding is correct, the software needs all the proper
> fonts to display it. Or, rely on some font-replacement technology,
> e.g. when it finds a char which the current font doesn't have, it uses
> another font for that char. (in the case of Chinese, this often
> results in ugly text of mixed char style, some appear thin, some
> thick, some squarly (like sans-serif), some caligraphic, some
> bitmapped) Windows OS and OS X both has font-replacement technology,
> as well as all the major browsers for both os x and windows. This font
> replacement technology, however, is not perfect. So, sometimes you'll
> see squares or question marks here or there, especially on some chars
> that's not widely used (e.g. math symbols in unicode, double right
> arrow, tech symbols such as Apple's command key and option key, triple
> asterisk, etc.).
> 
> • when writing a file, the software needs to use a encoding to write
> it. Just like reading, if you havn't explicitly set it, typically it
> uses ascii or some iso latin char set, in most western lang countries.
> 
> • when you use a software to open a text but with wrong encoding info,
> the result is gibberish.
> 
> the above applies not just to emacs, but applies to all apps. Some
> commentary are based on my experiences with browsers, web pages, word
> processors, online forums, mailing list, email apps, instant messaging
> chat apps, etc, on both mac and windows.
> 
> technically, the issues involved is char set, encoding, font. ( the
> concept of char set and encoding are independent but is often mixed
> together in a spec, esp earlier ones).
> 
> i use mixed chinese & english in single file often and in both mac os
> x and windows. They work well. On the mac, my emacs is version 22.x.
> On win, it is emacs23. My encoding in emacs is set to utf-8.
> 
> I've wrote a lot about these issues, the following docs might be
> helpful.
> 
> • Emacs and Unicode Tips
>   http://xahlee.org/emacs/emacs_n_unicode.html
> 
> • Unicode Characters Example
>   http://xahlee.org/Periodic_dosage_dir/t1/20040505_unicode.html
> 
> • the Journey of a Foreign Character thru Internet
>   http://xahlee.org/Periodic_dosage_dir/t2/non-ascii_journey.html
> 
> • Converting a File's Encoding with Python
>   http://xahlee.org/perl-python/charset_encoding.html
> 
> • Character Sets and Encoding in HTML
>   http://xahlee.org/js/html_chars.html
> 
> • The Complexity And Tedium of Software Engineering (parts about
> unicode problem with unison and emacs)
>   http://xahlee.org/UnixResource_dir/writ/programer_frustration.html
> 
> • Mac and Windows File Conversion (parts about unicode filename
> issues)
>   http://xahlee.org/mswin/mac_windows_file_conv.html
> 
> • Windows Font and Unicode
>   http://xahlee.org/mswin/windows_font_unicode.html
> 
> the above article contain tens of links to Wikipedia in appropriate
> places. Wikipedia has massive info in digestable form about these
> issues, one can spend a month on the above foreign char issues ...
> 
> for some examples of mixed chinese & english text i work with, see:
> 
> • Chinese Core Simplified Chars
>   http://xahlee.org/lojban/simplified_chars.html
> 
> • Ethology, Ethnology, and Lyrics
>   http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html
> 
>   Xah
> ∑ http://xahlee.org/
> 
> ☄

Totally OT but prima facie the most interesting title is the last.
Unfortunately I couldn't grok what ethology (the "anthropology" of
animals) had to do with it unless the critters that emit "The Masochistic
Cries of Lovelorn Females" are to be considered as less than human. I
notice that Salt-n-Pepa's sweet little ditty (Don't want no S.D.M.) is 
missing from the list, but maybe that's more sadistic than masochistic; 
maybe it belongs in the Quagmire. ;-) Sexology is a bona fide area of 
inquiry pioneered by Kinsey et al. but sexualogy is not an English word 
nor (I keep my fingers crossed) will it ever become one.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 17:53                     ` Xah Lee
@ 2009-06-12 20:59                       ` Lennart Borgman
  2009-06-12 22:23                       ` ken
       [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 32+ messages in thread
From: Lennart Borgman @ 2009-06-12 20:59 UTC (permalink / raw)
  To: Xah Lee; +Cc: help-gnu-emacs

On Fri, Jun 12, 2009 at 7:53 PM, Xah Lee<xahlee@gmail.com> wrote:
> the list you gave above is from emacs? emacs's list always seems
> strange to me... haven't really looked into it. maybe emacs's list is
> really encompassing of all encoding that've existed, but it also could
> be just screwed up like many open source things.

I do not know these things, but from the discussions on Emacs devel it
looks like those coding it in Emacs know it very well.


> For example, it
> invents its own names by mixing up char set encoding with concepts of
> EOL convention.

It is a technical consideration. Hopefully it does not confuse anyone.


> btw, who actually coded the low down levels of char encoding in emacs?
> e.g. especially unicode, since it came after richard stallman still
> doing the bulk of emacs. That person should be admirable. lol.

Please look in the change log files.  (I think you need to check out
the sources to see those. Or look in the web interface for ditto, of
course.)




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 17:53                     ` Xah Lee
  2009-06-12 20:59                       ` Lennart Borgman
@ 2009-06-12 22:23                       ` ken
       [not found]                         ` <e01d8a50906121527k5e77f5abj8c2c44f62f85e537@mail.gmail.com>
       [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 32+ messages in thread
From: ken @ 2009-06-12 22:23 UTC (permalink / raw)
  To: GNU Emacs List

On 06/12/2009 01:53 PM Xah Lee wrote:
> On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
>> B) It would be helpful if the code which does the decoding of a file and
>> renders it into the buffer display, if that part of it would throw an
>> error message when it encounters a character it doesn't know how to
>> display, i.e., when a little box character is displayed. After all,
>> isn't it an error when a little box is displayed in lieu of the correct
>> character? Possible error messages would be something like: "decoding
>> process can't find /path/to/charset.file" or "decoding process doesn't
>> have requisite permission to read /path/to/charset.file" or "invalid
>> character: [hex/decimal value]" or other.
> 
> some thought process in the above is not correct.

Yet emacs puts a little box in the place of a character it cannot find
(or, per your explanation, is possibly confused about).  The fact remains
that the little box is not a correct rendering of the code.  It is an
error... at least it is for me, because that's not what I typed in.  So
it is an error.  As an error, there should be a corresponding error
message, hopefully one (or more) which would help diagnose the problem.
 It seems obvious that, given the long thread on this issue with no
resolution, we could use some help-- like an error message-- which would
help in diagnosis.

Thanks for the information and the links though.


> 
> ....





^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: utf8 char display in buffer
       [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
@ 2009-06-13  0:35                         ` Xah Lee
  0 siblings, 0 replies; 32+ messages in thread
From: Xah Lee @ 2009-06-13  0:35 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 12, 3:23 pm, ken <geb...@mousecar.com> wrote:
> On 06/12/2009 01:53 PM Xah Lee wrote:
>
> > On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
> >> B) It would be helpful if the code which does the decoding of a file and
> >> renders it into the buffer display, if that part of it would throw an
> >> error message when it encounters a character it doesn't know how to
> >> display, i.e., when a little box character is displayed. After all,
> >> isn't it an error when a little box is displayed in lieu of the correct
> >> character? Possible error messages would be something like: "decoding
> >> process can't find /path/to/charset.file" or "decoding process doesn't
> >> have requisite permission to read /path/to/charset.file" or "invalid
> >> character: [hex/decimal value]" or other.
>
> > some thought process in the above is not correct.
>
> Yet emacs puts a little box in the place of a character it cannot find
> (or, per your explanation) possibly confused about.  The fact remains
> that the little box is not a correct rendering of the code.  It is an
> error... at least it is for me, because that's not what I typed in.  So
> it is an error.  As an error, there should be a corresponding error
> message, hopefully one (or more) which would help diagnose the problem.
>  It seems obvious that, given the long thread on this issue with no
> resolution, we could use some help-- like an error message-- which would
> help in diagnosis.
>
> Thanks for the information and the links though.

i think displaying an error for each char that emacs cannot find a font
for is just not feasible. The app can't know whether it used the right
encoding. And even if the encoding used is correct, it can't do much
about fonts that are missing some of the characters in the char set.

i don't have experience in this, but imagine an app gets a byte
stream together with a given charset/encoding. With that, it can work
out how many bytes each character takes and map them to code points in
the char set. (e.g. utf-8 and utf-16 both have no fixed byte length per
char.) After that's done, you get a sequence of code points (i.e. a
sequence of integers). At this point, given an integer, you need to map
this integer to a character in a font. There are many issues here... a
font i guess is a set of glyphs... ultimately a set of integers. I'm
not sure what sort of spec or standard specifies what each integer
means (i.e. suppose your app now has an integer that represents B. Now
suppose your app is set to use font Arial. Now, Arial is a set of
integers, but what standard says which integer is B?)... Part of this
step is what happens when Arial doesn't have that character. (i'm
guessing a font also has data about what character set it contains...)
But in any case, finally we'll have a B from font Arial. Then it goes
thru the whole display process...
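
A minimal sketch of the two steps described above, written in Python
rather than in anything Emacs actually runs. The names TOY_FONT,
decode_to_code_points and pick_glyphs are made up for illustration, and
the "font" is just a lookup table; real fonts and display engines are
far more involved.

```python
import unicodedata

# Hypothetical font coverage: this toy "font" has glyphs for 'a' and 'B'
# but none for U+0101 (a with macron).
TOY_FONT = {0x61: "glyph-a", 0x42: "glyph-B"}


def decode_to_code_points(raw: bytes, encoding: str) -> list:
    """Step 1: turn a byte stream into a sequence of integers (code points)."""
    return [ord(ch) for ch in raw.decode(encoding)]


def pick_glyphs(code_points, font):
    """Step 2: look each code point up in the font; fall back to a box."""
    return [font.get(cp, "□") for cp in code_points]


# 'a', then the two-byte UTF-8 form of U+0101, then 'B'
raw = b"a\xc4\x81B"
cps = decode_to_code_points(raw, "utf-8")
for cp in cps:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
print(pick_glyphs(cps, TOY_FONT))
```

Decoding succeeds for all three characters; the box only appears in the
second step, where the toy font has no entry for U+0101.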

overall i think the technology we have today that actually displays
fonts and unicode text etc is extremely complex, not to mention
vector based fonts, anti-aliasing, font substitution and similar techs.

some interesting read here:

http://en.wikipedia.org/wiki/Computer_font
http://en.wikipedia.org/wiki/Anti-aliasing
http://en.wikipedia.org/wiki/Font_rasterization
http://en.wikipedia.org/wiki/Subpixel_rendering
http://en.wikipedia.org/wiki/Font-substitution

most modern apps, like browsers, i think all call the OS's APIs
to handle it. Some glimpses of the emacs dev list seem to suggest that
emacs implements its own display system... on one hand it's bad
because emacs misses out on all the modern tech developed over 2 decades
by Apple or Adobe or Microsoft, or by some Open Source work; on the
other hand it is admirable in that it does it on its own...

sorry, am rambling a bit. You are right that the bottom line is that
some things are just rendered as squares and that is a problem. Though,
my point was that it is unfeasible to issue an error for missing fonts
or misinterpretation of the encoding. Part of this is because
theoretically there's no way to know that the encoding chosen is
correct. Part is because in practice a missing font or a badly
chosen encoding is very common. If we all stick with ascii, everything
is pretty good. If we stick to western langs, things are still not too
bad. But once you have chinese, japanese, korean characters, or the
occasional use of the many math symbols and greek letters, or add
cyrillic/russian or arabic alphabets ... the chances of
a missing font or missing encoding info are very high.
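
To illustrate why a wrong encoding guess usually cannot be reported as
an error, here is a small Python sketch (not Emacs code; the helper
name decode_all is invented for this example). The same two bytes
decode without complaint under three different encodings, and only one
of the results is what the writer meant.

```python
def decode_all(raw: bytes, encodings):
    """Try each encoding; record the decoded text, or None on failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# The two bytes that UTF-8 uses for U+0101 (a with macron).
raw = b"\xc4\x81"
for enc, text in decode_all(raw, ["utf-8", "latin-1", "iso8859_4"]).items():
    if text is None:
        print(f"{enc}: decode error")
    else:
        print(f"{enc}: {len(text)} char(s): {[hex(ord(c)) for c in text]}")
```

None of the three attempts raises an error; the single-byte encodings
simply yield two characters instead of one, so the decoder has nothing
to complain about.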

i think a large part of the problem is that char set and encoding info
is not part of the file. Things have been getting better in the past
decade with mime types and the unicode standard. But given a byte
stream, even after being lucky enough to know it is text, there's still
little way to know how to interpret it. The char set and encoding meta
data often gets lost, implementations are often not robust, fonts for
multi-lang use are usually not there, and font-substitution tech has
just started.
(according to Wikipedia, IE before 7 does not even have font
substitution (which means you really need such a beast as a “unicode
font”, namely a font that contains some tens or hundreds of thousands
of glyphs))

i think all these issues only started to get addressed in the past
decade, with globalization partly due to the internet. Before, English
speakers just stuck with ascii and that was pretty sufficient. Each
western lang region stuck with its particular encoding for a few
special chars in its alphabet. Only when things started to mix did they
get more complex, and now with Chinese & japanese etc. With unicode,
the use of math symbols also becomes more common. Before that, it was
just ascii markup...

speaking of this. Emacs and FSF docs still stick with the 1980s `quote
hack', and arrows like this ->  => ... very extremely stupid. Of
course i filed polite bug reports, and have argued here too, heatedly,
but it basically fell on deaf ears. Some things are just impossible to
progress in the FSF world.

  Xah
∑ http://xahlee.org/

* Re: utf8 char display in buffer
  2009-06-12 14:54               ` ken
@ 2009-06-13  3:30                 ` Eli Zaretskii
  0 siblings, 0 replies; 32+ messages in thread
From: Eli Zaretskii @ 2009-06-13  3:30 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Fri, 12 Jun 2009 10:54:23 -0400
> From: ken <gebser@mousecar.com>
> 1) In a buffer which is already utf-8 encoded, I set the appropriate
> input method, type in the desired characters. They display just peachy
> and there is happiness in EmacsLand.
> 
> 2) I save the buffer to a file, then close the buffer.
> 
> 3) I visit the same file (i.e., load it again into emacs). Because it
> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
> character in the status bar.
> 
> 4) The text in the buffer displays fine, except that in place of each of
> those non-English characters is a little empty box. With the cursor on
> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".

Please post here the full output of "C-u C-x =" (from a buffer popped
up by Emacs) for these characters, both when you type them using the
appropriate input method and they are displayed correctly (as in 1)
above), and when you see them as empty boxes after revisiting the
file.  The differences between these two cases should give you a hint
what is wrong; if not, someone else here might have ideas.





* Re: utf8 char display in buffer
       [not found]                             ` <E1MFKab-0000GU-Dg@fencepost.gnu.org>
@ 2009-06-13 12:30                               ` ken
  0 siblings, 0 replies; 32+ messages in thread
From: ken @ 2009-06-13 12:30 UTC (permalink / raw)
  To: Eli Zaretskii, GNU Emacs List, emacs-devel

On 06/13/2009 12:11 AM Eli Zaretskii wrote:
>> ....
> 
> Please provide the output of "C-u C-x =" on these characters, both
> when they are displayed correctly and when they are displayed as empty
> boxes.


In a similar post on the same thread Eli Zaretskii wrote:
> Please post here the full output of "C-u C-x =" (from a buffer popped
> up by Emacs) for these characters, both when you type them using the
> appropriate input method and they are displayed correctly (as in 1)
> above), and when you see them as empty boxes after revisiting the
> file.  The differences between these two cases should give you a hint
> what is wrong; if not, someone else here might have ideas.

Eli, thanks for your response.  Here it is:

^[$-1 ¡ is 'a' with a horizontal bar over it.  On first inputting it
(after doing "set-input-method latin-4-postfix" and before changing the
input method to anything else), it appears correctly and "C-u C-x =" yields:

=============================================

  character: ^[$-1 ¡ (05140, 2656, 0xa60)
    charset: latin-iso8859-4
	     (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
 code point: 96
     syntax: word
   category: l:Latin
buffer code: 0x84 0xE0
  file code: 0xC4 0x81 (encoded by coding system mule-utf-8-unix)
       font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-4

=============================================

When I reload the file (revisit the file), the same character is
replaced with a little box.  Doing "C-u C-x =" here yields:

=============================================

  character: ^[$-1 ¡ (01210041, 331809, 0x51021)
    charset: mule-unicode-0100-24ff
	     (Unicode characters of the range U+0100..U+24FF.)
 code point: 32 33
     syntax: word
   category: l:Latin
buffer code: 0x9C 0xF4 0xA0 0xA1
  file code: 0xC4 0x81 (encoded by coding system mule-utf-8-unix)
       font: -- none --

=============================================
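
The numbers in the two reports above can be cross-checked with a short
Python sketch. The ISO 8859-4 and UTF-8 parts use Python's standard
codecs. The mule-unicode-0100-24ff arithmetic (a 96x96 grid starting at
U+0100, with both coordinates offset by 32) is an assumption about how
Emacs 22 laid out that charset, and the function names are invented for
this example.

```python
A_MACRON = "\u0101"  # LATIN SMALL LETTER A WITH MACRON


def latin4_right_half(ch: str) -> int:
    """Position of ch within the right-hand (high) half of ISO 8859-4."""
    (byte,) = ch.encode("iso8859_4")
    return byte - 0x80


def mule_unicode_0100_24ff(cp: int) -> tuple:
    """Assumed Emacs 22 layout: 96x96 grid from U+0100, offset by 32."""
    if not 0x100 <= cp <= 0x24FF:
        raise ValueError("code point outside U+0100..U+24FF")
    offset = cp - 0x100
    return (offset // 96 + 32, offset % 96 + 32)


print("file code:", A_MACRON.encode("utf-8").hex())            # c481
print("latin-iso8859-4 code point:", latin4_right_half(A_MACRON))
print("mule-unicode code point:", mule_unicode_0100_24ff(ord(A_MACRON)))
```

Under these assumptions both reports describe the same character: the
file code is 0xC4 0x81 in each case, and "96" and "32 33" are two
internal addresses for U+0101. What differs is the charset Emacs
assigned on reload, and no font was found for that charset.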

Note: For some reason, possibly related, had difficulty copying the
above text from emacs into clipboard (i.e., "M-w" didn't do anything),
so had to use a workaround.  It seems that this workaround altered the
character in question, the one above following each of the two instances
of "character:".

As for the meaning of the two outputs above, all that I can confidently
glean is that, if I want to use non-English characters in emacs, I have
to be an expert emacs developer.  :)






* Re: utf8 char display in buffer
  2009-06-12 20:56                   ` B. T. Raven
@ 2009-06-13 16:16                     ` Xah Lee
  0 siblings, 0 replies; 32+ messages in thread
From: Xah Lee @ 2009-06-13 16:16 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 12, 1:56 pm, "B. T. Raven" <ni...@nihilo.net> wrote:

> > • Ethology, Ethnology, and Lyrics
> >  http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html

> Totally OT but prima facie the most interesting title is the last.
> Unfortunately I couldn't grok what ethology (the "anthropology" of
> animals) had to do with it unless the critters that emit "The Masochistic
> Cries of Lovelorn Females" are to be considered as less than human. I
> notice that Salt-n-Pepa's sweet little ditty (Don't want no S.D.M.) is
> missing from the list, but maybe that's more sadistic than masochistic;
> maybe it belongs in the Quagmire. ;-) Sexology is a bona fide area of
> inquiry pioneered by Kinsey et al. but sexualogy is not an English word
> nor (I keep my fingers crossed) will it ever become one.

sexualogy = sexology + sexuality.

^_^

ok, now it reads “... respect to ethology and sexuality.”. Simpler and
more fitting.

  Xah
∑ http://xahlee.org/

* Re: utf8 char display in buffer
  2009-06-12 17:27                 ` Xah Lee
  2009-06-12 19:30                   ` Lewis Perin
  2009-06-12 20:56                   ` B. T. Raven
@ 2009-06-13 20:35                   ` Lewis Perin
  2009-06-14 11:47                     ` ken
  2 siblings, 1 reply; 32+ messages in thread
From: Lewis Perin @ 2009-06-13 20:35 UTC (permalink / raw)
  To: help-gnu-emacs

Xah Lee <xahlee@gmail.com> writes:

> [...]
> i use mixed chinese & english in single file often and in both mac os
> x and windows. They work well. On the mac, my emacs is version 22.x.
> On win, it is emacs23. My encoding in emacs is set to utf-8.

Thanks for mentioning v. 23.  I just downloaded it, despite my
misgivings about life on the bleeding edge, and my problem with some
Chinese UTF-8 characters' glyphs turning to boxes when the file is
reverted seems to have vanished.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
  2009-06-13 20:35                   ` Lewis Perin
@ 2009-06-14 11:47                     ` ken
  2009-06-15  7:28                       ` Bernardo
  0 siblings, 1 reply; 32+ messages in thread
From: ken @ 2009-06-14 11:47 UTC (permalink / raw)
  To: Lewis Perin; +Cc: help-gnu-emacs

On 06/13/2009 04:35 PM Lewis Perin wrote:
> Xah Lee <xahlee@gmail.com> writes:
> 
>> [...]
>> i use mixed chinese & english in single file often and in both mac os
>> x and windows. They work well. On the mac, my emacs is version 22.x.
>> On win, it is emacs23. My encoding in emacs is set to utf-8.
> 
> Thanks for mentioning v. 23.  I just downloaded it, despite my
> misgivings about life on the bleeding edge, and my problem with some
> Chinese UTF-8 characters' glyphs turning to boxes when the file is
> reverted seems to have vanished.
> 
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://www.panix.com/~perin/babelcarp.html

Lew (or anyone),

Where did you find v.23?  The only place I'm seeing is cvs,
<http://cvs.savannah.gnu.org/viewvc/emacs/?root=emacs>.






* Re: utf8 char display in buffer
  2009-06-14 11:47                     ` ken
@ 2009-06-15  7:28                       ` Bernardo
  0 siblings, 0 replies; 32+ messages in thread
From: Bernardo @ 2009-06-15  7:28 UTC (permalink / raw)
  To: help-gnu-emacs

http://alpha.gnu.org/gnu/emacs/pretest/

ken said the following on 14/06/09 21:47:
> 
> Lew (or anyone),
> 
> Where did you find v.23?  The only place I'm seeing is cvs,
> <http://cvs.savannah.gnu.org/viewvc/emacs/?root=emacs>.
> 
> 
> 
> 





end of thread, other threads:[~2009-06-15  7:28 UTC | newest]

Thread overview: 32+ messages
     [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
2009-06-08 19:10 ` utf8 char display in buffer Teemu Likonen
2009-06-08 19:52 ` Xah Lee
2009-06-09 10:52   ` ken
2009-06-08 20:43 ` B. T. Raven
2009-06-08 20:49   ` B. T. Raven
2009-06-08 22:49     ` ken
2009-06-09 10:24   ` ken
     [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
2009-06-09 13:03     ` B. T. Raven
2009-06-09 14:51       ` ken
     [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
2009-06-10  1:34         ` B. T. Raven
2009-06-10 14:03           ` Lewis Perin
2009-06-11  3:21             ` B. T. Raven
2009-06-12 14:54               ` ken
2009-06-13  3:30                 ` Eli Zaretskii
     [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
2009-06-12 15:39                 ` Lewis Perin
2009-06-12 16:48                   ` B. T. Raven
2009-06-12 17:45                     ` Lewis Perin
2009-06-12 17:53                     ` Xah Lee
2009-06-12 20:59                       ` Lennart Borgman
2009-06-12 22:23                       ` ken
     [not found]                         ` <e01d8a50906121527k5e77f5abj8c2c44f62f85e537@mail.gmail.com>
     [not found]                           ` <4A32E6F6.5080501@mousecar.com>
     [not found]                             ` <E1MFKab-0000GU-Dg@fencepost.gnu.org>
2009-06-13 12:30                               ` ken
     [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
2009-06-13  0:35                         ` Xah Lee
2009-06-12 17:27                 ` Xah Lee
2009-06-12 19:30                   ` Lewis Perin
2009-06-12 19:43                     ` Xah Lee
2009-06-12 20:56                   ` B. T. Raven
2009-06-13 16:16                     ` Xah Lee
2009-06-13 20:35                   ` Lewis Perin
2009-06-14 11:47                     ` ken
2009-06-15  7:28                       ` Bernardo
2009-06-11 12:03 ` Teemu Likonen
2009-06-08 18:33 ken
