utf8 char display in buffer


* utf8 char display in buffer
@ 2009-06-08 18:33 ken
  0 siblings, 0 replies; 56+ messages in thread
From: ken @ 2009-06-08 18:33 UTC (permalink / raw
  To: GNU Emacs List

Hey, group,

I already use a few utf8 characters in emacs (and in web pages), but
recently needed to use a couple more.  One is an 'a' with a horizontal
line above it, the other an 'i' with a vertical line above it.  How do I
input these into a buffer?

tia,
ken

-- 
"To make an apple pie from scratch,
first create the universe."
        -- Carl Sagan


* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
@ 2009-06-08 19:10 ` Teemu Likonen
  2009-06-08 19:52 ` Xah Lee
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 56+ messages in thread
From: Teemu Likonen @ 2009-06-08 19:10 UTC (permalink / raw
  To: gebser; +Cc: GNU Emacs List

On 2009-06-08 14:33 (-0400), ken wrote:

> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more. One is an 'a' with a horizontal
> line above it, the other an 'i' with a vertical line above it. How do
> I input these into a buffer?

Some keyboards (Finnish, for example) can produce those characters
(semi-)directly, but through Emacs's input methods it's possible with
just basic ASCII keys. For example, turn on "TeX" input method (C-x RET
C-\ TeX RET) and type \=a for "ā" and \=i for "ī". You can also use
"ucs" input method and type Unicode code points directly: type u0101 for
"ā" and u012b for "ī".

There are probably some language-specific input methods too which may
have even easier ways for inputting these characters.
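
The same choice can be made from an init file. A minimal sketch, assuming
the input method is named "TeX" as above and that you want it as your
default (both lines are illustrations, not required settings):

```elisp
;; Make C-\ toggle the TeX input method without prompting for a name.
(setq default-input-method "TeX")

;; Or switch it on for the current buffer right away.
(set-input-method "TeX")
```

With the default set, C-\ turns the input method on and off, so \=a and
\=i are only translated while it is active.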


* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
  2009-06-08 19:10 ` utf8 char display in buffer Teemu Likonen
@ 2009-06-08 19:52 ` Xah Lee
  2009-06-09 10:52   ` ken
  2009-06-08 20:43 ` B. T. Raven
  2009-06-11 12:03 ` Teemu Likonen
  3 siblings, 1 reply; 56+ messages in thread
From: Xah Lee @ 2009-06-08 19:52 UTC (permalink / raw
  To: help-gnu-emacs

On Jun 8, 11:33 am, ken <geb...@mousecar.com> wrote:
> Hey, group,
>
> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more.  One is an 'a' with a horizontal
> line above it, the other an 'i' with a vertical line above it.  How do I
> input these into a buffer?

i define keys to insert unicode chars that i frequently use. e.g.

(global-set-key (kbd "<kp-6>") "→")
(global-set-key (kbd "M-i a") "α")
(global-set-key (kbd "M-i b") "β")
(global-set-key (kbd "M-i t") "θ")

you can also insert unicode by its hex value. Alt+x ucs-insert.
There are a few other ways...
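
One of those other ways is from Lisp. A small sketch, assuming the code
points U+0101 and U+012B for the two macron letters asked about; evaluate
each form with M-: or in the *scratch* buffer:

```elisp
;; U+0101 is LATIN SMALL LETTER A WITH MACRON.
(insert (decode-char 'ucs #x0101))

;; U+012B is LATIN SMALL LETTER I WITH MACRON.
(insert (decode-char 'ucs #x012B))
```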

some more tips here

• Emacs and Unicode Tips
  http://xahlee.org/emacs/emacs_n_unicode.html

  Xah
∑ http://xahlee.org/

☄



* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
  2009-06-08 19:10 ` utf8 char display in buffer Teemu Likonen
  2009-06-08 19:52 ` Xah Lee
@ 2009-06-08 20:43 ` B. T. Raven
  2009-06-08 20:49   ` B. T. Raven
                     ` (2 more replies)
  2009-06-11 12:03 ` Teemu Likonen
  3 siblings, 3 replies; 56+ messages in thread
From: B. T. Raven @ 2009-06-08 20:43 UTC (permalink / raw
  To: help-gnu-emacs

ken wrote:
> Hey, group,
> 
> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more.  One is an 'a' with a horizontal
> line above it, the other an 'i' with a vertical line above it.  How do I
> input these into a buffer?
> 
> 
> tia,
> ken
> 

C-x RET C-\ latin-4-postfix RET

then a, e, i, o, u followed by a hyphen generate vowels with macrons

If you don't want all these then you could just put something like this 
in .emacs

(global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
(global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))

assuming you have these C-c combos free.
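
A quick way to check that assumption, sketched with the same two key
sequences; each form returns nil when nothing is bound to the sequence in
the current buffer:

```elisp
;; nil means the key sequence is free; otherwise the bound command is returned.
(key-binding (kbd "C-c a"))
(key-binding (kbd "C-c i"))
```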

Ed



* Re: utf8 char display in buffer
  2009-06-08 20:43 ` B. T. Raven
@ 2009-06-08 20:49   ` B. T. Raven
  2009-06-08 22:49     ` ken
  2009-06-09 10:24   ` ken
       [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 56+ messages in thread
From: B. T. Raven @ 2009-06-08 20:49 UTC (permalink / raw
  To: help-gnu-emacs

B. T. Raven wrote:
> ken wrote:
>> Hey, group,
>>
>> I already use a few utf8 characters in emacs (and in web pages), but
>> recently needed to use a couple more.  One is an 'a' with a horizontal
>> line above it, the other an 'i' with a vertical line above it.  How do I
>> input these into a buffer?
>>
>>
>> tia,
>> ken
>>

Oops, I see you said i with VERTICAL line. What is that character?
Any of these?  í ï î ì If so substitute for i with macron below.

> 
> C-x ret C-\ latin-4-postfix
> 
> then a,e,i,o,u followed by hyphen generate macroned vowels
> 
> If you don't want all these then you could just put something like this 
> in .emacs
> 
> (global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
> (global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))
> 
> assuming you have these C-c combos free.
> 
> Ed



* Re: utf8 char display in buffer
  2009-06-08 20:49   ` B. T. Raven
@ 2009-06-08 22:49     ` ken
  0 siblings, 0 replies; 56+ messages in thread
From: ken @ 2009-06-08 22:49 UTC (permalink / raw
  To: GNU Emacs List


On 06/08/2009 04:49 PM B. T. Raven wrote:
> B. T. Raven wrote:
>> ken wrote:
>>> Hey, group,
>>>
>>> I already use a few utf8 characters in emacs (and in web pages), but
>>> recently needed to use a couple more.  One is an 'a' with a horizontal
>>> line above it, the other an 'i' with a vertical line above it.  How do I
>>> input these into a buffer?
>>>
>>>
>>> tia,
>>> ken
>>>
> 
> Oops, I see you said i with VERTICAL line. What is that character?
> Any of these?  í ï î ì If so substitute for i with macron below.
> 
>>....

The Oops is mine.  I meant to say "horizontal" for both.  So your
previous email did it all for me.

Thanks.






* Re: utf8 char display in buffer
  2009-06-08 20:43 ` B. T. Raven
  2009-06-08 20:49   ` B. T. Raven
@ 2009-06-09 10:24   ` ken
       [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 56+ messages in thread
From: ken @ 2009-06-09 10:24 UTC (permalink / raw
  To: GNU Emacs List

On 06/08/2009 04:43 PM B. T. Raven wrote:
> ken wrote:
>> Hey, group,
>>
>> I already use a few utf8 characters in emacs (and in web pages), but
>> recently needed to use a couple more.  One is an 'a' with a horizontal
>> line above it, the other an 'i' with a horizontal line above it.  How do I
>> input these into a buffer?
>>
>>
>> tia,
>> ken
>>
> 
> C-x ret C-\ latin-4-postfix
> 
> then a,e,i,o,u followed by hyphen generate macroned vowels
> 
> If you don't want all these then you could just put something like this
> in .emacs
> 
> (global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
> (global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))
> 
> assuming you have these C-c combos free.
> 
> Ed

Fantastic!  But... when I save and close the buffer and then open it up
again, in place of the beautiful and correct characters, there are
little boxes.

I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.

The fact that these non-English characters display properly in the
buffer initially tells me that I have the requisite fonts installed.  So
what little connection is emacs not making (and how do I tell it to make
that connection)?

Thanks, all.


* Re: utf8 char display in buffer
  2009-06-08 19:52 ` Xah Lee
@ 2009-06-09 10:52   ` ken
  0 siblings, 0 replies; 56+ messages in thread
From: ken @ 2009-06-09 10:52 UTC (permalink / raw
  Cc: help-gnu-emacs

On 06/08/2009 03:52 PM Xah Lee wrote:
>> ....
> 
> i define keys to insert unicode chars that i frequently use. e.g.
> 
> (global-set-key (kbd "<kp-6>") "→")
> (global-set-key (kbd "M-i a") "α")
> (global-set-key (kbd "M-i b") "β")
> (global-set-key (kbd "M-i t") "θ")

It's probably just me, but with so many foreign characters in use,
remembering all the many key mappings becomes more than my little brain
can manage.  So I prefer to create a menu of character entities.
html-helper-mode (i.e., not html-mode) already has such a menu which
I've added to using "(mapcar 'html-helper-add-tag ...".  This menu
allows me to look up a 'character' which I can't remember *and* gives me
a reminder of what its key combo is.  My (too old) version of emacs,
however, doesn't have a "character entities" menu for regular (non-html)
buffers.  I've already got too much on my plate for the moment, so this
isn't a project for me right now.  But later....

> ....
> 
> some more tips here
> 
> • Emacs and Unicode Tips
>   http://xahlee.org/emacs/emacs_n_unicode.html
> 
> ....

Nice web page.  (Bookmarked.)  Thanks.


* Re: utf8 char display in buffer
       [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
@ 2009-06-09 13:03     ` B. T. Raven
  2009-06-09 14:51       ` ken
       [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 56+ messages in thread
From: B. T. Raven @ 2009-06-09 13:03 UTC (permalink / raw
  To: help-gnu-emacs

ken wrote:
> On 06/08/2009 04:43 PM B. T. Raven wrote:
>> ken wrote:
>>> Hey, group,
>>>
>>> I already use a few utf8 characters in emacs (and in web pages), but
>>> recently needed to use a couple more.  One is an 'a' with a horizontal
>>> line above it, the other an 'i' with a horizontal line above it.  How do I
>>> input these into a buffer?
>>>
>>>
>>> tia,
>>> ken
>>>
>> C-x ret C-\ latin-4-postfix
>>
>> then a,e,i,o,u followed by hyphen generate macroned vowels
>>
>> If you don't want all these then you could just put something like this
>> in .emacs
>>
>> (global-set-key "\C-ca" (lambda () (interactive) (insert  ?ā )))
>> (global-set-key "\C-ci" (lambda () (interactive) (insert  ?ī )))
>>
>> assuming you have these C-c combos free.
>>
>> Ed
> 
> Fantastic!  But... when I save and close the buffer and then open it up
> again, in place of the beautiful and correct characters, there are
> little boxes.

After you see them correctly in the buffer do:

C-x RET c utf-8 RET

then

C-x C-s

Now next time you load that file it should appear correctly.
ā  and ī are not in iso-8859-1 and so you must use a more comprehensive 
coding system.
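
The same thing can be done from Lisp. A minimal sketch, assuming the
buffer to fix is the current one:

```elisp
;; Mark the current buffer to be written as UTF-8, then save it.
(set-buffer-file-coding-system 'utf-8)
(save-buffer)
```

Interactively, C-x RET f utf-8 RET followed by C-x C-s has the same effect.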

> 
> I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
> no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.
> 
> The fact that these non-English characters display properly in the
> buffer initially tells me that I have the requisite fonts installed.  So
> what little connection is emacs not making (and how do I tell it to make
> that connection)?

If you use utf-8 a lot you can put ;; -*- coding: utf-8[;] -*- into the
first line of the file. I don't know whether that semicolon in brackets
is needed or not.

> 
> Thanks, all.
> 
> 



* Re: utf8 char display in buffer
  2009-06-09 13:03     ` B. T. Raven
@ 2009-06-09 14:51       ` ken
       [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 56+ messages in thread
From: ken @ 2009-06-09 14:51 UTC (permalink / raw
  To: GNU Emacs List

On 06/09/2009 09:03 AM B. T. Raven wrote:
> ken wrote:
>> On 06/08/2009 04:43 PM B. T. Raven wrote:
>>> ken wrote:
>>>> ....
>>>>
>>> C-x ret C-\ latin-4-postfix
>>>
>>> then a,e,i,o,u followed by hyphen generate macroned vowels
>>>
>>> ....
>>
>> Fantastic!  But... when I save and close the buffer and then open it up
>> again, in place of the beautiful and correct characters, there are
>> little boxes.
> 
> After you see then correctly in the buffer do:
> 
> C-x ret c utf-8
> 
> then
> 
> C-x C-s
> 
> Now next time you load that file it should appear correctly.
> ā  and ī are not in iso-8859-1 and so you must use a more comprehensive
> coding system.

Hmmm... it doesn't.  Doing everything just as you say above, I still get
the little boxes in place of the non-English characters.

When, after reloading the buffer, I run "describe-coding-system" on this
buffer, I get:

=============================================
Coding system for saving this buffer:
  u -- mule-utf-8-unix
Default coding system (for new files):
  u -- mule-utf-8 (alias: utf-8)
Coding system for keyboard input:
  nil
Coding system for terminal output:
  0 -- iso-latin-9 (alias: iso-8859-15 latin-9 latin-0)
Defaults for subprocess I/O:
  decoding: u -- mule-utf-8 (alias: utf-8)
  encoding: u -- mule-utf-8 (alias: utf-8)

Priority order for recognizing coding systems when reading files:
  1. mule-utf-8 (alias: utf-8)
  2. iso-latin-1 (alias: iso-8859-1 latin-1)
  3. iso-2022-jp (alias: junet)
  4. iso-2022-7bit
  5. iso-2022-7bit-lock (alias: iso-2022-int-1)
  6. iso-2022-8bit-ss2
  7. emacs-mule
  8. raw-text
  9. japanese-shift-jis (alias: shift_jis sjis)
  10. chinese-big5 (alias: big5 cn-big5)
  11. no-conversion (alias: binary)

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.

  The followings are decoded correctly but recognized as iso-2022-7bit-lock:
    iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
    iso-2022-jp-2 iso-2022-kr

....
==================================================================

I don't know... does utf-8 or mule-utf-8 contain latin-4, greek, and/or
German characters?  (This file has some of each.)


>>
>> I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
>> no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.
>>
>> The fact that these non-English characters display properly in the
>> buffer initially tells me that I have the requisite fonts installed.  So
>> what little connection is emacs not making (and how do I tell it to make
>> that connection)?
> 
> If you use utf-8 a lot you can put ;; -*- coding: utf-8[;] -*- into the
> first line of the file. I don't know whether that sem in brackets is
> needed or not.

Sorry, I should have mentioned that I have this (with the semi-colon) at
the top of the file.

Let me also say that, though the little boxes appear in the emacs
buffer, the proper non-English characters appear when the file is loaded
into firefox.  (Yeah, this emacs file is an HTML page.)



> 
>>
>> Thanks, all.
>>
>>






* Re: utf8 char display in buffer
       [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
@ 2009-06-10  1:34         ` B. T. Raven
  2009-06-10 14:03           ` Lewis Perin
  0 siblings, 1 reply; 56+ messages in thread
From: B. T. Raven @ 2009-06-10  1:34 UTC (permalink / raw
  To: help-gnu-emacs

ken wrote:
> On 06/09/2009 09:03 AM B. T. Raven wrote:
>> ken wrote:
>>> On 06/08/2009 04:43 PM B. T. Raven wrote:
>>>> ken wrote:
>>>>> ....
>>>>>
>>>> C-x ret C-\ latin-4-postfix
>>>>
>>>> then a,e,i,o,u followed by hyphen generate macroned vowels
>>>>
>>>> ....
>>> Fantastic!  But... when I save and close the buffer and then open it up
>>> again, in place of the beautiful and correct characters, there are
>>> little boxes.
>> After you see then correctly in the buffer do:
>>
>> C-x ret c utf-8
>>
>> then
>>
>> C-x C-s
>>
>> Now next time you load that file it should appear correctly.
>> ā  and ī are not in iso-8859-1 and so you must use a more comprehensive
>> coding system.
> 
> Hmmm... it doesn't.  Doing everything just as you say above, I still get
> the little boxes in place of the non-English characters.
> 
> When after reloading the buffer, I run "describe-coding-system" on this
> buffer, I get:
> 
> =============================================
> Coding system for saving this buffer:
>   u -- mule-utf-8-unix
> Default coding system (for new files):
>   u -- mule-utf-8 (alias: utf-8)
> Coding system for keyboard input:
>   nil
> Coding system for terminal output:
>   0 -- iso-latin-9 (alias: iso-8859-15 latin-9 latin-0)
> Defaults for subprocess I/O:
>   decoding: u -- mule-utf-8 (alias: utf-8)
>   encoding: u -- mule-utf-8 (alias: utf-8)
> 
> Priority order for recognizing coding systems when reading files:
>   1. mule-utf-8 (alias: utf-8)
>   2. iso-latin-1 (alias: iso-8859-1 latin-1)
>   3. iso-2022-jp (alias: junet)
>   4. iso-2022-7bit
>   5. iso-2022-7bit-lock (alias: iso-2022-int-1)
>   6. iso-2022-8bit-ss2
>   7. emacs-mule
>   8. raw-text
>   9. japanese-shift-jis (alias: shift_jis sjis)
>   10. chinese-big5 (alias: big5 cn-big5)
>   11. no-conversion (alias: binary)
> 
>   Other coding systems cannot be distinguished automatically
>   from these, and therefore cannot be recognized automatically
>   with the present coding system priorities.
> 
>   The followings are decoded correctly but recognized as iso-2022-7bit-lock:
>     iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
>     iso-2022-jp-2 iso-2022-kr
> 
> ....
> ==================================================================
> 
> I don't know... does utf-8 or mule-utf-8 contain latin-4, greek, and/or
> German characters?  (This file has some of each.)
> 
> 
>>> I tried using ‘C-x C-m c utf-8 RET’ prior to 'C-x C-f filename'... but
>>> no joy.  Same no-go with 'C-x C-m c mule-utf-8 RET'.
>>>
>>> The fact that these non-English characters display properly in the
>>> buffer initially tells me that I have the requisite fonts installed.  So
>>> what little connection is emacs not making (and how do I tell it to make
>>> that connection)?
>> If you use utf-8 a lot you can put ;; -*- coding: utf-8[;] -*- into the
>> first line of the file. I don't know whether that sem in brackets is
>> needed or not.
> 
> Sorry, I should have mentioned that I have this (with the semi-colon) at
> the top of the file.
> 
> Let me also say that, though the little boxes appear in the emacs
> buffer, the proper non-English characters appear when the file is loaded
> into firefox.  (Yeah, this emacs file is an HTML page.)
> 
> 
> 
>>> Thanks, all.

Don't know. Your problem has just escalated above my pay grade. I don't
know what it means that the files display okay in FF. I just loaded my
.emacs into the browser and it looks fine (has many exotic non-Latin-1
characters in it). You are using GUI Emacs and not terminal, right? You
could try these settings from my ver 22 .emacs, just for fun:

    (set-language-environment               'UTF-8)
    (set-default-coding-systems             'utf-8)
    (setq file-name-coding-system           'utf-8)
    (setq default-buffer-file-coding-system 'utf-8)
    (setq coding-system-for-write           'utf-8)
    (set-keyboard-coding-system             'utf-8)
    (set-terminal-coding-system             'utf-8)
    (set-clipboard-coding-system            'utf-8)
    (set-selection-coding-system            'utf-8)
    (prefer-coding-system                   'utf-8)
    (modify-coding-system-alist 'process
                                "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)


and try C-x RET c utf-8 RET
C-x C-f

to open the file.



or install version 23.x w32 binary into a different directory from here

http://alpha.gnu.org/gnu/emacs/pretest/windows/


I don't think you need a .emacs with ver 23 in dealing with utf-8 since 
its internal representation is unicode.



* Re: utf8 char display in buffer
  2009-06-10  1:34         ` B. T. Raven
@ 2009-06-10 14:03           ` Lewis Perin
  2009-06-11  3:21             ` B. T. Raven
  0 siblings, 1 reply; 56+ messages in thread
From: Lewis Perin @ 2009-06-10 14:03 UTC (permalink / raw
  To: help-gnu-emacs

I've been following this thread closely because I have the original
poster's problem, only the characters that give me trouble are some -
not many, actually - Chinese characters, e.g. ni3, the normal second
person pronoun.  And, as with the original poster, the troublesome
characters, when copied and pasted to other applications from Emacs,
display perfectly.

"B. T. Raven" <nihil@nihilo.net> writes:

> [...]
>    (set-language-environment               'UTF-8)
>          (set-default-coding-systems             'utf-8)
>          (setq file-name-coding-system           'utf-8)
>          (setq default-buffer-file-coding-system 'utf-8)
>          (setq coding-system-for-write           'utf-8)
>          (set-keyboard-coding-system             'utf-8)
>          (set-terminal-coding-system          'utf-8)
>          (set-clipboard-coding-system            'utf-8)
>          (set-selection-coding-system            'utf-8)
>          (prefer-coding-system                   'utf-8)
>          (modify-coding-system-alist 'process
> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
> 
> 
> and try C-x ret c utf-8
> C-x C-f
> 
> to open the file.

I tried this, but it didn't help.  Emacs 22.3 / Win32.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
  2009-06-10 14:03           ` Lewis Perin
@ 2009-06-11  3:21             ` B. T. Raven
  2009-06-12 14:54               ` ken
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 56+ messages in thread
From: B. T. Raven @ 2009-06-11  3:21 UTC (permalink / raw
  To: help-gnu-emacs

Lewis Perin wrote:
> I've been following this thread closely because I have the original
> poster's problem, only the characters that give me trouble are some -
> not many, actually - Chinese characters, e.g. ni3, the normal second
> person pronoun.  And, as with the original poster, the troublesome
> characters, when copied and pasted to other applications from Emacs,
> display perfectly.
> 
> "B. T. Raven" <nihil@nihilo.net> writes:
> 
>> [...]
>>    (set-language-environment               'UTF-8)
>>          (set-default-coding-systems             'utf-8)
>>          (setq file-name-coding-system           'utf-8)
>>          (setq default-buffer-file-coding-system 'utf-8)
>>          (setq coding-system-for-write           'utf-8)
>>          (set-keyboard-coding-system             'utf-8)
>>          (set-terminal-coding-system          'utf-8)
>>          (set-clipboard-coding-system            'utf-8)
>>          (set-selection-coding-system            'utf-8)
>>          (prefer-coding-system                   'utf-8)
>>          (modify-coding-system-alist 'process
>> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
>>
>>
>> and try C-x ret c utf-8
>> C-x C-f
>>
>> to open the file.
> 
> I tried this, but it didn't help.  Emacs 22.3 / Win32.

Even on Emacs 23 although I see the characters in the buffer, I can't 
save the following as utf-8:

nǐ hǎo 你 好
u+4f60 and u+597d

Or at least not so as to be readable with 22.3. Both versions are using 
Arial Unicode MS.

Why is that?
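
One way to narrow this down is to ask Emacs which coding systems it
thinks can encode the buffer. A rough diagnostic sketch, to be evaluated
with M-: in the buffer that holds the characters; what it returns will
depend on the Emacs version:

```elisp
;; List the coding systems that can encode every character in the buffer.
(find-coding-systems-region (point-min) (point-max))

;; Position of the first character utf-8 cannot encode, or nil if there is none.
(unencodable-char-position (point-min) (point-max) 'utf-8)
```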


> 
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
       [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
                   ` (2 preceding siblings ...)
  2009-06-08 20:43 ` B. T. Raven
@ 2009-06-11 12:03 ` Teemu Likonen
  2009-06-11 12:55   ` Lennart Borgman
  3 siblings, 1 reply; 56+ messages in thread
From: Teemu Likonen @ 2009-06-11 12:03 UTC (permalink / raw
  To: help-gnu-emacs

On 2009-06-08 14:33 (-0400), ken wrote:

> I already use a few utf8 characters in emacs (and in web pages), but
> recently needed to use a couple more. One is an 'a' with a horizontal
> line above it, the other an 'i' with a [horizontal] line above it. How
> do I input these into a buffer?

Let’s add one more nice way to insert Unicode chars: “rfc1345” input
method. It’s an input method for Unicode characters using mnemonics.
Examples:

    &a- = ā
    &i- = ī
    &W* = Ω
    &"6 = “
    &"9 = ”

For more info: C-h I rfc1345 RET



* Re: utf8 char display in buffer
  2009-06-11 12:03 ` Teemu Likonen
@ 2009-06-11 12:55   ` Lennart Borgman
  2009-06-11 13:04     ` Andreas Schwab
  0 siblings, 1 reply; 56+ messages in thread
From: Lennart Borgman @ 2009-06-11 12:55 UTC (permalink / raw
  To: Teemu Likonen, Emacs-Devel

Teemu mentioned this on gnu-emacs. It seems nice, but the help text
that C-h I rfc1345 brings up is not very helpful for someone who
does not know this well. Could it perhaps be enhanced with some links
to relevant information?


On Thu, Jun 11, 2009 at 2:03 PM, Teemu Likonen<tlikonen@iki.fi> wrote:
> On 2009-06-08 14:33 (-0400), ken wrote:
>
>> I already use a few utf8 characters in emacs (and in web pages), but
>> recently needed to use a couple more. One is an 'a' with a horizontal
>> line above it, the other an 'i' with a [horizontal] line above it. How
>> do I input these into a buffer?
>
> Let’s add one more nice way to insert Unicode chars: “rfc1345” input
> method. It’s an input method for Unicode characters using mnemonics.
> Examples:
>
>    &a- = ā
>    &i- = ī
>    &W* = Ω
>    &"6 = “
>    &"9 = ”
>
> For more info: C-h I rfc1345 RET
>





* Re: utf8 char display in buffer
  2009-06-11 12:55   ` Lennart Borgman
@ 2009-06-11 13:04     ` Andreas Schwab
  2009-06-11 13:07       ` Lennart Borgman
  0 siblings, 1 reply; 56+ messages in thread
From: Andreas Schwab @ 2009-06-11 13:04 UTC (permalink / raw
  To: Lennart Borgman; +Cc: Teemu Likonen, Emacs-Devel

Lennart Borgman <lennart.borgman@gmail.com> writes:

> Teemu mentioned this on gnu-emacs. It seems nice, but the help text
> that C-h l rfc1345 brings up is not that much helpful for someone who
> does not know this well. Could it perhaps be enhanced with some links
> to relevant information?

This has been fixed in Emacs 23, where the complete translation table is
included.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."





* Re: utf8 char display in buffer
  2009-06-11 13:04     ` Andreas Schwab
@ 2009-06-11 13:07       ` Lennart Borgman
  2009-06-11 13:08         ` Lennart Borgman
  0 siblings, 1 reply; 56+ messages in thread
From: Lennart Borgman @ 2009-06-11 13:07 UTC (permalink / raw
  To: Andreas Schwab; +Cc: Teemu Likonen, Emacs-Devel

On Thu, Jun 11, 2009 at 3:04 PM, Andreas Schwab<schwab@linux-m68k.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>
>> Teemu mentioned this on gnu-emacs. It seems nice, but the help text
>> that C-h l rfc1345 brings up is not that much helpful for someone who
>> does not know this well. Could it perhaps be enhanced with some links
>> to relevant information?
>
> This has been fixed in Emacs 23, where the complete translation table is
> included.

Really? This is what I get with
GNU Emacs 23.0.94.1 (i386-mingw-nt5.1.2600) of 2009-06-10 on
LENNART-69DE564 (patched)

------------------------------
Input method: rfc1345 (`m' in mode line) for UTF-8
  Unicode characters input method using RFC1345 mnemonics (non-ASCII only).
E.g. &a' -> á

[back]
------------------------------





* Re: utf8 char display in buffer
  2009-06-11 13:07       ` Lennart Borgman
@ 2009-06-11 13:08         ` Lennart Borgman
  2009-06-11 13:24           ` Tassilo Horn
  0 siblings, 1 reply; 56+ messages in thread
From: Lennart Borgman @ 2009-06-11 13:08 UTC (permalink / raw
  To: Andreas Schwab; +Cc: Teemu Likonen, Emacs-Devel

Hm, the check out date is 2009-05-29, not the date below.


On Thu, Jun 11, 2009 at 3:07 PM, Lennart
Borgman<lennart.borgman@gmail.com> wrote:
> On Thu, Jun 11, 2009 at 3:04 PM, Andreas Schwab<schwab@linux-m68k.org> wrote:
>> Lennart Borgman <lennart.borgman@gmail.com> writes:
>>
>>> Teemu mentioned this on gnu-emacs. It seems nice, but the help text
>>> that C-h l rfc1345 brings up is not that much helpful for someone who
>>> does not know this well. Could it perhaps be enhanced with some links
>>> to relevant information?
>>
>> This has been fixed in Emacs 23, where the complete translation table is
>> included.
>
> Really? This is what I get with
> GNU Emacs 23.0.94.1 (i386-mingw-nt5.1.2600) of 2009-06-10 on
> LENNART-69DE564 (patched)
>
> ------------------------------
> Input method: rfc1345 (`m' in mode line) for UTF-8
>  Unicode characters input method using RFC1345 mnemonics (non-ASCII only).
> E.g. &a' -> á
>
> [back]
> ------------------------------
>





* Re: utf8 char display in buffer
  2009-06-11 13:08         ` Lennart Borgman
@ 2009-06-11 13:24           ` Tassilo Horn
  0 siblings, 0 replies; 56+ messages in thread
From: Tassilo Horn @ 2009-06-11 13:24 UTC (permalink / raw
  To: Emacs-Devel

Lennart Borgman <lennart.borgman@gmail.com> writes:

Hi Lennart,

>>> This has been fixed in Emacs 23, where the complete translation
>>> table is included.
>>
>> Really?

Mine is about a week old, and it displays the complete table.  Nice, I
wish I'd write more Unicode chars. :-)

Bye,
Tassilo





* Re: utf8 char display in buffer
  2009-06-11  3:21             ` B. T. Raven
@ 2009-06-12 14:54               ` ken
  2009-06-13  3:30                 ` Eli Zaretskii
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 56+ messages in thread
From: ken @ 2009-06-12 14:54 UTC (permalink / raw
  To: GNU Emacs List

Ed,

Thanks for distributing.

Everyone responding to this thread,

Please either CC me when posting about this issue or else edit the "To"
field so that your response comes to the whole list.  I'd like to get
everyone's input.  Thanks.

Lewis,

Thanks for posting.  It's lonely out there when you're the only one with
a particular problem.  To make sure we're suffering the same
cyber-indignity, here's the scenario as I see it (from an older version
of emacs running on Linux):

0) Some others and myself want to include some non-English characters in
a file being edited in emacs. Problems arise, however:

1) In a buffer which is already utf-8 encoded, I set the appropriate
input method, type in the desired characters. They display just peachy
and there is happiness in EmacsLand.

2) I save the buffer to a file, then close the buffer.

3) I visit the same file (i.e., load it again into emacs). Because it
has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
utf-8 encoded. This is confirmed by the presence of a 'u' as the second
character in the status bar.

4) The text in the buffer displays fine, except that in place of each of
those non-English characters is a little empty box. With the cursor on
one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
=", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
(While, in emacs the character after "Char:" is a little box, if I load
this same file into Firefox, that same character appears as it should,
as an 'a' with a horizontal bar above it. How it appears in your email
client will depend upon your email client.)
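
A quick arithmetic check on the three numbers in that "C-x =" output, as
a minimal Python sketch (not part of the original post): they are one
value written in octal, decimal and hex, and that value is not the
Unicode code point of the character (U+0101). Presumably it is a
character code internal to that older Emacs rather than a Unicode
number; that reading is an assumption.

```python
# The three numbers reported by "C-x =" in the post, copied as written.
octal_form = int("01210041", 8)
decimal_form = 331809
hex_form = int("0x51021", 16)

# All three spell the same integer.
same_value = octal_form == decimal_form == hex_form

# Unicode code point of LATIN SMALL LETTER A WITH MACRON, for comparison.
unicode_code_point = ord("\u0101")

print(same_value)
print(hex(unicode_code_point), hex(decimal_form))
```

So the number shown cannot be compared directly with a Unicode chart.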

A) The fact that, as described in (4), the characters display correctly
in Firefox, but not in emacs indicates that emacs is not drawing on the
needed character set. Yet, the fact that in (1) the characters initially
display correctly (when first input) indicates that the needed character
set is present on the system and emacs can find it and has permission
to access it. Further, we would think that emacs would throw out an error
message if either of these conditions were not met... and it doesn't. We
can only assume that, when visiting and then decoding a file and pulling
into a buffer for display, emacs is not even asking for the proper
character set when encountering a non-English character. This is where I
would start to look for the error.

B) It would be helpful if the code which does the decoding of a file and
renders it into the buffer display, if that part of it would throw an
error message when it encounters a character it doesn't know how to
display, i.e., when a little box character is displayed. After all,
isn't it an error when a little box is displayed in lieu of the correct
character? Possible error messages would be something like: "decoding
process can't find /path/to/charset.file" or "decoding process doesn't
have requisite permission to read /path/to/charset.file" or "invalid
character: [hex/decimal value]" or other.
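
To separate the two stages discussed in (A) and (B), here is a minimal
Python sketch, not Emacs code. It checks only the decoding stage: a
strict UTF-8 decoder can report bytes that are invalid, much like the
"invalid character" message suggested above. Whether a font has a glyph
for a validly decoded character is a later, separate stage that this
sketch does not touch. The helper name describe_utf8 is hypothetical.

```python
import unicodedata
from typing import List


def describe_utf8(data: bytes) -> List[str]:
    """Decode bytes strictly as UTF-8 and name every non-ASCII character."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<no name>"))
        for ch in text
        if ord(ch) > 127
    ]


good = "a with macron: \u0101".encode("utf-8")
print(describe_utf8(good))

try:
    describe_utf8(b"\xc4")  # a lead byte with no continuation byte
except UnicodeDecodeError as err:
    print("invalid input:", err)
```

If the saved file passes a check like this and the expected code points
are listed, the bytes on disk are fine and the little boxes are more
likely a display (font) matter than a decoding one.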

On 06/10/2009 11:21 PM B. T. Raven wrote:
> Lewis Perin wrote:
>> I've been following this thread closely because I have the original
>> poster's problem, only the characters that give me trouble are some -
>> not many, actually - Chinese characters, e.g. ni3, the normal second
>> person pronoun.  And, as with the original poster, the troublesome
>> characters, when copied and pasted to other applications from Emacs,
>> display perfectly.
>>
>> "B. T. Raven" <nihil@nihilo.net> writes:
>>
>>> [...]
>>>    (set-language-environment               'UTF-8)
>>>          (set-default-coding-systems             'utf-8)
>>>          (setq file-name-coding-system           'utf-8)
>>>          (setq default-buffer-file-coding-system 'utf-8)
>>>          (setq coding-system-for-write           'utf-8)
>>>          (set-keyboard-coding-system             'utf-8)
>>>          (set-terminal-coding-system          'utf-8)
>>>          (set-clipboard-coding-system            'utf-8)
>>>          (set-selection-coding-system            'utf-8)
>>>          (prefer-coding-system                   'utf-8)
>>>          (modify-coding-system-alist 'process
>>> "[cC][mM][dD][pP][rR][oO][xX][yY]" 'utf-8-dos)
>>>
>>>
>>> and try C-x ret c utf-8
>>> C-x C-f
>>>
>>> to open the file.
>>
>> I tried this, but it didn't help.  Emacs 22.3 / Win32.
> 
> Even on Emacs 23 although I see the characters in the buffer, I can't
> save the following as utf-8:
> 
> nǐ hǎo 你 好
> u+4f60 and u+597d
> 
> Or at least not so as to be readable with 22.3. Both versions are using
> Arial Unicode MS.
> 
> Why is that?
> 
> 
>>
>> /Lew
>> ---
>> Lew Perin / perin@acm.org
>> http://www.panix.com/~perin/babelcarp.html


* Re: utf8 char display in buffer
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
@ 2009-06-12 15:39                 ` Lewis Perin
  2009-06-12 16:48                   ` B. T. Raven
  2009-06-12 17:27                 ` Xah Lee
  1 sibling, 1 reply; 56+ messages in thread
From: Lewis Perin @ 2009-06-12 15:39 UTC (permalink / raw
  To: help-gnu-emacs

ken <gebser@mousecar.com> writes:

> [...]
> Lewis,
> 
> Thanks for posting.  It's lonely out there when you're the only one with
> a particular problem.

The few, the proud...

> To make sure we're suffering the same cyber-indignity, here's the
> scenario as I see it (from an older version of emacs running on
> Linux):
> 
> 0) Some others and myself want to include some non-English characters in
> a file being edited in emacs. Problems arise, however:
> 
> 1) In a buffer which is already utf-8 encoded, I set the appropriate
> input method, type in the desired characters. They display just peachy
> and there is happiness in EmacsLand.
> 
> 2) I save the buffer to a file, then close the buffer.
> 
> 3) I visit the same file (i.e., load it again into emacs). Because it
> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
> character in the status bar.

I haven't been inserting that special first line.

> 4) The text in the buffer displays fine, except that in place of each of
> those non-English characters is a little empty box. With the cursor on
> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
> (While, in emacs the character after "Char:" is a little box, if I load
> this same file into Firefox, that same character appears as it should,
> as an 'a' with a horizontal bar above it. How it appears in your email
> client will depend upon your email client.)

My situation differs in that most of the non-ASCII characters (Chinese
in my case) come through just fine.  But the ones that don't have
those irritating boxes in place of the correct glyphs.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
  2009-06-12 15:39                 ` Lewis Perin
@ 2009-06-12 16:48                   ` B. T. Raven
  2009-06-12 17:45                     ` Lewis Perin
  2009-06-12 17:53                     ` Xah Lee
  0 siblings, 2 replies; 56+ messages in thread
From: B. T. Raven @ 2009-06-12 16:48 UTC (permalink / raw
  To: help-gnu-emacs

Lewis Perin wrote:
> ken <gebser@mousecar.com> writes:
> 
>> [...]
>> Lewis,
>>
>> Thanks for posting.  It's lonely out there when you're the only one with
>> a particular problem.
> 
> The few, the proud...
> 
>> To make sure we're suffering the same cyber-indignity, here's the
>> scenario as I see it (from an older version of emacs running on
>> Linux):
>>
>> 0) Some others and myself want to include some non-English characters in
>> a file being edited in emacs. Problems arise, however:
>>
>> 1) In a buffer which is already utf-8 encoded, I set the appropriate
>> input method, type in the desired characters. They display just peachy
>> and there is happiness in EmacsLand.
>>
>> 2) I save the buffer to a file, then close the buffer.
>>
>> 3) I visit the same file (i.e., load it again into emacs). Because it
>> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
>> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
>> character in the status bar.
> 
> I haven't been inserting that special first line.
> 
>> 4) The text in the buffer displays fine, except that in place of each of
>> those non-English characters is a little empty box. With the cursor on
>> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
>> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
>> (While, in emacs the character after "Char:" is a little box, if I load
>> this same file into Firefox, that same character appears as it should,
>> as an 'a' with a horizontal bar above it. How it appears in your email
>> client will depend upon your email client.)
> 
> My situation differs in that most of the non-ASCII characters (Chinese
> in my case) come through just fine.  But the ones that don't have
> those irritating boxes in place of the correct glyphs.
> 
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://www.panix.com/~perin/babelcarp.html

I wouldn't be surprised if the gaps and overlaps in the CJK ranges of
glyphs were so complicated that many characters from the following
encodings may not be included in utf-8, especially if they are not
precomposed. Try some of these encodings to see if some of the empty
boxes are resolved into characters:

            chinese-big5
            chinese-hz
            chinese-iso-7bit
            chinese-iso-8bit
            chinese-iso-8bit-with-esc
            cn-big5
            cn-gb
            cn-gb-2312
            iso-2022-cjk
            iso-2022-cn
            iso-2022-cn-ext



Also it might help to install a fontset rather than depending on a 
single font to represent all these characters. Unfortunately I can't 
help with that. I am on w32 and I don't even know whether fontsets can 
be used in Emacs on that build.

Ed






* Re: utf8 char display in buffer
       [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
  2009-06-12 15:39                 ` Lewis Perin
@ 2009-06-12 17:27                 ` Xah Lee
  2009-06-12 19:30                   ` Lewis Perin
                                     ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Xah Lee @ 2009-06-12 17:27 UTC (permalink / raw
  To: help-gnu-emacs

On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
> B) It would be helpful if the code which does the decoding of a file and
> renders it into the buffer display, if that part of it would throw an
> error message when it encounters a character it doesn't know how to
> display, i.e., when a little box character is displayed. After all,
> isn't it an error when a little box is displayed in lieu of the correct
> character? Possible error messages would be something like: "decoding
> process can't find /path/to/charset.file" or "decoding process doesn't
> have requisite permission to read /path/to/charset.file" or "invalid
> character: [hex/decimal value]" or other.

some thought process in the above is not correct.

In general, a program just reads a text file as a byte stream and uses
an encoding scheme to interpret it; the program has little way to
determine if the encoding is correct. Theoretically, it could check for
common phrases, but that is generally not done by the software we use
daily. (some programs do scan the text to guess an encoding, but the
guess is not always correct)

here's some general technical issues and experiences about using
foreign chars:

• the software needs to know what encoding & char set is used in order
to interpret the binary stream. If you don't specifically set it,
typically it assumes ascii or some iso latin char set. (for software in
the USA anyway)

• today's software generally doesn't contain any extra heuristics to
check if the encoding used is actually correct. There is no technical
way to check that in general. It can be only heuristics, i.e. guesses.
e.g. browsers will often guess when reading a page that doesn't have
encoding info.

• even when the encoding is correct, the software needs all the proper
fonts to display it. Or, rely on some font-replacement technology,
e.g. when it finds a char which the current font doesn't have, it uses
another font for that char. (in the case of Chinese, this often
results in ugly text of mixed char style, some appear thin, some
thick, some squarish (like sans-serif), some calligraphic, some
bitmapped) Windows OS and OS X both have font-replacement technology,
as do all the major browsers for both os x and windows. This font
replacement technology, however, is not perfect. So, sometimes you'll
see squares or question marks here or there, especially on some chars
that are not widely used (e.g. math symbols in unicode, double right
arrow, tech symbols such as Apple's command key and option key, triple
asterisk, etc.).

• when writing a file, the software needs to use an encoding to write
it. Just like reading, if you haven't explicitly set it, typically it
uses ascii or some iso latin char set, in most western lang countries.

• when you use software to open a text with the wrong encoding info,
the result is gibberish.
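
The points in the list above about wrong encodings can be sketched in a
few lines of Python (an illustration only, not how any particular
application does it). The same bytes are decoded once with the encoding
they were written in and once with a wrong one; the wrong decode raises
no error, it just yields different characters. latin-1 is picked here
as the "wrong" encoding purely for illustration.

```python
original = "n\u01d0 h\u01ceo \u4f60\u597d"  # pinyin plus two Chinese chars
utf8_bytes = original.encode("utf-8")

# Reading the same bytes with the wrong encoding raises no error at all.
misread = utf8_bytes.decode("latin-1")

# The damage is reversible only if you know which wrong encoding was used.
recovered = misread.encode("latin-1").decode("utf-8")

print(len(original), len(misread))
print(misread == original)
print(recovered == original)
```

Nothing in the byte stream itself says which reading was intended,
which is why software can only guess when the encoding is not declared.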

the above applies not just to emacs, but to all apps. Some of this
commentary is based on my experiences with browsers, web pages, word
processors, online forums, mailing lists, email apps, instant messaging
chat apps, etc, on both mac and windows.

technically, the issues involved are char set, encoding, and font. (the
concepts of char set and encoding are independent but are often mixed
together in a spec, esp. earlier ones).

i use mixed chinese & english in single file often and in both mac os
x and windows. They work well. On the mac, my emacs is version 22.x.
On win, it is emacs23. My encoding in emacs is set to utf-8.

I've written a lot about these issues; the following docs might be
helpful.

• Emacs and Unicode Tips
  http://xahlee.org/emacs/emacs_n_unicode.html

• Unicode Characters Example
  http://xahlee.org/Periodic_dosage_dir/t1/20040505_unicode.html

• the Journey of a Foreign Character thru Internet
  http://xahlee.org/Periodic_dosage_dir/t2/non-ascii_journey.html

• Converting a File's Encoding with Python
  http://xahlee.org/perl-python/charset_encoding.html

• Character Sets and Encoding in HTML
  http://xahlee.org/js/html_chars.html

• The Complexity And Tedium of Software Engineering (parts about
unicode problem with unison and emacs)
  http://xahlee.org/UnixResource_dir/writ/programer_frustration.html

• Mac and Windows File Conversion (parts about unicode filename
issues)
  http://xahlee.org/mswin/mac_windows_file_conv.html

• Windows Font and Unicode
  http://xahlee.org/mswin/windows_font_unicode.html

the above articles contain tens of links to Wikipedia in appropriate
places. Wikipedia has massive info in digestible form about these
issues; one can spend a month on the above foreign char issues ...

for some examples of mixed chinese & english text i work with, see:

• Chinese Core Simplified Chars
  http://xahlee.org/lojban/simplified_chars.html

• Ethology, Ethnology, and Lyrics
  http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html

  Xah
∑ http://xahlee.org/

☄


* Re: utf8 char display in buffer
  2009-06-12 16:48                   ` B. T. Raven
@ 2009-06-12 17:45                     ` Lewis Perin
  2009-06-12 17:53                     ` Xah Lee
  1 sibling, 0 replies; 56+ messages in thread
From: Lewis Perin @ 2009-06-12 17:45 UTC (permalink / raw
  To: help-gnu-emacs

"B. T. Raven" <nihil@nihilo.net> writes:

> Lewis Perin wrote:
> > ken <gebser@mousecar.com> writes:
> >
> >> [...]
> >> Lewis,
> >>
> >> Thanks for posting.  It's lonely out there when you're the only one with
> >> a particular problem.
> > The few, the proud...
> >
> >> To make sure we're suffering the same cyber-indignity, here's the
> >> scenario as I see it (from an older version of emacs running on
> >> Linux):
> >>
> >> 0) Some others and myself want to include some non-English characters in
> >> a file being edited in emacs. Problems arise, however:
> >>
> >> 1) In a buffer which is already utf-8 encoded, I set the appropriate
> >> input method, type in the desired characters. They display just peachy
> >> and there is happiness in EmacsLand.
> >>
> >> 2) I save the buffer to a file, then close the buffer.
> >>
> >> 3) I visit the same file (i.e., load it again into emacs). Because it
> >> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
> >> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
> >> character in the status bar.
> > I haven't been inserting that special first line.
> >
> >> 4) The text in the buffer displays fine, except that in place of each of
> >> those non-English characters is a little empty box. With the cursor on
> >> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
> >> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
> >> (While, in emacs the character after "Char:" is a little box, if I load
> >> this same file into Firefox, that same character appears as it should,
> >> as an 'a' with a horizontal bar above it. How it appears in your email
> >> client will depend upon your email client.)
> > My situation differs in that most of the non-ASCII characters
> > (Chinese
> > in my case) come through just fine.  But the ones that don't have
> > those irritating boxes in place of the correct glyphs.
> 
> I wouldn't be surprised if the gaps and overlaps in the CJK ranges of
> glyphs weren't so complicated that many characters from the following
> encodings may not be included in utf-8,

Sorry, I'm not sure what you mean by "may not be included in utf-8":
do you mean utf-8 the standard, or do you mean Emacs's implementation
of it?  The characters I'm talking about are definitely in Unicode.

> especially if they are not precomposed.

This I don't really understand, either, I'm afraid.  Might this
explain why I can see the glyph for ni3 when I'm composing Chinese in
Emacs using the chinese-tonepy-punct input method but can't see it
when the saved file is read by Emacs?

> Try some of these encodings to see if some of the empty boxes are
> resolved into characters:
> [...]
>             cn-gb-2312

I created a little file with my bête noire character using that
encoding and saved it.  Reverting the file with that encoding, I did
see all the characters.
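
For reference, a minimal Python sketch (outside Emacs) showing that the
character in question, U+4F60, is representable both in UTF-8 and in
GB 2312, so success with the GB encoding does not by itself mean the
character is missing from UTF-8. Python's "gb2312" codec is assumed
here to stand in for what Emacs calls cn-gb-2312.

```python
ni = "\u4f60"  # the second person pronoun, U+4F60

as_utf8 = ni.encode("utf-8")
as_gb2312 = ni.encode("gb2312")

# Different byte sequences, same character after decoding.
print(as_utf8.hex(), as_gb2312.hex())
print(as_utf8.decode("utf-8") == as_gb2312.decode("gb2312") == ni)
```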
 
> Also it might help to install a fontset rather than depending on a
> single font to represent all these characters. Unfortunately I can't
> help with that. I am on w32 and I don't even know whether fontsets can
> be used in Emacs on that build.

Windows R Us, too.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
  2009-06-12 16:48                   ` B. T. Raven
  2009-06-12 17:45                     ` Lewis Perin
@ 2009-06-12 17:53                     ` Xah Lee
  2009-06-12 20:59                       ` Lennart Borgman
                                         ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Xah Lee @ 2009-06-12 17:53 UTC (permalink / raw
  To: help-gnu-emacs

On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
> B) It would be helpful if the code which does the decoding of a file and
> renders it into the buffer display, if that part of it would throw an
> error message when it encounters a character it doesn't know how to
> display, i.e., when a little box character is displayed. After all,
> isn't it an error when a little box is displayed in lieu of the correct
> character? Possible error messages would be something like: "decoding
> process can't find /path/to/charset.file" or "decoding process doesn't
> have requisite permission to read /path/to/charset.file" or "invalid
> character: [hex/decimal value]" or other.

some thought process in the above is not correct.

In general, a program just reads a text file as a byte stream and uses
an encoding scheme to interpret it; the program has little way to
determine if the encoding is correct. Theoretically, it could check for
common phrases, but that is generally not done by the software we use
daily. (some programs do scan the text to guess an encoding, but the
guess is not always correct)

here's some general technical issues and experiences about using
foreign chars:

• the software needs to know what encoding & char set is used in order
to interpret the binary stream. If you don't specifically set it,
typically it assumes ascii or some iso latin char set. (for software in
the USA anyway)

• today's software generally doesn't contain any extra heuristics to
check if the encoding used is actually correct. There is no technical
way to check that in general. It can be only heuristics, i.e. guesses.
e.g. browsers will often guess when reading a page that doesn't have
encoding info.

• even when the encoding is correct, the software needs all the proper
fonts to display it. Or, rely on some font-replacement technology,
e.g. when it finds a char which the current font doesn't have, it uses
another font for that char. (in the case of Chinese, this often
results in ugly text of mixed char style, some appear thin, some
thick, some squarish (like sans-serif), some calligraphic, some
bitmapped) Windows OS and OS X both have font-replacement technology,
as do all the major browsers for both os x and windows. This font
replacement technology, however, is not perfect. So, sometimes you'll
see squares or question marks here or there, especially on some chars
that are not widely used (e.g. math symbols in unicode, double right
arrow, tech symbols such as Apple's command key and option key, triple
asterisk, etc.).

• when writing a file, the software needs to use an encoding to write
it. Just like reading, if you haven't explicitly set it, typically it
uses ascii or some iso latin char set, in most western lang countries.

• when you use software to open a text with the wrong encoding info,
the result is gibberish.

the above applies not just to emacs, but to all apps. Some of this
commentary is based on my experiences with browsers, web pages, word
processors, online forums, mailing lists, email apps, instant messaging
chat apps, etc, on both mac and windows.

technically, the issues involved are char set, encoding, and font. (the
concepts of char set and encoding are independent but are often mixed
together in a spec, esp. earlier ones).

i use mixed chinese & english in single file often and in both mac os
x and windows. They work well. On the mac, my emacs is version 22.x.
On win, it is emacs23. My encoding in emacs is set to utf-8.

I've written a lot about these issues; the following docs might be
helpful.

• Emacs and Unicode Tips
  http://xahlee.org/emacs/emacs_n_unicode.html

• Unicode Characters Example
  http://xahlee.org/Periodic_dosage_dir/t1/20040505_unicode.html

• the Journey of a Foreign Character thru Internet
  http://xahlee.org/Periodic_dosage_dir/t2/non-ascii_journey.html

• Converting a File's Encoding with Python
  http://xahlee.org/perl-python/charset_encoding.html

• Character Sets and Encoding in HTML
  http://xahlee.org/js/html_chars.html

• The Complexity And Tedium of Software Engineering (parts about
unicode problem with unison and emacs)
  http://xahlee.org/UnixResource_dir/writ/programer_frustration.html

• Mac and Windows File Conversion (parts about unicode filename
issues)
  http://xahlee.org/mswin/mac_windows_file_conv.html

• Windows Font and Unicode
  http://xahlee.org/mswin/windows_font_unicode.html

the above articles contain tens of links to Wikipedia in appropriate
places. Wikipedia has massive info in digestible form about these
issues; one can spend a month on the above foreign char issues ...

for some examples of mixed chinese & english text i work with, see:

• Chinese Core Simplified Chars
  http://xahlee.org/lojban/simplified_chars.html

• Ethology, Ethnology, and Lyrics
  http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html

  Xah
∑ http://xahlee.org/

☄

On Jun 12, 9:48 am, "B. T. Raven" <ni...@nihilo.net> wrote:

> I wouldn't be surprised if the gaps and overlaps in the CJK ranges of
> glyphs weren't so complicated that many characters from the following
> encodings may not be included in utf-8, especially if they are not
> precomposed. Try some of these encodings to see if some of the empty
> boxes are resolved into characters:
>
>             chinese-big5
>             chinese-hz
>             chinese-iso-7bit
>             chinese-iso-8bit
>             chinese-iso-8bit-with-esc
>             cn-big5
>             cn-gb
>             cn-gb-2312
>             iso-2022-cjk
>             iso-2022-cn
>             iso-2022-cn-ext

most chinese encodings' charsets are a subset of, or identical to,
unicode's charset.

In particular, the current, most widely used chinese charset, GB
18030, actually is just unicode.

see http://en.wikipedia.org/wiki/GB_18030

Note also, that means china's GB 18030 contains the entirety of the
traditional chars in unicode too. (though, i don't know about how big5
relates to unicode )
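
The GB 18030 claim can be checked with a minimal Python sketch, using
Python's "gb18030" and "gb2312" codecs (chosen here for illustration):
a string mixing a Latin letter with macron, Chinese characters and a
symbol survives a round trip through GB 18030, and for the pronoun
U+4F60 the GB 18030 bytes match the older GB 2312 bytes.

```python
# a with macron, two Chinese characters, and the comet symbol
sample = "\u0101 \u4f60\u597d \u2604"

gb18030_bytes = sample.encode("gb18030")
round_trip = gb18030_bytes.decode("gb18030")

# Nothing is lost going through GB 18030 and back.
print(round_trip == sample)

# For U+4F60, GB 18030 keeps the same bytes as GB 2312.
same_bytes = "\u4f60".encode("gb18030") == "\u4f60".encode("gb2312")
print(same_bytes)
```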

the list you gave above is from emacs? emacs's list always seems
strange to me... haven't really looked into it. maybe emacs's list
really does encompass all the encodings that have existed, but it also
could be just screwed up like many open source things. For example, it
invents its own names by mixing up char set encoding with concepts of
EOL convention.
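
On names that combine char set encoding with the EOL convention: a
coding-system name such as utf-8-dos (it appears in the init snippet
quoted earlier in this thread) records two independent choices, how
characters become bytes and how line ends are written. A minimal Python
sketch of that separation follows; encode_with_eol is a hypothetical
helper for illustration, not anything Emacs provides.

```python
def encode_with_eol(text: str, charset: str, eol: str) -> bytes:
    """Write line ends as `eol`, then encode the characters with `charset`."""
    return text.replace("\n", eol).encode(charset)


text = "n\u01d0\nh\u01ceo\n"  # two short lines of pinyin

unix_bytes = encode_with_eol(text, "utf-8", "\n")    # LF line ends
dos_bytes = encode_with_eol(text, "utf-8", "\r\n")   # CR LF line ends

print(unix_bytes.hex())
print(dos_bytes.hex())
```

The character bytes are identical in both results; only the line-end
bytes differ, which is the part the -dos or -unix suffix describes.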

btw, who actually coded the low-level parts of char encoding in emacs?
e.g. especially unicode, since it came after the time richard stallman
was still doing the bulk of emacs. That person should be admired. lol.

  Xah
∑ http://xahlee.org/

☄


* Re: utf8 char display in buffer
  2009-06-12 17:27                 ` Xah Lee
@ 2009-06-12 19:30                   ` Lewis Perin
  2009-06-12 19:43                     ` Xah Lee
  2009-06-12 20:56                   ` B. T. Raven
  2009-06-13 20:35                   ` Lewis Perin
  2 siblings, 1 reply; 56+ messages in thread
From: Lewis Perin @ 2009-06-12 19:30 UTC (permalink / raw
  To: help-gnu-emacs

Xah Lee <xahlee@gmail.com> writes:

> [...]
> i use mixed chinese & english in single file often and in both mac os
> x and windows. They work well. On the mac, my emacs is version 22.x.
> On win, it is emacs23. My encoding in emacs is set to utf-8.
> 
> I've wrote a lot about these issues, the following docs might be
> helpful.
> [...]

I'll assume you have no trouble with ni3, the normal second person
pronoun, and have a look at your collected works.  Thanks!

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html



* Re: utf8 char display in buffer
  2009-06-12 19:30                   ` Lewis Perin
@ 2009-06-12 19:43                     ` Xah Lee
  0 siblings, 0 replies; 56+ messages in thread
From: Xah Lee @ 2009-06-12 19:43 UTC (permalink / raw
  To: help-gnu-emacs

On Jun 12, 12:30 pm, Lewis Perin <pe...@panix.com> wrote:
> I'll assume you have no trouble with ni3, the normal second person
> pronoun, and have a look at your collected works.  Thanks!

yeah, no prob with ni3 hao3 你好. This is written in emacs then pasted
to google groups.

 Xah



* Re: utf8 char display in buffer
  2009-06-12 17:27                 ` Xah Lee
  2009-06-12 19:30                   ` Lewis Perin
@ 2009-06-12 20:56                   ` B. T. Raven
  2009-06-13 16:16                     ` Xah Lee
  2009-06-13 20:35                   ` Lewis Perin
  2 siblings, 1 reply; 56+ messages in thread
From: B. T. Raven @ 2009-06-12 20:56 UTC (permalink / raw
  To: help-gnu-emacs

Xah Lee wrote:
> On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
>> B) It would be helpful if the code which does the decoding of a file and
>> renders it into the buffer display, if that part of it would throw an
>> error message when it encounters a character it doesn't know how to
>> display, i.e., when a little box character is displayed. After all,
>> isn't it an error when a little box is displayed in lieu of the correct
>> character? Possible error messages would be something like: "decoding
>> process can't find /path/to/charset.file" or "decoding process doesn't
>> have requisite permission to read /path/to/charset.file" or "invalid
>> character: [hex/decimal value]" or other.
> 
> some thought process in the above is not correct.
> 
> In general, a program just read a text file as a byte stream, and
> using a encoding scheme to interprete it, the program has little way
> to determine if the encoding is correct. Theoretically, it could check
> with command phrases but that is generally not done by the software we
> use daily. (some program does scan text guess a encoding, but not
> always correct)
> 
> here's some general technical issues and experiences about using
> foreign chars:
> 
> • the software needs to know what encoding & char set is used in order
> to interprete the binary stream. If you don't specifically set it,
> typically it assumes ascii or some iso latin char set. (of software in
> USA anyway)
> 
> • today's software generally don't contain any extra heuistics to
> check if the encoding used is actually correct. There is no technical
> way to check that in general. It can be only heuristics, i.e. guesses.
> e.g. browsers will often guess when reading a page that doesn't have
> encoding info.
> 
> • even when the encoding is correct, the software needs all the proper
> fonts to display it. Or, rely on some font-replacement technology,
> e.g. when it finds a char which the current font doesn't have, it uses
> another font for that char. (in the case of Chinese, this often
> results in ugly text of mixed char style, some appear thin, some
> thick, some squarly (like sans-serif), some caligraphic, some
> bitmapped) Windows OS and OS X both has font-replacement technology,
> as well as all the major browsers for both os x and windows. This font
> replacement technology, however, is not perfect. So, sometimes you'll
> see squares or question marks here or there, especially on some chars
> that's not widely used (e.g. math symbols in unicode, double right
> arrow, tech symbols such as Apple's command key and option key, triple
> asterisk, etc.).
> 
> • when writing a file, the software needs to use a encoding to write
> it. Just like reading, if you havn't explicitly set it, typically it
> uses ascii or some iso latin char set, in most western lang countries.
> 
> • when you use a software to open a text but with wrong encoding info,
> the result is gibberish.
> 
> the above applies not just to emacs, but applies to all apps. Some
> commentary are based on my experiences with browsers, web pages, word
> processors, online forums, mailing list, email apps, instant messaging
> chat apps, etc, on both mac and windows.
> 
> technically, the issues involved is char set, encoding, font. ( the
> concept of char set and encoding are independent but is often mixed
> together in a spec, esp earlier ones).
> 
> i use mixed chinese & english in single file often and in both mac os
> x and windows. They work well. On the mac, my emacs is version 22.x.
> On win, it is emacs23. My encoding in emacs is set to utf-8.
> 
> I've wrote a lot about these issues, the following docs might be
> helpful.
> 
> • Emacs and Unicode Tips
>   http://xahlee.org/emacs/emacs_n_unicode.html
> 
> • Unicode Characters Example
>   http://xahlee.org/Periodic_dosage_dir/t1/20040505_unicode.html
> 
> • the Journey of a Foreign Character thru Internet
>   http://xahlee.org/Periodic_dosage_dir/t2/non-ascii_journey.html
> 
> • Converting a File's Encoding with Python
>   http://xahlee.org/perl-python/charset_encoding.html
> 
> • Character Sets and Encoding in HTML
>   http://xahlee.org/js/html_chars.html
> 
> • The Complexity And Tedium of Software Engineering (parts about
> unicode problem with unison and emacs)
>   http://xahlee.org/UnixResource_dir/writ/programer_frustration.html
> 
> • Mac and Windows File Conversion (parts about unicode filename
> issues)
>   http://xahlee.org/mswin/mac_windows_file_conv.html
> 
> • Windows Font and Unicode
>   http://xahlee.org/mswin/windows_font_unicode.html
> 
> the above article contain tens of links to Wikipedia in appropriate
> places. Wikipedia has massive info in digestable form about these
> issues, one can spend a month on the above foreign char issues ...
> 
> for some examples of mixed chinese & english text i work with, see:
> 
> • Chinese Core Simplified Chars
>   http://xahlee.org/lojban/simplified_chars.html
> 
> • Ethology, Ethnology, and Lyrics
>   http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html
> 
>   Xah
> ∑ http://xahlee.org/
> 
> ☄

Totally OT but prima facie the most interesting title is the last.
Unfortunately I couldn't grok what ethology (the "anthropology" of
animals) had to do with it unless the critters that emit "The Masochistic
Cries of Lovelorn Females" are to be considered as less than human. I 
notice that Salt-n-Pepa's sweet little ditty (Don't want no S.D.M.) is 
missing from the list, but maybe that's more sadistic than masochistic; 
maybe it belongs in the Quagmire. ;-) Sexology is a bona fide area of 
inquiry pioneered by Kinsey et al. but sexualogy is not an English word 
nor (I keep my fingers crossed) will it ever become one.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 17:53                     ` Xah Lee
@ 2009-06-12 20:59                       ` Lennart Borgman
  2009-06-12 22:23                       ` ken
       [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 56+ messages in thread
From: Lennart Borgman @ 2009-06-12 20:59 UTC (permalink / raw
  To: Xah Lee; +Cc: help-gnu-emacs

On Fri, Jun 12, 2009 at 7:53 PM, Xah Lee<xahlee@gmail.com> wrote:
> the list you gave above is from emacs? emacs's list always seems
> strange to me... haven't really looked into it. maybe emacs's list is
> really encompassing of all encoding that've existed, but it also could
> be just screwed up like many open source things.

I do not know these things, but from the discussions on Emacs devel it
looks like those coding it in Emacs know it very well.


> For example, it
> invents its own names by mixing up char set encoding with concepts of
> EOL convention.

It is a technical consideration. Hopefully it does not confuse anyone.


> btw, who actually coded the low down levels of char encoding in emacs?
> e.g. especially unicode, since it came after richard stallman still
> doing the bulk of emacs. That person should be admirable. lol.

Please look in the change log files.  (I think you need to check out
the sources to see those. Or look in the web interface for ditto, of
course.)




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 17:53                     ` Xah Lee
  2009-06-12 20:59                       ` Lennart Borgman
@ 2009-06-12 22:23                       ` ken
  2009-06-12 22:27                         ` Lennart Borgman
       [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 56+ messages in thread
From: ken @ 2009-06-12 22:23 UTC (permalink / raw
  To: GNU Emacs List

On 06/12/2009 01:53 PM Xah Lee wrote:
> On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
>> B) It would be helpful if the code which does the decoding of a file and
>> renders it into the buffer display, if that part of it would throw an
>> error message when it encounters a character it doesn't know how to
>> display, i.e., when a little box character is displayed. After all,
>> isn't it an error when a little box is displayed in lieu of the correct
>> character? Possible error messages would be something like: "decoding
>> process can't find /path/to/charset.file" or "decoding process doesn't
>> have requisite permission to read /path/to/charset.file" or "invalid
>> character: [hex/decimal value]" or other.
> 
> some thought process in the above is not correct.

Yet emacs puts a little box in the place of a character it cannot find
(or, per your explanation, is possibly confused about).  The fact remains
that the little box is not a correct rendering of the code.  It is an
error... at least it is for me, because that's not what I typed in.  So
it is an error.  As an error, there should be a corresponding error
message, hopefully one (or more) which would help diagnose the problem.
 It seems obvious that, given the long thread on this issue with no
resolution, we could use some help-- like an error message-- which would
help in diagnosis.

Thanks for the information and the links though.

> 
> ....

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 22:23                       ` ken
@ 2009-06-12 22:27                         ` Lennart Borgman
  2009-06-12 23:38                           ` ken
  2009-06-13  1:36                           ` Miles Bader
  0 siblings, 2 replies; 56+ messages in thread
From: Lennart Borgman @ 2009-06-12 22:27 UTC (permalink / raw
  To: gebser, Emacs-Devel devel

Ken, I think this is a good idea so I have sent this along to Emacs devel.

On Sat, Jun 13, 2009 at 12:23 AM, ken<gebser@mousecar.com> wrote:
> Yet emacs puts a little box in the place of a character it cannot find
> (or, per your explanation) possibly confused about.  The fact remains
> that the little box is not a correct rendering of the code.  It is an
> error... at least it is for me, because that's not what I typed in.  So
> it is an error.  As an error, there should be a corresponding error
> message, hopefully one (or more) which would help diagnose the problem.
>  It seems obvious that, given the long thread on this issue with no
> resolution, we could use some help-- like an error message-- which would
> help in diagnosis.




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 22:27                         ` Lennart Borgman
@ 2009-06-12 23:38                           ` ken
  2009-06-13  4:11                             ` Eli Zaretskii
  2009-06-14 20:59                             ` Stefan Monnier
  2009-06-13  1:36                           ` Miles Bader
  1 sibling, 2 replies; 56+ messages in thread
From: ken @ 2009-06-12 23:38 UTC (permalink / raw
  To: Lennart Borgman; +Cc: Emacs-Devel devel

On 06/12/2009 06:27 PM Lennart Borgman wrote:
> Ken, I think this is a good idea so I have sent this along to Emacs devel.
> 
> On Sat, Jun 13, 2009 at 12:23 AM, ken<gebser@mousecar.com> wrote:
>> Yet emacs puts a little box in the place of a character it cannot find
>> (or, per your explanation) possibly confused about.  The fact remains
>> that the little box is not a correct rendering of the code.  It is an
>> error... at least it is for me, because that's not what I typed in.  So
>> it is an error.  As an error, there should be a corresponding error
>> message, hopefully one (or more) which would help diagnose the problem.
>>  It seems obvious that, given the long thread on this issue with no
>> resolution, we could use some help-- like an error message-- which would
>> help in diagnosis.

Thank you, Lennart!  To give the people at emacs-devel some context to
the issue, the salient portion of the previous post is pasted below:

0) Some others and myself want to include some non-English characters in
a file being edited in emacs. Problems arise, however:

1) In a buffer which is already utf-8 encoded, I set the appropriate
input method, type in the desired characters. They display just peachy
and there is happiness in EmacsLand.

2) I save the buffer to a file, then close the buffer.

3) I visit the same file (i.e., load it again into emacs). Because it
has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
utf-8 encoded. This is confirmed by the presence of a 'u' as the second
character in the status bar.

4) The text in the buffer displays fine, except that in place of each of
those non-English characters is a little empty box. With the cursor on
one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
=", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".
(While, in emacs the character after "Char:" is a little box, if I load
this same file into Firefox, that same character appears as it should,
as an 'a' with a horizontal bar above it. How it appears in your email
client will depend upon your email client.)

A) The fact that, as described in (4), the characters display correctly
in Firefox, but not in emacs indicates that emacs is not drawing on the
needed character set. Yet, the fact that in (1) the characters initially
display correctly (when first input) indicates that the needed character
set is present on the system and emacs can find it and has permission
access it. Further, we would think that emacs would throw out an error
message if either of these conditions were not met... and it doesn't. We
can only assume that, when visiting and then decoding a file and pulling
into a buffer for display, emacs is not even asking for the proper
character set when encountering a non-English character. This is where I
would start to look for the error.

B) It would be helpful if the code which does the decoding of a file and
renders it into the buffer display, if that part of it would throw an
error message when it encounters a character it doesn't know how to
display, i.e., when a little box character is displayed. After all,
isn't it an error when a little box is displayed in lieu of the correct
character? Possible error messages would be something like: "decoding
process can't find /path/to/charset.file" or "decoding process doesn't
have requisite permission to read /path/to/charset.file" or "invalid
character: [hex/decimal value]" or other.

###

Thanks much,
ken

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
       [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
@ 2009-06-13  0:35                         ` Xah Lee
  0 siblings, 0 replies; 56+ messages in thread
From: Xah Lee @ 2009-06-13  0:35 UTC (permalink / raw
  To: help-gnu-emacs

On Jun 12, 3:23 pm, ken <geb...@mousecar.com> wrote:
> On 06/12/2009 01:53 PM Xah Lee wrote:
>
> > On Jun 12, 7:54 am, ken <geb...@mousecar.com> wrote:
> >> B) It would be helpful if the code which does the decoding of a file and
> >> renders it into the buffer display, if that part of it would throw an
> >> error message when it encounters a character it doesn't know how to
> >> display, i.e., when a little box character is displayed. After all,
> >> isn't it an error when a little box is displayed in lieu of the correct
> >> character? Possible error messages would be something like: "decoding
> >> process can't find /path/to/charset.file" or "decoding process doesn't
> >> have requisite permission to read /path/to/charset.file" or "invalid
> >> character: [hex/decimal value]" or other.
>
> > some thought process in the above is not correct.
>
> Yet emacs puts a little box in the place of a character it cannot find
> (or, per your explanation) possibly confused about.  The fact remains
> that the little box is not a correct rendering of the code.  It is an
> error... at least it is for me, because that's not what I typed in.  So
> it is an error.  As an error, there should be a corresponding error
> message, hopefully one (or more) which would help diagnose the problem.
>  It seems obvious that, given the long thread on this issue with no
> resolution, we could use some help-- like an error message-- which would
> help in diagnosis.
>
> Thanks for the information and the links though.

i think displaying an error for each char that emacs cannot find a font
for is just not feasible. The app can't know whether it used the right
encoding. And even if the encoding used is correct, it can't deal with
possibly missing fonts for some of the characters in the char set.

i don't have experience in this, but imagine an app gets a byte
stream together with a given charset/encoding. With that, it can decode
the byte sequences into code points of the char set. (e.g. utf-8 and
utf-16 both don't have a fixed byte length per char.) After that's done,
you have a sequence of code points (i.e. a sequence of integers). At
this point, given an integer, you need to map this integer to a
character in a font. There are many issues here... a font i guess is a
set of glyphs... ultimately a set of integers. I'm not sure what sort
of spec or standard specifies what each integer means (i.e. suppose
your app now has an integer that represents B. Now suppose your app is
set to use the font Arial. Now, Arial is a set of integers, but what
standard says which integer is B?)... Part of this step is what
happens when Arial doesn't have that character. (i'm guessing a font also
has data about what character set it contains...)
But in any case, finally we'll have a B from the font Arial. Then it goes
thru the whole display process...
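
The decoding step described above can be sketched in Python (this is
only an illustration, not anything Emacs itself runs, and the sample
string is made up). It shows that UTF-8 spends a different number of
bytes on different characters, and that decoding yields plain integers,
the code points:

```python
# Sketch: decode a UTF-8 byte stream into code points (integers).
sample = "aā中"                      # illustrative text, not from the thread
data = sample.encode("utf-8")        # the byte stream an app would receive

# Each character takes a different number of bytes in UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in sample]

# Decoding gives back characters; ord() exposes the code point integers.
code_points = [ord(ch) for ch in data.decode("utf-8")]

print(byte_lengths)                            # bytes used per character
print([f"U+{cp:04X}" for cp in code_points])   # code points in U+ notation
```

Mapping each of those integers to a glyph in a font is a separate,
later step, and that is the step that fails when a box is shown.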

overall i think the technology we have today that actually displays
fonts and unicode text etc is extremely complex, not to mention
vector based fonts and anti-aliasing and font-substitution etc techs.

some interesting read here:

http://en.wikipedia.org/wiki/Computer_font
http://en.wikipedia.org/wiki/Anti-aliasing
http://en.wikipedia.org/wiki/Font_rasterization
http://en.wikipedia.org/wiki/Subpixel_rendering
http://en.wikipedia.org/wiki/Font-substitution

for most modern apps, like browsers, i think they all call the OS's APIs
to handle it. Some glimpses of the emacs dev list seem to suggest that
emacs implements its own display system... on one hand it's bad
because emacs misses out on all the modern techs developed over 2 decades
by Apple or Adobe or Microsoft, or some Open Source work; on the
other hand it is admirable in that it does it on its own...

sorry am rambling a bit. You are right that the bottom line is that
some things just get rendered as squares, and that is a problem. Though, i
wanted to say that my point was that it is unfeasible to issue an error
for missing fonts or misinterpretation of the encodings. Part of
this is because theoretically there's no way to know that the encoding
chosen is correct. Part is because in practice a missing font or badly
chosen encoding is very common. If we all stick with ascii, everything
is pretty good. If we stick to western langs, things are still not too
bad. But once you have chinese, japanese, korean alphabets, or the
occasional use of the many math symbols and greek letters, or adding
cyrillic/russian alphabets or arabic alphabets ... the chances of
a missing font or missing encoding info are very high.
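
A small Python sketch of why a wrong encoding guess can't be detected
in general (illustrative only; picking Latin-1 as the wrong guess is an
arbitrary choice). The two bytes below are the UTF-8 encoding of ā, yet
they also decode without any error as Latin-1, just to different text:

```python
# Sketch: the same bytes decode "successfully" under two encodings.
raw = b"\xc4\x81"                    # UTF-8 encoding of ā (U+0101)

as_utf8 = raw.decode("utf-8")        # the intended reading: one character
as_latin1 = raw.decode("latin-1")    # a wrong guess: two characters, no error

print(len(as_utf8), len(as_latin1))
print([hex(ord(ch)) for ch in as_utf8])
print([hex(ord(ch)) for ch in as_latin1])
```

Neither call raises, so the decoder alone has nothing to report; only a
human (or a declaration such as a coding line in the file) can say which
reading was meant.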

i think a large part of the problem is that char set and encoding info
is not part of the file. Things have been getting better in the past decade
with mime types and the unicode standard. But given a byte stream, even after
being lucky enough to know it is text, there's still little way to
know how to interpret it. The char set and encoding meta data often
gets lost, implementations are often not robust, fonts for multiple langs
usually are not there, and font-substitution tech has just started.
(according to Wikipedia, IE before 7 does not even have font
substitution (which means, you really need such a beast as a “unicode
font”, namely a font that contains some tens or hundreds of thousands of
glyphs))

i think all these issues only started to get addressed in the past
decade, with the globalization partly due to the internet. Before, English
speakers just stuck with ascii and that was pretty sufficient. Each
western lang region stuck with its particular encoding for a few
special chars in its alphabet. Only when things started to mix did they
get more complex, and now with Chinese & japanese etc. With unicode,
the use of math symbols also becomes more common. Before that, it's
just ascii markup...

speaking of this. Emacs and FSF docs still stick with the 1980s `quote
hack', and arrows like this ->  => ... very extremely stupid. Of
course i filed polite bug reports, and have argued here too, heatedly,
but it has basically fallen on deaf ears. Some things are just impossible to
progress in the FSF world.

  Xah
∑ http://xahlee.org/

☄

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 22:27                         ` Lennart Borgman
  2009-06-12 23:38                           ` ken
@ 2009-06-13  1:36                           ` Miles Bader
  2009-06-13  1:43                             ` Lennart Borgman
  2009-06-13  5:50                             ` Richard Stallman
  1 sibling, 2 replies; 56+ messages in thread
From: Miles Bader @ 2009-06-13  1:36 UTC (permalink / raw
  To: emacs-devel

Lennart Borgman <lennart.borgman@gmail.com> writes:
> Ken, I think this is a good idea so I have sent this along to Emacs devel.
>
>> Yet emacs puts a little box in the place of a character it cannot find
>> (or, per your explanation) possibly confused about.  The fact remains
>> that the little box is not a correct rendering of the code.  It is an
>> error... at least it is for me, because that's not what I typed in.  So
>> it is an error.  As an error, there should be a corresponding error
>> message, hopefully one (or more) which would help diagnose the problem.

An "error message" _when_?  Whether a character is displayable or not
isn't known until display time, and error messages for display issues
are generally a very bad idea.  For some display errors, the display
code will put messages in the *Messages* buffer (though they aren't
displayed to the user), but even that must be done with careful
consideration; currently they typically indicate that something
seriously screwy is going on (and a non-displayable character, however
annoying, isn't "seriously screwy").

If an "error message" were displayed, what would it say?  The only thing
I can think of is "no font could be found to display character FOO", but
that fact is already obvious from the little box.  Given no extra
detail, is a message even useful?

Maybe some sort of _once-only_ pop-up buffer note to the user saying
"little boxes indicate characters for which a font could not be found; see
info manual section X.Y for details on blah blah"?  I suppose that could
help some people, but such a thing, even if once-only, would probably be
pretty annoying to non-complete-noob users, so ...

-Miles

-- 
[|nurgle|]  ddt- demonic? so quake will have an evil kinda setting? one that
            will  make every christian in the world foamm at the mouth?
[iddt]      nurg, that's the goal

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-13  1:36                           ` Miles Bader
@ 2009-06-13  1:43                             ` Lennart Borgman
  2009-06-13  5:50                             ` Richard Stallman
  1 sibling, 0 replies; 56+ messages in thread
From: Lennart Borgman @ 2009-06-13  1:43 UTC (permalink / raw
  To: Miles Bader; +Cc: emacs-devel

On Sat, Jun 13, 2009 at 3:36 AM, Miles Bader<miles@gnu.org> wrote:
> Lennart Borgman <lennart.borgman@gmail.com> writes:
>> Ken, I think this is a good idea so I have sent this along to Emacs devel.
>>
>>> Yet emacs puts a little box in the place of a character it cannot find
>>> (or, per your explanation) possibly confused about.  The fact remains
>>> that the little box is not a correct rendering of the code.  It is an
>>> error... at least it is for me, because that's not what I typed in.  So
>>> it is an error.  As an error, there should be a corresponding error
>>> message, hopefully one (or more) which would help diagnose the problem.
>
> An "error message" _when_?  Whether a character is displayable or not
> isn't known until display time, and error messages for display issues
> are generally a very bad idea.  For some display errors, the display
> code will put messages in the *Messages* buffer (though they aren't
> displayed to the user), but even that must be done with careful
> consideration; currently they typically indicate that something
> seriously screwy is going on (and a non-displayable character, however
> annoying, isn't "seriously screwy").
>
> If an "error message" were displayed, what would it say?  The only thing
> I can think of is "no font could be found to display character FOO", but
> that fact is already obvious from the little box.  Given no extra
> detail, is a message even useful?
>
> Maybe some sort of _once-only_ pop-up buffer note to the user saying
> "little boxes indicate characters for which a font could not be found; see
> info manual section X.Y for details on blah blah"?  I suppose that could
> help some people, but such a thing, even if once-only, would probably be
> pretty annoying to non-complete-noob users, so ...

Yes, I know you have to be careful with error messages in a case like
this, but giving some type of information (like the one you suggested)
the first time in an emacs session when such a little box is shown was
what I had in mind.




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 14:54               ` ken
@ 2009-06-13  3:30                 ` Eli Zaretskii
  0 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2009-06-13  3:30 UTC (permalink / raw
  To: help-gnu-emacs

> Date: Fri, 12 Jun 2009 10:54:23 -0400
> From: ken <gebser@mousecar.com>
> 1) In a buffer which is already utf-8 encoded, I set the appropriate
> input method, type in the desired characters. They display just peachy
> and there is happiness in EmacsLand.
> 
> 2) I save the buffer to a file, then close the buffer.
> 
> 3) I visit the same file (i.e., load it again into emacs). Because it
> has <!-- -*- coding: utf-8; -*- --> as the first line, it opens
> utf-8 encoded. This is confirmed by the presence of a 'u' as the second
> character in the status bar.
> 
> 4) The text in the buffer displays fine, except that in place of each of
> those non-English characters is a little empty box. With the cursor on
> one of those boxes, an 'a' with a horizontal bar above it, doing "C-x
> =", emacs returns "Char: ā (01210041, 331809, 0x51021, file ...)".

Please post here the full output of "C-u C-x =" (from a buffer popped
up by Emacs) for these characters, both when you type them using the
appropriate input method and they are displayed correctly (as in 1)
above), and when you see them as empty boxes after revisiting the
file.  The differences between these two cases should give you a hint
what is wrong; if not, someone else here might have ideas.




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 23:38                           ` ken
@ 2009-06-13  4:11                             ` Eli Zaretskii
  2009-06-13 12:30                               ` ken
  2009-06-14 20:59                             ` Stefan Monnier
  1 sibling, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2009-06-13  4:11 UTC (permalink / raw
  To: gebser; +Cc: lennart.borgman, emacs-devel

> Date: Fri, 12 Jun 2009 19:38:30 -0400
> From: ken <gebser@mousecar.com>
> Cc: Emacs-Devel devel <emacs-devel@gnu.org>
> Reply-To: gebser@mousecar.com
> 
> Thank you, Lennart!  To give the people at emacs-devel some context to
> the issue, the salient portion of the previous post is pasted below:

Please provide the output of "C-u C-x =" on these characters, both
when they are displayed correctly and when they are displayed as empty
boxes.




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-13  1:36                           ` Miles Bader
  2009-06-13  1:43                             ` Lennart Borgman
@ 2009-06-13  5:50                             ` Richard Stallman
  2009-06-15  4:34                               ` Miles Bader
  2009-06-15 20:06                               ` Chong Yidong
  1 sibling, 2 replies; 56+ messages in thread
From: Richard Stallman @ 2009-06-13  5:50 UTC (permalink / raw
  To: Miles Bader; +Cc: emacs-devel

Would it be possible to display the codepoint numerically in the box?




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-13  4:11                             ` Eli Zaretskii
@ 2009-06-13 12:30                               ` ken
  2009-06-13 13:23                                 ` Eli Zaretskii
  0 siblings, 1 reply; 56+ messages in thread
From: ken @ 2009-06-13 12:30 UTC (permalink / raw
  To: Eli Zaretskii, GNU Emacs List, emacs-devel

On 06/13/2009 12:11 AM Eli Zaretskii wrote:
>> ....
> 
> Please provide the output of "C-u C-x =" on these characters, both
> when they are displayed correctly and when they are displayed as empty
> boxes.

In a similar post on the same thread Eli Zaretskii wrote:
> Please post here the full output of "C-u C-x =" (from a buffer popped
> up by Emacs) for these characters, both when you type them using the
> appropriate input method and they are displayed correctly (as in 1)
> above), and when you see them as empty boxes after revisiting the
> file.  The differences between these two cases should give you a hint
> what is wrong; if not, someone else here might have ideas.

Eli, thanks for your response.  Here it is:

^[$-1 ¡ is 'a' with a horizontal bar over it.  On first inputting it
(after doing "set-input-method latin-4-postfix" and before changing the
input method to anything else), it appears correctly and "C-u C-x =" yields:

=============================================

  character: ^[$-1 ¡ (05140, 2656, 0xa60)
    charset: latin-iso8859-4
	     (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
 code point: 96
     syntax: word
   category: l:Latin
buffer code: 0x84 0xE0
  file code: 0xC4 0x81 (encoded by coding system mule-utf-8-unix)
       font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-4

=============================================

When I reload the file (revisit the file), the same character is
replaced with a little box.  Doing "C-u C-x =" here yields:

=============================================

  character: ^[$-1 ¡ (01210041, 331809, 0x51021)
    charset: mule-unicode-0100-24ff
	     (Unicode characters of the range U+0100..U+24FF.)
 code point: 32 33
     syntax: word
   category: l:Latin
buffer code: 0x9C 0xF4 0xA0 0xA1
  file code: 0xC4 0x81 (encoded by coding system mule-utf-8-unix)
       font: -- none --

=============================================

Note: For some reason, possibly related, had difficulty copying the
above text from emacs into clipboard (i.e., "M-w" didn't do anything),
so had to use a workaround.  It seems that this workaround altered the
character in question, the one above following each of the two instances
of "character:".

As for the meaning of the two outputs above, all that I can confidently
glean is that, if I want to use non-English characters in emacs, I have
to be an expert emacs developer.  :)
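
For anyone else puzzling over the numbers in parentheses: in both
outputs they are one integer written in octal, decimal and hex. The
Python sketch below (purely illustrative) checks that, and also pulls
out the two lowest 7-bit fields, which happen to match the "code point"
lines. Reading the number as 7-bit fields is an inference from the
output shown here, not something taken from the Emacs documentation.

```python
# Sketch: the three parenthesized values are the same number in three bases.
first = (int("05140", 8), 2656, 0xA60)            # when typed (latin-iso8859-4)
second = (int("01210041", 8), 331809, 0x51021)    # after revisiting the file

def low_fields(n):
    """Return the two lowest 7-bit fields of n (an assumed layout)."""
    return (n >> 7) & 0x7F, n & 0x7F

print(first, second)
print(low_fields(0xA60))     # second field matches "code point: 96"
print(low_fields(0x51021))   # matches "code point: 32 33"
```

So the same typed character ends up with two different internal
numbers, one per charset, even though the "file code" line is identical.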

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-13 12:30                               ` ken
@ 2009-06-13 13:23                                 ` Eli Zaretskii
  0 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2009-06-13 13:23 UTC (permalink / raw
  To: gebser; +Cc: emacs-devel

> Date: Sat, 13 Jun 2009 08:30:37 -0400
> From: ken <gebser@mousecar.com>
> Reply-To:  gebser@mousecar.com
> 
> ^[$-1 ¡ is 'a' with a horizontal bar over it.  On first inputting it
> (after doing "set-input-method latin-4-postfix" and before changing the
> input method to anything else), it appears correctly and "C-u C-x =" yields:
> 
> =============================================
> 
>   character: ^[$-1 ¡ (05140, 2656, 0xa60)
>     charset: latin-iso8859-4
> 	     (Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110)
>  code point: 96
>      syntax: word
>    category: l:Latin
> buffer code: 0x84 0xE0
>   file code: 0xC4 0x81 (encoded by coding system mule-utf-8-unix)
>        font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-4
> 
> =============================================
> 
> When I reload the file (revisit the file), the same character is
> replaced with a little box.  Doing "C-u C-x =" here yields:
> 
> =============================================
> 
>   character: ^[$-1 ¡ (01210041, 331809, 0x51021)
>     charset: mule-unicode-0100-24ff
> 	     (Unicode characters of the range U+0100..U+24FF.)
>  code point: 32 33
>      syntax: word
>    category: l:Latin
> buffer code: 0x9C 0xF4 0xA0 0xA1
>   file code: 0xC4 0x81 (encoded by coding system mule-utf-8-unix)
>        font: -- none --
> 
> =============================================

So I think everything is clear now: you have a font that covers these
characters when they are from the 8859-4 character set, but you do not
have a font that covers them in Unicode.  You should install a
Unicode font that supports these characters.
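
To make the diagnosis concrete, here is a Python sketch (an
illustration, not part of Emacs) showing that the ISO 8859-4 byte 0xE0,
which is the second byte of the first "buffer code" above, and the
UTF-8 bytes 0xC4 0x81 from the "file code" lines name the same
character, U+0101:

```python
import unicodedata

# Sketch: one character, two encodings.
from_latin4 = b"\xe0".decode("iso8859_4")     # ISO 8859-4 (Latin-4) byte
from_utf8 = b"\xc4\x81".decode("utf-8")       # "file code" bytes in both outputs

print(from_latin4 == from_utf8)               # same character either way
print(f"U+{ord(from_utf8):04X}", unicodedata.name(from_utf8))
```

If that reading is right, the file on disk is fine in both cases; what
differs is the charset the character is assigned in the buffer, and
therefore which font is looked up for it.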

> As for the meaning of the two outputs above, all that I can confidently
> glean is that, if I want to use non-English characters in emacs, I have
> to be an expert emacs developer.  :)

That's an exaggeration, I think.  You can use the "C-u C-x =" command,
just as you did above, to find out what Emacs thinks about each
character that is displayed as an empty box.  You can then look for
fonts that cover these characters.  "C-u C-x =" is a user-level
command, and one of its uses is precisely this: to find out what fonts
are missing on your machine.




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 20:56                   ` B. T. Raven
@ 2009-06-13 16:16                     ` Xah Lee
  0 siblings, 0 replies; 56+ messages in thread
From: Xah Lee @ 2009-06-13 16:16 UTC (permalink / raw
  To: help-gnu-emacs

On Jun 12, 1:56 pm, "B. T. Raven" <ni...@nihilo.net> wrote:

> > • Ethology, Ethnology, and Lyrics
> >  http://xahlee.org/Periodic_dosage_dir/sanga_pemci/sanga_pemci.html

> Totally OT but prima facie the most interesting title is the last.
> Unfortunately I couldn't grok what ethology (the "anthropology" of
> animals) had to do with it unless the critters that emit "The Masochistic
> Cries of Lovelorn Females" are to be considered as less than human. I
> notice that Salt-n-Pepa's sweet little ditty (Don't want no S.D.M.) is
> missing from the list, but maybe that's more sadistic than masochistic;
> maybe it belongs in the Quagmire. ;-) Sexology is a bona fide area of
> inquiry pioneered by Kinsey et al. but sexualogy is not an English word
> nor (I keep my fingers crossed) will it ever become one.

sexualogy = sexology + sexuality.

^_^

ok, now it reads “... respect to ethology and sexuality.”. Simpler and
more fitting.

  Xah
∑ http://xahlee.org/

☄


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-12 17:27                 ` Xah Lee
  2009-06-12 19:30                   ` Lewis Perin
  2009-06-12 20:56                   ` B. T. Raven
@ 2009-06-13 20:35                   ` Lewis Perin
  2009-06-14 11:47                     ` ken
  2 siblings, 1 reply; 56+ messages in thread
From: Lewis Perin @ 2009-06-13 20:35 UTC (permalink / raw
  To: help-gnu-emacs

Xah Lee <xahlee@gmail.com> writes:

> [...]
> i use mixed chinese & english in single file often and in both mac os
> x and windows. They work well. On the mac, my emacs is version 22.x.
> On win, it is emacs23. My encoding in emacs is set to utf-8.

Thanks for mentioning v. 23.  I just downloaded it, despite my
misgivings about life on the bleeding edge, and my problem with some
Chinese UTF-8 characters' glyphs turning to boxes when the file is
reverted seems to have vanished.

/Lew
---
Lew Perin / perin@acm.org
http://www.panix.com/~perin/babelcarp.html


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: utf8 char display in buffer
  2009-06-13 20:35                   ` Lewis Perin
@ 2009-06-14 11:47                     ` ken
  2009-06-15  7:28                       ` Bernardo
  0 siblings, 1 reply; 56+ messages in thread
From: ken @ 2009-06-14 11:47 UTC (permalink / raw
  To: Lewis Perin; +Cc: help-gnu-emacs

On 06/13/2009 04:35 PM Lewis Perin wrote:
> Xah Lee <xahlee@gmail.com> writes:
> 
>> [...]
>> i use mixed chinese & english in single file often and in both mac os
>> x and windows. They work well. On the mac, my emacs is version 22.x.
>> On win, it is emacs23. My encoding in emacs is set to utf-8.
> 
> Thanks for mentioning v. 23.  I just downloaded it, despite my
> misgivings about life on the bleeding edge, and my problem with some
> Chinese UTF-8 characters' glyphs turning to boxes when the file is
> reverted seems to have vanished.
> 
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://www.panix.com/~perin/babelcarp.html

Lew (or anyone),

Where did you find v.23?  The only place I'm seeing is cvs,
<http://cvs.savannah.gnu.org/viewvc/emacs/?root=emacs>.






* Re: utf8 char display in buffer
  2009-06-12 23:38                           ` ken
  2009-06-13  4:11                             ` Eli Zaretskii
@ 2009-06-14 20:59                             ` Stefan Monnier
  1 sibling, 0 replies; 56+ messages in thread
From: Stefan Monnier @ 2009-06-14 20:59 UTC (permalink / raw
  To: gebser; +Cc: Lennart Borgman, Emacs-Devel devel

> Thank you, Lennart!  To give the people at emacs-devel some context to
> the issue, the salient portion of the previous post is pasted below:

IIUC an important missing detail is that you're using Emacs-22 and that
this same problem won't happen in Emacs-23, right?
I could imagine adding a help-text that would pop-up when the mouse is
over one of those dreaded square boxes.  But this problem has been
around for a while now, and should be much more rare in Emacs-23, so I'm
not sure it's worth "fixing".  Or rather I think that the approach taken
in Emacs-23 is such a fix.

        Stefan


* Re: utf8 char display in buffer
  2009-06-13  5:50                             ` Richard Stallman
@ 2009-06-15  4:34                               ` Miles Bader
  2009-06-15 19:30                                 ` Richard Stallman
  2009-06-15 20:06                               ` Chong Yidong
  1 sibling, 1 reply; 56+ messages in thread
From: Miles Bader @ 2009-06-15  4:34 UTC (permalink / raw
  To: rms; +Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:
> Would it be possible to display the codepoint numerically in the box?

Would that help much?  I'm not sure that the codepoint is very useful to
most people (and the information is easily available via C-x =)...

[Well I suppose it might be a little useful, but maybe not enough to
justify implementation costs...]

-Miles

-- 
(\(\
(^.^)
(")")
*This is the cute bunny virus, please copy this into your sig so it can spread.





* Re: utf8 char display in buffer
  2009-06-14 11:47                     ` ken
@ 2009-06-15  7:28                       ` Bernardo
  0 siblings, 0 replies; 56+ messages in thread
From: Bernardo @ 2009-06-15  7:28 UTC (permalink / raw
  To: help-gnu-emacs

http://alpha.gnu.org/gnu/emacs/pretest/

ken said the following on 14/06/09 21:47:
> 
> Lew (or anyone),
> 
> Where did you find v.23?  The only place I'm seeing is cvs,
> <http://cvs.savannah.gnu.org/viewvc/emacs/?root=emacs>.
> 
> 
> 
> 





* Re: utf8 char display in buffer
  2009-06-15  4:34                               ` Miles Bader
@ 2009-06-15 19:30                                 ` Richard Stallman
  2009-06-16  0:30                                   ` James Cloos
  2009-06-16 20:48                                   ` Stefan Monnier
  0 siblings, 2 replies; 56+ messages in thread
From: Richard Stallman @ 2009-06-15 19:30 UTC (permalink / raw
  To: Miles Bader; +Cc: emacs-devel

    > Would it be possible to display the codepoint numerically in the box?

    Would that help much?  I'm not sure that the codepoint is very useful to
    most people (and the information is easily available via C-x =)...

I think it would be quite useful.  First, you would immediately see
which of the undisplayable characters are the same.  Second, you might
come to recognize a few common codepoints, and that would be useful.

Whether it's worth the trouble depends on how much trouble that is,
which I don't know.





* Re: utf8 char display in buffer
  2009-06-13  5:50                             ` Richard Stallman
  2009-06-15  4:34                               ` Miles Bader
@ 2009-06-15 20:06                               ` Chong Yidong
  2009-06-15 21:57                                 ` Drew Adams
  2009-06-16  5:30                                 ` Richard Stallman
  1 sibling, 2 replies; 56+ messages in thread
From: Chong Yidong @ 2009-06-15 20:06 UTC (permalink / raw
  To: rms; +Cc: emacs-devel, Miles Bader

Richard Stallman <rms@gnu.org> writes:

> Would it be possible to display the codepoint numerically in the box?

I don't think there's enough space.





* RE: utf8 char display in buffer
  2009-06-15 20:06                               ` Chong Yidong
@ 2009-06-15 21:57                                 ` Drew Adams
  2009-06-16  5:30                                 ` Richard Stallman
  1 sibling, 0 replies; 56+ messages in thread
From: Drew Adams @ 2009-06-15 21:57 UTC (permalink / raw
  To: 'Chong Yidong', rms; +Cc: 'Miles Bader', emacs-devel

> > Would it be possible to display the codepoint numerically 
> > in the box?
> 
> I don't think there's enough space.

I don't know whether Richard really meant to put the number inside the little
character-size box.

But a tooltip (mouseover) would work. And it would have room for more than just
the codepoint. It would not show you more than one at a time, however. (He
mentioned seeing immediately which little boxes represented the same codepoint.)


* Re: utf8 char display in buffer
  2009-06-15 19:30                                 ` Richard Stallman
@ 2009-06-16  0:30                                   ` James Cloos
  2009-06-16  1:10                                     ` Miles Bader
  2009-06-16 13:53                                     ` Chong Yidong
  2009-06-16 20:48                                   ` Stefan Monnier
  1 sibling, 2 replies; 56+ messages in thread
From: James Cloos @ 2009-06-16  0:30 UTC (permalink / raw
  To: emacs-devel; +Cc: rms, Miles Bader

>> Would it be possible to display the codepoint numerically in the box?

Displaying the UCS Code Point for characters which lack font support is
the norm in GTK.  This is done by drawing a box around four or six digit
glyphs which are rendered in a smaller point size.  I'd expect that this
is easier to read when using a proportional face.

Apple went in a slightly different direction and commissioned a fallback
font from Michael Everson which has one glyph per Unicode script and uses
that for each character associated with said script.

Emacs could easily do either (w/o the need for a font in the latter case).
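
As a rough illustration of the first approach (hex digits inside a box), here
is a minimal Python sketch. The function name hex_box_label is hypothetical,
and the choice of four digits for the Basic Multilingual Plane and six
otherwise is an assumption based on the "four or six digit" description above,
not a statement of how GTK or Emacs actually implements it.

```python
def hex_box_label(ch: str) -> str:
    """Return the hex digits a fallback box might show for one character.

    Hypothetical helper: assumes four digits for code points up to U+FFFF
    and six digits for anything above that.
    """
    cp = ord(ch)
    width = 4 if cp <= 0xFFFF else 6
    # Zero-pad the upper-case hex form to the chosen width.
    return f"{cp:0{width}X}"


if __name__ == "__main__":
    for ch in ("\u0101", "\u012b", "\U0001F600"):
        print(hex_box_label(ch))
```

Only the label text is computed here; drawing the box and choosing a smaller
point size would be up to the display layer.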

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6


* Re: utf8 char display in buffer
  2009-06-16  0:30                                   ` James Cloos
@ 2009-06-16  1:10                                     ` Miles Bader
  2009-06-16  1:12                                       ` Miles Bader
  2009-06-16 13:53                                     ` Chong Yidong
  1 sibling, 1 reply; 56+ messages in thread
From: Miles Bader @ 2009-06-16  1:10 UTC (permalink / raw
  To: James Cloos; +Cc: rms, emacs-devel

James Cloos <cloos@jhcloos.com> writes:
> Displaying the UCS Code Point for characters which lack font support is
> the norm in GTK.  This is done by drawing a box around four or six digit
> glyphs which are rendered in a smaller point size.  I'd expect that this
> is easier to read when using a proportional face.
>
> Apple went in a slightly different direction and commissioned a fallback
> font from Michael Everson which has one glyph per Unicode script and uses
> that for each character associated with said script.
>
> Emacs could easily do either (w/o the need for a font in the latter case).

The GTK method does screw up one good thing about emacs' method -- the
boxes it displays are generally the correct width (single- or double-
width [CJK etc]), so text alignment is preserved.
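
The width point can be illustrated with a small Python sketch. It assumes that
the Unicode East Asian Width property is a reasonable stand-in for "single- or
double-width"; the function name box_columns is hypothetical, and this is not
a description of how Emacs itself computes the width of its boxes.

```python
import unicodedata


def box_columns(ch: str) -> int:
    """Guess how many fixed-width columns a placeholder box should occupy.

    Assumption: characters classified as wide ("W") or fullwidth ("F") take
    two columns; everything else is treated as one column in this sketch.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


if __name__ == "__main__":
    for ch in ("a", "\u4e2d"):
        print(repr(ch), box_columns(ch))
```

A placeholder that keeps this column count leaves the rest of the line where
it would be if the real glyph were available.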

The apple method might be able to preserve the width, and seems better
for the user anyway -- I think the most useful info is "what kind of
font should I install to fix this" and/or "do I really care enough to
fix this", so identifying the script is probably more important than
identifying the precise codepoint.

Drew's suggestion of a tooltip seems like it might be easier to
implement, and more functional than either in practice though --
it could display a lot more information without screwing up alignment,
basically a slightly more convenient/obvious version of C-x =

-Miles

-- 
Egotist, n. A person of low taste, more interested in himself than in me.





* Re: utf8 char display in buffer
  2009-06-16  1:10                                     ` Miles Bader
@ 2009-06-16  1:12                                       ` Miles Bader
  2009-06-17  5:07                                         ` Richard Stallman
  0 siblings, 1 reply; 56+ messages in thread
From: Miles Bader @ 2009-06-16  1:12 UTC (permalink / raw
  To: James Cloos; +Cc: rms, emacs-devel

Miles Bader <miles@gnu.org> writes:
> The GTK method does screw up one good thing about emacs' method -- the
> boxes it displays are generally the correct width (single- or double-
> width [CJK etc]), so text alignment is preserved.

Er, to be more clear, that should be "the boxes _emacs_ displays are
generally the correct width...".

-Miles

-- 
My books focus on timeless truths.  -- Donald Knuth





* Re: utf8 char display in buffer
  2009-06-15 20:06                               ` Chong Yidong
  2009-06-15 21:57                                 ` Drew Adams
@ 2009-06-16  5:30                                 ` Richard Stallman
  1 sibling, 0 replies; 56+ messages in thread
From: Richard Stallman @ 2009-06-16  5:30 UTC (permalink / raw
  To: Chong Yidong; +Cc: emacs-devel, miles

    > Would it be possible to display the codepoint numerically in the box?

    I don't think there's enough space.

What determines how much space there is?  Could we make it wider?
Perhaps an earlier stage of redisplay could check whether there is a font
for the character.





* Re: utf8 char display in buffer
  2009-06-16  0:30                                   ` James Cloos
  2009-06-16  1:10                                     ` Miles Bader
@ 2009-06-16 13:53                                     ` Chong Yidong
  1 sibling, 0 replies; 56+ messages in thread
From: Chong Yidong @ 2009-06-16 13:53 UTC (permalink / raw
  To: James Cloos; +Cc: Miles Bader, rms, emacs-devel

James Cloos <cloos@jhcloos.com> writes:

> Apple went in a slightly different direction and commissioned a fallback
> font from Michael Everson which has one glyph per Unicode script and uses
> that for each character associated with said script.

If your system has a font for displaying a character, Emacs will
automatically use that font.  But I don't think we should package a
fallback font into the Emacs distribution.  The benefit would be
marginal anyway; if a user lacks a system font for displaying a script,
he or she probably cannot read that script.


* Re: utf8 char display in buffer
  2009-06-15 19:30                                 ` Richard Stallman
  2009-06-16  0:30                                   ` James Cloos
@ 2009-06-16 20:48                                   ` Stefan Monnier
  1 sibling, 0 replies; 56+ messages in thread
From: Stefan Monnier @ 2009-06-16 20:48 UTC (permalink / raw
  To: rms; +Cc: emacs-devel, Miles Bader

Can we stop wasting time on this: this was a problem in Emacs-22.
There's no evidence that this problem is significant in Emacs-23.


        Stefan "I said it already elsewhere, but I'm afraid it got lost"





* Re: utf8 char display in buffer
  2009-06-16  1:12                                       ` Miles Bader
@ 2009-06-17  5:07                                         ` Richard Stallman
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Stallman @ 2009-06-17  5:07 UTC (permalink / raw
  To: Miles Bader; +Cc: cloos, emacs-devel

    > The GTK method does screw up one good thing about emacs' method -- the
    > boxes it displays are generally the correct width (single- or double-
    > width [CJK etc]), so text alignment is preserved.

    Er, to be more clear, that should be "the boxes _emacs_ displays are
    generally the correct width...".

Another idea is to display the bottom byte of the character code as
hex in the box.  Maybe two hex digits can fit if they are small, and
the user would sometimes be able to identify the character from those.

Another idea is to display in the box a slightly smaller version of
the character which the bottom byte represents.
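
To make the first idea concrete, here is a minimal Python sketch of "the
bottom byte of the character code as hex". The function name low_byte_hex is
hypothetical; the sketch only computes the two digits and says nothing about
how they would be drawn.

```python
def low_byte_hex(ch: str) -> str:
    """Return the lowest byte of a character's code point as two hex digits."""
    cp = ord(ch)
    # Mask off everything above the low 8 bits, then zero-pad to two digits.
    return f"{cp & 0xFF:02X}"


if __name__ == "__main__":
    for ch in ("\u0101", "\u012b", "\u0201"):
        print(f"U+{ord(ch):04X} -> {low_byte_hex(ch)}")
```

Two digits identify a character only within its 256-code-point block: U+0101
and U+0201 both come out as 01, which fits the "sometimes" in the suggestion.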


end of thread, other threads:[~2009-06-17  5:07 UTC | newest]

Thread overview: 56+ messages
     [not found] <mailman.227.1244485995.2239.help-gnu-emacs@gnu.org>
2009-06-08 19:10 ` utf8 char display in buffer Teemu Likonen
2009-06-08 19:52 ` Xah Lee
2009-06-09 10:52   ` ken
2009-06-08 20:43 ` B. T. Raven
2009-06-08 20:49   ` B. T. Raven
2009-06-08 22:49     ` ken
2009-06-09 10:24   ` ken
     [not found]   ` <mailman.289.1244543082.2239.help-gnu-emacs@gnu.org>
2009-06-09 13:03     ` B. T. Raven
2009-06-09 14:51       ` ken
     [not found]       ` <mailman.297.1244559110.2239.help-gnu-emacs@gnu.org>
2009-06-10  1:34         ` B. T. Raven
2009-06-10 14:03           ` Lewis Perin
2009-06-11  3:21             ` B. T. Raven
2009-06-12 14:54               ` ken
2009-06-13  3:30                 ` Eli Zaretskii
     [not found]               ` <mailman.522.1244818530.2239.help-gnu-emacs@gnu.org>
2009-06-12 15:39                 ` Lewis Perin
2009-06-12 16:48                   ` B. T. Raven
2009-06-12 17:45                     ` Lewis Perin
2009-06-12 17:53                     ` Xah Lee
2009-06-12 20:59                       ` Lennart Borgman
2009-06-12 22:23                       ` ken
2009-06-12 22:27                         ` Lennart Borgman
2009-06-12 23:38                           ` ken
2009-06-13  4:11                             ` Eli Zaretskii
2009-06-13 12:30                               ` ken
2009-06-13 13:23                                 ` Eli Zaretskii
2009-06-14 20:59                             ` Stefan Monnier
2009-06-13  1:36                           ` Miles Bader
2009-06-13  1:43                             ` Lennart Borgman
2009-06-13  5:50                             ` Richard Stallman
2009-06-15  4:34                               ` Miles Bader
2009-06-15 19:30                                 ` Richard Stallman
2009-06-16  0:30                                   ` James Cloos
2009-06-16  1:10                                     ` Miles Bader
2009-06-16  1:12                                       ` Miles Bader
2009-06-17  5:07                                         ` Richard Stallman
2009-06-16 13:53                                     ` Chong Yidong
2009-06-16 20:48                                   ` Stefan Monnier
2009-06-15 20:06                               ` Chong Yidong
2009-06-15 21:57                                 ` Drew Adams
2009-06-16  5:30                                 ` Richard Stallman
     [not found]                       ` <mailman.536.1244845400.2239.help-gnu-emacs@gnu.org>
2009-06-13  0:35                         ` Xah Lee
2009-06-12 17:27                 ` Xah Lee
2009-06-12 19:30                   ` Lewis Perin
2009-06-12 19:43                     ` Xah Lee
2009-06-12 20:56                   ` B. T. Raven
2009-06-13 16:16                     ` Xah Lee
2009-06-13 20:35                   ` Lewis Perin
2009-06-14 11:47                     ` ken
2009-06-15  7:28                       ` Bernardo
2009-06-11 12:03 ` Teemu Likonen
2009-06-11 12:55   ` Lennart Borgman
2009-06-11 13:04     ` Andreas Schwab
2009-06-11 13:07       ` Lennart Borgman
2009-06-11 13:08         ` Lennart Borgman
2009-06-11 13:24           ` Tassilo Horn
2009-06-08 18:33 ken
