query-replace?

unofficial mirror of help-gnu-emacs@gnu.org

* query-replace?
@ 2006-01-07 19:04 B. T. Raven
  2006-01-08  4:20 ` query-replace? Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: B. T. Raven @ 2006-01-07 19:04 UTC (permalink / raw)


i. I have a character \234 in a file that should be displayed as an oe
ligature. I have tried M-% C-q (0)234 but this doesn't work for getting it
into the pattern to be replaced. How do I refer to this character?

ii. Is it possible to include newlines in regexps? I have a five line
header on each page that begins with same characters on 1st line and ends
with same on last line. What is the most automated method of deleting just
these lines?

.*\n can't work of course. Since some of the lines are blank, I inserted
C-q C-j into the string but that produced a literal ^M. ??

Thanks


* Re: query-replace?
  2006-01-07 19:04 query-replace? B. T. Raven
@ 2006-01-08  4:20 ` Eli Zaretskii
       [not found] ` <mailman.301.1136694143.26925.help-gnu-emacs@gnu.org>
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2006-01-08  4:20 UTC (permalink / raw)


> From: "B. T. Raven" <ecinmn@peoplepc.com>
> Date: Sat, 07 Jan 2006 19:04:29 GMT
> 
> I have a character \234 in a file that should be displayed as an oe
> ligature. I have tried M-% C-q (0)234 but this doesn't work for getting it
> into the pattern to be replaced. How do I refer to this character?

Not as \234.  Internally, non-ASCII characters are encoded differently
inside Emacs buffers; \234 is that character's _external_ encoding, in
a file.

To see what the internal codepoint is, go to that character and type
"C-u C-x =".  In the buffer that Emacs pops up, look for the number
labeled "buffer code".

> Is it possible to include newlines in regexps?

Yes, use "C-q C-j".

> .*\n can't work of course. Since some of the lines are blank, I inserted
> C-q C-j into the string but that produced a literal ^M. ??

C-q C-j should produce a literal newline, not ^M.

Btw, in the future I suggest not asking several unrelated questions
in the same message; split them into separate messages instead.
That would allow you to give a meaningful Subject line to
each message, and readers of this forum would be able to tell in
advance, just by reading the Subject lines, whether they can help you
with some of the problems.


* Re: query-replace?
       [not found] ` <mailman.301.1136694143.26925.help-gnu-emacs@gnu.org>
@ 2006-01-08  5:38   ` B. T. Raven
  0 siblings, 0 replies; 14+ messages in thread
From: B. T. Raven @ 2006-01-08  5:38 UTC (permalink / raw)



"Eli Zaretskii" <eliz@gnu.org> wrote in message
news:mailman.301.1136694143.26925.help-gnu-emacs@gnu.org...
> > From: "B. T. Raven" <ecinmn@peoplepc.com>
> > Date: Sat, 07 Jan 2006 19:04:29 GMT
> >
> > I have a character \234 in a file that should be displayed as an oe
> > ligature. I have tried M-% C-q (0)234 but this doesn't work for getting it
> > into the pattern to be replaced. How do I refer to this character?
>
> Not as \234.  Internally, non-ASCII characters are encoded differently
> inside Emacs buffers; \234 is that character's _external_ encoding, in
> a file.
>
> To see what is the internal codepoint, go to that character and type
> "C-u C-x =".  In the buffer that Emacs pops up, look for the number
> labeled "buffer code".
>
> > Is it possible to include newlines in regexps?
>
> Yes, use "C-q C-j".
>
> > .*\n can't work of course. Since some of the lines are blank, I inserted
> > C-q C-j into the string but that produced a literal ^M. ??
>
> C-q C-j should produce a literal newline, not ^M.
>
> Btw, in the future I suggest not to ask several different unrelated
> questions in the same message, but instead split it into several
> messages.  That would allow you to give meaningful Subject lines to
> each message, and readers of this forum will be able to tell in
> advance, by just reading the Subject lines, whether they can help you
> with some of the problems.
>
>

Thanks, Eli. Actually \234 (dec. 156, hex 9c) was the internal
representation of the char. It was the only one with diacriticals that
showed up that way (clicking anywhere in the string put the cursor at the
backslash). The actual oe ligature is 01210163, 331891, 0x51073, but
wherever \234 appeared it needed to be 'oe' from context. I had left
read-quoted-char-radix at 16 and thought I could override it by typing
C-q 0234, but I guess that is read as a hex number too. I used query-replace
for the subject line because I saw both questions as related to that function.
I suppose one was really a question on regexp syntax. Anyway I understand
both procedures a little better now (until next time).
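
The radix mix-up described here comes down to simple arithmetic, which can be checked in Emacs Lisp. This is only a sketch: the helper name my-digits-in-radix is made up for illustration, and it mimics how a digit string would be interpreted under a given value of read-quoted-char-radix rather than calling C-q itself.

```elisp
;; Hypothetical helper (not part of Emacs): interpret DIGITS the way
;; C-q would under a given value of `read-quoted-char-radix'.
(defun my-digits-in-radix (digits radix)
  "Return the character code that the string DIGITS denotes in RADIX."
  (string-to-number digits radix))

(my-digits-in-radix "234" 8)    ; octal 234 => 156
(my-digits-in-radix "9c" 16)    ; hex 9c    => 156
(my-digits-in-radix "234" 16)   ; hex 234   => 564, a different character
(format "%o %d %x" 156 156 156) ; => "234 156 9c"
```

So with the radix left at 16, typing 234 (with or without a leading zero) asks for character 564, not 156.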


* Re: query-replace?
  2006-01-07 19:04 query-replace? B. T. Raven
  2006-01-08  4:20 ` query-replace? Eli Zaretskii
       [not found] ` <mailman.301.1136694143.26925.help-gnu-emacs@gnu.org>
@ 2006-01-08 12:03 ` Peter Dyballa
  2006-01-08 12:11   ` query-replace? Lennart Borgman
       [not found] ` <mailman.315.1136721918.26925.help-gnu-emacs@gnu.org>
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Dyballa @ 2006-01-08 12:03 UTC (permalink / raw)
  Cc: help-gnu-emacs


On 07.01.2006 at 19:04, B. T. Raven wrote:

> i. I have a character \234 in a file that should be displayed as an oe
> ligature. I have tried M-% C-q (0)234 but this doesn't work for getting it
> into the pattern to be replaced. How do I refer to this character?

With read-quoted-char-radix at its default of 8, it's C-q 2 3 4 <RET>. Hex
input would be C-q 9 C <RET>, and decimal C-q 1 5 6 <RET>. I don't
remember this ever failing ...

A different approach would be to bind some function key to enter œ.
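
A rough sketch of such a binding follows. The command name my-insert-oe-ligature and the choice of F9 are arbitrary assumptions, and the literal œ assumes the init file is saved in an encoding Emacs reads back correctly.

```elisp
;; Hypothetical command; the name and the F9 binding are only examples.
(defun my-insert-oe-ligature ()
  "Insert the oe ligature at point."
  (interactive)
  (insert "œ"))

(global-set-key [f9] 'my-insert-oe-ligature)
```

Because the command is interactive, it can also be used while typing the search string at the M-% prompt, provided the key is not rebound in the minibuffer.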

>
> ii. Is it possible to include newlines in regexps? I have a five line
> header on each page that begins with same characters on 1st line and ends
> with same on last line. What is the most automated method of deleting just
> these lines?

C-q C-j.

--
Greetings

   Pete

There's no place like ~
                           (UNIX Guru)


* Re: query-replace?
  2006-01-08 12:03 ` query-replace? Peter Dyballa
@ 2006-01-08 12:11   ` Lennart Borgman
  2006-01-08 12:40     ` query-replace? Peter Dyballa
  0 siblings, 1 reply; 14+ messages in thread
From: Lennart Borgman @ 2006-01-08 12:11 UTC (permalink / raw)
  Cc: B. T. Raven, help-gnu-emacs


>
>>
>> ii. Is it possible to include newlines in regexps? I have a five line
>> header on each page that begins with same characters on 1st line and ends
>> with same on last line. What is the most automated method of deleting just
>> these lines?
>
>
> C-q C-j.

Maybe the question was about multiple lines, not the newline character? 
Then perhaps

    http://www.emacswiki.org/cgi-bin/wiki/MultilineRegexp

can help?
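
For the five-line header case, a multi-line regexp can be sketched in Emacs Lisp as follows. The function name my-delete-five-line-headers is hypothetical, and HEAD and TAIL stand for whatever fixed text starts the first header line and ends the fifth; treat it as an illustration under those assumptions, not a tested recipe for the poster's file.

```elisp
;; Hypothetical helper; HEAD and TAIL stand for the fixed text that
;; starts the first header line and ends the fifth one.
(defun my-delete-five-line-headers (head tail)
  "Delete each five-line block that starts with HEAD and ends with TAIL.
Return the number of blocks deleted."
  (let ((case-fold-search nil)
        (regexp (concat "^" (regexp-quote head) ".*\n" ; line 1
                        "\\(?:.*\n\\)\\{3\\}"           ; lines 2-4, blank or not
                        ".*" (regexp-quote tail) "\n")) ; line 5
        (count 0))
    (save-excursion
      (goto-char (point-min))
      (while (re-search-forward regexp nil t)
        (replace-match "" t t)
        (setq count (1+ count))))
    count))
```

Since `.` never matches a newline in Emacs regexps, each `.*\n` consumes exactly one line, blank or not. The same regexp can be typed interactively at the C-M-% prompt, entering C-q C-j wherever the Lisp string has \n.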


* Re: query-replace?
  2006-01-08 12:11   ` query-replace? Lennart Borgman
@ 2006-01-08 12:40     ` Peter Dyballa
  2006-01-08 12:48       ` query-replace? Lennart Borgman
  2006-01-08 19:41       ` query-replace? Eli Zaretskii
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Dyballa @ 2006-01-08 12:40 UTC (permalink / raw)
  Cc: B. T. Raven, help-gnu-emacs


On 08.01.2006 at 13:11, Lennart Borgman wrote:

> Maybe the question was about multiple lines, not the newline  
> character? Then perhaps
>
>    http://www.emacswiki.org/cgi-bin/wiki/MultilineRegexp
>
> can help?

Oh yes: \n. The general approach. Could be in Losedows C-q C-m C-q C-j
(CR LF) is needed ...

Nice site anyway.

--
Greetings

   Pete

$ sumascii BILL GATES
   B   I   L   L   G   A   T   E   S
  66+ 73+ 76+ 76+ 71+ 65+ 84+ 69+ 83 = 663

  and add 3 because he's Bill Gates the third.


* Re: query-replace?
  2006-01-08 12:40     ` query-replace? Peter Dyballa
@ 2006-01-08 12:48       ` Lennart Borgman
  2006-01-08 19:41       ` query-replace? Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: Lennart Borgman @ 2006-01-08 12:48 UTC (permalink / raw)
  Cc: B. T. Raven, help-gnu-emacs

Peter Dyballa wrote:

>
> On 08.01.2006 at 13:11, Lennart Borgman wrote:
>
>> Maybe the question was about multiple lines, not the newline  
>> character? Then perhaps
>>
>>    http://www.emacswiki.org/cgi-bin/wiki/MultilineRegexp
>>
>> can help?
>
>
> Oh yes: \n. The general approach. Could be in Losedows C-q C-m C-q C-j
> (CR LF) is needed ...
>
> Nice site anyway.

Those unlucky enough to dwell in Losedows (like me), and even those more
fortunate, may need to handle files from both hell and heaven.
Fortunately the mighty Emacs can help, because it can tell where
the file came from. But you must ask Emacs to do so with \n.


* imput methods (was Re: query-replace?)
       [not found] ` <mailman.315.1136721918.26925.help-gnu-emacs@gnu.org>
@ 2006-01-08 16:17   ` B. T. Raven
  2006-01-08 17:05     ` Peter Dyballa
       [not found]     ` <mailman.349.1136740073.26925.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 14+ messages in thread
From: B. T. Raven @ 2006-01-08 16:17 UTC (permalink / raw)



"Peter Dyballa" <Peter_Dyballa@Web.DE> wrote in message
news:mailman.315.1136721918.26925.help-gnu-emacs@gnu.org...

> On 07.01.2006 at 19:04, B. T. Raven wrote:
>
> > i. I have a character \234 in a file that should be displayed as an oe
> > ligature. I have tried M-% C-q (0)234 but this doesn't work for getting it
> > into the pattern to be replaced. How do I refer to this character?
>
> With default read-quoted-char-radix being 8 it's C-q 2 3 4 <RET>. HEX
> input would be C-q 9 C <RET>, and decimal C-q 1 5 6 <RET>. I don't
> remember that this failed in any case ...
>
> A different approach would be to bind some function key to enter œ.
>
> > ii. Is it possible to include newlines in regexps? I have a five line
> > header on each page that begins with same characters on 1st line and ends
> > with same on last line. What is the most automated method of deleting just
> > these lines?
>
> C-q C-j.


Thanks, Peter and Lennart. For me it's easier to use leim since I use
latin-4-postfix most of the time anyway. With this the oe ligature is O&
or o&, which seems mnemonic since the first component of the ampersand is a
Greek e (e-t). This brings up another question: when I type C-h I
(describe input method) and then "latin-3-postfix", all I see are the empty
rectangles again. This happens while looking at them with the .ttf font
arialuni, which certainly has the glyphs for Turkish characters. ???

Ed


* Re: imput methods (was Re: query-replace?)
  2006-01-08 16:17   ` imput methods (was Re: query-replace?) B. T. Raven
@ 2006-01-08 17:05     ` Peter Dyballa
       [not found]     ` <mailman.349.1136740073.26925.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Dyballa @ 2006-01-08 17:05 UTC (permalink / raw)
  Cc: help-gnu-emacs


On 08.01.2006 at 16:17, B. T. Raven wrote:

> This brings up another question: When I type C-h I
> (describe input method) and then "latin-3-postfix" all I see are the empty
> rectangles again. This happens while looking at them with the .ttf font
> arialuni, which certainly has the glyphs for Turkish characters. ???

Arial Unicode looks to be very complete in the many Latin and
Latin-Extended areas. Probably you just need to create fontsets:

(message "New fontsets for X11")
(if (fboundp 'new-fontset)
    (progn
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;;;;;;; Adobe Courier - Unicode encoded OpenType font, version 1.020, 374 glyphs
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
      (create-fontset-from-fontset-spec
       "-adobe-courier-medium-r-*-*-9-*-*-*-*-*-fontset-09pt_adobe_courier"
       t 'noerror)
      (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-1
                        '("adobe-courier" . "iso8859-1"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-2
                        '("adobe-courier" . "iso8859-2"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-3
                        '("adobe-courier" . "iso8859-3"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-4
                        '("adobe-courier" . "iso8859-4"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-9
                        '("adobe-courier" . "iso8859-9"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-14
                        '("adobe-courier" . "iso8859-14"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-15
                        '("adobe-courier" . "iso8859-15"))
      ;; (set-fontset-font "fontset-09pt_adobe_courier" 'latin-iso8859-16
      ;;                   '("adobe-courier" . "iso8859-16"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'mule-unicode-0100-24ff
                        '("adobe-courier" . "iso10646-1"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'mule-unicode-2500-33ff
                        '("adobe-courier" . "iso10646-1"))
      (set-fontset-font "fontset-09pt_adobe_courier" 'mule-unicode-e000-ffff
                        '("adobe-courier" . "iso10646-1"))
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0370) (decode-char 'ucs #x03cf))
                        '("courier new" . "iso10646-1")) ; Greek
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x03d0) (decode-char 'ucs #x03ff))
                        '("lucida sans typewriter" . "iso10646-1")) ; Coptic
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0400) (decode-char 'ucs #x04ff))
                        '("lucida sans typewriter" . "iso10646-1")) ; Cyrillic
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0500) (decode-char 'ucs #x052f))
                        '("lucida sans typewriter" . "iso10646-1")) ; Cyrillic Supplement
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0530) (decode-char 'ucs #x058f))
                        '("aramian unicode" . "iso10646-1")) ; Armenian (sylfaen)
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0590) (decode-char 'ucs #x05ff))
                        '("courier new" . "iso10646-1")) ; Hebrew
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0600) (decode-char 'ucs #x06ff))
                        '("lucida sans typewriter" . "iso10646-1")) ; Arabic
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0700) (decode-char 'ucs #x074f))
                        '("courier new" . "iso10646-1")) ; Syriac
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0780) (decode-char 'ucs #x07bf))
                        '("courier new" . "iso10646-1")) ; Thaana
      (set-fontset-font "fontset-09pt_adobe_courier"
                        (cons (decode-char 'ucs #x0900) (decode-char 'ucs #x097f))
                        '("courier new" . "iso10646-1")) ; Devanagari
      ))
(provide 'site-fontsets-x11)

One template that has some more regions of Unicode defined for one  
font, and of course there are some more sizes defined. Turkish is ISO  
Latin-5 or ISO 8859-9, so the definition above should work for your  
case.

This works in X11. I don't know whether it works on Losedows or whether
it is necessary there at all ...


Try to find out which fonts GNU Emacs sees: M-x set-frame-font RET
TAB TAB RET, then switch to the *Completions* buffer and save it under a
file name you have decided on beforehand! If you try to complete a partial
file name, that will erase the *Completions* buffer ...

--
Greetings

   Pete

Mac OS X is like a wigwam: no fences, no gates, but an apache inside.


* Re: query-replace?
  2006-01-08 12:40     ` query-replace? Peter Dyballa
  2006-01-08 12:48       ` query-replace? Lennart Borgman
@ 2006-01-08 19:41       ` Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2006-01-08 19:41 UTC (permalink / raw)


> From: Peter Dyballa <Peter_Dyballa@Web.DE>
> Date: Sun, 8 Jan 2006 13:40:32 +0100
> Cc: "B. T. Raven" <ecinmn@peoplepc.com>, help-gnu-emacs@gnu.org
> 
> Could be in Losedows C-q C-m C-q C-j (CR LF) is needed ...

There're no CR characters in the buffer after the CR-LF file is read
into it, so no, C-m is not needed on Windows more than it is needed on
other platforms.  Especially since Emacs gives the same treatment to
files with DOS CR-LF EOLs on all platforms.
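
The effect of end-of-line decoding can be seen on a string, without visiting any file. A minimal sketch, assuming the text is UTF-8 with DOS line endings; the helper name my-decode-dos-text is made up for illustration.

```elisp
;; Hypothetical helper: decode BYTES as UTF-8 text with DOS line endings,
;; similar to what happens when Emacs visits such a file.
(defun my-decode-dos-text (bytes)
  "Return BYTES decoded with the `utf-8-dos' coding system."
  (decode-coding-string bytes 'utf-8-dos))

(my-decode-dos-text "one\r\ntwo\r\n") ; => "one\ntwo\n"
```

The decoded text contains only newlines, which is why a regexp needs C-q C-j alone and no C-q C-m.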


* fontsets:  (was Re: query-replace?)
       [not found]     ` <mailman.349.1136740073.26925.help-gnu-emacs@gnu.org>
@ 2006-01-08 22:18       ` B. T. Raven
  2006-01-09 11:53         ` Peter Dyballa
       [not found]         ` <mailman.445.1136812310.26925.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 14+ messages in thread
From: B. T. Raven @ 2006-01-08 22:18 UTC (permalink / raw)



"Peter Dyballa" <Peter_Dyballa@Web.DE> wrote in message
news:mailman.349.1136740073.26925.help-gnu-emacs@gnu.org...

> On 08.01.2006 at 16:17, B. T. Raven wrote:
>
> > This brings up another question: When I type C-h I
> > (describe input method) and then "latin-3-postfix" all I see are the empty
> > rectangles again. This happens while looking at them with the .ttf font
> > arialuni, which certainly has the glyphs for Turkish characters. ???
>
> Arial Unicode looks to be very complete in the many Latin and
> Latin-Extended areas. Probably you just need to create fontsets:

> [fontset definitions quoted verbatim from the previous message snipped]

> One template that has some more regions of Unicode defined for one
> font, and of course there are some more sizes defined. Turkish is ISO
> Latin-5 or ISO 8859-9, so the definition above should work for your
> case.
>
> This works in X11. I don't know whether it works Losedows or whether
> this is necessary at all ...
>
> Try to find out which fonts GNU Emacs sees: M-x set-frame-font RET
> TAB TAB RET, change to *Completions* buffer and save it to a name you
> have determined before! If you try to expand a partial file name it
> will erase the *Completions* buffer ...


The part of interest is here:

-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-*-#130
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-big5
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-gb2312
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso10646-1
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-1
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-13
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-2
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-4
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-5
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-6
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-7
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-8
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-9
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-jisx0201-katakana
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-jisx0201-latin
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-jisx0208-sjis
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-koi8-r
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-ksc5601.1987
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-tis620

So iso8859-3 isn't even there! How do I get it there? Is this related to
the codepage *.nls files again?

I normally have w32-use-w32-font-dialog set to t so that I get the
standard MS Windows font selection dialog box. When I set it to nil I get
what everyone else sees, I guess, which says misc, courier, fontsets, and
under these, Lucida, Terminal, etc., but not nearly as many as are in the
\windows\fonts subdirectory.

Right now all I have relating to fonts in my .emacs is:

(custom-set-faces
  ;; custom-set-faces was added by Custom -- don't edit or cut/paste it!
  ;; Your init file should contain only one such instance.
 '(default ((t (:stipple nil :background "ghostwhite" :foreground "black"
:inverse-video nil :box nil :strike-through nil :overline nil :underline
nil :slant normal :weight normal :height 108 :width normal :family
"outline-arial unicode ms")))))

So that's what emacs starts up with.

Since this font covers such a large swath of Unicode I would rather stick
with the Losedows interface for now. I still don't understand fontsets.
Now the dialog box shows fonts, styles, point size (8-72) and script
all in one place. If I went the fontset route and I wanted only sizes
9-12 and only four different font styles, wouldn't I have to produce
about 16 times as much lisp code as you show above to get the same Unicode
coverage?

Ed


* Re: fontsets:  (was Re: query-replace?)
  2006-01-08 22:18       ` fontsets: " B. T. Raven
@ 2006-01-09 11:53         ` Peter Dyballa
       [not found]         ` <mailman.445.1136812310.26925.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Dyballa @ 2006-01-09 11:53 UTC (permalink / raw)
  Cc: help-gnu-emacs


On 08.01.2006 at 22:18, B. T. Raven wrote:

> Since this font covers such a large swath of Unicode I would rather stick
> with the Losedows interface for now. I still don't understand fontsets.
> Now the dialog box shows fonts, styles, point size (8-72) and script
> all in one place. If I went the fontset route and I wanted only sizes
> 9-12 and only four different font styles, wouldn't I have to produce
> about 16 times as much lisp code as you show above to get the same Unicode
> coverage?

The font variants (italic, bold, bold-italic) are chosen automatically,
so four sets for 9, 10, 11, and 12 pt would suffice.

Fontsets are actually necessary when you handle texts that have more
different characters than the small MS or ISO encodings provide. Then
GNU Emacs needs to create a table that maps code points (characters)
to members in fonts (glyphs), and its own choice can look ugly. You help
GNU Emacs when you construct a fontset. IMO it would be enough to
create a fontset like this for 9 pt and three others for 10, 11, and
12 pt:

      (create-fontset-from-fontset-spec
       "-outline-Arial Unicode MS-normal-r-*-*-9-*-*-*-*-*-fontset-09pt_arial_UC"
       t 'noerror)
      (set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-1
                        '("Arial Unicode MS" . "iso8859-1"))
      (set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-2
                        '("Arial Unicode MS" . "iso8859-2"))
      (set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-4
                        '("Arial Unicode MS" . "iso8859-4"))
      (set-fontset-font "fontset-09pt_arial_UC" 'cyrillic-iso8859-5
                        '("Arial Unicode MS" . "iso8859-5"))
      (set-fontset-font "fontset-09pt_arial_UC" 'arabic-iso8859-6
                        '("Arial Unicode MS" . "iso8859-6"))
      (set-fontset-font "fontset-09pt_arial_UC" 'greek-iso8859-7
                        '("Arial Unicode MS" . "iso8859-7"))
      (set-fontset-font "fontset-09pt_arial_UC" 'hebrew-iso8859-8
                        '("Arial Unicode MS" . "iso8859-8"))
      (set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-9
                        '("Arial Unicode MS" . "iso8859-9"))
      (set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-13
                        '("Arial Unicode MS" . "iso8859-13"))
      (set-fontset-font "fontset-09pt_arial_UC" 'mule-unicode-0100-24ff
                        '("Arial Unicode MS" . "iso10646-1"))
      (set-fontset-font "fontset-09pt_arial_UC" 'mule-unicode-2500-33ff
                        '("Arial Unicode MS" . "iso10646-1"))
      (set-fontset-font "fontset-09pt_arial_UC" 'mule-unicode-e000-ffff
                        '("Arial Unicode MS" . "iso10646-1"))
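
The three further sizes need not be written out by hand. The following is only a sketch of how the 9 pt definition could be generated in a loop: the function names are hypothetical, only a few of the charsets are listed to keep it short, and the last function has to run in a graphical Emacs where these fonts and charsets are available.

```elisp
;; Hypothetical helpers; only the name pattern and the spec string come
;; from the 9 pt example, the function names are made up for illustration.
(defun my-arial-fontset-name (size)
  "Return the fontset name used for point SIZE, e.g. 9 gives 09pt."
  (format "fontset-%02dpt_arial_UC" size))

(defun my-arial-fontset-spec (size)
  "Return the fontset spec string for point SIZE."
  (format "-outline-Arial Unicode MS-normal-r-*-*-%d-*-*-*-*-*-%s"
          size (my-arial-fontset-name size)))

(defun my-define-arial-fontsets ()
  "Create one fontset per size from 9 to 12 pt (needs a graphical frame)."
  (dolist (size '(9 10 11 12))
    (create-fontset-from-fontset-spec (my-arial-fontset-spec size) t 'noerror)
    ;; Abbreviated charset list; extend it with the remaining entries.
    (dolist (entry '((latin-iso8859-1 . "iso8859-1")
                     (latin-iso8859-2 . "iso8859-2")
                     (latin-iso8859-9 . "iso8859-9")
                     (mule-unicode-0100-24ff . "iso10646-1")))
      (set-fontset-font (my-arial-fontset-name size)
                        (car entry)
                        (cons "Arial Unicode MS" (cdr entry))))))
```

Nothing is created until my-define-arial-fontsets is called, so the definitions can sit in an init file and be evaluated safely on a text terminal.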

What about Lucida Console (666 glyphs, 714 mappings)? It can display
ISO 8859-3 in X11 completely (Arial Unicode MS has 51,180 glyphs and
38,933 mappings -- and in X11 it has an ISO 8859-3 encoding!). It's
even monospaced. There is another monospaced font on the Web: Lucida
Sans Typewriter (1,376 glyphs, 1,425 mappings). It's part of the Java
SDKs (starting with Java 1.4 the Lucida fonts were reduced in
variants, so it's worth retrieving JDK 1.3 first and updating some of
these fonts with the 1.4 and/or 1.5 fonts). The JDKs also include Lucida
Sans (2,929 glyphs, 2,410 mappings).

Probably you need some ISO 8859-3 encoding file. *I* have no idea
where in MS Losedows this would be needed; somewhere in the machinery
that creates partial, specifically named encodings from a Unicode
encoded font? If it does not work in a *partial* encoding, would
*complete* Unicode succeed?!

	;;; -*- mode: Text; coding: utf-8; -*-

First open the file as ISO 8859-3, then choose to save it in UTF-8 --
conversion done!
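
The same conversion can be sketched on a string in Emacs Lisp. This assumes a current Emacs whose internal representation is Unicode; the helper name my-latin3-to-utf8 is made up for illustration.

```elisp
;; Hypothetical helper: decode BYTES as ISO 8859-3 and re-encode as UTF-8.
(defun my-latin3-to-utf8 (bytes)
  "Convert the unibyte string BYTES from ISO 8859-3 to UTF-8."
  (encode-coding-string (decode-coding-string bytes 'iso-8859-3) 'utf-8))

;; Byte A6 (octal 246) is H WITH CIRCUMFLEX in Latin-3; in UTF-8 it
;; becomes the two bytes C4 A4.
(append (my-latin3-to-utf8 "\246") nil) ; => (196 164)
```

For a whole file, the interactive route is C-x RET c iso-8859-3 RET C-x C-f to visit it, then C-x RET f utf-8 RET and save.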


http://aspell.net/charsets/
http://www.slovo.info/unifonts.htm
http://www.cs.tut.fi/%7Ejkorpela/chars.html
http://www.topology.org/soft/alpha.html
http://www.i18nguy.com/
http://web.archive.org/web/20030622083607/www.diffuse.org/chars.html


How do you declare ISO Latin-3 or ISO 8859-3? This is meant for  
Southern European, Maltese, and Esperanto Glyphs, very exotic! Or do  
you live on Malta? Here is my test file for this encoding, starting  
with a hint for GNU Emacs:

;;; -*- mode: Text; coding: iso-8859-3; -*-
;
;	Time-stamp: <2005-07-15 14:20:24 pete>
;
;   Southern European, Maltese and Esperanto Glyphs (Latin 3)
;
;   oct   dec   hex    UCS2    UTF-8
;=====================================
   = 240 = 160 = A0 = U+00A0 =    C2 A0 : NO-BREAK SPACE
Ħ = 241 = 161 = A1 = U+0126 =    C4 A6 : LATIN CAPITAL LETTER H WITH STROKE
˘ = 242 = 162 = A2 = U+02D8 =    CB 98 : BREVE
£ = 243 = 163 = A3 = U+00A3 =    C2 A3 : POUND SIGN
¤ = 244 = 164 = A4 = U+00A4 =    C2 A4 : CURRENCY SIGN
Ĥ = 246 = 166 = A6 = U+0124 =    C4 A4 : LATIN CAPITAL LETTER H WITH CIRCUMFLEX
§ = 247 = 167 = A7 = U+00A7 =    C2 A7 : SECTION SIGN
¨ = 250 = 168 = A8 = U+00A8 =    C2 A8 : DIAERESIS
İ = 251 = 169 = A9 = U+0130 =    C4 B0 : LATIN CAPITAL LETTER I WITH DOT ABOVE
Ş = 252 = 170 = AA = U+015E =    C5 9E : LATIN CAPITAL LETTER S WITH CEDILLA
Ğ = 253 = 171 = AB = U+011E =    C4 9E : LATIN CAPITAL LETTER G WITH BREVE
Ĵ = 254 = 172 = AC = U+0134 =    C4 B4 : LATIN CAPITAL LETTER J WITH CIRCUMFLEX
­ = 255 = 173 = AD = U+00AD =    C2 AD : SOFT HYPHEN
Ż = 257 = 175 = AF = U+017B =    C5 BB : LATIN CAPITAL LETTER Z WITH DOT ABOVE
° = 260 = 176 = B0 = U+00B0 =    C2 B0 : DEGREE SIGN
ħ = 261 = 177 = B1 = U+0127 =    C4 A7 : LATIN SMALL LETTER H WITH STROKE
² = 262 = 178 = B2 = U+00B2 =    C2 B2 : SUPERSCRIPT TWO
³ = 263 = 179 = B3 = U+00B3 =    C2 B3 : SUPERSCRIPT THREE
´ = 264 = 180 = B4 = U+00B4 =    C2 B4 : ACUTE ACCENT
µ = 265 = 181 = B5 = U+00B5 =    C2 B5 : MICRO SIGN
ĥ = 266 = 182 = B6 = U+0125 =    C4 A5 : LATIN SMALL LETTER H WITH CIRCUMFLEX
· = 267 = 183 = B7 = U+00B7 =    C2 B7 : MIDDLE DOT
¸ = 270 = 184 = B8 = U+00B8 =    C2 B8 : CEDILLA
ı = 271 = 185 = B9 = U+0131 =    C4 B1 : LATIN SMALL LETTER DOTLESS I
ş = 272 = 186 = BA = U+015F =    C5 9F : LATIN SMALL LETTER S WITH CEDILLA
ğ = 273 = 187 = BB = U+011F =    C4 9F : LATIN SMALL LETTER G WITH BREVE
ĵ = 274 = 188 = BC = U+0135 =    C4 B5 : LATIN SMALL LETTER J WITH CIRCUMFLEX
½ = 275 = 189 = BD = U+00BD =    C2 BD : VULGAR FRACTION ONE HALF
ż = 277 = 191 = BF = U+017C =    C5 BC : LATIN SMALL LETTER Z WITH DOT ABOVE
À = 300 = 192 = C0 = U+00C0 =    C3 80 : LATIN CAPITAL LETTER A WITH GRAVE
Á = 301 = 193 = C1 = U+00C1 =    C3 81 : LATIN CAPITAL LETTER A WITH ACUTE
Â = 302 = 194 = C2 = U+00C2 =    C3 82 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
Ä = 304 = 196 = C4 = U+00C4 =    C3 84 : LATIN CAPITAL LETTER A WITH DIAERESIS
Ċ = 305 = 197 = C5 = U+010A =    C4 8A : LATIN CAPITAL LETTER C WITH DOT ABOVE
Ĉ = 306 = 198 = C6 = U+0108 =    C4 88 : LATIN CAPITAL LETTER C WITH CIRCUMFLEX
Ç = 307 = 199 = C7 = U+00C7 =    C3 87 : LATIN CAPITAL LETTER C WITH CEDILLA
È = 310 = 200 = C8 = U+00C8 =    C3 88 : LATIN CAPITAL LETTER E WITH GRAVE
É = 311 = 201 = C9 = U+00C9 =    C3 89 : LATIN CAPITAL LETTER E WITH ACUTE
Ê = 312 = 202 = CA = U+00CA =    C3 8A : LATIN CAPITAL LETTER E WITH CIRCUMFLEX
Ë = 313 = 203 = CB = U+00CB =    C3 8B : LATIN CAPITAL LETTER E WITH DIAERESIS
Ì = 314 = 204 = CC = U+00CC =    C3 8C : LATIN CAPITAL LETTER I WITH GRAVE
Í = 315 = 205 = CD = U+00CD =    C3 8D : LATIN CAPITAL LETTER I WITH ACUTE
Î = 316 = 206 = CE = U+00CE =    C3 8E : LATIN CAPITAL LETTER I WITH CIRCUMFLEX
Ï = 317 = 207 = CF = U+00CF =    C3 8F : LATIN CAPITAL LETTER I WITH DIAERESIS
Ñ = 321 = 209 = D1 = U+00D1 =    C3 91 : LATIN CAPITAL LETTER N WITH TILDE
Ò = 322 = 210 = D2 = U+00D2 =    C3 92 : LATIN CAPITAL LETTER O WITH GRAVE
Ó = 323 = 211 = D3 = U+00D3 =    C3 93 : LATIN CAPITAL LETTER O WITH ACUTE
Ô = 324 = 212 = D4 = U+00D4 =    C3 94 : LATIN CAPITAL LETTER O WITH CIRCUMFLEX
Ġ = 325 = 213 = D5 = U+0120 =    C4 A0 : LATIN CAPITAL LETTER G WITH DOT ABOVE
Ö = 326 = 214 = D6 = U+00D6 =    C3 96 : LATIN CAPITAL LETTER O WITH DIAERESIS
× = 327 = 215 = D7 = U+00D7 =    C3 97 : MULTIPLICATION SIGN
Ĝ = 330 = 216 = D8 = U+011C =    C4 9C : LATIN CAPITAL LETTER G WITH CIRCUMFLEX
Ù = 331 = 217 = D9 = U+00D9 =    C3 99 : LATIN CAPITAL LETTER U WITH GRAVE
Ú = 332 = 218 = DA = U+00DA =    C3 9A : LATIN CAPITAL LETTER U WITH ACUTE
Û = 333 = 219 = DB = U+00DB =    C3 9B : LATIN CAPITAL LETTER U WITH CIRCUMFLEX
Ü = 334 = 220 = DC = U+00DC =    C3 9C : LATIN CAPITAL LETTER U WITH DIAERESIS
Ŭ = 335 = 221 = DD = U+016C =    C5 AC : LATIN CAPITAL LETTER U WITH BREVE
Ŝ = 336 = 222 = DE = U+015C =    C5 9C : LATIN CAPITAL LETTER S WITH CIRCUMFLEX
ß = 337 = 223 = DF = U+00DF =    C3 9F : LATIN SMALL LETTER SHARP S
à = 340 = 224 = E0 = U+00E0 =    C3 A0 : LATIN SMALL LETTER A WITH GRAVE
á = 341 = 225 = E1 = U+00E1 =    C3 A1 : LATIN SMALL LETTER A WITH ACUTE
â = 342 = 226 = E2 = U+00E2 =    C3 A2 : LATIN SMALL LETTER A WITH CIRCUMFLEX
ä = 344 = 228 = E4 = U+00E4 =    C3 A4 : LATIN SMALL LETTER A WITH DIAERESIS
ċ = 345 = 229 = E5 = U+010B =    C4 8B : LATIN SMALL LETTER C WITH DOT ABOVE
ĉ = 346 = 230 = E6 = U+0109 =    C4 89 : LATIN SMALL LETTER C WITH CIRCUMFLEX
ç = 347 = 231 = E7 = U+00E7 =    C3 A7 : LATIN SMALL LETTER C WITH CEDILLA
è = 350 = 232 = E8 = U+00E8 =    C3 A8 : LATIN SMALL LETTER E WITH GRAVE
é = 351 = 233 = E9 = U+00E9 =    C3 A9 : LATIN SMALL LETTER E WITH ACUTE
ê = 352 = 234 = EA = U+00EA =    C3 AA : LATIN SMALL LETTER E WITH CIRCUMFLEX
ë = 353 = 235 = EB = U+00EB =    C3 AB : LATIN SMALL LETTER E WITH DIAERESIS
ì = 354 = 236 = EC = U+00EC =    C3 AC : LATIN SMALL LETTER I WITH GRAVE
í = 355 = 237 = ED = U+00ED =    C3 AD : LATIN SMALL LETTER I WITH  
ACUTE
î = 356 = 238 = EE = U+00EE =    C3 AE : LATIN SMALL LETTER I WITH  
CIRCUMFLEX
ï = 357 = 239 = EF = U+00EF =    C3 AF : LATIN SMALL LETTER I WITH  
DIAERESIS
ñ = 361 = 241 = F1 = U+00F1 =    C3 B1 : LATIN SMALL LETTER N WITH  
TILDE
ò = 362 = 242 = F2 = U+00F2 =    C3 B2 : LATIN SMALL LETTER O WITH  
GRAVE
ó = 363 = 243 = F3 = U+00F3 =    C3 B3 : LATIN SMALL LETTER O WITH  
ACUTE
ô = 364 = 244 = F4 = U+00F4 =    C3 B4 : LATIN SMALL LETTER O WITH  
CIRCUMFLEX
ġ = 365 = 245 = F5 = U+0121 =    C4 A1 : LATIN SMALL LETTER G WITH  
DOT ABOVE
ö = 366 = 246 = F6 = U+00F6 =    C3 B6 : LATIN SMALL LETTER O WITH  
DIAERESIS
÷ = 367 = 247 = F7 = U+00F7 =    C3 B7 : DIVISION SIGN
ĝ = 370 = 248 = F8 = U+011D =    C4 9D : LATIN SMALL LETTER G WITH  
CIRCUMFLEX
ù = 371 = 249 = F9 = U+00F9 =    C3 B9 : LATIN SMALL LETTER U WITH  
GRAVE
ú = 372 = 250 = FA = U+00FA =    C3 BA : LATIN SMALL LETTER U WITH  
ACUTE
û = 373 = 251 = FB = U+00FB =    C3 BB : LATIN SMALL LETTER U WITH  
CIRCUMFLEX
ü = 374 = 252 = FC = U+00FC =    C3 BC : LATIN SMALL LETTER U WITH  
DIAERESIS
ŭ = 375 = 253 = FD = U+016D =    C5 AD : LATIN SMALL LETTER U WITH  
BREVE
ŝ = 376 = 254 = FE = U+015D =    C5 9D : LATIN SMALL LETTER S WITH  
CIRCUMFLEX
˙ = 377 = 255 = FF = U+02D9 =    CB 99 : DOT ABOVE

--
Greetings

   Pete

The human brain operates at only 10% of its capacity. The rest is  
overhead for the operating system.


* Re: fontsets:  (was Re: query-replace?)
       [not found]         ` <mailman.445.1136812310.26925.help-gnu-emacs@gnu.org>
@ 2006-01-10  6:12           ` B. T. Raven
  2006-01-10 10:29             ` Peter Dyballa
  0 siblings, 1 reply; 14+ messages in thread
From: B. T. Raven @ 2006-01-10  6:12 UTC (permalink / raw)



"Peter Dyballa" <Peter_Dyballa@Web.DE> wrote in message
news:mailman.445.1136812310.26925.help-gnu-emacs@gnu.org...

Am 08.01.2006 um 22:18 schrieb B. T. Raven:

> Since this font covers such a large swath of Unicode I would rather stick
> with the Losedows interface for now. I still don't understand fontsets
> yet. Now the dialog box shows fonts, styles, point size (8-72) and script
> all in one place. If I went the fontset route and I wanted only sizes
> 9-12 and only four different font styles, wouldn't I have to produce
> about 16 times as much lisp code as you show above to get the same
> Unicode coverage?

The font variants (italic, bold, bold-italic) are chosen automatically,
so four sets for 9, 10, 11, and 12 pt would suffice.

Fontsets actually are necessary when you handle texts that have more
different characters than the small MS or ISO encodings provide. Then
GNU Emacs needs to create a table that maps code points (characters)
to members in fonts (glyphs) and this choice can look ugly. You help
GNU Emacs when you construct a fontset. IMO it would be enough to
create a fontset like this for 9 pt and three others for 10, 11, and
12 pt:

(create-fontset-from-fontset-spec
 "-outline-Arial Unicode MS-normal-r-*-*-9-*-*-*-*-*-fontset-09pt_arial_UC"
 t 'noerror)
(set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-1
                  '("Arial Unicode MS" . "iso8859-1"))
(set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-2
                  '("Arial Unicode MS" . "iso8859-2"))
(set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-4
                  '("Arial Unicode MS" . "iso8859-4"))
(set-fontset-font "fontset-09pt_arial_UC" 'cyrillic-iso8859-5
                  '("Arial Unicode MS" . "iso8859-5"))
(set-fontset-font "fontset-09pt_arial_UC" 'arabic-iso8859-6
                  '("Arial Unicode MS" . "iso8859-6"))
(set-fontset-font "fontset-09pt_arial_UC" 'greek-iso8859-7
                  '("Arial Unicode MS" . "iso8859-7"))
(set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-8
                  '("Arial Unicode MS" . "iso8859-8"))
(set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-9
                  '("Arial Unicode MS" . "iso8859-9"))
(set-fontset-font "fontset-09pt_arial_UC" 'latin-iso8859-13
                  '("Arial Unicode MS" . "iso8859-13"))
(set-fontset-font "fontset-09pt_arial_UC" 'mule-unicode-0100-24ff
                  '("Arial Unicode MS" . "iso10646-1"))
(set-fontset-font "fontset-09pt_arial_UC" 'mule-unicode-2500-33ff
                  '("Arial Unicode MS" . "iso10646-1"))
(set-fontset-font "fontset-09pt_arial_UC" 'mule-unicode-e000-ffff
                  '("Arial Unicode MS" . "iso10646-1"))

The example most similar to this at the Gnu Windows FAQ shows a long
string in the (create-fontset-from-fontset-spec "....") call. Is the
above to be evaluated as 15 forms or does another paren go on the end? Is
each fontset definition kept in a file somewhere or does all this go into
the .emacs? Anyway, I saved it to a file.

What about Lucida Console (666 glyphs, 714 mappings)? It can display
ISO 8859-3 in X11 completely (Arial Unicode MS has 51,180 glyphs and
38,933 mappings -- and in X11 it has an ISO 8859-3 encoding!). It's
even monospaced. There is another monospaced font on the Web: Lucida
Sans Typewriter (1,376 glyphs, 1,425 mappings). It's part of the Java
SDKs (starting with Java 1.4 the Lucida fonts were reduced in
variants, so it's worth retrieving JDK 1.3 first and updating some of
these fonts with the 1.4 and/or 1.5 fonts). The JDKs also have Lucida
Sans (2,929 glyphs, 2,410 mappings).

I think Windows has those encodings too. I don't know why I can't see the
glyphs in emacs. In Eli Z.'s codepage.el the Latin-3 encoding is
associated with cp857 and I have a cp_857.nls file in \windows\system
(win98). Also, I see all the characters in your table below. I can get
them into emacs, but by a roundabout method. Describe input method
latin-3-postfix shows only empty rectangles.

Probably you need some ISO 8859-3 encoding file. *I* have no idea
where in MS Losedows this would be needed, somewhere in the machinery
that creates partial, specifically named encodings from a Unicode
encoded font? If it does not work in a *partial* encoding: would
*complete* Unicode succeed?!

This is what I think the locale file cp_857.nls does. I had to add some of
these files to get emacs i18n working to the degree I have now.

;;; -*- mode: Text; coding: utf-8; -*-

First open in ISO 8859-3, then select to save in UTF-8 -- conversion
done!
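
For readers without a working Emacs setup, the same open-as-ISO-8859-3, save-as-UTF-8 conversion can be sketched in Python. This is only an illustration, not something from the thread: the function name `transcode` and the file names in the usage note are hypothetical, and the sketch assumes the file is small enough to read into memory and contains only codes that ISO 8859-3 actually assigns.

```python
from pathlib import Path


def transcode(src, dst, src_encoding="iso8859_3", dst_encoding="utf-8"):
    """Re-encode a text file and return the number of characters converted.

    The file is read and written as raw bytes so that line endings pass
    through unchanged; only the character encoding is altered.
    """
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))
    return len(text)
```

A call such as `transcode("latin3.txt", "latin3-utf8.txt")` would write the UTF-8 copy next to the original; an unassigned code in the input raises `UnicodeDecodeError` rather than being silently replaced.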

But remember that some of us are lost in Dozeland. I edit with emacs,
almost everything else is done with other programs. I can get the table
below into emacs but only by copy-pasting into Open Office and then saving
as a utf-8 encoded text file. I am in Outlook Express here. Don't know how
to use Gnus, Rmail, most other things.


http://aspell.net/charsets/
http://www.slovo.info/unifonts.htm
http://www.cs.tut.fi/%7Ejkorpela/chars.html
http://www.topology.org/soft/alpha.html
http://www.i18nguy.com/
http://web.archive.org/web/20030622083607/www.diffuse.org/chars.html

Thanks for these. The i18nguy site has some very impressive stuff (and
useful links). Just this one:

http://www.i18nguy.com/unicode/codepages.html

is a treasure trove.


How do you declare ISO Latin-3 or ISO 8859-3? This is meant for
Southern European, Maltese, and Esperanto Glyphs, very exotic! Or do
you live on Malta? Here is my test file for this encoding, starting
with a hint for GNU Emacs:

No, in U.S. In Eli's codepage.el this block of chars (or glyphs) is headed
by ;; Turkish. This goes with DOS code page 857.


;;; -*- mode: Text; coding: iso-8859-3; -*-
;
; Time-stamp: <2005-07-15 14:20:24 pete>
;
;   Southern European, Maltese and Esperanto Glyphs (Latin 3)
;
;   oct   dec   hex    UCS2    UTF-8
;=====================================
   = 240 = 160 = A0 = U+00A0 =    C2 A0 : NO-BREAK SPACE
Ħ = 241 = 161 = A1 = U+0126 =    C4 A6 : LATIN CAPITAL LETTER H WITH STROKE
˘ = 242 = 162 = A2 = U+02D8 =    CB 98 : BREVE
£ = 243 = 163 = A3 = U+00A3 =    C2 A3 : POUND SIGN
¤ = 244 = 164 = A4 = U+00A4 =    C2 A4 : CURRENCY SIGN
Ĥ = 246 = 166 = A6 = U+0124 =    C4 A4 : LATIN CAPITAL LETTER H WITH CIRCUMFLEX
§ = 247 = 167 = A7 = U+00A7 =    C2 A7 : SECTION SIGN
.
.
.
etc.

Apparently your emacs is using Unicode as its internal representation. I
can't do that with 21.3. Thanks for all the food for thought.

Ed


* Re: fontsets:  (was Re: query-replace?)
  2006-01-10  6:12           ` B. T. Raven
@ 2006-01-10 10:29             ` Peter Dyballa
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Dyballa @ 2006-01-10 10:29 UTC (permalink / raw)
  Cc: help-gnu-emacs

Am 10.01.2006 um 06:12 schrieb B. T. Raven:

> No, in U.S. In Eli's codepage.el this block of chars (or glyphs) is
> headed by ;; Turkish. This goes with DOS code page 857.
>

I don't know much about DOS code pages; I stick to standards:

ISO 8859-1:  Western European Glyphs (Latin 1)
ISO 8859-2:  Central and Eastern European Glyphs (Latin 2)
ISO 8859-3:  Southern European, Maltese, and Esperanto Glyphs (Latin 3)
ISO 8859-4:  Northern European Glyphs (Latin 4)
ISO 8859-5:  Cyrillic Glyphs
ISO 8859-6:  Arabic Glyphs
ISO 8859-7:  Modern Greek Glyphs (ELOT928)
ISO 8859-8:  Hebrew Glyphs
ISO 8859-9:  Turkish Glyphs (Latin 5)
ISO 8859-10: New Nordic Glyphs: Saami, Inuit, Icelandic (Latin 6)
ISO 8859-11: Thai Glyphs
ISO 8859-13: Baltic Glyphs (Latin 7)
ISO 8859-14: Celtic Glyphs (Latin 8)
ISO 8859-15: Western European Glyphs with € (Latin 9, Latin 0)
ISO 8859-16: South-Eastern European Glyphs with €, Romanian (Latin 10)
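
One way to see how these parts overlap (and why the Latin-3 block can be mistaken for a Turkish one) is to ask which parts can encode a given character. The following is a rough sketch in Python using the codecs from its standard library; the names `ISO_8859_PARTS` and `parts_covering` are made up for this illustration, and Python's tables may follow later revisions of the standards than the ones in use when this thread was written.

```python
# The parts listed above; there is no ISO 8859-12.
ISO_8859_PARTS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16)


def parts_covering(char):
    """Return the ISO 8859 part numbers whose 8-bit set contains `char`."""
    covering = []
    for part in ISO_8859_PARTS:
        try:
            char.encode(f"iso8859_{part}")
        except UnicodeEncodeError:
            continue  # this part has no code for the character
        covering.append(part)
    return covering


if __name__ == "__main__":
    for ch in "ĥğ":
        print(ch, parts_covering(ch))
```

With Python's tables, ĥ is found only in part 3, while ğ is found in both part 3 (Latin 3) and part 9 (Turkish, Latin 5).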

If you want, I can send you my 8-bit Turkish test file in ISO Latin-5.
That should work because you have
-outline-Arial Unicode MS-normal-r-normal-normal-*-*-96-96-p-*-iso8859-5.

>
> Apparently your emacs is using Unicode as its internal representation. I
> can't do that with 21.3. Thanks for all the food for thought.

No. I took the excerpt off GNU Emacs 22.0.50, the one that is coming  
closer to Unicode every day. The important thing is that the file has  
in the leftmost column the right code values. With the encoding line  
Emacs tries to present these codes in characters belonging to this  
set. Then I have the right fontset defined so that the right glyphs  
are chosen from the font(s). The codes in the leftmost column are  
still 8 bit!
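
The point that the file holds plain 8-bit codes, and that only the declared coding decides which characters they become, can be illustrated outside Emacs. This is a small Python sketch, not Emacs's own mechanism; the helper name `decode_byte` is hypothetical, and the codec names are those of Python's standard library.

```python
def decode_byte(byte, encodings=("iso8859_1", "iso8859_3", "iso8859_9")):
    """Map each encoding name to the character it assigns to one 8-bit code."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = bytes([byte]).decode(enc)
        except UnicodeDecodeError:
            result[enc] = None  # the code is unassigned in this encoding
    return result


if __name__ == "__main__":
    # The same three stored codes, read under three different coding lines.
    for code in (0xB6, 0xF0, 0xFD):
        print(f"{code:02X}", decode_byte(code))
```

Code FD, for instance, comes out as ý under ISO 8859-1, ŭ under ISO 8859-3, and ı under ISO 8859-9, while F0 has no assignment at all in ISO 8859-3.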

Maybe you know printf, a function in C that is also available in UNIX as
a modern substitute for echo, and in scripting languages like awk or
Perl. In Perl

	printf "%c = %o = %d = %X\n", 234, 234, 234, 234;

would create the left columns of one line (in a loop it can produce
more lines); the others come from a programme that can translate ISO
to UTF-8 representation or look up the Unicode position for that
character. The descriptions, I think, are taken from a file in the
Kermit distribution. In GNU Emacs they were all fused together.
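
For comparison, all the columns of such a table can be produced in one pass. The sketch below is in Python and is not the tool chain described above: it takes the character names from Python's `unicodedata` module rather than from a Kermit file, and the function names `table_row` and `build_table` are invented for the example.

```python
import unicodedata


def table_row(byte, encoding="iso8859_3"):
    """Return one table row for an 8-bit code, or None if it is unassigned."""
    try:
        char = bytes([byte]).decode(encoding)
    except UnicodeDecodeError:
        return None
    # UTF-8 bytes as space-separated hex pairs, e.g. "C4 A5".
    utf8 = " ".join(f"{b:02X}" for b in char.encode("utf-8"))
    name = unicodedata.name(char, "<unnamed>")
    return (f"{char} = {byte:o} = {byte:d} = {byte:02X} = "
            f"U+{ord(char):04X} = {utf8:>8} : {name}")


def build_table(start=0xA0, stop=0xFF, encoding="iso8859_3"):
    """Collect rows for every assigned code in the range start..stop."""
    rows = (table_row(b, encoding) for b in range(start, stop + 1))
    return [row for row in rows if row is not None]


if __name__ == "__main__":
    print("\n".join(build_table()))
```

Codes that the encoding leaves unassigned (such as BE in ISO 8859-3) are skipped, which matches the gaps in the test file.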

--
Greetings

   Pete

Eat the rich -- the poor are tough and stringy.


