unofficial mirror of emacs-devel@gnu.org 
* prettify symbols question
@ 2020-11-11 17:01 Alfred M. Szmidt
  2020-11-12 14:59 ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-11 17:01 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 343 bytes --]

Not sure if this is better for help-gnu-emacs or here.

What would be the proper way to handle, say, #o210 in prettify-symbols?

I've attached a simple test.  I would expect the #o210 sequence in the
file to be shown as a Unicode lambda, but nothing changes -- I suspect
it is due to some encoding mismatch between the buffer and the string.

[-- Attachment #2: prettify-test.el --]
[-- Type: application/emacs-lisp, Size: 282 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: prettify symbols question
  2020-11-11 17:01 prettify symbols question Alfred M. Szmidt
@ 2020-11-12 14:59 ` Eli Zaretskii
  2020-11-12 15:17   ` Alfred M. Szmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-12 14:59 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Date: Wed, 11 Nov 2020 12:01:37 -0500
> 
> What would be the proper way to handle, say, #o210 in prettify-symbols?
> 
> I've attached a simple test, I would expect to see the #o210 sequence
> in the file to be shown as a unicode lambda, but nothing changes -- I
> suspect it is due to some encoding mismatch between the buffer and the
> string.

prettify-symbols-mode doesn't act on text in comments, see
'prettify-symbols-default-compose-p'.  If you move your #o210 out of
the comment, it should get displayed as you expect.

You can replace 'prettify-symbols-default-compose-p' with your own
function, and set up 'prettify-symbols-compose-predicate' to use it
instead of the default predicate, if you want to prettify stuff in
comments.

If the above doesn't work, then maybe it _is_ related to encoding.
What does the mode line say about 'buffer-file-coding-system' when you
visit this file?
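
The predicate replacement described above could be sketched roughly as
follows.  The names `my-prettify-compose-anywhere-p' and
`my-prettify-setup' are hypothetical, and the "lambda" entry in the
alist is only a placeholder for whatever sequence should be
prettified; the variables `prettify-symbols-alist' and
`prettify-symbols-compose-predicate' are the ones Emacs provides.

```elisp
;; Illustrative sketch only: `my-prettify-compose-anywhere-p' and
;; `my-prettify-setup' are made-up names, not part of Emacs.
(require 'prog-mode)  ; provides `prettify-symbols-mode'

(defun my-prettify-compose-anywhere-p (_start _end _match)
  "Allow composition of every match, including in comments and strings.
The default predicate, `prettify-symbols-default-compose-p', rejects
matches inside comments and strings and checks symbol boundaries."
  t)

(defun my-prettify-setup ()
  "Enable prettification with the permissive predicate in this buffer."
  ;; Placeholder mapping; substitute the sequence you actually need.
  (setq-local prettify-symbols-alist '(("lambda" . ?λ)))
  (setq-local prettify-symbols-compose-predicate
              #'my-prettify-compose-anywhere-p)
  (prettify-symbols-mode 1))

(add-hook 'emacs-lisp-mode-hook #'my-prettify-setup)
```

Returning t unconditionally is the bluntest possible choice; a
narrower predicate could call `prettify-symbols-default-compose-p'
first and only fall back to t when point is inside a comment.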




* Re: prettify symbols question
  2020-11-12 14:59 ` Eli Zaretskii
@ 2020-11-12 15:17   ` Alfred M. Szmidt
  2020-11-12 15:38     ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-12 15:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

   > What would be the proper way to handle, say, #o210 in prettify-symbols?
   > 
   > I've attached a simple test, I would expect to see the #o210 sequence
   > in the file to be shown as a unicode lambda, but nothing changes -- I
   > suspect it is due to some encoding mismatch between the buffer and the
   > string.

   prettify-symbols-mode doesn't act on text in comments, see
   'prettify-symbols-default-compose-p'.  If you move your #o210 out of
   the comment, it should get displayed as you expect.

Ah, that explains some.

   You can replace 'prettify-symbols-default-compose-p' with your own
   function, and set up 'prettify-symbols-compose-predicate' to use it
   instead of the default predicate, if you want to prettify stuff in
   comments.

Thank you for the tip, that will be useful (I need this to act on all
sequences even in symbols).

   If the above doesn't work, then maybe it _is_ related to encoding.
   What does the mode line say about 'buffer-file-coding-system' when you
   visit this file?

So when the buffer-file-coding-system is utf-8-unix everything works
(where also the sequence is not acted on in comments).  But when the
buffer is raw-text-unix, it does not work for #o210, but works for say
#o10.  Some multi-byte thing going on?




* Re: prettify symbols question
  2020-11-12 15:17   ` Alfred M. Szmidt
@ 2020-11-12 15:38     ` Eli Zaretskii
  2020-11-12 16:14       ` Eli Zaretskii
  2020-11-13  8:27       ` prettify symbols question Alfred M. Szmidt
  0 siblings, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-12 15:38 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 12 Nov 2020 10:17:05 -0500
> 
>    If the above doesn't work, then maybe it _is_ related to encoding.
>    What does the mode line say about 'buffer-file-coding-system' when you
>    visit this file?
> 
> So when the buffer-file-coding-system is utf-8-unix everything works
> (where also the sequence is not acted on in comments).  But when the
> buffer is raw-text-unix, it does not work for #o210, but works for say
> #o10.  Some multi-byte thing going on?

Yes, raw-text means the buffer includes raw bytes, not characters.
Emacs doesn't do anything useful with raw bytes above 127, and in
particular doesn't interpret them as characters.
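
A small sketch of what "raw bytes, not characters" means in practice,
using only built-in functions.  Evaluating the forms in *scratch*
should show that byte #o210 turns into a special eight-bit character
rather than any letter, whereas a real encoding such as UTF-8 yields
an actual character.

```elisp
;; A byte above 127 has no character identity of its own.  In
;; multibyte text Emacs represents it as a special "eight-bit"
;; character: the byte value offset by #x3FFF00.
(let ((raw (unibyte-char-to-multibyte #o210)))
  (list raw                     ; 4194184, i.e. #x3FFF88
        (char-charset raw)))    ; the charset this character belongs to

;; For contrast, the two UTF-8 bytes #o316 #o273 decode to the single
;; character U+03BB, GREEK SMALL LETTER LAMDA.
(decode-coding-string "\316\273" 'utf-8)
```

A prettify-symbols entry written as an ordinary character in the
#o200-#o377 range therefore would not compare equal to the eight-bit
character that a raw-text buffer holds for the same byte.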




* Re: prettify symbols question
  2020-11-12 15:38     ` Eli Zaretskii
@ 2020-11-12 16:14       ` Eli Zaretskii
  2020-11-12 20:53         ` Alfred M. Szmidt
  2020-11-13  8:27       ` prettify symbols question Alfred M. Szmidt
  1 sibling, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-12 16:14 UTC (permalink / raw)
  To: ams; +Cc: emacs-devel

> Date: Thu, 12 Nov 2020 17:38:12 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
> 
> > So when the buffer-file-coding-system is utf-8-unix everything works
> > (where also the sequence is not acted on in comments).  But when the
> > buffer is raw-text-unix, it does not work for #o210, but works for say
> > #o10.  Some multi-byte thing going on?
> 
> Yes, raw-text means the buffer includes raw bytes, not characters.
> Emacs doesn't do anything useful with raw bytes above 127, and in
> particular doesn't interpret them as characters.

Btw, in what encoding does \210 stand for GREEK SMALL LETTER LAMBDA?




* Re: prettify symbols question
  2020-11-12 16:14       ` Eli Zaretskii
@ 2020-11-12 20:53         ` Alfred M. Szmidt
  2020-11-12 21:12           ` Basil L. Contovounesios
  2020-11-13  7:24           ` Eli Zaretskii
  0 siblings, 2 replies; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-12 20:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

   > > So when the buffer-file-coding-system is utf-8-unix everything works
   > > (where also the sequence is not acted on in comments).  But when the
   > > buffer is raw-text-unix, it does not work for #o210, but works for say
   > > #o10.  Some multi-byte thing going on?
   > 
   > Yes, raw-text means the buffer includes raw bytes, not characters.
   > Emacs doesn't do anything useful with raw bytes above 127, and in
   > particular doesn't interpret them as characters.

   Btw, in what encoding does \210 stand for GREEK SMALL LETTER LAMBDA?

The Lisp Machine character set -- there is a long story that I could
tell about why, if anyone is curious, but it is very much a tangent.




* Re: prettify symbols question
  2020-11-12 20:53         ` Alfred M. Szmidt
@ 2020-11-12 21:12           ` Basil L. Contovounesios
  2020-11-12 21:25             ` Drew Adams
  2020-11-13  7:44             ` Eli Zaretskii
  2020-11-13  7:24           ` Eli Zaretskii
  1 sibling, 2 replies; 29+ messages in thread
From: Basil L. Contovounesios @ 2020-11-12 21:12 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: Eli Zaretskii, emacs-devel

"Alfred M. Szmidt" <ams@gnu.org> writes:

>    Btw, in what encoding does \210 stand for GREEK SMALL LETTER LAMBDA?
>
> The Lisp Machine character set -- there is a long story that I could
> tell about why if anyone is curious but very much a tangent.

Feel free to CC me if you end up going on that tangent. :)

-- 
Basil




* RE: prettify symbols question
  2020-11-12 21:12           ` Basil L. Contovounesios
@ 2020-11-12 21:25             ` Drew Adams
  2020-11-13  7:44             ` Eli Zaretskii
  1 sibling, 0 replies; 29+ messages in thread
From: Drew Adams @ 2020-11-12 21:25 UTC (permalink / raw)
  To: Basil L. Contovounesios, Alfred M. Szmidt; +Cc: Eli Zaretskii, emacs-devel

> >    Btw, in what encoding does \210 stand for GREEK SMALL LETTER
> LAMBDA?
> >
> > The Lisp Machine character set -- there is a long story that I could
> > tell about why if anyone is curious but very much a tangent.
> 
> Feel free to CC me if you end up going on that tangent. :)

+1




* Re: prettify symbols question
  2020-11-12 20:53         ` Alfred M. Szmidt
  2020-11-12 21:12           ` Basil L. Contovounesios
@ 2020-11-13  7:24           ` Eli Zaretskii
  2020-11-13 10:15             ` Alfred M. Szmidt
  2020-11-13 11:17             ` Alfred M. Szmidt
  1 sibling, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-13  7:24 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Thu, 12 Nov 2020 15:53:05 -0500
> 
>    > > So when the buffer-file-coding-system is utf-8-unix everything works
>    > > (where also the sequence is not acted on in comments).  But when the
>    > > buffer is raw-text-unix, it does not work for #o210, but works for say
>    > > #o10.  Some multi-byte thing going on?
>    > 
>    > Yes, raw-text means the buffer includes raw bytes, not characters.
>    > Emacs doesn't do anything useful with raw bytes above 127, and in
>    > particular doesn't interpret them as characters.
> 
>    Btw, in what encoding does \210 stand for GREEK SMALL LETTER LAMBDA?
> 
> The Lisp Machine character set

Emacs doesn't support such an encoding/charset, does it?  Maybe it
should?  Is this character set documented somewhere?  The Lisp Machine
Manual I have seems to say that \210 is BS or Overstrike, not LAMBDA
(https://tumbleweed.nu/r/lm-3/uv/chinual.html#The-Character-Set).




* Re: prettify symbols question
  2020-11-12 21:12           ` Basil L. Contovounesios
  2020-11-12 21:25             ` Drew Adams
@ 2020-11-13  7:44             ` Eli Zaretskii
  1 sibling, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-13  7:44 UTC (permalink / raw)
  To: Basil L. Contovounesios; +Cc: ams, emacs-devel

> From: "Basil L. Contovounesios" <contovob@tcd.ie>
> Date: Thu, 12 Nov 2020 21:12:31 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> 
> "Alfred M. Szmidt" <ams@gnu.org> writes:
> 
> >    Btw, in what encoding does \210 stand for GREEK SMALL LETTER LAMBDA?
> >
> > The Lisp Machine character set -- there is a long story that I could
> > tell about why if anyone is curious but very much a tangent.
> 
> Feel free to CC me if you end up going on that tangent. :)

There's emacs-tangents@gnu.org for such tangents, personal email is
not necessarily necessary.




* Re: prettify symbols question
  2020-11-12 15:38     ` Eli Zaretskii
  2020-11-12 16:14       ` Eli Zaretskii
@ 2020-11-13  8:27       ` Alfred M. Szmidt
  2020-11-13  8:40         ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13  8:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

   >    If the above doesn't work, then maybe it _is_ related to encoding.
   >    What does the mode line say about 'buffer-file-coding-system' when you
   >    visit this file?
   > 
   > So when the buffer-file-coding-system is utf-8-unix everything works
   > (where also the sequence is not acted on in comments).  But when the
   > buffer is raw-text-unix, it does not work for #o210, but works for say
   > #o10.  Some multi-byte thing going on?

   Yes, raw-text means the buffer includes raw bytes, not characters.
   Emacs doesn't do anything useful with raw bytes above 127, and in
   particular doesn't interpret them as characters.

Do you have any ideas on what a good coding system would be for this?
utf-8 is obviously wrong.  The char. set is just 8-bit, or should I
write a coding system?




* Re: prettify symbols question
  2020-11-13  8:27       ` prettify symbols question Alfred M. Szmidt
@ 2020-11-13  8:40         ` Eli Zaretskii
  0 siblings, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-13  8:40 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 13 Nov 2020 03:27:17 -0500
> 
>    Yes, raw-text means the buffer includes raw bytes, not characters.
>    Emacs doesn't do anything useful with raw bytes above 127, and in
>    particular doesn't interpret them as characters.
> 
> Do you have any ideas on what a good coding system would be for this?
> utf-8 is obviously wrong.  The char. set is just 8-bit, or should I
> write a coding system?

The latter, IMO.  More accurately, define a charset, and then defining
a coding-system for it should be almost trivial.




* Re: prettify symbols question
  2020-11-13  7:24           ` Eli Zaretskii
@ 2020-11-13 10:15             ` Alfred M. Szmidt
  2020-11-13 11:17             ` Alfred M. Szmidt
  1 sibling, 0 replies; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13 10:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

   > The Lisp Machine character set

   Emacs doesn't support such an encoding/charset, does it?  Maybe it
   should?

No, not yet.  I can see about doing that.

   Is this character set documented somewhere?  The Lisp Machine
   Manual I have seems to say that \210 is BS or Overstrike, not LAMBDA
   (https://tumbleweed.nu/r/lm-3/uv/chinual.html#The-Character-Set).

That is how the Lisp Machine sees things (there is an implicit
conversion of the files from the host to the Lisp Machine when read
over Chaosnet); e.g., newline is #o215, but when files are stored on a
Unix host they have been translated so that newline #o215 becomes
#o12, and similarly for tab, etc., so that things are viewable on ASCII
systems.

So there are two encodings, one is native to the Lisp Machine (where
#o215 is left as is), and the other one for UNIX (where #o215, etc,
are translated).




* Re: prettify symbols question
  2020-11-13  7:24           ` Eli Zaretskii
  2020-11-13 10:15             ` Alfred M. Szmidt
@ 2020-11-13 11:17             ` Alfred M. Szmidt
  2020-11-13 12:22               ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13 11:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

   > The Lisp Machine character set

   Emacs doesn't support such an encoding/charset, does it?  Maybe it
   should?  Is this character set documented somewhere?  The Lisp Machine
   Manual I have seems to say that \210 is BS or Overstrike, not LAMBDA
   (https://tumbleweed.nu/r/lm-3/uv/chinual.html#The-Character-Set).

That now contains both the Unix-stored variant and the native one (also
attached).

I'm slightly confused as to how to add a new coding system: do I need
to first add a charset (the converted one would be :ascii-compatible-p
t, and the native one nil)?  I found the manual slightly sparse on this
front.


===File ~/lispm-charset.text================================
000 center-dot              040 space       100 @           140 `
001 down arrow              041 !           101 A           141 a
002 alpha                   042 "           102 B           142 b
003 beta                    043 #           103 C           143 c
004 and-sign                044 $           104 D           144 d
005 not-sign                045 %           105 E           145 e
006 epsilon                 046 &           106 F           146 f
007 pi                      047 '           107 G           147 g
010 lambda                  050 (           110 H           150 h
011 gamma                   051 )           111 I           151 i
012 delta                   052 *           112 J           152 j
013 uparrow                 053 +           113 K           153 k
014 plus-minus              054 ,           114 L           154 l
015 circle-plus             055 -           115 M           155 m
016 infinity                056 .           116 N           156 n
017 partial delta           057 /           117 O           157 o
020 left horseshoe          060 0           120 P           160 p
021 right horseshoe         061 1           121 Q           161 q
022 up horseshoe            062 2           122 R           162 r
023 down horseshoe          063 3           123 S           163 s
024 universal quantifier    064 4           124 T           164 t
025 existential quantifier  065 5           125 U           165 u
026 circle-X                066 6           126 V           166 v
027 double-arrow            067 7           127 W           167 w
030 left arrow              070 8           130 X           170 x
031 right arrow             071 9           131 Y           171 y
032 not-equals              072 :           132 Z           172 z
033 diamond (altmode)       073 ;           133 [           173 {
034 less-or-equal           074 <           134 \           174 |
035 greater-or-equal        075 =           135 ]           175 }
036 equivalence             076 >           136 ^           176 ~
037 or                      077 ?           137 _           177 @ref{ctl-qm}
200 Null character     210 Overstrike    220 Stop-output   230 Roman-iv
201 Break              211 Tab           221 Abort         231 Hand-up
202 Clear              212 Line          222 Resume        232 Hand-down
203 Call               213 Delete        223 Status        233 Hand-left
204 Terminal escape    214 Page          224 End           234 Hand-right
205 Macro/backnext     215 Return        225 Roman-i       235 System
206 Help               216 Quote         226 Roman-ii      236 Network
207 Rubout             217 Hold-output   227 Roman-iii
237-377 reserved for the future


                    The Lisp Machine Character Set
			(all numbers in octal)

000 center-dot              040 space       100 @           140 `
001 down arrow              041 !           101 A           141 a
002 alpha                   042 "           102 B           142 b
003 beta                    043 #           103 C           143 c
004 and-sign                044 $           104 D           144 d
005 not-sign                045 %           105 E           145 e
006 epsilon                 046 &           106 F           146 f
007 pi                      047 '           107 G           147 g
210 lambda                  050 (           110 H           150 h
211 gamma                   051 )           111 I           151 i
212 delta                   052 *           112 J           152 j
213 uparrow                 053 +           113 K           153 k
214 plus-minus              054 ,           114 L           154 l
215 circle-plus             055 -           115 M           155 m
016 infinity                056 .           116 N           156 n
017 partial delta           057 /           117 O           157 o
020 left horseshoe          060 0           120 P           160 p
021 right horseshoe         061 1           121 Q           161 q
022 up horseshoe            062 2           122 R           162 r
023 down horseshoe          063 3           123 S           163 s
024 universal quantifier    064 4           124 T           164 t
025 existential quantifier  065 5           125 U           165 u
026 circle-X                066 6           126 V           166 v
027 double-arrow            067 7           127 W           167 w
030 left arrow              070 8           130 X           170 x
031 right arrow             071 9           131 Y           171 y
032 not-equals              072 :           132 Z           172 z
033 diamond (altmode)       073 ;           133 [           173 {
034 less-or-equal           074 <           134 \           174 |
035 greater-or-equal        075 =           135 ]           175 }
036 equivalence             076 >           136 ^           176 ~
037 or                      077 ?           137 _           177 @ref{ctl-qm}
200 Null character      10 Overstrike    220 Stop-output   230 Roman-iv
201 Break               11 Tab           221 Abort         231 Hand-up
202 Clear               15 Line          222 Resume        232 Hand-down
203 Call                13 Delete        223 Status        233 Hand-left
204 Terminal escape     14 Page          224 End           234 Hand-right
205 Macro/backnext      12 Return        225 Roman-i       235 System
206 Help               216 Quote         226 Roman-ii      236 Network
207 Rubout             217 Hold-output   227 Roman-iii
237-377 reserved for the future


                    The Lisp Machine Character Set
                          as stored on UNIX
			(all numbers in octal)
============================================================
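
Reading the two tables side by side, only twelve byte values differ
between the native and the Unix-stored form.  A minimal sketch of the
native-to-Unix translation, assuming the input is a unibyte string of
native bytes; the names `my-lispm-native-to-unix-alist' and
`my-lispm-native-to-unix-string' are hypothetical, and the byte pairs
are copied from the tables above rather than from any Lisp Machine
source.

```elisp
;; Hypothetical helper; the byte pairs are read off the two tables.
(defconst my-lispm-native-to-unix-alist
  '((#o010 . #o210) (#o011 . #o211) (#o012 . #o212)   ; lambda gamma delta
    (#o013 . #o213) (#o014 . #o214) (#o015 . #o215)   ; uparrow plus-minus circle-plus
    (#o210 . #o010) (#o211 . #o011) (#o212 . #o015)   ; Overstrike Tab Line
    (#o213 . #o013) (#o214 . #o014) (#o215 . #o012))  ; Delete Page Return
  "Alist mapping native Lisp Machine byte values to Unix-stored values.")

(defun my-lispm-native-to-unix-string (bytes)
  "Return unibyte string BYTES with native codes rewritten to Unix ones.
Bytes without an entry in `my-lispm-native-to-unix-alist' are copied
unchanged."
  (apply #'unibyte-string
         (mapcar (lambda (byte)
                   (alist-get byte my-lispm-native-to-unix-alist byte))
                 (append bytes nil))))

;; Native Return (#o215) becomes LF (#o012); native lambda (#o010)
;; becomes #o210.
(my-lispm-native-to-unix-string "\215\010")
```

The mapping is a permutation but not its own inverse (for example
#o212 goes to #o015 while #o015 goes to #o215), so converting back
would need the pairs reversed rather than the same alist applied
twice.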




* Re: prettify symbols question
  2020-11-13 11:17             ` Alfred M. Szmidt
@ 2020-11-13 12:22               ` Eli Zaretskii
  2020-11-13 13:31                 ` Alfred M. Szmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-13 12:22 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 13 Nov 2020 06:17:36 -0500
> 
> I'm slightly confused as to how to add a new coding system, do I need to
> first add a charset (the converted one would be an :ascii-compatible-p
> t, and the native nil?)?

Yes.  You will also need to prepare a mapping file, see below.  See
the example of how we define, for example, coding-systems for
MS-Windows codepages:

  (define-charset 'windows-1250
    "WINDOWS-1250 (Central Europe)"
    :short-name "WINDOWS-1250"
    :ascii-compatible-p t
    :code-space [0 255]
    :map "CP1250")

  (define-coding-system 'windows-1250
    "windows-1250 (Central European) encoding (MIME: WINDOWS-1250)"
    :coding-type 'charset
    :mnemonic ?*
    :charset-list '(windows-1250)
    :mime-charset 'windows-1250)

(The mapping files are in etc/charsets; the :map attribute of the
charset names the mapping file to use.)

> I found the manual slightly sparse on this front.

That's on purpose.  The ELisp manual says:

     How to define a coding system is an arcane matter, and is not
  documented here.
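
A charset and coding system defined in this way can be probed from
Lisp.  The sketch below uses the built-in windows-1250 definitions
quoted above, so it should be evaluable in a stock Emacs; the same
calls would presumably apply to a newly defined charset once its
mapping file is in place.

```elisp
;; Charset level: translate between a code point and a character.
(decode-char 'windows-1250 #x8A)    ; code point -> character
(encode-char ?Š 'windows-1250)      ; character -> code point, nil if absent

;; Coding-system level: translate between bytes and text.
(decode-coding-string "\212" 'windows-1250)   ; "\212" is the byte #x8A
(encode-coding-string "Š" 'windows-1250)
```

If `encode-char' returns nil for a character that ought to be in the
charset, the mapping file is the first place to look.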




* Re: prettify symbols question
  2020-11-13 12:22               ` Eli Zaretskii
@ 2020-11-13 13:31                 ` Alfred M. Szmidt
  2020-11-13 13:47                   ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13 13:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

   > I'm slightly confused as to how to add a new coding system, do I need to
   > first add a charset (the converted one would be an :ascii-compatible-p
   > t, and the native nil?)?

   Yes.  You will also need to prepare a mapping file, see below.  See
   the example of how we define, for example, coding-systems for
   MS-Windows codepages:

   (The mapping files are in etc/charsets; the :map attribute of the
   charset names the mapping file to use.)

Those are generated from the admin/charsets files, which in turn are
sometimes pulled in from glibc.  So the easiest route is to add a
LISPM-like charset mapping following glibc (and then also see if it
can be included there).  And then do:

(define-charset 'lispm
  "LISPM"
  :short-name "LISPM"
  :ascii-compatible-p nil
  :code-space [0 255]
  :map "LISPM")

(define-coding-system 'lispm
  "Lisp Machine encoding"
  :coding-type 'charset
  :mnemonic ?L
  :charset-list '(lispm))

So that sorts it out for the native one, but what should be done for
the Unix friendly mapping?  LISPM-ASCII, and similar as above?




* Re: prettify symbols question
  2020-11-13 13:31                 ` Alfred M. Szmidt
@ 2020-11-13 13:47                   ` Eli Zaretskii
  2020-11-13 14:47                     ` new coding system (was: Re: prettify symbols question) Alfred M. Szmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-13 13:47 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 13 Nov 2020 08:31:42 -0500
> 
> (define-charset 'lispm
>   "LISPM"
>   :short-name "LISPM"
>   :ascii-compatible-p nil
>   :code-space [0 255]
>   :map "LISPM")
> 
> (define-coding-system 'lispm
>   "Lisp Machine encoding"
>   :coding-type 'charset
>   :mnemonic ?L
>   :charset-list '(lispm))
> 
> So that sorts it out for the native one, but what should be done for
> the Unix friendly mapping?  LISPM-ASCII, and similar as above?

Something like that.  Although I'm not sure about the name.  But why
do you need the native variant?  If we only need one charset, for how
it is seen on Unix, we could call that 'lispm'.




* new coding system (was: Re: prettify symbols question)
  2020-11-13 13:47                   ` Eli Zaretskii
@ 2020-11-13 14:47                     ` Alfred M. Szmidt
  2020-11-13 14:59                       ` Eli Zaretskii
  2020-11-13 17:32                       ` new coding system Andreas Schwab
  0 siblings, 2 replies; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13 14:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 8156 bytes --]

   > (define-charset 'lispm
   >   "LISPM"
   >   :short-name "LISPM"
   >   :ascii-compatible-p nil
   >   :code-space [0 255]
   >   :map "LISPM")
   > 
   > (define-coding-system 'lispm
   >   "Lisp Machine encoding"
   >   :coding-type 'charset
   >   :mnemonic ?L
   >   :charset-list '(lispm))
   > 
   > So that sorts it out for the native one, but what should be done for
   > the Unix friendly mapping?  LISPM-ASCII, and similar as above?

   Something like that.  Although I'm not sure about the name.  But why
   do you need the native variant?  If we only need one charset, for how
   it is seen on Unix, we could call that 'lispm'.

Right, it is easy enough to convert if one has native files.

So I've created a LISPM charmap, and a LISPM charset map based on
that.  After calling define-charset and define-coding-system, if I now
try to open a Lisp Machine file in the lispm coding system, it seems
unable to handle the various characters, e.g., #o210.

  These default coding systems were tried to encode text
  in the buffer ‘lispm-char-test.text’:
    (lispm-unix (1 . 0) (59 . 1) (117 . 2) (175 . 3) (233 . 4) (291 . 5)
    (349 . 6) (407 . 7) (465 . 4194184) (523 . 4194185) (581 . 4194186))
  However, each of them encountered characters it couldn’t encode:
    ....

Is there something that I forgot to do?

===File ~/emacs/admin/charsets/glibc/LISPM.gz===============
<code_set_name> LISPM
<comment_char> %
<escape_char> /
% version: 1.0
%  source: The Lisp Machine Manual, 6th ed.

CHARMAP
<U00B7>    /x00        MIDDLE DOT
<U2193>    /x01        DOWNWARDS ARROW
<U03B1>    /x02        GREEK SMALL LETTER ALPHA
<U03B2>    /x03        GREEK SMALL LETTER BETA
<U2227>    /x04        LOGICAL AND
<U00AC>    /x05        NOT SIGN
<U03B5>    /x06        GREEK SMALL LETTER EPSILON
<U03C0>    /x07        GREEK SMALL LETTER PI
<U03BB>    /x88        GREEK SMALL LETTER LAMDA
<U03B3>    /x89        GREEK SMALL LETTER GAMMA
<U03B4>    /x8a        GREEK SMALL LETTER DELTA
<U2191>    /x8b        UPWARDS ARROW
<U00B1>    /x8c        PLUS-MINUS SIGN
<U2295>    /x8d        CIRCLED PLUS
<U221E>    /x0e        INFINITY
<U2202>    /x0f        PARTIAL DIFFERENTIAL
<U2282>    /x10        SUBSET OF
<U2283>    /x11        SUPERSET OF
<U2229>    /x12        INTERSECTION
<U222A>    /x13        UNION
<U2200>    /x14        FOR ALL
<U2203>    /x15        THERE EXISTS
<U2297>    /x16        CIRCLED TIMES
<U2194>    /x17        LEFT RIGHT ARROW
<U2190>    /x18        LEFTWARDS ARROW
<U2192>    /x19        RIGHTWARDS ARROW
<U2260>    /x1a        NOT EQUAL TO
<U25CA>    /x1b        LOZENGE
<U2264>    /x1c        LESS-THAN OR EQUAL TO
<U2265>    /x1d        GREATER-THAN OR EQUAL TO
<U2261>    /x1e        IDENTICAL TO
<U2228>    /x1f        LOGICAL OR
<U0020>    /x20        SPACE
<U0021>    /x21        EXCLAMATION MARK
<U0022>    /x22        QUOTATION MARK
<U0023>    /x23        NUMBER SIGN
<U0024>    /x24        DOLLAR SIGN
<U0025>    /x25        PERCENT SIGN
<U0026>    /x26        AMPERSAND
<U0027>    /x27        APOSTROPHE
<U0028>    /x28        LEFT PARENTHESIS
<U0029>    /x29        RIGHT PARENTHESIS
<U002A>    /x2a        ASTERISK
<U002B>    /x2b        PLUS SIGN
<U002C>    /x2c        COMMA
<U002D>    /x2d        HYPHEN-MINUS
<U002E>    /x2e        FULL STOP
<U002F>    /x2f        SOLIDUS
<U0030>    /x30        DIGIT ZERO
<U0031>    /x31        DIGIT ONE
<U0032>    /x32        DIGIT TWO
<U0033>    /x33        DIGIT THREE
<U0034>    /x34        DIGIT FOUR
<U0035>    /x35        DIGIT FIVE
<U0036>    /x36        DIGIT SIX
<U0037>    /x37        DIGIT SEVEN
<U0038>    /x38        DIGIT EIGHT
<U0039>    /x39        DIGIT NINE
<U003A>    /x3a        COLON
<U003B>    /x3b        SEMICOLON
<U003C>    /x3c        LESS-THAN SIGN
<U003D>    /x3d        EQUALS SIGN
<U003E>    /x3e        GREATER-THAN SIGN
<U003F>    /x3f        QUESTION MARK
<U0040>    /x40        COMMERCIAL AT
<U0041>    /x41        LATIN CAPITAL LETTER A
<U0042>    /x42        LATIN CAPITAL LETTER B
<U0043>    /x43        LATIN CAPITAL LETTER C
<U0044>    /x44        LATIN CAPITAL LETTER D
<U0045>    /x45        LATIN CAPITAL LETTER E
<U0046>    /x46        LATIN CAPITAL LETTER F
<U0047>    /x47        LATIN CAPITAL LETTER G
<U0048>    /x48        LATIN CAPITAL LETTER H
<U0049>    /x49        LATIN CAPITAL LETTER I
<U004A>    /x4a        LATIN CAPITAL LETTER J
<U004B>    /x4b        LATIN CAPITAL LETTER K
<U004C>    /x4c        LATIN CAPITAL LETTER L
<U004D>    /x4d        LATIN CAPITAL LETTER M
<U004E>    /x4e        LATIN CAPITAL LETTER N
<U004F>    /x4f        LATIN CAPITAL LETTER O
<U0050>    /x50        LATIN CAPITAL LETTER P
<U0051>    /x51        LATIN CAPITAL LETTER Q
<U0052>    /x52        LATIN CAPITAL LETTER R
<U0053>    /x53        LATIN CAPITAL LETTER S
<U0054>    /x54        LATIN CAPITAL LETTER T
<U0055>    /x55        LATIN CAPITAL LETTER U
<U0056>    /x56        LATIN CAPITAL LETTER V
<U0057>    /x57        LATIN CAPITAL LETTER W
<U0058>    /x58        LATIN CAPITAL LETTER X
<U0059>    /x59        LATIN CAPITAL LETTER Y
<U005A>    /x5a        LATIN CAPITAL LETTER Z
<U005B>    /x5b        LEFT SQUARE BRACKET
<U005C>    /x5c        REVERSE SOLIDUS
<U005D>    /x5d        RIGHT SQUARE BRACKET
<U005E>    /x5e        CIRCUMFLEX ACCENT
<U005F>    /x5f        LOW LINE
<U0060>    /x60        GRAVE ACCENT
<U0061>    /x61        LATIN SMALL LETTER A
<U0062>    /x62        LATIN SMALL LETTER B
<U0063>    /x63        LATIN SMALL LETTER C
<U0064>    /x64        LATIN SMALL LETTER D
<U0065>    /x65        LATIN SMALL LETTER E
<U0066>    /x66        LATIN SMALL LETTER F
<U0067>    /x67        LATIN SMALL LETTER G
<U0068>    /x68        LATIN SMALL LETTER H
<U0069>    /x69        LATIN SMALL LETTER I
<U006A>    /x6a        LATIN SMALL LETTER J
<U006B>    /x6b        LATIN SMALL LETTER K
<U006C>    /x6c        LATIN SMALL LETTER L
<U006D>    /x6d        LATIN SMALL LETTER M
<U006E>    /x6e        LATIN SMALL LETTER N
<U006F>    /x6f        LATIN SMALL LETTER O
<U0070>    /x70        LATIN SMALL LETTER P
<U0071>    /x71        LATIN SMALL LETTER Q
<U0072>    /x72        LATIN SMALL LETTER R
<U0073>    /x73        LATIN SMALL LETTER S
<U0074>    /x74        LATIN SMALL LETTER T
<U0075>    /x75        LATIN SMALL LETTER U
<U0076>    /x76        LATIN SMALL LETTER V
<U0077>    /x77        LATIN SMALL LETTER W
<U0078>    /x78        LATIN SMALL LETTER X
<U0079>    /x79        LATIN SMALL LETTER Y
<U007A>    /x7a        LATIN SMALL LETTER Z
<U007B>    /x7b        LEFT CURLY BRACKET
<U007C>    /x7c        VERTICAL LINE
<U007D>    /x7d        RIGHT CURLY BRACKET
<U007E>    /x7e        TILDE
% 177 ctl-qm
% 200 Null character     
% 201 Break              
% 202 Clear              
% 203 Call               
% 204 Terminal escape    
% 205 Macro/backnext     
% 206 Help               
% 207 Rubout             
<U0008>    /x08         BACKSPACE (BS) / Overstrike    
<U0009>    /x09         CHARACTER TABULATION (HT) / Tab
<U000D>    /x0d         CARRIAGE RETURN (CR) / Line
<U000B>    /x0b         LINE TABULATION (VT) / Delete
<U000C>    /x0c         FORM FEED (FF) / Page
<U000A>    /x0a         LINE FEED (LF) / Return
% 216 Quote         
% 217 Hold-output   
% 220 Stop-output   
% 221 Abort         
% 222 Resume        
% 223 Status        
% 224 End           
% 225 Roman-i       
% 226 Roman-ii      
% 227 Roman-iii     
% 230 Roman-iv
% 231 Hand-up
% 232 Hand-down
% 233 Hand-left
% 234 Hand-right
% 235 System
% 236 Network
% 237-377 reserved for the future
END CHARMAP
============================================================

===File ~/emacs/etc/charsets/LISPM.map======================
# Generated from LISPM in localedata/charmaps of glibc
0x00 0x00B7
0x01 0x2193
0x02-0x03 0x03B1
0x04 0x2227
0x05 0x00AC
0x06 0x03B5
0x07 0x03C0
0x08-0x0D 0x0008
0x0E 0x221E
0x0F 0x2202
0x10-0x11 0x2282
0x12-0x13 0x2229
0x14 0x2200
0x15 0x2203
0x16 0x2297
0x17 0x2194
0x18 0x2190
0x19 0x2192
0x1A 0x2260
0x1B 0x25CA
0x1C-0x1D 0x2264
0x1E 0x2261
0x1F 0x2228
0x20-0x7E 0x0020
0x88 0x03BB
0x89-0x8A 0x03B3
0x8B 0x2191
0x8C 0x00B1
0x8D 0x2295
============================================================
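
For readers unfamiliar with the map file layout above: as far as I understand the format, each line is either "CODE UNICODE" or "CODE1-CODE2 UNICODE1", where a range maps consecutive byte codes to consecutive code points starting at UNICODE1. The Python sketch below expands a few of the lines shown above under that assumption; `expand_map` and `sample` are illustrative names, not part of Emacs.

```python
def expand_map(lines):
    """Expand map lines into a {byte code: Unicode code point} dict."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        codes, target = line.split()
        first, _, last = codes.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo     # a single code is a range of one
        base = int(target, 16)
        for offset in range(hi - lo + 1):
            table[lo + offset] = base + offset
    return table

# A subset of the LISPM.map lines quoted above.
sample = [
    "# Generated from LISPM in localedata/charmaps of glibc",
    "0x00 0x00B7",
    "0x02-0x03 0x03B1",
    "0x08-0x0D 0x0008",
    "0x88 0x03BB",
    "0x89-0x8A 0x03B3",
]
table = expand_map(sample)
for byte in sorted(table):
    print(f"{byte:#04x} -> U+{table[byte]:04X}")
```

Read this way, byte 0x88 (#o210) decodes to U+03BB (lambda), and no byte in the map decodes to U+0000.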




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-13 14:47                     ` new coding system (was: Re: prettify symbols question) Alfred M. Szmidt
@ 2020-11-13 14:59                       ` Eli Zaretskii
  2020-11-13 17:11                         ` Alfred M. Szmidt
  2020-11-13 17:11                         ` Alfred M. Szmidt
  2020-11-13 17:32                       ` new coding system Andreas Schwab
  1 sibling, 2 replies; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-13 14:59 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 13 Nov 2020 09:47:16 -0500
> 
> So I've created a LISPM charmap, and a LISPM charset map based on
> that.  Then calling define-charset and define-coding-system, if I now
> try to open a Lisp machine file in the lispm coding it seems to be
> unable to handle the various characters; e.g., #o210.
> 
>   These default coding systems were tried to encode text
>   in the buffer ‘lispm-char-test.text’:
>     (lispm-unix (1 . 0) (59 . 1) (117 . 2) (175 . 3) (233 . 4) (291 . 5)
>     (349 . 6) (407 . 7) (465 . 4194184) (523 . 4194185) (581 . 4194186))
>   However, each of them encountered characters it couldn’t encode:
>     ....

What does "M-x describe-character-set RET lispm RET" show?

And what was shown where you show the ellipsis?




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-13 14:59                       ` Eli Zaretskii
@ 2020-11-13 17:11                         ` Alfred M. Szmidt
  2020-11-14 14:24                           ` Eli Zaretskii
  2020-11-13 17:11                         ` Alfred M. Szmidt
  1 sibling, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13 17:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


   > So I've created a LISPM charmap, and a LISPM charset map based on
   > that.  Then calling define-charset and define-coding-system, if I now
   > try to open a Lisp machine file in the lispm coding it seems to be
   > unable to handle the various characters; e.g., #o210.
   > 
   >   These default coding systems were tried to encode text
   >   in the buffer ‘lispm-char-test.text’:
   >     (lispm-unix (1 . 0) (59 . 1) (117 . 2) (175 . 3) (233 . 4) (291 . 5)
   >     (349 . 6) (407 . 7) (465 . 4194184) (523 . 4194185) (581 . 4194186))
   >   However, each of them encountered characters it couldn’t encode:
   >     ....

   What does "M-x describe-character-set RET lispm RET" show?

It says:

  Character set: lispm
  
  LISPM
  
  Number of contained characters: 256
  Map file: LISPM
  Code space: [0 255]

   And what was shown where you show the ellipsis?

These default coding systems were tried to encode text
in the buffer `lispm-char-test.text':
  (lispm-unix (1 . 0) (59 . 1) (117 . 2) (175 . 3) (233 . 4) (291 . 5)
  (349 . 6) (407 . 7) (465 . 4194184) (523 . 4194185) (581 . 4194186))
However, each of them encountered characters it couldn't encode:
  lispm-unix cannot encode these: ^@ ^A ^B ^C ^D ^E ^F ^G \210 \211 ...

(where ^@ etc are #o0, #o1, etc and #o210 ...)
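
A note on reading the (POSITION . CHAR) pairs: Emacs represents an undecoded raw byte B in the range 0x80..0xFF as the character B + 0x3FFF00, so 4194184 is raw byte #o210, while the entries 0 to 7 are the characters U+0000 to U+0007. The Python sketch below just does that arithmetic; `describe` and `pairs` are illustrative names. One possible reading, offered as a guess only, is that the buffer text was never decoded through the lispm charset.

```python
RAW_BYTE_OFFSET = 0x3FFF00  # Emacs stores raw byte B (0x80..0xFF) as char B + 0x3FFF00

def describe(char):
    """Classify a character code taken from a (POSITION . CHAR) pair."""
    if 0x3FFF80 <= char <= 0x3FFFFF:
        byte = char - RAW_BYTE_OFFSET
        return f"raw byte #o{byte:o}"
    return f"U+{char:04X}"

# Some of the pairs from the message above.
pairs = [(1, 0), (59, 1), (465, 4194184), (523, 4194185), (581, 4194186)]
for pos, char in pairs:
    print(pos, describe(char))
```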




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-13 14:59                       ` Eli Zaretskii
  2020-11-13 17:11                         ` Alfred M. Szmidt
@ 2020-11-13 17:11                         ` Alfred M. Szmidt
  1 sibling, 0 replies; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13 17:11 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Seems that it doesn't like it when you have:

  <U00B7>     /x00         MIDDLE DOT

but

  <U0000>     /x00         MIDDLE DOT

works.  According to the glibc manual, this should accept a UCS-4
value, which U00B7 is.  Not sure what gives...
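
To make the difference between the two entries concrete, here is a small Python sketch that splits a charmap line into byte, code point and name. It assumes the layout used in the charmap above ("/" as escape character, "%" as comment character); `parse_charmap_line` is a hypothetical helper, not something glibc or Emacs provides.

```python
import re

# <Uxxxx> /xNN NAME, as in the charmap entries quoted in this thread.
CHARMAP_LINE = re.compile(r"<U([0-9A-Fa-f]{4,8})>\s+/x([0-9A-Fa-f]{2})\s+(.*)")

def parse_charmap_line(line):
    """Return (byte, code point, name), or None for comments and other lines."""
    match = CHARMAP_LINE.match(line.strip())
    if match is None:
        return None
    code_point, byte, name = match.groups()
    return int(byte, 16), int(code_point, 16), name.strip()

for line in ["<U00B7>     /x00         MIDDLE DOT",
             "<U0000>     /x00         MIDDLE DOT",
             "% 200 Null character"]:
    print(parse_charmap_line(line))
```

Both entries assign byte 0x00; they differ only in the code point it stands for (U+00B7 in the first, U+0000 in the second, whatever the name column says).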




* Re: new coding system
  2020-11-13 14:47                     ` new coding system (was: Re: prettify symbols question) Alfred M. Szmidt
  2020-11-13 14:59                       ` Eli Zaretskii
@ 2020-11-13 17:32                       ` Andreas Schwab
  2020-11-13 17:36                         ` Alfred M. Szmidt
  1 sibling, 1 reply; 29+ messages in thread
From: Andreas Schwab @ 2020-11-13 17:32 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: Eli Zaretskii, emacs-devel

On Nov 13 2020, Alfred M. Szmidt wrote:

> % 200 Null character     

Why isn't that mapped to <U0000>?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: new coding system
  2020-11-13 17:32                       ` new coding system Andreas Schwab
@ 2020-11-13 17:36                         ` Alfred M. Szmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-13 17:36 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: eliz, emacs-devel

   > % 200 Null character     

   Why isn't that mapped to <U0000>?

I haven't gotten around to it (ditto the remaining high-bit
characters).




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-13 17:11                         ` Alfred M. Szmidt
@ 2020-11-14 14:24                           ` Eli Zaretskii
  2020-11-14 15:29                             ` Alfred M. Szmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-14 14:24 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Fri, 13 Nov 2020 12:11:18 -0500
> 
> These default coding systems were tried to encode text
> in the buffer `lispm-char-test.text':
>   (lispm-unix (1 . 0) (59 . 1) (117 . 2) (175 . 3) (233 . 4) (291 . 5)
>   (349 . 6) (407 . 7) (465 . 4194184) (523 . 4194185) (581 . 4194186))
> However, each of them encountered characters it couldn't encode:
>   lispm-unix cannot encode these: ^@ ^A ^B ^C ^D ^E ^F ^G \210 \211 ...
> 
> (where ^@ etc are #o0, #o1, etc and #o210 ...)

Is this the encoding on Unix systems?  If so, maybe try without
mapping characters below ASCII 128, I'm not sure this is supported in
an ASCII-compatible encoding.




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-14 14:24                           ` Eli Zaretskii
@ 2020-11-14 15:29                             ` Alfred M. Szmidt
  2020-11-14 16:19                               ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-14 15:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


   > From: "Alfred M. Szmidt" <ams@gnu.org>
   > Cc: emacs-devel@gnu.org
   > Date: Fri, 13 Nov 2020 12:11:18 -0500
   > 
   > These default coding systems were tried to encode text
   > in the buffer `lispm-char-test.text':
   >   (lispm-unix (1 . 0) (59 . 1) (117 . 2) (175 . 3) (233 . 4) (291 . 5)
   >   (349 . 6) (407 . 7) (465 . 4194184) (523 . 4194185) (581 . 4194186))
   > However, each of them encountered characters it couldn't encode:
   >   lispm-unix cannot encode these: ^@ ^A ^B ^C ^D ^E ^F ^G \210 \211 ...
   > 
   > (where ^@ etc are #o0, #o1, etc and #o210 ...)

   Is this the encoding on Unix systems?  If so, maybe try without
   mapping characters below ASCII 128, I'm not sure this is supported in
   an ASCII-compatible encoding.

I am not sure I understand.  On unix #o0 maps to the MIDDLE DOT, #o1
to DOWNWARDS ARROW, etc.  The Lisp Machine character set isn't
compatible with ASCII -- the control characters have an entirely
different function.  As I understood it, the charmap/charset is a
mapping from UCS-4 Unicode to whatever is on the target?




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-14 15:29                             ` Alfred M. Szmidt
@ 2020-11-14 16:19                               ` Eli Zaretskii
  2020-11-23 20:40                                 ` Alfred M. Szmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-14 16:19 UTC (permalink / raw)
  To: Alfred M. Szmidt; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 14 Nov 2020 10:29:15 -0500
> 
>    Is this the encoding on Unix systems?  If so, maybe try without
>    mapping characters below ASCII 128, I'm not sure this is supported in
>    an ASCII-compatible encoding.
> 
> I am not sure I understand.  On unix #o0 maps to the MIDDLE DOT, #o1
> to DOWNWARDS ARROW, etc.

If the low codes aren't identical to ASCII, then I think
ascii-compatible should be nil, and I think the relevant example to
follow is that of EBCDIC.  I'd suggest to construct a map file by
hand, using EBCDIC maps as example, and see if that works.

If it doesn't work, we might need to bring Kenichi Handa on board of
the discussion.

> As I understood it, the charmap/charset is a mapping from UCS-4
> Unicode to whatever is on the target?

Not UCS-4, but Unicode codepoints (which is the same thing in
practice, but just so we get our terminology right.)




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-14 16:19                               ` Eli Zaretskii
@ 2020-11-23 20:40                                 ` Alfred M. Szmidt
  2020-11-23 20:49                                   ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-23 20:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

   >    Is this the encoding on Unix systems?  If so, maybe try without
   >    mapping characters below ASCII 128, I'm not sure this is supported in
   >    an ASCII-compatible encoding.
   > 
   > I am not sure I understand.  On unix #o0 maps to the MIDDLE DOT, #o1
   > to DOWNWARDS ARROW, etc.

   If the low codes aren't identical to ASCII, then I think
   ascii-compatible should be nil, and I think the relevant example to
   follow is that of EBCDIC.  I'd suggest to construct a map file by
   hand, using EBCDIC maps as example, and see if that works.

It didn't.  I took the EBCDIC-US map and replaced the first entry,

<U0000>     /x00         NULL (NUL)

with

<U00B7>     /x00        MIDDLE DOT

   If it doesn't work, we might need to bring Kenichi Handa on board of
   the discussion.

If Kenichi Handa can help, that would be very nice -- it isn't a very
important one but it would be useful for me to get this working.

   > As I understood it, the charmap/charset is a mapping from UCS-4
   > Unicode to whatever is on the target?

   Not UCS-4, but Unicode codepoints (which is the same thing in
   practice, but just so we get our terminology right.)

Are you sure? According to the glibc manual (and a quick glance at the
source, glibc/locale/program/charmap.c), the Unicode entry is supposed
to be a UCS-4 name.




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-23 20:40                                 ` Alfred M. Szmidt
@ 2020-11-23 20:49                                   ` Eli Zaretskii
  2020-11-28 17:27                                     ` Alfred M. Szmidt
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2020-11-23 20:49 UTC (permalink / raw)
  To: Alfred M. Szmidt, Kenichi Handa; +Cc: emacs-devel

> From: "Alfred M. Szmidt" <ams@gnu.org>
> Cc: emacs-devel@gnu.org
> Date: Mon, 23 Nov 2020 15:40:24 -0500
> 
>    If it doesn't work, we might need to bring Kenichi Handa on board of
>    the discussion.
> 
> If Kenichi Handa can help, that would be very nice -- it isn't a very
> important one but it would be useful for me to get this working.

I've CC'ed him, let's hope he responds soon.

>    > As I understood it, the charmap/charset is a mapping from UCS-4
>    > Unicode to whatever is on the target?
> 
>    Not UCS-4, but Unicode codepoints (which is the same thing in
>    practice, but just so we get our terminology right.)
> 
> Are you sure? According to the glibc manual (and a quick glance at the
> source, glibc/locale/program/charmap.c), the Unicode entry is supposed
> to be a UCS-4 name.

There's no difference between them.  UCS-4 comes from ISO, the Unicode
codepoints from the Unicode Consortium, but the values are identical.
I prefer not to use UCS-4, because it's confusing nowadays.  The Emacs
manuals use the "Unicode codepoint" terminology.
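
A quick way to see that the values coincide, sketched in Python (the variable names are only for illustration): the 32-bit code unit that UTF-32, the encoding form corresponding to UCS-4, uses for a character is numerically the same as its Unicode codepoint.

```python
text = "\u00b7"                       # MIDDLE DOT
code_point = ord(text)                # Unicode codepoint as an integer
# Big-endian UTF-32 without a byte order mark: one 4-byte unit per character.
ucs4 = int.from_bytes(text.encode("utf-32-be"), "big")
print(hex(code_point), hex(ucs4))
```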




* Re: new coding system (was: Re: prettify symbols question)
  2020-11-23 20:49                                   ` Eli Zaretskii
@ 2020-11-28 17:27                                     ` Alfred M. Szmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Alfred M. Szmidt @ 2020-11-28 17:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: handa, emacs-devel

   > If Kenichi Handa can help, that would be very nice -- it isn't a very
   > important one but it would be useful for me to get this working.

   I've CC'ed him, let's hope he responds soon.

Thank you.

   > Are you sure? According to the glibc manual (and a quick glance at the
   > source, glibc/locale/program/charmap.c), the Unicode entry is supposed
   > to be a UCS-4 name.

   There's no difference between them.  UCS-4 comes from ISO, the
   Unicode codepoints from the Unicode Consortium, but the values are
   identical.  I prefer not to use UCS-4, because it's confusing
   nowadays.  The Emacs manuals use the "Unicode codepoint"
   terminology.

That makes sense; double thanks!




Thread overview: 29+ messages
2020-11-11 17:01 prettify symbols question Alfred M. Szmidt
2020-11-12 14:59 ` Eli Zaretskii
2020-11-12 15:17   ` Alfred M. Szmidt
2020-11-12 15:38     ` Eli Zaretskii
2020-11-12 16:14       ` Eli Zaretskii
2020-11-12 20:53         ` Alfred M. Szmidt
2020-11-12 21:12           ` Basil L. Contovounesios
2020-11-12 21:25             ` Drew Adams
2020-11-13  7:44             ` Eli Zaretskii
2020-11-13  7:24           ` Eli Zaretskii
2020-11-13 10:15             ` Alfred M. Szmidt
2020-11-13 11:17             ` Alfred M. Szmidt
2020-11-13 12:22               ` Eli Zaretskii
2020-11-13 13:31                 ` Alfred M. Szmidt
2020-11-13 13:47                   ` Eli Zaretskii
2020-11-13 14:47                     ` new coding system (was: Re: prettify symbols question) Alfred M. Szmidt
2020-11-13 14:59                       ` Eli Zaretskii
2020-11-13 17:11                         ` Alfred M. Szmidt
2020-11-14 14:24                           ` Eli Zaretskii
2020-11-14 15:29                             ` Alfred M. Szmidt
2020-11-14 16:19                               ` Eli Zaretskii
2020-11-23 20:40                                 ` Alfred M. Szmidt
2020-11-23 20:49                                   ` Eli Zaretskii
2020-11-28 17:27                                     ` Alfred M. Szmidt
2020-11-13 17:11                         ` Alfred M. Szmidt
2020-11-13 17:32                       ` new coding system Andreas Schwab
2020-11-13 17:36                         ` Alfred M. Szmidt
2020-11-13  8:27       ` prettify symbols question Alfred M. Szmidt
2020-11-13  8:40         ` Eli Zaretskii
