* bug#7786: 23.2; Encoding of PostScript files
@ 2011-01-05 0:18 Peter Dyballa
2021-01-20 18:02 ` Lars Ingebrigtsen
` (2 more replies)
0 siblings, 3 replies; 37+ messages in thread
From: Peter Dyballa @ 2011-01-05 0:18 UTC (permalink / raw)
To: 7786
Hello!
When I open a PostScript file it's opened "(encoded by coding system
undecided-unix)", as the *Help* buffer explains after invocation of
C-u C-x =.
This is incorrect, because, as the PLRM (the PostScript® Language
Reference Manual) explains in a footnote near the end, on encodings:
3. The ISOLatin1Encoding encoding vector deviates from the ISO 8859-1
standard in one respect: the character at position 140 is quoteleft,
whereas the ISO standard specifies grave. A PostScript program needing
to conform exactly to the ISO standard should create a modified
encoding vector with this entry changed.
So what is displayed in the buffer as
character: ` (96, #o140, #x60)
is in reality, when printed on some medium or shown on screen,
character: ‘ (8216, #o20030, #x2018)
In other words, /quoteleft rather than /grave is encoded at this position.
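To make the difference concrete, here is a minimal Python sketch (not Emacs code; the name decode_ps_byte is made up for illustration). It assumes that only the single position named in the footnote, octal 140, differs from ISO 8859-1, and decodes every other byte as Latin-1:

```python
QUOTELEFT_POSITION = 0o140  # octal 140 = decimal 96 = hex 60


def decode_ps_byte(byte: int) -> str:
    """Decode one byte as the PLRM footnote describes it.

    Assumption: only position 140 (octal) differs from ISO 8859-1.
    """
    if byte == QUOTELEFT_POSITION:
        return "\u2018"  # LEFT SINGLE QUOTATION MARK (/quoteleft)
    return bytes([byte]).decode("latin-1")


# What a Latin-1 reading shows versus what the PostScript reading means.
as_shown_in_buffer = bytes([0o140]).decode("latin-1")
as_printed = decode_ps_byte(0o140)
print(ord(as_shown_in_buffer), ord(as_printed))  # prints: 96 8216
```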
IMO GNU Emacs should open a PostScript file in adobe-standard-encoding,
unless it sees in the file that the font(s) used is (are) re-encoded in
ISOLatin1Encoding (which is *not* the same as ISO 8859-1), CE Encoding,
or whatever.
In GNU Emacs 23.2.1 (powerpc-apple-darwin9.8.0, X toolkit, Xaw3d scroll bars)
of 2010-08-01 on Latsche.local
Windowing system distributor `The X.Org Foundation', version 11.0.10903000
configured using `configure '--without-sound' '--without-dbus'
'--without-pop' '--without-gconf' '--with-x-toolkit=athena'
'--x-libraries=/usr/X11/lib' '--x-includes=/usr/X11/include'
'--enable-locallisppath=/Library/Application Support/Emacs/calendar23:/Library/Application Support/Emacs'
'CFLAGS=-H -Wno-pointer-sign -pipe -fPIC -fno-common -mcpu=7450 -mtune=7450 -faltivec -fast'
'CPPFLAGS=' 'LDFLAGS=' 'CC=gcc-4.2' 'CPP=cpp-4.2''
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: de_DE.UTF-8
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: de_DE.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Major mode: PostScript
Minor modes in effect:
doc-view-minor-mode: t
tooltip-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
view-mode: t
Recent input:
<down-mouse-1> <mouse-1> C-x d <return> <escape> <
s <down> <down> <down> <down> <down> <down> <down>
<down> v <end> <escape> > <prior> <prior> M-x d e s
c r i b <tab> e n c o <tab> <backspace> <backspace>
c <tab> <backspace> <backspace> <backspace> <tab> c
h a r <tab> a <tab> <return> C-g M-x d e s c r i b
e - <tab> c o d <tab> <return> <return> <help-echo>
<prior> <prior> <prior> <prior> <prior> <prior> <next>
<next> <down> <down> <right> <right> <right> <right>
<right> <right> <right> <right> <right> <right> <right>
C-u C-x = <help-echo> <help-echo> <menu-bar> <PostScript>
<Cookbook> <ISOLatin1Extended> <help-echo> <help-echo>
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo>
<help-echo> <menu-bar> <help-menu> <send-emacs-bug-report>
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Mark set
Type C-c C-c to toggle between editing or viewing the document.
View mode: type C-h for help, h for commands, q to quit.
Mark set
Making completion list...
Quit
Making completion list...
Char: ` (96, #o140, #x60) point=36185 of 39534 (92%) column=11
call-interactively: Buffer is read-only: #<buffer man_ascii.ps>
--
With peaceful greetings
Pete
When designing completely foolproof things, people commonly make the
mistake of underestimating the genius of the complete fool.
^ permalink raw reply [flat|nested] 37+ messages in thread
* bug#7786: 23.2; Encoding of PostScript files
2011-01-05 0:18 bug#7786: 23.2; Encoding of PostScript files Peter Dyballa
@ 2021-01-20 18:02 ` Lars Ingebrigtsen
2021-06-02 8:39 ` Lars Ingebrigtsen
2021-10-13 13:51 ` Lars Ingebrigtsen
2 siblings, 0 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-01-20 18:02 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
> When I open a PostScript file it's opened "(encoded by coding system
> undecided-unix)", as the *Help* buffer explains after invocation of
> C-u C-x =.
>
> This is incorrect, because, as the PLRM (the PostScript® Language
> Reference Manual) explains in a footnote near the end, on encodings:
>
> 3. The ISOLatin1Encoding encoding vector deviates from the ISO
> 8859-1 standard in one respect: the character at position 140 is
> quoteleft, whereas the ISO standard specifies grave. A PostScript
> program needing to conform exactly to the ISO standard should
> create a modified encoding vector with this entry changed.
[...]
> IMO GNU Emacs should open a PostScript file in
> adobe-standard-encoding, unless it sees in the file that the font(s)
> used is (are) re-encoded in ISOLatin1Encoding (which is *not* the same
> as ISO 8859-1), CE Encoding, or whatever.
(I'm going through old bug reports that unfortunately got no response at
the time.)
I'm not quite sure I understand the final paragraph there, but the
suggestion is that .ps files should be opened with
`adobe-standard-encoding' and not `iso-latin-1' if there are non-ASCII
characters in the file? Anybody got any comments on that?
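One conceivable heuristic for the "sees in the file that the font is re-encoded" part, sketched in Python rather than Emacs Lisp: scan the file for the name ISOLatin1Encoding, which a PostScript program has to reference when it re-encodes a font with that vector. The function name and the returned labels are hypothetical, and the scan is deliberately naive (it would also match the name inside a comment or a string).

```python
import re

# Matches the encoding-vector name as a whole word anywhere in the file.
_ISOLATIN1 = re.compile(rb"\bISOLatin1Encoding\b")


def guess_ps_encoding(data: bytes) -> str:
    """Return a hypothetical label for the text encoding of a .ps file."""
    if _ISOLATIN1.search(data):
        return "ISOLatin1Encoding"
    # Nothing found: fall back to the default suggested in the report.
    return "StandardEncoding"


print(guess_ps_encoding(b"/Encoding ISOLatin1Encoding def\n"))
print(guess_ps_encoding(b"%!PS-Adobe-3.0\n(Hello) show\n"))
```

A real implementation would also have to decide what to do when several fonts in one file use different encodings, which this sketch ignores.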
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2011-01-05 0:18 bug#7786: 23.2; Encoding of PostScript files Peter Dyballa
2021-01-20 18:02 ` Lars Ingebrigtsen
@ 2021-06-02 8:39 ` Lars Ingebrigtsen
2021-06-02 16:37 ` Peter Dyballa
2021-10-13 12:49 ` Lars Ingebrigtsen
2021-10-13 13:51 ` Lars Ingebrigtsen
2 siblings, 2 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-06-02 8:39 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
> IMO GNU Emacs should open a PostScript file in
> adobe-standard-encoding, unless it sees in the file that the font(s)
> used is (are) re-encoded in ISOLatin1Encoding (which is *not* the same
> as ISO 8859-1), CE Encoding, or whatever.
I took a first stab at this, but this is obviously not correct. I'm not
sure how to detect whether it's an ISOLatin1Encoding file. And... I'm
guessing this will probably make a file opened like this be saved in
utf-8, which isn't what we want...
diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index 2d36dab632..dc936ba2c2 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1637,6 +1637,7 @@ 'utf-7-imap
("\\.el\\'" . prefer-utf-8)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
+ ("\\.ps\\'" . ps-find-file-coding-system)
;; We use raw-text for reading loaddefs.el so that if it
;; happens to have DOS or Mac EOLs, they are converted to
;; newlines. This is required to make the special treatment
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 9cd38afd8b..6efdaba6e8 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2511,6 +2511,17 @@ sgml-html-meta-auto-coding-function
(message "Warning: unknown coding system \"%s\"" match)
nil)))))
+(defun ps-find-file-coding-system (args)
+ (if (not (eq (car args) 'insert-file-contents))
+ 'undecided
+ (let ((coding-system
+ (coding-system-base
+ (detect-coding-region (point-min) (point-max) t))))
+ ;; If it's an ASCII file, then interpret ` specially.
+ (if (eq coding-system 'undecided)
+ 'adobe-standard-encoding
+ coding-system))))
+
(defun xml-find-file-coding-system (args)
"Determine the coding system of an XML file without a declaration.
Strictly speaking, the file should be utf-8, but mistakes are
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-06-02 8:39 ` Lars Ingebrigtsen
@ 2021-06-02 16:37 ` Peter Dyballa
2021-10-13 12:49 ` Lars Ingebrigtsen
1 sibling, 0 replies; 37+ messages in thread
From: Peter Dyballa @ 2021-06-02 16:37 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 7786
> On 2 June 2021, at 10:39, Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
> I took a first stab at this, but this is obviously not correct. I'm not
> sure how to detect whether it's an ISOLatin1Encoding file. And... I'm
> guessing this will probably make a file opened like this be saved in
> utf-8, which isn't what we want...
This looks to me like the proper solution. (Although I cannot remember how I would have tested this…)
--
Greetings
Pete
Almost anything is easier to get into than out of.
– Allen's Law
* bug#7786: 23.2; Encoding of PostScript files
2021-06-02 8:39 ` Lars Ingebrigtsen
2021-06-02 16:37 ` Peter Dyballa
@ 2021-10-13 12:49 ` Lars Ingebrigtsen
2021-10-13 13:12 ` Lars Ingebrigtsen
1 sibling, 1 reply; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-13 12:49 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Lars Ingebrigtsen <larsi@gnus.org> writes:
>> IMO GNU Emacs should open a PostScript file in
>> adobe-standard-encoding, unless it sees in the file that the font(s)
>> used is (are) re-encoded in ISOLatin1Encoding (which is *not* the same
>> as ISO 8859-1), CE Encoding, or whatever.
>
> I took a first stab at this, but this is obviously not correct. I'm not
> sure how to detect whether it's an ISOLatin1Encoding file. And... I'm
> guessing this will probably make a file opened like this be saved in
> utf-8, which isn't what we want...
I tested this a bit more now, and it doesn't work. First of all, saving
the file with adobe-standard-encoding means that all the newlines are
stripped from the file.
So I tried the patch below, but 1) it didn't display non-ascii chars
correctly, and 2) when saving, I got:
These default coding systems were tried to encode the following
problematic characters in the buffer ‘a.ps’:
  Coding System                  Pos  Codepoint  Char
  adobe-standard-encoding-unix     1  #xA
                                   4  #xA
                                 ...
  utf-8-unix                     328  #x3FFFF3
I.e., it's complaining about the newlines, as well as the non-ASCII
char.
So it seems like the adobe coding system doesn't actually work, and I
wonder whether anybody's ever tried using it before? Possibly not?
I've never actually tried working with the coding system stuff before on
this level, so I'm probably missing something really simple.
The work-in-progress patch is below, as well as a .ps test file.
Anybody see immediately what's wrong here?
diff --git a/lisp/international/mule-conf.el b/lisp/international/mule-conf.el
index 9a68fce2e8..1fe4b5c55a 100644
--- a/lisp/international/mule-conf.el
+++ b/lisp/international/mule-conf.el
@@ -1637,6 +1637,7 @@ 'utf-7-imap
("\\.el\\'" . prefer-utf-8)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
+ ("\\.ps\\'" . ps-find-file-coding-system)
;; We use raw-text for reading loaddefs.el so that if it
;; happens to have DOS or Mac EOLs, they are converted to
;; newlines. This is required to make the special treatment
diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 5022a17db5..b2945bbbf3 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2526,6 +2526,17 @@ sgml-html-meta-auto-coding-function
(message "Warning: unknown coding system \"%s\"" match)
nil)))))
+(defun ps-find-file-coding-system (args)
+ (if (not (eq (car args) 'insert-file-contents))
+ 'undecided
+ (let ((coding-system
+ (coding-system-base
+ (detect-coding-region (point-min) (point-max) t))))
+ ;; If it's an ASCII file, then interpret ` specially.
+ (if (memq coding-system '(undecided iso-latin-1))
+ 'adobe-standard-encoding-unix
+ coding-system))))
+
(defun xml-find-file-coding-system (args)
"Determine the coding system of an XML file without a declaration.
Strictly speaking, the file should be utf-8, but mistakes are
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
[-- Attachment #2: a.ps --]
[-- Type: application/postscript, Size: 331 bytes --]
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 12:49 ` Lars Ingebrigtsen
@ 2021-10-13 13:12 ` Lars Ingebrigtsen
0 siblings, 0 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-13 13:12 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Aha!
;; To make a coding system with this, a pre-write-conversion should
;; account for the commented-out multi-valued code points in
;; stdenc.map.
(define-charset 'adobe-standard-encoding
And this hasn't been done? The stdenc.map file is missing a whole bunch
of characters...
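A small Python illustration of why saving would fail if the charset table really has no entries for control characters. That is an assumption based on the error output earlier in the thread; the table below is a made-up excerpt, not the contents of stdenc.map, and the names are hypothetical.

```python
# Hypothetical excerpt: printable ASCII only, with the two quote
# positions remapped the way the Adobe standard encoding maps them.
BYTE_TO_CHAR = {b: chr(b) for b in range(0x20, 0x7F)}
BYTE_TO_CHAR[0x27] = "\u2019"  # quoteright
BYTE_TO_CHAR[0x60] = "\u2018"  # quoteleft
CHAR_TO_BYTE = {ch: b for b, ch in BYTE_TO_CHAR.items()}


def problematic_characters(text: str):
    """List (1-based position, code point) for characters with no byte."""
    return [
        (pos, f"#x{ord(ch):X}")
        for pos, ch in enumerate(text, start=1)
        if ch not in CHAR_TO_BYTE
    ]


# Newlines have no entry in the excerpt, so they are reported as problematic.
print(problematic_characters("%!\n%%\n"))
```

Under that assumption every newline is reported as unencodable, matching the shape of the "problematic characters" listing above.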
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2011-01-05 0:18 bug#7786: 23.2; Encoding of PostScript files Peter Dyballa
2021-01-20 18:02 ` Lars Ingebrigtsen
2021-06-02 8:39 ` Lars Ingebrigtsen
@ 2021-10-13 13:51 ` Lars Ingebrigtsen
2021-10-13 15:41 ` Eli Zaretskii
` (2 more replies)
2 siblings, 3 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-13 13:51 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
> This is incorrect, because, as the PLRM (the PostScript® Language
> Reference Manual) explains in a footnote near the end, on encodings:
>
> 3. The ISOLatin1Encoding encoding vector deviates from the ISO
> 8859-1 standard in one respect: the character at position 140 is
> quoteleft, whereas the ISO standard specifies grave. A PostScript
> program needing to conform exactly to the ISO standard should
> create a modified encoding vector with this entry changed.
This seems to be incorrect.
https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
https://en.wikipedia.org/wiki/ISO/IEC_8859-1
differ in a whole bunch of places. In addition, ISOLatin1Encoding is
not the same as stdenc (which is what adobe-standard-encoding uses):
https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/stdenc.txt
Emacs doesn't seem to have any support for ISOLatin1Encoding:
"In 1995, IBM assigned code page 1277 (CCSID 1277) to this character set."
Unless we have it under some other name.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 13:51 ` Lars Ingebrigtsen
@ 2021-10-13 15:41 ` Eli Zaretskii
2021-10-13 16:05 ` Lars Ingebrigtsen
2021-10-13 21:02 ` Peter Dyballa
2021-10-13 21:55 ` Peter Dyballa
2 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-13 15:41 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Peter_Dyballa, 7786
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Wed, 13 Oct 2021 15:51:48 +0200
> Cc: 7786@debbugs.gnu.org
>
> Emacs doesn't seem to have any support for ISOLatin1Encoding:
>
> "In 1995, IBM assigned code page 1277 (CCSID 1277) to this character set."
>
> Unless we have it under some other name.
I think you are right. But we could create such an encoding, see
etc/charsets/ and the coding-system definitions to go with them.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 15:41 ` Eli Zaretskii
@ 2021-10-13 16:05 ` Lars Ingebrigtsen
2021-10-13 16:18 ` Eli Zaretskii
0 siblings, 1 reply; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-13 16:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Peter_Dyballa, 7786
Eli Zaretskii <eliz@gnu.org> writes:
> I think you are right. But we could create such an encoding, see
> etc/charsets/ and the coding-system definitions to go with them.
We could, but unfortunately, I'm not able to find any quality source for
the charset. The closest I've been able to find is the file from IBM
(attached), but it doesn't map to Unicode code points, of course:
...
90 LI610000 i Dotless Small
91 SD130000 Grave Accent
92 SD110000 Acute Accent
glibc doesn't seem to have this, and I can't find it on the Unicode web
site, either.
So we'd have to maintain this by hand (and the easiest way is probably
to copy the table from Wikipedia and massage it).
But... it seems like an awful lot of work for something like this, so I
think I'll bow out. If somebody else wants to implement this, that's
totally OK, though.
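For anyone who does pick this up, reading the attached IBM table is at least mechanical. A rough Python sketch, assuming each row is a hex code optionally followed by a GCGID and a name, as in the excerpt above. The function name is hypothetical, and the GCGID-to-Unicode step is deliberately left out, since no such mapping is given here.

```python
def parse_cp01277_row(line: str):
    """Split one table row into (code, gcgid, name).

    Unassigned codes carry only the hex column, so gcgid and name
    come back as None for them.
    """
    parts = line.split(None, 2)
    code = int(parts[0], 16)
    if len(parts) < 3:
        return (code, None, None)
    return (code, parts[1], parts[2].strip())


for row in ["90 LI610000 i Dotless Small", "91 SD130000 Grave Accent", "7F"]:
    print(parse_cp01277_row(row))
```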
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
[-- Attachment #2: CP01277.txt --]
[-- Type: text/plain, Size: 7854 bytes --]
* ----------------------------------------------------------------------
* Copyright IBM Corporation 1995. All rights reserved.
* C-H 3-3220-050 : REGISTRY, Graphic Character Sets and Code Pages
* Code Page (CPGID) : 01277
* Common Name : Adobe (PostScript) Latin 1
* Registration Date : 1995
* Last Revision Date :
* Default Encoding : 4105
* Code : MS Windows (ISO 8 variant)
* Maximal Character
* Set (GCSGID) : 01427
* Other GCSGIDs :
* ----------------------------------------------------------------------
*- GCGID --------- GCGID Name ------------------------------------------
00
01
02
03
04
05
06
07
08
09
0A
0B
0C
0D
0E
0F
10
11
12
13
14
15
16
17
18
19
1A
1B
1C
1D
1E
1F
20 SP010000 Space
21 SP020000 Exclamation Point
22 SP040000 Quotation Marks
23 SM010000 Number Sign
24 SC030000 Dollar Sign
25 SM020000 Percent Sign
26 SM030000 Ampersand
27 SP200000 Right Single Quote
28 SP060000 Left Parenthesis
29 SP070000 Right Parenthesis
2A SM040000 Asterisk
2B SA010000 Plus Sign
2C SP080000 Comma
2D SP100000 Hyphen/Minus Sign
2E SP110000 Period/Full Stop
2F SP120000 Slash
30 ND100000 Zero
31 ND010000 One
32 ND020000 Two
33 ND030000 Three
34 ND040000 Four
35 ND050000 Five
36 ND060000 Six
37 ND070000 Seven
38 ND080000 Eight
39 ND090000 Nine
3A SP130000 Colon
3B SP140000 Semicolon
3C SA030000 Less Than Sign/Greater Than Sign (Arabic)
3D SA040000 Equal Sign
3E SA050000 Greater Than Sign/Less Than Sign (Arabic)
3F SP150000 Question Mark
40 SM050000 At Sign
41 LA020000 A Capital
42 LB020000 B Capital
43 LC020000 C Capital
44 LD020000 D Capital
45 LE020000 E Capital
46 LF020000 F Capital
47 LG020000 G Capital
48 LH020000 H Capital
49 LI020000 I Capital
4A LJ020000 J Capital
4B LK020000 K Capital
4C LL020000 L Capital
4D LM020000 M Capital
4E LN020000 N Capital
4F LO020000 O Capital
50 LP020000 P Capital
51 LQ020000 Q Capital
52 LR020000 R Capital
53 LS020000 S Capital
54 LT020000 T Capital
55 LU020000 U Capital
56 LV020000 V Capital
57 LW020000 W Capital
58 LX020000 X Capital
59 LY020000 Y Capital
5A LZ020000 Z Capital
5B SM060000 Left Bracket
5C SM070000 Backslash
5D SM080000 Right Bracket
5E SD150000 Circumflex Accent
5F SP090000 Underline/Continuous Underscore
60 SP190000 Left Single Quote
61 LA010000 a Small
62 LB010000 b Small
63 LC010000 c Small
64 LD010000 d Small
65 LE010000 e Small
66 LF010000 f Small
67 LG010000 g Small
68 LH010000 h Small
69 LI010000 i Small
6A LJ010000 j Small
6B LK010000 k Small
6C LL010000 l Small
6D LM010000 m Small
6E LN010000 n Small
6F LO010000 o Small
70 LP010000 p Small
71 LQ010000 q Small
72 LR010000 r Small
73 LS010000 s Small
74 LT010000 t Small
75 LU010000 u Small
76 LV010000 v Small
77 LW010000 w Small
78 LX010000 x Small
79 LY010000 y Small
7A LZ010000 z Small
7B SM110000 Left Brace
7C SM130000 Vertical Line/Logical OR
7D SM140000 Right Brace
7E SD190000 Tilde Accent
7F
80
81
82
83
84
85
86
87
88
89
8A
8B
8C
8D
8E
8F
90 LI610000 i Dotless Small
91 SD130000 Grave Accent
92 SD110000 Acute Accent
93 SD150100 Circumflex Accent (Over Small Alphabetics Without Ascenders)
94 SD190100 Tilde Accent (Over Small Alphabetics Without Ascenders)
95 SD310000 Macron Accent
96 SD230000 Breve Accent
97 SD290000 Overdot Accent
98 SD170000 Diaeresis/Umlaut Accent
99
9A SD270000 Overcircle Accent
9B SD410000 Cedilla or Sedila Accent
9C
9D SD250000 Double Acute Accent
9E SD430000 Ogonek Accent
9F SD210000 Caron Accent
A0 SP300000 Required Space
A1 SP030000 Exclamation Point, Inverted
A2 SC040000 Cent Sign
A3 SC020000 Pound Sterling Sign
A4 SC010000 International Currency Symbol
A5 SC050000 Yen Sign
A6 SM650000 Vertical Line, Broken
A7 SM240000 Section Symbol (USA)/Paragraph Symbol (Europe)
A8 SD170000 Diaeresis/Umlaut Accent
A9 SM520000 Copyright Symbol
AA SM210000 Ordinal Indicator, Feminine
AB SP170000 Left Angle Quotes
AC SM660000 Logical NOT/End Of Line Symbol
AD SP320000 Syllable Hyphen
AE SM530000 Registered Trademark Symbol
AF SD310000 Macron Accent
B0 SM190000 Degree Symbol
B1 SA020000 Plus or Minus Sign
B2 ND021000 Two Superscript
B3 ND031000 Three Superscript
B4 SD110000 Acute Accent
B5 SM170000 Micro Symbol
B6 SM250000 Paragraph Symbol (USA)
B7 SD630000 Middle Dot
B8 SD410000 Cedilla or Sedila Accent
B9 ND011000 One Superscript
BA SM200000 Ordinal Indicator, Masculine
BB SP180000 Right Angle Quotes
BC NF040000 One Quarter
BD NF010000 One Half
BE NF050000 Three Quarters
BF SP160000 Question Mark, Inverted
C0 LA140000 A Grave Capital
C1 LA120000 A Acute Capital
C2 LA160000 A Circumflex Capital
C3 LA200000 A Tilde Capital
C4 LA180000 A Diaeresis Capital
C5 LA280000 A Overcircle Capital
C6 LA520000 ae Diphthong Capital
C7 LC420000 C Cedilla Capital
C8 LE140000 E Grave Capital
C9 LE120000 E Acute Capital
CA LE160000 E Circumflex Capital
CB LE180000 E Diaeresis Capital
CC LI140000 I Grave Capital
CD LI120000 I Acute Capital
CE LI160000 I Circumflex Capital
CF LI180000 I Diaeresis Capital
D0 LD620000 D Stroke Capital/Eth Icelandic Capital
D1 LN200000 N Tilde Capital
D2 LO140000 O Grave Capital
D3 LO120000 O Acute Capital
D4 LO160000 O Circumflex Capital
D5 LO200000 O Tilde Capital
D6 LO180000 O Diaeresis Capital
D7 SA070000 Multiply Sign
D8 LO620000 O Slash Capital
D9 LU140000 U Grave Capital
DA LU120000 U Acute Capital
DB LU160000 U Circumflex Capital
DC LU180000 U Diaeresis Capital
DD LY120000 Y Acute Capital
DE LT640000 Thorn Icelandic Capital
DF LS610000 Sharp s Small
E0 LA130000 a Grave Small
E1 LA110000 a Acute Small
E2 LA150000 a Circumflex Small
E3 LA190000 a Tilde Small
E4 LA170000 a Diaeresis Small
E5 LA270000 a Overcircle Small
E6 LA510000 ae Diphthong Small
E7 LC410000 c Cedilla Small
E8 LE130000 e Grave Small
E9 LE110000 e Acute Small
EA LE150000 e Circumflex Small
EB LE170000 e Diaeresis Small
EC LI130000 i Grave Small
ED LI110000 i Acute Small
EE LI150000 i Circumflex Small
EF LI170000 i Diaeresis Small
F0 LD630000 eth Icelandic Small
F1 LN190000 n Tilde Small
F2 LO130000 o Grave Small
F3 LO110000 o Acute Small
F4 LO150000 o Circumflex Small
F5 LO190000 o Tilde Small
F6 LO170000 o Diaeresis Small
F7 SA060000 Divide Sign
F8 LO610000 o Slash Small
F9 LU130000 u Grave Small
FA LU110000 u Acute Small
FB LU150000 u Circumflex Small
FC LU170000 u Diaeresis Small
FD LY110000 y Acute Small
FE LT630000 Thorn Icelandic Small
FF LY170000 y Diaeresis Small
/* END of table --------------------------------------------------------
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:05 ` Lars Ingebrigtsen
@ 2021-10-13 16:18 ` Eli Zaretskii
2021-10-13 16:20 ` Lars Ingebrigtsen
0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-13 16:18 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Peter_Dyballa, 7786
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Peter_Dyballa@Freenet.DE, 7786@debbugs.gnu.org
> Date: Wed, 13 Oct 2021 18:05:07 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I think you are right. But we could create such an encoding, see
> > etc/charsets/ and the coding-system definitions to go with them.
>
> We could, but unfortunately, I'm not able to find any quality source for
> the charset. The closest I've been able to find is the file from IBM
> (attached), but it doesn't map to Unicode code points, of course:
What's wrong with this:
https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
It shows the Unicode codepoint for each character in the codepage. Or
what am I missing?
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:18 ` Eli Zaretskii
@ 2021-10-13 16:20 ` Lars Ingebrigtsen
2021-10-13 16:23 ` Peter Dyballa
2021-10-13 16:43 ` Eli Zaretskii
0 siblings, 2 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-13 16:20 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Peter_Dyballa, 7786
Eli Zaretskii <eliz@gnu.org> writes:
> What's wrong with this:
>
> https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
>
> It shows the Unicode codepoint for each character in the codepage. Or
> what am I missing?
Yes, that's what I suggested using as the source -- somebody would have
to transcribe that into a machine-readable file.
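Such a transcription could be scripted once the table exists as plain text. A hedged Python sketch, assuming input rows of the hypothetical form "glyph = oct = dec = hex = UTF-8 bytes = U+XXXX : NAME" and an output of "0xNN 0xNNNN" pairs. The output shape is an assumption about what a charset map file wants, not something confirmed in this thread, and the function name is made up.

```python
import re

# Anchor on the last three "=" fields so that a glyph column containing
# "=" or a space cannot confuse the parse.
_ROW = re.compile(
    r"=\s*([0-9A-Fa-f]{2})\s*"        # byte value in hex
    r"=\s*(?:[0-9A-Fa-f]{2})+\s*"     # UTF-8 bytes (ignored)
    r"=\s*U\+([0-9A-Fa-f]{4,6})\s*:"  # Unicode code point
)


def row_to_map_line(row: str):
    """Turn one table row into 'byte codepoint', or None for non-rows."""
    m = _ROW.search(row)
    if m is None:
        return None
    return f"0x{int(m.group(1), 16):02X} 0x{int(m.group(2), 16):04X}"


print(row_to_map_line("` = 140 = 96 = 60 = 60 = U+2018 : LEFT SINGLE QUOTATION MARK"))
```

Comment and header lines return None, so a caller can simply skip them.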
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:20 ` Lars Ingebrigtsen
@ 2021-10-13 16:23 ` Peter Dyballa
2021-10-13 16:28 ` Lars Ingebrigtsen
2021-10-13 16:45 ` Eli Zaretskii
2021-10-13 16:43 ` Eli Zaretskii
1 sibling, 2 replies; 37+ messages in thread
From: Peter Dyballa @ 2021-10-13 16:23 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 7786
> On 13 October 2021, at 18:20, Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
>> What's wrong with this:
>>
>> https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
>>
>> It shows the Unicode codepoint for each character in the codepage. Or
>> what am I missing?
>
> Yes, that's what I suggested using as the source -- somebody would have
> to transcribe that into a machine readable file.
Is this OK and usable?
;;; -*- mode: Text; coding: utf-8; -*-
;
; Time-stamp: <2011-01-05 10:52:40 pete>
;
; Standard PostScript Glyphs (Adobe)
;
; glyph = oct = dec = hex = UTF-8 bytes = Unicode : name
;=======================================================
= 40 = 32 = 20 = 20 = U+0020 : SPACE
! = 41 = 33 = 21 = 21 = U+0021 : EXCLAMATION MARK
" = 42 = 34 = 22 = 22 = U+0022 : QUOTATION MARK
# = 43 = 35 = 23 = 23 = U+0023 : NUMBER SIGN
$ = 44 = 36 = 24 = 24 = U+0024 : DOLLAR SIGN
% = 45 = 37 = 25 = 25 = U+0025 : PERCENT SIGN
& = 46 = 38 = 26 = 26 = U+0026 : AMPERSAND
' = 47 = 39 = 27 = 27 = U+2019 : RIGHT SINGLE QUOTATION MARK
( = 50 = 40 = 28 = 28 = U+0028 : LEFT PARENTHESIS
) = 51 = 41 = 29 = 29 = U+0029 : RIGHT PARENTHESIS
* = 52 = 42 = 2A = 2A = U+002A : ASTERISK
+ = 53 = 43 = 2B = 2B = U+002B : PLUS SIGN
, = 54 = 44 = 2C = 2C = U+002C : COMMA
- = 55 = 45 = 2D = 2D = U+002D : HYPHEN-MINUS
. = 56 = 46 = 2E = 2E = U+002E : FULL STOP
/ = 57 = 47 = 2F = 2F = U+002F : SOLIDUS
0 = 60 = 48 = 30 = 30 = U+0030 : DIGIT ZERO
1 = 61 = 49 = 31 = 31 = U+0031 : DIGIT ONE
2 = 62 = 50 = 32 = 32 = U+0032 : DIGIT TWO
3 = 63 = 51 = 33 = 33 = U+0033 : DIGIT THREE
4 = 64 = 52 = 34 = 34 = U+0034 : DIGIT FOUR
5 = 65 = 53 = 35 = 35 = U+0035 : DIGIT FIVE
6 = 66 = 54 = 36 = 36 = U+0036 : DIGIT SIX
7 = 67 = 55 = 37 = 37 = U+0037 : DIGIT SEVEN
8 = 70 = 56 = 38 = 38 = U+0038 : DIGIT EIGHT
9 = 71 = 57 = 39 = 39 = U+0039 : DIGIT NINE
: = 72 = 58 = 3A = 3A = U+003A : COLON
; = 73 = 59 = 3B = 3B = U+003B : SEMICOLON
< = 74 = 60 = 3C = 3C = U+003C : LESS-THAN SIGN
= = 75 = 61 = 3D = 3D = U+003D : EQUALS SIGN
> = 76 = 62 = 3E = 3E = U+003E : GREATER-THAN SIGN
? = 77 = 63 = 3F = 3F = U+003F : QUESTION MARK
@ = 100 = 64 = 40 = 40 = U+0040 : COMMERCIAL AT
A = 101 = 65 = 41 = 41 = U+0041 : LATIN CAPITAL LETTER A
B = 102 = 66 = 42 = 42 = U+0042 : LATIN CAPITAL LETTER B
C = 103 = 67 = 43 = 43 = U+0043 : LATIN CAPITAL LETTER C
D = 104 = 68 = 44 = 44 = U+0044 : LATIN CAPITAL LETTER D
E = 105 = 69 = 45 = 45 = U+0045 : LATIN CAPITAL LETTER E
F = 106 = 70 = 46 = 46 = U+0046 : LATIN CAPITAL LETTER F
G = 107 = 71 = 47 = 47 = U+0047 : LATIN CAPITAL LETTER G
H = 110 = 72 = 48 = 48 = U+0048 : LATIN CAPITAL LETTER H
I = 111 = 73 = 49 = 49 = U+0049 : LATIN CAPITAL LETTER I
J = 112 = 74 = 4A = 4A = U+004A : LATIN CAPITAL LETTER J
K = 113 = 75 = 4B = 4B = U+004B : LATIN CAPITAL LETTER K
L = 114 = 76 = 4C = 4C = U+004C : LATIN CAPITAL LETTER L
M = 115 = 77 = 4D = 4D = U+004D : LATIN CAPITAL LETTER M
N = 116 = 78 = 4E = 4E = U+004E : LATIN CAPITAL LETTER N
O = 117 = 79 = 4F = 4F = U+004F : LATIN CAPITAL LETTER O
P = 120 = 80 = 50 = 50 = U+0050 : LATIN CAPITAL LETTER P
Q = 121 = 81 = 51 = 51 = U+0051 : LATIN CAPITAL LETTER Q
R = 122 = 82 = 52 = 52 = U+0052 : LATIN CAPITAL LETTER R
S = 123 = 83 = 53 = 53 = U+0053 : LATIN CAPITAL LETTER S
T = 124 = 84 = 54 = 54 = U+0054 : LATIN CAPITAL LETTER T
U = 125 = 85 = 55 = 55 = U+0055 : LATIN CAPITAL LETTER U
V = 126 = 86 = 56 = 56 = U+0056 : LATIN CAPITAL LETTER V
W = 127 = 87 = 57 = 57 = U+0057 : LATIN CAPITAL LETTER W
X = 130 = 88 = 58 = 58 = U+0058 : LATIN CAPITAL LETTER X
Y = 131 = 89 = 59 = 59 = U+0059 : LATIN CAPITAL LETTER Y
Z = 132 = 90 = 5A = 5A = U+005A : LATIN CAPITAL LETTER Z
[ = 133 = 91 = 5B = 5B = U+005B : LEFT SQUARE BRACKET
\ = 134 = 92 = 5C = 5C = U+005C : REVERSE SOLIDUS
] = 135 = 93 = 5D = 5D = U+005D : RIGHT SQUARE BRACKET
^ = 136 = 94 = 5E = 5E = U+005E : CIRCUMFLEX ACCENT
_ = 137 = 95 = 5F = 5F = U+005F : LOW LINE
` = 140 = 96 = 60 = 60 = U+2018 : LEFT SINGLE QUOTATION MARK
a = 141 = 97 = 61 = 61 = U+0061 : LATIN SMALL LETTER A
b = 142 = 98 = 62 = 62 = U+0062 : LATIN SMALL LETTER B
c = 143 = 99 = 63 = 63 = U+0063 : LATIN SMALL LETTER C
d = 144 = 100 = 64 = 64 = U+0064 : LATIN SMALL LETTER D
e = 145 = 101 = 65 = 65 = U+0065 : LATIN SMALL LETTER E
f = 146 = 102 = 66 = 66 = U+0066 : LATIN SMALL LETTER F
g = 147 = 103 = 67 = 67 = U+0067 : LATIN SMALL LETTER G
h = 150 = 104 = 68 = 68 = U+0068 : LATIN SMALL LETTER H
i = 151 = 105 = 69 = 69 = U+0069 : LATIN SMALL LETTER I
j = 152 = 106 = 6A = 6A = U+006A : LATIN SMALL LETTER J
k = 153 = 107 = 6B = 6B = U+006B : LATIN SMALL LETTER K
l = 154 = 108 = 6C = 6C = U+006C : LATIN SMALL LETTER L
m = 155 = 109 = 6D = 6D = U+006D : LATIN SMALL LETTER M
n = 156 = 110 = 6E = 6E = U+006E : LATIN SMALL LETTER N
o = 157 = 111 = 6F = 6F = U+006F : LATIN SMALL LETTER O
p = 160 = 112 = 70 = 70 = U+0070 : LATIN SMALL LETTER P
q = 161 = 113 = 71 = 71 = U+0071 : LATIN SMALL LETTER Q
r = 162 = 114 = 72 = 72 = U+0072 : LATIN SMALL LETTER R
s = 163 = 115 = 73 = 73 = U+0073 : LATIN SMALL LETTER S
t = 164 = 116 = 74 = 74 = U+0074 : LATIN SMALL LETTER T
u = 165 = 117 = 75 = 75 = U+0075 : LATIN SMALL LETTER U
v = 166 = 118 = 76 = 76 = U+0076 : LATIN SMALL LETTER V
w = 167 = 119 = 77 = 77 = U+0077 : LATIN SMALL LETTER W
x = 170 = 120 = 78 = 78 = U+0078 : LATIN SMALL LETTER X
y = 171 = 121 = 79 = 79 = U+0079 : LATIN SMALL LETTER Y
z = 172 = 122 = 7A = 7A = U+007A : LATIN SMALL LETTER Z
{ = 173 = 123 = 7B = 7B = U+007B : LEFT CURLY BRACKET
| = 174 = 124 = 7C = 7C = U+007C : VERTICAL LINE
} = 175 = 125 = 7D = 7D = U+007D : RIGHT CURLY BRACKET
~ = 176 = 126 = 7E = 7E = U+007E : TILDE
¡ = 241 = 161 = A1 = C2A1 = U+00A1 : INVERTED EXCLAMATION MARK
¢ = 242 = 162 = A2 = C2A2 = U+00A2 : CENT SIGN
£ = 243 = 163 = A3 = C2A3 = U+00A3 : POUND SIGN
⁄ = 244 = 164 = A4 = E28184 = U+2044 : FRACTION SLASH
¥ = 245 = 165 = A5 = C2A5 = U+00A5 : YEN SIGN
ƒ = 246 = 166 = A6 = C692 = U+0192 : LATIN SMALL LETTER F WITH HOOK
§ = 247 = 167 = A7 = C2A7 = U+00A7 : SECTION SIGN
¤ = 250 = 168 = A8 = C2A4 = U+00A4 : CURRENCY SIGN
' = 251 = 169 = A9 = 27 = U+0027 : APOSTROPHE
“ = 252 = 170 = AA = E2809C = U+201C : LEFT DOUBLE QUOTATION MARK
« = 253 = 171 = AB = C2AB = U+00AB : LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
‹ = 254 = 172 = AC = E280B9 = U+2039 : SINGLE LEFT-POINTING ANGLE QUOTATION MARK
› = 255 = 173 = AD = E280BA = U+203A : SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
fi = 256 = 174 = AE = EFAC81 = U+FB01 : LATIN SMALL LIGATURE FI
fl = 257 = 175 = AF = EFAC82 = U+FB02 : LATIN SMALL LIGATURE FL
– = 261 = 177 = B1 = E28093 = U+2013 : EN DASH
† = 262 = 178 = B2 = E280A0 = U+2020 : DAGGER
‡ = 263 = 179 = B3 = E280A1 = U+2021 : DOUBLE DAGGER
· = 264 = 180 = B4 = C2B7 = U+00B7 : MIDDLE DOT
¶ = 266 = 182 = B6 = C2B6 = U+00B6 : PILCROW SIGN
• = 267 = 183 = B7 = E280A2 = U+2022 : BULLET
‚ = 270 = 184 = B8 = E2809A = U+201A : SINGLE LOW-9 QUOTATION MARK
„ = 271 = 185 = B9 = E2809E = U+201E : DOUBLE LOW-9 QUOTATION MARK
” = 272 = 186 = BA = E2809D = U+201D : RIGHT DOUBLE QUOTATION MARK
» = 273 = 187 = BB = C2BB = U+00BB : RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
… = 274 = 188 = BC = E280A6 = U+2026 : HORIZONTAL ELLIPSIS
‰ = 275 = 189 = BD = E280B0 = U+2030 : PER MILLE SIGN
¿ = 277 = 191 = BF = C2BF = U+00BF : INVERTED QUESTION MARK
` = 301 = 193 = C1 = 60 = U+0060 : GRAVE ACCENT
´ = 302 = 194 = C2 = C2B4 = U+00B4 : ACUTE ACCENT
ˆ = 303 = 195 = C3 = CB86 = U+02C6 : MODIFIER LETTER CIRCUMFLEX ACCENT
˜ = 304 = 196 = C4 = CB9C = U+02DC : SMALL TILDE
¯ = 305 = 197 = C5 = C2AF = U+00AF : MACRON
˘ = 306 = 198 = C6 = CB98 = U+02D8 : BREVE
˙ = 307 = 199 = C7 = CB99 = U+02D9 : DOT ABOVE
¨ = 310 = 200 = C8 = C2A8 = U+00A8 : DIAERESIS
˚ = 312 = 202 = CA = CB9A = U+02DA : RING ABOVE
¸ = 313 = 203 = CB = C2B8 = U+00B8 : CEDILLA
˝ = 315 = 205 = CD = CB9D = U+02DD : DOUBLE ACUTE ACCENT
˛ = 316 = 206 = CE = CB9B = U+02DB : OGONEK
ˇ = 317 = 207 = CF = CB87 = U+02C7 : CARON
— = 320 = 208 = D0 = E28094 = U+2014 : EM DASH
Æ = 341 = 225 = E1 = C386 = U+00C6 : LATIN CAPITAL LETTER AE
ª = 343 = 227 = E3 = C2AA = U+00AA : FEMININE ORDINAL INDICATOR
Ł = 350 = 232 = E8 = C581 = U+0141 : LATIN CAPITAL LETTER L WITH STROKE
Ø = 351 = 233 = E9 = C398 = U+00D8 : LATIN CAPITAL LETTER O WITH STROKE
Œ = 352 = 234 = EA = C592 = U+0152 : LATIN CAPITAL LIGATURE OE
º = 353 = 235 = EB = C2BA = U+00BA : MASCULINE ORDINAL INDICATOR
æ = 361 = 241 = F1 = C3A6 = U+00E6 : LATIN SMALL LETTER AE
ı = 365 = 245 = F5 = C4B1 = U+0131 : LATIN SMALL LETTER DOTLESS I
ł = 370 = 248 = F8 = C582 = U+0142 : LATIN SMALL LETTER L WITH STROKE
ø = 371 = 249 = F9 = C3B8 = U+00F8 : LATIN SMALL LETTER O WITH STROKE
œ = 372 = 250 = FA = C593 = U+0153 : LATIN SMALL LIGATURE OE
ß = 373 = 251 = FB = C39F = U+00DF : LATIN SMALL LETTER SHARP S
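
Each row of the list above gives the glyph, its code in octal, decimal and hexadecimal, the UTF-8 bytes, and the Unicode code point with its name. As a minimal sketch, rows of this shape could be regenerated in Python as follows. The function name `format_row` is hypothetical, and the byte-to-character pairs still have to come from the encoding vector itself.

```python
import unicodedata


def format_row(code: int, char: str) -> str:
    """Format one table row for a font code (0..255) and its Unicode character."""
    utf8_hex = char.encode("utf-8").hex().upper()  # UTF-8 bytes as hex digits
    return (
        f"{char} = {code:o} = {code} = {code:X} = {utf8_hex} = "
        f"U+{ord(char):04X} : {unicodedata.name(char)}"
    )


if __name__ == "__main__":
    # Code 0xA4 holds /fraction in the list above.
    print(format_row(0xA4, "\u2044"))
    print(format_row(0x65, "e"))
```

Deriving the UTF-8 column from the character keeps it consistent with the code point column.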
--
Greetings
Pete
Film is a dog: the head is commerce, the tail is art. And only rarely does the tail wag the dog.
– Joseph Losey
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:23 ` Peter Dyballa
@ 2021-10-13 16:28 ` Lars Ingebrigtsen
2021-10-13 16:43 ` Peter Dyballa
2021-10-13 16:45 ` Eli Zaretskii
1 sibling, 1 reply; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-13 16:28 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
> Is this OK and usable?
>
> ;;; -*- mode: Text; coding: utf-8; -*-
> ;
> ; Time-stamp: <2011-01-05 10:52:40 pete>
> ;
> ; Standard PostScript Glyphs (Adobe)
Where is this from?
[...]
> } = 175 = 125 = 7D = 7D = U+007D : RIGHT CURLY BRACKET
> ~ = 176 = 126 = 7E = 7E = U+007E : TILDE
> ¡ = 241 = 161 = A1 = C2A1 = U+00A1 : INVERTED EXCLAMATION MARK
> ¢ = 242 = 162 = A2 = C2A2 = U+00A2 : CENT SIGN
> £ = 243 = 163 = A3 = C2A3 = U+00A3 : POUND SIGN
> ⁄ = 244 = 164 = A4 = E28184 = U+2044 : FRACTION SLASH
But this doesn't seem to correspond to the table on Wikipedia (or the
IBM document) -- it doesn't have any of the mappings in the 0x90-0xA0
range, for instance. (I didn't check the rest.)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:20 ` Lars Ingebrigtsen
2021-10-13 16:23 ` Peter Dyballa
@ 2021-10-13 16:43 ` Eli Zaretskii
2021-10-13 18:55 ` Lars Ingebrigtsen
1 sibling, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-13 16:43 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Peter_Dyballa, 7786
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Peter_Dyballa@Freenet.DE, 7786@debbugs.gnu.org
> Date: Wed, 13 Oct 2021 18:20:34 +0200
>
> > https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
> >
> > It shows the Unicode codepoint for each character in the codepage. Or
> > what am I missing?
>
> Yes, that's what I suggested using as the source -- somebody would have
> to transcribe that into a machine readable file.
Will the below do?
# Generated from https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
0x00-0x5F 0x0000
0x60 0x2018
0x61-0x7E 0x0061
0x90 0x0131
0x91 0x0060
0x92 0x00B4
0x93 0x02C6
0x94 0x02DC
0x95 0x02C9
0x96-0x97 0x02D8
0x98 0x00A8
0x9A 0x02DA
0x9B 0x00B8
0x9D 0x02DD
0x9E 0x02DB
0x9F 0x02C7
0xA0-0xFF 0x00A0
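
To illustrate how a map of this shape can be read, here is a minimal Python sketch. It assumes that a line `0xSTART-0xEND 0xUNICODE` assigns consecutive code points to consecutive bytes, and that bytes not listed are undefined. The names `expand_map` and `decode` are hypothetical, and only an excerpt of the lines is used.

```python
EXCERPT = """\
# excerpt of the byte -> Unicode map
0x00-0x5F 0x0000
0x60 0x2018
0x61-0x7E 0x0061
0x90 0x0131
0x96-0x97 0x02D8
0xA0-0xFF 0x00A0
"""


def expand_map(text):
    """Expand 'START[-END] UNICODE' lines into a dict of byte -> code point."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        codes, first_unicode = line.split()
        start, _, end = codes.partition("-")
        low = int(start, 16)
        high = int(end, 16) if end else low
        base = int(first_unicode, 16)
        for offset, code in enumerate(range(low, high + 1)):
            table[code] = base + offset  # assumed: ranges map consecutively
    return table


def decode(data, table):
    """Decode bytes with the table; undefined bytes become U+FFFD."""
    return "".join(chr(table.get(byte, 0xFFFD)) for byte in data)


if __name__ == "__main__":
    table = expand_map(EXCERPT)
    print(decode(b"`quoted'", table))
```

With this reading, byte 0x60 decodes to U+2018 rather than to the grave accent, which is the difference the bug report is about.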
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:28 ` Lars Ingebrigtsen
@ 2021-10-13 16:43 ` Peter Dyballa
0 siblings, 0 replies; 37+ messages in thread
From: Peter Dyballa @ 2021-10-13 16:43 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 7786
> Am 13.10.2021 um 18:28 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>
> Where is this from?
I am quite sure that I took it from PLRM2, the PostScript Language Reference Manual, December 1990, page 604.
--
Greetings
Pete
Time is an illusion. Lunchtime, doubly so.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:23 ` Peter Dyballa
2021-10-13 16:28 ` Lars Ingebrigtsen
@ 2021-10-13 16:45 ` Eli Zaretskii
2021-10-13 17:35 ` Peter Dyballa
1 sibling, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-13 16:45 UTC (permalink / raw)
To: Peter Dyballa; +Cc: larsi, 7786
> From: Peter Dyballa <Peter_Dyballa@Freenet.DE>
> Date: Wed, 13 Oct 2021 18:23:50 +0200
> Cc: Eli Zaretskii <eliz@gnu.org>,
> 7786@debbugs.gnu.org
>
> >> https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
> >>
> >> It shows the Unicode codepoint for each character in the codepage. Or
> >> what am I missing?
> >
> > Yes, that's what I suggested using as the source -- somebody would have
> > to transcribe that into a machine readable file.
>
> Is this OK and usable?
AFAICT, that's very different from the Wikipedia data.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:45 ` Eli Zaretskii
@ 2021-10-13 17:35 ` Peter Dyballa
0 siblings, 0 replies; 37+ messages in thread
From: Peter Dyballa @ 2021-10-13 17:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: larsi, 7786
> Am 13.10.2021 um 18:45 schrieb Eli Zaretskii <eliz@gnu.org>:
>
>> From: Peter Dyballa <Peter_Dyballa@Freenet.DE>
>> Date: Wed, 13 Oct 2021 18:23:50 +0200
>> Cc: Eli Zaretskii <eliz@gnu.org>,
>> 7786@debbugs.gnu.org
>>
>>>> https://en.wikipedia.org/wiki/PostScript_Latin_1_Encoding
>>>>
>>>> It shows the Unicode codepoint for each character in the codepage. Or
>>>> what am I missing?
>>>
>>> Yes, that's what I suggested using as the source -- somebody would have
>>> to transcribe that into a machine readable file.
>>
>> Is this OK and usable?
>
> AFAICT, that's very different from the Wikipedia data.
I created the file ten years ago. It could be that I later missed some standardisation by ISO… which the Wikipedia author knew about?
Could be we mix things. My encoding stands for the encoding vector of a PostScript font.
Anyway would it help you to retrieve these files?
stdenc.txt
# Name: Adobe Standard Encoding to Unicode
# Unicode version: 2.0
# Table version: 0.2
# Date: 30 March 1999
#
# Copyright (c) 1991-1999 Unicode, Inc. All Rights reserved.
#
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium). No
# claims are made as to fitness for any particular purpose. No warranties of
# any kind are expressed or implied. The recipient agrees to determine
# applicability of information provided. If this file has been provided on
# magnetic media by Unicode, Inc., the sole remedy for any claim will be
# exchange of defective media within 90 days of receipt.
#
# Recipient is granted the right to make copies in any form for internal
# distribution and to freely use the information supplied in the creation of
# products supporting Unicode. Unicode, Inc. specifically excludes the right
# to re-distribute this file directly to third parties or other organizations
# whether for profit or not.
#
# Format: 4 tab-delimited fields:
#
# (1) The Unicode value (in hexadecimal)
# (2) The Adobe Standard Encoding code point (in hexadecimal)
# (3) # Unicode name
# (4) # PostScript character name
#
# General Notes:
#
# The Unicode values in this table were produced as the result of applying
# the algorithm described in the section "Populating a Unicode space" in the
# document "Unicode and Glyph Names," at
# http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
# to the characters encoded in Adobe Standard Encoding. Note that some
# Standard Encoding characters, such as "space", are mapped to 2 Unicode
# values. Refer to the above document for more details.
#
# Revision History:
#
# [v0.2, 30 March 1999]
# Different algorithm to produce Unicode values (see notes above) results in
# some character codes being mapped to 2 Unicode values. Updated Unicode
# names to Unicode 2.0 names.
#
# [v0.1, 5 May 1995] First release.
#
# Contact <unicode-inc@unicode.org> with any questions or comments.
#
symbol.txt
#
# Name: Adobe Symbol Encoding to Unicode
# Unicode version: 2.0
# Table version: 0.2
# Date: 30 March 1999
#
# Copyright (c) 1991-1999 Unicode, Inc. All Rights reserved.
#
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium). No
# claims are made as to fitness for any particular purpose. No warranties of
# any kind are expressed or implied. The recipient agrees to determine
# applicability of information provided. If this file has been provided on
# magnetic media by Unicode, Inc., the sole remedy for any claim will be
# exchange of defective media within 90 days of receipt.
#
# Recipient is granted the right to make copies in any form for internal
# distribution and to freely use the information supplied in the creation of
# products supporting Unicode. Unicode, Inc. specifically excludes the right
# to re-distribute this file directly to third parties or other organizations
# whether for profit or not.
#
# Format: 4 tab-delimited fields:
#
# (1) The Unicode value (in hexadecimal)
# (2) The Symbol Encoding code point (in hexadecimal)
# (3) # Unicode name
# (4) # PostScript character name
#
# General Notes:
#
# The Unicode values in this table were produced as the result of applying
# the algorithm described in the section "Populating a Unicode space" in the
# document "Unicode and Glyph Names," at
# http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
# to the characters in Symbol. Note that some characters, such as "space",
# are mapped to 2 Unicode values. 29 characters have assignments in the
# Corporate Use Subarea; these are indicated by "(CUS)" in field 4. Refer to
# the above document for more details.
#
# Revision History:
#
# [v0.2, 30 March 1999]
# Different algorithm to produce Unicode values (see notes above) results in
# some character codes being mapped to 2 Unicode values; use of Corporate
# Use subarea values; addition of the euro character; changed assignments of
# some characters such as the COPYRIGHT SIGNs and RADICAL EXTENDER. Updated
# Unicode names to Unicode 2.0 names.
#
# [v0.1, 5 May 1995] First release.
#
# Contact <unicode-inc@unicode.org> with any questions or comments.
#
zdingbat.txt
#
# Name: Adobe Zapf Dingbats Encoding to Unicode
# Unicode version: 2.0
# Table version: 0.2
# Date: 30 March 1999
#
# Copyright (c) 1991-1999 Unicode, Inc. All Rights reserved.
#
# This file is provided as-is by Unicode, Inc. (The Unicode Consortium). No
# claims are made as to fitness for any particular purpose. No warranties of
# any kind are expressed or implied. The recipient agrees to determine
# applicability of information provided. If this file has been provided on
# magnetic media by Unicode, Inc., the sole remedy for any claim will be
# exchange of defective media within 90 days of receipt.
#
# Recipient is granted the right to make copies in any form for internal
# distribution and to freely use the information supplied in the creation of
# products supporting Unicode. Unicode, Inc. specifically excludes the right
# to re-distribute this file directly to third parties or other organizations
# whether for profit or not.
#
# Format: Three tab-delimited fields:
#
# (1) The Unicode value (in hexadecimal)
# (2) The Zapf Dingbats Encoding code point (in hexadecimal)
# (3) # Unicode 2.0 name
# (4) # PostScript character name
#
# General Notes:
#
# The Unicode values in this table were produced as the result of
# applying the algorithm described in the section "Populating a Unicode
# space" in the document "Unicode and Glyph Names," at
# http://partners.adobe.com/asn/developer/typeforum/unicodegn.html
# to the characters in Zapf Dingbats. Note that some characters, such as
# "space", are mapped to 2 Unicode values. 14 characters have assignments in
# the Corporate Use Subarea; these are indicated by "(CUS)" in field 4.
# Refer to the above document for more details.
#
# Revision History:
#
# [v0.2, 30 March 1999] Different algorithm to produce Unicode values (see
# notes above) results in some character codes being mapped to 2 Unicode
# values; use of Corporate Use subarea values; included BLACK CIRCLE and
# RIGHT HALF BLACK CIRCLE. Updated Unicode names to Unicode 2.0 names.
#
# [v0.1, 5 May 1995] First release.
#
# Contact <unicode-inc@unicode.org> with any questions or comments.
#
These were just the file headers.
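
The headers describe tab-delimited data lines with the Unicode value first and the Adobe code point second, and they note that one code can map to two Unicode values. A minimal Python sketch of a reader for that layout follows. The function name `parse_adobe_table` is hypothetical, and the sample rows are illustrative lines written to match the described format, not a copy of the files.

```python
from collections import defaultdict

# Illustrative rows in the described layout:
# Unicode value <TAB> Adobe code <TAB> # Unicode name <TAB> # PostScript name
SAMPLE = (
    "# Name: sample in the described format\n"
    "0020\t20\t# SPACE\t# space\n"
    "00A0\t20\t# NO-BREAK SPACE\t# space\n"
    "0021\t21\t# EXCLAMATION MARK\t# exclam\n"
)


def parse_adobe_table(text):
    """Return a dict mapping each Adobe code to the list of its Unicode values."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip header comments and blank lines
        fields = line.split("\t")
        unicode_value = int(fields[0], 16)
        adobe_code = int(fields[1], 16)
        mapping[adobe_code].append(unicode_value)
    return dict(mapping)


if __name__ == "__main__":
    for code, values in sorted(parse_adobe_table(SAMPLE).items()):
        print(f"{code:02X} -> " + ", ".join(f"U+{v:04X}" for v in values))
```

Keeping a list per code preserves the double assignments instead of silently overwriting the first one.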
--
Greetings
Pete
"To infinity and beyond!"
– Captain Buzz Lightyear
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 16:43 ` Eli Zaretskii
@ 2021-10-13 18:55 ` Lars Ingebrigtsen
2021-10-13 19:05 ` Eli Zaretskii
2021-10-13 19:07 ` Peter Dyballa
0 siblings, 2 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-13 18:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Peter_Dyballa, 7786
Eli Zaretskii <eliz@gnu.org> writes:
> Will the below do?
Looks good, but...
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
>> Am 13.10.2021 um 18:28 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>>
>> Where is this from?
>
> I am quite sure that I took it from PLRM2, the PostScript Language
> Reference Manual, December 1990, page 604.
... I'm not sure whether that IBM document the Wikipedia page has
sourced the table for is authoritative. There seems to be many versions
of the Adobe PostScript ISO-8859-1-alike code page.
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
> Could be we mix things. My encoding stands for the encoding vector of
> a PostScript font.
Hm... I don't think that's what we need here -- we need the encoding of
text files, not fonts.
> Anyway would it help you to retrieve these files?
>
> stdenc.txt
> # Name: Adobe Standard Encoding to Unicode
> # Unicode version: 2.0
> # Table version: 0.2
> # Date: 30 March 1999
That's the one we have in Emacs today as adobe-standard-encoding, but it
seems very odd. I mean, both stdenc.txt itself, as well as our
interpretation of it, because stdenc.map leaves most 8-bit chars
undefined.
So I'm not sure what we should do here, if anything. Is there some
Adobe printing expert we could reach out to? :-)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 18:55 ` Lars Ingebrigtsen
@ 2021-10-13 19:05 ` Eli Zaretskii
2021-10-13 19:07 ` Peter Dyballa
1 sibling, 0 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-13 19:05 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Peter_Dyballa, 7786
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Peter_Dyballa@Freenet.DE, 7786@debbugs.gnu.org
> Date: Wed, 13 Oct 2021 20:55:25 +0200
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Will the below do?
>
> Looks good, but...
>
> Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
>
> >> Am 13.10.2021 um 18:28 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
> >>
> >> Where is this from?
> >
> > I am quite sure that I took it from PLRM2, the PostScript Language
> > Reference Manual, December 1990, page 604.
>
> ... I'm not sure whether that IBM document the Wikipedia page has
> sourced the table for is authoritative. There seems to be many versions
> of the Adobe PostScript ISO-8859-1-alike code page.
The IBM PDF document is consistent 1:1 with Wikipedia, FWIW.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 18:55 ` Lars Ingebrigtsen
2021-10-13 19:05 ` Eli Zaretskii
@ 2021-10-13 19:07 ` Peter Dyballa
1 sibling, 0 replies; 37+ messages in thread
From: Peter Dyballa @ 2021-10-13 19:07 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 7786
> Am 13.10.2021 um 20:55 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>
> I mean, both stdenc.txt itself, as well as our
> interpretation of it, because stdenc.map leaves most 8-bit chars
> undefined.
Because it is also the font encoding! "map" stands for the character mapping of a font.
--
Greetings
Pete
Si ça fait mal c'est que ça fait du bien
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 13:51 ` Lars Ingebrigtsen
2021-10-13 15:41 ` Eli Zaretskii
@ 2021-10-13 21:02 ` Peter Dyballa
2021-10-14 6:42 ` Eli Zaretskii
2021-10-13 21:55 ` Peter Dyballa
2 siblings, 1 reply; 37+ messages in thread
From: Peter Dyballa @ 2021-10-13 21:02 UTC (permalink / raw)
To: Lars Ingebrigtsen, Eli Zaretskii; +Cc: 7786
Maybe this leads to an Adobe ISO Latin-1 encoding for GNU Emacs…
I copied the encoding from page 605 of the PLRM and pasted it into the *scratch* buffer. In rectangular editing mode I reconstructed this table:
octal   0   1   2   3   4   5   6   7
--------------------------------------
\04x        !   "   #   $   %   &   ’
\05x    (   )   *   +   ,   -   .   /
\06x    0   1   2   3   4   5   6   7
\07x    8   9   :   ;   <   =   >   ?
\10x    @   A   B   C   D   E   F   G
\11x    H   I   J   K   L   M   N   O
\12x    P   Q   R   S   T   U   V   W
\13x    X   Y   Z   [   \   ]   ^   _
\14x    ‘   a   b   c   d   e   f   g
\15x    h   i   j   k   l   m   n   o
\16x    p   q   r   s   t   u   v   w
\17x    x   y   z   {   |   }   ~
\20x
\21x
\22x    ı   `    ́   ˆ    ̃    ̄    ̆    ̇
\23x     ̈        ̊    ̧        ̋    ̨   ˇ
\24x        ¡   ¢   £   ¤   ¥   ¦   §
\25x     ̈   ©   ª   «   ¬   -   ®    ̄
\26x    °   ±   ²   ³    ́   μ   ¶   ·
\27x     ̧   ¹   º   »   ¼   ½   ¾   ¿
\30x    À   Á   Â   Ã   Ä   Å   Æ   Ç
\31x    È   É   Ê   Ë   Ì   Í   Î   Ï
\32x    Ð   Ñ   Ò   Ó   Ô   Õ   Ö   ×
\33x    Ø   Ù   Ú   Û   Ü   Ý   Þ   ß
\34x    à   á   â   ã   ä   å   æ   ç
\35x    è   é   ê   ë   ì   í   î   ï
\36x    ð   ñ   ò   ó   ô   õ   ö   ÷
\37x    ø   ù   ú   û   ü   ý   þ   ÿ
This was easy: each column is 27 lines high, so I could easily move, cut, and put.
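
In the table, the row label gives the leading octal digits of a code and the column number gives the last octal digit. A minimal Python sketch of that arithmetic, with the hypothetical helper name `cell_code`:

```python
def cell_code(row_label, column):
    """Return the byte value of a cell, e.g. row '\\22x', column 1 -> octal 221."""
    if not 0 <= column <= 7:
        raise ValueError("column must be in the range 0..7")
    prefix = row_label.lstrip("\\").rstrip("x")  # '\22x' -> '22'
    return int(prefix, 8) * 8 + column


if __name__ == "__main__":
    code = cell_code("\\22x", 1)
    print(f"octal {code:o} = decimal {code} = hex {code:X}")
```

For example, row `\14x`, column 0 is octal 140, the position of the left single quotation mark.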
In Wikipedia I went to "edit" the table, so I could copy the source code and paste it into the *scratch* buffer. The lines started with some rubbish that ended in a match for the regexp "[lr]l|", so I could replace "^.*[rl]l|" with "". I sorted the lines in order to remove all lines containing no content. On what was left I ran sort-regexp-fields on the complete lines, keyed on "|[0-9]+}}$", so the list was sorted. In column 64 I added "• " in rectangular mode, because "•" is not encoded. From a copy of the table above I stripped the left-most column and converted all TABs into LINEFEEDs. After removing the empty lines I had a "vector" which I could easily move to the right of "• ". The result is:
0020|[[space character|SP]]|32|040}} •
0021|[[Exclamation mark|!]]|33|041}} • !
0022|[[Quotation mark|"]] |34|042}} • "
0023|[[Number sign|#]]|35|043}} • #
0024|[[Dollar sign|$]]|36|044}} • $
0025|[[Percent sign|%]]|37|045}} • %
0026|[[Ampersand|&]]|38|046}} • &
2019|[[Quotation mark|’]]|39|047}} • ’
0028|[[Bracket|(]]|40|050}} • (
0029|[[Bracket|)]]|41|051}} • )
002A|[[Asterisk|*]]|42|052}} • *
002B|[[Plus and minus signs|+]]|43|053}} • +
002C|[[Comma (punctuation)|,]] |44|054}} • ,
002D|[[Hyphen-minus|-]]|45|055}} • -
002E|[[Full stop|.]]|46|056}} • .
002F|[[Slash (punctuation)|/]] |47|057}} • /
0030|[[0 (number)|0]]|48|060}} • 0
0031|[[1 (number)|1]]|49|061}} • 1
0032|[[2 (number)|2]]|50|062}} • 2
0033|[[3 (number)|3]]|51|063}} • 3
0034|[[4 (number)|4]]|52|064}} • 4
0035|[[5 (number)|5]]|53|065}} • 5
0036|[[6 (number)|6]]|54|066}} • 6
0037|[[7 (number)|7]]|55|067}} • 7
0038|[[8 (number)|8]]|56|070}} • 8
0039|[[9 (number)|9]]|57|071}} • 9
003A|[[colon (punctuation)|:]]|58|072}} • :
003B|[[semicolon|;]]|59|073}} • ;
003C|[[less-than sign|<]]|60|074}} • <
003D|[[equal sign|{{=}}]]|61|075}} • =
003E|[[greater-than sign|>]]|62|076}} • >
003F|[[question mark|?]]|63|077}} • ?
0040|[[@]]|64|100}} • @
0041|[[A]]|65|101}} • A
0042|[[B]]|66|102}} • B
0043|[[C]]|67|103}} • C
0044|[[D]]|68|104}} • D
0045|[[E]]|69|105}} • E
0046|[[F]]|70|106}} • F
0047|[[G]]|71|107}} • G
0048|[[H]]|72|110}} • H
0049|[[I]]|73|111}} • I
004A|[[J]]|74|112}} • J
004B|[[K]]|75|113}} • K
004C|[[L]]|76|114}} • L
004D|[[M]]|77|115}} • M
004E|[[N]]|78|116}} • N
004F|[[O]]|79|117}} • O
0050|[[P]]|80|120}} • P
0051|[[Q]]|81|121}} • Q
0052|[[R]]|82|122}} • R
0053|[[S]]|83|123}} • S
0054|[[T]]|84|124}} • T
0055|[[U]]|85|125}} • U
0056|[[V]]|86|126}} • V
0057|[[W]]|87|127}} • W
0058|[[X]]|88|130}} • X
0059|[[Y]]|89|131}} • Y
005A|[[Z]]|90|132}} • Z
005B|[[Square brackets|[]]|91|133}} • [
005C|[[Backslash|\]]|92|134}} • \
005D|[[Square brackets|]]]|93|135}} • ]
005E|[[Circumflex|^]]|94|136}} • ^
005F|[[Underscore|_]]|95|137}} • _
2018|[[Quotation mark|‘]]|96|140}} • ‘
0061|[[a]]|97|141}} • a
0062|[[b]]|98|142}} • b
0063|[[c]]|99|143}} • c
0064|[[d]]|100|144}} • d
0065|[[e]]|101|145}} • e
0066|[[f]]|102|146}} • f
0067|[[g]]|103|147}} • g
0068|[[h]]|104|150}} • h
0069|[[i]]|105|151}} • i
006A|[[j]]|106|152}} • j
006B|[[k]]|107|153}} • k
006C|[[l]]|108|154}} • l
006D|[[m]]|109|155}} • m
006E|[[n]]|110|156}} • n
006F|[[o]]|111|157}} • o
0070|[[p]]|112|160}} • p
0071|[[q]]|113|161}} • q
0072|[[r]]|114|162}} • r
0073|[[s]]|115|163}} • s
0074|[[t]]|116|164}} • t
0075|[[u]]|117|165}} • u
0076|[[v]]|118|166}} • v
0077|[[w]]|119|167}} • w
0078|[[x]]|120|170}} • x
0079|[[y]]|121|171}} • y
007A|[[z]]|122|172}} • z
007B|[[Braces (punctuation)|{]]|123|173}} • {
007C|[[Vertical bar|{{pipe}}]]|124|174}} • |
007D|[[Braces (punctuation)|}]]|125|175}} • }
007E|[[Tilde|~]]|126|176}} • ~
0131|[[ı]]|144|220}} • ı
0060|[[`]]|145|221}} • `
00B4|[[´]]|146|222}} • ́
02C6|[[ˆ]]|147|223}} • ˆ
02DC|[[˜]]|148|224}} • ̃
02C9|[[ˉ]]|149|225}} • ̄
02D8|[[˘]]|150|226}} • ̆
02D9|[[˙]]|151|227}} • ̇
00A8|[[¨]]|152|230}} • ̈
02DA|[[˚]]|154|232}} • ̊
00B8|[[¸]]|155|233}} • ̧
02DD|[[˝]]|157|235}} • ̋
02DB|[[˛]]|158|236}} • ̨
02C7|[[ˇ]]|159|237}} • ˇ
00A0|[[Non-breaking space|NBSP]]|160|240}} •
00A1|[[Inverted question and exclamation marks|¡]]|161|241}} • ¡
00A2|[[Cent (currency)#Symbol|¢]]|162|242}} • ¢
00A3|[[Pound sign|£]]|163|243}} • £
00A4|[[Currency (typography)|¤]]|164|244}} • ¤
00A5|[[¥]]|165|245}} • ¥
00A6|[[Vertical bar|¦]]|166|246}} • ¦
00A7|[[Section sign|§]]|167|247}} • §
00A8|[[¨]]|168|250}} • ̈
00A9|[[Copyright symbol|©]]|169|251}} • ©
00AA|[[Ordinal indicator|ª]]|170|252}} • ª
00AB|[[Guillemet|«]]|171|253}} • «
00AC|[[Negation|¬]]|172|254}} • ¬
00AD|[[Soft hyphen|SHY]]|173|255}} • -
00AE|[[Registered trademark symbol|®]]|174|256}} • ®
00AF|[[Macron (diacritic)|¯]]|175|257}} • ̄
00B0|[[Degree symbol|°]]|176|260}} • °
00B1|[[Plus-minus sign|±]]|177|261}} • ±
00B2|[[Square (algebra)|²]]|178|262}} • ²
00B3|[[Cube (algebra)|³]]|179|263}} • ³
00B4|[[Acute accent|´]]|180|264}} • ́
00B5|[[Micro sign|µ]]|181|265}} • μ
00B6|[[Pilcrow|¶]]|182|266}} • ¶
00B7|[[Interpunct|·]]|183|267}} • ·
00B8|[[Cedilla|¸]]|184|270}} • ̧
00B9|[[Unicode subscripts and superscripts|¹]]|185|271}} • ¹
00BA|[[Ordinal indicator|º]]|186|272}} • º
00BB|[[Guillemet|»]]|187|273}} • »
00BC|[[1/4 (disambiguation)|¼]]|188|274}} • ¼
00BD|[[1/2 (disambiguation)|½]]|189|275}} • ½
00BE|[[3/4 (disambiguation)|¾]]|190|276}} • ¾
00BF|[[Inverted question mark|¿]]|191|277}} • ¿
00C0|[[À]]|192|300}} • À
00C1|[[Á]]|193|301}} • Á
00C2|[[Â]]|194|302}} • Â
00C3|[[Ã]]|195|303}} • Ã
00C4|[[Ä]]|196|304}} • Ä
00C5|[[Å]]|197|305}} • Å
00C6|[[Æ]]|198|306}} • Æ
00C7|[[Ç]]|199|307}} • Ç
00C8|[[È]]|200|310}} • È
00C9|[[É]]|201|311}} • É
00CA|[[Ê]]|202|312}} • Ê
00CB|[[Ë]]|203|313}} • Ë
00CC|[[Ì]]|204|314}} • Ì
00CD|[[Í]]|205|315}} • Í
00CE|[[Î]]|206|316}} • Î
00CF|[[Ï]]|207|317}} • Ï
00D0|[[Eth|Ð]]|208|320}} • Ð
00D1|[[Ñ]]|209|321}} • Ñ
00D2|[[Ò]]|210|322}} • Ò
00D3|[[Ó]]|211|323}} • Ó
00D4|[[Ô]]|212|324}} • Ô
00D5|[[Õ]]|213|325}} • Õ
00D6|[[Ö]]|214|326}} • Ö
00D7|[[Multiplication sign|×]]|215|327}} • ×
00D8|[[Ø]]|216|330}} • Ø
00D9|[[Ù]]|217|331}} • Ù
00DA|[[Ú]]|218|332}} • Ú
00DB|[[Û]]|219|333}} • Û
00DC|[[Ü]]|220|334}} • Ü
00DD|[[Ý]]|221|335}} • Ý
00DE|[[Thorn (letter)|Þ]]|222|336}} • Þ
00DF|[[ß]]|223|337}} • ß
00E0|[[à]]|224|340}} • à
00E1|[[á]]|225|341}} • á
00E2|[[â]]|226|342}} • â
00E3|[[ã]]|227|343}} • ã
00E4|[[ä]]|228|344}} • ä
00E5|[[å]]|229|345}} • å
00E6|[[æ]]|230|346}} • æ
00E7|[[ç]]|231|347}} • ç
00E8|[[è]]|232|350}} • è
00E9|[[é]]|233|351}} • é
00EA|[[ê]]|234|352}} • ê
00EB|[[ë]]|235|353}} • ë
00EC|[[ì]]|236|354}} • ì
00ED|[[í]]|237|355}} • í
00EE|[[î]]|238|356}} • î
00EF|[[ï]]|239|357}} • ï
00F0|[[Eth|ð]]|240|360}} • ð
00F1|[[ñ]]|241|361}} • ñ
00F2|[[ò]]|242|362}} • ò
00F3|[[ó]]|243|363}} • ó
00F4|[[ô]]|244|364}} • ô
00F5|[[õ]]|245|365}} • õ
00F6|[[ö]]|246|366}} • ö
00F7|[[Obelus|÷]]|247|367}} • ÷
00F8|[[ø]]|248|370}} • ø
00F9|[[ù]]|249|371}} • ù
00FA|[[ú]]|250|372}} • ú
00FB|[[û]]|251|373}} • û
00FC|[[ü]]|252|374}} • ü
00FD|[[ý]]|253|375}} • ý
00FE|[[Thorn (letter)|þ]]|254|376}} • þ
00FF|[[ÿ]]|255|377}} • ÿ
It looks as if both tables (PLRM and Wikipedia) describe the same encoding. To check this, I took a copy of this table, cut the "vector" at the right edge and put it below the table, with a separating line of plus signs between them. From the remainder of the table I could delete everything from "]]" to "$", and then everything from "^" to "|". The remaining "[[" pairs could then be removed. I split the window in two and invoked compare-windows. Of course it choked a few times because some characters are in HTML notation, but it proved that both encoding vectors are actually the same.
So I presume the data above can be used for the Adobe Standard ISO Latin-1 encoding. (After some further editing.)
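
As a rough cross-check of the listing above, the lines can also be parsed mechanically. The following Python sketch assumes the line layout shown (four hex digits, a `[[...]]` link, the decimal code, the octal code, then `}}`). It verifies that the decimal and octal columns agree and reports the bytes whose code point differs from the byte value. The names `parse_rows` and `deviations` are hypothetical.

```python
import re

# code point | [[wiki link]] | decimal | octal }}
ROW = re.compile(r"^([0-9A-F]{4})\|\[\[(.*)\]\]\s*\|(\d+)\|([0-7]+)\}\}")

SAMPLE = [
    '0022|[[Quotation mark|"]] |34|042}} \u2022 "',
    "2019|[[Quotation mark|\u2019]]|39|047}} \u2022 \u2019",
    "0041|[[A]]|65|101}} \u2022 A",
    "005D|[[Square brackets|]]]|93|135}} \u2022 ]",
    "0131|[[\u0131]]|144|220}} \u2022 \u0131",
]


def parse_rows(lines):
    """Return (byte, code point) pairs; raise if decimal and octal disagree."""
    rows = []
    for line in lines:
        match = ROW.match(line)
        if not match:
            continue  # not a table row
        codepoint = int(match.group(1), 16)
        decimal = int(match.group(3))
        octal = int(match.group(4), 8)
        if decimal != octal:
            raise ValueError(f"decimal/octal mismatch in: {line!r}")
        rows.append((decimal, codepoint))
    return rows


def deviations(rows):
    """Bytes whose Unicode code point is not simply the byte value."""
    return [(byte, cp) for byte, cp in rows if byte != cp]


if __name__ == "__main__":
    for byte, cp in deviations(parse_rows(SAMPLE)):
        print(f"0x{byte:02X} -> U+{cp:04X}")
```

Run over the full listing, the deviations would be the positions where this encoding differs from plain ISO 8859-1.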
--
Greetings
Pete
It is so hot in some places that the people there have to live in other places.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 13:51 ` Lars Ingebrigtsen
2021-10-13 15:41 ` Eli Zaretskii
2021-10-13 21:02 ` Peter Dyballa
@ 2021-10-13 21:55 ` Peter Dyballa
2 siblings, 0 replies; 37+ messages in thread
From: Peter Dyballa @ 2021-10-13 21:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 7786
> Am 13.10.2021 um 15:51 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>
> https://unicode.org/Public/MAPPINGS/VENDORS/ADOBE/stdenc.txt
The above is just the Adobe Standard Font Encoding in terms of Unicode names and code points. Quite similar to this:
;;; -*- mode: Text; coding: utf-8; -*-
;
; Time-stamp: <2011-01-05 10:52:40 pete>
;
; Standard PostScript Glyphs (Adobe)
;
; oct dec hex UTF8
;===================================
= 40 = 32 = 20 = 20 = U+0020 : SPACE
! = 41 = 33 = 21 = 21 = U+0021 : EXCLAMATION MARK
" = 42 = 34 = 22 = 22 = U+0022 : QUOTATION MARK
# = 43 = 35 = 23 = 23 = U+0023 : NUMBER SIGN
$ = 44 = 36 = 24 = 24 = U+0024 : DOLLAR SIGN
% = 45 = 37 = 25 = 25 = U+0025 : PERCENT SIGN
& = 46 = 38 = 26 = 26 = U+0026 : AMPERSAND
’ = 47 = 39 = 27 = E28099 = U+2019 : RIGHT SINGLE QUOTATION MARK
( = 50 = 40 = 28 = 28 = U+0028 : LEFT PARENTHESIS
) = 51 = 41 = 29 = 29 = U+0029 : RIGHT PARENTHESIS
* = 52 = 42 = 2A = 2A = U+002A : ASTERISK
+ = 53 = 43 = 2B = 2B = U+002B : PLUS SIGN
, = 54 = 44 = 2C = 2C = U+002C : COMMA
- = 55 = 45 = 2D = 2D = U+002D : HYPHEN-MINUS
. = 56 = 46 = 2E = 2E = U+002E : FULL STOP
/ = 57 = 47 = 2F = 2F = U+002F : SOLIDUS
0 = 60 = 48 = 30 = 30 = U+0030 : DIGIT ZERO
1 = 61 = 49 = 31 = 31 = U+0031 : DIGIT ONE
2 = 62 = 50 = 32 = 32 = U+0032 : DIGIT TWO
3 = 63 = 51 = 33 = 33 = U+0033 : DIGIT THREE
4 = 64 = 52 = 34 = 34 = U+0034 : DIGIT FOUR
5 = 65 = 53 = 35 = 35 = U+0035 : DIGIT FIVE
6 = 66 = 54 = 36 = 36 = U+0036 : DIGIT SIX
7 = 67 = 55 = 37 = 37 = U+0037 : DIGIT SEVEN
8 = 70 = 56 = 38 = 38 = U+0038 : DIGIT EIGHT
9 = 71 = 57 = 39 = 39 = U+0039 : DIGIT NINE
: = 72 = 58 = 3A = 3A = U+003A : COLON
; = 73 = 59 = 3B = 3B = U+003B : SEMICOLON
< = 74 = 60 = 3C = 3C = U+003C : LESS-THAN SIGN
= = 75 = 61 = 3D = 3D = U+003D : EQUALS SIGN
> = 76 = 62 = 3E = 3E = U+003E : GREATER-THAN SIGN
? = 77 = 63 = 3F = 3F = U+003F : QUESTION MARK
@ = 100 = 64 = 40 = 40 = U+0040 : COMMERCIAL AT
A = 101 = 65 = 41 = 41 = U+0041 : LATIN CAPITAL LETTER A
B = 102 = 66 = 42 = 42 = U+0042 : LATIN CAPITAL LETTER B
C = 103 = 67 = 43 = 43 = U+0043 : LATIN CAPITAL LETTER C
D = 104 = 68 = 44 = 44 = U+0044 : LATIN CAPITAL LETTER D
E = 105 = 69 = 45 = 45 = U+0045 : LATIN CAPITAL LETTER E
F = 106 = 70 = 46 = 46 = U+0046 : LATIN CAPITAL LETTER F
G = 107 = 71 = 47 = 47 = U+0047 : LATIN CAPITAL LETTER G
H = 110 = 72 = 48 = 48 = U+0048 : LATIN CAPITAL LETTER H
I = 111 = 73 = 49 = 49 = U+0049 : LATIN CAPITAL LETTER I
J = 112 = 74 = 4A = 4A = U+004A : LATIN CAPITAL LETTER J
K = 113 = 75 = 4B = 4B = U+004B : LATIN CAPITAL LETTER K
L = 114 = 76 = 4C = 4C = U+004C : LATIN CAPITAL LETTER L
M = 115 = 77 = 4D = 4D = U+004D : LATIN CAPITAL LETTER M
N = 116 = 78 = 4E = 4E = U+004E : LATIN CAPITAL LETTER N
O = 117 = 79 = 4F = 4F = U+004F : LATIN CAPITAL LETTER O
P = 120 = 80 = 50 = 50 = U+0050 : LATIN CAPITAL LETTER P
Q = 121 = 81 = 51 = 51 = U+0051 : LATIN CAPITAL LETTER Q
R = 122 = 82 = 52 = 52 = U+0052 : LATIN CAPITAL LETTER R
S = 123 = 83 = 53 = 53 = U+0053 : LATIN CAPITAL LETTER S
T = 124 = 84 = 54 = 54 = U+0054 : LATIN CAPITAL LETTER T
U = 125 = 85 = 55 = 55 = U+0055 : LATIN CAPITAL LETTER U
V = 126 = 86 = 56 = 56 = U+0056 : LATIN CAPITAL LETTER V
W = 127 = 87 = 57 = 57 = U+0057 : LATIN CAPITAL LETTER W
X = 130 = 88 = 58 = 58 = U+0058 : LATIN CAPITAL LETTER X
Y = 131 = 89 = 59 = 59 = U+0059 : LATIN CAPITAL LETTER Y
Z = 132 = 90 = 5A = 5A = U+005A : LATIN CAPITAL LETTER Z
[ = 133 = 91 = 5B = 5B = U+005B : LEFT SQUARE BRACKET
\ = 134 = 92 = 5C = 5C = U+005C : REVERSE SOLIDUS
] = 135 = 93 = 5D = 5D = U+005D : RIGHT SQUARE BRACKET
^ = 136 = 94 = 5E = 5E = U+005E : CIRCUMFLEX ACCENT
_ = 137 = 95 = 5F = 5F = U+005F : LOW LINE
‘ = 140 = 96 = 60 = E28098 = U+2018 : LEFT SINGLE QUOTATION MARK
a = 141 = 97 = 61 = 61 = U+0061 : LATIN SMALL LETTER A
b = 142 = 98 = 62 = 62 = U+0062 : LATIN SMALL LETTER B
c = 143 = 99 = 63 = 63 = U+0063 : LATIN SMALL LETTER C
d = 144 = 100 = 64 = 64 = U+0064 : LATIN SMALL LETTER D
e = 145 = 101 = 65 = 65 = U+0065 : LATIN SMALL LETTER E
f = 146 = 102 = 66 = 66 = U+0066 : LATIN SMALL LETTER F
g = 147 = 103 = 67 = 67 = U+0067 : LATIN SMALL LETTER G
h = 150 = 104 = 68 = 68 = U+0068 : LATIN SMALL LETTER H
i = 151 = 105 = 69 = 69 = U+0069 : LATIN SMALL LETTER I
j = 152 = 106 = 6A = 6A = U+006A : LATIN SMALL LETTER J
k = 153 = 107 = 6B = 6B = U+006B : LATIN SMALL LETTER K
l = 154 = 108 = 6C = 6C = U+006C : LATIN SMALL LETTER L
m = 155 = 109 = 6D = 6D = U+006D : LATIN SMALL LETTER M
n = 156 = 110 = 6E = 6E = U+006E : LATIN SMALL LETTER N
o = 157 = 111 = 6F = 6F = U+006F : LATIN SMALL LETTER O
p = 160 = 112 = 70 = 70 = U+0070 : LATIN SMALL LETTER P
q = 161 = 113 = 71 = 71 = U+0071 : LATIN SMALL LETTER Q
r = 162 = 114 = 72 = 72 = U+0072 : LATIN SMALL LETTER R
s = 163 = 115 = 73 = 73 = U+0073 : LATIN SMALL LETTER S
t = 164 = 116 = 74 = 74 = U+0074 : LATIN SMALL LETTER T
u = 165 = 117 = 75 = 75 = U+0075 : LATIN SMALL LETTER U
v = 166 = 118 = 76 = 76 = U+0076 : LATIN SMALL LETTER V
w = 167 = 119 = 77 = 77 = U+0077 : LATIN SMALL LETTER W
x = 170 = 120 = 78 = 78 = U+0078 : LATIN SMALL LETTER X
y = 171 = 121 = 79 = 79 = U+0079 : LATIN SMALL LETTER Y
z = 172 = 122 = 7A = 7A = U+007A : LATIN SMALL LETTER Z
{ = 173 = 123 = 7B = 7B = U+007B : LEFT CURLY BRACKET
| = 174 = 124 = 7C = 7C = U+007C : VERTICAL LINE
} = 175 = 125 = 7D = 7D = U+007D : RIGHT CURLY BRACKET
~ = 176 = 126 = 7E = 7E = U+007E : TILDE
¡ = 241 = 161 = A1 = C2A1 = U+00A1 : INVERTED EXCLAMATION MARK
¢ = 242 = 162 = A2 = C2A2 = U+00A2 : CENT SIGN
£ = 243 = 163 = A3 = C2A3 = U+00A3 : POUND SIGN
⁄ = 244 = 164 = A4 = E28184 = U+2044 : FRACTION SLASH
¥ = 245 = 165 = A5 = C2A5 = U+00A5 : YEN SIGN
ƒ = 246 = 166 = A6 = C692 = U+0192 : LATIN SMALL LETTER F WITH HOOK
§ = 247 = 167 = A7 = C2A7 = U+00A7 : SECTION SIGN
¤ = 250 = 168 = A8 = C2A4 = U+00A4 : CURRENCY SIGN
' = 251 = 169 = A9 = 27 = U+0027 : APOSTROPHE
“ = 252 = 170 = AA = E2809C = U+201C : LEFT DOUBLE QUOTATION MARK
« = 253 = 171 = AB = C2AB = U+00AB : LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
‹ = 254 = 172 = AC = E280B9 = U+2039 : SINGLE LEFT-POINTING ANGLE QUOTATION MARK
› = 255 = 173 = AD = E280BA = U+203A : SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
ﬁ = 256 = 174 = AE = EFAC81 = U+FB01 : LATIN SMALL LIGATURE FI
ﬂ = 257 = 175 = AF = EFAC82 = U+FB02 : LATIN SMALL LIGATURE FL
– = 261 = 177 = B1 = E28093 = U+2013 : EN DASH
† = 262 = 178 = B2 = E280A0 = U+2020 : DAGGER
‡ = 263 = 179 = B3 = E280A1 = U+2021 : DOUBLE DAGGER
· = 264 = 180 = B4 = C2B7 = U+00B7 : MIDDLE DOT
¶ = 266 = 182 = B6 = C2B6 = U+00B6 : PILCROW SIGN
• = 267 = 183 = B7 = E280A2 = U+2022 : BULLET
‚ = 270 = 184 = B8 = E2809A = U+201A : SINGLE LOW-9 QUOTATION MARK
„ = 271 = 185 = B9 = E2809E = U+201E : DOUBLE LOW-9 QUOTATION MARK
” = 272 = 186 = BA = E2809D = U+201D : RIGHT DOUBLE QUOTATION MARK
» = 273 = 187 = BB = C2BB = U+00BB : RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
… = 274 = 188 = BC = E280A6 = U+2026 : HORIZONTAL ELLIPSIS
‰ = 275 = 189 = BD = E280B0 = U+2030 : PER MILLE SIGN
¿ = 277 = 191 = BF = C2BF = U+00BF : INVERTED QUESTION MARK
` = 301 = 193 = C1 = 60 = U+0060 : GRAVE ACCENT
´ = 302 = 194 = C2 = C2B4 = U+00B4 : ACUTE ACCENT
ˆ = 303 = 195 = C3 = CB86 = U+02C6 : MODIFIER LETTER CIRCUMFLEX ACCENT
˜ = 304 = 196 = C4 = CB9C = U+02DC : SMALL TILDE
¯ = 305 = 197 = C5 = C2AF = U+00AF : MACRON
˘ = 306 = 198 = C6 = CB98 = U+02D8 : BREVE
˙ = 307 = 199 = C7 = CB99 = U+02D9 : DOT ABOVE
¨ = 310 = 200 = C8 = C2A8 = U+00A8 : DIAERESIS
˚ = 312 = 202 = CA = CB9A = U+02DA : RING ABOVE
¸ = 313 = 203 = CB = C2B8 = U+00B8 : CEDILLA
˝ = 315 = 205 = CD = CB9D = U+02DD : DOUBLE ACUTE ACCENT
˛ = 316 = 206 = CE = CB9B = U+02DB : OGONEK
ˇ = 317 = 207 = CF = CB87 = U+02C7 : CARON
— = 320 = 208 = D0 = E28094 = U+2014 : EM DASH
Æ = 341 = 225 = E1 = C386 = U+00C6 : LATIN CAPITAL LETTER AE
ª = 343 = 227 = E3 = C2AA = U+00AA : FEMININE ORDINAL INDICATOR
Ł = 350 = 232 = E8 = C581 = U+0141 : LATIN CAPITAL LETTER L WITH STROKE
Ø = 351 = 233 = E9 = C398 = U+00D8 : LATIN CAPITAL LETTER O WITH STROKE
Œ = 352 = 234 = EA = C592 = U+0152 : LATIN CAPITAL LIGATURE OE
º = 353 = 235 = EB = C2BA = U+00BA : MASCULINE ORDINAL INDICATOR
æ = 361 = 241 = F1 = C3A6 = U+00E6 : LATIN SMALL LETTER AE
ı = 365 = 245 = F5 = C4B1 = U+0131 : LATIN SMALL LETTER DOTLESS I
ł = 370 = 248 = F8 = C582 = U+0142 : LATIN SMALL LETTER L WITH STROKE
ø = 371 = 249 = F9 = C3B8 = U+00F8 : LATIN SMALL LETTER O WITH STROKE
œ = 372 = 250 = FA = C593 = U+0153 : LATIN SMALL LIGATURE OE
ß = 373 = 251 = FB = C39F = U+00DF : LATIN SMALL LETTER SHARP S
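To illustrate how rows like these could be applied, here is a minimal Python sketch that decodes bytes through a handful of the mappings listed above. The names `ADOBE_STANDARD_SUBSET` and `decode_subset` are hypothetical, and only a few rows are copied; printable ASCII bytes not in the dictionary pass through unchanged, which is a simplification (rows outside this excerpt may remap some of them).

```python
# Partial byte -> Unicode map, copied from a few rows of the table above.
# This is an illustrative subset, not the complete encoding.
ADOBE_STANDARD_SUBSET = {
    0x60: "\u2018",  # LEFT SINGLE QUOTATION MARK (not GRAVE ACCENT)
    0xA4: "\u2044",  # FRACTION SLASH
    0xA8: "\u00A4",  # CURRENCY SIGN
    0xA9: "\u0027",  # APOSTROPHE
    0xAE: "\uFB01",  # LATIN SMALL LIGATURE FI
    0xC1: "\u0060",  # GRAVE ACCENT
    0xE1: "\u00C6",  # LATIN CAPITAL LETTER AE
    0xFB: "\u00DF",  # LATIN SMALL LETTER SHARP S
}


def decode_subset(data: bytes) -> str:
    """Decode bytes using the subset above, with an ASCII fallback."""
    out = []
    for b in data:
        if b in ADOBE_STANDARD_SUBSET:
            out.append(ADOBE_STANDARD_SUBSET[b])
        elif 0x20 <= b < 0x7F or b in (0x09, 0x0A, 0x0D):
            # Simplification: unlisted printable ASCII passes through.
            out.append(chr(b))
        else:
            # Unknown byte: mark it rather than guess.
            out.append("\uFFFD")
    return "".join(out)
```

With this sketch, byte 0x60 comes out as U+2018 rather than U+0060, which is the difference the original report is about.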
--
Greetings
Pete
Encryption, n.:
A powerful algorithmic encoding technique employed in the creation of computer manuals.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-13 21:02 ` Peter Dyballa
@ 2021-10-14 6:42 ` Eli Zaretskii
2021-10-15 12:47 ` Lars Ingebrigtsen
0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-14 6:42 UTC (permalink / raw)
To: Peter Dyballa; +Cc: larsi, 7786
> From: Peter Dyballa <Peter_Dyballa@Freenet.DE>
> Date: Wed, 13 Oct 2021 23:02:29 +0200
> Cc: 7786@debbugs.gnu.org
>
> Maybe this leads to an Adobe ISO Latin-1 encoding for GNU Emacs…
>
> I copied off PLRM the encoding from page 605 and pasted into *scratch* buffer. In rectangular editing mode I reconstructed this table:
This seems to be the same as the CP1277.map file I threw together and
posted here yesterday. So I think it is ready to be used, and we
should just define the additional coding-system using it.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-14 6:42 ` Eli Zaretskii
@ 2021-10-15 12:47 ` Lars Ingebrigtsen
2021-10-15 15:59 ` Peter Dyballa
0 siblings, 1 reply; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-15 12:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Peter Dyballa, 7786
Eli Zaretskii <eliz@gnu.org> writes:
> This seems to be the same as the CP1277.map file I threw together and
> posted here yesterday. So I think it is ready to be used, and we
> should just define the additional coding-system using it.
Right. But it might be nice to have some real-world PS files to test
with.
Peter, do you have any (preferably smallish -- I mean, not
multi-gigabyte) PostScript files that use this encoding? Two or three
would be cool.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-15 12:47 ` Lars Ingebrigtsen
@ 2021-10-15 15:59 ` Peter Dyballa
2021-10-18 7:09 ` Lars Ingebrigtsen
0 siblings, 1 reply; 37+ messages in thread
From: Peter Dyballa @ 2021-10-15 15:59 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 7786
[-- Attachment #1: Type: text/plain, Size: 1579 bytes --]
> Peter, do you have any (preferably smallish -- I mean, not
> multi-gigabyte) PostScript files that use this encoding? Two or three
> would be cool.
I have no real test files at hand. What I still have is a set of files with ISO 8859-X encodings. I once used a2ps to create PS files from them. These sorts are in the tar file. a2ps changed the real characters into their octal representations for portability. I took one such PS file, from ISO Latin-1 encoding, and added to these octal codes the real characters, taken off the encoding TXT file. PS-Test-1.ps displays in X11 with Ghostscript 9.54.0 OK. I can see "character MINUS character" at the left, followed by their description/explanation.
You could use any text file and convert it into PostScript. It should not matter whether you use a2ps or enscript or something else. The produced PS output file should be in ISOLatin1Encoding, presumably using octal representations for 8-bit characters. You might take one such file and convert it to PDF. You could take the same file, change it, and save it under a new name in ISOLatin1Encoding. Convert it to PDF. Change the new file in ISOLatin1Encoding, undo the previous edit, and save it as a newer file in ISO Latin-1 (or -15) text encoding. Convert this PS file to PDF too. Are there differences visible in the PDF output?
Could be this is a way to test the ISOLatin1Encoding encoding.
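For anyone comparing such files by hand, the octal representations mentioned above can be turned back into raw bytes before deciding how to decode them. The following Python sketch is only an illustration: `unescape_octal` is a hypothetical helper, it handles only `\ddd` octal escapes, and it ignores the other PostScript string escapes (such as `\\`, `\(` and `\n`), so an escaped backslash followed by digits would be misread.

```python
import re

# One to three octal digits after a backslash, as used in PostScript strings.
_OCTAL = re.compile(rb"\\([0-7]{1,3})")


def unescape_octal(ps_string: bytes) -> bytes:
    """Replace \\ddd octal escapes with the single byte they denote."""
    return _OCTAL.sub(
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),  # keep the low 8 bits
        ps_string,
    )
```

For example, `unescape_octal(rb"Gr\374\337e")` yields the bytes `47 72 FC DF 65`; whether those are then read as ISO 8859-1 or through some other table is exactly the question discussed in this thread.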
--
Mit friedvollen Grüßen
Pete
To most people solutions mean finding the answers. But to chemists solutions
are things that are still all mixed up.
[-- Attachment #2: ISO-Latin-encodings.tar.xz --]
[-- Type: application/x-xz, Size: 19696 bytes --]
[-- Attachment #3: PS-Test-1.ps --]
[-- Type: application/postscript, Size: 23146 bytes --]
* bug#7786: 23.2; Encoding of PostScript files
2021-10-15 15:59 ` Peter Dyballa
@ 2021-10-18 7:09 ` Lars Ingebrigtsen
2021-10-18 12:25 ` Eli Zaretskii
0 siblings, 1 reply; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-18 7:09 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
> You could use any text file and convert it into PostScript. It should
> not matter whether you use a2ps or enscript or something else.
Well, the problem is that I can't find anything that actually generates
codes that match the Wikipedia listing.
With a text file like this:
This is a sentence with `foo'.
a2ps gives me
(This is a sentence with `foo'.) p n
Note that the ` is 0x60, not 0x2018, like Wikipedia says it should be.
Perhaps the reason no software out there actually supports this encoding
is that it's not actually used in nature.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-18 7:09 ` Lars Ingebrigtsen
@ 2021-10-18 12:25 ` Eli Zaretskii
2021-10-18 13:17 ` Lars Ingebrigtsen
2021-10-18 15:51 ` Peter Dyballa
0 siblings, 2 replies; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-18 12:25 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Peter_Dyballa, 7786
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, 7786@debbugs.gnu.org
> Date: Mon, 18 Oct 2021 09:09:28 +0200
>
> Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
>
> > You could use any text file and convert it into PostScript. It should
> > not matter whether you use a2ps or enscript or something else.
>
> Well, the problem is that I can't find anything that actually generates
> codes that match the Wikipedia listing.
>
> With a text file like this:
>
> This is a sentence with `foo'.
>
> a2ps gives me
>
> (This is a sentence with `foo'.) p n
>
> Note that the ` is 0x60, not 0x2018, like Wikipedia says it should be.
And what a2ps produces prints correctly on a PS printer?
* bug#7786: 23.2; Encoding of PostScript files
2021-10-18 12:25 ` Eli Zaretskii
@ 2021-10-18 13:17 ` Lars Ingebrigtsen
2021-10-18 15:51 ` Peter Dyballa
1 sibling, 0 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-18 13:17 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Peter_Dyballa, 7786
[-- Attachment #1: Type: text/plain, Size: 148 bytes --]
Eli Zaretskii <eliz@gnu.org> writes:
> And what a2ps produces prints correctly on a PS printer?
ghostview displays the file perfectly, at least:
[-- Attachment #2: Type: image/png, Size: 1856 bytes --]
[-- Attachment #3: Type: text/plain, Size: 140 bytes --]
I don't have a PS printer, though.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-18 12:25 ` Eli Zaretskii
2021-10-18 13:17 ` Lars Ingebrigtsen
@ 2021-10-18 15:51 ` Peter Dyballa
2021-10-18 16:00 ` Eli Zaretskii
1 sibling, 1 reply; 37+ messages in thread
From: Peter Dyballa @ 2021-10-18 15:51 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, 7786
> Am 18.10.2021 um 14:25 schrieb Eli Zaretskii <eliz@gnu.org>:
>
> And what a2ps produces prints correctly on a PS printer?
Yes. The output was correct on Epson EPL-5800 PS (PostScript 3) and HP LaserJet 2100 TN.
I was not aware of an error 0x60 vs. 0x2018, i.e. ` vs. ‘. Which file is faulty? (Grep does not show a reasonable result, finds only comments.)
--
Greetings
Pete
Real Time, adj.:
Here and now, as opposed to fake time, which only occurs there and then.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-18 15:51 ` Peter Dyballa
@ 2021-10-18 16:00 ` Eli Zaretskii
2021-10-19 5:49 ` Peter Dyballa
0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-18 16:00 UTC (permalink / raw)
To: Peter Dyballa; +Cc: larsi, 7786
> From: Peter Dyballa <Peter_Dyballa@Freenet.DE>
> Date: Mon, 18 Oct 2021 17:51:47 +0200
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,
> 7786@debbugs.gnu.org
>
>
> > Am 18.10.2021 um 14:25 schrieb Eli Zaretskii <eliz@gnu.org>:
> >
> > And what a2ps produces prints correctly on a PS printer?
>
> Yes. The output was correct on Epson EPL-5800 PS (PostScript 3) and HP LaserJet 2100 TN.
Then maybe we should simply use Latin-1. AFAIR, that's what
ps-mule.el is doing.
> I was not aware of an error 0x60 vs. 0x2018, i.e. ` vs. ‘. Which file is faulty? (Grep does not show a reasonable result, finds only comments.)
It's the only difference between Latin-1 and that special encoding of
PS files, according to Wikipedia.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-18 16:00 ` Eli Zaretskii
@ 2021-10-19 5:49 ` Peter Dyballa
2021-10-19 11:59 ` Eli Zaretskii
0 siblings, 1 reply; 37+ messages in thread
From: Peter Dyballa @ 2021-10-19 5:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: larsi, 7786
PostScript is (was?) meant to produce good looking text print-outs. Therefore it uses typographical quotes instead of those from ASCII. This design decision should be honoured.
--
Greetings
Pete
"If I can't dance to it, it's not my revolution.“
– A t-shirt designed by Jack Frager
* bug#7786: 23.2; Encoding of PostScript files
2021-10-19 5:49 ` Peter Dyballa
@ 2021-10-19 11:59 ` Eli Zaretskii
2021-10-19 13:47 ` Lars Ingebrigtsen
0 siblings, 1 reply; 37+ messages in thread
From: Eli Zaretskii @ 2021-10-19 11:59 UTC (permalink / raw)
To: Peter Dyballa; +Cc: larsi, 7786
> From: Peter Dyballa <Peter_Dyballa@Freenet.DE>
> Date: Tue, 19 Oct 2021 07:49:34 +0200
> Cc: larsi@gnus.org,
> 7786@debbugs.gnu.org
>
> PostScript is (was?) meant to produce good looking text print-outs. Therefore it uses typographical quotes instead of those from ASCII. This design decision should be honoured.
So why a2ps doesn't?
* bug#7786: 23.2; Encoding of PostScript files
2021-10-19 11:59 ` Eli Zaretskii
@ 2021-10-19 13:47 ` Lars Ingebrigtsen
2021-10-20 5:39 ` Peter Dyballa
0 siblings, 1 reply; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-19 13:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Peter Dyballa, 7786
Eli Zaretskii <eliz@gnu.org> writes:
>> PostScript is (was?) meant to produce good looking text
>> print-outs. Therefore it uses typographical quotes instead of those
>> from ASCII. This design decision should be honoured.
My screenshot showed that the actual output used proper typographical
quotes, but that may be down to the font used.
> So why a2ps doesn't?
I think the conclusion here is that we shouldn't do anything. Adobe
created two encodings -- the "standard" one (which is ASCII with some
alterations), and the ISOLatin1Encoding (which is 8859-1 with some
alterations). But we can't really detect these simply: a2ps, for
instance, uses 8859-1 instead,
%%BeginResource: encoding ISO-8859-1Encoding
enscript does the same, but in a different way:
%%BeginResource: procset Enscript-Encoding-88591 1.6.5 90
None of the .ps files I can find on this laptop uses ISOLatin1Encoding
(or the "standard encoding"), as far as I can see.
So 1) these encodings went out of fashion decades ago and, 2) even if we
wanted to support them, Emacs can't auto-detect when they're used.
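To make the detection problem concrete, here is a rough Python sketch of the kind of heuristic one might try. It looks for the `%%BeginResource: encoding` comment shown above and for the built-in names `ISOLatin1Encoding` and `StandardEncoding` as whole tokens. The function name `encoding_hints` is hypothetical, and the result is only a hint: as the enscript example shows, generators can declare their encoding in ways this does not recognize.

```python
import re

# DSC comment of the form emitted by a2ps in the example above.
_RESOURCE = re.compile(rb"^%%BeginResource:\s*encoding\s+(\S+)", re.MULTILINE)

# Built-in PostScript encoding vector names, matched as whole tokens.
_BUILTIN = re.compile(
    rb"(?<![A-Za-z0-9])(ISOLatin1Encoding|StandardEncoding)(?![A-Za-z0-9])"
)


def encoding_hints(ps_source: bytes) -> list:
    """Return encoding names mentioned in a PostScript file (heuristic)."""
    hints = [
        m.group(1).decode("ascii", "replace")
        for m in _RESOURCE.finditer(ps_source)
    ]
    # Deduplicate the built-in names and keep the order stable.
    hints += sorted(
        {m.group(1).decode("ascii") for m in _BUILTIN.finditer(ps_source)}
    )
    return hints
```

An empty result does not mean the file uses no special encoding; it only means neither pattern was found, which is the limitation described above.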
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-19 13:47 ` Lars Ingebrigtsen
@ 2021-10-20 5:39 ` Peter Dyballa
2021-10-20 5:45 ` Lars Ingebrigtsen
0 siblings, 1 reply; 37+ messages in thread
From: Peter Dyballa @ 2021-10-20 5:39 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 7786
> Am 19.10.2021 um 15:47 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>
> So 1) these encodings went out of fashion decades ago and, 2) even if we
> wanted to support them, Emacs can't auto-detect when they're used.
Can't there be a default binding of ISOLatin1Encoding to files with extension .ps or that are otherwise found or set to be PostScript files?
--
Greetings
Pete
A census taker is a man who goes from house to house increasing the population.
* bug#7786: 23.2; Encoding of PostScript files
2021-10-20 5:39 ` Peter Dyballa
@ 2021-10-20 5:45 ` Lars Ingebrigtsen
2021-10-20 6:18 ` Lars Ingebrigtsen
2021-10-20 16:34 ` Peter Dyballa
0 siblings, 2 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-20 5:45 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Peter Dyballa <Peter_Dyballa@Freenet.DE> writes:
> Can't there be a default binding of ISOLatin1Encoding to files with
> extension .ps or that are otherwise found or set to be PostScript
> files?
None of the .ps files we've found have been in ISOLatin1Encoding, so I'm
not sure I understand what you mean?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-20 5:45 ` Lars Ingebrigtsen
@ 2021-10-20 6:18 ` Lars Ingebrigtsen
2021-10-20 16:34 ` Peter Dyballa
1 sibling, 0 replies; 37+ messages in thread
From: Lars Ingebrigtsen @ 2021-10-20 6:18 UTC (permalink / raw)
To: Peter Dyballa; +Cc: 7786
Lars Ingebrigtsen <larsi@gnus.org> writes:
> None of the .ps files we've found have been in ISOLatin1Encoding, so I'm
> not sure I understand what you mean?
Er. My analysis of these .ps files is wrong -- ` is indeed interpreted
as quoteright instead of grave.
/ISO-8859-1Encoding [
[...]
/space /exclam /quotedbl /numbersign /dollar /percent /ampersand /quoteright
Which is what this bug report was originally about. However, none of
them adhere to the encoding found on the Wikipedia page (which claims to
document ISOLatin1Encoding) in the 0x9x area:
/x /y /z /braceleft /bar /braceright /asciitilde /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
/space /exclamdown /cent /sterling /currency /yen /brokenbar /section
Like iso-8859-1, the 0x9x area is blank instead of having dotless i and
all the diacritics.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#7786: 23.2; Encoding of PostScript files
2021-10-20 5:45 ` Lars Ingebrigtsen
2021-10-20 6:18 ` Lars Ingebrigtsen
@ 2021-10-20 16:34 ` Peter Dyballa
1 sibling, 0 replies; 37+ messages in thread
From: Peter Dyballa @ 2021-10-20 16:34 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 7786
> Am 20.10.2021 um 07:45 schrieb Lars Ingebrigtsen <larsi@gnus.org>:
>
>> Can't there be a default binding of ISOLatin1Encoding to files with
>> extension .ps or that are otherwise found or set to be PostScript
>> files?
>
> None of the .ps files we've found have been in ISOLatin1Encoding, so I'm
> not sure I understand what you mean?
Isn't this the default encoding of a PostScript (text) file using standard encoded fonts?
The situation can be different once you re-encode the font, then the PS file's text encoding has to follow. If the font is re-encoded in ISO Latin-1 then the PS code has to use it too. Same for every other 8-bit font re-encoding (I have no idea how it works with CJK). The characters that are allowed to be used as PostScript code are taken from US-ASCII. So the code is independent from the text encoding. Care has to be taken when texts in an 8-bit encoding should be output, or printed, usually enclosed in parentheses. The text encoding has to match the font encoding, or the font encoding (of the user-defined font) has to be prepared for the text encoding to be used below.
--
Greetings
Pete
The box said "Use Windows 95 or better," so I got a Macintosh.
end of thread, other threads:[~2021-10-20 16:34 UTC | newest]
Thread overview: 37+ messages
2011-01-05 0:18 bug#7786: 23.2; Encoding of PostScript files Peter Dyballa
2021-01-20 18:02 ` Lars Ingebrigtsen
2021-06-02 8:39 ` Lars Ingebrigtsen
2021-06-02 16:37 ` Peter Dyballa
2021-10-13 12:49 ` Lars Ingebrigtsen
2021-10-13 13:12 ` Lars Ingebrigtsen
2021-10-13 13:51 ` Lars Ingebrigtsen
2021-10-13 15:41 ` Eli Zaretskii
2021-10-13 16:05 ` Lars Ingebrigtsen
2021-10-13 16:18 ` Eli Zaretskii
2021-10-13 16:20 ` Lars Ingebrigtsen
2021-10-13 16:23 ` Peter Dyballa
2021-10-13 16:28 ` Lars Ingebrigtsen
2021-10-13 16:43 ` Peter Dyballa
2021-10-13 16:45 ` Eli Zaretskii
2021-10-13 17:35 ` Peter Dyballa
2021-10-13 16:43 ` Eli Zaretskii
2021-10-13 18:55 ` Lars Ingebrigtsen
2021-10-13 19:05 ` Eli Zaretskii
2021-10-13 19:07 ` Peter Dyballa
2021-10-13 21:02 ` Peter Dyballa
2021-10-14 6:42 ` Eli Zaretskii
2021-10-15 12:47 ` Lars Ingebrigtsen
2021-10-15 15:59 ` Peter Dyballa
2021-10-18 7:09 ` Lars Ingebrigtsen
2021-10-18 12:25 ` Eli Zaretskii
2021-10-18 13:17 ` Lars Ingebrigtsen
2021-10-18 15:51 ` Peter Dyballa
2021-10-18 16:00 ` Eli Zaretskii
2021-10-19 5:49 ` Peter Dyballa
2021-10-19 11:59 ` Eli Zaretskii
2021-10-19 13:47 ` Lars Ingebrigtsen
2021-10-20 5:39 ` Peter Dyballa
2021-10-20 5:45 ` Lars Ingebrigtsen
2021-10-20 6:18 ` Lars Ingebrigtsen
2021-10-20 16:34 ` Peter Dyballa
2021-10-13 21:55 ` Peter Dyballa