* bug#57072: [BUG] update-glyphless-char-display and variation selectors
@ 2022-08-09 8:38 Axel Svensson
2022-08-09 8:40 ` bug#57073: " Axel Svensson
2022-08-09 11:36 ` bug#57072: " Eli Zaretskii
0 siblings, 2 replies; 26+ messages in thread
From: Axel Svensson @ 2022-08-09 8:38 UTC (permalink / raw)
To: 57072
[-- Attachment #1: Type: text/plain, Size: 758 bytes --]
IIUC the documentation of glyphless-char-display-control, any of the
character groups can be assigned any of the display methods.
First bug + patch:
Using update-glyphless-char-display to choose to display
variation-selectors as acronyms does not work since these codepoints are
missing from char-acronym-table.
The attached patch attempts to fix this and succeeds for U+FE00..U+FE0E but
fails for U+FE0F.
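One way to check whether a codepoint currently has an acronym, as a minimal sketch. The helper name `my-char-acronym' is hypothetical and not part of Emacs; `char-acronym-table' is the table named above.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-char-acronym (char)
  "Return the acronym string registered for CHAR, or nil if there is none."
  (aref char-acronym-table char))

;; Look up U+FE00 VARIATION SELECTOR-1.  A nil result means that no
;; acronym is registered, so the `acronym' display method has nothing
;; to show for this codepoint.
(my-char-acronym #xFE00)
```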
Second bug:
It seems that U+FE0F will not at all respect glyphless-char-display,
instead always showing as an empty box.
This I have not solved.
Version: GNU Emacs 28.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version
3.24.24, cairo version 1.16.0) of 2022-07-21
Built from source, commit 5a223c7f2ef4c31abbd46367b6ea83cd19d30aa7
Regards,
Axel Svensson
^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#57073: [BUG] update-glyphless-char-display and variation selectors
2022-08-09 8:38 bug#57072: [BUG] update-glyphless-char-display and variation selectors Axel Svensson
@ 2022-08-09 8:40 ` Axel Svensson
2022-08-09 11:36 ` bug#57072: " Eli Zaretskii
1 sibling, 0 replies; 26+ messages in thread
From: Axel Svensson @ 2022-08-09 8:40 UTC (permalink / raw)
To: 57073
[-- Attachment #1: Type: text/plain, Size: 202 bytes --]
On Tue, Aug 9, 2022 at 10:38 AM Axel Svensson <svenssonaxel@gmail.com> wrote:
> The attached patch attempts to fix this and succeeds for U+FE00..U+FE0E but fails for U+FE0F.
Sorry, patch attached now.
[-- Attachment #2: 0001-Add-variation-selectors-to-char-acronym-table.patch --]
[-- Type: text/x-patch, Size: 1209 bytes --]
From d7d8cb6c0111223aa2492db5248818af2e789a1f Mon Sep 17 00:00:00 2001
From: Axel Svensson <mail@axelsvensson.com>
Date: Tue, 9 Aug 2022 01:11:02 +0200
Subject: [PATCH] Add variation selectors to `char-acronym-table'
---
lisp/international/characters.el | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index ca28222c81..616480769d 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -1525,6 +1525,15 @@ Setup `char-width-table' appropriate for non-CJK language environment."
(aset char-acronym-table (+ #xE0021 i) (format " %c TAG" (+ 33 i))))
(aset char-acronym-table #xE007F "->|TAG") ; CANCEL TAG
+(let ((vs-acronyms
+ '("VS 1" "VS 2" "VS 3" "VS 4"
+ "VS 5" "VS 6" "VS 7" "VS 8"
+ "VS 9" "VS 10" "VS 11" "VS 12"
+ "VS 13" "VS 14" "VS 15" "VS 16")))
+ (dotimes (i 16)
+ (aset char-acronym-table (+ #xfe00 i) (car vs-acronyms))
+ (setq vs-acronyms (cdr vs-acronyms))))
+
;; We can't use the \N{name} things here, because this file is used
;; too early in the build process.
(defvar bidi-control-characters
--
2.30.2
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-09 8:38 bug#57072: [BUG] update-glyphless-char-display and variation selectors Axel Svensson
2022-08-09 8:40 ` bug#57073: " Axel Svensson
@ 2022-08-09 11:36 ` Eli Zaretskii
2022-08-09 14:56 ` Axel Svensson
1 sibling, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-09 11:36 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
merge 57073 57072
thanks
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 10:38:30 +0200
>
> IIUC the documentation to glyphless-char-display-control, any of the character groups can be assigned any
> of the display methods.
But not every glyphless character has an acronym, so this is not a
bug. You are suggesting an enhancement (which is fine).
> First bug + patch:
> Using update-glyphless-char-display to choose to display variation-selectors as acronyms does not work
> since these codepoints are missing from char-acronym-table.
> The attached patch attempts to fix this and succeeds for U+FE00..U+FE0E but fails for U+FE0F.
Why are the acronyms you propose so long? Why not use "VS01".."VS16"
instead? Shorter acronyms are an advantage, since they will be
displayed in a more legible way.
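A minimal sketch of the shorter form, assuming a zero-padded two-digit number. The helper name `my-vs-acronym' is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs: build a zero-padded acronym
;; for variation selector number N (counting from 1).
(defun my-vs-acronym (n)
  "Return a short acronym such as \"VS01\" for variation selector N."
  (format "VS%02d" n))

;; Collect the acronyms for VS1 through VS16 into a list.
(mapcar #'my-vs-acronym (number-sequence 1 16))
```

Every string produced this way for 1..16 is exactly four characters long.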
> Second bug:
> It seems that U+FE0F will not at all respect glyphless-char-display, instead always showing as an empty
> box.
> This I have not solved.
Please show a recipe for that starting from "emacs -Q".
Thanks.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-09 11:36 ` bug#57072: " Eli Zaretskii
@ 2022-08-09 14:56 ` Axel Svensson
2022-08-09 16:23 ` Eli Zaretskii
2022-08-11 14:01 ` Eli Zaretskii
0 siblings, 2 replies; 26+ messages in thread
From: Axel Svensson @ 2022-08-09 14:56 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072
[-- Attachment #1.1: Type: text/plain, Size: 1763 bytes --]
> You are suggesting an enhancement (which is fine).
Acknowledged.
See new patch attached.
It turns out there are 256 variation selectors, so I've included some fixes
for selectors 17-256 as well.
admin/unidata/blocks.awk is an exception; it seems to deal with only VS
1-16, but I have not fixed it.
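The numbering can be sketched as follows, assuming the two ranges used in the attached patch (U+FE00..U+FE0F for VS 1-16 and U+E0100..U+E01EF for VS 17-256). The helper name `my-vs-codepoint' is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-vs-codepoint (n)
  "Return the codepoint of variation selector number N (1..256)."
  (cond ((<= 1 n 16) (+ #xFE00 (1- n)))        ; VS1..VS16
        ((<= 17 n 256) (+ #xE0100 (- n 17)))   ; VS17..VS256
        (t (error "No such variation selector: %d" n))))

;; First and last codepoint of each range, as "U+XXXX" strings.
(mapcar (lambda (n) (format "U+%04X" (my-vs-codepoint n)))
        '(1 16 17 256))
```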
> Why are the acronyms you propose so long? Why not use "VS01".."VS16"
You're right, that is better. The attached patch is fixed to have shorter
acronyms.
The acronyms I've chosen are "VS-1" through "VS-9", "VS10" through "VS99"
and "VS-100" through "VS-256".
Not sure that's optimal, perhaps "VS01" or "VS 1" is better, what do you
think?
> Please show a recipe for that starting from "emacs -Q".
To reproduce:
1) Start emacs -Q under X11.
2) Evaluate:
(progn
  (let ((vs-acronyms
         '("VS01" "VS02" "VS03" "VS04"
           "VS05" "VS06" "VS07" "VS08"
           "VS09" "VS10" "VS11" "VS12"
           "VS13" "VS14" "VS15" "VS16")))
    (dotimes (i 16)
      (aset char-acronym-table (+ #xfe00 i) (car vs-acronyms))
      (setq vs-acronyms (cdr vs-acronyms))))
  (update-glyphless-char-display
   'glyphless-char-display-control
   '((format-control . acronym)
     (variation-selectors . acronym)
     (no-font . hex-code)))
  (insert #xfe00 #xfe01 #xfe0e #xfe0f))
Expected:
Four boxes are shown, all of which contain "VS" in the upper half, and in
the lower half "01", "02", "15" and "16" respectively.
Actual:
The first three boxes appear as expected, but the fourth is empty.
Throughout the codebase, I see U+FE0F sometimes singled out and treated
differently from the other variation selectors, so this isn't entirely
strange. Places where this happens include:
- admin/unidata/emoji-data.txt:778
- admin/unidata/emoji-zwj.awk:102
- lisp/composite.el:856
[-- Attachment #2: 0001-Fixes-for-variation-selectors.patch --]
[-- Type: text/x-patch, Size: 3531 bytes --]
From a4ec9eae3de84bf9501c0d3f97ccade600716634 Mon Sep 17 00:00:00 2001
From: Axel Svensson <mail@axelsvensson.com>
Date: Tue, 9 Aug 2022 01:11:02 +0200
Subject: [PATCH] Fixes for variation selectors
---
doc/lispref/display.texi | 6 +++---
lisp/international/characters.el | 24 ++++++++++++++++++------
2 files changed, 21 insertions(+), 9 deletions(-)
diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index ace67fbedb..96079dc106 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -8596,9 +8596,9 @@ Glyphless Chars
images, such as U+00AD @sc{soft hyphen}.
@item variation-selectors
-Unicode VS-1 through VS-16 (U+FE00 through U+FE0F), which are used to
-select between different glyphs for the same codepoints (typically
-emojis).
+Unicode VS-1 through VS-256 (U+FE00 through U+FE0F and U+E0100 through
+U+E01EF), which are used to select between different glyphs for the same
+codepoints (typically emojis).
@item no-font
Characters for which there is no suitable font, or which cannot be
diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index ca28222c81..78f8447208 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -1243,7 +1243,8 @@ ?L
(#x1E026 . #x1E02A)
(#x1E8D0 . #x1E8D6)
(#x1E944 . #x1E94A)
- (#xE0001 . #xE01EF))))
+ (#xE0001 . #xE01EF)
+ (#xE0100 . #xE01EF))))
(dolist (elt l)
(set-char-table-range char-width-table elt 0)))
@@ -1525,6 +1526,15 @@ char-acronym-table
(aset char-acronym-table (+ #xE0021 i) (format " %c TAG" (+ 33 i))))
(aset char-acronym-table #xE007F "->|TAG") ; CANCEL TAG
+(dotimes (i 256)
+ (let* ((vs-number (1+ i))
+ (codepoint (if (< i 16)
+ (+ #xfe00 i)
+ (+ #xe0100 i -16)))
+ (dash (if (<= 10 vs-number 99) "" "-")))
+ (aset char-acronym-table codepoint
+ (format "VS%s%s" dash vs-number))))
+
;; We can't use the \N{name} things here, because this file is used
;; too early in the build process.
(defvar bidi-control-characters
@@ -1574,7 +1584,9 @@ update-glyphless-char-display
#x80 #x9F method))
((eq target 'variation-selectors)
(glyphless-set-char-table-range glyphless-char-display
- #xFE00 #xFE0F method))
+ #xFE00 #xFE0F method)
+ (glyphless-set-char-table-range glyphless-char-display
+ #xE0100 #xE01EF method))
((or (eq target 'format-control)
(eq target 'bidi-control))
(when unicode-category-table
@@ -1647,10 +1659,10 @@ glyphless-char-display-control
that are relevant for bidirectional formatting control,
like U+2069 (PDI) and U+202B (RLE).
`variation-selectors':
- Characters in the range U+FE00..U+FE0F, used for
- selecting alternate glyph presentations, such as
- Emoji vs Text presentation, of the preceding
- character(s).
+ Characters in the range U+FE00..U+FE0F and
+ U+E0100..U+E01EF, used for selecting alternate glyph
+ presentations, such as Emoji vs Text presentation, of
+ the preceding character(s).
`no-font': For GUI frames, characters for which no suitable
font is found; for text-mode frames, characters
that cannot be encoded by `terminal-coding-system'.
--
2.30.2
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-09 14:56 ` Axel Svensson
@ 2022-08-09 16:23 ` Eli Zaretskii
2022-08-09 20:33 ` Axel Svensson
2022-08-11 14:01 ` Eli Zaretskii
1 sibling, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-09 16:23 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 16:56:37 +0200
> Cc: 57072@debbugs.gnu.org
>
> See new patch attached.
Thanks, I will review it soon.
> To reproduce:
> 1) Start emacs -Q under X11.
> 2) Evaluate:
>
> (progn
> (let ((vs-acronyms
> '("VS01" "VS02" "VS03" "VS04"
> "VS05" "VS06" "VS07" "VS08"
> "VS09" "VS10" "VS11" "VS12"
> "VS13" "VS14" "VS15" "VS16")))
> (dotimes (i 16)
> (aset char-acronym-table (+ #xfe00 i) (car vs-acronyms))
> (setq vs-acronyms (cdr vs-acronyms))))
> (update-glyphless-char-display
> 'glyphless-char-display-control
> '((format-control . acronym)
> (variation-selectors . acronym)
> (no-font . hex-code)))
> (insert #xfe00 #xfe01 #xfe0e #xfe0f))
>
> Expected:
> Four boxes are shown, all of which contain "VS" in the upper half, and in the lower half "01", "02", "15" and
> "16" respectively.
>
> Actual:
> The first three boxes appear as expected, but the fourth is empty.
>
> Throughout the codebase, I see U+FE0F sometimes singled out and treated differently from the other variation
> selectors, so this isn't entirely strange. Places where this happens include:
> - admin/unidata/emoji-data.txt:778
> - admin/unidata/emoji-zwj.awk:102
> - lisp/composite.el:856
This character (as any other character) will only be displayed using
the glyphless-char-display setup if it is shown as a separate
character. If it is composed with other surrounding characters, it
will be shown as the font tells us to show that sequence, and in that
case Emacs doesn't consult glyphless-char-display at all.
Now, VS16 is almost always composed with preceding characters, so I
think you can only see it as acronym if you deliberately force Emacs
not to compose it, e.g. by preceding it with U+20DD COMBINING
ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
NON-JOINER, or disable auto-composition-mode.
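A minimal sketch of the last option, assuming the acronym setup from the recipe quoted above is already in effect. The buffer name is arbitrary and chosen only for illustration.

```elisp
(with-current-buffer (get-buffer-create "*vs16-test*")
  ;; Turn off automatic character composition in this buffer only.
  (auto-composition-mode -1)
  ;; Insert U+FE0F on its own.  With composition disabled, the display
  ;; code should fall back to `glyphless-char-display' for it.
  (insert #xFE0F))
```

Switch to the buffer (for example with `C-x b') to look at the result.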
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-09 16:23 ` Eli Zaretskii
@ 2022-08-09 20:33 ` Axel Svensson
2022-08-10 13:10 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-09 20:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072
> Now, VS16 is almost always composed with preceding characters, so I
> think you can only see it as acronym if you deliberately force Emacs
> not to compose it, e.g. by preceding it with U+20DD COMBINING
> ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
> NON-JOINER, or disable auto-composition-mode.
- Preceding it with U+20DD still produces the empty box
- Preceding it and following it by U+200C still produces the empty box
- Disabling auto-composition-mode produces the "VS16" acronym.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-09 20:33 ` Axel Svensson
@ 2022-08-10 13:10 ` Eli Zaretskii
2022-08-16 11:55 ` Robert Pluim
0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-10 13:10 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 22:33:40 +0200
> Cc: 57072@debbugs.gnu.org
>
> > Now, VS16 is almost always composed with preceding characters, so I
> > think you can only see it as acronym if you deliberately force Emacs
> > not to compose it, e.g. by preceding it with U+20DD COMBINING
> > ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
> > NON-JOINER, or disable auto-composition-mode.
>
> - Preceding it with U+20DD still produces the empty box
> - Preceding it and following it by U+200C still produces the empty box
> - Disabling auto-composition-mode produces the "VS16" acronym.
Yes, I think this is because of the special composition rules we have
for VS16 (which are required to display Emoji sequences correctly).
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-09 14:56 ` Axel Svensson
2022-08-09 16:23 ` Eli Zaretskii
@ 2022-08-11 14:01 ` Eli Zaretskii
2022-08-11 14:58 ` Axel Svensson
1 sibling, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-11 14:01 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 16:56:37 +0200
> Cc: 57072@debbugs.gnu.org
>
> See new patch attached.
>
> It turns out there are 256 variation selectors, so I've included some fixes for selectors 17-256 as well.
> admin/unidata/blocks.awk is an exception; it seems to deal with only VS 1-16, but I have not fixed it.
>
> > Why are the acronyms you propose so long? Why not use "VS01".."VS16"
> You're right, that is better. The attached patch is fixed to have shorter acronyms.
> The acronyms I've chosen are "VS-1" through "VS-9", "VS10" through "VS99" and "VS-100" through
> "VS-256".
> Not sure that's optimal, perhaps "VS01" or "VS 1" is better, what do you think?
I think "VS01" is better.
> diff --git a/lisp/international/characters.el b/lisp/international/characters.el
> index ca28222c81..78f8447208 100644
> --- a/lisp/international/characters.el
> +++ b/lisp/international/characters.el
> @@ -1243,7 +1243,8 @@ ?L
> (#x1E026 . #x1E02A)
> (#x1E8D0 . #x1E8D6)
> (#x1E944 . #x1E94A)
> - (#xE0001 . #xE01EF))))
> + (#xE0001 . #xE01EF)
> + (#xE0100 . #xE01EF))))
> (dolist (elt l)
> (set-char-table-range char-width-table elt 0)))
This hunk is a mistake, I think: the original code already covered the
entire range of these characters.
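The containment can be checked directly. This is only a sketch; the helper name `my-range-within-p' is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-range-within-p (inner outer)
  "Return non-nil if the range INNER lies entirely inside OUTER.
Both arguments are cons cells of the form (FROM . TO)."
  (and (<= (car outer) (car inner))
       (<= (cdr inner) (cdr outer))))

;; The entry added by the hunk, compared with the existing entry.
(my-range-within-p '(#xE0100 . #xE01EF) '(#xE0001 . #xE01EF))
```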
Thanks.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-11 14:01 ` Eli Zaretskii
@ 2022-08-11 14:58 ` Axel Svensson
2022-08-11 16:19 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-11 14:58 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072
[-- Attachment #1: Type: text/plain, Size: 196 bytes --]
> I think "VS01" is better.
Fixed to be "VS01" through "VS09", "VS10" through "VS99" and "VS 100"
through "VS 256".
> This hunk is a mistake, I think
Good catch, fixed.
See new patch attached.
[-- Attachment #2: 0001-Fixes-for-variation-selectors.patch --]
[-- Type: text/x-patch, Size: 3367 bytes --]
From 033527ea3edcf414e28deb702eabfa5cea910487 Mon Sep 17 00:00:00 2001
From: Axel Svensson <mail@axelsvensson.com>
Date: Tue, 9 Aug 2022 01:11:02 +0200
Subject: [PATCH] Fixes for variation selectors
---
doc/lispref/display.texi | 6 +++---
lisp/international/characters.el | 23 ++++++++++++++++++-----
2 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index ace67fbedb..96079dc106 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -8596,9 +8596,9 @@ Glyphless Chars
images, such as U+00AD @sc{soft hyphen}.
@item variation-selectors
-Unicode VS-1 through VS-16 (U+FE00 through U+FE0F), which are used to
-select between different glyphs for the same codepoints (typically
-emojis).
+Unicode VS-1 through VS-256 (U+FE00 through U+FE0F and U+E0100 through
+U+E01EF), which are used to select between different glyphs for the same
+codepoints (typically emojis).
@item no-font
Characters for which there is no suitable font, or which cannot be
diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index ca28222c81..d6e83c81e7 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -1525,6 +1525,17 @@ char-acronym-table
(aset char-acronym-table (+ #xE0021 i) (format " %c TAG" (+ 33 i))))
(aset char-acronym-table #xE007F "->|TAG") ; CANCEL TAG
+(dotimes (i 256)
+ (let* ((vs-number (1+ i))
+ (codepoint (if (< i 16)
+ (+ #xfe00 i)
+ (+ #xe0100 i -16)))
+ (delimiter (cond ((<= vs-number 9) "0")
+ ((<= vs-number 99) "")
+ (t " "))))
+ (aset char-acronym-table codepoint
+ (format "VS%s%s" delimiter vs-number))))
+
;; We can't use the \N{name} things here, because this file is used
;; too early in the build process.
(defvar bidi-control-characters
@@ -1574,7 +1585,9 @@ update-glyphless-char-display
#x80 #x9F method))
((eq target 'variation-selectors)
(glyphless-set-char-table-range glyphless-char-display
- #xFE00 #xFE0F method))
+ #xFE00 #xFE0F method)
+ (glyphless-set-char-table-range glyphless-char-display
+ #xE0100 #xE01EF method))
((or (eq target 'format-control)
(eq target 'bidi-control))
(when unicode-category-table
@@ -1647,10 +1660,10 @@ glyphless-char-display-control
that are relevant for bidirectional formatting control,
like U+2069 (PDI) and U+202B (RLE).
`variation-selectors':
- Characters in the range U+FE00..U+FE0F, used for
- selecting alternate glyph presentations, such as
- Emoji vs Text presentation, of the preceding
- character(s).
+ Characters in the range U+FE00..U+FE0F and
+ U+E0100..U+E01EF, used for selecting alternate glyph
+ presentations, such as Emoji vs Text presentation, of
+ the preceding character(s).
`no-font': For GUI frames, characters for which no suitable
font is found; for text-mode frames, characters
that cannot be encoded by `terminal-coding-system'.
--
2.30.2
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-11 14:58 ` Axel Svensson
@ 2022-08-11 16:19 ` Eli Zaretskii
2022-08-12 3:33 ` Axel Svensson
0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-11 16:19 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Thu, 11 Aug 2022 16:58:37 +0200
> Cc: 57072@debbugs.gnu.org
>
> > I think "VS01" is better.
> Fixed to be "VS01" through "VS09", "VS10" through "VS99" and "VS 100"
> through "VS 256".
>
> > This hunk is a mistake, I think
> Good catch, fixed.
>
> See new patch attached.
Thanks, installed. Please in the future accompany the changes with a
ChangeLog-style log message describing the specific changes.
This changeset was small enough to be accepted without your assigning
copyright to the FSF, but if you'd like to continue contributing to
Emacs, we'd need your legal paperwork vis-a-vis the FSF copyright
clerk. Would you like to start the paperwork rolling at this time?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-11 16:19 ` Eli Zaretskii
@ 2022-08-12 3:33 ` Axel Svensson
2022-08-12 5:53 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-12 3:33 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072
[-- Attachment #1: Type: text/plain, Size: 369 bytes --]
> admin/unidata/blocks.awk is an exception; it seems to deal with only VS
1-16, but I have not fixed it.
How do we handle this one, should I file a new bug? I can't produce any
unexpected behavior, I just think it looks odd, and I do not intend to fix
it myself.
> Thanks, installed.
Great!
> Would you like to start the paperwork rolling at this time?
No thank you.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 3:33 ` Axel Svensson
@ 2022-08-12 5:53 ` Eli Zaretskii
2022-08-12 6:50 ` Axel Svensson
0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-12 5:53 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 05:33:59 +0200
> Cc: 57072@debbugs.gnu.org
>
> > admin/unidata/blocks.awk is an exception; it seems to deal with only VS 1-16, but I have not fixed it.
> How do we handle this one, should I file a new bug? I can't produce any unexpected behavior, I just think it
> looks odd, and I do not intend to fix it myself.
What does Unicode say about the functionality of the variation
selectors beyond VS-16?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 5:53 ` Eli Zaretskii
@ 2022-08-12 6:50 ` Axel Svensson
2022-08-12 7:10 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-12 6:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072
> What does Unicode say about the functionality of the variation
> selectors beyond VS-16?
The code charts divide them into three groups:
- VS1 through VS14 are "Variation selectors" [1]
- VS15 through VS16 are "Emoji-specific variation selectors" [1]
- VS17 through VS256 are "Ideographic-specific variation selectors" [2]
The standard itself in chapter 23.4 [3] makes no distinction between
them but says that the only sanctioned uses that should have any effect
are the ones defined in:
- StandardizedVariants.txt [4] in the Unicode Character Database, which
currently uses only VS1 through VS3. Confusingly though, some of them
seem to be used for ideographic purposes.
- Unicode Technical Standard #51 for emojis [5], which says that VS15 is
"used to request a text presentation for an emoji character" while
VS16 is "used to request an emoji presentation for an emoji
character".
- Unicode Technical Standard #37 for ideographic variation [6], which
confirms that it only uses VS17 through VS256.
In any case, it seems that admin/unidata/blocks.awk needs fixing, since
it currently handles only VS1 through VS16 and does so as if they were
all for emoji use.
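The code-chart grouping listed above can be sketched as a small lookup. The helper name `my-vs-group' and the symbols it returns are hypothetical labels chosen for illustration; they are not part of Emacs or of Unicode.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-vs-group (n)
  "Return a symbol naming the code-chart group of variation selector N."
  (cond ((<= 1 n 14) 'general)          ; "Variation selectors"
        ((<= 15 n 16) 'emoji)           ; "Emoji-specific"
        ((<= 17 n 256) 'ideographic)    ; "Ideographic-specific"
        (t (error "No such variation selector: %d" n))))

;; Group of the selectors on either side of each boundary.
(mapcar #'my-vs-group '(14 15 16 17))
```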
[1] https://www.unicode.org/charts/PDF/UFE00.pdf
[2] https://www.unicode.org/charts/PDF/UE0100.pdf
[3] https://www.unicode.org/versions/Unicode14.0.0/ch23.pdf
[4] https://www.unicode.org/Public/14.0.0/ucd/StandardizedVariants.txt
[5] https://www.unicode.org/reports/tr51/#Emoji_Variation_Sequences
[6] https://www.unicode.org/reports/tr37/
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 6:50 ` Axel Svensson
@ 2022-08-12 7:10 ` Eli Zaretskii
2022-08-12 7:57 ` Axel Svensson
0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-12 7:10 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 08:50:18 +0200
> Cc: 57072@debbugs.gnu.org
>
> > What does Unicode say about the functionality of the variation
> > selectors beyond VS-16?
>
> The code charts divide them into three groups:
> - VS1 through VS14 are "Variation selectors" [1]
> - VS15 through VS16 are "Emoji-specific variation selectors" [1]
> - VS17 through VS256 are "Ideographic-specific variation selectors" [2]
>
> The standard itself in chapter 23.4 [3] makes no distinction between
> them but says that the only sanctioned uses that should have any effect
> are the ones defined in:
> - StandardizedVariants.txt [4] in the Unicode Character Database, which
> currently uses only VS1 through VS3. Confusingly though, some of them
> seem to be used for ideographic purposes.
> - Unicode Technical Standard #51 for emojis [5], which says that VS15 is
> "used to request a text presentation for an emoji character" while
> VS16 is "used to request an emoji presentation for an emoji
> character".
> - Unicode Technical Standard #37 for ideographic variation [6], which
> confirms that it only uses VS17 through VS256.
>
> In any case, it seems that admin/unidata/blocks.awk needs fixing, since
> it currently handles only VS1 through VS16 and does so as if they were
> all for emoji use.
AFAIR, blocks.awk does what it does only because VS16 has a special
function of requesting the Emoji presentation of characters that are
otherwise not Emoji, and our character-composition code needs to
realize that. Unless the selectors beyond VS16 have similar
functions, I don't see any reason why we'd need to modify blocks.awk.
Or what am I missing? IOW, to which part(s) of blocks.awk did you
allude when you wrote "it currently handles only VS1 through VS16 and
does so as if they were all for emoji use"?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 7:10 ` Eli Zaretskii
@ 2022-08-12 7:57 ` Axel Svensson
2022-08-12 10:29 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-12 7:57 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072
> Or what am I missing? IOW, to which part(s) of blocks.awk did you
> allude when you wrote "it currently handles only VS1 through VS16 and
> does so as if they were all for emoji use"?
I initially thought it was a mistake to exclude VS17 through VS256, but
now I believe it might be a mistake to include VS1 through VS14. I don't
understand the internals enough to be sure, but one possible fix could
be:
diff --git a/admin/unidata/blocks.awk b/admin/unidata/blocks.awk
index 5f392b5ad3..c14fa09863 100755
--- a/admin/unidata/blocks.awk
+++ b/admin/unidata/blocks.awk
@@ -226,7 +226,7 @@ END {
idx = 0
# ## These are here so that font_range can choose Emoji presentation
# ## for the preceding codepoint when it encounters a VS
- override_start[idx] = "FE00"
+ override_start[idx] = "FE0E"
override_end[idx] = "FE0F"
for (k in override_start)
--
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 7:57 ` Axel Svensson
@ 2022-08-12 10:29 ` Eli Zaretskii
2022-08-12 11:51 ` Axel Svensson
0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-12 10:29 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 09:57:32 +0200
> Cc: 57072@debbugs.gnu.org
>
> I initially thought it was a mistake to exclude VS17 through VS256, but
> now I believe it might be a mistake to include VS1 through VS14. I don't
> understand the internals enough to be sure, but one possible fix could
> be:
Why do you think including VS1 through VS14 is a mistake?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 10:29 ` Eli Zaretskii
@ 2022-08-12 11:51 ` Axel Svensson
2022-08-12 12:46 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-12 11:51 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072
> Why do you think including VS1 through VS14 is a mistake?
It appears that blocks.awk somehow designates VS1 through VS14 for
emoji use, while the Unicode standard per [1] and [5] above seems to
exclude them from emoji use. I am not sure whether VS1 through VS14, or
VS17 through VS256, need to be designated to some other script by
blocks.awk.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 11:51 ` Axel Svensson
@ 2022-08-12 12:46 ` Eli Zaretskii
2022-08-16 8:05 ` Robert Pluim
0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-12 12:46 UTC (permalink / raw)
To: Axel Svensson, Robert Pluim; +Cc: 57072
> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 13:51:21 +0200
> Cc: 57072@debbugs.gnu.org
>
> > Why do you think including VS1 through VS14 is a mistake?
> It appears like blocks.awk somehow designates VS1 through VS14 for
> emoji use, while the Unicode standard per [1] and [5] above seem to
> exclude them from emoji use. I am not sure whether VS1 through VS14, or
> VS17 through VS256 need to be designated to some other script by
> blocks.awk.
So you are saying that we should exclude VS1 through VS14 from the
Emoji script?
Robert, do you remember why we included them in the script?
As for VS17 and above, I'm not sure we should assign them to any
script. Perhaps to Han?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-12 12:46 ` Eli Zaretskii
@ 2022-08-16 8:05 ` Robert Pluim
2022-08-16 13:06 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Robert Pluim @ 2022-08-16 8:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072, Axel Svensson
>>>>> On Fri, 12 Aug 2022 15:46:58 +0300, Eli Zaretskii <eliz@gnu.org> said:
>> From: Axel Svensson <svenssonaxel@gmail.com>
>> Date: Fri, 12 Aug 2022 13:51:21 +0200
>> Cc: 57072@debbugs.gnu.org
>>
>> > Why do you think including VS1 through VS14 is a mistake?
>> It appears like blocks.awk somehow designates VS1 through VS14 for
>> emoji use, while the Unicode standard per [1] and [5] above seem to
>> exclude them from emoji use. I am not sure whether VS1 through VS14, or
>> VS17 through VS256 need to be designated to some other script by
>> blocks.awk.
Eli> So you are saying that we should exclude VS1 through VS14 from the
Eli> Emoji script?
Eli> Robert, do you remember why we included them in the script?
Hmm. Ignorance on my part seems the most likely explanation. VS1-14
are not used for emoji/text presentation selection, so we should
probably just fix blocks.awk
Eli> As for VS17 and above, I'm not sure we should assign them to any
Eli> script. Perhaps to Han?
What problems are caused by them not having a script? The composition
rules for them with Han codepoints work now, no?
Robert
--
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-10 13:10 ` Eli Zaretskii
@ 2022-08-16 11:55 ` Robert Pluim
2022-08-16 12:01 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Robert Pluim @ 2022-08-16 11:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072, Axel Svensson
>>>>> On Wed, 10 Aug 2022 16:10:46 +0300, Eli Zaretskii <eliz@gnu.org> said:
>> From: Axel Svensson <svenssonaxel@gmail.com>
>> Date: Tue, 9 Aug 2022 22:33:40 +0200
>> Cc: 57072@debbugs.gnu.org
>>
>> > Now, VS16 is almost always composed with preceding characters, so I
>> > think you can only see it as acronym if you deliberately force Emacs
>> > not to compose it, e.g. by preceding it with U+20DD COMBINING
>> > ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
>> > NON-JOINER, or disable auto-composition-mode.
>>
>> - Preceding it with U+20DD still produces the empty box
>> - Preceding it and following it by U+200C still produces the empty box
>> - Disabling auto-composition-mode produces the "VS16" acronym.
Eli> Yes, I think this is because of the special composition rules we have
Eli> for VS16 (which are required to display Emoji sequences correctly).
I guess we could adjust the composition rules for U+FE0F, but getting
that right could be tricky (there are many of them, and there will be
ordering dependencies).
️
Robert
--
^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-16 11:55 ` Robert Pluim
@ 2022-08-16 12:01 ` Eli Zaretskii
0 siblings, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-16 12:01 UTC (permalink / raw)
To: Robert Pluim; +Cc: 57072, svenssonaxel
> From: Robert Pluim <rpluim@gmail.com>
> Cc: Axel Svensson <svenssonaxel@gmail.com>, 57072@debbugs.gnu.org
> Date: Tue, 16 Aug 2022 13:55:52 +0200
>
> >>>>> On Wed, 10 Aug 2022 16:10:46 +0300, Eli Zaretskii <eliz@gnu.org> said:
>
> >> From: Axel Svensson <svenssonaxel@gmail.com>
> >> Date: Tue, 9 Aug 2022 22:33:40 +0200
> >> Cc: 57072@debbugs.gnu.org
> >>
> >> > Now, VS16 is almost always composed with preceding characters, so I
> >> > think you can only see it as acronym if you deliberately force Emacs
> >> > not to compose it, e.g. by preceding it with U+20DD COMBINING
> >> > ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
> >> > NON-JOINER, or disable auto-composition-mode.
> >>
> >> - Preceding it with U+20DD still produces the empty box
> >> - Preceding it and following it by U+200C still produces the empty box
> >> - Disabling auto-composition-mode produces the "VS16" acronym.
>
> Eli> Yes, I think this is because of the special composition rules we have
> Eli> for VS16 (which are required to display Emoji sequences correctly).
>
> I guess we could adjust the composition rules for U+FE0F, but getting
> that right could be tricky (there are many of them, and there will be
> ordering dependencies).
We could, but I'm not sure it's worth the hassle. There's no
particular reason for people to want to display VS-16 as an acronym,
of all the available display methods, since it should almost always
be composed.
^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-16 8:05 ` Robert Pluim
@ 2022-08-16 13:06 ` Eli Zaretskii
2022-08-16 13:27 ` Robert Pluim
0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-16 13:06 UTC (permalink / raw)
To: Robert Pluim; +Cc: 57072, svenssonaxel
> From: Robert Pluim <rpluim@gmail.com>
> Cc: Axel Svensson <svenssonaxel@gmail.com>, 57072@debbugs.gnu.org
> Date: Tue, 16 Aug 2022 10:05:12 +0200
>
> Eli> Robert, do you remember why we included them in the script?
>
> Hmm. Ignorance on my part seems the most likely explanation. VS1-14
> are not used for emoji/text presentation selection, so we should
> probably just fix blocks.awk
According to the comment in composite.el, we should leave only VS-16
in the 'emoji' script.
> Eli> As for VS17 and above, I'm not sure we should assign them to any
> Eli> script. Perhaps to Han?
>
> What problems are caused by them not having a script? The composition
> rules for them with Han codepoints work now, no?
Yes, because they are set up in composite.el. So I think we are good
there.
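For reference, the selector names used in this thread map onto two
separate code point ranges: VS1 through VS16 are U+FE00..U+FE0F, and
VS17 through VS256 are U+E0100..U+E01EF. A minimal Python sketch of
that mapping follows; the function name is made up for illustration
and is not part of Emacs.

```python
def variation_selector_number(codepoint):
    """Return N for VS-N, or None if codepoint is not a variation selector."""
    # VS1..VS16 live in the Variation Selectors block.
    if 0xFE00 <= codepoint <= 0xFE0F:
        return codepoint - 0xFE00 + 1
    # VS17..VS256 live in the Variation Selectors Supplement block.
    if 0xE0100 <= codepoint <= 0xE01EF:
        return codepoint - 0xE0100 + 17
    return None


for cp in (0xFE00, 0xFE0E, 0xFE0F, 0xE0100, 0xE01EF, 0x41):
    print(f"U+{cp:04X} -> {variation_selector_number(cp)}")
```

Per the discussion above, only VS16 (U+FE0F) is meant to stay in the
'emoji' script, while the selectors from VS17 up rely on the
composition rules in composite.el rather than on a script assignment.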
^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-16 13:06 ` Eli Zaretskii
@ 2022-08-16 13:27 ` Robert Pluim
2022-08-16 13:39 ` Axel Svensson
0 siblings, 1 reply; 26+ messages in thread
From: Robert Pluim @ 2022-08-16 13:27 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 57072, svenssonaxel
>>>>> On Tue, 16 Aug 2022 16:06:12 +0300, Eli Zaretskii <eliz@gnu.org> said:
>> From: Robert Pluim <rpluim@gmail.com>
>> Cc: Axel Svensson <svenssonaxel@gmail.com>, 57072@debbugs.gnu.org
>> Date: Tue, 16 Aug 2022 10:05:12 +0200
>>
Eli> Robert, do you remember why we included them in the script?
>>
>> Hmm. Ignorance on my part seems the most likely explanation. VS1-14
>> are not used for emoji/text presentation selection, so we should
>> probably just fix blocks.awk
Eli> According to the comment in composite.el, we should leave only VS-16
Eli> in the 'emoji' script.
Yes. Thank you past-me for reminding present-us (Iʼd forgotten Iʼd
written that 😀)
I can do that later this week, unless the reporter of this bug wants
to handle it?
Robert
--
^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-16 13:27 ` Robert Pluim
@ 2022-08-16 13:39 ` Axel Svensson
2022-08-16 14:48 ` Robert Pluim
0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-16 13:39 UTC (permalink / raw)
To: Robert Pluim; +Cc: 57072, Eli Zaretskii
> I can do that later this week, unless the reporter of this bug wants
> to handle it?
Nope, go ahead.
^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-16 13:39 ` Axel Svensson
@ 2022-08-16 14:48 ` Robert Pluim
2022-08-16 16:27 ` Eli Zaretskii
0 siblings, 1 reply; 26+ messages in thread
From: Robert Pluim @ 2022-08-16 14:48 UTC (permalink / raw)
To: Axel Svensson; +Cc: 57072, Eli Zaretskii
>>>>> On Tue, 16 Aug 2022 15:39:49 +0200, Axel Svensson <svenssonaxel@gmail.com> said:
>> I can do that later this week, unless the reporter of this bug wants
>> to handle it?
Axel> Nope, go ahead.
This doesnʼt have any negative effects on emoji display (using
admin/unidata/emoji-{zwj-,}sequences.txt) that I can see.
Iʼll test some more and push by the end of the week.
diff --git a/admin/unidata/blocks.awk b/admin/unidata/blocks.awk
index 5f392b5ad3..0d07a10f2a 100755
--- a/admin/unidata/blocks.awk
+++ b/admin/unidata/blocks.awk
@@ -224,9 +224,11 @@ FILENAME ~ "emoji-data.txt" && /^[0-9A-F].*; Emoji_Presentation / {
END {
idx = 0
- # ## These are here so that font_range can choose Emoji presentation
- # ## for the preceding codepoint when it encounters a VS
- override_start[idx] = "FE00"
+ ## This is here so that font_range can choose Emoji presentation
+ ## for the preceding codepoint when it encounters a VS-16 (U+FE0F).
+ ## It originally covered the whole FE00-FE0F range, but that
+ ## turned out to be a mistake.
+ override_start[idx] = "FE0F"
override_end[idx] = "FE0F"
for (k in override_start)
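
The effect of narrowing the range can be modelled roughly as below.
This is a Python sketch and not the awk logic in blocks.awk; the
function and variable names are assumptions made for illustration.

```python
# Override ranges as (start, end) code points, both inclusive.
OLD_OVERRIDE = (0xFE00, 0xFE0F)  # before the patch: VS1 through VS16
NEW_OVERRIDE = (0xFE0F, 0xFE0F)  # after the patch: VS16 only


def in_override(codepoint, override):
    """True if codepoint falls inside the inclusive override range."""
    start, end = override
    return start <= codepoint <= end


def newly_excluded(old, new):
    """Code points covered by the old override range but not by the new one."""
    return [cp for cp in range(old[0], old[1] + 1) if not in_override(cp, new)]


excluded = newly_excluded(OLD_OVERRIDE, NEW_OVERRIDE)
print(", ".join(f"U+{cp:04X}" for cp in excluded))
```

In this model the patch drops U+FE00 through U+FE0E from the override
and keeps only U+FE0F.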
^ permalink raw reply related [flat|nested] 26+ messages in thread
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
2022-08-16 14:48 ` Robert Pluim
@ 2022-08-16 16:27 ` Eli Zaretskii
0 siblings, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-16 16:27 UTC (permalink / raw)
To: Robert Pluim; +Cc: 57072, svenssonaxel
> From: Robert Pluim <rpluim@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, 57072@debbugs.gnu.org
> Date: Tue, 16 Aug 2022 16:48:07 +0200
>
> --- a/admin/unidata/blocks.awk
> +++ b/admin/unidata/blocks.awk
> @@ -224,9 +224,11 @@ FILENAME ~ "emoji-data.txt" && /^[0-9A-F].*; Emoji_Presentation / {
>
> END {
> idx = 0
> - # ## These are here so that font_range can choose Emoji presentation
> - # ## for the preceding codepoint when it encounters a VS
> - override_start[idx] = "FE00"
> + ## This is here so that font_range can choose Emoji presentation
> + ## for the preceding codepoint when it encounters a VS-16 (U+FE0F).
> + ## It originally covered the whole FE00-FE0F range, but that
> + ## turned out to be a mistake.
> + override_start[idx] = "FE0F"
> override_end[idx] = "FE0F"
>
> for (k in override_start)
>
That LGTM, thanks. But please mention in the comment the code in
composite.el that handles the other variation selectors, so that
these related places are easier to find and inspect.
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2022-08-16 16:27 UTC | newest]
Thread overview: 26+ messages
2022-08-09 8:38 bug#57072: [BUG] update-glyphless-char-display and variation selectors Axel Svensson
2022-08-09 8:40 ` bug#57073: " Axel Svensson
2022-08-09 11:36 ` bug#57072: " Eli Zaretskii
2022-08-09 14:56 ` Axel Svensson
2022-08-09 16:23 ` Eli Zaretskii
2022-08-09 20:33 ` Axel Svensson
2022-08-10 13:10 ` Eli Zaretskii
2022-08-16 11:55 ` Robert Pluim
2022-08-16 12:01 ` Eli Zaretskii
2022-08-11 14:01 ` Eli Zaretskii
2022-08-11 14:58 ` Axel Svensson
2022-08-11 16:19 ` Eli Zaretskii
2022-08-12 3:33 ` Axel Svensson
2022-08-12 5:53 ` Eli Zaretskii
2022-08-12 6:50 ` Axel Svensson
2022-08-12 7:10 ` Eli Zaretskii
2022-08-12 7:57 ` Axel Svensson
2022-08-12 10:29 ` Eli Zaretskii
2022-08-12 11:51 ` Axel Svensson
2022-08-12 12:46 ` Eli Zaretskii
2022-08-16 8:05 ` Robert Pluim
2022-08-16 13:06 ` Eli Zaretskii
2022-08-16 13:27 ` Robert Pluim
2022-08-16 13:39 ` Axel Svensson
2022-08-16 14:48 ` Robert Pluim
2022-08-16 16:27 ` Eli Zaretskii