* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-09 8:38 UTC
To: 57072

IIUC the documentation to glyphless-char-display-control, any of the character groups can be assigned any of the display methods.

First bug + patch:
Using update-glyphless-char-display to choose to display variation-selectors as acronyms does not work since these codepoints are missing from char-acronym-table.
The attached patch attempts to fix this and succeeds for U+FE00..U+FE0E but fails for U+FE0F.

Second bug:
It seems that U+FE0F will not at all respect glyphless-char-display, instead always showing as an empty box.
This I have not solved.

Version: GNU Emacs 28.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0) of 2022-07-21
Built from source, commit 5a223c7f2ef4c31abbd46367b6ea83cd19d30aa7

Regards,
Axel Svensson
* bug#57073: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-09 8:40 UTC
To: 57073

On Tue, Aug 9, 2022 at 10:38 AM Axel Svensson <svenssonaxel@gmail.com> wrote:
> The attached patch attempts to fix this and succeeds for U+FE00..U+FE0E but fails for U+FE0F.

Sorry, patch attached now.

[-- Attachment #2: 0001-Add-variation-selectors-to-char-acronym-table.patch --]
From d7d8cb6c0111223aa2492db5248818af2e789a1f Mon Sep 17 00:00:00 2001
From: Axel Svensson <mail@axelsvensson.com>
Date: Tue, 9 Aug 2022 01:11:02 +0200
Subject: [PATCH] Add variation selectors to `char-acronym-table'

---
 lisp/international/characters.el | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index ca28222c81..616480769d 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -1525,6 +1525,15 @@ Setup `char-width-table' appropriate for non-CJK language environment."
   (aset char-acronym-table (+ #xE0021 i) (format " %c TAG" (+ 33 i))))
 (aset char-acronym-table #xE007F "->|TAG")  ; CANCEL TAG
 
+(let ((vs-acronyms
+       '("VS 1" "VS 2" "VS 3" "VS 4"
+         "VS 5" "VS 6" "VS 7" "VS 8"
+         "VS 9" "VS 10" "VS 11" "VS 12"
+         "VS 13" "VS 14" "VS 15" "VS 16")))
+  (dotimes (i 16)
+    (aset char-acronym-table (+ #xfe00 i) (car vs-acronyms))
+    (setq vs-acronyms (cdr vs-acronyms))))
+
 ;; We can't use the \N{name} things here, because this file is used
 ;; too early in the build process.
 (defvar bidi-control-characters
-- 
2.30.2
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-09 11:36 UTC
To: Axel Svensson; +Cc: 57072

merge 57073 57072
thanks

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 10:38:30 +0200
>
> IIUC the documentation to glyphless-char-display-control, any of the character groups can be assigned any
> of the display methods.

But not every glyphless character has an acronym, so this is not a
bug. You are suggesting an enhancement (which is fine).

> First bug + patch:
> Using update-glyphless-char-display to choose to display variation-selectors as acronyms does not work
> since these codepoints are missing from char-acronym-table.
> The attached patch attempts to fix this and succeeds for U+FE00..U+FE0E but fails for U+FE0F.

Why are the acronyms you propose so long? Why not use "VS01".."VS16"
instead? Shorter acronyms are an advantage, since they will be
displayed in a more legible way.

> Second bug:
> It seems that U+FE0F will not at all respect glyphless-char-display, instead always showing as an empty
> box.
> This I have not solved.

Please show a recipe for that starting from "emacs -Q".

Thanks.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-09 14:56 UTC
To: Eli Zaretskii; +Cc: 57072

> You are suggesting an enhancement (which is fine).
Acknowledged. See new patch attached.
It turns out there are 256 variation selectors, so I've included some fixes for selectors 17-256 as well.
admin/unidata/blocks.awk is an exception; it seems to deal with only VS 1-16, but I have not fixed it.

> Why are the acronyms you propose so long? Why not use "VS01".."VS16"
You're right, that is better. The attached patch is fixed to have shorter acronyms.
The acronyms I've chosen are "VS-1" through "VS-9", "VS10" through "VS99" and "VS-100" through "VS-256".
Not sure that's optimal, perhaps "VS01" or "VS 1" is better, what do you think?

> Please show a recipe for that starting from "emacs -Q".
To reproduce:
1) Start emacs -Q under X11.
2) Evaluate:

(progn
  (let ((vs-acronyms
         '("VS01" "VS02" "VS03" "VS04"
           "VS05" "VS06" "VS07" "VS08"
           "VS09" "VS10" "VS11" "VS12"
           "VS13" "VS14" "VS15" "VS16")))
    (dotimes (i 16)
      (aset char-acronym-table (+ #xfe00 i) (car vs-acronyms))
      (setq vs-acronyms (cdr vs-acronyms))))
  (update-glyphless-char-display
   'glyphless-char-display-control
   '((format-control . acronym)
     (variation-selectors . acronym)
     (no-font . hex-code)))
  (insert #xfe00 #xfe01 #xfe0e #xfe0f))

Expected:
Four boxes are shown, all of which contain "VS" in the upper half, and in the lower half "01", "02", "15" and "16" respectively.

Actual:
The three first boxes appear as expected, but the fourth is empty.

Throughout the codebase, I see U+FE0F sometimes singled out and treated differently than the other variation selectors, so this isn't entirely strange.
in places including:
- admin/unidata/emoji-data.txt:778
- admin/unidata/emoji-zwj.awk:102
- lisp/composite.el:856

[-- Attachment #2: 0001-Fixes-for-variation-selectors.patch --]
From a4ec9eae3de84bf9501c0d3f97ccade600716634 Mon Sep 17 00:00:00 2001
From: Axel Svensson <mail@axelsvensson.com>
Date: Tue, 9 Aug 2022 01:11:02 +0200
Subject: [PATCH] Fixes for variation selectors

---
 doc/lispref/display.texi         |  6 +++---
 lisp/international/characters.el | 24 ++++++++++++++++++------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index ace67fbedb..96079dc106 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -8596,9 +8596,9 @@ Glyphless Chars
 images, such as U+00AD @sc{soft hyphen}.
 
 @item variation-selectors
-Unicode VS-1 through VS-16 (U+FE00 through U+FE0F), which are used to
-select between different glyphs for the same codepoints (typically
-emojis).
+Unicode VS-1 through VS-256 (U+FE00 through U+FE0F and U+E0100 through
+U+E01EF), which are used to select between different glyphs for the same
+codepoints (typically emojis).
 
 @item no-font
 Characters for which there is no suitable font, or which cannot be
diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index ca28222c81..78f8447208 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -1243,7 +1243,8 @@ ?L
                (#x1E026 . #x1E02A)
                (#x1E8D0 . #x1E8D6)
                (#x1E944 . #x1E94A)
-               (#xE0001 . #xE01EF))))
+               (#xE0001 . #xE01EF)
+               (#xE0100 . #xE01EF))))
   (dolist (elt l)
     (set-char-table-range char-width-table elt 0)))
 
@@ -1525,6 +1526,15 @@ char-acronym-table
   (aset char-acronym-table (+ #xE0021 i) (format " %c TAG" (+ 33 i))))
 (aset char-acronym-table #xE007F "->|TAG")  ; CANCEL TAG
 
+(dotimes (i 256)
+  (let* ((vs-number (1+ i))
+         (codepoint (if (< i 16)
+                        (+ #xfe00 i)
+                      (+ #xe0100 i -16)))
+         (dash (if (<= 10 vs-number 99) "" "-")))
+    (aset char-acronym-table codepoint
+          (format "VS%s%s" dash vs-number))))
+
 ;; We can't use the \N{name} things here, because this file is used
 ;; too early in the build process.
 (defvar bidi-control-characters
@@ -1574,7 +1584,9 @@ update-glyphless-char-display
                                             #x80 #x9F method))
            ((eq target 'variation-selectors)
             (glyphless-set-char-table-range glyphless-char-display
-                                            #xFE00 #xFE0F method))
+                                            #xFE00 #xFE0F method)
+            (glyphless-set-char-table-range glyphless-char-display
+                                            #xE0100 #xE01EF method))
            ((or (eq target 'format-control)
                 (eq target 'bidi-control))
             (when unicode-category-table
@@ -1647,10 +1659,10 @@ glyphless-char-display-control
                 that are relevant for bidirectional formatting control,
                 like U+2069 (PDI) and U+202B (RLE).
   `variation-selectors':
-                Characters in the range U+FE00..U+FE0F, used for
-                selecting alternate glyph presentations, such as
-                Emoji vs Text presentation, of the preceding
-                character(s).
+                Characters in the range U+FE00..U+FE0F and
+                U+E0100..U+E01EF, used for selecting alternate glyph
+                presentations, such as Emoji vs Text presentation, of
+                the preceding character(s).
   `no-font':    For GUI frames, characters for which no suitable
                 font is found; for text-mode frames, characters
                 that cannot be encoded by `terminal-coding-system'.
-- 
2.30.2
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-09 16:23 UTC
To: Axel Svensson; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 16:56:37 +0200
> Cc: 57072@debbugs.gnu.org
>
> See new patch attached.

Thanks, I will review it soon.

> To reproduce:
> 1) Start emacs -Q under X11.
> 2) Evaluate:
>
> (progn
>   (let ((vs-acronyms
>          '("VS01" "VS02" "VS03" "VS04"
>            "VS05" "VS06" "VS07" "VS08"
>            "VS09" "VS10" "VS11" "VS12"
>            "VS13" "VS14" "VS15" "VS16")))
>     (dotimes (i 16)
>       (aset char-acronym-table (+ #xfe00 i) (car vs-acronyms))
>       (setq vs-acronyms (cdr vs-acronyms))))
>   (update-glyphless-char-display
>    'glyphless-char-display-control
>    '((format-control . acronym)
>      (variation-selectors . acronym)
>      (no-font . hex-code)))
>   (insert #xfe00 #xfe01 #xfe0e #xfe0f))
>
> Expected:
> Four boxes are shown, all of which contain "VS" in the upper half, and in the lower half "01", "02", "15" and
> "16" respectively.
>
> Actual:
> The three first boxes appear as expected, but the fourth is empty.
>
> Throughout the codebase, I see U+FE0F sometimes singled out and treated differently than the other variation
> selectors, so this isn't entirely strange.
> in places including:
> - admin/unidata/emoji-data.txt:778
> - admin/unidata/emoji-zwj.awk:102
> - lisp/composite.el:856

This character (as any other character) will only be displayed using
the glyphless-char-display setup if it is shown as a separate
character. If it is composed with other surrounding characters, it
will be shown as the font tells us to show that sequence, and in that
case Emacs doesn't consult glyphless-char-display at all.
Now, VS16 is almost always composed with preceding characters, so I
think you can only see it as acronym if you deliberately force Emacs
not to compose it, e.g. by preceding it with U+20DD COMBINING
ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
NON-JOINER, or disable auto-composition-mode.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-09 20:33 UTC
To: Eli Zaretskii; +Cc: 57072

> Now, VS16 is almost always composed with preceding characters, so I
> think you can only see it as acronym if you deliberately force Emacs
> not to compose it, e.g. by preceding it with U+20DD COMBINING
> ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
> NON-JOINER, or disable auto-composition-mode.

- Preceding it with U+20DD still produces the empty box
- Preceding it and following it by U+200C still produces the empty box
- Disabling auto-composition-mode produces the "VS16" acronym.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-10 13:10 UTC
To: Axel Svensson; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 22:33:40 +0200
> Cc: 57072@debbugs.gnu.org
>
> > Now, VS16 is almost always composed with preceding characters, so I
> > think you can only see it as acronym if you deliberately force Emacs
> > not to compose it, e.g. by preceding it with U+20DD COMBINING
> > ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
> > NON-JOINER, or disable auto-composition-mode.
>
> - Preceding it with U+20DD still produces the empty box
> - Preceding it and following it by U+200C still produces the empty box
> - Disabling auto-composition-mode produces the "VS16" acronym.

Yes, I think this is because of the special composition rules we have
for VS16 (which are required to display Emoji sequences correctly).
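Axel's third observation suggests a way to check the acronym display of U+FE0F without touching the composition rules: turn off `auto-composition-mode`, which is buffer-local, in a scratch buffer only. A minimal sketch, assuming an Emacs whose `char-acronym-table` already has entries for U+FE00..U+FE0F (on 28.1, evaluate the `aset` loop from the recipe first). The function name `my-show-vs-acronyms` and the buffer name `*vs-acronyms*` are hypothetical, and like the recipe it overwrites the global `glyphless-char-display-control` setting.

```elisp
(defun my-show-vs-acronyms ()
  "Show VS1, VS2, VS15 and VS16 in a buffer where they are not composed.
Hypothetical helper based on the reproduction recipe in this thread."
  (interactive)
  ;; Same settings as the recipe: display variation selectors as acronyms.
  ;; Note: this changes `glyphless-char-display-control' globally.
  (update-glyphless-char-display
   'glyphless-char-display-control
   '((format-control . acronym)
     (variation-selectors . acronym)
     (no-font . hex-code)))
  (let ((buf (get-buffer-create "*vs-acronyms*")))
    (with-current-buffer buf
      (erase-buffer)
      ;; Disable composition in this buffer only, so U+FE0F is displayed
      ;; as a separate character and glyphless-char-display is consulted.
      (auto-composition-mode -1)
      (insert #xfe00 #xfe01 #xfe0e #xfe0f))
    (pop-to-buffer buf)))
```

On a graphical frame, all four characters should then appear as acronym boxes, which matches the result Axel reported with auto-composition-mode disabled.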
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Robert Pluim @ 2022-08-16 11:55 UTC
To: Eli Zaretskii; +Cc: 57072, Axel Svensson

>>>>> On Wed, 10 Aug 2022 16:10:46 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Axel Svensson <svenssonaxel@gmail.com>
    >> Date: Tue, 9 Aug 2022 22:33:40 +0200
    >> Cc: 57072@debbugs.gnu.org
    >>
    >> > Now, VS16 is almost always composed with preceding characters, so I
    >> > think you can only see it as acronym if you deliberately force Emacs
    >> > not to compose it, e.g. by preceding it with U+20DD COMBINING
    >> > ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
    >> > NON-JOINER, or disable auto-composition-mode.
    >>
    >> - Preceding it with U+20DD still produces the empty box
    >> - Preceding it and following it by U+200C still produces the empty box
    >> - Disabling auto-composition-mode produces the "VS16" acronym.

    Eli> Yes, I think this is because of the special composition rules we have
    Eli> for VS16 (which are required to display Emoji sequences correctly).

I guess we could adjust the composition rules for U+FE0F, but getting
that right could be tricky (there are many of them, and there will be
ordering dependencies).

Robert
-- 
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-16 12:01 UTC
To: Robert Pluim; +Cc: 57072, svenssonaxel

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Axel Svensson <svenssonaxel@gmail.com>, 57072@debbugs.gnu.org
> Date: Tue, 16 Aug 2022 13:55:52 +0200
>
> >>>>> On Wed, 10 Aug 2022 16:10:46 +0300, Eli Zaretskii <eliz@gnu.org> said:
>
>     >> From: Axel Svensson <svenssonaxel@gmail.com>
>     >> Date: Tue, 9 Aug 2022 22:33:40 +0200
>     >> Cc: 57072@debbugs.gnu.org
>     >>
>     >> > Now, VS16 is almost always composed with preceding characters, so I
>     >> > think you can only see it as acronym if you deliberately force Emacs
>     >> > not to compose it, e.g. by preceding it with U+20DD COMBINING
>     >> > ENCLOSING CIRCLE, or precede it and follow it by U+200C ZERO WIDTH
>     >> > NON-JOINER, or disable auto-composition-mode.
>     >>
>     >> - Preceding it with U+20DD still produces the empty box
>     >> - Preceding it and following it by U+200C still produces the empty box
>     >> - Disabling auto-composition-mode produces the "VS16" acronym.
>
>     Eli> Yes, I think this is because of the special composition rules we have
>     Eli> for VS16 (which are required to display Emoji sequences correctly).
>
> I guess we could adjust the composition rules for U+FE0F, but getting
> that right could be tricky (there are many of them, and there will be
> ordering dependencies).

We could, but I'm not sure it's worth the hassle. There's no
particular reason for people to want to display VS-16 as an acronym,
of all the ways, since it almost always should be composed.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-11 14:01 UTC
To: Axel Svensson; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Tue, 9 Aug 2022 16:56:37 +0200
> Cc: 57072@debbugs.gnu.org
>
> See new patch attached.
>
> It turns out there are 256 variation selectors, so I've included some fixes for selectors 17-256 as well.
> admin/unidata/blocks.awk is an exception; it seems to deal with only VS 1-16, but I have not fixed it.
>
> > Why are the acronyms you propose so long? Why not use "VS01".."VS16"
> You're right, that is better. The attached patch is fixed to have shorter acronyms.
> The acronyms I've chosen are "VS-1" through "VS-9", "VS10" through "VS99" and "VS-100" through
> "VS-256".
> Not sure that's optimal, perhaps "VS01" or "VS 1" is better, what do you think?

I think "VS01" is better.

> diff --git a/lisp/international/characters.el b/lisp/international/characters.el
> index ca28222c81..78f8447208 100644
> --- a/lisp/international/characters.el
> +++ b/lisp/international/characters.el
> @@ -1243,7 +1243,8 @@ ?L
>                 (#x1E026 . #x1E02A)
>                 (#x1E8D0 . #x1E8D6)
>                 (#x1E944 . #x1E94A)
> -               (#xE0001 . #xE01EF))))
> +               (#xE0001 . #xE01EF)
> +               (#xE0100 . #xE01EF))))
>    (dolist (elt l)
>      (set-char-table-range char-width-table elt 0)))

This hunk is a mistake, I think: the original code already covered
the entire range of these characters.

Thanks.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-11 14:58 UTC
To: Eli Zaretskii; +Cc: 57072

> I think "VS01" is better.
Fixed to be "VS01" through "VS09", "VS10" through "VS99" and "VS 100" through "VS 256".

> This hunk is a mistake, I think
Good catch, fixed.

See new patch attached.

[-- Attachment #2: 0001-Fixes-for-variation-selectors.patch --]
From 033527ea3edcf414e28deb702eabfa5cea910487 Mon Sep 17 00:00:00 2001
From: Axel Svensson <mail@axelsvensson.com>
Date: Tue, 9 Aug 2022 01:11:02 +0200
Subject: [PATCH] Fixes for variation selectors

---
 doc/lispref/display.texi         |  6 +++---
 lisp/international/characters.el | 23 ++++++++++++++++++-----
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index ace67fbedb..96079dc106 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -8596,9 +8596,9 @@ Glyphless Chars
 images, such as U+00AD @sc{soft hyphen}.
 
 @item variation-selectors
-Unicode VS-1 through VS-16 (U+FE00 through U+FE0F), which are used to
-select between different glyphs for the same codepoints (typically
-emojis).
+Unicode VS-1 through VS-256 (U+FE00 through U+FE0F and U+E0100 through
+U+E01EF), which are used to select between different glyphs for the same
+codepoints (typically emojis).
 
 @item no-font
 Characters for which there is no suitable font, or which cannot be
diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index ca28222c81..d6e83c81e7 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -1525,6 +1525,17 @@ char-acronym-table
   (aset char-acronym-table (+ #xE0021 i) (format " %c TAG" (+ 33 i))))
 (aset char-acronym-table #xE007F "->|TAG")  ; CANCEL TAG
 
+(dotimes (i 256)
+  (let* ((vs-number (1+ i))
+         (codepoint (if (< i 16)
+                        (+ #xfe00 i)
+                      (+ #xe0100 i -16)))
+         (delimiter (cond ((<= vs-number 9) "0")
+                          ((<= vs-number 99) "")
+                          (t " "))))
+    (aset char-acronym-table codepoint
+          (format "VS%s%s" delimiter vs-number))))
+
 ;; We can't use the \N{name} things here, because this file is used
 ;; too early in the build process.
 (defvar bidi-control-characters
@@ -1574,7 +1585,9 @@ update-glyphless-char-display
                                             #x80 #x9F method))
            ((eq target 'variation-selectors)
             (glyphless-set-char-table-range glyphless-char-display
-                                            #xFE00 #xFE0F method))
+                                            #xFE00 #xFE0F method)
+            (glyphless-set-char-table-range glyphless-char-display
+                                            #xE0100 #xE01EF method))
            ((or (eq target 'format-control)
                 (eq target 'bidi-control))
             (when unicode-category-table
@@ -1647,10 +1660,10 @@ glyphless-char-display-control
                 that are relevant for bidirectional formatting control,
                 like U+2069 (PDI) and U+202B (RLE).
   `variation-selectors':
-                Characters in the range U+FE00..U+FE0F, used for
-                selecting alternate glyph presentations, such as
-                Emoji vs Text presentation, of the preceding
-                character(s).
+                Characters in the range U+FE00..U+FE0F and
+                U+E0100..U+E01EF, used for selecting alternate glyph
+                presentations, such as Emoji vs Text presentation, of
+                the preceding character(s).
   `no-font':    For GUI frames, characters for which no suitable
                 font is found; for text-mode frames, characters
                 that cannot be encoded by `terminal-coding-system'.
-- 
2.30.2
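The naming scheme of this final patch can be restated as two small functions, which makes the codepoint split and the three acronym widths easy to check in isolation. A sketch only: `my-vs-codepoint` and `my-vs-acronym` are hypothetical names, not part of Emacs, and they simply re-express the `dotimes` loop from the patch in terms of the selector number N.

```elisp
(defun my-vs-codepoint (n)
  "Return the codepoint of variation selector VS-N, for 1 <= N <= 256.
Hypothetical helper restating the loop in the patch."
  (unless (and (integerp n) (<= 1 n 256))
    (signal 'args-out-of-range (list n)))
  (if (<= n 16)
      (+ #xfe00 (1- n))       ; VS1..VS16   -> U+FE00..U+FE0F
    (+ #xe0100 (- n 17))))    ; VS17..VS256 -> U+E0100..U+E01EF

(defun my-vs-acronym (n)
  "Return the acronym the patch assigns to VS-N."
  (unless (and (integerp n) (<= 1 n 256))
    (signal 'args-out-of-range (list n)))
  (format "VS%s%d"
          (cond ((<= n 9) "0")   ; "VS01".."VS09"
                ((<= n 99) "")   ; "VS10".."VS99"
                (t " "))         ; "VS 100".."VS 256"
          n))
```

For example, `(my-vs-acronym 16)` returns "VS16" and `(my-vs-codepoint 256)` returns #xE01EF. In an Emacs that includes this patch, `(aref char-acronym-table (my-vs-codepoint n))` should equal `(my-vs-acronym n)` for every N.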
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-11 16:19 UTC
To: Axel Svensson; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Thu, 11 Aug 2022 16:58:37 +0200
> Cc: 57072@debbugs.gnu.org
>
> > I think "VS01" is better.
> Fixed to be "VS01" through "VS09", "VS10" through "VS99" and "VS 100"
> through "VS 256".
>
> > This hunk is a mistake, I think
> Good catch, fixed.
>
> See new patch attached.

Thanks, installed.

Please in the future accompany the changes with a ChangeLog-style log
message describing the specific changes.

This changeset was small enough to be accepted without your assigning
copyright to the FSF, but if you'd like to continue contributing to
Emacs, we'd need your legal paperwork vis-a-vis the FSF copyright
clerk. Would you like to start the paperwork rolling at this time?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-12 3:33 UTC
To: Eli Zaretskii; +Cc: 57072

> admin/unidata/blocks.awk is an exception; it seems to deal with only VS 1-16, but I have not fixed it.
How do we handle this one, should I file a new bug? I can't produce any unexpected behavior, I just think it looks odd, and I do not intend to fix it myself.

> Thanks, installed.
Great!

> Would you like to start the paperwork rolling at this time?
No thank you.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-12 5:53 UTC
To: Axel Svensson; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 05:33:59 +0200
> Cc: 57072@debbugs.gnu.org
>
> > admin/unidata/blocks.awk is an exception; it seems to deal with only VS 1-16, but I have not fixed it.
> How do we handle this one, should I file a new bug? I can't produce any unexpected behavior, I just think it
> looks odd, and I do not intend to fix it myself.

What does Unicode say about the functionality of the variation
selectors beyond VS-16?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-12 6:50 UTC
To: Eli Zaretskii; +Cc: 57072

> What does Unicode say about the functionality of the variation
> selectors beyond VS-16?

The code charts divide them into three groups:
- VS1 through VS14 are "Variation selectors" [1]
- VS15 through VS16 are "Emoji-specific variation selectors" [1]
- VS17 through VS256 are "Ideographic-specific variation selectors" [2]

The standard itself in chapter 23.4 [3] makes no distinction between
them but says that the only sanctioned uses that should have any effect
are the ones defined in:
- StandardizedVariants.txt [4] in the Unicode Character Database, which
  currently uses only VS1 through VS3. Confusingly though, some of them
  seem to be used for ideographic purposes.
- Unicode Technical Standard #51 for emojis [5], which says that VS15 is
  "used to request a text presentation for an emoji character" while
  VS16 is "used to request an emoji presentation for an emoji
  character".
- Unicode Technical Standard #37 for ideographic variation [6], which
  confirms that it only uses VS17 through VS256.

In any case, it seems that admin/unidata/blocks.awk needs fixing, since
it currently handles only VS1 through VS16 and does so as if they were
all for emoji use.

[1] https://www.unicode.org/charts/PDF/UFE00.pdf
[2] https://www.unicode.org/charts/PDF/UE0100.pdf
[3] https://www.unicode.org/versions/Unicode14.0.0/ch23.pdf
[4] https://www.unicode.org/Public/14.0.0/ucd/StandardizedVariants.txt
[5] https://www.unicode.org/reports/tr51/#Emoji_Variation_Sequences
[6] https://www.unicode.org/reports/tr37/
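The three code-chart groups listed above can be expressed as a small classifier over codepoints. A sketch only: `my-vs-group` and the returned symbols `general`, `emoji` and `ideographic` are hypothetical labels made up for illustration; the ranges follow from VS1 = U+FE00, VS16 = U+FE0F and VS17..VS256 = U+E0100..U+E01EF.

```elisp
(defun my-vs-group (char)
  "Classify CHAR by the Unicode code-chart grouping of variation selectors.
Return `general' for VS1..VS14, `emoji' for VS15..VS16,
`ideographic' for VS17..VS256, or nil if CHAR is not a variation
selector.  Hypothetical helper; the group symbols are illustrative."
  (cond ((<= #xfe00 char #xfe0d) 'general)       ; VS1..VS14
        ((<= #xfe0e char #xfe0f) 'emoji)         ; VS15 (text), VS16 (emoji)
        ((<= #xe0100 char #xe01ef) 'ideographic) ; VS17..VS256
        (t nil)))
```

Per UTS #51 as quoted above, only the `emoji` group (U+FE0E and U+FE0F) requests text or emoji presentation of the preceding character.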
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-12 7:10 UTC
To: Axel Svensson; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 08:50:18 +0200
> Cc: 57072@debbugs.gnu.org
>
> > What does Unicode say about the functionality of the variation
> > selectors beyond VS-16?
>
> The code charts divide them into three groups:
> - VS1 through VS14 are "Variation selectors" [1]
> - VS15 through VS16 are "Emoji-specific variation selectors" [1]
> - VS17 through VS256 are "Ideographic-specific variation selectors" [2]
>
> The standard itself in chapter 23.4 [3] makes no distinction between
> them but says that the only sanctioned uses that should have any effect
> are the ones defined in:
> - StandardizedVariants.txt [4] in the Unicode Character Database, which
>   currently uses only VS1 through VS3. Confusingly though, some of them
>   seem to be used for ideographic purposes.
> - Unicode Technical Standard #51 for emojis [5], which says that VS15 is
>   "used to request a text presentation for an emoji character" while
>   VS16 is "used to request an emoji presentation for an emoji
>   character".
> - Unicode Technical Standard #37 for ideographic variation [6], which
>   confirms that it only uses VS17 through VS256.
>
> In any case, it seems that admin/unidata/blocks.awk needs fixing, since
> it currently handles only VS1 through VS16 and does so as if they were
> all for emoji use.

AFAIR, blocks.awk does what it does only because VS16 has a special
function of requesting the Emoji presentation of characters that are
otherwise not Emoji, and our character-composition code needs to
realize that. Unless the selectors beyond VS16 have similar
functions, I don't see any reason why we'd need to modify blocks.awk.

Or what am I missing? IOW, to which part(s) of blocks.awk did you
allude when you wrote "it currently handles only VS1 through VS16 and
does so as if they were all for emoji use"?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-12 7:57 UTC
To: Eli Zaretskii; +Cc: 57072

> Or what am I missing? IOW, to which part(s) of blocks.awk did you
> allude when you wrote "it currently handles only VS1 through VS16 and
> does so as if they were all for emoji use"?

I initially thought it was a mistake to exclude VS17 through VS256, but
now I believe it might be a mistake to include VS1 through VS14. I don't
understand the internals enough to be sure, but one possible fix could
be:

diff --git a/admin/unidata/blocks.awk b/admin/unidata/blocks.awk
index 5f392b5ad3..c14fa09863 100755
--- a/admin/unidata/blocks.awk
+++ b/admin/unidata/blocks.awk
@@ -226,7 +226,7 @@ END {
     idx = 0
     # ## These are here so that font_range can choose Emoji presentation
     # ## for the preceding codepoint when it encounters a VS
-    override_start[idx] = "FE00"
+    override_start[idx] = "FE0E"
     override_end[idx] = "FE0F"
 
     for (k in override_start)
-- 
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Eli Zaretskii @ 2022-08-12 10:29 UTC
To: Axel Svensson; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 09:57:32 +0200
> Cc: 57072@debbugs.gnu.org
>
> I initially thought it was a mistake to exclude VS17 through VS256, but
> now I believe it might be a mistake to include VS1 through VS14. I don't
> understand the internals enough to be sure, but one possible fix could
> be:

Why do you think including VS1 through VS14 is a mistake?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
From: Axel Svensson @ 2022-08-12 11:51 UTC
To: Eli Zaretskii; +Cc: 57072

> Why do you think including VS1 through VS14 is a mistake?
It appears like blocks.awk somehow designates VS1 through VS14 for
emoji use, while the Unicode standard per [1] and [5] above seem to
exclude them from emoji use. I am not sure whether VS1 through VS14, or
VS17 through VS256 need to be designated to some other script by
blocks.awk.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
  2022-08-12 11:51 ` Axel Svensson
@ 2022-08-12 12:46   ` Eli Zaretskii
  2022-08-16  8:05     ` Robert Pluim
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-12 12:46 UTC
  To: Axel Svensson, Robert Pluim; +Cc: 57072

> From: Axel Svensson <svenssonaxel@gmail.com>
> Date: Fri, 12 Aug 2022 13:51:21 +0200
> Cc: 57072@debbugs.gnu.org
>
> > Why do you think including VS1 through VS14 is a mistake?
> It appears like blocks.awk somehow designates VS1 through VS14 for
> emoji use, while the Unicode standard per [1] and [5] above seem to
> exclude them from emoji use. I am not sure whether VS1 through VS14, or
> VS17 through VS256 need to be designated to some other script by
> blocks.awk.

So you are saying that we should exclude VS1 through VS14 from the
Emoji script?

Robert, do you remember why we included them in the script?

As for VS17 and above, I'm not sure we should assign them to any
script. Perhaps to Han?
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
  2022-08-12 12:46 ` Eli Zaretskii
@ 2022-08-16  8:05   ` Robert Pluim
  2022-08-16 13:06     ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Pluim @ 2022-08-16  8:05 UTC
  To: Eli Zaretskii; +Cc: 57072, Axel Svensson

>>>>> On Fri, 12 Aug 2022 15:46:58 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Axel Svensson <svenssonaxel@gmail.com>
    >> Date: Fri, 12 Aug 2022 13:51:21 +0200
    >> Cc: 57072@debbugs.gnu.org
    >>
    >> > Why do you think including VS1 through VS14 is a mistake?
    >> It appears like blocks.awk somehow designates VS1 through VS14 for
    >> emoji use, while the Unicode standard per [1] and [5] above seem to
    >> exclude them from emoji use. I am not sure whether VS1 through VS14, or
    >> VS17 through VS256 need to be designated to some other script by
    >> blocks.awk.

    Eli> So you are saying that we should exclude VS1 through VS14 from the
    Eli> Emoji script?

    Eli> Robert, do you remember why we included them in the script?

Hmm. Ignorance on my part seems the most likely explanation. VS1-14
are not used for emoji/text presentation selection, so we should
probably just fix blocks.awk

    Eli> As for VS17 and above, I'm not sure we should assign them to any
    Eli> script. Perhaps to Han?

What problems are caused by them not having a script? The composition
rules for them with Han codepoints work now, no?

Robert
--
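For readers following along, the code-point layout behind Robert's point can be made concrete with a short Python sketch. It is not Emacs code, and the helper names `vs_number` and `is_presentation_selector` are hypothetical. It assumes only the Unicode block layout: VS1 through VS16 are U+FE00..U+FE0F, VS17 through VS256 are U+E0100..U+E01EF, and of all of these only VS15 (U+FE0E, text presentation) and VS16 (U+FE0F, emoji presentation) select between text and emoji presentation.

```python
from typing import Optional

# Variation selector layout in Unicode:
#   VS1..VS16   -> U+FE00..U+FE0F   (Variation Selectors block)
#   VS17..VS256 -> U+E0100..U+E01EF (Variation Selectors Supplement)
TEXT_PRESENTATION = 0xFE0E   # VS15
EMOJI_PRESENTATION = 0xFE0F  # VS16


def vs_number(cp: int) -> Optional[int]:
    """Return N for VS-N, or None if cp is not a variation selector."""
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17
    return None


def is_presentation_selector(cp: int) -> bool:
    """True only for VS15 and VS16, the text/emoji presentation selectors."""
    return cp in (TEXT_PRESENTATION, EMOJI_PRESENTATION)


if __name__ == "__main__":
    for cp in (0xFE00, 0xFE0D, 0xFE0E, 0xFE0F, 0xE0100, 0xE01EF):
        print(f"U+{cp:04X}  VS{vs_number(cp)}  "
              f"presentation={is_presentation_selector(cp)}")
```

Under these assumptions VS1 through VS14 (and VS17 and above) come out as non-presentation selectors, which is the distinction the blocks.awk override range was not making.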
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
  2022-08-16  8:05 ` Robert Pluim
@ 2022-08-16 13:06   ` Eli Zaretskii
  2022-08-16 13:27     ` Robert Pluim
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-16 13:06 UTC
  To: Robert Pluim; +Cc: 57072, svenssonaxel

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Axel Svensson <svenssonaxel@gmail.com>, 57072@debbugs.gnu.org
> Date: Tue, 16 Aug 2022 10:05:12 +0200
>
>     Eli> Robert, do you remember why we included them in the script?
>
> Hmm. Ignorance on my part seems the most likely explanation. VS1-14
> are not used for emoji/text presentation selection, so we should
> probably just fix blocks.awk

According to the comment in composite.el, we should leave only VS-16
in the 'emoji' script.

>     Eli> As for VS17 and above, I'm not sure we should assign them to any
>     Eli> script. Perhaps to Han?
>
> What problems are caused by them not having a script? The composition
> rules for them with Han codepoints work now, no?

Yes, because they are set up in composite.el. So I think we are good
there.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
  2022-08-16 13:06 ` Eli Zaretskii
@ 2022-08-16 13:27   ` Robert Pluim
  2022-08-16 13:39     ` Axel Svensson
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Pluim @ 2022-08-16 13:27 UTC
  To: Eli Zaretskii; +Cc: 57072, svenssonaxel

>>>>> On Tue, 16 Aug 2022 16:06:12 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Cc: Axel Svensson <svenssonaxel@gmail.com>, 57072@debbugs.gnu.org
    >> Date: Tue, 16 Aug 2022 10:05:12 +0200
    >> Eli> Robert, do you remember why we included them in the script?
    >>
    >> Hmm. Ignorance on my part seems the most likely explanation. VS1-14
    >> are not used for emoji/text presentation selection, so we should
    >> probably just fix blocks.awk

    Eli> According to the comment in composite.el, we should leave only VS-16
    Eli> in the 'emoji' script.

Yes. Thank you past-me for reminding present-us (Iʼd forgotten Iʼd
written that 😀)

I can do that later this week, unless the reporter of this bug wants
to handle it?

Robert
--
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
  2022-08-16 13:27 ` Robert Pluim
@ 2022-08-16 13:39   ` Axel Svensson
  2022-08-16 14:48     ` Robert Pluim
  0 siblings, 1 reply; 26+ messages in thread
From: Axel Svensson @ 2022-08-16 13:39 UTC
  To: Robert Pluim; +Cc: 57072, Eli Zaretskii

> I can do that later this week, unless the reporter of this bug wants
> to handle it?

Nope, go ahead.
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
  2022-08-16 13:39 ` Axel Svensson
@ 2022-08-16 14:48   ` Robert Pluim
  2022-08-16 16:27     ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Pluim @ 2022-08-16 14:48 UTC
  To: Axel Svensson; +Cc: 57072, Eli Zaretskii

>>>>> On Tue, 16 Aug 2022 15:39:49 +0200, Axel Svensson <svenssonaxel@gmail.com> said:

    >> I can do that later this week, unless the reporter of this bug wants
    >> to handle it?

    Axel> Nope, go ahead.

This doesnʼt have any negative effects on emoji display (using
admin/unidata/emoji-{zwj-,}sequences.txt) that I can see. Iʼll test
some more and push by the end of the week.

diff --git a/admin/unidata/blocks.awk b/admin/unidata/blocks.awk
index 5f392b5ad3..0d07a10f2a 100755
--- a/admin/unidata/blocks.awk
+++ b/admin/unidata/blocks.awk
@@ -224,9 +224,11 @@ FILENAME ~ "emoji-data.txt" && /^[0-9A-F].*; Emoji_Presentation / {
 
 END {
     idx = 0
-    # ## These are here so that font_range can choose Emoji presentation
-    # ## for the preceding codepoint when it encounters a VS
-    override_start[idx] = "FE00"
+    ## This is here so that font_range can choose Emoji presentation
+    ## for the preceding codepoint when it encounters a VS-16 (U+FE0F).
+    ## It originally covered the whole FE00-FE0F range, but that
+    ## turned out to be a mistake.
+    override_start[idx] = "FE0F"
     override_end[idx] = "FE0F"
 
     for (k in override_start)
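The three `override_start`/`override_end` settings that came up in this thread (the original FE00-FE0F, Axel's proposed FE0E-FE0F, and Robert's FE0F-FE0F) differ only in which of VS1 through VS16 they cover. The Python sketch below only mimics an inclusive hex range check; it is not the actual blocks.awk logic, and the function name `vs_in_override` is hypothetical.

```python
from typing import List


def vs_in_override(start_hex: str, end_hex: str) -> List[str]:
    """List the VS-N names (VS1..VS16) whose code points lie in [start, end]."""
    start, end = int(start_hex, 16), int(end_hex, 16)
    names = []
    for cp in range(0xFE00, 0xFE10):  # U+FE00..U+FE0F, i.e. VS1..VS16
        if start <= cp <= end:
            names.append(f"VS{cp - 0xFE00 + 1}")
    return names


if __name__ == "__main__":
    ranges = {
        "original (FE00-FE0F)": ("FE00", "FE0F"),
        "proposed 2022-08-12 (FE0E-FE0F)": ("FE0E", "FE0F"),
        "patch 2022-08-16 (FE0F-FE0F)": ("FE0F", "FE0F"),
    }
    for label, (s, e) in ranges.items():
        print(label, vs_in_override(s, e))
```

The original range sweeps in all sixteen selectors, the first proposal keeps VS15 and VS16, and Robert's patch keeps only VS16, which matches Eli's reading of the comment in composite.el.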
* bug#57072: [BUG] update-glyphless-char-display and variation selectors
  2022-08-16 14:48 ` Robert Pluim
@ 2022-08-16 16:27   ` Eli Zaretskii
  0 siblings, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2022-08-16 16:27 UTC
  To: Robert Pluim; +Cc: 57072, svenssonaxel

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, 57072@debbugs.gnu.org
> Date: Tue, 16 Aug 2022 16:48:07 +0200
>
> --- a/admin/unidata/blocks.awk
> +++ b/admin/unidata/blocks.awk
> @@ -224,9 +224,11 @@ FILENAME ~ "emoji-data.txt" && /^[0-9A-F].*; Emoji_Presentation / {
>
>  END {
>      idx = 0
> -    # ## These are here so that font_range can choose Emoji presentation
> -    # ## for the preceding codepoint when it encounters a VS
> -    override_start[idx] = "FE00"
> +    ## This is here so that font_range can choose Emoji presentation
> +    ## for the preceding codepoint when it encounters a VS-16 (U+FE0F).
> +    ## It originally covered the whole FE00-FE0F range, but that
> +    ## turned out to be a mistake.
> +    override_start[idx] = "FE0F"
>      override_end[idx] = "FE0F"
>
>      for (k in override_start)
>

That LGTM, thanks. But please mention in the comment the stuff in
composite.el which handles the other variation selectors, so that
these different places would be easier to find and inspect.
end of thread, other threads:[~2022-08-16 16:27 UTC | newest]

Thread overview: 26+ messages
2022-08-09  8:38 bug#57072: [BUG] update-glyphless-char-display and variation selectors Axel Svensson
2022-08-09  8:40 ` bug#57073: " Axel Svensson
2022-08-09 11:36 ` bug#57072: " Eli Zaretskii
2022-08-09 14:56   ` Axel Svensson
2022-08-09 16:23     ` Eli Zaretskii
2022-08-09 20:33       ` Axel Svensson
2022-08-10 13:10         ` Eli Zaretskii
2022-08-16 11:55           ` Robert Pluim
2022-08-16 12:01             ` Eli Zaretskii
2022-08-11 14:01         ` Eli Zaretskii
2022-08-11 14:58           ` Axel Svensson
2022-08-11 16:19             ` Eli Zaretskii
2022-08-12  3:33               ` Axel Svensson
2022-08-12  5:53                 ` Eli Zaretskii
2022-08-12  6:50                   ` Axel Svensson
2022-08-12  7:10                     ` Eli Zaretskii
2022-08-12  7:57                       ` Axel Svensson
2022-08-12 10:29                         ` Eli Zaretskii
2022-08-12 11:51                           ` Axel Svensson
2022-08-12 12:46                             ` Eli Zaretskii
2022-08-16  8:05                               ` Robert Pluim
2022-08-16 13:06                                 ` Eli Zaretskii
2022-08-16 13:27                                   ` Robert Pluim
2022-08-16 13:39                                     ` Axel Svensson
2022-08-16 14:48                                       ` Robert Pluim
2022-08-16 16:27                                         ` Eli Zaretskii
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git