unofficial mirror of bug-gnu-emacs@gnu.org 
From: Axel Svensson <svenssonaxel@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 57072@debbugs.gnu.org
Subject: bug#57072: [BUG] update-glyphless-char-display and variation selectors
Date: Tue, 9 Aug 2022 16:56:37 +0200
Message-ID: <CAJ40yazh2uaaZPK=beRtMx8FLpZ03doLa7NoyvQiOVQDKL80og@mail.gmail.com>
In-Reply-To: <83edxpvec4.fsf@gnu.org>


[-- Attachment #1.1: Type: text/plain, Size: 1763 bytes --]

> You are suggesting an enhancement (which is fine).
Acknowledged.

See new patch attached.

It turns out there are 256 variation selectors, so I've included some fixes
for selectors 17-256 (U+E0100 through U+E01EF) as well.
admin/unidata/blocks.awk is an exception; it seems to handle only VS
1-16, but I have not fixed it.
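
For reference, the numbering maps onto the two code point blocks roughly
as sketched below. `my-vs-codepoint' is a hypothetical helper for
illustration only; it is not part of Emacs or of the attached patch.

```elisp
;; Hypothetical helper (an assumption for illustration, not in Emacs).
(defun my-vs-codepoint (n)
  "Return the code point of variation selector number N (1..256)."
  (cond ((<= 1 n 16) (+ #xFE00 (1- n)))       ; VS1..VS16   -> U+FE00..U+FE0F
        ((<= 17 n 256) (+ #xE0100 (- n 17)))  ; VS17..VS256 -> U+E0100..U+E01EF
        (t (error "No such variation selector: %d" n))))

(my-vs-codepoint 16)   ; => #xFE0F
(my-vs-codepoint 17)   ; => #xE0100
(my-vs-codepoint 256)  ; => #xE01EF
```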

> Why are the acronyms you propose so long?  Why not use "VS01".."VS16"
You're right, that is better. The attached patch now uses shorter
acronyms.
The acronyms I've chosen are "VS-1" through "VS-9", "VS10" through "VS99"
and "VS-100" through "VS-256".
I'm not sure that's optimal; perhaps "VS01" or "VS 1" would be better.
What do you think?
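
To make the comparison concrete, here is a small sketch of the two
naming schemes side by side. Both function names are hypothetical and
exist only for this comparison; the first mirrors the format used in the
attached patch, the second is the zero-padded alternative.

```elisp
;; Hypothetical helpers, for comparison only (not part of Emacs).
(defun my-vs-acronym-dashed (n)
  "Scheme in the attached patch: VS-1, VS10, VS-100."
  (format "VS%s%d" (if (<= 10 n 99) "" "-") n))

(defun my-vs-acronym-padded (n)
  "Alternative scheme: pad to at least two digits: VS01, VS10, VS100."
  (format "VS%02d" n))

(mapcar #'my-vs-acronym-dashed '(1 16 100 256))
;; => ("VS-1" "VS16" "VS-100" "VS-256")
(mapcar #'my-vs-acronym-padded '(1 16 100 256))
;; => ("VS01" "VS16" "VS100" "VS256")
```

With the dashed scheme the acronyms are 4 characters long up to VS99 and
6 characters from VS-100 on; with zero padding they are 4 and 5.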

> Please show a recipe for that starting from "emacs -Q".

To reproduce:
1) Start emacs -Q under X11.
2) Evaluate:

(progn
  (let ((vs-acronyms
         '("VS01" "VS02" "VS03" "VS04"
           "VS05" "VS06" "VS07" "VS08"
           "VS09" "VS10" "VS11" "VS12"
           "VS13" "VS14" "VS15" "VS16")))
    (dotimes (i 16)
      (aset char-acronym-table (+ #xfe00 i) (car vs-acronyms))
      (setq vs-acronyms (cdr vs-acronyms))))
  (update-glyphless-char-display
   'glyphless-char-display-control
   '((format-control . acronym)
     (variation-selectors . acronym)
     (no-font . hex-code)))
  (insert #xfe00 #xfe01 #xfe0e #xfe0f))

Expected:
Four boxes are shown, all of which contain "VS" in the upper half, and in
the lower half "01", "02", "15" and "16" respectively.

Actual:
The first three boxes appear as expected, but the fourth is empty.
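
To check whether the tables themselves differ for U+FE0F, something like
the following can be evaluated after the recipe. This is only a
diagnostic sketch; `my-vs-table-entries' is a hypothetical helper, while
`char-acronym-table' and `glyphless-char-display' are the char-tables
used in the recipe and the patch.

```elisp
;; Hypothetical diagnostic helper (not part of Emacs).
(defun my-vs-table-entries (chars)
  "Return a list of (CHAR ACRONYM DISPLAY-METHOD) for each of CHARS."
  (mapcar (lambda (c)
            (list c
                  (aref char-acronym-table c)       ; acronym string, if any
                  (aref glyphless-char-display c))) ; display method, if any
          chars))

;; Print the entries for the four characters inserted by the recipe.
(pp (my-vs-table-entries '(#xfe00 #xfe01 #xfe0e #xfe0f)))
```

If the entry for U+FE0F matches the other three, the tables are
presumably fine and the difference arises somewhere later in display.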

Throughout the codebase, I see U+FE0F sometimes singled out and treated
differently from the other variation selectors, so this isn't entirely
surprising. Places where this happens include:
- admin/unidata/emoji-data.txt:778
- admin/unidata/emoji-zwj.awk:102
- lisp/composite.el:856

[-- Attachment #1.2: Type: text/html, Size: 2620 bytes --]

[-- Attachment #2: 0001-Fixes-for-variation-selectors.patch --]
[-- Type: text/x-patch, Size: 3531 bytes --]

From a4ec9eae3de84bf9501c0d3f97ccade600716634 Mon Sep 17 00:00:00 2001
From: Axel Svensson <mail@axelsvensson.com>
Date: Tue, 9 Aug 2022 01:11:02 +0200
Subject: [PATCH] Fixes for variation selectors

---
 doc/lispref/display.texi         |  6 +++---
 lisp/international/characters.el | 24 ++++++++++++++++++------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/doc/lispref/display.texi b/doc/lispref/display.texi
index ace67fbedb..96079dc106 100644
--- a/doc/lispref/display.texi
+++ b/doc/lispref/display.texi
@@ -8596,9 +8596,9 @@ Glyphless Chars
 images, such as U+00AD @sc{soft hyphen}.
 
 @item variation-selectors
-Unicode VS-1 through VS-16 (U+FE00 through U+FE0F), which are used to
-select between different glyphs for the same codepoints (typically
-emojis).
+Unicode VS-1 through VS-256 (U+FE00 through U+FE0F and U+E0100 through
+U+E01EF), which are used to select between different glyphs for the same
+codepoints (typically emojis).
 
 @item no-font
 Characters for which there is no suitable font, or which cannot be
diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index ca28222c81..78f8447208 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -1243,7 +1243,8 @@ ?L
 	   (#x1E026 . #x1E02A)
 	   (#x1E8D0 . #x1E8D6)
 	   (#x1E944 . #x1E94A)
-	   (#xE0001 . #xE01EF))))
+	   (#xE0001 . #xE01EF)
+	   (#xE0100 . #xE01EF))))
   (dolist (elt l)
     (set-char-table-range char-width-table elt 0)))
 
@@ -1525,6 +1526,15 @@ char-acronym-table
   (aset char-acronym-table (+ #xE0021 i) (format " %c TAG" (+ 33 i))))
 (aset char-acronym-table #xE007F "->|TAG") ; CANCEL TAG
 
+(dotimes (i 256)
+  (let* ((vs-number (1+ i))
+         (codepoint (if (< i 16)
+                        (+ #xfe00 i)
+                      (+ #xe0100 i -16)))
+         (dash (if (<= 10 vs-number 99) "" "-")))
+    (aset char-acronym-table codepoint
+          (format "VS%s%s" dash vs-number))))
+
 ;; We can't use the \N{name} things here, because this file is used
 ;; too early in the build process.
 (defvar bidi-control-characters
@@ -1574,7 +1584,9 @@ update-glyphless-char-display
 					     #x80 #x9F method))
 	    ((eq target 'variation-selectors)
 	     (glyphless-set-char-table-range glyphless-char-display
-					     #xFE00 #xFE0F method))
+					     #xFE00 #xFE0F method)
+             (glyphless-set-char-table-range glyphless-char-display
+					     #xE0100 #xE01EF method))
 	    ((or (eq target 'format-control)
                  (eq target 'bidi-control))
 	     (when unicode-category-table
@@ -1647,10 +1659,10 @@ glyphless-char-display-control
                     that are relevant for bidirectional formatting control,
                     like U+2069 (PDI) and U+202B (RLE).
   `variation-selectors':
-                    Characters in the range U+FE00..U+FE0F, used for
-                    selecting alternate glyph presentations, such as
-                    Emoji vs Text presentation, of the preceding
-                    character(s).
+                    Characters in the range U+FE00..U+FE0F and
+                    U+E0100..U+E01EF, used for selecting alternate glyph
+                    presentations, such as Emoji vs Text presentation, of
+                    the preceding character(s).
   `no-font':        For GUI frames, characters for which no suitable
                     font is found; for text-mode frames, characters
                     that cannot be encoded by `terminal-coding-system'.
-- 
2.30.2

