* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
@ 2021-11-23 23:01 PAVLOS MARAGAKIS via Bug reports for GNU Emacs, the Swiss army knife of text editors
[not found] ` <handler.52067.B.16377084873203.ack@debbugs.gnu.org>
2021-11-24 4:58 ` bug#52067: possible fix for " Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 2 replies; 6+ messages in thread
From: PAVLOS MARAGAKIS via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-23 23:01 UTC (permalink / raw)
To: 52067
In a clean instance of emacs paste the following lines
into the scratch buffer and evaluate the lines starting
with string-glyph-split. I show the outputs below:
(string-glyph-split "🌍🦹")
("🌍" "🦹")
(string-glyph-split "✈️✈️")
("✈️" "✈️")
(string-glyph-split "🌍✈️")
("🌍" "✈️")
(string-glyph-split "✈️🌍")
Evaluating the last form hangs Emacs: the call never returns until it is interrupted with C-g.
The expected behavior is to split the string into two glyphs.
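For reference, the failing string is three characters long but should display as two glyphs: the airplane is U+2708 followed by the variation selector U+FE0F, and the globe is the single character U+1F30D. A minimal sketch that shows this without relying on copy and paste; the names `pm-demo-failing-string` and `pm-demo-code-points` are made up for this illustration:

```elisp
;; Build the failing string from code points so the example does not
;; depend on how the emoji survive copy and paste.
(defvar pm-demo-failing-string (string #x2708 #xFE0F #x1F30D)
  "Hypothetical name: AIRPLANE, VARIATION SELECTOR-16, then a globe emoji.")

(defun pm-demo-code-points (string)
  "Return the characters of STRING as a list of \"U+XXXX\" strings."
  (mapcar (lambda (c) (format "U+%04X" c)) string))

(length pm-demo-failing-string)              ; => 3
(pm-demo-code-points pm-demo-failing-string) ; => ("U+2708" "U+FE0F" "U+1F30D")
```

So string positions 0 and 1 belong to the airplane glyph and position 2 is the globe.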
In GNU Emacs 29.0.50 (build 2, aarch64-apple-darwin21.1.0, NS appkit-2113.00 Version 12.0.1 (Build 21A559))
of 2021-11-21 built on MacbookPro13.local
Repository revision: b7db7eb2c7b8ac1bddf4afa9ccf9b30ebeb0224e
Repository branch: master
Windowing system distributor 'Apple', version 10.3.2113
System Description: macOS 12.0.1
Configured using:
'configure --disable-silent-rules
--enable-locallisppath=/usr/local/share/emacs/28.0.50/site-lisp
--prefix=/usr/local/opt/gccemacs --without-dbus --without-imagemagick
--with-mailutils --with-ns --disable-ns-self-contained --with-cairo
--with-modules --with-xml2 --with-gnutls --with-json --with-rsvg
--with-native-compilation CC=/usr/bin/clang
CFLAGS=-I/opt/homebrew/lib/gcc/11/include
'LDFLAGS=-L/opt/homebrew/lib/gcc/11/
-I/opt/homebrew/lib/gcc/11/include'
CPPFLAGS=-I/opt/homebrew/opt/libffi/include
'PKG_CONFIG_PATH=/opt/homebrew/opt/libffi/lib/pkgconfig --no-create
--no-recursion''
Configured features:
ACL GLIB GNUTLS JSON LCMS2 LIBXML2 MODULES NATIVE_COMP NOTIFY KQUEUE NS
PDUMPER PNG RSVG THREADS TOOLKIT_SCROLL_BARS WEBP XIM ZLIB
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message mailcap yank-media rmc puny
dired dired-loaddefs rfc822 mml mml-sec epa derived epg rfc6068
epg-config gnus-util rmail rmail-loaddefs auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map
text-property-search seq gv byte-opt bytecomp byte-compile cconv
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils time-date subr-x help-fns radix-tree cl-print debug backtrace
find-func help-mode cl-loaddefs cl-lib iso-transl tooltip eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/ns-win ns-win ucs-normalize mule-util term/common-win tool-bar dnd
fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow
isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget keymap hashtable-print-readable backquote threads
kqueue cocoa ns lcms2 multi-tty make-network-process native-compile
emacs)
Memory information:
((conses 16 76126 7924)
(symbols 48 7053 0)
(strings 32 21398 1914)
(string-bytes 1 725508)
(vectors 16 15268)
(vector-slots 8 317989 13284)
(floats 8 26 61)
(intervals 56 341 0)
(buffers 992 13))
* bug#52067: Acknowledgement (29.0.50; string-glyph-split halts on certain emoji strings)
[not found] ` <handler.52067.B.16377084873203.ack@debbugs.gnu.org>
@ 2021-11-24 3:51 ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-11-24 7:30 ` bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings Lars Ingebrigtsen
0 siblings, 1 reply; 6+ messages in thread
From: Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-24 3:51 UTC (permalink / raw)
To: 52067
The logic in string-glyph-split expects the first two elements in the result
from find-composition-internal to give the start and end of a multibyte grapheme
and return nil when there is a regular character at position POS. However, this
isn't always the case.
Let's call x the argument POS in find-composition-internal,
and "interval" the first two elements of the return value.
The following example works as expected: an x of 0 or 1 returns the interval (0 2),
and an x of 2 or 3 returns (2 4).
(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️✈️" nil)))
          '(0 1 2 3 4))))
((0
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(1
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(2
(2 4
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(3
(2 4
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(4 nil))
nil
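In the following case, however, an x of 2 returns the interval (0 2), which lies
entirely before x. string-glyph-split then sets its start to the end of that
interval, which is 2 again, so it never advances.
(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️🌍" nil)))
          '(0 1 2 3))))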
((0
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(1
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(2
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(3 nil))
nil
Interestingly, in the following case, x values of 0, 1, 2, and 3 all return (0 2).
(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️🌍🌍" nil)))
          '(0 1 2 3 4))))
((0
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(1
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(2
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(3
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(4 nil))
nil
And in the following case, an x of 2 still returns (0 2), while an x of 3 returns (3 5):
(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️🌍✈️" nil)))
          '(0 1 2 3 4 5))))
((0
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(1
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(2
(0 2
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(3
(3 5
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(4
(3 5
[[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
296
[0 1 9992 233 23 0 23 18 4 nil]]))
(5 nil))
nil
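The effect of these intervals on the loop in string-glyph-split can be reproduced without any fonts. The sketch below hard-codes the intervals printed above for the three-character string (x of 0, 1, or 2 gives (0 2); x of 3 gives nil) and traces the start positions the loop would visit. Both function names are hypothetical, and the step cap exists only so that the demonstration itself cannot hang:

```elisp
(defun pm-demo-observed-interval (pos)
  "Return the interval reported above for the airplane+globe string at POS.
Hard-coded from the printed results; this does not call the real function."
  (if (<= pos 2) '(0 2) nil))

(defun pm-demo-trace-starts (finder len max-steps)
  "Trace the START values visited by a loop shaped like string-glyph-split.
FINDER maps a position to an interval (START END) or nil, LEN is the
string length, and MAX-STEPS bounds the number of iterations."
  (let ((start 0) (steps 0) (trace nil))
    (while (and (< start len) (< steps max-steps))
      (push start trace)
      (let ((comp (funcall finder start)))
        ;; Same update rule as the original: jump to the interval end if a
        ;; composition was found, otherwise advance by one character.
        (setq start (if comp (cadr comp) (1+ start))))
      (setq steps (1+ steps)))
    (nreverse trace)))

(pm-demo-trace-starts #'pm-demo-observed-interval 3 5) ; => (0 2 2 2 2)
```

The start position reaches 2 and stays there, which matches the hang in the report.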
* bug#52067: possible fix for string-glyph-split halts on certain emoji strings.
2021-11-23 23:01 bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings PAVLOS MARAGAKIS via Bug reports for GNU Emacs, the Swiss army knife of text editors
[not found] ` <handler.52067.B.16377084873203.ack@debbugs.gnu.org>
@ 2021-11-24 4:58 ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 0 replies; 6+ messages in thread
From: Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-24 4:58 UTC (permalink / raw)
To: 52067
The following code fixes this bug, though there might be better ways to fix it for someone who understands the domain.
I don't know much about glyph/grapheme representations, so although this code passes my limited tests, it may break other things.
(defun pm-string-glyph-split (string)
  "Split STRING into a list of strings representing separate glyphs.
This takes into account combining characters and grapheme clusters."
  (let ((result nil)
        (start 0)
        (laststart -1) ; the last start of a character with the composition property
        comp)
    (while (< start (length string))
      (setq comp (find-composition-internal start nil string nil))
      (if (and comp (/= laststart (car comp))) ; check that we don't return to the same start
          (progn
            (push (substring string (car comp) (cadr comp)) result)
            (setq laststart start) ; keep the start of the last successful search
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))
Compare to the original:
(defun string-glyph-split (string)
  "Split STRING into a list of strings representing separate glyphs.
This takes into account combining characters and grapheme clusters."
  (let ((result nil)
        (start 0)
        comp)
    (while (< start (length string))
      (if (setq comp (find-composition-internal start nil string nil))
          (progn
            (push (substring string (car comp) (cadr comp)) result)
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))
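One font-independent property that any version of the splitter should satisfy is that its pieces are non-empty and concatenate back to the input. A minimal sketch of such a check follows; `pm-demo-split-roundtrip-p` is a hypothetical helper, and a trivial one-character-per-piece splitter stands in for the real function so that the example runs anywhere:

```elisp
(defun pm-demo-split-roundtrip-p (splitter string)
  "Return non-nil if SPLITTER cuts STRING into pieces that rebuild it.
SPLITTER is a function from a string to a list of strings."
  (let ((pieces (funcall splitter string)))
    (and (not (member "" pieces))                    ; every piece is non-empty
         (string= (apply #'concat pieces) string)))) ; nothing lost or repeated

;; Stand-in splitter: one piece per character.
(pm-demo-split-roundtrip-p (lambda (s) (mapcar #'string s)) "abc") ; => t
```

The same check could be applied to #'pm-string-glyph-split. It should not be run with the unmodified string-glyph-split on the failing strings, since that call does not return.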
* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
2021-11-24 3:51 ` bug#52067: Acknowledgement (29.0.50; string-glyph-split halts on certain emoji strings) Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-11-24 7:30 ` Lars Ingebrigtsen
2021-11-24 15:15 ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 6+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-24 7:30 UTC (permalink / raw)
To: Paul Maragakis; +Cc: 52067
Paul Maragakis <paul.maragakis@icloud.com> writes:
> The logic in string-glyph-split expects the first two elements in the result
> from find-composition-internal to give the start and end of a multibyte grapheme
> and return nil when there is a regular character at position POS. However, this
> isn't always the case.
Yup.
Paul Maragakis <paul.maragakis@icloud.com> writes:
> The following code fixes this bug, though there might be better ways
> to fix it for someone who understands the domain.
Thanks. `find-composition' takes a LIMIT parameter, and that'll
make it avoid searching back into the bit of the string that we've
already handled. So I did that instead in Emacs 29.
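A sketch of how a LIMIT argument could be threaded through the loop. This is not copied from the committed change: the function name is hypothetical, the choice of one position past START as the limit is an assumption, and the extra progress check is a belt-and-braces guard added for this illustration.

```elisp
(defun pm-demo-glyph-split-with-limit (string)
  "Split STRING into glyph strings without searching back before START.
Hypothetical sketch; not taken from the Emacs sources."
  (let ((result nil)
        (start 0)
        comp)
    (while (< start (length string))
      ;; Assumption: a LIMIT just past START is meant to keep the search
      ;; from returning a composition that lies before START.
      (setq comp (find-composition-internal
                  start (min (length string) (1+ start)) string nil))
      (if (and comp (> (cadr comp) start)) ; extra guard: accept only progress
          (progn
            (push (substring string (car comp) (cadr comp)) result)
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))
```

Every iteration moves start forward by at least one character, so the loop terminates for any input. What it returns for emoji strings still depends on the fonts and composition rules of the running Emacs.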
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
2021-11-24 7:30 ` bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings Lars Ingebrigtsen
@ 2021-11-24 15:15 ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-11-24 16:14 ` Lars Ingebrigtsen
0 siblings, 1 reply; 6+ messages in thread
From: Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-24 15:15 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: 52067
Excellent---and thanks for the explanation!
I confirm that the latest Emacs 29 fixes the bug.
You can close this ticket.
Paul
* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
2021-11-24 15:15 ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-11-24 16:14 ` Lars Ingebrigtsen
0 siblings, 0 replies; 6+ messages in thread
From: Lars Ingebrigtsen @ 2021-11-24 16:14 UTC (permalink / raw)
To: Paul Maragakis; +Cc: 52067
Paul Maragakis <paul.maragakis@icloud.com> writes:
> Excellent---and thanks for the explanation!
> I confirm that the latest Emacs 29 fixes the bug.
> You can close this ticket.
Thanks for checking; closed now.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no