unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
@ 2021-11-23 23:01 PAVLOS MARAGAKIS via Bug reports for GNU Emacs, the Swiss army knife of text editors
From: PAVLOS MARAGAKIS via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-23 23:01 UTC (permalink / raw)
  To: 52067


In a clean instance of Emacs, paste the following lines
into the *scratch* buffer and evaluate the lines starting
with string-glyph-split.  The outputs are shown below:

(string-glyph-split "🌍🦹")
("🌍" "🦹")

(string-glyph-split "✈️✈️")
("✈️" "✈️")

(string-glyph-split "🌍✈️")
("🌍" "✈️")

(string-glyph-split "✈️🌍")

The last line hangs Emacs: the evaluation never returns, although C-g
interrupts it.  The expected behavior is to split the string into two
glyphs, ("✈️" "🌍").




In GNU Emacs 29.0.50 (build 2, aarch64-apple-darwin21.1.0, NS appkit-2113.00 Version 12.0.1 (Build 21A559))
of 2021-11-21 built on MacbookPro13.local
Repository revision: b7db7eb2c7b8ac1bddf4afa9ccf9b30ebeb0224e
Repository branch: master
Windowing system distributor 'Apple', version 10.3.2113
System Description:  macOS 12.0.1

Configured using:
'configure --disable-silent-rules
--enable-locallisppath=/usr/local/share/emacs/28.0.50/site-lisp
--prefix=/usr/local/opt/gccemacs --without-dbus --without-imagemagick
--with-mailutils --with-ns --disable-ns-self-contained --with-cairo
--with-modules --with-xml2 --with-gnutls --with-json --with-rsvg
--with-native-compilation CC=/usr/bin/clang
CFLAGS=-I/opt/homebrew/lib/gcc/11/include
'LDFLAGS=-L/opt/homebrew/lib/gcc/11/
-I/opt/homebrew/lib/gcc/11/include'
CPPFLAGS=-I/opt/homebrew/opt/libffi/include
'PKG_CONFIG_PATH=/opt/homebrew/opt/libffi/lib/pkgconfig --no-create
--no-recursion''

Configured features:
ACL GLIB GNUTLS JSON LCMS2 LIBXML2 MODULES NATIVE_COMP NOTIFY KQUEUE NS
PDUMPER PNG RSVG THREADS TOOLKIT_SCROLL_BARS WEBP XIM ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media rmc puny
dired dired-loaddefs rfc822 mml mml-sec epa derived epg rfc6068
epg-config gnus-util rmail rmail-loaddefs auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json map
text-property-search seq gv byte-opt bytecomp byte-compile cconv
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils time-date subr-x help-fns radix-tree cl-print debug backtrace
find-func help-mode cl-loaddefs cl-lib iso-transl tooltip eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/ns-win ns-win ucs-normalize mule-util term/common-win tool-bar dnd
fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow
isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget keymap hashtable-print-readable backquote threads
kqueue cocoa ns lcms2 multi-tty make-network-process native-compile
emacs)

Memory information:
((conses 16 76126 7924)
(symbols 48 7053 0)
(strings 32 21398 1914)
(string-bytes 1 725508)
(vectors 16 15268)
(vector-slots 8 317989 13284)
(floats 8 26 61)
(intervals 56 341 0)
(buffers 992 13))






* bug#52067: Acknowledgement (29.0.50; string-glyph-split halts on certain emoji strings)
@ 2021-11-24  3:51   ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
From: Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-24  3:51 UTC (permalink / raw)
  To: 52067


The logic in string-glyph-split assumes that the first two elements of the value
returned by find-composition-internal give the start and end of the grapheme
cluster that contains position POS, and that the function returns nil when the
character at POS is not part of a composition.  However, this isn't always the case.

Let's call x the argument POS in find-composition-internal, 
and "interval" the first two elements of the return value.
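The assumption above can be stated as a small predicate: an interval (START END ...) returned for position x is only usable if START <= x < END. The helper below is a hypothetical illustration (my-interval-covers-p is not part of Emacs), checked against values from the tables in this message.

```elisp
;; Hypothetical helper, not part of Emacs: the invariant that
;; `string-glyph-split' relies on.  INTERVAL is the value of
;; (find-composition-internal X nil STRING nil): nil, or a list whose
;; first two elements are START and END.
(defun my-interval-covers-p (x interval)
  "Return non-nil if INTERVAL is non-nil and START <= X < END."
  (and interval
       (<= (car interval) x)
       (< x (cadr interval))))

;; Values from the tables in this message:
(my-interval-covers-p 1 '(0 2))   ; "✈️✈️", x = 1  => t
(my-interval-covers-p 2 '(2 4))   ; "✈️✈️", x = 2  => t
(my-interval-covers-p 2 '(0 2))   ; "✈️🌍", x = 2  => nil
```

The third call is the failing case: position 2 (the globe) is reported as belonging to the interval (0 2), which ends before it.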

The following example works as expected: x of 0 or 1 returns the interval (0 2),
and x of 2 or 3 returns (2 4).

(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️✈️" nil))) '(0 1 2 3 4))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (2 4
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3
  (2 4
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (4 nil))
nil


In the following case, however, x of 2 also returns the interval (0 2), even though position 2 lies outside it.

(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️🌍" nil))) '(0 1 2 3))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3 nil))
nil
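To see why this case never terminates, without needing the Apple Color Emoji font, the sketch below replays the intervals from the table above through a copy of the original loop that records START instead of building the result. All my-* names are hypothetical: my-replay-finder stands in for find-composition-internal, and MAX-STEPS exists only so the demonstration returns.

```elisp
;; Intervals observed above for "✈️🌍" (U+2708 U+FE0F U+1F30D):
;; positions 0, 1 and 2 all report (0 2).
(defconst my-observed-plane-globe
  '((0 0 2) (1 0 2) (2 0 2))
  "Alist mapping POS to the (START END) interval seen in bug#52067.")

(defun my-replay-finder (pos _string)
  "Stand-in for (find-composition-internal POS nil STRING nil)."
  (alist-get pos my-observed-plane-globe))

(defun my-trace-original-loop (string finder max-steps)
  "Run the loop of the original `string-glyph-split' on STRING.
FINDER replaces `find-composition-internal'.  Stop after MAX-STEPS
iterations and return the START values visited, in order."
  (let ((start 0)
        (steps 0)
        (visited nil)
        comp)
    (while (and (< start (length string)) (< steps max-steps))
      (push start visited)
      (setq steps (1+ steps))
      (if (setq comp (funcall finder start string))
          (setq start (cadr comp))   ; original: jump to END of interval
        (setq start (1+ start))))
    (nreverse visited)))

(my-trace-original-loop (string #x2708 #xFE0F #x1F30D) ; "✈️🌍"
                        #'my-replay-finder 5)
;; => (0 2 2 2 2)
```

Once START reaches 2, the interval returned for position 2 ends at 2, so START never advances; the real function also pushes another copy of "✈️" onto RESULT on every iteration.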


Interestingly, in the following case, x of 0, 1, 2, and 3 all return (0 2).

(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️🌍🌍" nil))) '(0 1 2 3 4))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (4 nil))
nil


And in the following case, x of 2 again returns (0 2), while x of 3 and 4 return (3 5).

(null
 (pp
  (mapcar (lambda (x) (list x (find-composition-internal x nil "✈️🌍✈️" nil))) '(0 1 2 3 4 5))))
((0
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (1
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (2
  (0 2
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (3
  (3 5
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (4
  (3 5
     [[#<font-object "-*-Apple Color Emoji-medium-normal-normal-*-19-*-*-*-m-0-iso10646-1"> 9992 65039]
      296
      [0 1 9992 233 23 0 23 18 4 nil]]))
 (5 nil))
nil


> On Nov 23, 2021, at 6:02 PM, GNU bug Tracking System <help-debbugs@gnu.org> wrote:
> 
> Thank you for filing a new bug report with debbugs.gnu.org.
> 
> This is an automatically generated reply to let you know your message
> has been received.
> 
> Your message is being forwarded to the package maintainers and other
> interested parties for their attention; they will reply in due course.
> 
> Your message has been sent to the package maintainer(s):
> bug-gnu-emacs@gnu.org
> 
> If you wish to submit further information on this problem, please
> send it to 52067@debbugs.gnu.org.
> 
> Please do not send mail to help-debbugs@gnu.org unless you wish
> to report a problem with the Bug-tracking system.
> 
> -- 
> 52067: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=52067
> GNU Bug Tracking System
> Contact help-debbugs@gnu.org with problems







* bug#52067: possible fix for string-glyph-split halts on certain emoji strings.
@ 2021-11-24  4:58 ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
From: Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-24  4:58 UTC (permalink / raw)
  To: 52067

The following code fixes this bug, though there might be better ways to fix it for someone who understands the domain.
I don't know much about glyph/grapheme representations, so although this code passes my limited tests, it may break other things.

(defun pm-string-glyph-split (string)
  "Split STRING into a list of strings representing separate glyphs.
This takes into account combining characters and grapheme clusters."
  (let ((result nil)
        (start 0)
	(laststart -1) ;; the last start of a character with the composition property
        comp)
    (while (< start (length string))
      (setq comp (find-composition-internal start nil string nil))
      (if (and comp (/= laststart (car comp)))  ;; check that we don't return to same start
          (progn
            (push (substring string (car comp) (cadr comp)) result)
	    (setq laststart start)  ;; keep the start of the last successful search.
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))


Compare to the original:

(defun string-glyph-split (string)
  "Split STRING into a list of strings representing separate glyphs.
This takes into account combining characters and grapheme clusters."
  (let ((result nil)
        (start 0)
        comp)
    (while (< start (length string))
      (if (setq comp (find-composition-internal start nil string nil))
          (progn
            (push (substring string (car comp) (cadr comp)) result)
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))
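The proposed fix can be checked the same way without an emoji font. In the sketch below, which assumes the intervals reported in the earlier message, my-split-laststart is the pm-string-glyph-split loop with a FINDER argument substituted for find-composition-internal, and my-replay-composition replays the three tables from that message; both names are hypothetical.

```elisp
;; Intervals reported earlier in this thread, keyed by string.
;; Positions not listed returned nil.
(defconst my-observed-intervals
  `((,(string #x2708 #xFE0F #x1F30D)                 ; "✈️🌍"
     (0 0 2) (1 0 2) (2 0 2))
    (,(string #x2708 #xFE0F #x1F30D #x1F30D)         ; "✈️🌍🌍"
     (0 0 2) (1 0 2) (2 0 2) (3 0 2))
    (,(string #x2708 #xFE0F #x1F30D #x2708 #xFE0F)   ; "✈️🌍✈️"
     (0 0 2) (1 0 2) (2 0 2) (3 3 5) (4 3 5))))

(defun my-replay-composition (pos string)
  "Stand-in for (find-composition-internal POS nil STRING nil)."
  (alist-get pos (cdr (assoc string my-observed-intervals))))

(defun my-split-laststart (string finder)
  "The `pm-string-glyph-split' loop, calling FINDER instead of
`find-composition-internal'."
  (let ((result nil)
        (start 0)
        (laststart -1)
        comp)
    (while (< start (length string))
      (setq comp (funcall finder start string))
      ;; Reject an interval that begins where the last accepted one did.
      (if (and comp (/= laststart (car comp)))
          (progn
            (push (substring string (car comp) (cadr comp)) result)
            (setq laststart start)
            (setq start (cadr comp)))
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))

(my-split-laststart (string #x2708 #xFE0F #x1F30D #x2708 #xFE0F)
                    #'my-replay-composition)
;; => ("✈️" "🌍" "✈️")
```

Note that the laststart check only detects an interval starting where the previously accepted composition started; it does not verify that the interval actually covers START.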








* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
@ 2021-11-24  7:30     ` Lars Ingebrigtsen
From: Lars Ingebrigtsen @ 2021-11-24  7:30 UTC (permalink / raw)
  To: Paul Maragakis; +Cc: 52067

Paul Maragakis <paul.maragakis@icloud.com> writes:

> The logic in string-glyph-split expects the first two elements in the result
> from find-composition-internal to give the start and end of a multibyte grapheme
> and return nil when there is a regular character at position POS.  However, this 
> isn't always the case.

Yup.  

Paul Maragakis <paul.maragakis@icloud.com> writes:

> The following code fixes this bug, though there might be better ways
> to fix it for someone who understands the domain.

Thanks.  `find-composition' takes a LIMIT parameter, and that'll
make it avoid searching back into the bit of the string that we've
already handled.  So I did that instead in Emacs 29.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
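The committed call with LIMIT is not shown in this thread, so the following is not that change. As an independent, defensive alternative, the loop can accept an interval only when it actually covers START; every iteration then advances START by at least one character, so the loop terminates whatever find-composition-internal returns. my-string-glyph-split and its optional FINDER argument are hypothetical; FINDER exists only so the guard can be exercised without an emoji font.

```elisp
(defun my-string-glyph-split (string &optional finder)
  "Split STRING into a list of strings, one per glyph.
FINDER, if non-nil, is called as (FINDER POS STRING) instead of
\(find-composition-internal POS nil STRING nil)."
  (let ((result nil)
        (start 0)
        (len (length string))
        comp)
    (while (< start len)
      (setq comp (if finder
                     (funcall finder start string)
                   (find-composition-internal start nil string nil)))
      (if (and comp
               (<= (car comp) start)
               (< start (cadr comp)))
          ;; The interval covers START, so its END is > START and the
          ;; loop makes progress.  Take the substring from START so that
          ;; characters already emitted are never repeated.
          (progn
            (push (substring string start (cadr comp)) result)
            (setq start (cadr comp)))
        ;; No usable composition at START: emit a single character.
        (push (substring string start (1+ start)) result)
        (setq start (1+ start))))
    (nreverse result)))

;; Replaying the "✈️🌍" intervals reported earlier in the thread:
(my-string-glyph-split (string #x2708 #xFE0F #x1F30D)
                       (lambda (pos _string)
                         (alist-get pos '((0 0 2) (1 0 2) (2 0 2)))))
;; => ("✈️" "🌍")
```

Taking the substring from START rather than from the interval's own start is a design choice: it guarantees no character is emitted twice even if an interval reaches back before START.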






* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
@ 2021-11-24 15:15       ` Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors
From: Paul Maragakis via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-11-24 15:15 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 52067

Excellent---and thanks for the explanation!  
I confirm that the latest Emacs 29 fixes the bug.  
You can close this ticket.

Paul








* bug#52067: 29.0.50; string-glyph-split halts on certain emoji strings
@ 2021-11-24 16:14         ` Lars Ingebrigtsen
From: Lars Ingebrigtsen @ 2021-11-24 16:14 UTC (permalink / raw)
  To: Paul Maragakis; +Cc: 52067

Paul Maragakis <paul.maragakis@icloud.com> writes:

> Excellent---and thanks for the explanation!  
> I confirm that the latest Emacs 29 fixes the bug.  
> You can close this ticket.

Thanks for checking; closed now.






Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
