all messages for Emacs-related lists mirrored at yhetil.org
* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Miguel V. S. Frasson @ 2021-06-16 21:07 UTC (permalink / raw)
  To: 49066

Dear Emacs developers,

I was editing a comma-separated values (CSV) file for creating a
geographic map and tried some simple editing commands, which I now see
were irrelevant to reproducing the bug. I managed to isolate the problem.

It seems that my GUI build of Emacs is unable to display a specific
UTF-8 line of a file, possibly one that mixes LTR and RTL text, and
crashes.

To help debug, I read /usr/share/emacs/26.3/etc/DEBUG, downloaded the
Emacs sources from two places, and built them to see whether I could
reproduce the crash.

I tried these versions:

* from the Ubuntu package
  GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.13)
  of 2019-12-24 -> emacs -Q foo -> always crashes (I tried more than 20
  times)
  same Emacs, no GUI -> emacs -nw -Q foo -> no crash

* git GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu) of 2021-06-16,
  built without toolkits and images -> no crash
  (1h30 of compilation time discouraged me from recompiling it)

* 26.3 compiled from the source downloaded from http://ftpmirror.gnu.org/emacs/
  - without toolkits -> no crash
  - with gtk3 -> no crash

So the only build that crashes is my usual Emacs, a GTK build without
debug symbols...

How to reproduce:

1) Since just displaying the line crashes my Emacs, I would rather not
include it below. Please download the 641-byte file "foo" with:

wget https://sites.icmc.usp.br/frasson/foo

Its content is a single line of UTF-8 text with the name of the Saint
Pierre and Miquelon islands in several languages.

You can also obtain it by decoding the following base64 output with
"base64 -d":

UTM0NjE3LNiz2KfZhiDYqNmK2YrYsSDZiNmF2YrZg9mE2YjZhizgprjgpr7gpoEg4Kaq4Ka/4Kav
4Ka84KeH4KawIOCmkyDgpq7gpr/gppXigIzgprLgp4vgpoEsU2FpbnQtUGllcnJlIHVuZCBNaXF1
ZWxvbixTYWludCBQaWVycmUgYW5kIE1pcXVlbG9uLFNhbiBQZWRybyB5IE1pcXVlbMOzbixTYWlu
dC1QaWVycmUtZXQtTWlxdWVsb24szqPOsc65zr0gzqDOuc61z4EgzrrOsc65IM6czrnOus61zrvP
jM69LOCkuOCkvuCkgS3gpKrgpY3gpK/gpYfgpLAg4KSU4KSwIOCkruClgOCkleClh+CksuCli+Ck
gixTYWludC1QaWVycmUgw6lzIE1pcXVlbG9uLFNhaW50IFBpZXJyZSBkYW4gTWlxdWVsb24sU2Fp
bnQtUGllcnJlIGUgTWlxdWVsb24s44K144Oz44OU44Ko44O844Or5bO244O744Of44Kv44Ot44Oz
5bO2LOyDne2UvOyXkOultCDrr7jtgbTrobEsU2FpbnQtUGllcnJlIGVuIE1pcXVlbG9uLFNhaW50
LVBpZXJyZSBpIE1pcXVlbG9uLFNhaW50LVBpZXJyZSBlIE1pcXVlbG9uLNCh0LXQvS3Qn9GM0LXR
gCDQuCDQnNC40LrQtdC70L7QvSxTYWludC1QaWVycmUgb2NoIE1pcXVlbG9uLFNhaW50IFBpZXJy
ZSB2ZSBNaXF1ZWxvbixTYWludC1QaWVycmUgdsOgIE1pcXVlbG9uLOWco+earuWfg+WwlOWSjOWv
huWFi+mahue+pOWymwo=

2) emacs -nw -Q foo

OK, no crash; exit Emacs.

3) emacs -Q foo

Emacs crashes :-X

4) With "emacs -nw -Q foo", if I delete the initial Q (or maybe a
character that resembles Q), the text direction changes abruptly and
display/navigation goes crazy: just navigating with the left and right
arrow keys jumps from the first line to the last, and some up and down
keys jump a lot.  This happens even with the trunk git Emacs that I
compiled.

If you would like to see this, I recorded a screencast (2.63 MB):
wget https://sites.icmc.usp.br/frasson/emacs-navigation.mp4

From the command line I get the following output:

Fatal error 11: Segmentation fault
Backtrace:
emacs[0x51ab42]
emacs[0x500211]
emacs[0x518f14]
emacs[0x51914d]
emacs[0x5191cd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f7fca29b3c0]
emacs[0x5ebe9b]
emacs[0x5ef70d]
emacs[0x58a752]
emacs[0x57913c]
emacs[0x5b8174]
emacs[0x57bb61]
emacs[0x5790bb]
emacs[0x5783fa]
emacs[0x4369ac]
emacs[0x443276]
emacs[0x5d9aa8]
emacs[0x5ddbe0]
emacs[0x44f664]
emacs[0x44d695]
emacs[0x4556f8]
emacs[0x45a843]
emacs[0x46f0c3]
emacs[0x472183]
emacs[0x57829e]
emacs[0x43a016]
emacs[0x45e079]
emacs[0x50a447]
emacs[0x50dad0]
emacs[0x50f1e4]
emacs[0x578206]
emacs[0x5005d4]
emacs[0x578175]
emacs[0x500573]
emacs[0x5057b7]
emacs[0x505b18]
emacs[0x4206d2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f7fc9f870b3]
emacs[0x4213de]
Falha de segmentação

Best regards

Miguel


In GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.13)
 of 2019-12-24 built on lcy01-amd64-029
Windowing system distributor 'The X.Org Foundation', version 11.0.12009000
System Description:    Ubuntu 20.04.2 LTS

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
saida-raw50.csv has auto save data; consider M-x recover-this-file
Mark set
Type y, n, ! or SPC (the space bar):
Defining kbd macro...
Mark set [2 times]
Replaced 169 occurrences
Keyboard macro defined

Configured using:
 'configure --build=x86_64-linux-gnu --prefix=/usr
 '--includedir=${prefix}/include' '--mandir=${prefix}/share/man'
 '--infodir=${prefix}/share/info' --sysconfdir=/etc --localstatedir=/var
 --disable-silent-rules '--libdir=${prefix}/lib/x86_64-linux-gnu'
 '--libexecdir=${prefix}/lib/x86_64-linux-gnu' --disable-maintainer-mode
 --disable-dependency-tracking --prefix=/usr --sharedstatedir=/var/lib
 --program-suffix=26 --with-modules --with-file-notification=inotify
 --with-mailutils --with-x=yes --with-x-toolkit=gtk3 --with-xwidgets
 --with-lcms2 'CFLAGS=-g -O2
 -fdebug-prefix-map=/build/emacs26-XQGPla/emacs26-26.3~1.git96dd019=.
-fstack-protector-strong
 -Wformat -Werror=format-security -no-pie' 'CPPFLAGS=-Wdate-time
 -D_FORTIFY_SOURCE=2' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro
 -no-pie''

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS GLIB
NOTIFY LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS XWIDGETS
LIBSYSTEMD LCMS2

Important settings:
  value of $LANG: pt_BR.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822 mml
mml-sec password-cache epa derived epg epg-config gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils macros misearch multi-isearch kmacro
cl-extra help-mode easymenu cl-loaddefs cl-lib novice elec-pair
time-date mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
xwidget-internal move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 99690 8444)
 (symbols 48 20739 1)
 (miscs 40 284 240)
 (strings 32 29677 1323)
 (string-bytes 1 787981)
 (vectors 16 15049)
 (vector-slots 8 550898 10514)
 (floats 8 51 224)
 (intervals 56 261 0)
 (buffers 992 13))


-- 
Miguel Vinicius Santini Frasson
mvsfrasson@gmail.com






* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Lars Ingebrigtsen @ 2021-06-16 21:12 UTC (permalink / raw)
  To: Miguel V. S. Frasson; +Cc: 49066

"Miguel V. S. Frasson" <mvsfrasson@gmail.com> writes:

> * git GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu) of 2021-06-16
> without toolkits and images --> no crash
> (1h30 of compilation time discoraged me to try to recompile)

I can reproduce the crash in Emacs 26.1, but not in Emacs 27.1, so I
guess this has been fixed in later versions of Emacs?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#49066: file foo
From: Miguel V. S. Frasson @ 2021-06-16 21:22 UTC (permalink / raw)
  To: 49066

[-- Attachment #1: Type: text/plain, Size: 57 bytes --]

-- 
Miguel Vinicius Santini Frasson
mvsfrasson@gmail.com

[-- Attachment #2: foo --]
[-- Type: application/octet-stream, Size: 641 bytes --]

Q34617,سان بيير وميكلون,সাঁ পিয়ের ও মিক‌লোঁ,Saint-Pierre und Miquelon,Saint Pierre and Miquelon,San Pedro y Miquelón,Saint-Pierre-et-Miquelon,Σαιν Πιερ και Μικελόν,साँ-प्येर और मीकेलों,Saint-Pierre és Miquelon,Saint Pierre dan Miquelon,Saint-Pierre e Miquelon,サンピエール島・ミクロン島,생피에르 미클롱,Saint-Pierre en Miquelon,Saint-Pierre i Miquelon,Saint-Pierre e Miquelon,Сен-Пьер и Микелон,Saint-Pierre och Miquelon,Saint Pierre ve Miquelon,Saint-Pierre và Miquelon,圣皮埃尔和密克隆群岛


* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Eli Zaretskii @ 2021-06-17  6:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 49066, mvsfrasson

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Wed, 16 Jun 2021 23:12:44 +0200
> Cc: 49066@debbugs.gnu.org
> 
> "Miguel V. S. Frasson" <mvsfrasson@gmail.com> writes:
> 
> > * git GNU Emacs 28.0.50 (build 1, x86_64-pc-linux-gnu) of 2021-06-16
> > without toolkits and images --> no crash
> > (1h30 of compilation time discoraged me to try to recompile)
> 
> I can reproduce the crash in Emacs 26.1, but not in Emacs 27.1, so I
> guess this has been fixed in later versions of Emacs?

I cannot reproduce it at all, neither in Emacs 26 nor in any subsequent
version.

Lars, can you show a backtrace from the crash?  Perhaps if I see that,
I could tell if it's a known (and fixed) problem.

Thanks.






* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Robert Pluim @ 2021-06-17  7:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49066, Lars Ingebrigtsen, mvsfrasson

>>>>> On Thu, 17 Jun 2021 09:43:40 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> I can reproduce the crash in Emacs 26.1, but not in Emacs 27.1, so I
    >> guess this has been fixed in later versions of Emacs?

    Eli> I cannot reproduce at all, neither in Emacs 26 nor in all subsequent
    Eli> versions.

    Eli> Lars, can you show a backtrace from the crash?  Perhaps if I see that,
    Eli> I could tell if it's a known (and fixed) problem.

    Eli> Thanks.

This is from an optimized build of emacs-26.1. I can redo it with a
'-g3 -O0' if you want.

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
    at ftfont.c:2573
2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) bt
#0  ftfont_shape_by_fltPython Exception <class 'gdb.error'> value has been optimized out: 
 (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=)
    at ftfont.c:2573
#1  ftfont_shapePython Exception <class 'gdb.error'> value has been optimized out: 
 (lgstring=, lgstring@entry=XIL(0xaa2755)) at ftfont.c:2615
#2  0x00000000005d97f5 in xftfont_shape (lgstring=XIL(0xaa2755)) at xftfont.c:670
#3  0x000000000057fc2a in Ffont_shape_gstringPython Exception <class 'gdb.error'> value has been optimized out: 
 (gstring=) at font.c:4427
#4  0x000000000056fede in funcall_subr (subr=0x97fac0 <Sfont_shape_gstring>, numargs=numargs@entry=1, args=args@entry=0x7fffffff59a0)
    at eval.c:2844
#5  0x000000000056ecff in Ffuncall (nargs=<optimized out>, args=args@entry=0x7fffffff5998) at lisp.h:600


Robert
-- 






* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Eli Zaretskii @ 2021-06-17  8:13 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 49066, larsi, mvsfrasson

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  49066@debbugs.gnu.org,
>   mvsfrasson@gmail.com
> Date: Thu, 17 Jun 2021 09:43:03 +0200
> 
> This is from an optimized build of emacs-26.1. I can redo it with a
> '-g3 -O0' if you want.

That'd help.

> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
> ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
>     at ftfont.c:2573
> 2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));

So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
some way?  IOW, what is the immediate reason for the segfault?

> (gdb) bt
> #0  ftfont_shape_by_fltPython Exception <class 'gdb.error'> value has been optimized out: 

What's the story with these Python exceptions?  Looks like some
problem in our .gdbinit?

>  (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=)
>     at ftfont.c:2573
> #1  ftfont_shapePython Exception <class 'gdb.error'> value has been optimized out: 
>  (lgstring=, lgstring@entry=XIL(0xaa2755)) at ftfont.c:2615
> #2  0x00000000005d97f5 in xftfont_shape (lgstring=XIL(0xaa2755)) at xftfont.c:670
> #3  0x000000000057fc2a in Ffont_shape_gstringPython Exception <class 'gdb.error'> value has been optimized out: 
>  (gstring=) at font.c:4427
> #4  0x000000000056fede in funcall_subr (subr=0x97fac0 <Sfont_shape_gstring>, numargs=numargs@entry=1, args=args@entry=0x7fffffff59a0)
>     at eval.c:2844
> #5  0x000000000056ecff in Ffuncall (nargs=<optimized out>, args=args@entry=0x7fffffff5998) at lisp.h:600

The backtrace stops too soon.  Can you show more?  I'd like at the
very least to see which sequence of characters causes the trouble.
From the above, I can only glean that we were performing a character
composition.

It could be some problem with the shaping engine: I guess versions
after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
use m17n-flt in a later Emacs, does it still not crash?

Thanks.






* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Robert Pluim @ 2021-06-17 13:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49066, larsi, mvsfrasson

>>>>> On Thu, 17 Jun 2021 11:13:17 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> From: Robert Pluim <rpluim@gmail.com>
    >> Cc: Lars Ingebrigtsen <larsi@gnus.org>,  49066@debbugs.gnu.org,
    >> mvsfrasson@gmail.com
    >> Date: Thu, 17 Jun 2021 09:43:03 +0200
    >> 
    >> This is from an optimized build of emacs-26.1. I can redo it with a
    >> '-g3 -O0' if you want.

    Eli> That'd help.

Full backtrace from an unoptimized build:

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x0000000000557a9d in AREF (array=XIL(0), idx=1) at lisp.h:1614
1614	  return XVECTOR (array)->contents[idx];
(gdb) bt
#0  0x0000000000557a9d in AREF (array=XIL(0), idx=1) at lisp.h:1614
#1  0x0000000000693602 in ftfont_shape_by_flt
    (lgstring=XIL(0xb64755), font=0x1308cb0 <bss_sbrk_buffer+8590480>, ft_face=0x340fef0, otf=0x342c810, matrix=0x1308da8 <bss_sbrk_buffer+8590728>) at ftfont.c:2573
#2  0x00000000006939c4 in ftfont_shape (lgstring=XIL(0xb64755)) at ftfont.c:2615
#3  0x0000000000695ae8 in xftfont_shape (lgstring=XIL(0xb64755)) at xftfont.c:670
#4  0x0000000000624f14 in Ffont_shape_gstring (gstring=XIL(0xb64755)) at font.c:4427
#5  0x000000000060714d in funcall_subr (subr=0xa41d60 <Sfont_shape_gstring>, numargs=1, args=0x7fffffff6830) at eval.c:2844
#6  0x0000000000606d80 in Ffuncall (nargs=2, args=0x7fffffff6828) at eval.c:2769
#7  0x000000000064ef3a in exec_byte_code
    (bytestr=XIL(0x81e114), vector=XIL(0x81e135), maxdepth=make_number(6), args_template=XIL(0), nargs=0, args=0x0) at bytecode.c:629
#8  0x0000000000607b03 in funcall_lambda (fun=XIL(0x81e0a5), nargs=5, arg_vector=0x81e135 <pure+964437>) at eval.c:3052
#9  0x0000000000606dc4 in Ffuncall (nargs=6, args=0x7fffffff6d20) at eval.c:2771
#10 0x000000000060392c in internal_condition_case_n (bfun=0x606c02 <Ffuncall>, nargs=6, args=0x7fffffff6d20, handlers=XIL(0xc090), hfun=
    0x43f2a4 <safe_eval_handler>) at eval.c:1412
#11 0x000000000043f519 in safe__call (inhibit_quit=false, nargs=6, func=XIL(0x8e6520), ap=0x7fffffff6e00) at xdisp.c:2617
#12 0x000000000043f60c in safe_call (nargs=6, func=XIL(0x8e6520)) at xdisp.c:2633
#13 0x000000000067e4e6 in autocmp_chars
    (rule=XIL(0xf2b705), charpos=40, bytepos=78, limit=42, win=0x103bc30 <bss_sbrk_buffer+5653520>, face=0x349d570, string=XIL(0))
    at composite.c:928
#14 0x000000000067fad8 in composition_reseat_it
    (cmp_it=0x7fffffff8f30, charpos=40, bytepos=78, endpos=464, w=0x103bc30 <bss_sbrk_buffer+5653520>, face=0x349d570, string=XIL(0))
    at composite.c:1228
#15 0x000000000044e88f in next_element_from_buffer (it=0x7fffffff86b0) at xdisp.c:8483
#16 0x000000000044ab2a in get_next_display_element (it=0x7fffffff86b0) at xdisp.c:7026
#17 0x00000000004715db in display_line (it=0x7fffffff86b0, cursor_vpos=3) at xdisp.c:21409
#18 0x0000000000466d36 in try_window (window=XIL(0x103bc35), pos=..., flags=1) at xdisp.c:17627
#19 0x00000000004648da in redisplay_window (window=XIL(0x103bc35), just_this_one_p=false) at xdisp.c:17074
#20 0x000000000045de89 in redisplay_window_0 (window=XIL(0x103bc35)) at xdisp.c:14831
#21 0x00000000006037bc in internal_condition_case_1
    (bfun=0x45de47 <redisplay_window_0>, arg=XIL(0x103bc35), handlers=XIL(0xb3de33), hfun=0x45de0f <redisplay_window_error>) at eval.c:1356
#22 0x000000000045dde4 in redisplay_windows (window=XIL(0x103bc35)) at xdisp.c:14811
#23 0x000000000045cd16 in redisplay_internal () at xdisp.c:14300
#24 0x000000000045ada7 in redisplay () at xdisp.c:13518
#25 0x0000000000563326 in read_char (commandflag=1, map=XIL(0x142c4b3), prev_event=XIL(0), used_mouse_menu=0x7fffffffdaef, end_time=0x0)
    at keyboard.c:2480
#26 0x000000000057056f in read_key_sequence
    (keybuf=0x7fffffffdc40, bufsize=30, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9147
#27 0x00000000005607c3 in command_loop_1 () at keyboard.c:1368
#28 0x0000000000603715 in internal_condition_case (bfun=0x5603b5 <command_loop_1>, handlers=XIL(0x5250), hfun=0x55fb97 <cmd_error>)
    at eval.c:1332
#29 0x00000000005600a6 in command_loop_2 (ignore=XIL(0)) at keyboard.c:1110
#30 0x0000000000602fed in internal_catch (tag=XIL(0xc6f0), func=0x560079 <command_loop_2>, arg=XIL(0)) at eval.c:1097
#31 0x0000000000560045 in command_loop () at keyboard.c:1089
#32 0x000000000055f76a in recursive_edit_1 () at keyboard.c:695
#33 0x000000000055f8ea in Frecursive_edit () at keyboard.c:766
#34 0x000000000055d58e in main (argc=2, argv=0x7fffffffe128) at emacs.c:1713

Lisp Backtrace:
"font-shape-gstring" (0xffff6830)
"auto-compose-chars" (0xffff6d28)
"redisplay_internal (C function)" (0x0)
(gdb) 

    >> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
    >> ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
    >> at ftfont.c:2573
    >> 2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));

    Eli> So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
    Eli> some way?  IOW, what is the immediate reason for the
    Eli> segfault?

It's lgstring; I think this is one of those nils in lgstring:

#0  0x0000000000557a9d in AREF (array=XIL(0), idx=1) at lisp.h:1614
1614	  return XVECTOR (array)->contents[idx];
(gdb) up
#1  0x0000000000693602 in ftfont_shape_by_flt (lgstring=XIL(0xb64755), font=0x1308cb0 <bss_sbrk_buffer+8590480>, ft_face=0x340fef0, 
    otf=0x342c810, matrix=0x1308da8 <bss_sbrk_buffer+8590728>) at ftfont.c:2573
2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) pp lgstring
[[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 17 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil [5 5 0 3039 11 0 12 7 5 nil] [6 6 1606 1044 11 0 11 8 3 nil] nil]
(gdb) p g
$2 = (MFLTGlyphFT *) 0x2e631e0
(gdb) p *g
$3 = {
  g = {
    c = 2453,
    code = 20,
    from = 0,
    to = 2,
    xadv = 1024,
    yadv = 0,
    ascent = 768,
    descent = 0,
    lbearing = -64,
    rbearing = 1024,
    xoff = 0,
    yoff = 0,
    encoded = 1,
    measured = 1,
    adjusted = 0,
    internal = 0
  },
  libotf_positioning_type = 0
}

    >> (gdb) bt
    >> #0  ftfont_shape_by_fltPython Exception <class 'gdb.error'> value has been optimized out: 

    Eli> What's the story with these Python exceptions?  Looks like some
    Eli> problem in our .gdbinit?

They don't happen with an unoptimized build.

    Eli> The backtrace stops too soon.  Can you show more?  I'd like at the
    Eli> very least to see which sequence of characters causes the trouble.
    Eli> From the above, I can only glean that we were performing a character
    Eli> composition.

This is enough to cause the crash: ক‌

That's #x995 (BENGALI LETTER KA) followed by #x200c (ZERO WIDTH
NON-JOINER). Why are we trying to compose a ZWNJ?

    Eli> It could be some problem with the shaping engine: I guess versions
    Eli> after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
    Eli> use m17n-flt in a later Emacs, does it still not crash?

emacs-27 built '--without-harfbuzz' and thus with m17n-flt crashes the same way.

Robert
-- 






* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Eli Zaretskii @ 2021-06-17 13:59 UTC (permalink / raw)
  To: Robert Pluim, Kenichi Handa; +Cc: 49066, larsi, mvsfrasson

> From: Robert Pluim <rpluim@gmail.com>
> Cc: larsi@gnus.org,  49066@debbugs.gnu.org,  mvsfrasson@gmail.com
> Date: Thu, 17 Jun 2021 15:07:18 +0200
> 
> Full backtrace from an unoptimized build:

Thanks.

>     >> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
>     >> ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
>     >> at ftfont.c:2573
>     >> 2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
> 
>     Eli> So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
>     Eli> some way?  IOW, what is the immediate reason for the
>     Eli> segfault?
> 
> Itʼs lgstring, I think this is one of those 'nil's in lgstring

Yes, I think so.  We can verify that by looking at the value of
g->g.to:

  (gdb) p *g
  $3 = {
    g = {
      c = 2453,
      code = 20,
      from = 0,
      to = 2, <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

And the LGLYPH whose index is 2 is indeed nil:

  (gdb) pp lgstring
  [[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 17 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil [5 5 0 3039 11 0 12 7 5 nil] [6 6 1606 1044 11 0 11 8 3 nil] nil]  ^^^

I think this is a bug in that loop: it should actually exit whenever
it finds the first LGLYPH that is nil, and update gstring.used
accordingly.  Something like this:

  for (i = 0; i < gstring.used; i++)
    {
      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;

      if (NILP (LGSTRING_GLYPH (lgstring, g->g.from))
          || NILP (LGSTRING_GLYPH (lgstring, g->g.to)))
	break;
      g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
    }
  gstring.used = i;

CC'ing Handa-san, as I'm not really familiar with this code.

> This is enough to cause the crash: ক‌
> 
> Thats #x995 followed by #x200c. Why are we trying to compose a ZWNJ?

Because #x995 is a Bengali character, and lisp/language/indian.el
says:

  (defconst bengali-composable-pattern
    (let ((table
	   '(("a" . "\u0981")		; SIGN CANDRABINDU
	     ("A" . "[\u0982\u0983]")	; SIGN ANUSVARA .. VISARGA
	     ("V" . "[\u0985-\u0994\u09E0\u09E1]") ; independent vowel
	     ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
	     ("B" . "[\u09AC\u09AF\u09B0\u09F0]")		; BA, YA, RA
	     ("R" . "[\u09B0\u09F0]")		; RA
	     ("n" . "\u09BC")		; NUKTA
	     ("v" . "[\u09BE-\u09CC\u09D7\u09E2\u09E3]") ; vowel sign
	     ("H" . "\u09CD")		; HALANT
	     ("T" . "\u09CE")		; KHANDA TA
	     ("N" . "\u200C")		; ZWNJ  <<<<<<<<<<<<<<<<<<<<<<<<<<<
	     ("J" . "\u200D")		; ZWJ
	     ("X" . "[\u0980-\u09FF]"))))	; all coverage
      (indian-compose-regexp
       (concat
	;; syllables with an independent vowel, or
	"\\(?:RH\\)?Vn?\\(?:J?HB\\)?v*n?a?A?\\|"
	;; consonant-based syllables, or
	"Cn?\\(?:J?HJ?Cn?\\)*\\(?:H[NJ]?\\|v*[NJ]?v?a?A?\\)\\|"
	;; another syllables with an independent vowel, or
	"\\(?:RH\\)?T\\|"
	;; special consonant form, or
	"JHB\\|"
	;; any other singleton characters
	"X")
       table))
    "Regexp matching a composable sequence of Bengali characters.")

(which is used below that in setting up composition-function-table for
Bengali characters).

>     Eli> It could be some problem with the shaping engine: I guess versions
>     Eli> after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
>     Eli> use m17n-flt in a later Emacs, does it still not crash?
> 
> emacs-27 built '--without-harfbuzz' and thus with m17n-flt crashes the same way.

Yes, it figures.

I hope Handa-san will suggest a solution, for those who want to stick
with m17n-flt.






* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: Eli Zaretskii @ 2021-06-17 15:04 UTC (permalink / raw)
  To: Kenichi Handa; +Cc: 49066, rpluim, larsi, mvsfrasson

> Date: Thu, 17 Jun 2021 16:59:42 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 49066@debbugs.gnu.org, larsi@gnus.org, mvsfrasson@gmail.com
> 
> > This is enough to cause the crash: ক‌
> > 
> > Thats #x995 followed by #x200c. Why are we trying to compose a ZWNJ?
> 
> Because #x995 is a Bengali character, and lisp/language/indian.el
> says:

Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
compose unless they are followed by a character.  See section 12.2 in
the Unicode Standard.






* bug#49066: 26.3; Segmentation fault on specific utf8 string
From: handa @ 2021-06-27  2:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49066, rpluim, eggert, larsi, mvsfrasson

Hi,

>   (gdb) pp lgstring
>   [[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 17 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil [5 5 0 3039 11 0 12 7 5 nil] [6 6 1606 1044 11 0 11 8 3 nil] nil]  ^^^

> I think this is a bug in that loop: it should actually exit whenever
> it finds the first LGLYPH that is nil, and update gstring.used
> accordingly.  Something like this:

>   for (i = 0; i < gstring.used; i++)
>     {
>       MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;

>       if (NILP (LGSTRING_GLYPH (lgstring, g->g.from))
>           || NILP (LGSTRING_GLYPH (lgstring, g->g.to)))
> 	break;
>       g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
>       g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
>     }
>   gstring.used = i;

I don't think so, because the glyphs at indices g->g.from and g->g.to
should never be nil.

> > This is enough to cause the crash: ক‌

Since I clearly remember that rendering that string with m17n-flt worked
before, I suspect that some change made after I wrote the code
introduced the problem.

So I tried restoring the old code, as in the attached patch, and the
patched Emacs renders the above Bengali string without any problem.

The patch reverts this change:
------------------------------------------------------------
commit 04ac097f34d887e1ae8dea1e884118728e931c7a
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Fri Nov 13 12:02:21 2015 -0800

    Spruce up ftfont.c memory allocation
    
    * src/ftfont.c (setup_otf_gstring):
    Avoid O(N**2) behavior when reallocating.
    (ftfont_shape_by_flt): Prefer xpalloc to xrealloc when
    reallocating buffers; this simplifies the code.  Do not trust
    mflt_run to leave the output areas unchanged on failure, as
    this isn’t part of its interface spec.
------------------------------------------------------------

However, I don't yet know why the new code does not work.

---
K. Handa
handa@gnu.org

diff --git a/src/ftfont.c b/src/ftfont.c
index 0603dd9ce6..26198928d8 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -2720,6 +2720,37 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 	}
     }
 
+#define RESTORE_OLD_CODE
+#ifdef RESTORE_OLD_CODE
+  if (gstring.allocated == 0)
+    {
+      gstring.glyph_size = sizeof (MFLTGlyph);
+      gstring.glyphs = xnmalloc (len * 2, sizeof *gstring.glyphs);
+      gstring.allocated = len * 2;
+    }
+  else if (gstring.allocated < len * 2)
+    {
+      gstring.glyphs = xnrealloc (gstring.glyphs, len * 2,
+				  sizeof *gstring.glyphs);
+      gstring.allocated = len * 2;
+    }
+  memset (gstring.glyphs, 0, len * sizeof *gstring.glyphs);
+  for (i = 0; i < len; i++)
+    {
+      Lisp_Object g = LGSTRING_GLYPH (lgstring, i);
+
+      gstring.glyphs[i].c = LGLYPH_CHAR (g);
+      if (with_variation_selector)
+	{
+	  gstring.glyphs[i].code = LGLYPH_CODE (g);
+	  gstring.glyphs[i].encoded = 1;
+	}
+    }
+
+  gstring.used = len;
+  gstring.r2l = 0;
+#endif
+
   {
     Lisp_Object family = Ffont_get (LGSTRING_FONT (lgstring), QCfamily);
 
@@ -2763,6 +2794,20 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 	return make_fixnum (0);
     }
 
+#ifdef RESTORE_OLD_CODE
+  for (i = 0; i < 3; i++)
+    {
+      int result = mflt_run (&gstring, 0, len, &flt_font_ft.flt_font, flt);
+      if (result != -2)
+	break;
+      int len2;
+      if (INT_MULTIPLY_WRAPV (gstring.allocated, 2, &len2))
+	memory_full (SIZE_MAX);
+      gstring.glyphs = xnrealloc (gstring.glyphs,
+				  gstring.allocated, 2 * sizeof (MFLTGlyphFT));
+      gstring.allocated = len2;
+    }
+#else
   MFLTGlyphFT *glyphs = (MFLTGlyphFT *) gstring.glyphs;
   ptrdiff_t allocated = gstring.allocated;
   ptrdiff_t incr_min = len - allocated;
@@ -2795,6 +2840,7 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
       gstring.r2l = 0;
     }
   while (mflt_run (&gstring, 0, len, &flt_font_ft.flt_font, flt) == -2);
+#endif
 
   if (gstring.used > LGSTRING_GLYPH_LEN (lgstring))
     return Qnil;
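
The retry logic in both branches of the patch follows the same protocol: when mflt_run returns -2, the output area was too small, so the caller enlarges gstring.glyphs and runs it again. Since, per the commit message quoted above, mflt_run need not leave the buffer unchanged on failure, the input glyphs should be rebuilt before each retry. The sketch below shows that protocol in isolation; mock_run is a hypothetical stand-in that doubles every glyph and deliberately clobbers the buffer when it fails, not the m17n API.

```c
#include <limits.h>
#include <stdlib.h>

#define RUN_NEED_MORE (-2)  /* mirrors mflt_run's "buffer too small" result */

typedef struct { int c; } glyph;

typedef struct {
  glyph *glyphs;
  int allocated;  /* capacity */
  int used;       /* number of valid glyphs */
} gstr;

/* Hypothetical shaper: replaces each input glyph with two copies, in
   place.  Like mflt_run, it may scribble on the buffer before failing.  */
static int
mock_run (gstr *g, int len)
{
  if (2 * len > g->allocated)
    {
      if (g->allocated > 0)
        g->glyphs[0].c = -1;  /* simulate partial output on failure */
      return RUN_NEED_MORE;
    }
  for (int i = len - 1; i >= 0; i--)  /* back to front: no overlap */
    g->glyphs[2 * i] = g->glyphs[2 * i + 1] = g->glyphs[i];
  g->used = 2 * len;
  return 0;
}

static int
grow (gstr *g, int n)
{
  glyph *p = realloc (g->glyphs, (size_t) n * sizeof *p);
  if (!p)
    return -1;
  g->glyphs = p;
  g->allocated = n;
  return 0;
}

/* Run the shaper, doubling the buffer on RUN_NEED_MORE, at most
   MAX_TRIES times.  The input is copied in afresh before every run.  */
static int
shape_with_retry (gstr *g, const int *input, int len, int max_tries)
{
  if (g->allocated < len && grow (g, len) < 0)
    return -1;
  for (int attempt = 0; attempt < max_tries; attempt++)
    {
      for (int i = 0; i < len; i++)
        g->glyphs[i].c = input[i];
      g->used = len;
      int r = mock_run (g, len);
      if (r != RUN_NEED_MORE)
        return r;
      if (g->allocated > INT_MAX / 2 || grow (g, 2 * g->allocated) < 0)
        return -1;
    }
  return RUN_NEED_MORE;
}
```

Without the re-copy step, the second run in this sketch would start from the clobbered first glyph (-1) instead of the original character.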

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-06-27  2:29             ` handa
@ 2021-06-27  6:20               ` Eli Zaretskii
  2021-06-27 18:02                 ` Paul Eggert
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2021-06-27  6:20 UTC (permalink / raw)
  To: handa; +Cc: 49066, rpluim, eggert, larsi, mvsfrasson

> From: handa <handa@gnu.org>
> Cc: rpluim@gmail.com, larsi@gnus.org, 49066@debbugs.gnu.org,
>  mvsfrasson@gmail.com, eggert@cs.ucla.edu
> Date: Sun, 27 Jun 2021 11:29:28 +0900
> 
> So I tried restoring the old code, as in the attached patch, and the
> patched Emacs then has no problem rendering the above Bengali string.

Thanks.  Robert, Miguel: could you please try this patch and see if it
fixes the problem?

Since we are moving away from m17n-flt, I don't think we should optimize
memory management when m17n-flt is used, especially if that causes
problems.  So if the patch fixes the crash, I think we should install
it.

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-06-27  6:20               ` Eli Zaretskii
@ 2021-06-27 18:02                 ` Paul Eggert
  2021-06-27 19:15                   ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Eggert @ 2021-06-27 18:02 UTC (permalink / raw)
  To: Eli Zaretskii, handa; +Cc: 49066, rpluim, larsi, mvsfrasson

On 6/26/21 11:20 PM, Eli Zaretskii wrote:
> Since we are moving away from m17n-flt, I don't think we should optimize
> memory management when m17n-flt is used, especially if that causes
> problems.  So if the patch fixes the crash, I think we should install
> it.

Sure, and I can volunteer to do that. Would you like me to do it in 
master now, or wait for confirmation and install it on the emacs-27 
branch? or perhaps some other course of action?

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-06-27 18:02                 ` Paul Eggert
@ 2021-06-27 19:15                   ` Eli Zaretskii
  2021-06-28 10:56                     ` Robert Pluim
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2021-06-27 19:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 49066, handa, rpluim, larsi, mvsfrasson

> Cc: rpluim@gmail.com, larsi@gnus.org, 49066@debbugs.gnu.org,
>  mvsfrasson@gmail.com
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Sun, 27 Jun 2021 11:02:26 -0700
> 
> On 6/26/21 11:20 PM, Eli Zaretskii wrote:
> > Since we are moving away from m17n-flt, I don't think we should optimize
> > memory management when m17n-flt is used, especially if that causes
> > problems.  So if the patch fixes the crash, I think we should install
> > it.
> 
> Sure, and I can volunteer to do that. Would you like me to do it in 
> master now, or wait for confirmation and install it on the emacs-27 
> branch? or perhaps some other course of action?

I'd like to see the confirmation, and then install this on master.

Thanks.

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-06-27 19:15                   ` Eli Zaretskii
@ 2021-06-28 10:56                     ` Robert Pluim
  2021-06-28 12:05                       ` Eli Zaretskii
  0 siblings, 1 reply; 18+ messages in thread
From: Robert Pluim @ 2021-06-28 10:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49066, handa, larsi, Paul Eggert, mvsfrasson

>>>>> On Sun, 27 Jun 2021 22:15:50 +0300, Eli Zaretskii <eliz@gnu.org> said:

    >> Cc: rpluim@gmail.com, larsi@gnus.org, 49066@debbugs.gnu.org,
    >> mvsfrasson@gmail.com
    >> From: Paul Eggert <eggert@cs.ucla.edu>
    >> Date: Sun, 27 Jun 2021 11:02:26 -0700
    >> 
    >> On 6/26/21 11:20 PM, Eli Zaretskii wrote:
    >> > Since we are moving away from m17n-flt, I don't think we should optimize
    >> > memory management when m17n-flt is used, especially if that causes
    >> > problems.  So if the patch fixes the crash, I think we should install
    >> > it.
    >> 
    >> Sure, and I can volunteer to do that. Would you like me to do it in 
    >> master now, or wait for confirmation and install it on the emacs-27 
    >> branch? or perhaps some other course of action?

    Eli> I'd like to see the confirmation, and then install this on master.

    Eli> Thanks.

With the patch it still crashes for me in emacs-master with harfbuzz disabled:

Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
0x000055555576d4e7 in AREF (array=XIL(0), idx=1) at lisp.h:1838
1838	  return XVECTOR (array)->contents[idx];
(gdb) bt
#0  0x000055555576d4e7 in AREF (array=XIL(0), idx=1) at lisp.h:1838
#1  0x0000555555774be0 in ftfont_shape_by_flt
    (lgstring=XIL(0x7ffff1e5301d), font=0x55555604f410, ft_face=0x5555566a2400, otf=0x555556696b60, matrix=0x55555604f508) at ftfont.c:2852
#2  0x0000555555775002 in ftfont_shape (lgstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at ftfont.c:2890
#3  0x000055555577629e in ftcrfont_shape (lgstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at ftcrfont.c:477
#4  0x000055555571344c in Ffont_shape_gstring (gstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at font.c:4499
#5  0x00005555557019fb in Ffuncall (nargs=3, args=args@entry=0x7fffffffd670) at eval.c:3039
#6  0x000055555573cdf8 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>) at bytecode.c:632
#7  0x0000555555701937 in Ffuncall (nargs=nargs@entry=7, args=args@entry=0x7fffffffd990) at eval.c:3055
#8  0x0000555555700cf9 in internal_condition_case_n (bfun=
    0x555555701760 <Ffuncall>, nargs=nargs@entry=7, args=args@entry=0x7fffffffd990, handlers=handlers@entry=XIL(0x30), hfun=hfun@entry=
    0x5555555ca5e0 <safe_eval_handler>) at eval.c:1642
#9  0x00005555555b8603 in safe__call
    (inhibit_quit=inhibit_quit@entry=false, nargs=nargs@entry=7, func=<optimized out>, ap=ap@entry=0x7fffffffda28) at lisp.h:1002
#10 0x00005555555c79b5 in safe_call (nargs=nargs@entry=7, func=<optimized out>) at xdisp.c:3009
#11 0x00005555557609c5 in autocmp_chars
    (rule=XIL(0x7ffff1e501bd), charpos=charpos@entry=146, bytepos=<optimized out>, limit=<optimized out>, 
    limit@entry=148, win=win@entry=0x555556030100, face=face@entry=0x0, string=XIL(0), direction=XIL(0)) at lisp.h:731
#12 0x000055555576426d in find_automatic_composition (pos=pos@entry=146, limit=146, 
    limit@entry=-1, backlim=backlim@entry=-1, start=start@entry=0x7fffffffdc68, end=end@entry=0x7fffffffdc70, gstring=gstring@entry=0x7fffffffdc78, string=XIL(0)) at composite.c:1661
#13 0x0000555555764f39 in composition_adjust_point (last_pt=last_pt@entry=146, new_pt=new_pt@entry=146) at lisp.h:1002
#14 0x00005555556960ff in command_loop_1 () at keyboard.c:1569
#15 0x00005555557009d7 in internal_condition_case
    (bfun=bfun@entry=0x555555695020 <command_loop_1>, handlers=handlers@entry=XIL(0x90), hfun=hfun@entry=0x55555568bac0 <cmd_error>)
    at eval.c:1478
#16 0x0000555555686064 in command_loop_2 (ignore=ignore@entry=XIL(0)) at lisp.h:1002
#17 0x0000555555702ed3 in internal_catch (tag=tag@entry=XIL(0xe520), func=func@entry=0x555555686040 <command_loop_2>, arg=arg@entry=XIL(0))
    at eval.c:1198
#18 0x000055555568600b in command_loop () at lisp.h:1002
#19 0x000055555568b6d6 in recursive_edit_1 () at keyboard.c:720
#20 0x000055555568ba02 in Frecursive_edit () at keyboard.c:789
#21 0x00005555555a177f in main (argc=2, argv=<optimized out>) at emacs.c:2308

Lisp Backtrace:
"font-shape-gstring" (0xffffd678)
"auto-compose-chars" (0xffffd998)
(gdb) up
#1  0x0000555555774be0 in ftfont_shape_by_flt (lgstring=XIL(0x7ffff1e5301d), font=0x55555604f410, ft_face=0x5555566a2400, 
    otf=0x555556696b60, matrix=0x55555604f508) at ftfont.c:2852
2852	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) up
#2  0x0000555555775002 in ftfont_shape (lgstring=XIL(0x7ffff1e5301d), direction=XIL(0)) at ftfont.c:2890
2890	  return ftfont_shape_by_flt (lgstring, font, ftfont_info->ft_size->face, otf,
(gdb) pp lgstring
[[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204] nil [0 0 2453 20 16 -1 16 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil nil nil nil]
(gdb) down
#1  0x0000555555774be0 in ftfont_shape_by_flt (lgstring=XIL(0x7ffff1e5301d), font=0x55555604f410, ft_face=0x5555566a2400, 
    otf=0x555556696b60, matrix=0x55555604f508) at ftfont.c:2852
2852	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
(gdb) p *g
$1 = {
  g = {
    c = 2453,
    code = 0,
    from = 0,
    to = 2,
    xadv = 704,
    yadv = 0,
    ascent = 896,
    descent = 0,
    lbearing = 64,
    rbearing = 640,
    xoff = 0,
    yoff = 0,
    encoded = 1,
    measured = 1,
    adjusted = 0,
    internal = 1073741823
  },
  libotf_positioning_type = 8204
}
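
The backtrace and the pp lgstring output above show where the crash comes from. As we read Emacs's composite.h, LGSTRING_GLYPH (lgs, i) is AREF (lgs, i + 2) and LGLYPH_TO reads slot 1 of a glyph vector. The lgstring printed above holds a header, an id slot (nil), two glyphs (for 2453 and 8204), and then unused nil slots, so only glyph indices 0 and 1 are valid. mflt_run left g->g.to = 2, so LGSTRING_GLYPH (lgstring, 2) is nil and AREF (nil, 1) faults, matching frame #0. The model below is a hypothetical C stand-in (NULL plays nil; it is not Emacs's Lisp_Object representation) that reproduces the index arithmetic and adds the bounds check the crashing line lacked.

```c
#include <stddef.h>

/* Hypothetical stand-in for an LGLYPH: only FROM and TO are modeled.  */
typedef struct { int from, to; } lglyph;

/* Hypothetical stand-in for an LGSTRING: slot 0 is the header, slot 1
   the id, and glyph I lives in slot I + 2.  Unused slots are NULL,
   playing the role of nil.  */
typedef struct {
  const void *slots[8];
  int nslots;
} lgstring;

/* Like LGSTRING_GLYPH, but returns NULL instead of reading past the
   end of the vector.  */
static const lglyph *
lgstring_glyph (const lgstring *lgs, int idx)
{
  int slot = idx + 2;
  if (idx < 0 || slot >= lgs->nslots)
    return NULL;
  return lgs->slots[slot];
}

/* Hypothetical helper: number of leading non-nil glyphs.  */
static int
lgstring_len (const lgstring *lgs)
{
  int n = 0;
  while (lgstring_glyph (lgs, n) != NULL)
    n++;
  return n;
}

/* TO must name a real glyph before LGLYPH_TO may be applied to it.  */
static int
to_index_is_valid (const lgstring *lgs, int to)
{
  return to >= 0 && to < lgstring_len (lgs);
}
```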

Robert

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-06-28 10:56                     ` Robert Pluim
@ 2021-06-28 12:05                       ` Eli Zaretskii
  2021-07-03  2:05                         ` handa
  0 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2021-06-28 12:05 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 49066, handa, larsi, eggert, mvsfrasson

> From: Robert Pluim <rpluim@gmail.com>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  handa@gnu.org,  larsi@gnus.org,
>   49066@debbugs.gnu.org,  mvsfrasson@gmail.com
> Date: Mon, 28 Jun 2021 12:56:06 +0200
> 
>     Eli> I'd like to see the confirmation, and then install this on master.
> 
>     Eli> Thanks.
> 
> With the patch it still crashes for me in emacs-master with harfbuzz disabled:

Too bad.

Kenichi, any suggestions?

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-06-28 12:05                       ` Eli Zaretskii
@ 2021-07-03  2:05                         ` handa
  2021-07-05  9:28                           ` Robert Pluim
  0 siblings, 1 reply; 18+ messages in thread
From: handa @ 2021-07-03  2:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 49066, rpluim, eggert, larsi, mvsfrasson

[-- Attachment #1: Type: text/plain, Size: 2340 bytes --]

In article <83bl7qp52q.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > With the patch it still crashes for me in emacs-master with harfbuzz disabled:

> Too bad.
> Kenichi, any suggestions?

I checked the code again and found that the fault is in m17n-lib,
which is not robust enough to handle an OTF table that differs from
what the library expects.

Here is a revised patch to handle such a case.  Could you please try it?

------------------------------------------------------------
diff --git a/src/ftfont.c b/src/ftfont.c
index 0603dd9ce6..12d0d72d27 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -2798,10 +2798,31 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 
   if (gstring.used > LGSTRING_GLYPH_LEN (lgstring))
     return Qnil;
+
+  /* mflt_run may fail to set g->g.to (which must be a valid index
+     into lgstring) correctly if the font has an OTF table that is
+     different from what the m17n library expects. */
   for (i = 0; i < gstring.used; i++)
     {
       MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+      if (g->g.to >= len)
+	{
+	  /* Invalid g->g.to. */
+	  g->g.to = len - 1;
+	  int from = g->g.from;
+	  /* Fix remaining glyphs. */
+	  for (++i; i < gstring.used; i++)
+	    {
+	      g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+	      g->g.from = from;
+	      g->g.to = len - 1;
+	    }
+	}
+    }
 
+  for (i = 0; i < gstring.used; i++)
+    {
+      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
       g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
       g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
     }
------------------------------------------------------------
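
The core of the revised patch is its first loop: once a glyph's to points past the last character (len - 1), that glyph and every glyph after it are folded into a single cluster running from that glyph's from to len - 1. The standalone sketch below reproduces just that loop on a hypothetical struct span array so it can be tried outside Emacs; it is an illustration, not the patch itself.

```c
/* Hypothetical stand-in for the from/to fields of MFLTGlyph.  */
struct span { int from, to; };

/* Mirror of the patch's loop: on the first glyph whose TO is out of
   range, clamp it to LEN - 1 and make all remaining glyphs share that
   glyph's FROM and the clamped TO.  */
static void
clamp_clusters (struct span *g, int used, int len)
{
  for (int i = 0; i < used; i++)
    if (g[i].to >= len)
      {
        g[i].to = len - 1;
        int from = g[i].from;
        for (++i; i < used; i++)
          {
            g[i].from = from;
            g[i].to = len - 1;
          }
      }
}
```

With the glyph from Robert's backtrace (from = 0, to = 2) and len = 2, the glyph becomes from = 0, to = 1, so the later LGSTRING_GLYPH (lgstring, g->g.to) lookup stays in range.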

> Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
> compose unless they are followed by a character.  See section 12.2 in
> the Unicode Standard.

Even if they should not be composed with the preceding character, we
must still include them in the string to be shaped, because their
presence may change the glyph of that character.  A shaper (m17n-lib
or HarfBuzz) must return a glyph string in which the trailing ZWJ/ZWNJ
is an independent grapheme cluster.
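
The rule above can be illustrated with a hypothetical helper (not part of Emacs or m17n-lib): the ZWNJ (U+200C) or ZWJ (U+200D) is still shaped together with the preceding character, but when it ends the run, the resulting clusters put it on its own. The sketch only computes those cluster boundaries; it does no shaping.

```c
#include <stdbool.h>

#define ZWNJ 0x200C
#define ZWJ  0x200D

/* Hypothetical cluster record: inclusive character indices.  */
struct cluster { int from, to; };

static bool
is_joiner_control (int c)
{
  return c == ZWNJ || c == ZWJ;
}

/* Split a run of LEN characters (already shaped as a whole, so a
   trailing joiner may have affected the preceding glyph) into
   clusters, giving a trailing ZWJ/ZWNJ a cluster of its own.
   Returns the number of clusters written to OUT (0, 1 or 2).  */
static int
split_trailing_joiner (const int *chars, int len, struct cluster out[2])
{
  if (len <= 0)
    return 0;
  if (len > 1 && is_joiner_control (chars[len - 1]))
    {
      out[0] = (struct cluster) { 0, len - 2 };
      out[1] = (struct cluster) { len - 1, len - 1 };
      return 2;
    }
  out[0] = (struct cluster) { 0, len - 1 };
  return 1;
}
```

For KA followed by ZWNJ (the string from this report) the helper yields clusters [0, 0] and [1, 1]; when another letter follows the ZWNJ, the whole run stays one cluster.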

When m17n-lib was developed, this rule was not yet clear.  To conform
to it, please put the attached BNG2-OTF.flt in the directory
~/.m17n.d/.

---
K. Handa
handa@gnu.org


[-- Attachment #2: BNG2-OTF.flt --]
[-- Type: application/octet-stream, Size: 6915 bytes --]

;; BNG2-OTF.flt -- Font Layout Table for bng2 OpenType fonts
;; Copyright (C) 2010 AIST (H15PRO112)
;; See the end for copying conditions.

(font layouter bng2-otf nil
      (version "1.6.0")
      (font (nil nil unicode-bmp :otf=bng2)))

;;; <li> BNG2-OTF.flt
;;;
;;; For bng2 OpenType fonts to draw the Bengali script.  

;; It seems that "Shornar Bangla.ttf" is designed to render the bng2
;; script with the following glyph sequence.
;; 1. pre matra
;; 2. half forms and below forms
;; 3. base glyph
;; 4. below forms
;; 5. below matra (09C1..09C4)
;; 6. reph
;; 7. post forms
;; 8. post matra (09C0, 09D7)
;; 9. candrabindu (0981)
;; 10. anusvara (0982) or visarga (0983)

(category
 ;; X: generic
 ;; V: independent vowel
 ;; C: consonant
 ;; R: RA
 ;; T: KHANDA TA
 ;; n: NUKTA
 ;; H: HALANT
 ;; m: vowel sign (pre)
 ;; b: vowel sign (below)
 ;; p: vowel sign (post)
 ;; a: vowel modifier (above)
 ;; A: vowel modifier (post)
 ;; N: ZWNJ
 ;; J: ZWJ
 (0x0980 0x09FF	?X)			; generic
 (0x0981	?a)			; SIGN CANDRABINDU
 (0x0982 0x0983	?A)			; SIGN ANUSVARA .. VISARGA
 (0x0985 0x0994	?V)			; LETTER A .. AU
 (0x0995 0x09B9	?C)			; LETTER KA .. HA
 (0x09B0	?R)			; LETTER RA
 (0x09BC	?n)			; SIGN NUKTA
 (0x09BE	?p)			; VOWEL SIGN AA
 (0x09BF	?m)			; VOWEL SIGN I
 (0x09C0	?p)			; VOWEL SIGN II
 (0x09C1 0x09C4	?b)			; VOWEL SIGN U .. RR
 (0x09C7 0x09C8	?m)			; VOWEL SIGN E .. AI
 (0x09CD	?H)			; SIGN VIRAMA
 (0x09CE	?T)			; LETTER KHANDA TA
 (0x09D7	?p)			; AU LENGTH MARK
 (0x09DC 0x09DF	?C)			; LETTER RRA .. YYA
 (0x09E0 0x09E1	?V)			; LETTER VOCALIC RR, LL
 (0x09E2 0x09E3	?b)			; VOWEL SIGN L .. LL
 (0x09F0	?R)			; LETTER RA WITH MIDDLE DIAGONAL
 (0x09F1	?C)			; LETTER RA WITH LOWER DIAGONAL

 (0x200C	?N)			; ZWNJ
 (0x200D	?J)			; ZWJ
 (0x25CC	?X)			; DOTTED CIRCLE

 (rphf		?r)
 (pstf		?P)
 )

;; Stage 0
;; Preprocessing
(generator
 (0
  (cond
   ;; Decompose two-part vowel signs.
   ((0x09CB)
    0x09C7 0x09BE)
   ((0x09CC)
    0x09C7 0x09D7)

   ;; TA + HALANT + ZWJ -> KHANDA-TA
   ((0x09A4 0x09CD 0x200D)
    0x09CE)

   ;; consonant + NUKTA
   ((0x09A1 0x09BC)
    0x09DC)
   ((0x09A2 0x09BC)
    0x09DD)
   ((0x09AF 0x09BC)
    0x09DF)

   ("." =))
  *))

;; Stage 1
;; Syllable identification
(generator
 (0
  (cond
   ;; Syllables with an independent vowel
   ("(RH)?Vn?(J?H[CR])?m?b?p?n?a?A?"
    < | = * | >)

   ;; KHANDA-TA combines only with reph.
   ("(RH)?(T)"
    < (2 =) (1 :otf=bng2=rphf+) >)

   ;; Consonant-based syllables
   ("([CR]n?J?HJ?)*[CR]n?(H[NJ]?|m?([NJ]?b)?p?n?)a?A?"
    < | = * | >)

   ;; Two-part vowel signs
   ((0x09C7 0x09BE)
    (cond
     ((font-facility 0x25CC) < 0x09C7 0x25CC 0x09BE >)
     (".+" < 0x09CB >)))
   ((0x09C7 0x09D7)
    (cond
     ((font-facility 0x25CC) < 0x09C7 0x25CC 0x09D7 >)
     (".+" < 0x09CC >)))

   ;; Combining marks are displayed with a DOTTED CIRCLE.
   ("m"
    (cond
     ((font-facility 0x25CC) < = 0x25CC >)
     ("." [ = ])))
   ("[nHbpaA]"
    (cond
     ((font-facility 0x25CC) < 0x25CC = >)
     ("." [ = ])))
   ("JH[CR]"
    (cond
     ((font-facility 0x25CC) < 0x25CC :otf=bng2=blwf,pstf+ >)
     (".+" [ :otf=bng2=blwf,pstf+ ])))

   ("." =))
  *))

;; Stage 2
;; Basic shaping forms and matra reordering
(generator
 (0
  (cond
   ;; Explicit halant form starting with RA + H + ZWJ
   (" (RHJ[CRnHJ]+)(HN?a?A?) "
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (1 b4post) (1 post) (2 = *) |)

   ;; Explicit halant form starting with a reph
   (" (RH)([CRnHJ]+)(HN?a?A?) "
    (2 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (1 :otf=bng2=rphf+) (2 b4post) (2 post) (3 = *) |)

   ;; Other explicit halant forms
   (" ([CRnHJ]+)(HN?a?A?) "
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (1 b4post) (1 post) (2 = *) |)

   ;; Ordinary syllables starting with RA + H + ZWJ
   ;; 1             2     3     45
   (" (RHJ[CRnHJN]*)(mn?)?(bn?)?((pn?)?a?A?) "
    ;;            |
    ;; This is an asterisk.  (See DEV2-OTF.flt)
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (2 = *) (1 b4post) (3 = *) (1 post) (4 = *) |)

   ;; Ordinary syllables starting with a reph
   ;; 1   2           3     4     56
   (" (RH)([CRnHJVN]+)(mn?)?(bn?)?((pn?)?a?A?) "
    (2 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (3 = *) (1 :otf=bng2=rphf+) (2 b4post) (4 = *) (2 post) (5 = *) |)

   ;; Other ordinary syllables
   ;; 1           2     3     45
   (" ([CRnHJVN]+)(mn?)?(bn?)?((pn?)?a?A?) "
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (2 = *) (1 b4post) (3 = *) (1 post) (4 = *) |)

   ("." =))
  *)

 (b4post
  (cond
   ;;1                 23       4
   ("([CRnHJP]*[CRV]n?)((J?PP)+)([NJ])?$"
    (1 :otf=bng2=locl,nukt,akhn,blwf,half,vatu,cjct+) (4 =))
   (".+"
    (0 :otf=bng2=locl,nukt,akhn,blwf,half,vatu,cjct+) (4 =))))

 (post
  (cond
   ("[CRnHJP]*[CRV]n?((J?PP)+)([NJ])?$"
    (1 :otf=bng2=pstf+))))
 )

;; Stage 3
;; Final reordering #1 (Move pre-base matra after the last halant)
(generator
 (0
  (cond
   ;; 1    2         3
   (" (mn?)([^ ]+HJ?)([^H ]+) "
    | (2 = *) (1 = *) (3 = *) |)

   ("." =))
  *))

;; Stage 4
;; Final reordering #2 (Move reph after the first halant)
(generator
 (0
  (cond
   ;; Syllables with a reph and an explicit halant
   ;; 1     2  3           4
   (" (mn?)?(r)([^HP ]+HJ?)([^ ]*) "
    | (1 = *) (3 = *) (2 =) (4 = *) |)

   ;; A reph without explicit halant
   ;; 1     2  3          4
   (" (mn?)?(r)([^PpaA ]+)(P*H?p?n?a?A?) "
    | (1 = *) (3 = *) (2 =) (4 = *) |)

   ("." =))
  *))

;; Stage 5
;; Nukta for matra and Presentation forms
(generator
 (0
  (cond
   (" (mn?)?([^ ]+) "
    | (1 :otf=bng2=nukt,init+)
    (2 :otf=bng2=nukt,pres,abvs,blws,psts,haln,calt+) |)

   ("." =))
  *))

;; Stage 6
;; Remove ZWNJ/ZWJ
(generator
 (0
  (cond
   ("( .+ )([NJ])$"
    (1 = *) (2 < = > ))

   ("[NJ]")

   ("." =))
  *))

;; Stage 7
;; GPOS processing
(generator
 (0
  (cond
   (" ([^ ]+) "
    (1 :otf=bng2=+kern,dist,abvm,blwm))

   ("." =))
  *))

;; Copyright (C) 2010
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H15PRO112

;; This file is part of the m17n database; a sub-part of the m17n
;; library.

;; The m17n library is free software; you can redistribute it and/or
;; modify it under the terms of the GNU Lesser General Public License
;; as published by the Free Software Foundation; either version 2.1 of
;; the License, or (at your option) any later version.

;; The m17n library is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; Lesser General Public License for more details.

;; You should have received a copy of the GNU Lesser General Public
;; License along with the m17n library; if not, write to the Free
;; Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.

;; Local Variables:
;; mode: emacs-lisp
;; End:
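
The category table at the top of the attached BNG2-OTF.flt assigns one letter to each character, and the later stages match regular expressions against the resulting letter string. A rough C transcription of that table follows, assuming (as the file's layout suggests) that a later, more specific entry overrides an earlier, broader one; for example, RA (U+09B0) is listed after the KA..HA range. The rphf and pstf entries are feature tags rather than code points and are omitted. For the string from this report, KA followed by ZWNJ, it yields "CN".

```c
#include <stddef.h>

/* Entries copied from the BNG2-OTF.flt category table.  */
struct cat_range { int lo, hi; char cat; };

static const struct cat_range bng2_categories[] = {
  { 0x0980, 0x09FF, 'X' },  /* generic */
  { 0x0981, 0x0981, 'a' },  /* SIGN CANDRABINDU */
  { 0x0982, 0x0983, 'A' },  /* SIGN ANUSVARA .. VISARGA */
  { 0x0985, 0x0994, 'V' },  /* LETTER A .. AU */
  { 0x0995, 0x09B9, 'C' },  /* LETTER KA .. HA */
  { 0x09B0, 0x09B0, 'R' },  /* LETTER RA */
  { 0x09BC, 0x09BC, 'n' },  /* SIGN NUKTA */
  { 0x09BE, 0x09BE, 'p' },  /* VOWEL SIGN AA */
  { 0x09BF, 0x09BF, 'm' },  /* VOWEL SIGN I */
  { 0x09C0, 0x09C0, 'p' },  /* VOWEL SIGN II */
  { 0x09C1, 0x09C4, 'b' },  /* VOWEL SIGN U .. RR */
  { 0x09C7, 0x09C8, 'm' },  /* VOWEL SIGN E .. AI */
  { 0x09CD, 0x09CD, 'H' },  /* SIGN VIRAMA */
  { 0x09CE, 0x09CE, 'T' },  /* LETTER KHANDA TA */
  { 0x09D7, 0x09D7, 'p' },  /* AU LENGTH MARK */
  { 0x09DC, 0x09DF, 'C' },  /* LETTER RRA .. YYA */
  { 0x09E0, 0x09E1, 'V' },  /* LETTER VOCALIC RR, LL */
  { 0x09E2, 0x09E3, 'b' },  /* VOWEL SIGN L .. LL */
  { 0x09F0, 0x09F0, 'R' },  /* LETTER RA WITH MIDDLE DIAGONAL */
  { 0x09F1, 0x09F1, 'C' },  /* LETTER RA WITH LOWER DIAGONAL */
  { 0x200C, 0x200C, 'N' },  /* ZWNJ */
  { 0x200D, 0x200D, 'J' },  /* ZWJ */
  { 0x25CC, 0x25CC, 'X' },  /* DOTTED CIRCLE */
};

/* Return the category letter of C, or 0 if the table does not cover
   it.  Scans backwards so that the last matching entry wins.  */
static char
bng2_category (int c)
{
  size_t n = sizeof bng2_categories / sizeof bng2_categories[0];
  for (size_t i = n; i-- > 0; )
    if (bng2_categories[i].lo <= c && c <= bng2_categories[i].hi)
      return bng2_categories[i].cat;
  return 0;
}

/* Write the category string of CHARS[0..LEN-1] into OUT, which must
   hold LEN + 1 bytes.  Uncovered characters become '?'.  */
static void
bng2_category_string (const int *chars, int len, char *out)
{
  for (int i = 0; i < len; i++)
    {
      char cat = bng2_category (chars[i]);
      out[i] = cat ? cat : '?';
    }
  out[len] = '\0';
}
```

Note that the two-part vowel signs U+09CB and U+09CC fall through to the generic 'X' here; Stage 0 of the FLT decomposes them before any category-based matching happens.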

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-07-03  2:05                         ` handa
@ 2021-07-05  9:28                           ` Robert Pluim
  2021-07-20 12:23                             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 18+ messages in thread
From: Robert Pluim @ 2021-07-05  9:28 UTC (permalink / raw)
  To: handa; +Cc: 49066, eggert, larsi, mvsfrasson

>>>>> On Sat, 03 Jul 2021 11:05:05 +0900, handa <handa@gnu.org> said:

    handa> In article <83bl7qp52q.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
    >> > With the patch it still crashes for me in emacs-master with harfbuzz disabled:

    >> Too bad.
    >> Kenichi, any suggestions?

    handa> I checked the code again and found that the fault is in m17n-lib,
    handa> which is not robust enough to handle an OTF table that differs from
    handa> what the library expects.

    handa> Here is a revised patch to handle such a case.  Could you please try it?

Thanks, that fixes the crash, and results in the ZWNJ being composed.

    >> Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
    >> compose unless they are followed by a character.  See section 12.2 in
    >> the Unicode Standard.

    handa> Even if they should not be composed with the preceding character, we
    handa> must still include them in the string to be shaped, because their
    handa> presence may change the glyph of that character.  A shaper (m17n-lib
    handa> or HarfBuzz) must return a glyph string in which the trailing ZWJ/ZWNJ
    handa> is an independent grapheme cluster.

    handa> When m17n-lib was developed, this rule was not yet clear.  To conform
    handa> to it, please put the attached BNG2-OTF.flt in the directory
    handa> ~/.m17n.d/.

I believe you, but I did not test this specifically.

Robert

* bug#49066: 26.3; Segmentation fault on specific utf8 string
  2021-07-05  9:28                           ` Robert Pluim
@ 2021-07-20 12:23                             ` Lars Ingebrigtsen
  0 siblings, 0 replies; 18+ messages in thread
From: Lars Ingebrigtsen @ 2021-07-20 12:23 UTC (permalink / raw)
  To: Robert Pluim; +Cc: 49066, handa, eggert, mvsfrasson

Robert Pluim <rpluim@gmail.com> writes:

>     handa> Here is a revised patch to handle such a case.  Could you
>     handa> please try it?
>
> Thanks, that fixes the crash, and results in the ZWNJ being composed.

I see that the patch wasn't applied, so I've now pushed it to Emacs 28.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

end of thread

Thread overview: 18+ messages
2021-06-16 21:07 bug#49066: 26.3; Segmentation fault on specific utf8 string Miguel V. S. Frasson
2021-06-16 21:12 ` Lars Ingebrigtsen
2021-06-17  6:43   ` Eli Zaretskii
2021-06-17  7:43     ` Robert Pluim
2021-06-17  8:13       ` Eli Zaretskii
2021-06-17 13:07         ` Robert Pluim
2021-06-17 13:59           ` Eli Zaretskii
2021-06-17 15:04             ` Eli Zaretskii
2021-06-27  2:29             ` handa
2021-06-27  6:20               ` Eli Zaretskii
2021-06-27 18:02                 ` Paul Eggert
2021-06-27 19:15                   ` Eli Zaretskii
2021-06-28 10:56                     ` Robert Pluim
2021-06-28 12:05                       ` Eli Zaretskii
2021-07-03  2:05                         ` handa
2021-07-05  9:28                           ` Robert Pluim
2021-07-20 12:23                             ` Lars Ingebrigtsen
2021-06-16 21:22 ` bug#49066: file foo Miguel V. S. Frasson