all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: Robert Pluim <rpluim@gmail.com>, Kenichi Handa <handa@gnu.org>
Cc: 49066@debbugs.gnu.org, larsi@gnus.org, mvsfrasson@gmail.com
Subject: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 16:59:42 +0300
Message-ID: <83r1h0zj5d.fsf@gnu.org>
In-Reply-To: <878s3863nd.fsf@gmail.com> (message from Robert Pluim on Thu, 17 Jun 2021 15:07:18 +0200)

> From: Robert Pluim <rpluim@gmail.com>
> Cc: larsi@gnus.org,  49066@debbugs.gnu.org,  mvsfrasson@gmail.com
> Date: Thu, 17 Jun 2021 15:07:18 +0200
> 
> Full backtrace from an unoptimized build:

Thanks.

>     >> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
>     >> ftfont_shape_by_flt (matrix=<optimized out>, otf=<optimized out>, ft_face=<optimized out>, font=<optimized out>, lgstring=...)
>     >> at ftfont.c:2573
>     >> 2573	      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
> 
>     Eli> So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
>     Eli> some way?  IOW, what is the immediate reason for the
>     Eli> segfault?
> 
> Itʼs lgstring, I think this is one of those 'nil's in lgstring

Yes, I think so.  We can verify that by looking at the value of
g->g.to:

  (gdb) p *g
  $3 = {
    g = {
      c = 2453,
      code = 20,
      from = 0,
      to = 2, <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

And the LGLYPH whose index is 2 is indeed nil:

  (gdb) pp lgstring
  [[#<font-object "-GOOG-Noto Sans Bengali-normal-normal-normal-*-19-*-*-*-*-0-iso10646-1"> 2453 8204]
   nil
   [0 0 2453 20 16 -1 17 12 0 nil]
   [1 1 8204 658 0 -1 1 15 4 nil]
   nil  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
   nil nil
   [5 5 0 3039 11 0 12 7 5 nil]
   [6 6 1606 1044 11 0 11 8 3 nil]
   nil]

I think this is a bug in that loop: it should exit as soon as it
finds an LGLYPH that is nil, and update gstring.used accordingly.
Something like this:

  for (i = 0; i < gstring.used; i++)
    {
      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;

      if (NILP (LGSTRING_GLYPH (lgstring, g->g.from))
          || NILP (LGSTRING_GLYPH (lgstring, g->g.to)))
	break;
      g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
    }
  gstring.used = i;
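
To see the effect of the early exit in isolation, here is a minimal,
self-contained sketch that models the loop outside of Emacs.  Everything
in it is a hypothetical stand-in, not the real ftfont.c code:
'struct model_lglyph' plays the role of an LGLYPH slot (with
'present == false' standing for nil), 'struct model_flt_glyph' keeps only
the 'from' and 'to' fields of the FLT glyph, and the bounds check is an
extra precaution of the model that the snippet above does not have.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one LGLYPH slot of the lgstring.
   present == false models a nil slot.  */
struct model_lglyph
{
  bool present;
  int from;			/* what LGLYPH_FROM would return */
  int to;			/* what LGLYPH_TO would return */
};

/* Hypothetical stand-in for the two fields of the FLT glyph
   that the loop reads and rewrites.  */
struct model_flt_glyph
{
  int from;
  int to;
};

/* Remap from/to of each FLT glyph through the LGLYPH slots, stopping
   at the first glyph that refers to a nil (or out-of-range) slot.
   Returns the number of glyphs that were remapped, i.e. the value the
   caller would store back into 'used'.  */
int
remap_glyph_ranges (struct model_flt_glyph *glyphs, int used,
		    const struct model_lglyph *lglyphs, int n_lglyphs)
{
  int i;

  for (i = 0; i < used; i++)
    {
      struct model_flt_glyph *g = &glyphs[i];

      /* The range check is an assumption of this model only.  */
      if (g->from < 0 || g->from >= n_lglyphs
	  || g->to < 0 || g->to >= n_lglyphs
	  || !lglyphs[g->from].present
	  || !lglyphs[g->to].present)
	break;
      g->from = lglyphs[g->from].from;
      g->to = lglyphs[g->to].to;
    }
  return i;
}
```

With slots modeled after the lgstring above (slots 0 and 1 present,
slot 2 nil) and a single FLT glyph whose from is 0 and whose to is 2,
the function returns 0 and leaves the glyph untouched instead of
dereferencing the nil slot.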

CC'ing Handa-san, as I'm not really familiar with this code.

> This is enough to cause the crash: ক‌
> 
> That's #x995 followed by #x200c. Why are we trying to compose a ZWNJ?

Because #x995 is a Bengali character, and lisp/language/indian.el
says:

  (defconst bengali-composable-pattern
    (let ((table
	   '(("a" . "\u0981")		; SIGN CANDRABINDU
	     ("A" . "[\u0982\u0983]")	; SIGN ANUSVARA .. VISARGA
	     ("V" . "[\u0985-\u0994\u09E0\u09E1]") ; independent vowel
	     ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
	     ("B" . "[\u09AC\u09AF\u09B0\u09F0]")		; BA, YA, RA
	     ("R" . "[\u09B0\u09F0]")		; RA
	     ("n" . "\u09BC")		; NUKTA
	     ("v" . "[\u09BE-\u09CC\u09D7\u09E2\u09E3]") ; vowel sign
	     ("H" . "\u09CD")		; HALANT
	     ("T" . "\u09CE")		; KHANDA TA
	     ("N" . "\u200C")		; ZWNJ  <<<<<<<<<<<<<<<<<<<<<<<<<<<
	     ("J" . "\u200D")		; ZWJ
	     ("X" . "[\u0980-\u09FF]"))))	; all coverage
      (indian-compose-regexp
       (concat
	;; syllables with an independent vowel, or
	"\\(?:RH\\)?Vn?\\(?:J?HB\\)?v*n?a?A?\\|"
	;; consonant-based syllables, or
	"Cn?\\(?:J?HJ?Cn?\\)*\\(?:H[NJ]?\\|v*[NJ]?v?a?A?\\)\\|"
	;; another syllables with an independent vowel, or
	"\\(?:RH\\)?T\\|"
	;; special consonant form, or
	"JHB\\|"
	;; any other singleton characters
	"X")
       table))
    "Regexp matching a composable sequence of Bengali characters.")

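(which is used further down in that file to set up
composition-function-table for Bengali characters).  The
consonant-based branch accepts an optional ZWNJ right after a
consonant, so #x995 followed by #x200C matches as one composable
sequence.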
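
The match can be checked outside of Emacs with a small sketch.  It
assumes a simplification: each code point is first mapped to the class
letter from the table above, and the consonant-based branch is then run
over that string of letters with POSIX regcomp/regexec.  The function
names are hypothetical, only the classes used by that one branch are
handled, and POSIX extended regexps have no shy groups, so plain groups
replace \(?: ... \).  POSIX matching also prefers the longest match,
which need not agree with Emacs regexps in every case, although it does
for the inputs discussed here.

```c
#include <regex.h>
#include <stddef.h>

/* Map a code point to its class letter from the table in
   bengali-composable-pattern.  Only the classes used by the
   consonant-based branch are covered; anything else yields '?'.  */
char
bengali_class (unsigned int cp)
{
  if (cp == 0x0981)
    return 'a';			/* SIGN CANDRABINDU */
  if (cp == 0x0982 || cp == 0x0983)
    return 'A';			/* ANUSVARA, VISARGA */
  if ((cp >= 0x0995 && cp <= 0x09B9)
      || (cp >= 0x09DC && cp <= 0x09DF)
      || cp == 0x09F1)
    return 'C';			/* consonant */
  if (cp == 0x09BC)
    return 'n';			/* NUKTA */
  if ((cp >= 0x09BE && cp <= 0x09CC)
      || cp == 0x09D7 || cp == 0x09E2 || cp == 0x09E3)
    return 'v';			/* vowel sign */
  if (cp == 0x09CD)
    return 'H';			/* HALANT */
  if (cp == 0x200C)
    return 'N';			/* ZWNJ */
  if (cp == 0x200D)
    return 'J';			/* ZWJ */
  return '?';
}

/* Return how many leading code points of CPS the consonant-based
   branch matches, or 0 if it does not match at the start.  At most
   63 code points are examined.  */
size_t
consonant_syllable_length (const unsigned int *cps, size_t n)
{
  char classes[64];
  regex_t re;
  regmatch_t m;
  size_t len = 0;

  if (n >= sizeof classes)
    n = sizeof classes - 1;
  for (size_t i = 0; i < n; i++)
    classes[i] = bengali_class (cps[i]);
  classes[n] = '\0';

  /* Same branch as in indian.el, anchored, with plain groups.  */
  if (regcomp (&re, "^Cn?(J?HJ?Cn?)*(H[NJ]?|v*[NJ]?v?a?A?)",
	       REG_EXTENDED) != 0)
    return 0;
  if (regexec (&re, classes, 1, &m, 0) == 0)
    len = (size_t) m.rm_eo;
  regfree (&re);
  return len;
}
```

Calling consonant_syllable_length on the two code points 0x0995 and
0x200C yields 2: the consonant matches C and the ZWNJ is consumed by the
optional [NJ], so both characters end up in one composable sequence.  A
lone 0x200C yields 0 from this branch.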

>     Eli> It could be some problem with the shaping engine: I guess versions
>     Eli> after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
>     Eli> use m17n-flt in a later Emacs, does it still not crash?
> 
> emacs-27 built '--without-harfbuzz' and thus with m17n-flt crashes the same way.

Yes, it figures.

I hope Handa-san will suggest a solution, for those who want to stick
with m17n-flt.





