From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Thu, 17 Jun 2021 16:59:42 +0300
Message-ID: <83r1h0zj5d.fsf@gnu.org>
References: <871r91ikdv.fsf@gnus.org> <83czsl0z4z.fsf@gnu.org> <87czsl543c.fsf@gmail.com> <831r9029k2.fsf@gnu.org> <878s3863nd.fsf@gmail.com>
In-Reply-To: <878s3863nd.fsf@gmail.com> (message from Robert Pluim on Thu, 17 Jun 2021 15:07:18 +0200)
To: Robert Pluim, Kenichi Handa
Cc: 49066@debbugs.gnu.org, larsi@gnus.org, mvsfrasson@gmail.com

> From: Robert Pluim
> Cc: larsi@gnus.org, 49066@debbugs.gnu.org, mvsfrasson@gmail.com
> Date: Thu, 17 Jun 2021 15:07:18 +0200
>
> Full backtrace from an unoptimized build:

Thanks.

>     >> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
>     >> ftfont_shape_by_flt (matrix=, otf=, ft_face=, font=, lgstring=...)
>     >>     at ftfont.c:2573
>     >> 2573 g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
>
>     Eli> So, is 'g' a NULL pointer or something?  Or is 'lgstring' faulty in
>     Eli> some way?  IOW, what is the immediate reason for the
>     Eli> segfault?
>
> Itʼs lgstring, I think this is one of those 'nil's in lgstring

Yes, I think so.  We can verify that by looking at the value of
g->g.to:

  (gdb) p *g
  $3 = {
    g = {
      c = 2453,
      code = 20,
      from = 0,
      to = 2,   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

And the LGLYPH whose index is 2 is indeed nil:

  (gdb) pp lgstring
  [[# 2453 8204] nil [0 0 2453 20 16 -1 17 12 0 nil] [1 1 8204 658 0 -1 1 15 4 nil] nil nil nil [5 5 0 3039 11 0 12 7 5 nil] [6 6 1606 1044 11 0 11 8 3 nil] nil]
                                                                                    ^^^

I think this is a bug in that loop: it should actually exit whenever
it finds the first LGLYPH that is nil, and update gstring.used
accordingly.  Something like this:

  for (i = 0; i < gstring.used; i++)
    {
      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;

      if (NILP (LGSTRING_GLYPH (lgstring, g->g.from))
          || NILP (LGSTRING_GLYPH (lgstring, g->g.to)))
        break;
      g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
      g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
    }
  gstring.used = i;

CC'ing Handa-san, as I'm not really familiar with this code.

> This is enough to cause the crash: ক‌
>
> Thats #x995 followed by #x200c. Why are we trying to compose a ZWNJ?

Because #x995 is a Bengali character, and lisp/language/indian.el
says:

  (defconst bengali-composable-pattern
    (let ((table
           '(("a" . "\u0981")                             ; SIGN CANDRABINDU
             ("A" . "[\u0982\u0983]")                     ; SIGN ANUSVARA .. VISARGA
             ("V" . "[\u0985-\u0994\u09E0\u09E1]")        ; independent vowel
             ("C" . "[\u0995-\u09B9\u09DC-\u09DF\u09F1]") ; consonant
             ("B" . "[\u09AC\u09AF\u09B0\u09F0]")         ; BA, YA, RA
             ("R" . "[\u09B0\u09F0]")                     ; RA
             ("n" . "\u09BC")                             ; NUKTA
             ("v" . "[\u09BE-\u09CC\u09D7\u09E2\u09E3]")  ; vowel sign
             ("H" . "\u09CD")                             ; HALANT
             ("T" . "\u09CE")                             ; KHANDA TA
             ("N" . "\u200C")                             ; ZWNJ  <<<<<<<<<<<<<<<<<<<<<<<<<<<
             ("J" . "\u200D")                             ; ZWJ
             ("X" . "[\u0980-\u09FF]"))))                 ; all coverage
      (indian-compose-regexp
       (concat
        ;; syllables with an independent vowel, or
        "\\(?:RH\\)?Vn?\\(?:J?HB\\)?v*n?a?A?\\|"
        ;; consonant-based syllables, or
        "Cn?\\(?:J?HJ?Cn?\\)*\\(?:H[NJ]?\\|v*[NJ]?v?a?A?\\)\\|"
        ;; another syllables with an independent vowel, or
        "\\(?:RH\\)?T\\|"
        ;; special consonant form, or
        "JHB\\|"
        ;; any other singleton characters
        "X")
       table))
    "Regexp matching a composable sequence of Bengali characters.")

(which is used below that in setting up composition-function-table for
Bengali characters).

>     Eli> It could be some problem with the shaping engine: I guess versions
>     Eli> after Emacs 26 are built with HarfBuzz, not m17n-flt?  If you forcibly
>     Eli> use m17n-flt in a later Emacs, does it still not crash?
>
> emacs-27 built '--without-harfbuzz' and thus with m17n-flt crashes the same way.

Yes, it figures.  I hope Handa-san will suggest a solution, for those
who want to stick with m17n-flt.