From: handa <handa@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 49066@debbugs.gnu.org, rpluim@gmail.com, eggert@cs.ucla.edu,
	larsi@gnus.org, mvsfrasson@gmail.com
Subject: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Sat, 03 Jul 2021 11:05:05 +0900
Message-ID: <87zgv4cfu6.fsf@gnu.org>
In-Reply-To: <83bl7qp52q.fsf@gnu.org> (message from Eli Zaretskii on Mon, 28 Jun 2021 15:05:33 +0300)

[-- Attachment #1: Type: text/plain, Size: 2340 bytes --]

In article <83bl7qp52q.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > With the patch it still crashes for me in emacs-master with harfbuzz disabled:

> Too bad.
> Kenichi, any suggestions?

I checked the code again and found that the fault lies in m17n-lib,
which is not robust enough to handle an OTF table that differs from
what the library expects.

Here is a revised patch to handle such a case.  Could you please try it?

------------------------------------------------------------
diff --git a/src/ftfont.c b/src/ftfont.c
index 0603dd9ce6..12d0d72d27 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -2798,10 +2798,31 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 
   if (gstring.used > LGSTRING_GLYPH_LEN (lgstring))
     return Qnil;
+
+  /* mflt_run may fail to set g->g.to (which must be a valid index
+     into lgstring) correctly if the font has an OTF table that is
+     different from what the m17n library expects. */
   for (i = 0; i < gstring.used; i++)
     {
       MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+      if (g->g.to >= len)
+	{
+	  /* Invalid g->g.to. */
+	  g->g.to = len - 1;
+	  int from = g->g.from;
+	  /* Fix remaining glyphs. */
+	  for (++i; i < gstring.used; i++)
+	    {
+	      g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+	      g->g.from = from;
+	      g->g.to = len - 1;
+	    }
+	}
+    }
 
+  for (i = 0; i < gstring.used; i++)
+    {
+      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
       g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
       g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
     }
------------------------------------------------------------
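
To see the clamping logic in isolation, here is a minimal standalone
sketch of the first loop in the patch.  The struct glyph_range type and
the clamp_glyph_ranges function are hypothetical stand-ins, not part of
Emacs or m17n-lib; only the from/to handling follows the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the from/to fields of an MFLT glyph.  */
struct glyph_range
{
  int from;
  int to;
};

/* Mirror of the fix-up loop in the patch: the first glyph whose `to'
   is not a valid index (>= len) is clamped to len - 1, and every
   glyph after it gets that glyph's `from' and to = len - 1.
   Returns the number of glyphs that were rewritten.  */
static int
clamp_glyph_ranges (struct glyph_range *g, int used, int len)
{
  int fixed = 0;

  assert (len > 0);
  for (int i = 0; i < used; i++)
    if (g[i].to >= len)
      {
        int from = g[i].from;

        g[i].to = len - 1;
        fixed++;
        /* Fix remaining glyphs.  */
        for (++i; i < used; i++)
          {
            g[i].from = from;
            g[i].to = len - 1;
            fixed++;
          }
      }
  return fixed;
}
```

After this pass every `to' value is a valid index below len, so the
second loop in the patch can use it to look up a glyph in lgstring.
Like the patch, the sketch checks only `to', not `from'.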

> Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
> compose unless they are followed by a character.  See section 12.2 in
> the Unicode Standard.

Even if they should not be composed, we must include them in the
string to shape, because their presence may change the glyph of the
previous character.  A shaper (m17n-lib or HarfBuzz) must return a
glyph string in which the last ZWJ/ZWNJ forms an independent grapheme
cluster.
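
As a rough illustration of that rule, the following sketch assigns
cluster numbers to the characters of one shaped run.  It is a toy model
under simplifying assumptions (the whole run is a single syllable, and
assign_clusters is a hypothetical helper); it is not how m17n-lib or
HarfBuzz computes clusters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZWNJ 0x200C
#define ZWJ  0x200D

/* Toy model: every character of the run goes into cluster 0, except
   that a ZWJ/ZWNJ at the very end of a run longer than one character
   is put into its own cluster 1.  The joiner is still part of the
   input, so a real shaper could let it affect the preceding glyph.
   Returns the number of clusters.  */
static int
assign_clusters (const uint32_t *text, size_t n, int *cluster)
{
  assert (text != NULL && cluster != NULL);
  if (n == 0)
    return 0;
  for (size_t i = 0; i < n; i++)
    cluster[i] = 0;
  if (n > 1 && (text[n - 1] == ZWJ || text[n - 1] == ZWNJ))
    {
      cluster[n - 1] = 1;
      return 2;
    }
  return 1;
}
```

For KA + VIRAMA + ZWJ this yields two clusters, with the ZWJ alone in
the second one; when the ZWJ is followed by another character, the run
stays a single cluster.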

When m17n-lib was developed, the above rule was not clear.  To conform
to that rule, please put the attached BNG2-OTF.flt under the directory
~/.m17n.d/.

---
K. Handa
handa@gnu.org


[-- Attachment #2: BNG2-OTF.flt --]
[-- Type: application/octet-stream, Size: 6915 bytes --]

;; BNG2-OTF.flt -- Font Layout Table for bng2 OpenType fonts
;; Copyright (C) 2010 AIST (H15PRO112)
;; See the end for copying conditions.

(font layouter bng2-otf nil
      (version "1.6.0")
      (font (nil nil unicode-bmp :otf=bng2)))

;;; <li> BNG2-OTF.flt
;;;
;;; For bng2 OpenType fonts to draw the Bengali script.  

;; It seems that "Shornar Bangla.ttf" is designed to render the bng2
;; script with the following glyph sequence.
;; 1. pre matra
;; 2. half forms and below forms
;; 3. base glyph
;; 4. below forms
;; 5. below matra (09C1..09C4)
;; 6. reph
;; 7. post forms
;; 8. post matra (09C0, 09D7)
;; 9. candrabindu (0981)
;; 10. anusvara (0982) or visarga (0983)

(category
 ;; X: generic
 ;; V: independent vowel
 ;; C: consonant
 ;; R: RA
 ;; T: KHANDA TA
 ;; n: NUKTA
 ;; H: HALANT
 ;; m: vowel sign (pre)
 ;; b: vowel sign (below)
 ;; p: vowel sign (post)
 ;; a: vowel modifier (above)
 ;; A: vowel modifier (post)
 ;; N: ZWNJ
 ;; J: ZWJ
 (0x0980 0x09FF	?X)			; generic
 (0x0981	?a)			; SIGN CANDRABINDU
 (0x0982 0x0983	?A)			; SIGN ANUSVARA .. VISARGA
 (0x0985 0x0994	?V)			; LETTER A .. AU
 (0x0995 0x09B9	?C)			; LETTER KA .. HA
 (0x09B0	?R)			; LETTER RA
 (0x09BC	?n)			; SIGN NUKTA
 (0x09BE	?p)			; VOWEL SIGN AA
 (0x09BF	?m)			; VOWEL SIGN I
 (0x09C0	?p)			; VOWEL SIGN II
 (0x09C1 0x09C4	?b)			; VOWEL SIGN U .. RR
 (0x09C7 0x09C8	?m)			; VOWEL SIGN E .. AI
 (0x09CD	?H)			; SIGN VIRAMA
 (0x09CE	?T)			; LETTER KHANDA TA
 (0x09D7	?p)			; AU LENGTH MARK
 (0x09DC 0x09DF	?C)			; LETTER RRA .. YYA
 (0x09E0 0x09E1	?V)			; LETTER VOCALIC RR, LL
 (0x09E2 0x09E3	?b)			; VOWEL SIGN L .. LL
 (0x09F0	?R)			; LETTER RA WITH MIDDLE DIAGONAL
 (0x09F1	?C)			; LETTER RA WITH LOWER DIAGONAL

 (0x200C	?N)			; ZWNJ
 (0x200D	?J)			; ZWJ
 (0x25CC	?X)			; DOTTED CIRCLE

 (rphf		?r)
 (pstf		?P)
 )

;; Stage 0
;; Preprocessing
(generator
 (0
  (cond
   ;; Decompose two-part vowel signs.
   ((0x09CB)
    0x09C7 0x09BE)
   ((0x09CC)
    0x09C7 0x09D7)

   ;; TA + HALANT + ZWJ -> KHANDA-TA
   ((0x09A4 0x09CD 0x200D)
    0x09CE)

   ;; consonant + NUKTA
   ((0x09A1 0x09BC)
    0x09DC)
   ((0x09A2 0x09BC)
    0x09DD)
   ((0x09AF 0x09BC)
    0x09DF)

   ("." =))
  *))

;; Stage 1
;; Syllable identification
(generator
 (0
  (cond
   ;; Syllables with an independent vowel
   ("(RH)?Vn?(J?H[CR])?m?b?p?n?a?A?"
    < | = * | >)

   ;; KHANDA-TA combines only with reph.
   ("(RH)?(T)"
    < (2 =) (1 :otf=bng2=rphf+) >)

   ;; Consonant-based syllables
   ("([CR]n?J?HJ?)*[CR]n?(H[NJ]?|m?([NJ]?b)?p?n?)a?A?"
    < | = * | >)

   ;; Two-part vowel signs
   ((0x09C7 0x09BE)
    (cond
     ((font-facility 0x25CC) < 0x09C7 0x25CC 0x09BE >)
     (".+" < 0x09CB >)))
   ((0x09C7 0x09D7)
    (cond
     ((font-facility 0x25CC) < 0x09C7 0x25CC 0x09D7 >)
     (".+" < 0x09CC >)))

   ;; Combining marks are displayed with a DOTTED CIRCLE.
   ("m"
    (cond
     ((font-facility 0x25CC) < = 0x25CC >)
     ("." [ = ])))
   ("[nHbpaA]"
    (cond
     ((font-facility 0x25CC) < 0x25CC = >)
     ("." [ = ])))
   ("JH[CR]"
    (cond
     ((font-facility 0x25CC) < 0x25CC :otf=bng2=blwf,pstf+ >)
     (".+" [ :otf=bng2=blwf,pstf+ ])))

   ("." =))
  *))

;; Stage 2
;; Basic shaping forms and matra reordering
(generator
 (0
  (cond
   ;; Explicit halant form starting with RA + H + ZWJ
   (" (RHJ[CRnHJ]+)(HN?a?A?) "
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (1 b4post) (1 post) (2 = *) |)

   ;; Explicit halant form starting with a reph
   (" (RH)([CRnHJ]+)(HN?a?A?) "
    (2 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (1 :otf=bng2=rphf+) (2 b4post) (2 post) (3 = *) |)

   ;; Other explicit halant forms
   (" ([CRnHJ]+)(HN?a?A?) "
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (1 b4post) (1 post) (2 = *) |)

   ;; Ordinary syllables starting with RA + H + ZWJ
   ;; 1             2     3     45
   (" (RHJ[CRnHJN]*)(mn?)?(bn?)?((pn?)?a?A?) "
    ;;            |
    ;; This is an asterisk.  (See DEV2-OTF.flt)
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (2 = *) (1 b4post) (3 = *) (1 post) (4 = *) |)

   ;; Ordinary syllables starting with a reph
   ;; 1   2           3     4     56
   (" (RH)([CRnHJVN]+)(mn?)?(bn?)?((pn?)?a?A?) "
    (2 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (3 = *) (1 :otf=bng2=rphf+) (2 b4post) (4 = *) (2 post) (5 = *) |)

   ;; Other ordinary syllables
   ;; 1           2     3     45
   (" ([CRnHJVN]+)(mn?)?(bn?)?((pn?)?a?A?) "
    (1 :otf?bng2=locl,nukt,akhn,blwf,pstf+)
    | (2 = *) (1 b4post) (3 = *) (1 post) (4 = *) |)

   ("." =))
  *)

 (b4post
  (cond
   ;;1                 23       4
   ("([CRnHJP]*[CRV]n?)((J?PP)+)([NJ])?$"
    (1 :otf=bng2=locl,nukt,akhn,blwf,half,vatu,cjct+) (4 =))
   (".+"
    (0 :otf=bng2=locl,nukt,akhn,blwf,half,vatu,cjct+) (4 =))))

 (post
  (cond
   ("[CRnHJP]*[CRV]n?((J?PP)+)([NJ])?$"
    (1 :otf=bng2=pstf+))))
 )

;; Stage 3
;; Final reordering #1 (Move pre-base matra after the last halant)
(generator
 (0
  (cond
   ;; 1    2         3
   (" (mn?)([^ ]+HJ?)([^H ]+) "
    | (2 = *) (1 = *) (3 = *) |)

   ("." =))
  *))

;; Stage 4
;; Final reordering #2 (Move reph after the first halant)
(generator
 (0
  (cond
   ;; Syllables with a reph and an explicit halant
   ;; 1     2  3           4
   (" (mn?)?(r)([^HP ]+HJ?)([^ ]*) "
    | (1 = *) (3 = *) (2 =) (4 = *) |)

   ;; A reph without explicit halant
   ;; 1     2  3          4
   (" (mn?)?(r)([^PpaA ]+)(P*H?p?n?a?A?) "
    | (1 = *) (3 = *) (2 =) (4 = *) |)

   ("." =))
  *))

;; Stage 5
;; Nukta for matra and Presentation forms
(generator
 (0
  (cond
   (" (mn?)?([^ ]+) "
    | (1 :otf=bng2=nukt,init+)
    (2 :otf=bng2=nukt,pres,abvs,blws,psts,haln,calt+) |)

   ("." =))
  *))

;; Stage 6
;; Remove ZWNJ/ZWJ
(generator
 (0
  (cond
   ("( .+ )([NJ])$"
    (1 = *) (2 < = > ))

   ("[NJ]")

   ("." =))
  *))

;; Stage 7
;; GPOS processing
(generator
 (0
  (cond
   (" ([^ ]+) "
    (1 :otf=bng2=+kern,dist,abvm,blwm))

   ("." =))
  *))

;; Copyright (C) 2010
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H15PRO112

;; This file is part of the m17n database; a sub-part of the m17n
;; library.

;; The m17n library is free software; you can redistribute it and/or
;; modify it under the terms of the GNU Lesser General Public License
;; as published by the Free Software Foundation; either version 2.1 of
;; the License, or (at your option) any later version.

;; The m17n library is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; Lesser General Public License for more details.

;; You should have received a copy of the GNU Lesser General Public
;; License along with the m17n library; if not, write to the Free
;; Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.

;; Local Variables:
;; mode: emacs-lisp
;; End:

Thread overview: 18+ messages
2021-06-16 21:07 bug#49066: 26.3; Segmentation fault on specific utf8 string Miguel V. S. Frasson
2021-06-16 21:12 ` Lars Ingebrigtsen
2021-06-17  6:43   ` Eli Zaretskii
2021-06-17  7:43     ` Robert Pluim
2021-06-17  8:13       ` Eli Zaretskii
2021-06-17 13:07         ` Robert Pluim
2021-06-17 13:59           ` Eli Zaretskii
2021-06-17 15:04             ` Eli Zaretskii
2021-06-27  2:29             ` handa
2021-06-27  6:20               ` Eli Zaretskii
2021-06-27 18:02                 ` Paul Eggert
2021-06-27 19:15                   ` Eli Zaretskii
2021-06-28 10:56                     ` Robert Pluim
2021-06-28 12:05                       ` Eli Zaretskii
2021-07-03  2:05                         ` handa [this message]
2021-07-05  9:28                           ` Robert Pluim
2021-07-20 12:23                             ` Lars Ingebrigtsen
2021-06-16 21:22 ` bug#49066: file foo Miguel V. S. Frasson
