From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#14461: 24.3.50; bad display for 'space' + (U+0336) unicode combination
Date: Sat, 17 Aug 2019 15:00:18 +0300
Message-ID: <83d0h4ngrx.fsf@gnu.org>
In-reply-to: <87wofehasr.fsf@gmx.net> (message from Stephen Berman on Thu, 15 Aug 2019 14:29:08 +0200)
To: Stephen Berman, Kenichi Handa
Cc: 14461@debbugs.gnu.org, larsi@gnus.org, cedric.chepied@gmail.com

> From: Stephen Berman
> Date: Thu, 15 Aug 2019 14:29:08 +0200
> Cc: 14461@debbugs.gnu.org, Lars Ingebrigtsen
>
> On Thu, 15 Aug 2019 12:02:21 +0200 Cédric Chépied wrote:
>
> ... I assume combining characters are always displayed after a space
> instead of over it -- at least that's what I see with e.g. U+0301
> (COMBINING ACUTE ACCENT) and U+0302 (COMBINING CIRCUMFLEX ACCENT).

Indeed, we reject base characters of certain general categories,
including those whose general category is Zs (space separator).  In
composite.el:compose-gstring-for-graphic we have:

     ;; This sequence doesn't start with a proper base character.
     ((memq (get-char-code-property (lgstring-char gstring 0)
                                    'general-category)
            '(Mn Mc Me Zs Zl Zp Cc Cf Cs))
      nil)

> That makes sense to me (otherwise, you couldn't visually distinguish
> e.g. the sequence 'aU+0301U+0302' from the sequence 'aU+0301 U+0302')

I don't see why: the former should be displayed as a single grapheme
cluster, with both diacritics on top of a, whereas the latter should
be displayed as 2 grapheme clusters, with U+0302 on top of the SPC
character instead of on top of a.

> and I would guess some Unicode standard prescribes it.

Actually, the Unicode Standard prescribes the opposite.  It says
(paragraph 3.6):

  D50 Graphic character: A character with the General Category of
  Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol
  (S), or Space Separator (Zs).

  ...

  D51 Base character: Any graphic character except for those with the
  General Category of Combining Mark (M).

    • Most Unicode characters are base characters.
      In terms of General Category values, a base character is any
      code point that has one of the following categories: Letter
      (L), Number (N), Punctuation (P), Symbol (S), or Space
      Separator (Zs).

  ...

  D52 Combining character: A character with the General Category of
  Combining Mark (M).

and (in 2.11)

  All combining characters can be applied to any base character and
  can, in principle, be used with any script.

So I don't think we are right to exclude space separators from the
base characters eligible for character composition; I think it's a
mistake.  Perhaps Handa-san (CC'ed) could comment on why we do that.
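
To make the discrepancy concrete, here is a rough sketch that puts the
D51 definition next to the category test quoted from composite.el.
Both helper names are hypothetical and not part of Emacs; the second
one models only that single cond clause, not everything
compose-gstring-for-graphic does.

```elisp
;; Sketch only.  Both helpers are hypothetical names, not Emacs APIs;
;; the only existing function used here is `get-char-code-property'.

(defun my-unicode-base-char-p (char)
  "Return non-nil if CHAR is a base character in the sense of D51.
That is, its general category is a Letter, Number, Punctuation or
Symbol category, or is Zs (space separator)."
  (let ((gc (get-char-code-property char 'general-category)))
    (and gc
         (or (eq gc 'Zs)
             ;; Category symbols look like Lu, Nd, Po, Sm, ...; the
             ;; first letter of the name gives the major class.
             (memq (aref (symbol-name gc) 0) '(?L ?N ?P ?S))))))

(defun my-composite-el-accepts-base-p (char)
  "Return non-nil if CHAR passes the category test quoted above.
This mirrors only the list of rejected general categories."
  (not (memq (get-char-code-property char 'general-category)
             '(Mn Mc Me Zs Zl Zp Cc Cf Cs))))
```

With these definitions, (my-unicode-base-char-p ?\s) returns non-nil
while (my-composite-el-accepts-base-p ?\s) returns nil, which is the
disagreement described above.  For an ordinary letter such as ?a both
return non-nil, and for U+0336 (general category Mn) both return nil.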