all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: Stephen Berman <stephen.berman@gmx.net>, Kenichi Handa <handa@gnu.org>
Cc: 14461@debbugs.gnu.org, larsi@gnus.org, cedric.chepied@gmail.com
Subject: bug#14461: 24.3.50; bad display for 'space' + (U+0336) unicode combination
Date: Sat, 17 Aug 2019 15:00:18 +0300
Message-ID: <83d0h4ngrx.fsf@gnu.org>
In-Reply-To: <87wofehasr.fsf@gmx.net> (message from Stephen Berman on Thu, 15 Aug 2019 14:29:08 +0200)

> From: Stephen Berman <stephen.berman@gmx.net>
> Date: Thu, 15 Aug 2019 14:29:08 +0200
> Cc: 14461@debbugs.gnu.org, Lars Ingebrigtsen <larsi@gnus.org>
> 
> On Thu, 15 Aug 2019 12:02:21 +0200 Cédric Chépied <cedric.chepied@gmail.com> wrote:
> 
> ... I assume combining characters are always displayed after a space
> instead of over it -- at least that's what I see with e.g. U+0301
> (COMBINING ACUTE ACCENT) and U+0302 (COMBINING CIRCUMFLEX ACCENT).

Indeed, we reject base characters of certain general categories,
including those whose general category is Zs (space separator).  In
composite.el:compose-gstring-for-graphic we have:

     ;; This sequence doesn't start with a proper base character.
     ((memq (get-char-code-property (lgstring-char gstring 0)
				    'general-category)
	    '(Mn Mc Me Zs Zl Zp Cc Cf Cs))
      nil)
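For anyone who wants to reproduce this check outside Emacs, here is a minimal Python sketch of the same test, using the standard `unicodedata` module to look up General Category.  The function name `emacs_rejects_as_base` is hypothetical.  It mirrors only the `memq` clause quoted above, not the rest of `compose-gstring-for-graphic`, and Python's Unicode tables may be a different Unicode version than the ones Emacs uses.

```python
import unicodedata

# General categories that the quoted clause of compose-gstring-for-graphic
# refuses as the first character of a composition sequence.
EMACS_REJECTED_BASE_CATEGORIES = {"Mn", "Mc", "Me", "Zs", "Zl", "Zp", "Cc", "Cf", "Cs"}


def emacs_rejects_as_base(ch: str) -> bool:
    """Return True if the quoted check would refuse CH as a base character."""
    return unicodedata.category(ch) in EMACS_REJECTED_BASE_CATEGORIES


if __name__ == "__main__":
    # 'a' (Ll), SPACE (Zs), COMBINING LONG STROKE OVERLAY (Mn), NO-BREAK SPACE (Zs)
    for ch in ["a", " ", "\u0336", "\u00a0"]:
        gc = unicodedata.category(ch)
        print(f"U+{ord(ch):04X} {gc} rejected={emacs_rejects_as_base(ch)}")
```

Both U+0020 and U+00A0 are Zs, so a sequence starting with either of them is refused here, which is exactly the SPC + U+0336 case from the bug report.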

> That makes sense to me (otherwise, you couldn't visually distinguish
> e.g. the sequence 'aU+0301U+0302' from the sequence 'aU+0301 U+0302')

I don't see why: the former should be displayed as a single grapheme
cluster, with both diacritics on top of a, whereas the latter should
be displayed as 2 grapheme clusters, with U+0302 on top of the SPC
character instead of on top of a.
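The distinction can be illustrated with a deliberately simplified segmentation, sketched in Python: each cluster is one non-mark character followed by any combining marks (General Category Mn, Mc, or Me).  This is an approximation, not the full extended grapheme cluster algorithm of Unicode Standard Annex #29, which handles many more cases; the helper name `naive_clusters` is hypothetical.

```python
import unicodedata


def naive_clusters(text: str) -> list[str]:
    """Split TEXT into clusters of one base character plus trailing combining marks.

    Simplification: only General Category M* (Mn, Mc, Me) extends a cluster;
    this is not the full UAX #29 algorithm.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # A combining mark attaches to whatever precedes it, a space included.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    one = "a\u0301\u0302"   # a + COMBINING ACUTE + COMBINING CIRCUMFLEX
    two = "a\u0301 \u0302"  # a + COMBINING ACUTE, then SPACE + COMBINING CIRCUMFLEX
    print([ascii(c) for c in naive_clusters(one)])  # one cluster
    print([ascii(c) for c in naive_clusters(two)])  # two clusters
```

Under this model the two sequences remain visually distinguishable: in the second one, U+0302 stays with the SPC, so it cannot be confused with the first.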

> and I would guess some Unicode standard prescribes it.

Actually, the Unicode Standard prescribes the opposite.  It says
(section 3.6):

  D50 Graphic character: A character with the General Category of
      Letter (L), Combining Mark (M), Number (N), Punctuation (P),
      Symbol (S), or Space Separator (Zs).
  ...
  D51 Base character: Any graphic character except for those with the
      General Category of Combining Mark (M).
       • Most Unicode characters are base characters. In terms of
	 General Category values, a base character is any code point
	 that has one of the following categories: Letter (L), Number
	 (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
  ...
  D52 Combining character: A character with the General Category of
      Combining Mark (M).

and (in section 2.11):

      All combining characters can be applied to any base character and
      can, in principle, be used with any script.
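To make the comparison concrete, the following Python sketch encodes D50 and D51 as predicates over General Category and checks which of the categories rejected by `compose-gstring-for-graphic` would count as base characters under D51.  The function and constant names are hypothetical.  The result follows directly from the definitions quoted above: Zl, Zp, and C* are not graphic, and M* are combining marks, so Zs is the only rejected category that D51 treats as a base.

```python
import unicodedata

# D50: a graphic character has General Category L*, M*, N*, P*, S*, or Zs.
GRAPHIC_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}

# Categories rejected as a base by the clause of compose-gstring-for-graphic
# quoted earlier in this message.
EMACS_REJECTED = ["Mn", "Mc", "Me", "Zs", "Zl", "Zp", "Cc", "Cf", "Cs"]


def is_graphic_category(gc: str) -> bool:
    """D50: General Category L, M, N, P, S, or exactly Zs."""
    return gc[0] in GRAPHIC_MAJOR_CLASSES or gc == "Zs"


def is_base_category(gc: str) -> bool:
    """D51: any graphic category except Combining Mark (M*)."""
    return is_graphic_category(gc) and not gc.startswith("M")


def is_base_char(ch: str) -> bool:
    """Apply D51 to a single character via its General Category."""
    return is_base_category(unicodedata.category(ch))


if __name__ == "__main__":
    # Categories that Emacs rejects even though D51 counts them as base.
    print([gc for gc in EMACS_REJECTED if is_base_category(gc)])
    print(is_base_char(" "), is_base_char("\u0336"))
```

Running it prints `['Zs']` followed by `True False`: SPC is a base character per D51, while U+0336 is not.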

So I think we are wrong to exclude space separators from the base
characters eligible for character composition; it looks like a
mistake to me.  Perhaps Handa-san (CC'ed) could comment on why we do that.






Thread overview: 21+ messages
2013-05-24 14:30 bug#14461: 24.3.50; bad display for 'space' + (U+0336) unicode combination Cédric Chépied
2019-08-15  4:50 ` Lars Ingebrigtsen
2019-08-15  9:01   ` Stephen Berman
2019-08-15 10:02     ` Cédric Chépied
2019-08-15 12:29       ` Stephen Berman
2019-08-16  1:03         ` Lars Ingebrigtsen
2019-08-16  6:55           ` Eli Zaretskii
2019-08-17 12:00         ` Eli Zaretskii [this message]
2019-08-17 13:50           ` Stephen Berman
2019-08-17 14:14             ` Eli Zaretskii
2019-08-17 14:40               ` Stephen Berman
2019-08-17 15:09                 ` Eli Zaretskii
2019-08-17 15:39                   ` Stephen Berman
2019-08-17 15:44                     ` Eli Zaretskii
2019-08-17 17:05                       ` Stephen Berman
2019-08-17 17:29                         ` Eli Zaretskii
2019-08-17 18:11                           ` Stephen Berman
2019-08-17 18:22                             ` Eli Zaretskii
2019-08-17 18:58                               ` Stephen Berman
2019-09-07  9:21           ` Eli Zaretskii
2019-08-15 14:48   ` Eli Zaretskii
