From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: ucs-normalize and diacritics
Date: Tue, 24 Jul 2018 22:07:00 +0300
Message-ID: <83lga0v4ff.fsf@gnu.org>
References: <8736w88pnn.fsf@gmail.com>
In-reply-to: <8736w88pnn.fsf@gmail.com> (message from Robert Pluim on Tue, 24 Jul 2018 20:17:00 +0200)
To: Robert Pluim
Cc: emacs-devel@gnu.org

> From: Robert Pluim
> Date: Tue, 24 Jul 2018 20:17:00 +0200
>
> in
>
> someone noted that â (that is LATIN SMALL LETTER A followed by
> COMBINING CIRCUMFLEX ACCENT) was not visually shown as â (LATIN SMALL
> LETTER A WITH CIRCUMFLEX). I can play around with the various
> ucs-normalize-* functions and get that to happen, but Iʼm just curious
> why it doesnʼt happen automatically (assuming it should, of course.

It does display as â, but only if you use a font that supports both
'a' and '̂'.  Emacs can only compose characters that are displayed by
the same font.

> I couldn't find any documentation on this kind of stuff).

Not sure what documentation you were looking for, but the data
structure to make this happen is set up in composite.el:

  (when unicode-category-table
    (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
                 [nil 0 compose-gstring-for-graphic])))
      (map-char-table
       #'(lambda (key val)
           (if (memq val '(Mn Mc Me))
               (set-char-table-range composition-function-table key elt)))
       unicode-category-table))
    ;; for dotted-circle
    (aset composition-function-table #x25CC
          `([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle])))

That "\\c.\\c^+" regexp causes any base character (\c.) followed by
one or more combining characters (\c^) to be composable using the
function compose-gstring-for-graphic (which see).
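
To poke at the pieces mentioned above, something like the following
could be evaluated in *scratch*.  It is only a rough sketch: the
demo-* names are made up for illustration and are not part of Emacs,
and the regexp test assumes the current buffer uses the standard
category table.

```elisp
(require 'ucs-normalize)

;; Hypothetical sample: "a" followed by U+0302 COMBINING CIRCUMFLEX
;; ACCENT, i.e. two separate characters in the text.
(defconst demo-decomposed (string ?a #x0302))

(defun demo-combining-mark-p (char)
  "Return non-nil if CHAR's general category is Mn, Mc, or Me.
This mirrors the `memq' test in the composite.el loop."
  (memq (get-char-code-property char 'general-category) '(Mn Mc Me)))

(defun demo-rule-matches-p (str)
  "Return the index where the base-plus-combining regexp matches STR.
Return nil if it does not match."
  (string-match-p "\\c.\\c^+" str))

;; Is the accent one of the characters the loop registers a rule for?
(demo-combining-mark-p #x0302)

;; Does the sample look like a composable sequence to the regexp?
(demo-rule-matches-p demo-decomposed)

;; The rules currently registered for U+0302.
(aref composition-function-table #x0302)

;; By contrast, NFC normalization rewrites the text itself into the
;; single precomposed character U+00E2.
(length demo-decomposed)                            ; 2
(length (ucs-normalize-NFC-string demo-decomposed)) ; 1
```

The last two forms show the difference between the two approaches:
composition only affects how the two characters are displayed and
leaves the buffer text alone, whereas the ucs-normalize-* functions
replace the sequence with a different character.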