From mboxrd@z Thu Jan 1 00:00:00 1970
From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: ucs-normalize and diacritics
Date: Wed, 25 Jul 2018 21:59:00 +0200
Message-ID: <87va935bp7.fsf@gmail.com>
References: <8736w88pnn.fsf@gmail.com> <83lga0v4ff.fsf@gnu.org>
 <83in54v3sp.fsf@gnu.org> <87y3e07425.fsf@gmail.com>
 <83h8koujs9.fsf@gnu.org> <87pnzb7ogd.fsf@gmail.com>
 <834lgnv0p9.fsf@gnu.org>
In-Reply-To: <834lgnv0p9.fsf@gnu.org> (Eli Zaretskii's message of
 "Wed, 25 Jul 2018 17:39:46 +0300")
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

Eli Zaretskii writes:

>> From: Robert Pluim
>> Cc: emacs-devel@gnu.org
>> Date: Wed, 25 Jul 2018 09:40:34 +0200
>>
>> I think Iʼll start by putting pointers to auto-composition-mode in the
>> manual and lispref.
>
> Thanks in advance.

Hereʼs a first stab at it, intended for emacs-26. Probably the unicode
characters below will not survive intact.

diff --git i/doc/emacs/mule.texi w/doc/emacs/mule.texi
index 401c83dd49..925a582a4c 100644
--- i/doc/emacs/mule.texi
+++ w/doc/emacs/mule.texi
@@ -225,6 +225,51 @@ International Chars
   decomposition: (101 770) ('e' '^')
 @end smallexample
 
+@cindex diacritic
+@cindex composition
+  Sometimes Emacs will display a single character even when the buffer
+contains multiple characters, through a process known as @dfn{composition}.
+This is done via @code{auto-composition-mode}, which is enabled by default,
+and can only be done if the characters to be composed all exist within
+the same font. The exact rules for which characters to compose are
+defined by the Unicode standard, but generally they concern
+diacritical marks such as accents.
+
+  For a successfully composed character, @kbd{C-u C-x =} displays
+details about the base character and the following character(s) it is
+composed with. For example for @samp{e} composed with @samp{COMBINING
+CIRCUMFLEX ACCENT}, which visually would be very similar to the
+previous example, the output would look like:
+
+@smallexample
+             position: 146 of 147 (99%), column: 0
+            character: e (displayed as e) (codepoint 101, #o145, #x65)
+    preferred charset: ascii (ASCII (ISO646 IRV))
+code point in charset: 0x65
+               script: latin
+               syntax: w        which means: word
+             category: .:Base, L:Left-to-right (strong), a:ASCII, l:Latin,
+                       r:Roman
+             to input: type "C-x 8 RET 65" or "C-x 8 RET LATIN SMALL LETTER E"
+          buffer code: #x65
+            file code: #x65 (encoded by coding system utf-8-unix)
+              display: composed to form "ê" (see below)
+
+Composed with the following character(s) "̂" using this font:
+  xft:-PfEd-DejaVu Sans Mono-normal-normal-normal-*-27-*-*-*-m-0-iso10646-1
+by these glyphs:
+  [0 1 101 72 16 2 15 21 0 nil]
+  [0 1 770 650 0 4 12 22 -17 [-16 0 0]]
+
+Character code properties: customize what to show
+  name: LATIN SMALL LETTER E
+  general-category: Ll (Letter, Lowercase)
+  decomposition: (101) ('e')
+
+There are text properties here:
+  fontified            t
+@end smallexample
+
 @node Language Environments
 @section Language Environments
 @cindex language environments
diff --git i/doc/lispref/nonascii.texi w/doc/lispref/nonascii.texi
index 4d75d6a1f1..c1943dda1c 100644
--- i/doc/lispref/nonascii.texi
+++ w/doc/lispref/nonascii.texi
@@ -500,10 +500,14 @@ Character Properties
 character's classification. For unassigned codepoints, the value
 is @code{Cn}.
 
+@cindex diacritic
+@cindex composition
 @item canonical-combining-class
 Corresponds to the @code{Canonical_Combining_Class} Unicode property.
 The value is an integer. For unassigned codepoints, the value
-is zero.
+is zero.  Emacs can use this to visually compose multiple characters,
+using @code{auto-composition-mode}, if all the characters concerned
+exist in the same font.
 
 @cindex bidirectional class of characters
 @item bidi-class
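
The character data quoted in the patch can be cross-checked outside Emacs.
The sketch below is not part of the patch. It assumes Python's standard
unicodedata module as an independent copy of the Unicode character
database, and the helper names describe and precomposed are made up for
illustration. It only shows the Unicode properties for the two codepoints
in the example (101 and 770); it says nothing about how
auto-composition-mode or a particular font actually renders them.

```python
import unicodedata

BASE = "e"        # U+0065 LATIN SMALL LETTER E, codepoint 101
MARK = "\u0302"   # U+0302 COMBINING CIRCUMFLEX ACCENT, codepoint 770


def describe(ch):
    """Return (codepoint, name, canonical combining class) for one character."""
    return ord(ch), unicodedata.name(ch), unicodedata.combining(ch)


def precomposed(seq):
    """Return the NFC form of seq, i.e. canonically composed where possible."""
    return unicodedata.normalize("NFC", seq)


if __name__ == "__main__":
    # The base letter has combining class 0; the accent has a non-zero class.
    for ch in BASE + MARK:
        print(describe(ch))
    # Two buffer characters normalize to one precomposed codepoint, whose
    # decomposition property lists the original pair in hexadecimal.
    composed = precomposed(BASE + MARK)
    print(len(composed), hex(ord(composed)), unicodedata.decomposition(composed))
```

A non-zero canonical combining class marks the accent as a combining
character, which is the property the lispref hunk refers to. The buffer in
the mule.texi example still holds two characters; only the display is
composed.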