From: Robert Pluim
Newsgroups: gmane.emacs.devel
Subject: Re: emacs-28 b7d7c2d9e9: Add cross-reference to alternative syntaxes for Unicode
Date: Tue, 18 Oct 2022 17:05:44 +0200
Message-ID: <87pmepuq7r.fsf@gmail.com>
References: <831qra8klo.fsf@gnu.org> <87mt9yxtvi.fsf@gmail.com> <83zgdy714g.fsf@gnu.org>
In-Reply-To: <83zgdy714g.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 14 Oct 2022 20:42:39 +0300")
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:298058

>>>>> On Fri, 14 Oct 2022 20:42:39 +0300, Eli Zaretskii said:

    >> Hmm, so move or copy "General Escape Syntax" under
    >> (emacs)International somewhere, and refer to it from the "You can
    >> insert non-ASCII characters or search for them" section of that node
    >> (since thatʼs where we talk about C-x 8)?

    Eli> Yes, something like that.

Hereʼs a rough attempt:

diff --git c/doc/emacs/custom.texi i/doc/emacs/custom.texi
index 2bc1d3820d..817501b3f8 100644
--- c/doc/emacs/custom.texi
+++ i/doc/emacs/custom.texi
@@ -2794,9 +2794,8 @@ Init Non-ASCII
 
 An alternative to using non-@acronym{ASCII} characters directly is to
 use one of the character escape syntaxes described in
-@pxref{General Escape Syntax,,, elisp, The Emacs Lisp Reference
-Manual}, as they allow all Unicode codepoints to be specified using
-only @acronym{ASCII} characters.
+@ref{Character Escape Syntax}, as they allow all Unicode codepoints
+to be specified using only @acronym{ASCII} characters.
 
 To bind non-@acronym{ASCII} keys, you must use a vector (@pxref{Init
 Rebinding}).  The string syntax cannot be used, since the
diff --git c/doc/emacs/mule.texi i/doc/emacs/mule.texi
index f87c1252d3..c202c21aa4 100644
--- c/doc/emacs/mule.texi
+++ i/doc/emacs/mule.texi
@@ -56,7 +56,9 @@ International
 your keyboard can produce non-@acronym{ASCII} characters, you can select an
 appropriate keyboard coding system (@pxref{Terminal Coding}), and Emacs
 will accept those characters.  Latin-1 characters can also be input by
-using the @kbd{C-x 8} prefix, see @ref{Unibyte Mode}.
+using the @kbd{C-x 8} prefix, see @ref{Unibyte Mode}.  It is also
+possible to write non-@acronym{ASCII} characters using various
+pure-@acronym{ASCII} escape syntaxes, see @ref{Character Escape Syntax}.
 
 With the X Window System, your locale should be set to an appropriate
 value to make sure Emacs interprets keyboard input correctly; see
@@ -67,6 +69,7 @@ International
 
 @menu
 * International Chars::     Basic concepts of multibyte characters.
+* Character Escape Syntax:: Alternative ways to write characters
 * Language Environments::   Setting things up for the language you use.
 * Input Methods::           Entering text characters not on your keyboard.
 * Select Input Method::     Specifying your choice of input methods.
@@ -240,6 +243,63 @@ International Chars
 decomposition: (101 770) ('e' '^')
 @end smallexample
 
+@c This is (almost) verbatim from "General Escape Syntax" in the Emacs
+@c Lisp Reference Manual, please keep in sync.
+@node Character Escape Syntax
+@section Character Escape Syntax
+
+  Input methods provide ways to enter non-@acronym{ASCII} characters,
+but sometimes it is more convenient to use an @acronym{ASCII}-only
+representation, e.g. when there are several similar characters that
+are hard to visually distinguish.  Emacs provides several types of
+escape syntax that you can use to write such characters:
+
+@enumerate
+@item
+@cindex @samp{\} in character constant
+@cindex backslash in character constants
+@cindex unicode character escape
+You can specify characters by their Unicode names, if any.
+@code{?\N@{@var{NAME}@}} represents the Unicode character named
+@var{NAME}.  Thus, @samp{?\N@{LATIN SMALL LETTER A WITH GRAVE@}} is
+equivalent to @code{?à} and denotes the Unicode character U+00E0.  To
+simplify entering multi-line strings, you can replace spaces in the
+names by non-empty sequences of whitespace (e.g., newlines).
+
+@item
+You can specify characters by their Unicode values.
+@code{?\N@{U+@var{X}@}} represents a character with Unicode code point
+@var{X}, where @var{X} is a hexadecimal number.  Also,
+@code{?\u@var{xxxx}} and @code{?\U@var{xxxxxxxx}} represent code
+points @var{xxxx} and @var{xxxxxxxx}, respectively, where each @var{x}
+is a single hexadecimal digit.  For example, @code{?\N@{U+E0@}},
+@code{?\u00e0} and @code{?\U000000E0} are all equivalent to @code{?à}
+and to @samp{?\N@{LATIN SMALL LETTER A WITH GRAVE@}}.  The Unicode
+Standard defines code points only up to @samp{U+@var{10ffff}}, so if
+you specify a code point higher than that, Emacs signals an error.
+
+@item
+You can specify characters by their hexadecimal character
+codes.  A hexadecimal escape sequence consists of a backslash,
+@samp{x}, and the hexadecimal character code.  Thus, @samp{?\x41} is
+the character @kbd{A}, @samp{?\x1} is the character @kbd{C-a}, and
+@code{?\xe0} is the character @kbd{à} (@kbd{a} with grave accent).
+You can use any number of hex digits, so you can represent any
+character code in this way.
+
+@item
+@cindex octal character code
+You can specify characters by their character code in
+octal.  An octal escape sequence consists of a backslash followed by
+up to three octal digits; thus, @samp{?\101} for the character
+@kbd{A}, @samp{?\001} for the character @kbd{C-a}, and @code{?\002}
+for the character @kbd{C-b}.  Only characters up to octal code 777 can
+be specified this way.
+
+@end enumerate
+
+  These escape sequences may also be used in strings.
+
 @node Language Environments
 @section Language Environments
 @cindex language environments
diff --git c/doc/lispref/objects.texi i/doc/lispref/objects.texi
index a715b45a6c..35f413c5a5 100644
--- c/doc/lispref/objects.texi
+++ i/doc/lispref/objects.texi
@@ -440,6 +440,8 @@ Basic Char Syntax
 you should write an extra space after the character constant to
 separate it from the following text.)
 
+@c This is reproduced in "Character Escape Syntax" in the Emacs
+@c manual, please keep in sync.
 @node General Escape Syntax
 @subsubsection General Escape Syntax
 
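For anyone wanting to check the equivalences that the new section describes, here is a small Emacs Lisp sketch (not part of the patch). It only exercises the forms named in the text above, with one exception: the octal form ?\340 for U+00E0 is my own addition for illustration and does not appear in the manual text. It should be possible to evaluate it in *scratch* or load it in batch mode.

```elisp
;; Illustrative sketch only; not part of the patch.
(require 'cl-lib)

;; Each of these reads as the same character, U+00E0 (decimal 224).
(cl-assert (= ?\N{LATIN SMALL LETTER A WITH GRAVE} #xe0)) ; by Unicode name
(cl-assert (= ?\N{U+E0} #xe0))                            ; by code point
(cl-assert (= ?\u00e0 #xe0))                              ; four hex digits
(cl-assert (= ?\U000000E0 #xe0))                          ; eight hex digits
(cl-assert (= ?\xe0 #xe0))                                ; hexadecimal escape
(cl-assert (= ?\340 #xe0))   ; octal escape (assumed example: 340 octal = 224)

;; Hexadecimal and octal forms for ASCII and control characters.
(cl-assert (= ?\x41 ?A))
(cl-assert (= ?\101 ?A))
(cl-assert (= ?\x1 ?\C-a))
(cl-assert (= ?\001 ?\C-a))

;; The same escapes may be used inside strings.
(cl-assert (string= "\N{LATIN SMALL LETTER A WITH GRAVE}" "\u00e0"))
```

If every form evaluates without signaling an error, the readings agree with what the section states.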