From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#37036: [PATCH] Inconsistent ASCII and Latin char categories
Date: Thu, 15 Aug 2019 14:17:15 +0200
To: 37036@debbugs.gnu.org

The ASCII (a) and Latin (l) character categories are inconsistent in
what characters they contain.

What the ASCII category means should be clear, but it omits the control
characters 00-1f, contrary to the comment in characters.el stating that
all ASCII characters have category `a'.

The Latin category isn't precisely defined anywhere, but it should
reasonably comprise letters from Latin-based scripts.
Currently, it also includes DEL, the C1 control characters 80-9f, and
the space, digits, punctuation and symbols of the ASCII and Latin-1
Supplement blocks, which seems hard to justify.

Further changes to the Latin category could be argued for: which
modifier and combining characters should be included? What about IPA
and non-IPA phonetic letters? Ligatures? Latin-derived forms such as
circled letters? &c. The attached patch does not go there; it only
fixes the glaring errors in the 00-ff range.

[Attachment: 0001-Fix-ASCII-and-Latin-character-categories.patch]

From 9dbb98c7d2f7856a16efcfacdfae7890db3c45fe Mon Sep 17 00:00:00 2001
From: Mattias Engdegård
Date: Thu, 15 Aug 2019 14:04:03 +0200
Subject: [PATCH] Fix ASCII and Latin character categories

* lisp/international/characters.el:
Make the ASCII (a) category include all ASCII characters.
Make the Latin (l) category include only letters from the range 00-ff.
---
 lisp/international/characters.el | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index 012827ba1c..379a6a170b 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -127,11 +127,8 @@ ?L
 ^L
 ;;; Setting syntax and category.
 
-;; ASCII
-
-;; All ASCII characters have the category `a' (ASCII) and `l' (Latin).
-(modify-category-entry '(32 . 127) ?a)
-(modify-category-entry '(32 . 127) ?l)
+;; All ASCII characters have the category `a' (ASCII).
+(modify-category-entry '(0 . 127) ?a)
 
 ;; Deal with the CJK charsets first.  Since the syntax of blocks is
 ;; defined per charset, and the charsets may contain e.g. Latin
@@ -510,7 +507,13 @@ ?L
 
 ;; Latin
 
-(modify-category-entry '(#x80 . #x024F) ?l)
+;; ASCII
+(modify-category-entry '(?A . ?Z) ?l)
+(modify-category-entry '(?a . ?z) ?l)
+;; Latin-1 Supplement
+(modify-category-entry '(#xc0 . #xd6) ?l)
+(modify-category-entry '(#xd8 . #xf6) ?l)
+(modify-category-entry '(#xf8 . #xff) ?l)
 
 (let ((tbl (standard-case-table)) c)
 
-- 
2.20.1 (Apple Git-117)
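To check the claims above mechanically, here is a minimal Emacs Lisp sketch; it is not part of the patch. The `my-' prefixed functions and variables are hypothetical helpers. Treating Unicode general categories Lu, Ll, Lt, Lm and Lo as "letters" is an assumption about what `l' ought to mean. The sketch builds throwaway category tables containing only `a' and `l', filled from the old and the patched ranges, using the built-ins make-category-table, define-category, modify-category-entry, char-category-set and get-char-code-property. For 00-ff it then reports ASCII characters lacking `a', characters carrying `l' that are not letters, and letters lacking `l'.

```elisp
;; -*- lexical-binding: t -*-

(defun my-category-table-from-ranges (ascii-ranges latin-ranges)
  "Hypothetical helper: build a fresh category table with only `a' and `l'.
ASCII-RANGES and LATIN-RANGES are lists of (FROM . TO) conses."
  (let ((tbl (make-category-table)))   ; starts with no categories defined
    (define-category ?a "ASCII" tbl)
    (define-category ?l "Latin" tbl)
    (dolist (r ascii-ranges) (modify-category-entry r ?a tbl))
    (dolist (r latin-ranges) (modify-category-entry r ?l tbl))
    tbl))

(defun my-letter-p (c)
  "Assumption: C is a letter if its Unicode general category starts with L."
  (memq (get-char-code-property c 'general-category) '(Lu Ll Lt Lm Lo)))

(defun my-audit-categories (table)
  "Audit category TABLE over the characters 00..ff.
Return a plist with :ascii-missing (ASCII chars without `a'),
:latin-non-letters (chars with `l' that are not letters) and
:latin-letters-omitted (letters without `l')."
  (with-temp-buffer
    ;; `char-category-set' consults the current buffer's category table.
    (set-category-table table)
    (let (missing non-letters omitted)
      (dotimes (c 256)
        (let ((cats (char-category-set c))   ; a bool-vector indexed by category
              (letter (my-letter-p c)))
          (when (and (< c 128) (not (aref cats ?a)))
            (push c missing))
          (cond ((and (aref cats ?l) (not letter)) (push c non-letters))
                ((and letter (not (aref cats ?l))) (push c omitted)))))
      (list :ascii-missing (nreverse missing)
            :latin-non-letters (nreverse non-letters)
            :latin-letters-omitted (nreverse omitted)))))

;; Ranges as in characters.el before the patch (only 00..ff is audited).
(defconst my-old-table
  (my-category-table-from-ranges '((32 . 127))
                                 '((32 . 127) (#x80 . #x24f))))

;; Ranges as set by the patch.
(defconst my-new-table
  (my-category-table-from-ranges
   '((0 . 127))
   '((?A . ?Z) (?a . ?z) (#xc0 . #xd6) (#xd8 . #xf6) (#xf8 . #xff))))

;; Print both audits, e.g. when run with `emacs -Q --batch -l FILE'.
(pp (my-audit-categories my-old-table))
(pp (my-audit-categories my-new-table))
```

With the old ranges, :ascii-missing lists 0 to 31 and :latin-non-letters includes DEL and the punctuation. With the patched ranges, those two lists come out empty. The third list still reports U+00AA, U+00B5 and U+00BA (the two ordinal indicators and the micro sign), which Unicode classes as letters but the patch leaves out; whether those belong in `l' is a separate judgment call. Passing (standard-category-table) to my-audit-categories inspects the running Emacs instead, with version-dependent results.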