unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#37036: [PATCH] Inconsistent ASCII and Latin char categories
From: Mattias Engdegård @ 2019-08-15 12:17 UTC
  To: 37036

[-- Attachment #1: Type: text/plain, Size: 762 bytes --]

The ASCII (a) and Latin (l) character categories are inconsistent in what characters they contain.

The meaning of the ASCII category should be obvious, yet it omits the control characters 00-1f, contrary to the comment in the code, which claims that all ASCII characters have it.

The Latin category is not precisely defined anywhere, but it should reasonably comprise letters from Latin-based scripts. Currently, it also includes many control characters and symbols from the ASCII and Latin-1 Supplement blocks, which seems hard to justify.

Other changes to Latin are open to debate: which modifiers and combining characters should be included? What about IPA and non-IPA phonetic characters? Ligatures? Latin-derived forms such as circled letters? The attached patch does not address those questions; it only fixes the glaring errors in the 00-ff range.
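
For anyone who wants to inspect the current assignments, here is a minimal sketch of a helper that lists which characters in a range carry a given category. The name `my-chars-in-category` is hypothetical and not part of the patch; it relies only on `char-category-set`, which returns the category set of a character in the current buffer's category table as a bool-vector indexed by category mnemonic.

```elisp
;; Hypothetical helper for illustration only; not part of the patch.
(defun my-chars-in-category (from to category)
  "Return the characters in FROM..TO (inclusive) that have CATEGORY.
CATEGORY is a category mnemonic character such as ?a or ?l.
The lookup uses the current buffer's category table."
  (let (result)
    (dotimes (i (1+ (- to from)))
      (let ((c (+ from i)))
        ;; A category set is a bool-vector indexed by mnemonic.
        (when (aref (char-category-set c) category)
          (push c result))))
    (nreverse result)))

;; Example queries (results depend on the Emacs build being inspected):
;; (my-chars-in-category 0 31 ?a)       ; control characters in `a'
;; (my-chars-in-category #x80 #xff ?l)  ; Latin-1 Supplement chars in `l'
```

Going by the definitions this report quotes, the first query should come back empty on an unpatched Emacs, and the second should include the C1 controls and symbols such as the multiplication sign; I have not listed exact results here since they depend on the build.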


[-- Attachment #2: 0001-Fix-ASCII-and-Latin-character-categories.patch --]
[-- Type: application/octet-stream, Size: 1598 bytes --]

From 9dbb98c7d2f7856a16efcfacdfae7890db3c45fe Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 15 Aug 2019 14:04:03 +0200
Subject: [PATCH] Fix ASCII and Latin character categories

* lisp/international/characters.el:
Make the ASCII (a) category include all ASCII characters.
Make the Latin (l) category include only letters from the range 00-ff.
---
 lisp/international/characters.el | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lisp/international/characters.el b/lisp/international/characters.el
index 012827ba1c..379a6a170b 100644
--- a/lisp/international/characters.el
+++ b/lisp/international/characters.el
@@ -127,11 +127,8 @@ ?L
 \f
 ;;; Setting syntax and category.
 
-;; ASCII
-
-;; All ASCII characters have the category `a' (ASCII) and `l' (Latin).
-(modify-category-entry '(32 . 127) ?a)
-(modify-category-entry '(32 . 127) ?l)
+;; All ASCII characters have the category `a' (ASCII).
+(modify-category-entry '(0 . 127) ?a)
 
 ;; Deal with the CJK charsets first.  Since the syntax of blocks is
 ;; defined per charset, and the charsets may contain e.g. Latin
@@ -510,7 +507,13 @@ ?L
 
 ;; Latin
 
-(modify-category-entry '(#x80 . #x024F) ?l)
+;; ASCII
+(modify-category-entry '(?A . ?Z) ?l)
+(modify-category-entry '(?a . ?z) ?l)
+;; Latin-1 Supplement
+(modify-category-entry '(#xc0 . #xd6) ?l)
+(modify-category-entry '(#xd8 . #xf6) ?l)
+(modify-category-entry '(#xf8 . #xff) ?l)
 
 (let ((tbl (standard-case-table)) c)
 
-- 
2.20.1 (Apple Git-117)



Thread overview: 12+ messages
2019-08-15 12:17 bug#37036: [PATCH] Inconsistent ASCII and Latin char categories Mattias Engdegård
2019-08-15 15:27 ` Eli Zaretskii
2019-08-15 15:46   ` Mattias Engdegård
2019-08-15 16:23     ` Eli Zaretskii
2019-08-15 16:30       ` Mattias Engdegård
2019-08-15 16:59         ` Eli Zaretskii
2019-08-15 17:37           ` Mattias Engdegård
2019-08-15 19:23             ` Eli Zaretskii
2019-08-15 19:46               ` Eli Zaretskii
2019-08-15 22:19               ` Mattias Engdegård
2019-08-16  9:33                 ` Eli Zaretskii
2019-08-16 10:48                   ` Mattias Engdegård
