unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: politza@hochschule-trier.de, mohammad.mahmoudi@gmail.com
Cc: 19878-done@debbugs.gnu.org
Subject: bug#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
Date: Sat, 28 Feb 2015 14:29:52 +0200
Message-ID: <83bnkete0v.fsf@gnu.org>
In-Reply-To: <838ufw7bzi.fsf@gnu.org>

> Date: Tue, 17 Feb 2015 18:13:05 +0200
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: mohammad.mahmoudi@gmail.com, 19878@debbugs.gnu.org
> 
> > From: Andreas Politz <politza@hochschule-trier.de>
> > Date: Sun, 15 Feb 2015 21:16:13 +0100
> > Cc: 19878@debbugs.gnu.org
> > 
> > 
> > I think this is supposed to be:
> > 
> > ,----[ (info "(elisp) Char Classes") ]
> > | `[:alpha:]'
> > |      This matches any letter.  (At present, for multibyte characters, it
> > |      matches anything that has word syntax.)
> > `----
> 
> Indeed, which doesn't sound very nice.
> 
> Does someone object to the changes below (to be installed on master)?
> They make [:alpha:] and [:alnum:] closer to the Unicode
> recommendations in UTS #18, although we are still very far from
> supporting even Level 1 of conformance.  But these two seem like
> low-hanging fruit to me.
> 
> The modified definitions of these two sets are not 100% compatible
> with the old ones for the multibyte characters.  However, if it turns
> out that some code used these to get word-constituent characters,
> those places should simply be changed to use \sw instead.

No further comments, so I pushed the changes as commit 1a50945 on the
master branch, and I'm marking this bug closed.
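
For illustration only (this is not the code from commit 1a50945): a
minimal Python sketch of the distinction discussed above.  It assumes
that Unicode general categories are a fair stand-in for the UTS #18
definitions, which is only an approximation, since the real Alphabetic
property also covers Other_Alphabetic characters.  The function names
are hypothetical.

```python
import unicodedata

# The ten digits from the bug report: U+06F1..U+06F9 followed by U+06F0.
REPORTED_DIGITS = "".join(chr(0x06F0 + n) for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 0))


def is_alpha_approx(ch: str) -> bool:
    """Approximate a UTS #18 style 'alpha': letters (L*) and letter numbers (Nl).

    Assumption: general categories stand in for the Alphabetic property.
    """
    category = unicodedata.category(ch)
    return category.startswith("L") or category == "Nl"


def is_digit(ch: str) -> bool:
    """A UTS #18 style 'digit': general category Decimal_Number (Nd)."""
    return unicodedata.category(ch) == "Nd"


def is_alnum_approx(ch: str) -> bool:
    """'alnum' as the union of the two classes above."""
    return is_alpha_approx(ch) or is_digit(ch)


if __name__ == "__main__":
    for ch in REPORTED_DIGITS:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch),
              "alpha" if is_alpha_approx(ch) else "not alpha")
```

Under this approximation the reported digits fall outside 'alpha' but
inside 'alnum', whereas a definition based on word syntax would accept
them as letters.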

> Also, does someone see any potential problem with making [:digit:] a
> superset of the current ASCII-only set, to match UTS #18 as well?  The
> comment in regex.c says it is "only used for single-byte characters",
> but it isn't clear to me whether this is a requirement, i.e. there's
> some code in Emacs that relies on that, or just a statement of fact.

I'd still like to hear an answer and/or opinions about this.  If I
hear no comments, I will look into making a similar change to
[:digit:] soon.
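
To make the question concrete, here is a hedged Python sketch (again
not Emacs code; the function names are hypothetical) comparing an
ASCII-only digit test with one based on the Unicode Decimal_Number
category, which is assumed here to be what UTS #18 intends for
'digit'.

```python
import unicodedata


def is_ascii_digit(ch: str) -> bool:
    """The narrow definition: only the ten characters '0'..'9'."""
    return "0" <= ch <= "9"


def is_unicode_digit(ch: str) -> bool:
    """The wider definition: any character of general category Nd."""
    return unicodedata.category(ch) == "Nd"


def digit_value(ch: str) -> int:
    """Decimal value of a digit character, per the Unicode database."""
    return unicodedata.decimal(ch)


if __name__ == "__main__":
    for ch in "0" + chr(0x06F1) + chr(0x0966):
        print(f"U+{ord(ch):04X}",
              "ascii" if is_ascii_digit(ch) else "non-ascii",
              "Nd" if is_unicode_digit(ch) else "not Nd")
```

Every ASCII digit also satisfies the wider test, so the new set would
be a strict superset.  Code that feeds matched text straight to an
ASCII-only number parser is the kind of caller that could be affected.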






Thread overview: 6+ messages
2015-02-15 15:44 bug#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter mohammad.mahmoudi
2015-02-15 20:16 ` Andreas Politz
2015-02-17 16:13   ` Eli Zaretskii
2015-02-17 18:15     ` Ivan Shmakov
2015-02-17 18:45       ` Eli Zaretskii
2015-02-28 12:29     ` Eli Zaretskii [this message]
