From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 16731@debbugs.gnu.org
Subject: bug#16731: 24.3.50; Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 18:33:05 +0200
Message-ID: <83y51fq8fy.fsf@gnu.org>
In-Reply-To: <jwv4n4387b9.fsf-monnier+emacsbugs@gnu.org>
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Andreas Röhler <andreas.roehler@easy-emacs.de>,
> 16731@debbugs.gnu.org
> Date: Thu, 13 Feb 2014 08:37:45 -0500
>
> > How will we then be able to distinguish between lower-case characters
> > that have no upcase variant and characters that are not lower-case
> > characters at all?
>
> Right: to handle this, we need to distinguish characters that are
> lower-case without an uppercase variant from characters which are
> neither lowercase nor uppercase.
>
> We could do that by saying that the upcase table should return nil or -1
> for ß, to indicate that the upcase version is "missing". But such
> a change will probably require carefully revising "all" the code that
> uses those tables.
Right. I can instead suggest a much less intrusive change below. Its
only disadvantage is that if some user or Lisp program overrides the
standard case tables, and actually _wants_ some lower-case characters
to behave as if they weren't, looking at the Unicode tables will undo
such customizations. If this is a concern, perhaps we could compare
the case table with the standard value, and only use the Unicode
attributes when they are equal?

If the approach below is accepted, a related question is how to treat
letters whose category is Lt, i.e. "titlecase" -- do we consider such
letters upper case or don't we?
--- src/buffer.h~0 2014-01-01 09:46:07.000000000 +0200
+++ src/buffer.h 2014-02-13 18:27:32.225839000 +0200
@@ -1349,7 +1349,19 @@ downcase (int c)
 }
 
 /* True if C is upper case. */
-INLINE bool uppercasep (int c) { return downcase (c) != c; }
+INLINE bool uppercasep (int c)
+{
+  Lisp_Object val;
+
+  if (downcase (c) != c)
+    return true;
+
+  if (NILP (Vunicode_category_table))
+    return false;
+
+  val = CHAR_TABLE_REF (Vunicode_category_table, c);
+  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Lu;
+}
 
 /* Upcase a character C known to be not upper case. */
 INLINE int
@@ -1364,7 +1376,16 @@ upcase1 (int c)
 INLINE bool
 lowercasep (int c)
 {
-  return !uppercasep (c) && upcase1 (c) != c;
+  Lisp_Object val;
+
+  if (!uppercasep (c) && upcase1 (c) != c)
+    return true;
+
+  if (NILP (Vunicode_category_table))
+    return false;
+
+  val = CHAR_TABLE_REF (Vunicode_category_table, c);
+  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Ll;
 }
 
 /* Upcase a character C, or make no change if that cannot be done. */
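
For readers who want to try the fallback logic outside of Emacs, here
is a minimal standalone sketch of the same idea. It is not the Emacs
code: every name prefixed with toy_ or TOY_ is hypothetical, the "case
table" covers only ASCII letters, and the "category table" is a
hand-written function standing in for unicode-category-table. It only
illustrates the order of the checks: consult the case table first, and
fall back to the Unicode general category when the case table has no
mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Unicode general categories.  */
enum toy_category { TOY_CAT_OTHER, TOY_CAT_Lu, TOY_CAT_Ll, TOY_CAT_Lt };

/* Toy case table (assumption): only ASCII letters have case mappings,
   so U+00DF (sharp s) maps to itself in both directions.  */
static int
toy_downcase (int c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

static int
toy_upcase1 (int c)
{
  return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Toy category lookup (assumption): a few hard-coded entries.  */
static enum toy_category
toy_category_of (int c)
{
  if (c >= 'A' && c <= 'Z')
    return TOY_CAT_Lu;
  if ((c >= 'a' && c <= 'z') || c == 0x00DF)  /* U+00DF is Ll */
    return TOY_CAT_Ll;
  if (c == 0x01C5)                            /* U+01C5 is Lt */
    return TOY_CAT_Lt;
  return TOY_CAT_OTHER;
}

/* Case table first, then the category as a fallback.  */
static bool
toy_uppercasep (int c)
{
  if (toy_downcase (c) != c)
    return true;
  return toy_category_of (c) == TOY_CAT_Lu;
}

static bool
toy_lowercasep (int c)
{
  if (!toy_uppercasep (c) && toy_upcase1 (c) != c)
    return true;
  return toy_category_of (c) == TOY_CAT_Ll;
}

/* Small self-check exercising the cases discussed in this thread.  */
static void
toy_self_check (void)
{
  assert (toy_lowercasep (0x00DF));   /* lower-case, no upcase variant */
  assert (!toy_uppercasep (0x00DF));
  assert (!toy_lowercasep ('1'));     /* not a cased character at all */
  assert (!toy_uppercasep (0x01C5));  /* titlecase: neither, in this sketch */
  assert (!toy_lowercasep (0x01C5));
}
```

In this sketch a titlecase (Lt) character is reported as neither upper
nor lower case, which is just one possible answer to the open question
above, not a recommendation.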