unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 16731@debbugs.gnu.org
Subject: bug#16731: 24.3.50; Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 18:33:05 +0200
Message-ID: <83y51fq8fy.fsf@gnu.org>
In-Reply-To: <jwv4n4387b9.fsf-monnier+emacsbugs@gnu.org>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Andreas Röhler <andreas.roehler@easy-emacs.de>,
>   16731@debbugs.gnu.org
> Date: Thu, 13 Feb 2014 08:37:45 -0500
> 
> > How will we then be able to distinguish between lower-case characters
> > that have no upcase variant and characters that are not lower-case
> > characters at all?
> 
> Right: to handle this, we need to distinguish characters that are
> lower-case without an uppercase variant from characters which are
> neither lowercase nor uppercase.
> 
> We could do that by saying that the upcase table should return nil or -1
> for ß, to indicate that the upcase version is "missing".  But such
> a change will probably require carefully revising "all" the code that
> uses those tables.

Right.  I can instead suggest a much less intrusive change below.  Its
only disadvantage is that if some user or Lisp program overrides the
standard case tables, and actually _wants_ some lower-case characters
to behave as if they weren't lower-case, consulting the Unicode tables
will undo such customizations.  If this is a concern, perhaps we could
compare the case table with the standard value, and only use the
Unicode attributes when they are equal?

If the approach below is accepted, a related question is how to treat
letters whose category is Lt, i.e. "titlecase" -- do we consider such
letters upper case or don't we?

--- src/buffer.h~0	2014-01-01 09:46:07.000000000 +0200
+++ src/buffer.h	2014-02-13 18:27:32.225839000 +0200
@@ -1349,7 +1349,19 @@ downcase (int c)
 }
 
 /* True if C is upper case.  */
-INLINE bool uppercasep (int c) { return downcase (c) != c; }
+INLINE bool uppercasep (int c)
+{
+  Lisp_Object val;
+
+  if (downcase (c) != c)
+    return true;
+
+  if (NILP (Vunicode_category_table))
+    return false;
+
+  val = CHAR_TABLE_REF (Vunicode_category_table, c);
+  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Lu;
+}
 
 /* Upcase a character C known to be not upper case.  */
 INLINE int
@@ -1364,7 +1376,16 @@ upcase1 (int c)
 INLINE bool
 lowercasep (int c)
 {
-  return !uppercasep (c) && upcase1 (c) != c;
+  Lisp_Object val;
+
+  if (!uppercasep (c) && upcase1 (c) != c)
+    return true;
+
+  if (NILP (Vunicode_category_table))
+    return false;
+
+  val = CHAR_TABLE_REF (Vunicode_category_table, c);
+  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Ll;
 }
 
 /* Upcase a character C, or make no change if that cannot be done.  */
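
To see the effect of the fallback outside of Emacs, here is a minimal
standalone sketch of the same two-step logic.  Everything in it is a
hypothetical stand-in: the toy_* functions model only ASCII plus U+00DF
(sharp s) and are not the real case tables or Vunicode_category_table,
and the toy_table_is_standard flag merely models the "only use the
Unicode attributes when the case table equals the standard value" idea
suggested above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Unicode general category lookup.  */
enum toy_cat { TOY_OTHER, TOY_Lu, TOY_Ll };

/* Assumption: models "the current case table is the standard one".  */
static bool toy_table_is_standard = true;

/* Toy case table: ASCII letters only.  U+00DF maps to itself both ways,
   because it has no single-character upper-case counterpart.  */
static int toy_downcase (int c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

static int toy_upcase1 (int c)
{
  return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Toy category table: ASCII letters plus U+00DF as Ll.  */
static enum toy_cat toy_general_category (int c)
{
  if (c >= 'A' && c <= 'Z')
    return TOY_Lu;
  if ((c >= 'a' && c <= 'z') || c == 0xDF)
    return TOY_Ll;
  return TOY_OTHER;
}

/* Old behavior: the case table alone decides.  */
static bool old_lowercasep (int c)
{
  return toy_downcase (c) == c && toy_upcase1 (c) != c;
}

/* Patched behavior: case table first, then the category fallback.  */
static bool toy_uppercasep (int c)
{
  if (toy_downcase (c) != c)
    return true;
  if (!toy_table_is_standard)
    return false;
  return toy_general_category (c) == TOY_Lu;
}

static bool toy_lowercasep (int c)
{
  if (!toy_uppercasep (c) && toy_upcase1 (c) != c)
    return true;
  if (!toy_table_is_standard)
    return false;
  return toy_general_category (c) == TOY_Ll;
}
```

With these toy tables, old_lowercasep (0xDF) is false while
toy_lowercasep (0xDF) is true, and characters such as '1' remain
neither upper nor lower case.  Setting toy_table_is_standard to false
makes the patched predicates fall back to the case-table-only answer,
which is the price of honoring a customized table.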
