From: Michal Nazarewicz <mina86@mina86.com>
To: 24603@debbugs.gnu.org, eliz@gnu.org
Subject: bug#24603: [PATCHv5 03/11] Add support for title-casing letters (bug#24603)
Date: Thu, 9 Mar 2017 22:51:42 +0100
Message-ID: <20170309215150.9562-4-mina86@mina86.com>
In-Reply-To: <20170309215150.9562-1-mina86@mina86.com>
* src/casefiddle.c (struct casing_context, prepare_casing_context): Add
titlecase_char_table member. It’s set to the ‘titlecase’ Unicode
property table if capitalisation has been requested.
(case_character): Make use of the titlecase_char_table to title-case
initial characters when capitalising.
* test/src/casefiddle-tests.el (casefiddle-tests--characters,
casefiddle-tests-casing): Update test cases which are now passing.
---
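For readers who want to see the new branch in case_character in
isolation, here is a minimal stand-alone sketch. It is not the Emacs
code: every toy_* name is hypothetical, the three helper "tables" only
know ASCII letters and the DŽ/Dž/dž triple (U+01C4..U+01C6), and a plain
boolean stands in for a non-nil titlecase_char_table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum case_action and struct casing_context.  */
enum toy_action { TOY_UP, TOY_DOWN, TOY_CAPITALIZE, TOY_CAPITALIZE_UP };

struct toy_context
{
  enum toy_action flag;
  bool use_titlecase;   /* Models a non-nil titlecase_char_table.  */
  bool inword;
};

enum { DZ_UPPER = 0x01C4, DZ_TITLE = 0x01C5, DZ_LOWER = 0x01C6 };

bool
toy_is_dz (int ch)
{
  return ch >= DZ_UPPER && ch <= DZ_LOWER;
}

int
toy_upcase (int ch)
{
  if (toy_is_dz (ch))
    return DZ_UPPER;
  return (ch >= 'a' && ch <= 'z') ? ch - 'a' + 'A' : ch;
}

int
toy_downcase (int ch)
{
  if (toy_is_dz (ch))
    return DZ_LOWER;
  return (ch >= 'A' && ch <= 'Z') ? ch - 'A' + 'a' : ch;
}

/* Return the title-case form of CH, or -1 if the toy table has no entry.  */
int
toy_titlecase (int ch)
{
  return toy_is_dz (ch) ? DZ_TITLE : -1;
}

/* Crude replacement for the SYNTAX (ch) == Sword check.  */
bool
toy_is_word (int ch)
{
  return (toy_is_dz (ch)
          || (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'));
}

void
toy_prepare (struct toy_context *ctx, enum toy_action flag)
{
  assert (ctx != NULL);
  ctx->flag = flag;
  ctx->inword = flag == TOY_DOWN;
  /* Title-case lookups are only enabled when capitalising.  */
  ctx->use_titlecase = flag >= TOY_CAPITALIZE;
}

int
toy_case_character (struct toy_context *ctx, int ch)
{
  int title;

  if (ctx->inword)
    ch = ctx->flag == TOY_CAPITALIZE_UP ? ch : toy_downcase (ch);
  else if (ctx->use_titlecase && (title = toy_titlecase (ch)) >= 0)
    ch = title;                 /* Word-initial: prefer title case.  */
  else
    ch = toy_upcase (ch);

  if (ctx->flag >= TOY_CAPITALIZE)
    ctx->inword = toy_is_word (ch);
  return ch;
}

/* Apply FLAG to the N code points in CHARS, in place.  */
void
toy_casify (enum toy_action flag, int *chars, size_t n)
{
  struct toy_context ctx;
  toy_prepare (&ctx, flag);
  for (size_t i = 0; i < n; i++)
    chars[i] = toy_case_character (&ctx, chars[i]);
}
```

Under these assumptions, capitalising the code points of "DŽUNGLA"
yields "Džungla", while the up-initials action yields "DžUNGLA", which
is the behaviour the updated tests in this patch expect.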
etc/NEWS | 2 +-
src/casefiddle.c | 27 +++++++++++++++++++++------
test/src/casefiddle-tests.el | 39 ++++++++++++++++++++++++++-------------
3 files changed, 48 insertions(+), 20 deletions(-)
diff --git a/etc/NEWS b/etc/NEWS
index 32137a79da6..715764accf1 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -351,7 +351,7 @@ same as in modes where the character is not whitespace.
Instead of only checking the modification time, Emacs now also checks
the file's actual content before prompting the user.
-** Title case characters are properly converted to upper case.
+** Title case characters are properly cased (from and into).
'upcase', 'upcase-region' et al. convert title case characters (such
as the single character "Dz") into their upper case form (such as "DZ").
As a downside, 'capitalize' and 'upcase-initials' produce awkward
diff --git a/src/casefiddle.c b/src/casefiddle.c
index 8129d376a5a..01e35194e0e 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -33,6 +33,10 @@ enum case_action {CASE_UP, CASE_DOWN, CASE_CAPITALIZE, CASE_CAPITALIZE_UP};
/* State for casing individual characters. */
struct casing_context {
+ /* A char-table with title-case character mappings or nil. It being non-nil
+ implies flag being CASE_CAPITALIZE or CASE_CAPITALIZE_UP (but the reverse
+ is not true). */
+ Lisp_Object titlecase_char_table;
/* User-requested action. */
enum case_action flag;
/* If true, function operates on a buffer as opposed to a string or character.
@@ -54,6 +58,9 @@ prepare_casing_context (struct casing_context *ctx,
ctx->flag = flag;
ctx->inbuffer = inbuffer;
ctx->inword = flag == CASE_DOWN;
+ ctx->titlecase_char_table = ((int) flag >= (int) CASE_CAPITALIZE
+ ? uniprop_table (intern_c_string ("titlecase"))
+ : Qnil);
/* If the case table is flagged as modified, rescan it. */
if (NILP (XCHAR_TABLE (BVAR (current_buffer, downcase_table))->extras[1]))
@@ -68,10 +75,16 @@ prepare_casing_context (struct casing_context *ctx,
static int
case_character (struct casing_context *ctx, int ch)
{
+ Lisp_Object prop;
+
if (ctx->inword)
ch = ctx->flag == CASE_CAPITALIZE_UP ? ch : downcase (ch);
+ else if (!NILP (ctx->titlecase_char_table)
+ && CHARACTERP (prop = CHAR_TABLE_REF (ctx->titlecase_char_table, ch)))
+ ch = XFASTINT (prop);
else
ch = upcase(ch);
+
if ((int) ctx->flag >= (int) CASE_CAPITALIZE)
ctx->inword = SYNTAX (ch) == Sword &&
(!ctx->inbuffer || ctx->inword || !syntax_prefix_flag_p (ch));
@@ -199,8 +212,8 @@ The argument object is not altered--the value is a copy. */)
DEFUN ("capitalize", Fcapitalize, Scapitalize, 1, 1, 0,
doc: /* Convert argument to capitalized form and return that.
-This means that each word's first character is upper case
-and the rest is lower case.
+This means that each word's first character is upper case (more
+precisely, if available, title case) and the rest is lower case.
The argument may be a character or string. The result has the same type.
The argument object is not altered--the value is a copy. */)
(Lisp_Object obj)
@@ -212,7 +225,8 @@ The argument object is not altered--the value is a copy. */)
DEFUN ("upcase-initials", Fupcase_initials, Supcase_initials, 1, 1, 0,
doc: /* Convert the initial of each word in the argument to upper case.
-Do not change the other letters of each word.
+More precisely, the initial of each word is converted to title case
+where available. Do not change the other letters of each word.
The argument may be a character or string. The result has the same type.
The argument object is not altered--the value is a copy. */)
(Lisp_Object obj)
@@ -376,8 +390,8 @@ point and the mark is operated on. */)
DEFUN ("capitalize-region", Fcapitalize_region, Scapitalize_region, 2, 2, "r",
doc: /* Convert the region to capitalized form.
-Capitalized form means each word's first character is upper case
-and the rest of it is lower case.
+Capitalized form means each word's first character is upper case (more
+precisely, if available, title case) and the rest of it is lower case.
In programs, give two arguments, the starting and ending
character positions to operate on. */)
(Lisp_Object beg, Lisp_Object end)
@@ -391,7 +405,8 @@ character positions to operate on. */)
DEFUN ("upcase-initials-region", Fupcase_initials_region,
Supcase_initials_region, 2, 2, "r",
doc: /* Upcase the initial of each word in the region.
-Subsequent letters of each word are not changed.
+More precisely, the initial of each word is converted to title case
+where available. Subsequent letters of each word are not changed.
In programs, give two arguments, the starting and ending
character positions to operate on. */)
(Lisp_Object beg, Lisp_Object end)
diff --git a/test/src/casefiddle-tests.el b/test/src/casefiddle-tests.el
index 152d85de006..e83cb00059b 100644
--- a/test/src/casefiddle-tests.el
+++ b/test/src/casefiddle-tests.el
@@ -63,13 +63,9 @@ casefiddle-tests--characters
(?Ł ?Ł ?ł ?Ł)
(?ł ?Ł ?ł ?Ł)
- ;; FIXME(bug#24603): Commented ones are what we want.
- ;;(?DŽ ?DŽ ?dž ?Dž)
- (?DŽ ?DŽ ?dž ?DŽ)
- ;;(?Dž ?DŽ ?dž ?Dž)
- (?Dž ?DŽ ?dž ?DŽ)
- ;;(?dž ?DŽ ?dž ?Dž)
- (?dž ?DŽ ?dž ?DŽ)
+ (?DŽ ?DŽ ?dž ?Dž)
+ (?Dž ?DŽ ?dž ?Dž)
+ (?dž ?DŽ ?dž ?Dž)
(?Σ ?Σ ?σ ?Σ)
(?σ ?Σ ?σ ?Σ)
@@ -186,19 +182,19 @@ casefiddle-tests--test-casing
;; input upper lower capitalize up-initials
'(("Foo baR" "FOO BAR" "foo bar" "Foo Bar" "Foo BaR")
("Ⅷ ⅷ" "Ⅷ Ⅷ" "ⅷ ⅷ" "Ⅷ Ⅷ" "Ⅷ Ⅷ")
+ ;; "DžUNGLA" is an unfortunate result, but it’s really the best we
+ ;; can do while still being consistent. Hopefully, users only ever
+ ;; use upcase-initials on camelCase identifiers, not real words.
+ ("DŽUNGLA" "DŽUNGLA" "džungla" "Džungla" "DžUNGLA")
+ ("Džungla" "DŽUNGLA" "džungla" "Džungla" "Džungla")
+ ("džungla" "DŽUNGLA" "džungla" "Džungla" "Džungla")
;; FIXME(bug#24603): Everything below is broken at the moment.
;; Here’s what should happen:
- ;;("DŽUNGLA" "DŽUNGLA" "džungla" "Džungla" "DžUNGLA")
- ;;("Džungla" "DŽUNGLA" "džungla" "Džungla" "Džungla")
- ;;("džungla" "DŽUNGLA" "džungla" "Džungla" "Džungla")
;;("define" "DEFINE" "define" "Define" "Define")
;;("fish" "FIsh" "fish" "Fish" "Fish")
;;("Straße" "STRASSE" "straße" "Straße" "Straße")
;;("ΌΣΟΣ" "ΌΣΟΣ" "όσος" "Όσος" "Όσος")
;; And here’s what is actually happening:
- ("DŽUNGLA" "DŽUNGLA" "džungla" "DŽungla" "DŽUNGLA")
- ("Džungla" "DŽUNGLA" "džungla" "DŽungla" "DŽungla")
- ("džungla" "DŽUNGLA" "džungla" "DŽungla" "DŽungla")
("define" "DEfiNE" "define" "Define" "Define")
("fish" "fiSH" "fish" "fish" "fish")
("Straße" "STRAßE" "straße" "Straße" "Straße")
@@ -243,4 +239,21 @@ casefiddle-tests--test-casing
"\xef\xff\xef Zażółć GĘŚlą \xcf\xcf")))))))
+(ert-deftest casefiddle-tests-char-casing ()
+ ;; input upcase downcase [titlecase]
+ (dolist (test '((?a ?A ?a) (?A ?A ?a)
+ (?ł ?Ł ?ł) (?Ł ?Ł ?ł)
+ (?ß ?ß ?ß) (?ẞ ?ẞ ?ß)
+ (?ⅷ ?Ⅷ ?ⅷ) (?Ⅷ ?Ⅷ ?ⅷ)
+ (?DŽ ?DŽ ?dž ?Dž) (?Dž ?DŽ ?dž ?Dž) (?dž ?DŽ ?dž ?Dž)))
+ (let ((ch (car test))
+ (up (nth 1 test))
+ (lo (nth 2 test))
+ (tc (or (nth 3 test) (nth 1 test))))
+ (should (eq up (upcase ch)))
+ (should (eq lo (downcase ch)))
+ (should (eq tc (capitalize ch)))
+ (should (eq tc (upcase-initials ch))))))
+
+
;;; casefiddle-tests.el ends here
--
2.12.0.246.ga2ecc84866-goog