From: Michal Nazarewicz
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [RFC 00/18] Improvement to casing
Date: Tue, 4 Oct 2016 03:05:10 +0200
Message-ID: <1475543110-10019-1-git-send-email-mina86@mina86.com>
To: 24603@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

This is a work in progress with a known bug: if casing a region changes
its length (e.g. ‘ﬁsh’, written with the U+FB01 ligature, becomes
‘FISH’), point does not move correctly and undo information is not
recorded correctly.  There could also be some minor improvements to
documentation here and there.

Overall, this is mature enough (and probably already too big) to send
as a request for comments.

Michal Nazarewicz (18):
  Add tests for casefiddle.c
  Generate upcase and downcase tables from Unicode data
  Don’t assume character can be either upper- or lower-case when casing
  Split casify_object into multiple functions
  Introduce case_character function
  Add support for title-casing letters
  Split up casify_region function.
  Support casing characters which map into multiple code points
  Implement special sigma casing rule
  Implement Turkic dotless and dotted i handling when casing strings
  Implement casing rules for Lithuanian
  Implement rules for title-casing Dutch ij ‘letter’
  Add some tricky Unicode characters to regex test
  Factor out character category lookup to separate function
  Base lower- and upper-case tests on Unicode properties
  Refactor character class checking; optimise ASCII case
  Optimise character class matching in regexes
  Fix case-fold-search character class matching

 .gitignore                       |   1 +
 admin/unidata/README             |   4 +
 admin/unidata/SpecialCasing.txt  | 281 +++++++++++++
 etc/NEWS                         |  25 +-
 lisp/international/characters.el | 338 +++------------
 src/Makefile.in                  |   3 +
 src/buffer.h                     |  17 +-
 src/casefiddle.c                 | 864 ++++++++++++++++++++++++++++++---------
 src/character.c                  | 151 +++----
 src/character.h                  |  76 +++-
 src/deps.mk                      |   2 +-
 src/keyboard.c                   |  25 +-
 src/make-special-casing.py       | 189 +++++++++
 src/regex.c                      | 119 +++---
 test/lisp/char-fold-tests.el     |  12 +-
 test/src/casefiddle-tests.el     | 262 ++++++++++++
 test/src/regex-tests.el          |  62 ++-
 17 files changed, 1786 insertions(+), 645 deletions(-)
 create mode 100644 admin/unidata/SpecialCasing.txt
 create mode 100644 src/make-special-casing.py
 create mode 100644 test/src/casefiddle-tests.el

-- 
2.8.0.rc3.226.g39d4020
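
As an illustration of the length-changing case conversions behind the
known bug described above, here is a minimal Python sketch.  It is not
part of the patch series: Python's built-in str.upper merely stands in
for the Emacs casing code, and the helper name length_change is
hypothetical.

```python
# Illustration only: Python's str.upper applies Unicode full case
# mappings, so it stands in here for a casing routine that can map one
# character to several.  Nothing below is taken from casefiddle.c.

def length_change(text):
    """Return (upper-cased text, length before, length after)."""
    upper = text.upper()
    return upper, len(text), len(upper)


if __name__ == "__main__":
    # "\ufb01" is LATIN SMALL LIGATURE FI; "\u00df" is LATIN SMALL LETTER SHARP S.
    for word in ("\ufb01sh", "stra\u00dfe"):
        upper, before, after = length_change(word)
        print(f"{word!r} -> {upper!r}: {before} -> {after} characters")
```

Any code that keeps positions into the text being cased (such as point
or undo records) has to account for this growth, which is why a region
whose length changes needs special care.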