From: Michal Nazarewicz
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [PATCHv5 00/11] Casing improvements
Date: Thu, 9 Mar 2017 22:51:39 +0100
Message-ID: <20170309215150.9562-1-mina86@mina86.com>
References: <1475543110-10019-1-git-send-email-mina86@mina86.com>
In-Reply-To: <1475543110-10019-1-git-send-email-mina86@mina86.com>
To: 24603@debbugs.gnu.org, eliz@gnu.org

I figured that I should probably start versioning the patchsets;
starting from v5 is as good as any.

The first six patches (up to sigma casing rule) should be
uncontroversial and unless there are objections I would like to get
them submitted soon:

  Split casify_object into multiple functions
  Introduce case_character function
  Add support for title-casing letters (bug#24603)
  Split up casify_region function (bug#24603)
  Support casing characters which map into multiple code points
  Implement special sigma casing rule (bug#24603)

The next patch adds a ‘buffer-language’ buffer-local variable.  This
seems to me a sensible way of dealing with language-dependent rules,
and in the future I imagine the variable might be used for more cases,
e.g. spell checking could automatically choose a dictionary based on
it.  But perhaps there is another way which integrates with the rest
of Emacs better:

  Introduce ‘buffer-language’ buffer-local variable

The rest are just implementations of various language-specific rules.
The implementation seems to be valid but it’s done purely in C, which
I guess is still a point of contention between me and Eli.

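For readers unfamiliar with the cases these patches deal with, here is
a rough illustration.  It is written in Python only because its
built-in string methods follow the Unicode default (language-independent)
case mappings; it is not the Emacs C implementation.  The `upcase`
helper and its `language` argument are hypothetical, standing in for
the kind of per-language switch that ‘buffer-language’ is meant to
provide.

```python
# Illustration only, not Emacs code.

# Title case can differ from upper case: U+01C6 has its own title-case form.
upper_dz = "\u01c6".upper()    # U+01C4, all-capitals digraph
title_dz = "\u01c6".title()    # U+01C5, capital D with small z-caron

# One character may map into several code points.
upper_sharp_s = "\u00df".upper()   # sharp s becomes "SS"
upper_fi = "\ufb01".upper()        # fi ligature becomes "FI"

# Sigma rule: a capital sigma at the end of a word lower-cases to the
# final form U+03C2, elsewhere to U+03C3.
lower_sigma = "\u038c\u03a3\u039f\u03a3".lower()


def upcase(text: str, language: str = "") -> str:
    """Hypothetical helper: upper-case TEXT, honouring Turkic i rules.

    The default mapping turns "i" into "I"; Turkish and Azerbaijani
    expect U+0130 (capital I with dot above) instead, which the default
    mappings cannot know without a language setting.
    """
    if language in ("tr", "az"):
        text = text.replace("i", "\u0130")
    return text.upper()
```

With the default rules `upcase("istanbul")` gives "ISTANBUL", whereas
`upcase("istanbul", "tr")` keeps the dot on the initial capital.
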
Compared to previous versions of the patches, the new implementation
is, I believe, a bit cleaner:

  Implement rules for title-casing Dutch ij ‘letter’ (bug#24603)
  Implement Turkic dotless and dotted i casing rules (bug#24603)
  Implement casing rules for Lithuanian (bug#24603)
  Implement Irish casing rules (bug#24603)

The whole thing (plus regex changes not included in this patchset) is
available at:

  git fetch git://github.com/mina86/emacs master

 admin/unidata/README                               |    4 +
 admin/unidata/SpecialCasing.txt                    |  281 +++++
 admin/unidata/unidata-gen.el                       |   40 +
 doc/lispref/strings.texi                           |   23 +
 etc/NEWS                                           |   22 +-
 lisp/international/mule-cmds.el                    |    8 +-
 src/buffer.c                                       |    8 +
 src/buffer.h                                       |    8 +
 src/casefiddle.c                                   | 1269 +++++++++++++++++---
 test/lisp/char-fold-tests.el                       |   12 +-
 .../casefiddle-resources/irish-lowercase-1-ref.txt |  211 ++++
 .../src/casefiddle-resources/irish-lowercase-1.txt |  211 ++++
 .../casefiddle-resources/irish-uppercase-1-ref.txt |  105 ++
 .../src/casefiddle-resources/irish-uppercase-1.txt |  105 ++
 test/src/casefiddle-tests.el                       |  193 ++-
 15 files changed, 2260 insertions(+), 240 deletions(-)
 create mode 100644 admin/unidata/SpecialCasing.txt
 create mode 100644 test/src/casefiddle-resources/irish-lowercase-1-ref.txt
 create mode 100644 test/src/casefiddle-resources/irish-lowercase-1.txt
 create mode 100644 test/src/casefiddle-resources/irish-uppercase-1-ref.txt
 create mode 100644 test/src/casefiddle-resources/irish-uppercase-1.txt

-- 
2.12.0.246.ga2ecc84866-goog