From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [PATCHv5 00/11] Casing improvements
Date: Sat, 11 Mar 2017 12:00:12 +0200
Message-ID: <837f3wdtg3.fsf@gnu.org>
References: <20170309215150.9562-1-mina86@mina86.com>
In-reply-to: <20170309215150.9562-1-mina86@mina86.com> (message from Michal Nazarewicz on Thu, 9 Mar 2017 22:51:39 +0100)
To: Michal Nazarewicz
Cc: 24603@debbugs.gnu.org

> From: Michal Nazarewicz
> Date: Thu, 9 Mar 2017 22:51:39 +0100
>
> The first six patches (up to sigma casing rule) should be
> uncontroversial and unless there are objections I would like to get
> them submitted soon:
>
>   Split casify_object into multiple functions
>   Introduce case_character function
>   Add support for title-casing letters (bug#24603)
>   Split up casify_region function (bug#24603)
>   Support casing characters which map into multiple code points
>   Implement special sigma casing rule (bug#24603)

Fine with me, modulo a few comments I posted to these 6 patches.

> The next patch adds ‘buffer-language’ buffer-local variable. This
> seems to me as a sensible way of dealing with language-dependent rules
> and in the future I imagine the variable might be used for more
> cases, e.g. spell checking should automatically choose a dictionary
> based on it. But perhaps there is another way which integrates with
> the rest of Emacs better:
>
>   Introduce ‘buffer-language’ buffer-local variable

I think we should rather introduce a _function_ named buffer-language,
so that it's easier to extend this mechanism in the future to more
sophisticated and more fine-grained methods of determining the
language, such as text properties and/or overlays with special
properties. The function could for now just return the value of a
buffer-specific variable, but I wouldn't expose and advertise that
variable so much as your code does.

In addition, your implementation seems to assume that the language
rules are independent of the country where that language is used, i.e.
that nl_NL and nl_BE will necessarily use the same rules for case
conversions. Is this a good assumption? Collation rules definitely
do depend on the country as well, AFAIK.

> The rest are just implementation of various language-specific rules.
> The implementation seems to be valid but it’s done purely in C which
> I guess still is a point of contention between me and Eli.

Yes, I'd still prefer that as many of the rules as possible be
specified in Lisp, thus avoiding the need to hard-code Unicode
codepoints and the associated rules in C. I understand that the
support for each kind of rule should be available in C before the
rules can be used, but once such support is there, having the spec in
Lisp will allow us easier maintenance in the future, easier expansion
of this to cover additional languages that use the same types of
rules, and, with time, perhaps also automatic derivation of the rules
from the Unicode data files, thus providing for easier updates when a
new version of Unicode is incorporated.

So I'd still urge you to try to refactor the code so that as many of
the rules as is feasible are implemented as a Lisp database. But I
won't reject these patches if you don't want to do such refactoring.

Thanks.
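
To make the two design points above concrete (rules kept as data rather
than hard-coded, and a lookup that can distinguish nl_NL from nl_BE),
here is a minimal sketch. It is written in Python purely for
illustration and is not Emacs code. The names UPCASE_RULES,
locale_candidates and upcase_char, and the fallback order (full locale,
then bare language, then the default mapping), are assumptions made for
this example, not something the patches implement. The single table
entry is the Turkish dotted-I rule from Unicode's special casing data.

```python
# Illustrative only: case rules stored as data, keyed by locale tag.
# The table is deliberately tiny and is not a complete rule set.
UPCASE_RULES = {
    # Turkish: "i" upcases to LATIN CAPITAL LETTER I WITH DOT ABOVE.
    "tr": {"i": "\u0130"},
}


def locale_candidates(locale):
    """Return lookup keys from most to least specific.

    "nl_BE" yields ["nl_BE", "nl"]; a bare "nl" yields ["nl"].
    """
    language, _, country = locale.partition("_")
    return [locale, language] if country else [language]


def upcase_char(ch, locale):
    """Upcase one character, consulting locale-specific rules first."""
    for key in locale_candidates(locale):
        rule = UPCASE_RULES.get(key, {}).get(ch)
        if rule is not None:
            return rule
    # No special rule applies: fall back to the default mapping.
    return ch.upper()
```

With this shape, a country-specific table (say, a hypothetical "nl_BE"
entry) would take precedence over the plain "nl" one without any change
to the lookup code, and new languages only require new table entries.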