From: Michal Nazarewicz
Newsgroups: gmane.emacs.bugs
Subject: bug#24603: [PATCHv5 08/11] Implement rules for title-casing Dutch ij ‘letter’ (bug#24603)
Date: Thu, 16 Mar 2017 22:30:52 +0100
Organization: http://mina86.com/
References: <20170309215150.9562-1-mina86@mina86.com> <20170309215150.9562-9-mina86@mina86.com> <83a88sdudb.fsf@gnu.org>
In-Reply-To: <83a88sdudb.fsf@gnu.org>
To: Eli Zaretskii
Cc: 24603@debbugs.gnu.org

On Sat, Mar 11 2017, Eli Zaretskii wrote:
>> From: Michal Nazarewicz
>> Date: Thu, 9 Mar 2017 22:51:47 +0100
>>
>> +  /* In Dutch, ‘ij’ is a digraph and when capitalised the whole thing is upper
>> +     cased.  Unicode has ‘ĳ’ and ‘Ĳ’ (with proper casing mappings) but they
>> +     aren’t always used so we cannot/should not rely on them.
>> +
>> +     Note that rule for capitalising ‘ij’ as a single letter is not present in
>> +     Unicode 9.0’s SpecialCasing.txt.  On the flip side, Firefox implements
>> +     this as well so we’re not completely alone.
>
> If this is not mandated by Unicode 9.0 (and not by the latest draft of
> 10.0, AFAICS), shouldn't we have a user option for this, by default
> off?

I don’t really see why.
If the goal is to implement Unicode then ‘ij’ handling should not be
implemented at all and Unicode-mandated behaviour should not be
configurable, but implementing Unicode is a means, not a goal in itself.
Rather, the goal is to properly case strings, and while Unicode is
helpful in that, it’s not the whole story.

And if users are allowed to disable ‘ij’ handling, they should also be
allowed to disable Turkish ‘i’ handling.

>> +     There are words where ‘ij’ are two separate letters (such as bijectie or
>> +     bijoux) in which case the capitalisation rules do not apply.  I (mina86)
>> +     have googled this a little and couldn’t find a Dutch word which beings
>> +     with ‘ij’ that is not a digraph so we should be in the clear since we
>> +     only care about the initial.  */
>
> I'm not sure I get this right: does this mean that writing in English
> (or any other non-Dutch language) in a Dutch locale will automatically
> capitalize "ij" to "IJ", just because the default value of
> buffer-language is "nl_NL" or somesuch, and no specific language was
> set for the buffer?  Wouldn't that surprise users?

Yes it does.  And yes it would.

This is currently the biggest blocker/concern for all the patches past
07/11 and I’m still wondering what would be the best solution.

I thought about having a ‘language’ string property so that programming
major modes would mark everything outside of comments as a ‘nil’
language.  This would require support from multiple major modes and
likely complicate them.¹

Or perhaps have an off-by-default ‘special-casing-mode’ which enables
language-dependent casing rules.
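
To make the trade-off concrete, here is a minimal C sketch of the rule
being discussed.  It is not the code from the patch: the function name
title_case_word and the convention of passing the language as a string
(with NULL meaning "no language-dependent rules", i.e. the off-by-default
case) are assumptions made purely for illustration, and it handles ASCII
only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the patch's code: title-case the ASCII word
   in BUF in place.  LANG is an illustrative language tag; NULL means
   "no language-dependent rules", which models the off-by-default case.  */
void
title_case_word (char *buf, const char *lang)
{
  assert (buf != NULL);
  if (buf[0] == '\0')
    return;

  size_t i;
  /* Any tag starting with "nl" (e.g. "nl" or "nl_NL") counts as Dutch.  */
  int dutch = lang != NULL && strncmp (lang, "nl", 2) == 0;

  /* Dutch rule: an initial "ij" is one digraph, so both letters are
     upper-cased ("ijsland" -> "IJsland").  */
  if (dutch
      && tolower ((unsigned char) buf[0]) == 'i'
      && tolower ((unsigned char) buf[1]) == 'j')
    {
      buf[0] = 'I';
      buf[1] = 'J';
      i = 2;
    }
  else
    {
      buf[0] = (char) toupper ((unsigned char) buf[0]);
      i = 1;
    }

  /* The rest of the word is lower-cased, so a non-initial "ij" (as in
     "bijectie") is never touched by the digraph rule.  */
  for (; buf[i] != '\0'; i++)
    buf[i] = (char) tolower ((unsigned char) buf[i]);
}
```

With this sketch, calling title_case_word on "ijsland" with "nl_NL" gives
"IJsland", while passing NULL gives "Ijsland".  It also shows the problem
raised above: with a Dutch tag in effect, any word starting with "ij"
gets the digraph treatment, whatever language it is actually written in.
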
A similar effect could be accomplished by replacing ‘buffer-language’
with a nil-by-default ‘casing-locale’ variable applicable only to
casing, but I would miss ‘buffer-language’ since I believe it might get
used for other things.

¹ Having a string property could still be an option in the future of
  course, and it might allow fancy things like:

    Iceland’s name in Dutch is Ijsland

.

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»