From: "Drew Adams"
Newsgroups: gmane.emacs.bugs
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Wed, 5 Dec 2012 07:38:10 -0800
Message-ID: <611DD154E83240D183A7B5B88691DC37@us.oracle.com>
To: "'martin rudalics'"
Cc: perin@panix.com, perin@acm.org, 13041@debbugs.gnu.org
In-Reply-To: <50BF1702.4020100@gmx.at>

> `ignore-diacritics' is misleading. The variable would have
> to be called `observe-decompositions' or something the like.

1. "Observe decompositions" doesn't mean anything to me. The verb should probably be more active: what does it mean to "observe" the char decompositions here?

BTW, if we use "decomposition" in the name and description, then we should probably also use "char". This is not about decomposing strings in some way (whatever that might mean); it is about decomposing Unicode characters.

2. But my confusion over the name/description is really about function `decomposed-string-lessp': I guess it's not 100% clear to me what it does. Your doc string said "STRING1 is decomposition-less than STRING2", which confuses me. It is also ambiguous wrt "-less":

   a. "decomposition-less", as in comparing the strings only after removing (some parts of) their decompositions (i.e., "-less" as in "sans")?

   or

   b. "-lessp" as in `string<', i.e., a comparison (ordering) relation?

In the version of `decomposed-string-lessp' that I sent, I changed the doc string to "decomposed STRING1 is less than decomposed STRING2". But that is no doubt incorrect (less correct than yours, if perhaps clearer). In particular, it says nothing about how the two decompositions are compared.

In practical (use) terms, this is typically about ignoring diacritics, keeping only the "base" characters. The doc should at least mention that, so users know they can use the function for that.

But IIUC this is not only about diacritics: sometimes it is not about diacritics at all, and sometimes diacritics that are present are not ignored. E.g., the ligature ffi (U+FB03, LATIN SMALL LIGATURE FFI) is treated the same as the three chars f f i, and there are no diacritics in that case.

IIUC, we convert the two strings to their Unicode decompositions and then use the Unicode char compatibility specs to compare the decompositions. IOW, we treat chars that Unicode defines as equivalent as the same.

Perhaps the name/description should speak in terms of Unicode char compatibility or equivalence. Perhaps a name like `string-less-compat-p'? Or `Unicode-equivalent-p'? Or `string-equivalent-p'? How would you characterize what the function does? No doubt Eli can help here.

It is important to try to get the function name and description right from the outset, if we can. If the Unicode standard has terminology that applies here, then perhaps we can/should leverage it.

Beyond the name and an accurate description, the doc should, as I said, at least mention that you can use this to ignore diacritics (such as accents), since that will be a common use case.
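To make the two readings of "-less" concrete, here is a rough sketch in Python (not Emacs Lisp, and not the code from the patch). It assumes that "decomposition" means Unicode compatibility decomposition (NFKD), that "ignoring diacritics" means dropping nonspacing combining marks (general category Mn) afterward, and that the function does both things: strip (reading a) and then order (reading b). The names `base_chars', `decomposed_string_equal' and `decomposed_string_lessp' are hypothetical.

```python
import unicodedata


def base_chars(s: str) -> str:
    """Hypothetical helper for the "-less as in sans" step (reading a):
    apply Unicode compatibility decomposition (NFKD), then drop nonspacing
    combining marks (category Mn), which is where accents such as the
    acute in U+00E9 end up after decomposition."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def decomposed_string_equal(s1: str, s2: str) -> bool:
    """Hypothetical: True if the strings match once reduced to base chars."""
    return base_chars(s1) == base_chars(s2)


def decomposed_string_lessp(s1: str, s2: str) -> bool:
    """Hypothetical analogue of `decomposed-string-lessp' (reading b):
    an ordering relation like `string<', applied to the base-char forms
    using plain code-point order (no locale-aware collation)."""
    return base_chars(s1) < base_chars(s2)


if __name__ == "__main__":
    print(base_chars("\ufb03"))  # the ffi ligature, U+FB03
    print(decomposed_string_equal("caf\u00e9", "cafe"))
    # Plain comparison vs. comparison of the base-char forms:
    print("\u00e9" < "f", decomposed_string_lessp("\u00e9", "f"))
```

Under these assumptions, `base_chars' maps the ffi ligature to "ffi" and "caf\u00e9" to "cafe", but leaves U+00F8 (o with stroke) alone, because Unicode gives that letter no decomposition; that is one way diacritics that are present can end up not being ignored. Using NFD (canonical decomposition only) instead of NFKD would leave the ffi ligature intact, which suggests that the Unicode terms "canonical equivalence" and "compatibility equivalence" (UAX #15) may be the relevant vocabulary for the name and doc string.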