From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Word boundary (was: find-composition still depends on the composition property)
Date: Fri, 24 Oct 2008 02:48:44 +0300
Organization: JURTA
Message-ID: <87mygusydi.fsf_-_@jurta.org>
References: <87tzbh7kd9.fsf@jurta.org> <87tzb5ikrw.fsf@jurta.org>
To: Kenichi Handa
Cc: Eli Zaretskii, emacs-devel@gnu.org
In-Reply-To: (Kenichi Handa's message of "Thu, 23 Oct 2008 10:18:22 +0900")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (x86_64-pc-linux-gnu)

> Then what to do is:
>
> (1-1) assign the category "6" (digit) to "0123456789".
> (1-2) define a category, say "D", and assign it to all
>       characters that have no word-boundary between digits.
> (1-3) add (?D . ?6) and (?6 . ?D) to word-combining-categories.
>
> Another way is:
>
> (2-1) modify word_boundary_p to handle negative category mnemonic in
>       word-*-categories to catch a character that doesn't have the
>       specified category.
> (2-2) assign the category "6" (digit) to "0123456789".
> (2-3) define a category, say "X", and assign it to all
>       characters that have word-boundary between digits.
> (2-4) add ((- ?X) . ?6) and (?6 . (- ?X)) to
>       word-combining-categories.
>
> Or,
>
> (3-1) Make `common' script and classify digits, etc to it.
> (3-2) modify word_boundary_p not to distinguish `common' from
>       any other script.
> (3-3) define a category, say "X", and assign it to all
>       characters that have word-boundary between digits.
> (3-4) add (?X . ?6) and (?6 . ?X) to
>       word-separating-categories.

Do you know how many scripts require word boundaries between letters
and digits?  Does the Unicode standard specify this?  If the majority
of scripts do not require word boundaries, then we could define a new
category only for the few exceptions, and vice versa.

-- 
Juri Linkov
http://www.jurta.org/emacs/