From: William Xu
Newsgroups: gmane.emacs.devel
Subject: Re: chinese word mode
Date: Wed, 06 Nov 2013 23:37:10 +0800
Organization: the Church of Emacs
References: <87mwljciwc.fsf@ericabrahamsen.net> <87txfqggn3.fsf@ericabrahamsen.net>
To: emacs-devel@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (darwin)

Eric Abrahamsen writes:

> Eric Abrahamsen writes:
>
> [...]
>
>> (define-minor-mode thai-word-mode
>>   :global t :group 'mule
>>   (cond (thai-word-mode
>>          ;; This enables linebreak between Thai characters.
>>          (modify-category-entry (make-char 'thai-tis620) ?|)
>>          ;; This enables linebreak at a Thai word boundary.
>>          (put-charset-property 'thai-tis620 'fill-find-break-point-function
>>                                'thai-fill-find-break-point))
>>         (t
>>          (modify-category-entry (make-char 'thai-tis620) ?| nil t)
>>          (put-charset-property 'thai-tis620 'fill-find-break-point-function
>>                                nil))))
>>
>
> [...]
>
>> My buffers are utf-8 encoded, and describe-char on a Chinese character
>> shows "preferred charset: unicode-bmp". So what do I put for the charset
>> in order to make these functions target the right characters? Chinese
>> characters all seem to have the "|" line-breakable category by default,
>> but (I think) I can only add the custom fill break point function one
>> charset at a time.
>
> I've tried slapping the 'fill-find-break-point-function onto the
> 'unicode charset for now, and it works fine because the function only
> does anything if point is in the midst of Chinese. It presumably gets
> applied to all characters, though, and that can't be a real solution.

modify-category-entry also accepts a range cons, so you can select
Chinese characters by range. For example:

    (#x3400 . #x4DBF) ; CJK Unified Ideographs Extension A
    (#x4E00 . #x9FFF) ; CJK Unified Ideographs
    (#xF900 . #xFAFF) ; CJK Compatibility Ideographs

put-charset-property, though, seems to accept only a charset.

> I'm guessing I'll need to separate simplified and traditional word sets
> and make two versions of the mode. Both modes will loop through their
> applicable charsets and apply/remove the custom break point function.
>
> Assuming I fix this problem and other inevitable bugs, would this
> library be of general interest to Emacs?

It can make those word movement functions useful. :)

-- 
William

http://xwl.appspot.com
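
The range form of modify-category-entry mentioned above could be
used roughly as follows. This is only a sketch: the names
my-cjk-ranges, my-cjk-set-breakable and my-cjk-char-p are made up for
illustration and do not exist in Emacs. The predicate is one possible
way for a break-point function attached to a broad charset to return
early outside Chinese text; it is an assumption, not something the
thread settled on.

```elisp
;; Hypothetical names throughout; only `modify-category-entry' is a
;; real Emacs function here.
(defvar my-cjk-ranges
  '((#x3400 . #x4DBF)   ; CJK Unified Ideographs Extension A
    (#x4E00 . #x9FFF)   ; CJK Unified Ideographs
    (#xF900 . #xFAFF))  ; CJK Compatibility Ideographs
  "Inclusive character ranges treated as Chinese in this sketch.")

(defun my-cjk-set-breakable (enable &optional table)
  "Add category `|' to every range in `my-cjk-ranges'.
If ENABLE is nil, remove the category instead.  TABLE defaults to
the current buffer's category table."
  (dolist (range my-cjk-ranges)
    ;; A cons (FROM . TO) modifies the whole inclusive range; a
    ;; non-nil fourth argument deletes the category.
    (modify-category-entry range ?| table (not enable))))

(defun my-cjk-char-p (char)
  "Return non-nil if CHAR falls inside one of `my-cjk-ranges'."
  (let (found)
    (dolist (range my-cjk-ranges found)
      (when (and (<= (car range) char)
                 (<= char (cdr range)))
        (setq found t)))))
```

A mode's enable branch would then call (my-cjk-set-breakable t) and
its disable branch (my-cjk-set-breakable nil), in the same shape as
the two cond branches of thai-word-mode quoted above.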