From: Eric Abrahamsen
Newsgroups: gmane.emacs.help
To: help-gnu-emacs@gnu.org
Subject: Re: word boundaries in Asian languages
Date: Tue, 20 Aug 2013 09:11:36 +0800
Message-ID: <87ob8t88tz.fsf@ericabrahamsen.net>

Thien-Thi Nguyen writes:

> () Eli Zaretskii
> () Mon, 19 Aug 2013 19:23:04 +0300
>
>    The right place to discuss this is emacs-devel, not here.
>
> Emacs provides var ‘find-word-boundary-function-table’, used in
> Capitalized Words Mode (see lisp/progmodes/cap-words.el).  There are
> several "subword" modes, Thai Word Mode (lisp/language/thai-util.el),
> etc.
>
> Perhaps some ideas and techniques there can be used in this context by
> normal users (with adventurous spirit :-D) here, as well.

Whoa, no kidding. The language support for Thai does exactly what I was
thinking of (though in elisp, not C), and it does it by loading the
entire Thai language into memory! I've already got all of Chinese loaded
into memory for the wubi input method, so theoretically something could
be done there...

If I manage to do anything with this, I'll post to emacs-devel.

Thanks!
Eric
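
The idea of finding word boundaries by looking words up in an in-memory dictionary can be illustrated with a greedy longest-match ("maximum matching") segmenter. This is a minimal Python sketch under stated assumptions: the dictionary is a plain in-memory set, and a character that starts no known word becomes a one-character word. It is not necessarily the algorithm Emacs's Thai support uses. The names `next_word_boundary` and `segment` are hypothetical. `next_word_boundary` loosely mirrors the job of a function stored in `find-word-boundary-function-table` when moving forward: given the position of the first character of a word, it returns the position just after that word's last character.

```python
from typing import List, Set


def next_word_boundary(text: str, pos: int, dictionary: Set[str], max_len: int) -> int:
    """Return the index just past the longest dictionary word starting at pos.

    Assumption: if no dictionary word starts at pos, treat the single
    character at pos as a word of its own.
    """
    limit = min(len(text), pos + max_len)
    # Try the longest candidate first, shrinking one character at a time.
    for end in range(limit, pos, -1):
        if text[pos:end] in dictionary:
            return end
    return pos + 1


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Split text into words by repeated greedy longest-match lookups."""
    max_len = max((len(w) for w in dictionary), default=1)
    words: List[str] = []
    pos = 0
    while pos < len(text):
        end = next_word_boundary(text, pos, dictionary, max_len)
        words.append(text[pos:end])
        pos = end
    return words


if __name__ == "__main__":
    # Hypothetical toy dictionary; a real one would be loaded from a word list.
    words = {"我", "爱", "中华", "人民", "共和国", "中华人民共和国"}
    print(segment("我爱中华人民共和国", words))  # ['我', '爱', '中华人民共和国']
```

Greedy longest match is simple and fast once the dictionary is in memory, but it can pick the wrong split when a long word overlaps a better reading. Real segmenters often add backtracking or frequency statistics to handle that.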