From: Eric Abrahamsen
Newsgroups: gmane.emacs.devel
Subject: Re: chinese word mode
Date: Sat, 09 Nov 2013 10:51:39 +0800
Message-ID: <87vc022sp0.fsf@ericabrahamsen.net>
To: emacs-devel@gnu.org
Xue Fuqiao writes:

> Looks interesting, although sometimes it can be ambiguous.
>
> For example:
>
> * `化妆和服装' ("makeup and clothing") can be split into either `化妆 和 服装'
>   or `化妆 和服 装' (where `和服' would read as "kimono");
> * In `这个门把手坏了' ("this door handle is broken"), `把手' (handle) is a word,
>   but in `请把手拿开' ("please take your hand away"), `把手' is not a word;
> * In `将军任命了一名中将' ("the general appointed a lieutenant general"), `中将'
>   (lieutenant general) is a word, but in `产量三年中将增长两倍' ("output will
>   grow twofold within three years"), `中将' isn't a word any more.
>
> How do you solve this problem?

Short answer: you don't!

When I first started looking at this issue, I was considering all kinds of
complicated solutions involving external libraries written by people smarter
than me, who presumably had a system for syntactic analysis. Then someone
pointed me to thai-word.el, which takes the "dumb" approach (scanning forward
for the longest string found in a word list), and I realized that if I was
going to get anything actually finished, this would have to do.

The main point is to make navigating Chinese prose a little less annoying, not
to produce a "correct" solution. As you note, there will always be ambiguities
that can't be resolved.
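For illustration, here is a minimal sketch of the greedy "longest string in a word list" idea, written in Python rather than Emacs Lisp. The function name `segment` and the toy word lists are hypothetical; this is not the code in thai-word.el, just the general forward maximum-matching technique it was described as using.

```python
def segment(text, words, max_len=None):
    """Greedy forward maximum matching (hypothetical helper).

    At each position, take the longest entry in `words` that starts
    there; if nothing matches, emit a single character and move on.
    """
    if max_len is None:
        # Never look further ahead than the longest known word.
        max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest.
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            # A single character is always accepted as a fallback.
            if n == 1 or cand in words:
                tokens.append(cand)
                i += n
                break
    return tokens


if __name__ == "__main__":
    toy_words = {"化妆", "和", "和服", "服装", "装"}
    print(segment("化妆和服装", toy_words))
```

With that toy word list, `segment("化妆和服装", ...)` returns `['化妆', '和服', '装']`, and `segment("请把手拿开", {"请", "把手", "拿开"})` returns `['请', '把手', '拿开']`. Both are exactly the kind of wrong split in the examples above: the greedy scan never looks back, so it cannot recover when the longest match is the wrong one.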
Eric