From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Word syntax question
Date: Wed, 22 Oct 2008 12:11:20 +0900
Message-ID: <87bpxdtjc7.fsf@xemacs.org>
References: <87mygy0ybq.fsf@catnip.gol.com> <87bpxd29ft.fsf@catnip.gol.com>
    <8763nl1m41.fsf@catnip.gol.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1224644790 25333 80.91.229.12 (22 Oct 2008 03:06:30 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Wed, 22 Oct 2008 03:06:30 +0000 (UTC)
Cc: schwab@suse.de, Eli Zaretskii, emacs-devel@gnu.org
To: Miles Bader
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Oct 22 05:07:30 2008
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
    by lo.gmane.org with esmtp (Exim 4.50) id 1KsU4F-0005yk-U4
    for ged-emacs-devel@m.gmane.org; Wed, 22 Oct 2008 05:07:28 +0200
Original-Received: from localhost ([127.0.0.1]:57879 helo=lists.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43) id 1KsU3A-0002sS-IY
    for ged-emacs-devel@m.gmane.org; Tue, 21 Oct 2008 23:06:20 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
    id 1KsU34-0002re-Pt
    for emacs-devel@gnu.org; Tue, 21 Oct 2008 23:06:14 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
    id 1KsU33-0002qm-0a
    for emacs-devel@gnu.org; Tue, 21 Oct 2008 23:06:14 -0400
Original-Received: from [199.232.76.173] (port=47515 helo=monty-python.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43) id 1KsU32-0002qg-NW
    for emacs-devel@gnu.org; Tue, 21 Oct 2008 23:06:12 -0400
Original-Received: from mtps02.sk.tsukuba.ac.jp ([130.158.97.224]:41792)
    by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from )
    id 1KsU2x-0007ye-Ko; Tue, 21 Oct 2008 23:06:07 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp
    (uwakimon.sk.tsukuba.ac.jp [130.158.99.156])
    by mtps02.sk.tsukuba.ac.jp (Postfix) with ESMTP id 0DB4E8002;
    Wed, 22 Oct 2008 12:06:04 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
    id 4F0D31A26AE; Wed, 22 Oct 2008 12:11:20 +0900 (JST)
In-Reply-To: <8763nl1m41.fsf@catnip.gol.com>
X-Mailer: VM 8.0.12-devo-585 under 21.5 (beta28) "fuki" 83e35df20028+ XEmacs Lucid (x86_64-unknown-linux)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6, seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:104812
Archived-At:

Miles Bader writes:

 > Eli Zaretskii writes:
 > >> > See char-script-table, forward-word also stops at a script boundary.
 > >>
 > >> That seems kind of broken in this case -- it's quite common for
 > >> "phonetic" characters to be intermixed in a word with latin characters,
 > >> and certainly nobody thinks of those boundaries as being word
 > >> boundaries.
 > >
 > > I agree.  I think we should introduce a user option to control whether
 > > it stops on script boundaries or not, because sometimes it makes
 > > sense, sometimes it doesn't.
 >
 > But a global setting seems far too coarse, and in general, whether it's
 > "right" or not seems like it depends more on the precise mixture of
 > scripts rather than a user's personal preferences.

AFAIK Unicode has solved this problem, but I forget where I saw it.
If my memory is correct, that supports Miles's opinion.

In general, I think that if the scripts are for different human
languages, it's almost always the case that a script boundary is a
word boundary.
(But I'm biased, because I deal with that daily in ordinary
Japanese text, where that is the case.)  If one of the scripts is not
language-specific (IPA is really the only one I can think of), the
script boundary is not a word boundary.

Note that for something like Japanese, which has three separate
scripts (hiragana, katakana, and kanji) that are separately
standardized (JIS X 0201 for katakana, and JIS X 0213 for the others),
the "different scripts, same language" case already needs special
handling.

So it seems to me that an exceptional case for IPA (make it a member
of all language groups, or perhaps of those that use the Latin
alphabet?) should be sufficient.