From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Word syntax question
Date: Wed, 22 Oct 2008 12:11:20 +0900
Message-ID: <87bpxdtjc7.fsf@xemacs.org>
References: <87mygy0ybq.fsf@catnip.gol.com> <jek5c2gd2q.fsf@sykes.suse.de>
	<87bpxd29ft.fsf@catnip.gol.com> <ur669g8yu.fsf@gnu.org>
	<8763nl1m41.fsf@catnip.gol.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1224644790 25333 80.91.229.12 (22 Oct 2008 03:06:30 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Wed, 22 Oct 2008 03:06:30 +0000 (UTC)
Cc: schwab@suse.de, Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
To: Miles Bader <miles@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Oct 22 05:07:30 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KsU4F-0005yk-U4
	for ged-emacs-devel@m.gmane.org; Wed, 22 Oct 2008 05:07:28 +0200
Original-Received: from localhost ([127.0.0.1]:57879 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KsU3A-0002sS-IY
	for ged-emacs-devel@m.gmane.org; Tue, 21 Oct 2008 23:06:20 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KsU34-0002re-Pt
	for emacs-devel@gnu.org; Tue, 21 Oct 2008 23:06:14 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KsU33-0002qm-0a
	for emacs-devel@gnu.org; Tue, 21 Oct 2008 23:06:14 -0400
Original-Received: from [199.232.76.173] (port=47515 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KsU32-0002qg-NW
	for emacs-devel@gnu.org; Tue, 21 Oct 2008 23:06:12 -0400
Original-Received: from mtps02.sk.tsukuba.ac.jp ([130.158.97.224]:41792)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <stephen@xemacs.org>)
	id 1KsU2x-0007ye-Ko; Tue, 21 Oct 2008 23:06:07 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mtps02.sk.tsukuba.ac.jp (Postfix) with ESMTP id 0DB4E8002;
	Wed, 22 Oct 2008 12:06:04 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id 4F0D31A26AE; Wed, 22 Oct 2008 12:11:20 +0900 (JST)
In-Reply-To: <8763nl1m41.fsf@catnip.gol.com>
X-Mailer: VM 8.0.12-devo-585 under 21.5 (beta28) "fuki" 83e35df20028+ XEmacs
	Lucid (x86_64-unknown-linux)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
	seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:104812
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/104812>

Miles Bader writes:
 > Eli Zaretskii <eliz@gnu.org> writes:
 > >> > See char-script-table, forward-word also stops at a script boundary.
 > >> 
 > >> That seems kind of broken in this case -- it's quite common for
 > >> "phonetic" characters to be intermixed in a word with latin characters,
 > >> and certainly nobody thinks of those boundaries as being word
 > >> boundaries.
 > >
 > > I agree.  I think we should introduce a user option to control whether
 > > it stops on script boundaries or not, because sometimes it makes
 > > sense, sometimes it doesn't.
 > 
 > But a global setting seems far too coarse, and in general, whether it's
 > "right" or not seems like it depends more on the precise mixture of
 > scripts rather than a user's personal preferences.

AFAIK Unicode has solved this problem, but I forget where I saw it.
If my memory is correct, that supports Miles's opinion.

In general, I think that if the scripts are for different human
languages, it's almost always the case that a script boundary is a
word boundary.  (But I'm biased, because I deal with that daily in
ordinary Japanese text, where that is the case.)  If one of the scripts
is not language-specific (IPA is really the only one I can think of),
the boundary is not a word boundary.  Note that Japanese has three
separate scripts (hiragana, katakana, and kanji) which are separately
standardized (JIS X 0201 for katakana, and JIS X 0213 for the others),
so an allowance for different scripts within the same language already
needs to be made.

So it seems to me that an exceptional case for IPA (make it a member
of all language groups, or perhaps of those that use the Latin
alphabet?) should be sufficient.
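
To make the rule concrete, here is a rough sketch in Python rather than
Emacs Lisp.  Everything in it is an assumption for illustration: the
names (SCRIPT_RANGES, LANGUAGE_GROUPS, split_words) are made up, the
script table is a tiny subset picked from the Unicode block charts and
is not the data in char-script-table, and it takes the second of the
two options above (IPA joins only the Latin-alphabet group).

```python
# Hypothetical, heavily simplified script table: (first, last, script).
# Only a few Unicode blocks are listed; a real table is far larger.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "latin"),     # A-Z
    (0x0061, 0x007A, "latin"),     # a-z
    (0x0250, 0x02AF, "phonetic"),  # IPA Extensions block
    (0x3041, 0x309F, "hiragana"),
    (0x30A0, 0x30FF, "katakana"),
    (0x4E00, 0x9FFF, "kanji"),     # CJK Unified Ideographs
]

# Assumption: each script belongs to one or more "language groups".
# The three Japanese scripts share a group, and IPA ("phonetic") is
# made a member of the Latin-alphabet group as the exceptional case.
LANGUAGE_GROUPS = {
    "latin": {"latin-alphabet"},
    "phonetic": {"latin-alphabet"},
    "hiragana": {"japanese"},
    "katakana": {"japanese"},
    "kanji": {"japanese"},
}


def script_of(ch):
    """Return the script name for CH, or None if it is not in the table."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return None


def is_word_boundary(a, b):
    """True if a word boundary falls between adjacent characters A and B."""
    sa, sb = script_of(a), script_of(b)
    if sa is None or sb is None:
        return True   # spaces, punctuation, unknown characters
    if sa == sb:
        return False
    # Different scripts: a boundary only if they share no language group.
    return LANGUAGE_GROUPS[sa].isdisjoint(LANGUAGE_GROUPS[sb])


def split_words(text):
    """Split TEXT into words using only the script-boundary rule."""
    words, current = [], ""
    for ch in text:
        if script_of(ch) is None:
            if current:
                words.append(current)
            current = ""
            continue
        if current and is_word_boundary(current[-1], ch):
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words
```

With this table, a Latin word containing IPA letters stays in one
piece, while a Latin/Japanese boundary still splits.  Note that the
sketch lumps a whole run of Japanese text into one "word"; it models
only the script-boundary rule, not real Japanese word segmentation.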