From: Oleksandr Gavenko
Newsgroups: gmane.emacs.bugs
Subject: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
Date: Sat, 10 Sep 2016 20:12:57 +0300
Organization: Oleksandr Gavenko, http://defun.work/
Message-ID: <87inu3vfty.fsf@gavenkoa.example.com>
References: <87mvjgupau.fsf@gavenkoa.example.com> <83lgz083ze.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
To: Eli Zaretskii
Cc: 24405@debbugs.gnu.org
X-GNU-PR-Message: followup 24405
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: notabug
In-Reply-To: <83lgz083ze.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 10 Sep 2016 13:05:09 +0300")

On 2016-09-10, Eli Zaretskii wrote:

> This is the intended behavior, yes.  The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.
>
> May I ask why do you need to consider the above a single word?  In
> what situation(s) does that make sense?

I am working on a dictionary. Dictionary articles and supplementary texts
use IPA symbols for word pronunciation.

I would like to select a pronunciation with a single movement in text like:

  leap [liːp]    lip [lɪp]
  wheel [wiːl]   will [wɪl]
  seek [siːk]    sick [sɪk]

It is annoying to move across long mixed words with C-Left, C-Right or
C-S-Left, C-S-Right. You can try moving across:

  international [ˌɪntərˈnæʃənəl]

I also found that some IPA characters are marked as Latin script:

  (aref char-script-table ?æ)   ; => latin

But that is debatable, because it is an ordinary letter in some languages.

As a workaround, should I modify char-script-table? Like this:

  ;; Mark each IPA character as Latin script with word syntax.
  (mapc (lambda (ch)
          (aset char-script-table ch 'latin)
          (modify-syntax-entry ch "w"))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

This gives the desired behavior, but it is unclear whether it is acceptable.

Another solution is to define my own category:

  (define-category ?p "Phonetic")

and assign it to the IPA characters:

  ;; The category argument is a mnemonic character, hence ?p.
  (mapc (lambda (ch) (modify-category-entry ch ?p))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

so that it becomes possible to use:

  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

-- 
http://defun.work/