From: Oleksandr Gavenko
Newsgroups: gmane.emacs.bugs
Subject: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
Date: Sun, 11 Sep 2016 14:57:33 +0300
Organization: Oleksandr Gavenko, http://defun.work/
Message-ID: <87r38qtzrm.fsf@gavenkoa.example.com>
References: <87mvjgupau.fsf@gavenkoa.example.com> <83lgz083ze.fsf@gnu.org> <87inu3vfty.fsf@gavenkoa.example.com> <83h99n8y9e.fsf@gnu.org>
In-Reply-To: <83h99n8y9e.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 10 Sep 2016 20:23:25 +0300")
To: Eli Zaretskii
Cc: 24405@debbugs.gnu.org

On 2016-09-10, Eli Zaretskii wrote:

>> Another solution is to invent own:
>>
>>   (define-category ?p "Phonetic")
>>
>> and to add it to IPA characters:
>>
>>   (mapc (lambda (ch) (modify-category-entry ch "p"))
>>         '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))
>>
>> so it becomes possible to use:
>>
>>   (add-to-list 'word-combining-categories '(?p . ?l))
>>   (add-to-list 'word-combining-categories '(?l . ?p))
>
> That'd be my second best advice.  But I think regular expressions
> should provide a better and easier solution.

This works for me:

  (defconst my/ipa-chars
    (list ?ˈ ?ˌ ?ː ?ǁ ?ʲ ?θ ?ð ?ŋ ?ɡ ?ʒ ?ʃ ?ʧ ?ə ?ɜ ?ɛ ?ʌ ?ɒ ?ɔ ?ɑ ?æ ?ʊ ?ɪ))
  (define-category ?p "Phonetic")
  (mapc (lambda (ch)
          (cond
           ((eq (aref char-script-table ch) 'phonetic)
            (modify-category-entry ch ?p)
            (modify-category-entry ch ?l nil t))
           ((eq (aref char-script-table ch) 'latin)
            ;; (aref char-script-table ?ˌ) is 'latin
            ;; but (char-category-set ?ˌ) is ".j"
            (modify-category-entry ch ?l))))
        my/ipa-chars)
  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

But adding and removing categories looks too low-level. It also requires
defining a new category, (define-category ?p "Phonetic"), that Emacs itself
does not define.

This looks easier to me:

  (mapc (lambda (ch)
          (aset char-script-table ch 'latin)
          (modify-syntax-entry ch "w"))
        my/ipa-chars)

But ``char-script-table`` is derived from Unicode, and some code may depend
on this database...

-- 
http://defun.work/