unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
@ 2016-09-10  8:33 Oleksandr Gavenko
  2016-09-10 10:05 ` Eli Zaretskii
  2019-09-29  4:33 ` Stefan Kangas
  0 siblings, 2 replies; 6+ messages in thread
From: Oleksandr Gavenko @ 2016-09-10  8:33 UTC (permalink / raw)
  To: 24405

Evaluate the following form with C-x C-e:

  (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
        (word-separating-categories nil))
    (forward-word))

  HelloПривLLжɪəʊheləʊaiɪa

Point stopped between ʊ and h.

I have:

  (aref char-script-table ?ʊ)  => phonetic
  (aref char-script-table ?h)  => latin
  (aref char-script-table ?ж)  => cyrillic

  (category-set-mnemonics (char-category-set ?ʊ))  => ".Ljl"
  (category-set-mnemonics (char-category-set ?h))  => ".Lalr"

  (category-docstring ?y)  => "Cyrillic"
  (category-docstring ?l)  => "Latin"

I expected point to move to the end of the line, after the last character.

It seems that:

  (?l . ?y) (?y . ?l)

take effect, because point moved across the Cyrillic/Latin and
Cyrillic/Phonetic boundaries, but it refused to cross the Latin/Phonetic
boundary.

If this is the intended behavior, how can I make Emacs move across
Latin/Phonetic boundaries?

See also:

  http://emacs.stackexchange.com/questions/21131/does-word-syntax-take-script-into-account

In GNU Emacs 24.5.1 (x86_64-pc-linux-gnu, GTK+ Version 3.18.6)
 of 2016-01-22 on binet, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11803000
System Description:	Debian GNU/Linux testing (stretch)

-- 
http://defun.work/





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
  2016-09-10  8:33 bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts Oleksandr Gavenko
@ 2016-09-10 10:05 ` Eli Zaretskii
  2016-09-10 17:12   ` Oleksandr Gavenko
  2019-09-29  4:33 ` Stefan Kangas
  1 sibling, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2016-09-10 10:05 UTC (permalink / raw)
  To: Oleksandr Gavenko; +Cc: 24405

tags 24405 + notabug
thanks

> From: Oleksandr Gavenko <gavenkoa@gmail.com>
> Date: Sat, 10 Sep 2016 11:33:45 +0300
> 
> Evaluate the following form with C-x C-e:
> 
>   (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
>         (word-separating-categories nil))
>     (forward-word))
> 
>   HelloПривLLжɪəʊheləʊaiɪa
> 
> Point stopped between ʊ and h.
> 
> I have:
> 
>   (aref char-script-table ?ʊ)  => phonetic
>   (aref char-script-table ?h)  => latin
>   (aref char-script-table ?ж)  => cyrillic
> 
>   (category-set-mnemonics (char-category-set ?ʊ))  => ".Ljl"
>   (category-set-mnemonics (char-category-set ?h))  => ".Lalr"
> 
>   (category-docstring ?y)  => "Cyrillic"
>   (category-docstring ?l)  => "Latin"
> 
> I expected point to move to the end of the line, after the last character.
> 
> It seems that:
> 
>   (?l . ?y) (?y . ?l)
> 
> take effect, because point moved across the Cyrillic/Latin and
> Cyrillic/Phonetic boundaries, but it refused to cross the Latin/Phonetic
> boundary.
> 
> If this is the intended behavior, how can I make Emacs move across
> Latin/Phonetic boundaries?

You can't do this for two characters that belong to different scripts
but share a category in their category sets.  Both of these characters
have the 'l' (Latin) category in their sets, so you cannot make Emacs
treat the position between them as anything other than a word boundary.

For the same reason, including a cons cell whose members are
identical, such as (?l . ?l), has no effect.
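
The rule can be restated as a small predicate.  This is only a sketch
under the assumption that, for two characters of different scripts, a
pair (CAT1 . CAT2) matches when the first character has CAT1 and the
second lacks it, and the second has CAT2 and the first lacks it.  The
name my/categories-combine-p is made up for illustration and is not
part of Emacs.

```elisp
(defun my/categories-combine-p (c1 c2 pairs)
  "Return non-nil if some (CAT1 . CAT2) in PAIRS joins C1 followed by C2.
Hypothetical helper, not an Emacs function.  A pair is assumed to match
only when C1 has CAT1 and C2 lacks it, and C2 has CAT2 and C1 lacks
it.  A nil CAT1 or CAT2 is assumed to match any character."
  (let ((set1 (char-category-set c1))   ; bool-vector indexed by mnemonic
        (set2 (char-category-set c2))
        (found nil))
    (dolist (pair pairs found)
      (let ((cat1 (car pair))
            (cat2 (cdr pair)))
        (when (and (or (null cat1)
                       (and (aref set1 cat1) (not (aref set2 cat1))))
                   (or (null cat2)
                       (and (aref set2 cat2) (not (aref set1 cat2)))))
          (setq found t))))))
```

With the category sets reported above, where both characters carry 'l',
(my/categories-combine-p ?ʊ ?h '((?l . ?y) (?y . ?l) (?l . ?l)))
returns nil, and no character can satisfy (?l . ?l) under this rule.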

This is the intended behavior, yes.  The word-combining-categories
feature is designed to support specific rare situations with mixing
the Far Eastern scripts (e.g., use of Kanji characters in Japanese
text), not for arbitrary games with Latin and European scripts.

May I ask why you need to consider the above a single word?  In
what situation(s) does that make sense?

Thanks.

* bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
  2016-09-10 10:05 ` Eli Zaretskii
@ 2016-09-10 17:12   ` Oleksandr Gavenko
  2016-09-10 17:23     ` Eli Zaretskii
  0 siblings, 1 reply; 6+ messages in thread
From: Oleksandr Gavenko @ 2016-09-10 17:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24405

On 2016-09-10, Eli Zaretskii wrote:

> This is the intended behavior, yes.  The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.
>
> May I ask why you need to consider the above a single word?  In
> what situation(s) does that make sense?

I am working on a dictionary. Dictionary articles and supplementary texts use
IPA symbols for word pronunciation.

I would like to select a pronunciation with a single motion in text like:

  leap [liːp]        lip [lɪp]
  wheel [wiːl]       will [wɪl]
  seek [siːk]        sick [sɪk]

It's annoying to move across long mixed words with C-Left, C-Right or
C-S-Left, C-S-Right. Try moving across:

  international [ˌɪntərˈnæʃənəl]

I also found that some IPA characters are marked as Latin script:

  (aref char-script-table ?æ)  => latin

But that is debatable, because it is an ordinary letter in some languages.

As a workaround should I modify char-script-table?

Like:

  (mapc (lambda (ch) (aset char-script-table ch 'latin) (modify-syntax-entry ch "w"))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

This gives the desired behavior, but it is unclear whether it is safe.

Another solution is to invent own:

  (define-category ?p "Phonetic")

and to add it to IPA characters:

  (mapc (lambda (ch) (modify-category-entry ch "p"))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

so it becomes possible to use:

  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

-- 
http://defun.work/

* bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
  2016-09-10 17:12   ` Oleksandr Gavenko
@ 2016-09-10 17:23     ` Eli Zaretskii
  2016-09-11 11:57       ` Oleksandr Gavenko
  0 siblings, 1 reply; 6+ messages in thread
From: Eli Zaretskii @ 2016-09-10 17:23 UTC (permalink / raw)
  To: Oleksandr Gavenko; +Cc: 24405

> From: Oleksandr Gavenko <gavenkoa@gmail.com>
> Cc: 24405@debbugs.gnu.org
> Date: Sat, 10 Sep 2016 20:12:57 +0300
> 
> As a workaround should I modify char-script-table?

I'd suggest writing your own word-motion commands.  It's not
complicated; you can use regular expressions (which understand
categories, if you need that).

> Another solution is to invent own:
> 
>   (define-category ?p "Phonetic")
> 
> and to add it to IPA characters:
> 
>   (mapc (lambda (ch) (modify-category-entry ch "p"))
>         '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))
> 
> so it becomes possible to use:
> 
>   (add-to-list 'word-combining-categories '(?p . ?l))
>   (add-to-list 'word-combining-categories '(?l . ?p))

That'd be my second best advice.  But I think regular expressions
should provide a better and easier solution.
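
A minimal sketch of such a regexp-based command follows.  The names
my/ipa-word-regexp and my/forward-ipa-word are hypothetical, and the
bracketed characters are an assumed, incomplete list of IPA symbols
that do not carry the 'l' category; extend it as needed.

```elisp
(defvar my/ipa-word-regexp "\\(?:\\cl\\|[ˈˌːθ]\\)+"
  "Hypothetical regexp for a run of Latin-category or extra IPA characters.
The \\=\\cl construct matches any character in category `l'.  The
bracket lists a few symbols assumed to lack that category.")

(defun my/forward-ipa-word (&optional arg)
  "Move point forward over ARG runs matching `my/ipa-word-regexp'.
If fewer than ARG runs remain, stop at the end of the buffer.
Backward motion is not handled in this sketch."
  (interactive "p")
  (dotimes (_i (max 1 (or arg 1)))
    ;; NOERROR is `move': go to the search bound instead of signaling.
    (re-search-forward my/ipa-word-regexp nil 'move)))
```

The command can be bound to a key of your choice; it leaves
forward-word and the category tables untouched.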






* bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
  2016-09-10 17:23     ` Eli Zaretskii
@ 2016-09-11 11:57       ` Oleksandr Gavenko
  0 siblings, 0 replies; 6+ messages in thread
From: Oleksandr Gavenko @ 2016-09-11 11:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24405

On 2016-09-10, Eli Zaretskii wrote:

>> Another solution is to invent own:
>> 
>>   (define-category ?p "Phonetic")
>> 
>> and to add it to IPA characters:
>> 
>>   (mapc (lambda (ch) (modify-category-entry ch "p"))
>>         '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))
>> 
>> so it becomes possible to use:
>> 
>>   (add-to-list 'word-combining-categories '(?p . ?l))
>>   (add-to-list 'word-combining-categories '(?l . ?p))
>
> That'd be my second best advice.  But I think regular expressions
> should provide a better and easier solution.

This works for me:

  (defconst my/ipa-chars
    (list ?ˈ ?ˌ ?ː ?ǁ ?ʲ ?θ ?ð ?ŋ ?ɡ ?ʒ ?ʃ ?ʧ ?ə ?ɜ ?ɛ ?ʌ ?ɒ ?ɔ ?ɑ ?æ ?ʊ ?ɪ))
  (define-category ?p "Phonetic")
  (mapc (lambda (ch)
          (cond
           ((eq (aref char-script-table ch) 'phonetic)
            ;; Add the new category `p' and remove `l' (RESET is t).
            (modify-category-entry ch ?p)
            (modify-category-entry ch ?l nil t))
           ;; (aref char-script-table ?ˌ) is `latin', but
           ;; (char-category-set ?ˌ) is ".j", so add `l' explicitly.
           ((eq (aref char-script-table ch) 'latin)
            (modify-category-entry ch ?l))))
        my/ipa-chars)
  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

But adding and removing categories looks too low-level. It also requires a
custom category, (define-category ?p "Phonetic"), that is not defined in Emacs
itself.

This looks easier to me:

  (mapc (lambda (ch)
          (aset char-script-table ch 'latin)
          (modify-syntax-entry ch "w"))
        my/ipa-chars)

But ``char-script-table`` is derived from Unicode data, and some code may
depend on this database...
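
If the category approach is preferred over changing char-script-table,
its effect could be confined to a single buffer. The following is only a
sketch, assuming a private copy of the category table and a buffer-local
word-combining-categories; the name my/ipa-combine-in-buffer is
hypothetical.

```elisp
(defun my/ipa-combine-in-buffer (chars)
  "Let CHARS combine with Latin letters into words, in this buffer only.
Hypothetical helper.  It edits a private copy of the current category
table and makes `word-combining-categories' buffer-local."
  (let ((table (copy-category-table (category-table))))
    ;; `define-category' signals an error if the category already exists.
    (unless (category-docstring ?p table)
      (define-category ?p "Phonetic" table))
    (dolist (ch chars)
      (if (eq (aref char-script-table ch) 'phonetic)
          (progn
            (modify-category-entry ch ?p table)
            (modify-category-entry ch ?l table t)) ; RESET: remove `l'
        (when (eq (aref char-script-table ch) 'latin)
          (modify-category-entry ch ?l table))))
    (set-category-table table)
    (set (make-local-variable 'word-combining-categories)
         (append '((?p . ?l) (?l . ?p)) word-combining-categories))))
```

Calling (my/ipa-combine-in-buffer my/ipa-chars) from a mode hook would
leave the standard category table and other buffers unchanged.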

-- 
http://defun.work/

* bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
  2016-09-10  8:33 bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts Oleksandr Gavenko
  2016-09-10 10:05 ` Eli Zaretskii
@ 2019-09-29  4:33 ` Stefan Kangas
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Kangas @ 2019-09-29  4:33 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 24405-done, Oleksandr Gavenko

Eli Zaretskii <eliz@gnu.org> writes:

> tags 24405 + notabug
> thanks
[...]
> This is the intended behavior, yes.  The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.

This was already tagged notabug, and I can see nothing more to do here.
I'm therefore closing this now.

Best regards,
Stefan Kangas