Behavior of input method -- crdt.el

unofficial mirror of emacs-devel@gnu.org 

* Behavior of input method -- crdt.el
@ 2020-10-18  3:47 Qiantan Hong
  2020-10-18  4:46 ` Eli Zaretskii
  2020-10-18 13:26 ` Stefan Monnier
  0 siblings, 2 replies; 11+ messages in thread
From: Qiantan Hong @ 2020-10-18  3:47 UTC (permalink / raw)
  To: EMACS development team

Hi,

I’m now working on compatibility between
https://code.librehq.com/qhong/crdt.el and Emacs input methods.
It can happen that one peer is halfway through entering some
characters with an input method (it seems that in this state some
temporary text is inserted into the buffer without calling
*-change-functions) when changes from another peer arrive.

After resolving the position of the changes, crdt.el moves point
to the resolved position and uses INSERT to insert the characters
from the remote peer.
If those changes land at exactly the position where the current
user is typing with the input method, the input method seems to get
confused and treats the inserted characters as part of its halfway
input. This doesn’t affect character selection, but when the user
finally selects a character, it erases both the halfway input
and the inserted remote characters. Now the peers are inconsistent.

Does anyone have an idea of how to work around this?

To be clearer, here is a concrete example:

User1 is typing using chinese-py.
User1’s buffer: ce
User2’s buffer: 

User2 types a “t” at the beginning
User1’s buffer: tce
User2’s buffer: t

User1 finishes the selection
User1’s buffer: 测
User2’s buffer: t测
Notice that the input method also erased the “t”.

Now the two users’ buffers are inconsistent.
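
A minimal sketch of how this can arise, assuming the input method keeps
the pending keystrokes under an overlay and replaces the overlay's
region when a character is selected. A plain overlay stands in for the
input method here; `my-im-demo' is a hypothetical name, and this models
the mechanism only, not Quail's actual code.

```elisp
(defun my-im-demo ()
  "Return User1's buffer text after a remote insert hits pending input.
A plain overlay stands in for the input method's overlay."
  (with-temp-buffer
    (insert "ce")                          ; pending keystrokes
    (let ((pending (make-overlay 1 3)))    ; overlay covers "ce"
      ;; The remote peer inserts "t" at the start of the pending text.
      ;; With the default FRONT-ADVANCE of nil, text inserted at the
      ;; overlay's start becomes part of the overlay.
      (goto-char 1)
      (insert "t")
      ;; The user commits: the whole overlay region is replaced.
      (let ((beg (overlay-start pending))
            (end (overlay-end pending)))
        (delete-region beg end)
        (goto-char beg)
        (insert "测"))
      (delete-overlay pending)
      (buffer-string))))
```

Evaluating (my-im-demo) returns "测": the remote "t" was absorbed into
the overlay and deleted together with the pending "ce".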

Best,
Qiantan


* Re: Behavior of input method -- crdt.el
  2020-10-18  3:47 Behavior of input method -- crdt.el Qiantan Hong
@ 2020-10-18  4:46 ` Eli Zaretskii
  2020-10-18 13:26 ` Stefan Monnier
  1 sibling, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2020-10-18  4:46 UTC (permalink / raw)
  To: emacs-devel, Qiantan Hong, EMACS development team

On October 18, 2020 6:47:44 AM GMT+03:00, Qiantan Hong <qhong@mit.edu> wrote:
> Hi,
> 
> I’m now working on compatibility between
> https://code.librehq.com/qhong/crdt.el and Emacs input methods.
> It can happen that one peer is halfway through entering some
> characters with an input method (it seems that in this state some
> temporary text is inserted into the buffer without calling
> *-change-functions) when changes from another peer arrive.
> 
> After resolving the position of the changes, crdt.el moves point
> to the resolved position and uses INSERT to insert the characters
> from the remote peer.
> If those changes land at exactly the position where the current
> user is typing with the input method, the input method seems to get
> confused and treats the inserted characters as part of its halfway
> input. This doesn’t affect character selection, but when the user
> finally selects a character, it erases both the halfway input
> and the inserted remote characters. Now the peers are inconsistent.
> 
> Does anyone have an idea of how to work around this?

One simple solution is to avoid executing remote changes as long as quail-translating is non-nil, thus treating the entire input method insertion sequence as a single atomic transaction.
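
A rough sketch of that deferral, under the assumption that each remote
change can be wrapped in a function of no arguments. All `my-crdt-'
names are hypothetical and are not part of crdt.el; only
`quail-translating' and `post-command-hook' come from Emacs, and
flushing from `post-command-hook' is just one possible trigger.

```elisp
(defvar my-crdt-pending-remote-changes nil
  "Hypothetical queue of deferred remote changes (thunks), newest first.")

(defun my-crdt-apply-or-defer (change)
  "Call CHANGE, a function of no arguments, unless Quail is translating.
If a translation is in progress, queue CHANGE instead."
  (if (bound-and-true-p quail-translating)
      (push change my-crdt-pending-remote-changes)
    (funcall change)))

(defun my-crdt-flush-pending ()
  "Apply queued remote changes in arrival order once it is safe."
  (unless (bound-and-true-p quail-translating)
    (let ((changes (nreverse my-crdt-pending-remote-changes)))
      (setq my-crdt-pending-remote-changes nil)
      (mapc #'funcall changes))))

;; One possible trigger: retry after every command.
(add-hook 'post-command-hook #'my-crdt-flush-pending)
```

The queue preserves arrival order, so the deferred changes are applied
in the same sequence in which they were received.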




* Re: Behavior of input method -- crdt.el
  2020-10-18  3:47 Behavior of input method -- crdt.el Qiantan Hong
  2020-10-18  4:46 ` Eli Zaretskii
@ 2020-10-18 13:26 ` Stefan Monnier
  2020-10-18 20:34   ` Qiantan Hong
  1 sibling, 1 reply; 11+ messages in thread
From: Stefan Monnier @ 2020-10-18 13:26 UTC (permalink / raw)
  To: Qiantan Hong; +Cc: EMACS development team

> After resolving the position of the changes, crdt.el moves point
> to the resolved position and uses INSERT to insert the characters
> from the remote peer.
> If those changes land at exactly the position where the current
> user is typing with the input method, the input method seems to get
> confused and treats the inserted characters as part of its halfway
> input. This doesn’t affect character selection, but when the user
> finally selects a character, it erases both the halfway input
> and the inserted remote characters.

So far so good: the behavior is not what the user intended, but it's
still just a sequence of insertions and deletions, with a conflict due
to simultaneous edits.  So the user can/should undo and then retype
the character.

> Now the peers are inconsistent.

Why?  Why isn't the exact same erroneous erase+insert propagated to the
other peers?


        Stefan






* Re: Behavior of input method -- crdt.el
  2020-10-18 13:26 ` Stefan Monnier
@ 2020-10-18 20:34   ` Qiantan Hong
  2020-10-18 20:52     ` Stefan Monnier
  0 siblings, 1 reply; 11+ messages in thread
From: Qiantan Hong @ 2020-10-18 20:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: EMACS development team

> 
>> Now the peers are inconsistent.
> 
> Why?  Why isn't the exact same erroneous erase+insert propagated to the
> other peers?
No, it seems that the input method secretly changes buffer text
without calling *-change-functions.

I’ve already figured out a hack that still allows remote changes
to be applied in real time.

The input method uses an overlay to mark “characters pending
translation”, and I just need to push the overlay forward.
Surprisingly, “secretly changing buffer text without calling
*-change-functions” is not itself a problem, because I store the
CRDT IDs in the buffer itself, so I just need to skip the pending
text during ID search. It *would* be a problem, I imagine, if the CRDT
were implemented as a separate library, because the pending text would
shift all the buffer positions after it! In that case the only
solution would be to block remote edits while an input method is in
use, which makes it not very real-time.
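
Roughly, the overlay push could look like the sketch below. It assumes
that `quail-overlay' is the overlay in question and that it was created
with the default FRONT-ADVANCE of nil; `my-crdt-insert-remote' is a
hypothetical name, not the function crdt.el actually uses.

```elisp
(defun my-crdt-insert-remote (pos text &optional pending)
  "Insert TEXT at POS, keeping it outside the overlay PENDING.
PENDING defaults to the value of `quail-overlay' if that is bound."
  (let* ((ov (or pending (bound-and-true-p quail-overlay)))
         (live (and (overlayp ov)
                    (eq (overlay-buffer ov) (current-buffer))))
         (at-start (and live (= pos (overlay-start ov)))))
    (save-excursion
      (goto-char pos)
      (insert text))
    ;; An insertion at the overlay's start is absorbed into the overlay
    ;; (the start does not advance by default), so push the start past
    ;; TEXT.  The `max' keeps an empty overlay from swallowing TEXT.
    (when at-start
      (let ((new-start (+ pos (length text))))
        (move-overlay ov new-start (max new-start (overlay-end ov)))))))
```

With the buffer "ce" under an overlay spanning 1..3, calling
(my-crdt-insert-remote 1 "t" overlay) leaves the buffer as "tce" and
the overlay spanning 2..4, so only "ce" is replaced on commit.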


* Re: Behavior of input method -- crdt.el
  2020-10-18 20:34   ` Qiantan Hong
@ 2020-10-18 20:52     ` Stefan Monnier
  2020-10-19  2:28       ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Monnier @ 2020-10-18 20:52 UTC (permalink / raw)
  To: Qiantan Hong; +Cc: EMACS development team

>> Why?  Why isn't the exact same erroneous erase+insert propagated to the
>> other peers?
> No, it seems that the input method secretly changes buffer text
> without calling *-change-functions.

Hmm... That's clearly a problem.  I can imagine some reasons why we do
that, but I'm not sure they're good enough.  This probably deserves
a `M-x report-emacs-bug`.

> I’ve already figured out a hack that still allows remote changes
> to be applied in real time.

Another approach is indeed to consider the input-method insertion as an
"atomic transaction" and hence postpone remote updates until that
transaction is completed (or at least to do that if the remote updates
touch the affected text area).  But indeed, that can be problematic (with
some input methods, a single char can terminate the previous entry and
start the next at the same time, so it's possible to be "in the middle
of an input-method insertion" for quite a while).

        Stefan


* Re: Behavior of input method -- crdt.el
  2020-10-18 20:52     ` Stefan Monnier
@ 2020-10-19  2:28       ` Eli Zaretskii
  2020-10-19  2:48         ` Qiantan Hong
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2020-10-19  2:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: qhong, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 18 Oct 2020 16:52:36 -0400
> Cc: EMACS development team <emacs-devel@gnu.org>
> 
> But indeed, that can be problematic (with some input methods, a
> single char can terminate the previous entry and start the next at
> the same time, so it's possible to be "in the middle of an
> input-method insertion" for quite a while).

FWIW, I don't see any problem with that, because users won't expect
that to happen anyway.




* Re: Behavior of input method -- crdt.el
  2020-10-19  2:28       ` Eli Zaretskii
@ 2020-10-19  2:48         ` Qiantan Hong
  2020-10-19  3:07           ` Stefan Monnier
  2020-10-19 14:29           ` Eli Zaretskii
  0 siblings, 2 replies; 11+ messages in thread
From: Qiantan Hong @ 2020-10-19  2:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, EMACS development team

> 
> FWIW, I don't see any problem with that, because users won't expect
> that to happen anyway.
Can you clarify what “that” refers to?

I think it’s quite common to have a very long single input sequence,
especially when using an input method package that
suggests words based on context. I basically enter a whole phrase
or even a sentence at once.

I’ve done it with my hack (pushing forward the overlay that the input
method uses) and it doesn’t block remote changes. It works with the
built-in Chinese input method, and also with a few external packages
I’ve tested. I don’t know whether there’s any problem with this approach.

> Hmm... That's clearly a problem.  I can imagine some reasons why we do
> that, but I'm not sure they're good enough.  This probably deserves
> a `M-x report-emacs-bug`.
I don’t know; would it cause more problems if the input method
actually triggered *-change-functions during halfway input?
At least for my use case, that would mean the halfway input sequence
is also synchronized to other peers, and that doesn’t make much
sense, so I’d have to filter it out anyway.

I think maybe the “correct” way is to display the input sequence
without making it text in the buffer (conceptually it isn’t there yet).


* Re: Behavior of input method -- crdt.el
  2020-10-19  2:48         ` Qiantan Hong
@ 2020-10-19  3:07           ` Stefan Monnier
  2020-10-19 14:29           ` Eli Zaretskii
  1 sibling, 0 replies; 11+ messages in thread
From: Stefan Monnier @ 2020-10-19  3:07 UTC (permalink / raw)
  To: Qiantan Hong; +Cc: Eli Zaretskii, EMACS development team

> I’ve done it with my hack (pushing forward the overlay that the input
> method uses) and it doesn’t block remote changes. It works with the
> built-in Chinese input method, and also with a few external packages
> I’ve tested. I don’t know whether there’s any problem with this approach.

I don't see anything wrong with the approach you're following.
It is disappointing that you'd need a special case for input methods.
I consider that as a "wart" which we'd like to be able to fix.
Maybe a good fix is to arrange for some kind of "protocol" that the
input method could use but that doesn't hard-code a dependency on the
input-method (i.e. a protocol that could be used by other packages
which may also want to modify the buffer temporarily while postponing
running the *-change-functions).

>> Hmm... That's clearly a problem.  I can imagine some reasons why we do
>> that, but I'm not sure they're good enough.  This probably deserves
>> a `M-x report-emacs-bug`.
> I don’t know; would it cause more problems if the input method
> actually triggered *-change-functions during halfway input?
> At least for my use case, that would mean the halfway input sequence
> is also synchronized to other peers, and that doesn’t make much
> sense

I can't see any reason why it "doesn't make much sense".
I think it makes as much sense as any other "intermediate state" you
might go through while modifying some text.

> I think maybe the “correct” way is to display the input sequence
> without making it text in the buffer (conceptually it isn’t there yet).

That's also a valid way to look at it, yes.

        Stefan


* Re: Behavior of input method -- crdt.el
  2020-10-19  2:48         ` Qiantan Hong
  2020-10-19  3:07           ` Stefan Monnier
@ 2020-10-19 14:29           ` Eli Zaretskii
  2020-10-19 14:55             ` Qiantan Hong
  1 sibling, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2020-10-19 14:29 UTC (permalink / raw)
  To: Qiantan Hong; +Cc: monnier, emacs-devel

> From: Qiantan Hong <qhong@mit.edu>
> CC: Stefan Monnier <monnier@iro.umontreal.ca>,
>         EMACS development team
> 	<emacs-devel@gnu.org>
> Date: Mon, 19 Oct 2020 02:48:45 +0000
> 
> > FWIW, I don't see any problem with that, because users won't expect
> > that to happen anyway.
> Can you clarify what “that” refers to?

Sorry for being unclear: I meant that users won't expect remote
commands to be executed in the middle of inputting a character.

This isn't limited to input methods, btw: did you try typing several
characters that are composed together on display into a single
grapheme cluster (under auto-composition-mode)? what happens if remote
command arrives in the middle of this sequence and moves point?

In general, I don't think we must take the "real-time" nature of this
too literally.  Nothing bad will happen if the remote commands are
executed only when it's "safe".

> I think it’s quite common to have a very long single input sequence,
> especially when using an input method package that
> suggests words based on context. I basically enter a whole phrase
> or even a sentence at once.

We are talking about Leim input methods, not about the input methods
your OS supports.  Do such long sequences that produce entire phrases
happen in our input methods?  If so, can you show an example?

> I’ve done it with my hack (pushing forward the overlay that the input
> method uses) and it doesn’t block remote changes. It works with the
> built-in Chinese input method, and also with a few external packages
> I’ve tested. I don’t know whether there’s any problem with this approach.

Do we have to employ hacks?  Quail input methods tell you when they
don't expect to be interrupted, by setting quail-translating to a
non-nil value; why not use that indication to handle this issue in a
non-hackish way?


* Re: Behavior of input method -- crdt.el
  2020-10-19 14:29           ` Eli Zaretskii
@ 2020-10-19 14:55             ` Qiantan Hong
  2020-10-19 15:06               ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: Qiantan Hong @ 2020-10-19 14:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel@gnu.org

> 
> This isn't limited to input methods, btw: did you try typing several
> characters that are composed together on display into a single
> grapheme cluster (under auto-composition-mode)? what happens if remote
> command arrives in the middle of this sequence and moves point?
Is there a variable like “quail-translating” to detect such usage?

> We are talking about Leim input methods, not about the input methods
> your OS supports.  Do such long sequences that produce entire phrases
> happen in our input methods?  If so, can you show an example?
There are input methods written by users using the Quail framework.
E.g. https://melpa.org/#/pyim


> Do we have to employ hacks?  Quail input methods tell you when they
> don't expect to be interrupted, by setting quail-translating to a
> non-nil value; why not use that indication to handle this issue in a
> non-hackish way?
For Quail input methods, remote commands will never interrupt the
sequences themselves. Because crdt.el doesn’t assign CRDT IDs
to the pending text, a remote change will never resolve to a position
in the middle of such a sequence. The pending text is not synchronized
to other peers either. This follows exactly the conceptual model “text
pending translation is not in the buffer yet, it’s just displayed
there”. There is no need to lock the buffer either.
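
To illustrate why pending text can never be a resolution target, here
is a sketch of an ID lookup. It assumes, purely for illustration, that
each synchronized character carries its ID in a text property named
`my-crdt-id' (a hypothetical name; crdt.el's real representation may
differ), while pending input carries no such property.

```elisp
(defun my-crdt-find-id (id)
  "Return the position of the character whose `my-crdt-id' equals ID.
Return nil if there is none.  Text without the property, such as
pending input-method text, can never match."
  (let ((pos (point-min))
        found)
    (while (and (not found) (< pos (point-max)))
      (if (equal (get-text-property pos 'my-crdt-id) id)
          (setq found pos)
        ;; Jump to the next place where the property changes; runs of
        ;; unmarked text are skipped in a single step.
        (setq pos (next-single-property-change
                   pos 'my-crdt-id nil (point-max)))))
    found))
```

In a buffer holding "t" (ID 1), unmarked pending "ce", and "x" (ID 2),
(my-crdt-find-id 2) returns 4, and no ID ever maps to position 2 or 3.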


* Re: Behavior of input method -- crdt.el
  2020-10-19 14:55             ` Qiantan Hong
@ 2020-10-19 15:06               ` Eli Zaretskii
  0 siblings, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2020-10-19 15:06 UTC (permalink / raw)
  To: Qiantan Hong; +Cc: monnier, emacs-devel

> From: Qiantan Hong <qhong@mit.edu>
> CC: Stefan Monnier <monnier@iro.umontreal.ca>,
>         "emacs-devel@gnu.org"
> 	<emacs-devel@gnu.org>
> Date: Mon, 19 Oct 2020 14:55:31 +0000
> 
> > This isn't limited to input methods, btw: did you try typing several
> > characters that are composed together on display into a single
> > grapheme cluster (under auto-composition-mode)? what happens if remote
> > command arrives in the middle of this sequence and moves point?
> Is there a variable like “quail-translating” to detect such usage?

No.  But is there a problem?

> > We are talking about Leim input methods, not about the input methods
> > your OS supports.  Do such long sequences that produce entire phrases
> > happen in our input methods?  If so, can you show an example?
> There are input methods written by users using the Quail framework.
> E.g. https://melpa.org/#/pyim

Can you help me with an example of a long sequence that produces an
entire phrase with that input method?

> > Do we have to employ hacks?  Quail input methods tell you when they
> > don't expect to be interrupted, by setting quail-translating to a
> > non-nil value; why not use that indication to handle this issue in a
> > non-hackish way?
> For Quail input methods, remote commands will never interrupt the
> sequences themselves. Because crdt.el doesn’t assign CRDT IDs
> to the pending text, a remote change will never resolve to a position
> in the middle of such a sequence. The pending text is not synchronized
> to other peers either. This follows exactly the conceptual model “text
> pending translation is not in the buffer yet, it’s just displayed
> there”. There is no need to lock the buffer either.

I'm not sure I understand: are you saying that the problem with
receiving remote commands in the middle of a Leim input sequence
doesn't exist?  If so, what was the situation about which you asked
the question that started this thread?


