From: Po Lu
Newsgroups: gmane.emacs.devel
Subject: Android input methods (was: Re: textconv.c)
Date: Sun, 12 Feb 2023 23:27:00 +0800
Message-ID: <87bklyzyyj.fsf_-_@yahoo.com>
References: <83r0uvghw7.fsf@gnu.org> <87k00nyo60.fsf@yahoo.com> <83ilg7gdjj.fsf@gnu.org>
In-Reply-To: <83ilg7gdjj.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 12 Feb 2023 16:32:16 +0200")
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

Eli Zaretskii writes:

> I don't see why this is important, since you can switch the buffer
> temporarily. We do this all over the place, since insdel.c always
> works on the current buffer.

The text is copied into a C ``char *'', not into another buffer.

> [lots of details omitted]
>
> And you intended to produce code which supports this without any
> discussions of the architecture and design? I'm surprised, to say the
> least. This has to be discussed, with the participation of everyone
> on board who knows about the Emacs internals related to these issues.
> We have here a significant amount of knowledge, expertise, and past
> experience with similar issues, and disregarding that and trying to
> solve this by your lone self is at the very least unwise.

[...]

> My suggestion is that you describe the problem(s), i.e.
> what these input methods expect from the client application, in
> enough detail that will allow people here think about it and suggest
> solutions. Please don't write even a single line of code before such
> a description is posted and people have enough time to respond with
> suggestions, ideas, and questions. (I have already a couple of ideas,
> but will withhold them until I'm convinced that I understand the
> problems to be solved.)
>
> P.S. And please start a thread with a new, more meaningful name when
> you post those details.

Ok, now done.

Basically, today, on Android (but also on other platforms), input
methods desire fine grained control over buffer contents, as they start
to provide more and more features aside from simply composing text.

This is mainly seen on Android, but they have appeared in other systems
as well, most notably GNOME on handheld devices. What is being said
below applies to those input methods as well.

In the past, input methods more or less worked like this: when a key
press arrives, the input method receives it first, performs
transformations, and either returns it to the application or inserts it
into a composition buffer. Once the composition completes (through the
user pressing enter, or some other similar key), the text is sent to
Emacs, which converts each character inside into a key press, to be
inserted by self-insert-command.

On Android, input methods work the other way around. They do the text
insertion and deletion themselves, all whilst querying the text editor
about the position of the caret (point) and the text around it for
reference. Emacs is only told to insert or delete text at a specific
position in its buffer, and is obligated to inform the input method
about changes around the caret. If Emacs makes a change to the buffer
outside the area in which the input method expresses interest, then it
is obligated to ``restart'' the input method. This takes a significant
amount of time to complete.
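
To make the division of labour above concrete, here is a toy sketch in
Python. It is not Emacs or Android code: the class ``ToyEditor'' and all
of its method names are invented for illustration, and positions are
assumed to be 1-based like Emacs buffer positions. The input method
asks for text around point and then names the positions at which the
editor must insert or delete.

```python
class ToyEditor:
    """Hypothetical stand-in for the editor side; positions are 1-based."""

    def __init__(self, text, point):
        self.text = text
        self.point = point

    def text_before_point(self, n):
        # Up to N characters immediately before point.
        end = self.point - 1
        return self.text[max(0, end - n):end]

    def text_after_point(self, n):
        # Up to N characters immediately after point.
        start = self.point - 1
        return self.text[start:start + n]

    def insert_at(self, pos, string):
        # Insert STRING at POS; point moves along if it was at or after POS.
        self.text = self.text[:pos - 1] + string + self.text[pos - 1:]
        if self.point >= pos:
            self.point += len(string)

    def delete_before(self, pos, n):
        # Delete the N characters that end at POS.
        start = max(1, pos - n)
        self.text = self.text[:start - 1] + self.text[pos - 1:]
        if self.point >= pos:
            self.point -= pos - start
        elif self.point > start:
            self.point = start


editor = ToyEditor("hello world", point=6)  # point sits just after "hello"
before = editor.text_before_point(5)        # what the input method sees
after = editor.text_after_point(6)
editor.delete_before(6, 1)                  # input method deletes the "o"
editor.insert_at(5, "p")                    # and inserts "p" in its place
print(repr(before), repr(after), editor.text, editor.point)
```

The point of the sketch is only that the editor never decides what to
type; it answers queries and applies edits that the input method
dictates.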

Sometimes, the input method will also tell Emacs to mark a portion of
the buffer as ``preconversion text'' (or a ``composing span''), which is
an ephemeral region which may be replaced by the input method by some
other text, or deleted altogether. The intention is that the input
method will display temporary edits to the buffer used to display the
contents of any on-going composition to the user within that ephemeral
region.

Input methods on Android make extensive use of this functionality, even
for input in languages that utilize Latin or Cyrillic script. Consider
a user who wants to delete the words ``tomorrow afternoon'' and replace
them with ``next Thursday'' in the following buffer:

  Why don't we both look through all the television channels tomorrow
  afternoon for offensive content we can complain about?

On a desktop, this would be simple; assume that point is already after
the words ``tomorrow afternoon''. The user will press the delete key
enough times to delete ``tomorrow afternoon'', and then type in ``next
Thursday''.

On Android, this is completely different. Once the input method (and
on screen keyboard) is displayed, it looks at the text surrounding the
point. It sees the word ``afternoon'' immediately before point, and
the text `` for'' immediately after. Since the caret (point) is closer
to the word ``afternoon'' than it is to the word ``for'', it now
considers itself to be editing the word ``afternoon''.

The input method then tells Emacs that ``afternoon'' is now the
ephemeral region, by issuing a request along the lines of:

  ``set the preconversion region to 69-78''

Emacs is now expected to indicate, by displaying an underline, that the
IME is now editing the word ``afternoon''.
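
The positions 69-78 can be checked with a small Python sketch. It
assumes the example buffer has a newline after ``tomorrow'' (which the
positions quoted in this message imply), that positions are 1-based
with the end position exclusive, and that a ``word'' is simply a run of
letters; ``word_before'' is a name made up for this illustration, not
something a real input method exposes.

```python
BUFFER = ("Why don't we both look through all the television channels tomorrow\n"
          "afternoon for offensive content we can complain about?")


def word_before(text, point):
    """Return (start, end) of the word ending at POINT.

    Positions are 1-based and END is exclusive, so the word occupies
    text[start - 1:end - 1] in Python's 0-based indexing.
    """
    start = point
    # Walk backwards while the character before START is a letter.
    while start > 1 and text[start - 2].isalpha():
        start -= 1
    return start, point


point = BUFFER.index(" for") + 1   # point sits just after ``afternoon''
start, end = word_before(BUFFER, point)
print(start, end, repr(BUFFER[start - 1:end - 1]))
```

A real input method applies its own, language-specific notion of what
counts as a word; the letter test here is only a placeholder.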

As the user starts to press delete in the input method, the input
method starts to issue requests to replace the contents of the
preconversion region with something else:

  ``replace the preconversion region contents with afternoo''
  ``replace the preconversion region contents with afterno''
  ``replace the preconversion region contents with aftern''
  ``replace the preconversion region contents with after''
  ``replace the preconversion region contents with afte''
  ``replace the preconversion region contents with aft''
  ``replace the preconversion region contents with af''
  ``replace the preconversion region contents with a''
  ``remove the preconversion region entirely''

at that point, the input method asks for the contents of the buffer
before point again, and repeats the whole process.

Point is now 69, immediately after a newline character, which cannot be
meaningfully composed. Input methods have been observed to do one of
two things: either the input method will issue a request:

  ``delete one character before 69''

or it will say:

  ``set the preconversion region to 68-69''
  ``remove the preconversion region''

sometimes, the input method will start to delete entire words at a
time. When that happens, the input method will look backwards and ask
for the text:

  ``tomorrow\n''

and simply ask Emacs:

  ``delete 9 characters after the position 60''

or perhaps

  ``set the preconversion region to 60-69''
  ``remove the preconversion region''

or perhaps some other combination that I have yet to see in practice.

Now assume that the user changes his mind in the middle of the
operation, say immediately after ``afternoon'' has become ``aftern''.
The input method may display the text ``afternoon'' in a button, to
allow him to undo the change immediately. If that is pressed, Emacs
might receive:

  ``replace the preconversion region contents with afternoon''
  ``stop preconverting text''

or alternatively:

  ``stop preconverting text''
  ``insert the text oon after 75''

or some other request.
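
The request sequences above can be replayed against a toy buffer. The
following Python sketch is illustrative only: ``ToyBuffer'' and its
methods are hypothetical names, positions are 1-based with the end
exclusive, and the example buffer is assumed to have a newline after
``tomorrow''. It shows that both undo variants leave the buffer text
unchanged from where it started.

```python
TEXT = ("Why don't we both look through all the television channels tomorrow\n"
        "afternoon for offensive content we can complain about?")


class ToyBuffer:
    """Hypothetical buffer with one preconversion region; 1-based positions."""

    def __init__(self, text):
        self.text = text
        self.region = None  # (start, end) with END exclusive, or None

    def set_region(self, start, end):
        self.region = (start, end)

    def replace_region(self, new):
        # Swap the region's contents for NEW and resize the region to fit.
        start, end = self.region
        self.text = self.text[:start - 1] + new + self.text[end - 1:]
        self.region = (start, start + len(new))

    def remove_region(self):
        # Delete the region's text and forget the region.
        self.replace_region("")
        self.region = None

    def stop_preconverting(self):
        # Keep the text as it stands and forget the region.
        self.region = None

    def insert(self, pos, string):
        self.text = self.text[:pos - 1] + string + self.text[pos - 1:]


# Deleting ``afternoon'' one character at a time, then removing the region.
buf = ToyBuffer(TEXT)
buf.set_region(69, 78)
for n in range(8, 0, -1):
    buf.replace_region("afternoon"[:n])
buf.remove_region()

# Undo from ``aftern'', first variant: replace, then stop preconverting.
a = ToyBuffer(TEXT)
a.set_region(69, 78)
a.replace_region("aftern")
a.replace_region("afternoon")
a.stop_preconverting()

# Undo from ``aftern'', second variant: stop, then insert ``oon'' at 75.
b = ToyBuffer(TEXT)
b.set_region(69, 78)
b.replace_region("aftern")
b.stop_preconverting()
b.insert(75, "oon")

print(a.text == TEXT, b.text == TEXT)
```

Since different request sequences reach the same buffer state, the
editor cannot special-case one of them; it has to apply each request
literally.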

All of this is behavior I have observed CJK and English input methods
perform. An input method is not obligated to behave in any way like
what I have described above, as long as it constrains its edits to
some reasonable distance (600 characters) around the caret; if it makes
edits any further away from the caret than that, the behavior of the
application is undefined.

i.e. it might also be valid for the input method to say:

  ``replace 0-123 with ''
  ``replace 0-123 with ''
  ``replace 0-123 with ''
  ``replace 0-123 with ''
  ``replace 0-123 with ''
  ``replace 0-123 with ''
  ``replace 0-123 with ''

over and over again, though I don't see the utility in that. But the
input method will stop working properly until the next time it is reset
if it doesn't see the replacement reflected in Emacs's own buffer
contents.

Sometimes, an input method will also monitor changes to the caret
position. At this point, Emacs is obligated to report any changes to
the on screen caret to the input method, so it knows where it should
begin to make edits from.

An input method might also ask for a region of text to be
``extracted'', which means Emacs must report each change to the buffer
that modifies said region to the input method, but is relieved of the
obligation to reset the input method as long as a ``major change''
(whatever that means) has not happened to the buffer contents, or
outside the extracted text.

What I have observed is that the region of extracted text is wide
enough to perform actions such as refilling a paragraph or indenting a
line without resetting the input method, but not much more than that.
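
One possible reading of the extracted-text rule, as a Python sketch.
The function ``classify_edit'' and its return values are invented
here, and the sketch assumes that any edit not wholly inside the
extracted region, or any switch of buffer or window, counts as a
``major change''; real input methods may draw that line differently.

```python
def classify_edit(extracted, edit, buffer_or_window_changed=False):
    """Decide what the editor owes the input method after its own edit.

    EXTRACTED and EDIT are (start, end) pairs, 1-based, END exclusive.
    Returns "report" when the change can simply be reported, and
    "reset" when the input method has to be restarted.
    """
    if buffer_or_window_changed:
        return "reset"
    ex_start, ex_end = extracted
    ed_start, ed_end = edit
    # Only edits that fall entirely inside the extracted text are
    # treated as reportable; anything else is assumed to be major.
    if ex_start <= ed_start and ed_end <= ex_end:
        return "report"
    return "reset"


extracted = (1, 124)
inside = classify_edit(extracted, (60, 69))      # e.g. refilling nearby text
outside = classify_edit(extracted, (200, 210))   # an edit far from point
switched = classify_edit(extracted, (60, 69), buffer_or_window_changed=True)
print(inside, outside, switched)
```

The cheap path is the "report" branch; since resetting the input
method is slow, an editor has an interest in keeping most of its own
edits inside the extracted text.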

In any case, the conclusion is that Emacs must present a completely
correct view of the buffer contents of the selected window and the
location of its point to the input method, correctly report edits made
by the input method to the buffer contents and any edits made by Emacs
after that, and diligently report changes to extracted text and/or
reset the input method on ``major changes'' such as the selected buffer
or window changing, or edits happening outside extracted text.
Otherwise, the behavior of the input method becomes undefined (and
nasty.)

Now, it is sometimes possible to disable the input method and to simply
work with an on screen keyboard (which is what the Android port
currently does), but that precludes entering any non-ASCII text, and is
a luxury which is only afforded by several input methods.

Also, it wouldn't be out of character for GNOME to demand applications
implement input method support their ``right way'' either, at some
point in the future, so we will have to implement this properly, if not
now, then at some point in the future.

Thanks.