From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sat, 27 Mar 2021 10:54:28 +0300
Message-ID: <8335whowuj.fsf@gnu.org>
References: <874kgxtatr.fsf@gnu.org>
In-Reply-To: <874kgxtatr.fsf@gnu.org> (message from handa on Sat, 27 Mar 2021 14:38:56 +0900)
Cc: gregory@heytings.org, 46933@debbugs.gnu.org
To: handa

> From: handa
> Cc: gregory@heytings.org, 46933@debbugs.gnu.org
> Date: Sat, 27 Mar 2021 14:38:56 +0900
>
> In article <83ft0obk7i.fsf@gnu.org>, Eli Zaretskii writes:
>
> > Kenichi, why are these 6 bytes inserted by encode-coding-region, but
> > not when we encode the same text as part of saving the buffer to its
> > file?  And why does it happen near the end of the text, between those
> > 2 particular letters?
>
> There surely exists a bug.  Could you please try the attached patch?
>
> The reason why that bug did not happen on file writing is that the code
> in write_region calls the encoding routine repeatedly without the
> CODING_MODE_LAST_BLOCK flag, and only in the case that flushing is
> required (e.g. the case of iso-2022-jp), just for flushing, it calls
> the encoding routine again with the CODING_MODE_LAST_BLOCK flag.  In
> that case, carryover does not happen in encode_coding ().

Thanks.  The patch fixes the problem with the extra 6 bytes, so I
installed it.

The results of filepos-to-bufferpos with the file attached by Gregory
are better now, but there are still problems for some values of the
BYTE argument.  The problem is that ISO-2022 encodings (and others
like them) include shift-in and shift-out sequences, used to switch
between character sets.  As a trivial example, each CR+LF sequence has
the "ESC ( B" sequence before it and the "ESC $ B" sequence after it,
to switch to ASCII before the newline, then switch back to Japanese
after it.  The same happens wherever there's Latin text within
Japanese (there is quite a lot of it in this particular file).

These shift-in and shift-out sequences consume bytes, but don't
produce any characters.  So if the BYTE argument of
filepos-to-bufferpos specifies a byte in the middle of one of these
shift sequences, the result will be incorrect: decoding a partial
sequence produces the bytes of that sequence verbatim, which breaks
the logic in filepos-to-bufferpos that relies on the length of the
decoded text.

We need special handling of this and other similar coding-systems to
fix these corner cases, similar to what we do in
filepos-to-bufferpos--dos.  Patches welcome.

I'm leaving this bug open because not all of the problem was fixed.
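
To make the byte-counting issue with shift sequences concrete, here is
a minimal sketch in Python (not Emacs Lisp, and not how
filepos-to-bufferpos is implemented).  The helper name
chars_before_each_offset and the table of recognised escape sequences
are assumptions made for this illustration; only the basic
ISO-2022-JP designations are covered, and the sample bytes are a
hand-written stand-in for one line of such a file.

```python
from typing import Dict, List

ESC = 0x1B

# Hypothetical table for this sketch: the two bytes that follow ESC,
# mapped to the number of bytes per character in the selected set.
SHIFT_WIDTH: Dict[bytes, int] = {
    b"(B": 1,  # ASCII
    b"(J": 1,  # JIS X 0201 Roman
    b"$@": 2,  # JIS X 0208-1978
    b"$B": 2,  # JIS X 0208-1983
}


def chars_before_each_offset(data: bytes) -> List[int]:
    """Return counts where counts[i] is the number of complete
    characters encoded by data[:i].

    Recognised escape sequences add bytes but no characters.
    Unrecognised or truncated escapes are counted as ordinary bytes,
    which is a simplification of this sketch.
    """
    counts = [0]
    chars = 0
    width = 1  # bytes per character in the current character set
    i = 0
    while i < len(data):
        if data[i] == ESC and data[i + 1:i + 3] in SHIFT_WIDTH:
            width = SHIFT_WIDTH[data[i + 1:i + 3]]
            counts.extend([chars] * 3)  # three bytes, zero characters
            i += 3
        else:
            step = min(width, len(data) - i)
            counts.extend([chars] * (step - 1))  # inside a character
            if step == width:  # the character is complete
                chars += 1
            counts.append(chars)
            i += step
    return counts


# ESC $ B, two 2-byte characters, ESC ( B, CR LF, then "ab".
sample = b"\x1b$BF|K\\\x1b(B\r\nab"
counts = chars_before_each_offset(sample)
# Offsets 7 through 10 cover the "ESC ( B" sequence: same position.
print(counts[7], counts[8], counts[10])  # prints "2 2 2"
```

Every byte offset that falls inside a shift sequence maps to the same
character position as the byte just before the sequence, so a
position computed from the length of decoded text alone cannot
distinguish them.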