From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 21 Mar 2021 17:27:45 +0200
Message-ID: <83ft0obk7i.fsf@gnu.org>
References: <9cff0f8894f167925251@heytings.org>
To: Gregory Heytings, Kenichi Handa
Cc: 46933@debbugs.gnu.org
In-Reply-To: <9cff0f8894f167925251@heytings.org> (message from Gregory Heytings on Thu, 04 Mar 2021 21:21:24 +0000)

> Date: Thu, 04 Mar 2021 21:21:24 +0000
> From: Gregory Heytings
>
> (Disclaimer: I have no knowledge whatsoever about the ISO-2022-JP
> encoding, and although
> this looks like a bug, I'm not sure this is
> actually a bug; I report this at the suggestion of Eli in bug#46859.)
>
> I downloaded the file [1], and converted it to the ISO-2022-JP encoding
> with iconv -t iso-2022-jp one.txt > iso-2022-jp.txt.  The resulting file
> is attached to this bug report.  It ends with two CRLFs, at byte offsets
> 2993 and 2995.  However, after emacs -Q iso-2022-jp.txt, with M-:
> (goto-char (filepos-to-bufferpos POS 'exact)) we get:
>
> POS = 2991, 2992: last but one visible character (HIRAGANA LETTER RU)
> POS = 2993, 2994: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2995, 2996: first CRLF
> POS = 2997: second CRLF
> POS = 2998: point-max
> POS = 2999: first CRLF
> POS = 3000, 3001: second CRLF
> POS >= 3002: point-max
>
> I would have expected:
>
> POS = 2989, 2990: last but one visible character (HIRAGANA LETTER RU)
> POS = 2991, 2992: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2993, 2994: first CRLF
> POS = 2995, 2996: second CRLF
> POS >= 2997: point-max
>
> The opposite operation M-: (bufferpos-to-filepos (- (point) POS) 'exact)
> apparently also has bugs; its return values are not coherent with the
> above ones:
>
> POS = 0: 3003
> POS = 1: 3001
> POS = 2: 2999
> POS = 3 (IDEOGRAPHIC FULL STOP): 2997
> POS = 4 (HIRAGANA LETTER RU): 2995
>
> I would have expected:
>
> POS = 0: 2997
> POS = 1: 2995
> POS = 2: 2993
> POS = 3 (IDEOGRAPHIC FULL STOP): 2991
> POS = 4 (HIRAGANA LETTER RU): 2989
>
> [1] https://darza.com/ecbackend/vendor/symfony/mime/Tests/Fixtures/samples/charsets/iso-2022-jp/one.txt

There's something strange going on here with the encoding of the buffer
using iso-2022-jp-dos: near the end of the encoded bytestream, between
the encoded HIRAGANA LETTER KO (こ) and HIRAGANA LETTER TO (と), we get
6 extra bytes: "ESC ( B ESC $ B".  AFAIU, this sequence means "switch to
ASCII and then switch back to Japanese".
So together these 6 bytes are a no-op as regards their effect on the
text, but they disrupt the logic of filepos-to-bufferpos because they
introduce extra bytes that aren't there in the original file.

Kenichi, why are these 6 bytes inserted by encode-coding-region, but
not when we encode the same text as part of saving the buffer to its
file?  And why does it happen near the end of the text, between those 2
particular letters?
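
To make the "no-op but 6 extra bytes" point concrete, here is a minimal sketch.  It is written in Python and relies on Python's standard "iso2022_jp" codec, which is an assumption made purely for illustration: it is not the Emacs encoder, and the `padded` byte string is a hypothetical reconstruction of the situation described above, not bytes taken from the attached file.  The last part only redoes the arithmetic on the bufferpos-to-filepos offsets quoted in the report.

```python
# Sketch only: Python's standard "iso2022_jp" codec stands in for Emacs's
# iso-2022-jp coding system; the two are not guaranteed to behave alike.
ESC_ASCII = b"\x1b(B"  # ESC ( B : switch to ASCII
ESC_JIS = b"\x1b$B"    # ESC $ B : switch to JIS X 0208 (Japanese)

text = "こと"  # HIRAGANA LETTER KO followed by HIRAGANA LETTER TO

# Encoding without any redundant escapes:
# ESC $ B (3 bytes), KO (2 bytes), TO (2 bytes), ESC ( B (3 bytes).
clean = text.encode("iso2022_jp")

# Byte offset just after the encoded KO.
split = len(ESC_JIS) + 2

# Hypothetical stream with the redundant "ESC ( B ESC $ B" between KO and TO.
padded = clean[:split] + ESC_ASCII + ESC_JIS + clean[split:]

# The redundant escapes should not change the decoded text...
same_text = padded.decode("iso2022_jp") == clean.decode("iso2022_jp")
# ...but they do lengthen the byte stream.
extra_bytes = len(padded) - len(clean)

# bufferpos-to-filepos values quoted in the report: observed vs. expected.
observed = [3003, 3001, 2999, 2997, 2995]
expected = [2997, 2995, 2993, 2991, 2989]
shifts = [o - e for o, e in zip(observed, expected)]

print(same_text, extra_bytes, shifts)
```

Each observed bufferpos-to-filepos value in the report is larger than the expected one by the same amount, 6, which matches the length of the redundant escape sequence.  This is consistent with the extra bytes being the cause, although the sketch itself does not prove that.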