From: handa
Newsgroups: gmane.emacs.bugs
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 28 Mar 2021 23:29:41 +0900
Message-ID: <87y2e7s65m.fsf@gnu.org>
References: <9cff0f8894f167925251@heytings.org>
In-Reply-To: <83pmzkog6x.fsf@gnu.org> (message from Eli Zaretskii on Sat, 27 Mar 2021 16:54:14 +0300)
To: Eli Zaretskii
Cc: gregory@heytings.org, 46933@debbugs.gnu.org

In article <83pmzkog6x.fsf@gnu.org>, Eli Zaretskii writes:

> > How about something like this method:
> > 1. Encode the buffer text line by line until we get a longer byte
> >    sequence than BYTE.
> > 2. Delete the result of encoding the last line above.
> > 3. Provided that the above last line has chars C1 C2 ... Cn,
> >    encode characters C1...Cn, C1...Cn-1, C1...Cn-2 until we get a shorter
> >    byte sequence than BYTE.
> >
> > The first step may be optimized by encoding multiple lines instead of
> > a single line.

> Even if we do optimize, this would be very slow, I think.

Whether it is too slow or not depends on what filepos-to-bufferpos is
used for.  Do you know why filepos-to-bufferpos (and
bufferpos-to-filepos) were introduced?

> And what if the buffer has no newlines?

In that case, just do step 2.  Or we can use the bisection
technique.

> In any case, the problem is not with encoding, the problem is with
> decoding.  Encoding doesn't have this problem because we always encode
> more than enough (we use the value of BYTE as the count of
> _characters_ to encode, so for ISO-2022 encoding it is usually much
> more than needed).  By contrast, when decoding, we decode exactly
> BYTE+1 bytes, which then hits the problem if that offset is inside a
> shift sequence.

Then that implementation should be changed.  Any coding system can
have :post-read-conversion and :pre-write-conversion functions, so it
is not guaranteed that the encoded byte length is greater than the
number of characters.

---
K. Handa
handa@gnu.org
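
The line-by-line method (steps 1 to 3) and the bisection alternative mentioned in this message can be illustrated with a rough sketch. It is written in Python rather than Emacs Lisp and is not the Emacs implementation; the function names are hypothetical, `str.encode` stands in for the coding-system encoder, and it assumes that (a) each line can be encoded from a fresh encoder state so that the per-line results concatenate, and (b) the encoded length never shrinks as the prefix grows. A coding system with a :pre-write-conversion function could violate either assumption.

```python
def encoded_len(text: str, codec: str) -> int:
    """Bytes produced by encoding TEXT starting from a fresh encoder state."""
    return len(text.encode(codec))


def charpos_by_lines(text: str, byte: int, codec: str) -> int:
    """Number of leading characters whose encoding fits in BYTE bytes.

    Hypothetical helper following steps 1-3: encode whole lines until the
    total exceeds BYTE, drop the last line, then retry shorter prefixes of it.
    """
    fitted_chars = 0  # characters in whole lines known to fit
    fitted_bytes = 0  # encoded length of those lines
    for line in text.splitlines(keepends=True):
        line_bytes = encoded_len(line, codec)
        if fitted_bytes + line_bytes > byte:
            # Step 2: discard this line's encoding.
            # Step 3: encode C1...Cn-1, C1...Cn-2, ... until the result fits.
            budget = byte - fitted_bytes
            for n in range(len(line) - 1, -1, -1):
                if encoded_len(line[:n], codec) <= budget:
                    return fitted_chars + n
        fitted_chars += len(line)
        fitted_bytes += line_bytes
    return fitted_chars  # BYTE is at or past the end of the encoded text


def charpos_by_bisection(text: str, byte: int, codec: str) -> int:
    """Same result via bisection on the prefix length; needs no newlines."""
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if encoded_len(text[:mid], codec) <= byte:
            lo = mid       # prefix of MID characters still fits
        else:
            hi = mid - 1   # prefix is too long
    return lo


if __name__ == "__main__":
    sample = "a\u00e9\n\u3042b"  # 'a', e-acute, newline, hiragana 'a', 'b'
    for offset in range(9):
        print(offset,
              charpos_by_lines(sample, offset, "utf-8"),
              charpos_by_bisection(sample, offset, "utf-8"))
```

The bisection variant re-encodes a prefix on every probe, so it performs roughly log2(N) encodings of up to N characters each; whether that is acceptable depends, as noted above, on what filepos-to-bufferpos is used for.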