unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Gregory Heytings <gregory@heytings.org>, Kenichi Handa <handa@gnu.org>
Cc: 46933@debbugs.gnu.org
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 21 Mar 2021 17:27:45 +0200
Message-ID: <83ft0obk7i.fsf@gnu.org>
In-Reply-To: <9cff0f8894f167925251@heytings.org> (message from Gregory Heytings on Thu, 04 Mar 2021 21:21:24 +0000)

> Date: Thu, 04 Mar 2021 21:21:24 +0000
> From: Gregory Heytings <gregory@heytings.org>
> 
> (Disclaimer: I have no knowledge whatsoever about the ISO-2022-JP 
> encoding, and although this looks like a bug, I'm not sure this is 
> actually a bug; I report this at the suggestion of Eli in bug#46859.)
> 
> I downloaded the file [1], and converted it to the ISO-2022-JP encoding 
> with iconv -t iso-2022-jp one.txt > iso-2022-jp.txt.  The resulting file 
> is attached to this bug report.  It ends with two CRLFs, at byte offsets 
> 2993 and 2995.  However, after emacs -Q iso-2022-jp.txt, with M-: 
> (goto-char (filepos-to-bufferpos POS 'exact)) we get:
> 
> POS = 2991, 2992: last but one visible character (HIRAGANA LETTER RU)
> POS = 2993, 2994: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2995, 2996: first CRLF
> POS = 2997: second CRLF
> POS = 2998: point-max
> POS = 2999: first CRLF
> POS = 3000, 3001: second CRLF
> POS >= 3002: point-max
> 
> I would have expected:
> 
> POS = 2989, 2990: last but one visible character (HIRAGANA LETTER RU)
> POS = 2991, 2992: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2993, 2994: first CRLF
> POS = 2995, 2996: second CRLF
> POS >= 2997: point-max
> 
> The opposite operation M-: (bufferpos-to-filepos (- (point) POS) 'exact) 
> apparently also has bugs; its return values are not coherent with the 
> above ones:
> 
> POS = 0: 3003
> POS = 1: 3001
> POS = 2: 2999
> POS = 3 (IDEOGRAPHIC FULL STOP): 2997
> POS = 4 (HIRAGANA LETTER RU): 2995
> 
> I would have expected:
> 
> POS = 0: 2997
> POS = 1: 2995
> POS = 2: 2993
> POS = 3 (IDEOGRAPHIC FULL STOP): 2991
> POS = 4 (HIRAGANA LETTER RU): 2989
> 
> [1] https://darza.com/ecbackend/vendor/symfony/mime/Tests/Fixtures/samples/charsets/iso-2022-jp/one.txt
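
The tables quoted above can be regenerated without typing each M-: form
by hand. What follows is only a minimal sketch: the helper name
my-filepos-table is hypothetical (it is not part of Emacs), and it does
nothing more than call filepos-to-bufferpos with the 'exact quality for
every byte offset in a range.

```elisp
;; Hypothetical helper for reproducing the report; not part of Emacs.
(defun my-filepos-table (from to &optional coding-system)
  "Return an alist of (BYTE . BUFFER-POSITION) for file offsets FROM..TO.
BYTE is a 0-based file offset, as `filepos-to-bufferpos' expects.
CODING-SYSTEM defaults to the buffer's `buffer-file-coding-system'."
  (let (rows)
    (dotimes (i (1+ (- to from)))
      (let ((byte (+ from i)))
        ;; 'exact asks for the precise mapping, even if that is slow.
        (push (cons byte (filepos-to-bufferpos byte 'exact coding-system))
              rows)))
    (nreverse rows)))
```

In the buffer visiting iso-2022-jp.txt, evaluating
(my-filepos-table 2989 3002) should list the same offsets the report
walks through one at a time.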

There's something strange going on here with encoding of the buffer
using iso-2022-jp-dos: near the end of the encoded bytestream, between
the encoded HIRAGANA LETTER KO (こ) and HIRAGANA LETTER TO (と), we
get 6 extra bytes: "ESC ( B ESC $ B".  AFAIU, this sequence means
switch to ASCII and then switch back to Japanese.  So together these 6
bytes are a no-op as regards their effect on the text, but they
disrupt the logic of filepos-to-bufferpos because they introduce extra
bytes that aren't there in the original file.
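
To illustrate why such a pair of escape sequences leaves the text
unchanged but shifts byte offsets, here is a small sketch.  It does not
reproduce the bug itself; it builds the redundant "ESC ( B ESC $ B" by
encoding the two letters separately and concatenating the results.  The
function name my-iso2022jp-redundant-escape-demo is hypothetical.

```elisp
;; Hypothetical demo; it constructs the redundant escapes by hand
;; rather than triggering them through encode-coding-region.
(defun my-iso2022jp-redundant-escape-demo ()
  "Return (CLEAN-LENGTH PADDED-LENGTH SAME-TEXT-P) for a two-letter string."
  (let* ((ko (string #x3053))           ; HIRAGANA LETTER KO
         (to (string #x3068))           ; HIRAGANA LETTER TO
         ;; Both letters encoded in one go: one switch to Japanese,
         ;; one switch back to ASCII at the end.
         (clean (encode-coding-string (concat ko to) 'iso-2022-jp-unix))
         ;; Each letter encoded on its own, then concatenated: this puts
         ;; a switch to ASCII and a switch back between the letters.
         (padded (concat (encode-coding-string ko 'iso-2022-jp-unix)
                         (encode-coding-string to 'iso-2022-jp-unix))))
    (list (length clean)
          (length padded)
          (equal (decode-coding-string clean 'iso-2022-jp-unix)
                 (decode-coding-string padded 'iso-2022-jp-unix)))))
```

Both byte strings decode to the same two characters, yet every byte
after the inserted escapes sits at a larger offset in the padded one,
which is the kind of shift seen in the reported positions.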

Kenichi, why are these 6 bytes inserted by encode-coding-region, but
not when we encode the same text as part of saving the buffer to its
file?  And why does it happen near the end of the text, between those
2 particular letters?
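
One way to locate where the two byte streams diverge is to compare the
bytes on disk with a re-encoding of the decoded text.  This is only a
sketch under assumptions: the function name my-reencode-first-mismatch
is hypothetical, and it uses encode-coding-string rather than
encode-coding-region, so it may or may not show the same extra bytes.

```elisp
;; Hypothetical diagnostic; not part of Emacs.
(defun my-reencode-first-mismatch (file coding-system)
  "Return the first 0-based byte offset where FILE differs from a re-encoding.
FILE's raw bytes are decoded with CODING-SYSTEM and encoded again.
Return nil if the two byte strings are identical."
  (let* ((raw (with-temp-buffer
                ;; Unibyte buffer, so the bytes are read unchanged.
                (set-buffer-multibyte nil)
                (insert-file-contents-literally file)
                (buffer-string)))
         (text (decode-coding-string raw coding-system))
         (again (encode-coding-string text coding-system))
         (cmp (compare-strings raw nil nil again nil nil)))
    ;; compare-strings returns t when equal; otherwise a number whose
    ;; absolute value minus 1 is the count of matching leading elements.
    (if (eq cmp t)
        nil
      (1- (abs cmp)))))
```

For the attached file one might evaluate
(my-reencode-first-mismatch "iso-2022-jp.txt" 'iso-2022-jp-dos) and
look at the bytes around the returned offset.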





