From: Eli Zaretskii <eliz@gnu.org>
To: Gregory Heytings <gregory@heytings.org>, Kenichi Handa <handa@gnu.org>
Cc: 46933@debbugs.gnu.org
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 21 Mar 2021 17:27:45 +0200
Message-ID: <83ft0obk7i.fsf@gnu.org>
In-Reply-To: <9cff0f8894f167925251@heytings.org> (message from Gregory Heytings on Thu, 04 Mar 2021 21:21:24 +0000)
> Date: Thu, 04 Mar 2021 21:21:24 +0000
> From: Gregory Heytings <gregory@heytings.org>
>
> (Disclaimer: I have no knowledge whatsoever about the ISO-2022-JP
> encoding, and although this looks like a bug, I'm not sure this is
> actually a bug; I report this at the suggestion of Eli in bug#46859.)
>
> I downloaded the file [1], and converted it to the ISO-2022-JP encoding
> with iconv -t iso-2022-jp one.txt > iso-2022-jp.txt. The resulting file
> is attached to this bug report. It ends with two CRLFs, at byte offsets
> 2993 and 2995. However, after emacs -Q iso-2022-jp.txt, with M-:
> (goto-char (filepos-to-bufferpos POS 'exact)) we get:
>
> POS = 2991, 2992: last but one visible character (HIRAGANA LETTER RU)
> POS = 2993, 2994: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2995, 2996: first CRLF
> POS = 2997: second CRLF
> POS = 2998: point-max
> POS = 2999: first CRLF
> POS = 3000, 3001: second CRLF
> POS >= 3002: point-max
>
> I would have expected:
>
> POS = 2989, 2990: last but one visible character (HIRAGANA LETTER RU)
> POS = 2991, 2992: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2993, 2994: first CRLF
> POS = 2995, 2996: second CRLF
> POS >= 2997: point-max
>
> The opposite operation M-: (bufferpos-to-filepos (- (point) POS) 'exact)
> apparently also has bugs; its return values are not coherent with the
> above ones:
>
> POS = 0: 3003
> POS = 1: 3001
> POS = 2: 2999
> POS = 3 (IDEOGRAPHIC FULL STOP): 2997
> POS = 4 (HIRAGANA LETTER RU): 2995
>
> I would have expected:
>
> POS = 0: 2997
> POS = 1: 2995
> POS = 2: 2993
> POS = 3 (IDEOGRAPHIC FULL STOP): 2991
> POS = 4 (HIRAGANA LETTER RU): 2989
>
> [1] https://darza.com/ecbackend/vendor/symfony/mime/Tests/Fixtures/samples/charsets/iso-2022-jp/one.txt

There's something strange going on here with the encoding of the buffer
using iso-2022-jp-dos: near the end of the encoded byte stream, between
the encoded HIRAGANA LETTER KO (こ) and HIRAGANA LETTER TO (と), we
get 6 extra bytes: "ESC ( B ESC $ B". AFAIU, this sequence means
"switch to ASCII and then switch back to Japanese". So together these 6
bytes are a no-op as far as their effect on the text is concerned, but
they disrupt the logic of filepos-to-bufferpos because they introduce
extra bytes that aren't there in the original file.

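
To illustrate why such a sequence is harmless for the decoded text but not for byte offsets, here is a minimal sketch in Python. It is not the Emacs code path: Python's "iso2022_jp" codec merely stands in for the coding system, and the names TO_ASCII, TO_JISX0208, split and padded are made up for this illustration. It assumes the codec emits "ESC $ B" before the first Japanese character, which the code checks.

```python
# Illustration only: Python's "iso2022_jp" codec stands in for Emacs's
# iso-2022-jp coding system; this is not how Emacs encodes buffers.
ESC = b"\x1b"
TO_ASCII = ESC + b"(B"      # escape sequence: switch to ASCII
TO_JISX0208 = ESC + b"$B"   # escape sequence: switch to JIS X 0208 (Japanese)

text = "こと"  # HIRAGANA LETTER KO followed by HIRAGANA LETTER TO
encoded = text.encode("iso2022_jp")

# Assumption checked here: the stream opens with the switch to Japanese.
assert encoded.startswith(TO_JISX0208)

# Each JIS X 0208 character takes 2 bytes, so the boundary between the
# two letters lies right after the 3-byte escape plus 2 bytes for KO.
split = len(TO_JISX0208) + 2

# Insert the 6 redundant bytes "ESC ( B ESC $ B" between the two letters.
padded = encoded[:split] + TO_ASCII + TO_JISX0208 + encoded[split:]

# The decoded text is unchanged, but the byte stream is longer.
same_text = padded.decode("iso2022_jp") == encoded.decode("iso2022_jp")
extra_bytes = len(padded) - len(encoded)
print(same_text, extra_bytes)
```

Any code that maps file offsets to buffer positions by encoding the text and counting bytes would see every offset past the insertion point shifted by those 6 bytes, even though the text itself decodes identically.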
Kenichi, why are these 6 bytes inserted by encode-coding-region, but
not when we encode the same text as part of saving the buffer to its
file? And why does it happen near the end of the text, between those
2 particular letters?