unofficial mirror of bug-gnu-emacs@gnu.org 
From: handa <handa@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: handa@gnu.org, gregory@heytings.org, 46933@debbugs.gnu.org
Subject: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 04 Apr 2021 01:12:06 +0900
Message-ID: <87im53ny95.fsf@gnu.org>
In-Reply-To: <83zgyif2aq.fsf@gnu.org> (message from Eli Zaretskii on Thu, 01 Apr 2021 18:32:45 +0300)

In article <83zgyif2aq.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> Leaving the :pre-write/:post-read-conversion use case aside, do we
> have some means of finding where an ISO-2022 shift-in/out sequence
> begins and ends, so that we never try to decode a partial sequence
> (and produce "characters" that are not really in the original buffer)?
> If not, where can I find the description of every kind of such
> sequences, i.e. sequences that modify the decoder state without
> producing any characters?

The official definition is in the standard ISO/IEC 2022, but
it seems that this wiki page:
  https://en.wikipedia.org/wiki/ISO/IEC_2022
is more concise.  Emacs implements all control sequences shown in the
sections: "Shift functions", "Character set designations", and
"Interaction with other coding systems".

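As a small illustration of such state-changing sequences, the sketch
below encodes a single character with iso-2022-jp and inspects the
resulting bytes.  The function name my-iso2022-designation-demo is
hypothetical and used only for this example; the escape sequences it
checks for are the JIS X 0208 and ASCII designations described on the
page above.

```elisp
;; Illustration only; `my-iso2022-designation-demo' is a hypothetical name.
(defun my-iso2022-designation-demo ()
  "Return a plist describing how iso-2022-jp encodes one character."
  (let* ((text (string #x3042))         ; HIRAGANA LETTER A
         (bytes (encode-coding-string text 'iso-2022-jp)))
    (list :chars (length text)
          :bytes (length bytes)
          ;; ESC $ B designates JIS X 0208 to G0; it produces no character.
          :starts-with-designation (string-prefix-p "\e$B" bytes)
          ;; ESC ( B designates ASCII again at the end of the text.
          :ends-with-ascii-designation (string-suffix-p "\e(B" bytes)
          :round-trip (equal (decode-coding-string bytes 'iso-2022-jp)
                             text))))
```

One character becomes several bytes here, most of which only change the
decoder state, which is why cutting the byte stream at an arbitrary
offset can split such a sequence.
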
> > By the way, what is the intention of filepos-to-bufferpos?  Why was
> > that function introduced?

> The original (and so far the only) use case was an Info manual
> separated into several files, where the tag table at the end of the
> main file specifies offsets in bytes.  See the function
> Info-find-node-2 in info.el.

As filepos-to-bufferpos accepts the optional arg CODING-SYSTEM,
I had assumed that the BYTE arg is:
  a byte position in a file that will be created by encoding the current
  buffer by CODING-SYSTEM

But it seems that the usage in Info-find-node-2 is:
  a byte position in an existing file that may not have been created
  by Emacs

There are cases where the two differ.  The method I wrote in the
previous mail works only in the former case.  And it seems that the
current implementation of filepos-to-bufferpos has the same limitation,
because it tries to get the byte sequence by encode-coding-region.
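
To make the difference concrete, here is a minimal sketch, assuming a
file whose contents were not produced by Emacs and which carries a
redundant ASCII designation before the text.  The function name
my-filepos-mismatch-demo and the sample bytes are hypothetical and
chosen only for illustration.

```elisp
;; Illustration only; `my-filepos-mismatch-demo' is a hypothetical name.
(defun my-filepos-mismatch-demo ()
  "Compare the bytes of a sample file with the bytes Emacs would write."
  (let* ((file-bytes "\e(BA")           ; redundant ESC ( B, then "A"
         (decoded (decode-coding-string file-bytes 'iso-2022-jp))
         ;; What encode-coding-region would effectively produce for
         ;; the same text.
         (re-encoded (encode-coding-string decoded 'iso-2022-jp)))
    (list :decoded decoded
          :file-bytes (length file-bytes)
          :re-encoded-bytes (length re-encoded))))
```

If the two byte counts differ, a byte offset measured in the existing
file does not match an offset computed by re-encoding the buffer, which
is the latter case above.
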

For the latter case, perhaps something like the following code would work.

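;; Return the buffer position corresponding to the byte position
;; FILEPOS in FILE, provided that FILE is decoded by CODING-SYSTEM.
;; If decoding the first FILEPOS bytes does not yield a prefix of the
;; fully decoded text (e.g. FILEPOS is inside a multibyte sequence),
;; FILEPOS is advanced one byte at a time until it does.
(defun temp (file filepos coding-system)
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (let ((full (decode-coding-region 1 (point-max) coding-system t))
          partial)
      ;; Clamp the end position so that a FILEPOS beyond the end of
      ;; FILE does not signal args-out-of-range.
      (while (and (setq partial (decode-coding-region
                                 1 (min (1+ filepos) (point-max))
                                 coding-system t))
                  (not (eq (compare-strings full 0 (length partial)
                                            partial 0 (length partial))
                           t)))
        (setq filepos (1+ filepos)))
      (1+ (length partial)))))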

If it is too slow, there are a few ways to make it faster.

---
K. Handa
handa@gnu.org




