unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Wolfgang Jenkner <wjenkner@inode.at>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 20783@debbugs.gnu.org
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
Date: Tue, 16 Jun 2015 17:40:38 +0200	[thread overview]
Message-ID: <85k2v3veg7.fsf@iznogoud.viz> (raw)
In-Reply-To: <85fv5za8vv.fsf@iznogoud.viz>

[-- Attachment #1: Type: text/plain, Size: 1135 bytes --]

On Thu, Jun 11 2015, Wolfgang Jenkner wrote:

> The loop could be improved a bit by doing pointer arithmetic like in
> DEC_POS

I wondered whether such a change (to avoid unnecessary buffer gap
considerations while in the middle of a multibyte character) would
actually make a measurable difference, so, silly as that may be, I wrote
a simple benchmark for byte-to-position, using the tutorials as data
samples, and passed the results to ministat(1)[*], please see the
attached btp-ministat.el and ministat.sh for details.

[*] https://www.freebsd.org/cgi/man.cgi?query=ministat&sektion=1&manpath=FreeBSD+10.1-RELEASE

The result is that ministat reports statistical differences for several
of the tutorials (but not generally for the same languages at each run,
system load apparently generating too much statistical noise) and I find
that the version with the DEC_POS like loop is _always_ faster in those
cases (judging from the average values or just by taking a quick look at
the histograms).

So, while this is not really very important, it seems that it would be
safe to use the following patch with the improved loop instead:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: improve loop --]
[-- Type: text/x-diff, Size: 1533 bytes --]

From be2adf5b7b427ee5d84c9ae011d8d11d452c2f4e Mon Sep 17 00:00:00 2001
From: Wolfgang Jenkner <wjenkner@inode.at>
Date: Thu, 11 Jun 2015 16:21:21 +0200
Subject: [PATCH] * src/editfns.c (Fbyte_to_position): Fix bytepos not at char
 boundary.

The behavior now matches the description in the manual.  (Bug#20783)
---
 src/editfns.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index cddb0d4..ff54e73 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -1025,10 +1025,28 @@ DEFUN ("byte-to-position", Fbyte_to_position, Sbyte_to_position, 1, 1, 0,
 If BYTEPOS is out of range, the value is nil.  */)
   (Lisp_Object bytepos)
 {
+  ptrdiff_t pos_byte;
+
   CHECK_NUMBER (bytepos);
-  if (XINT (bytepos) < BEG_BYTE || XINT (bytepos) > Z_BYTE)
+  pos_byte = XINT (bytepos);
+  if (pos_byte < BEG_BYTE || pos_byte > Z_BYTE)
     return Qnil;
-  return make_number (BYTE_TO_CHAR (XINT (bytepos)));
+  if (Z != Z_BYTE)
+    /* There are multibyte characters in the buffer.
+       The argument of BYTE_TO_CHAR must be a byte position at
+       a character boundary, so search for the start of the current
+       character.  */
+    {
+      unsigned char *chp = BYTE_POS_ADDR (pos_byte);
+
+      while (!CHAR_HEAD_P (*chp))
+	{
+	  pos_byte--;
+	  /* There's no buffer gap in the middle of a character.  */
+	  chp--;
+	}
+    }
+  return make_number (BYTE_TO_CHAR (pos_byte));
 }
 \f
 DEFUN ("following-char", Ffollowing_char, Sfollowing_char, 0, 0, 0,
-- 
2.4.2


[-- Attachment #3: byte-to-position benchmark --]
[-- Type: application/emacs-lisp, Size: 1569 bytes --]

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: ministat wrapper --]
[-- Type: text/x-sh, Size: 490 bytes --]

#! /bin/sh

old="$(mktemp -t old)"
new="$(mktemp -t new)"
emacs_version=25.0.50
old_emacs=./src/emacs-${emacs_version}.1
new_emacs=./src/emacs-${emacs_version}.2
emacs_flags="--batch -Q -L . --load btp-ministat.el --eval '(benchmark--byte-to-position-results 10)'"

eval $old_emacs $emacs_flags >"$old"
eval $new_emacs $emacs_flags >"$new"

locales="$(head -1 "$old" | sed s'/^#//')"

i=1
for l in $locales; do
	echo
	echo "--- $l ---"
	ministat -s -C$i "$old" "$new"
	i=$(($i + 1))
done
	

  parent reply	other threads:[~2015-06-16 15:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-10 15:13 bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug Wolfgang Jenkner
2015-06-10 17:17 ` Eli Zaretskii
2015-06-11 15:24   ` Wolfgang Jenkner
2015-06-11 16:04     ` Eli Zaretskii
2015-06-11 16:41       ` Wolfgang Jenkner
2015-06-11 19:08         ` Eli Zaretskii
2015-06-17 12:19       ` Wolfgang Jenkner
2015-06-17 16:57         ` Eli Zaretskii
2015-06-16 15:40     ` Wolfgang Jenkner [this message]
2015-06-16 16:08       ` Eli Zaretskii
2015-06-16 16:31         ` Wolfgang Jenkner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85k2v3veg7.fsf@iznogoud.viz \
    --to=wjenkner@inode.at \
    --cc=20783@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).