From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Wolfgang Jenkner Newsgroups: gmane.emacs.bugs Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug Date: Tue, 16 Jun 2015 17:40:38 +0200 Message-ID: <85k2v3veg7.fsf@iznogoud.viz> References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: ger.gmane.org 1434470054 17184 80.91.229.3 (16 Jun 2015 15:54:14 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 16 Jun 2015 15:54:14 +0000 (UTC) Cc: 20783@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Tue Jun 16 17:54:02 2015 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Z4tBX-0000Wo-OY for geb-bug-gnu-emacs@m.gmane.org; Tue, 16 Jun 2015 17:54:00 +0200 Original-Received: from localhost ([::1]:41087 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z4tBW-0002WE-K4 for geb-bug-gnu-emacs@m.gmane.org; Tue, 16 Jun 2015 11:53:58 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:41592) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z4tAi-0001fc-Pa for bug-gnu-emacs@gnu.org; Tue, 16 Jun 2015 11:53:09 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z4tAd-0001Er-4i for bug-gnu-emacs@gnu.org; Tue, 16 Jun 2015 11:53:08 -0400 Original-Received: from debbugs.gnu.org ([140.186.70.43]:41553) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z4tAd-0001EZ-0W for bug-gnu-emacs@gnu.org; Tue, 16 Jun 2015 11:53:03 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.80) (envelope-from ) id 1Z4tAc-0004PZ-DR for bug-gnu-emacs@gnu.org; Tue, 16 Jun 2015 11:53:02 -0400 X-Loop: help-debbugs@gnu.org In-Reply-To: <85fv5za8vv.fsf@iznogoud.viz> Resent-From: Wolfgang Jenkner Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Tue, 16 Jun 2015 15:53:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 20783 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 20783-submit@debbugs.gnu.org id=B20783.143446992316889 (code B ref 20783); Tue, 16 Jun 2015 15:53:02 +0000 Original-Received: (at 20783) by debbugs.gnu.org; 16 Jun 2015 15:52:03 +0000 Original-Received: from localhost ([127.0.0.1]:56013 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z4t9e-0004OL-MM for submit@debbugs.gnu.org; Tue, 16 Jun 2015 11:52:03 -0400 Original-Received: from b2bfep14.mx.upcmail.net ([62.179.121.59]:58943) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z4t9b-0004Np-Q3 for 20783@debbugs.gnu.org; Tue, 16 Jun 2015 11:52:00 -0400 Original-Received: from edge12.upcmail.net ([192.168.13.82]) by b2bfep14.mx.upcmail.net (InterMail vM.8.01.05.11 201-2260-151-128-20120928) with ESMTP id <20150616155152.VUOS1806.b2bfep14-int.chello.at@edge12.upcmail.net> for <20783@debbugs.gnu.org>; Tue, 16 Jun 2015 17:51:52 +0200 Original-Received: from iznogoud.viz ([91.119.133.139]) by edge12.upcmail.net with edge id h3rs1q00Y30cwdn0C3rsEo; Tue, 16 Jun 2015 17:51:52 +0200 X-SourceIP: 91.119.133.139 Original-Received: from wolfgang by iznogoud.viz with local (Exim 4.85 (FreeBSD)) (envelope-from ) id 1Z4t9U-0001JV-4b; Tue, 16 Jun 2015 17:51:52 +0200 User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 140.186.70.43 X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.bugs:103993 Archived-At: --=-=-= Content-Type: text/plain On Thu, Jun 11 2015, Wolfgang Jenkner wrote: > The loop could be improved a bit by doing pointer arithmetic like in > DEC_POS I wondered whether such a change (to avoid unnecessary buffer gap considerations while in the middle of a multibyte character) would actually make a measurable difference, so, silly as that may be, I wrote a simple benchmark for byte-to-position, using the tutorials as data samples, and passed the results to ministat(1)[*], please see the attached btp-ministat.el and ministat.sh for details. [*] https://www.freebsd.org/cgi/man.cgi?query=ministat&sektion=1&manpath=FreeBSD+10.1-RELEASE The result is that ministat reports statistical differences for several of the tutorials (but not generally for the same languages at each run, system load apparently generating too much statistical noise) and I find that the version with the DEC_POS like loop is _always_ faster in those cases (judging from the average values or just by taking a quick look at the histograms). So, while this is not really very important, it seems that it would be safe to use the following patch with the improved loop instead: --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-src-editfns.c-Fbyte_to_position-Fix-bytepos-not-at-c.patch Content-Description: improve loop >From be2adf5b7b427ee5d84c9ae011d8d11d452c2f4e Mon Sep 17 00:00:00 2001 From: Wolfgang Jenkner Date: Thu, 11 Jun 2015 16:21:21 +0200 Subject: [PATCH] * src/editfns.c (Fbyte_to_position): Fix bytepos not at char boundary. The behavior now matches the description in the manual. (Bug#20783) --- src/editfns.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/src/editfns.c b/src/editfns.c index cddb0d4..ff54e73 100644 --- a/src/editfns.c +++ b/src/editfns.c @@ -1025,10 +1025,28 @@ DEFUN ("byte-to-position", Fbyte_to_position, Sbyte_to_position, 1, 1, 0, If BYTEPOS is out of range, the value is nil. */) (Lisp_Object bytepos) { + ptrdiff_t pos_byte; + CHECK_NUMBER (bytepos); - if (XINT (bytepos) < BEG_BYTE || XINT (bytepos) > Z_BYTE) + pos_byte = XINT (bytepos); + if (pos_byte < BEG_BYTE || pos_byte > Z_BYTE) return Qnil; - return make_number (BYTE_TO_CHAR (XINT (bytepos))); + if (Z != Z_BYTE) + /* There are multibyte characters in the buffer. + The argument of BYTE_TO_CHAR must be a byte position at + a character boundary, so search for the start of the current + character. */ + { + unsigned char *chp = BYTE_POS_ADDR (pos_byte); + + while (!CHAR_HEAD_P (*chp)) + { + pos_byte--; + /* There's no buffer gap in the middle of a character. */ + chp--; + } + } + return make_number (BYTE_TO_CHAR (pos_byte)); } DEFUN ("following-char", Ffollowing_char, Sfollowing_char, 0, 0, 0, -- 2.4.2 --=-=-= Content-Type: application/emacs-lisp Content-Disposition: attachment; filename=btp-ministat.el Content-Transfer-Encoding: quoted-printable Content-Description: byte-to-position benchmark (defun benchmark--byte-to-position (file n &optional move-gap) "Loop N times through all bytes in FILE and compute their position." (let ((form '(let ((i 1)) (while (byte-to-position i) (setq i (1+ i)))))) (with-temp-buffer (insert-file-contents file) (when move-gap ;; Try to minimize buffer gap calculations by moving it to eob. (goto-char (point-max)) (insert " ")) (eval `(benchmark-run-compiled ,n ,form))))) (defun benchmark--byte-to-position-files () "Return a list of the absolute file names of the emacs tutorials." (directory-files (expand-file-name "tutorials" data-directory) t "TUTORIAL\\(:?\\.[a-z][a-z]\\(:?_[A-Z][A-Z]\\)?\\)?\\'")) (defun benchmark--byte-to-position-results (n &optional move-gap) "Generate an input file suitable for ministat(1). Each column corresponds to one of the tutorials and holds the results of 10 * N times running N loops of the benchmark above for it." (let ((files (benchmark--byte-to-position-files))) ;; Print a header line. (princ "#") (dolist (file files) (let ((locale (or (file-name-extension file) "en")) ;; Compute the average byte/char count. (avg (with-temp-buffer (insert-file-contents file) (/ (position-bytes (point-max)) (point-max) 1.0)))) (princ (format "%s:%f\t" locale avg)))) (dotimes (_ (* 10 n)) (terpri) (dolist (file files) (let* ((result (benchmark--byte-to-position file n move-gap))) (princ (format "%f\t" (car result)) t)))) (terpri))) --=-=-= Content-Type: text/x-sh Content-Disposition: attachment; filename=ministat.sh Content-Description: ministat wrapper #! /bin/sh old="$(mktemp -t old)" new="$(mktemp -t new)" emacs_version=25.0.50 old_emacs=./src/emacs-${emacs_version}.1 new_emacs=./src/emacs-${emacs_version}.2 emacs_flags="--batch -Q -L . --load btp-ministat.el --eval '(benchmark--byte-to-position-results 10)'" eval $old_emacs $emacs_flags >"$old" eval $new_emacs $emacs_flags >"$new" locales="$(head -1 "$old" | sed s'/^#//')" i=1 for l in $locales; do echo echo "--- $l ---" ministat -s -C$i "$old" "$new" i=$(($i + 1)) done --=-=-=--