From: Eli Zaretskii
To: Juri Linkov
Cc: ludo@gnu.org, 21055@debbugs.gnu.org
Newsgroups: gmane.emacs.bugs
Subject: bug#21055: Info reader fails to follow xrefs to anchors
Date: Wed, 15 Jul 2015 18:09:56 +0300
Message-ID: <83k2u178zf.fsf@gnu.org>
References: <87615o2l0e.fsf@gnu.org> <83h9p884w2.fsf@gnu.org> <87twt7x18d.fsf@gnu.org> <83y4ii7pmz.fsf@gnu.org> <874ml6s52n.fsf@mail.linkov.net>

> From: Juri Linkov
> Cc: ludo@gnu.org (Ludovic Courtès),
>   bug-gnu-emacs@gnu.org
> Date: Wed, 15 Jul 2015 02:16:32 +0300
>
> I'm attaching here all the files that I used to fix bug#14125,
> so you could compare the output of different makeinfo versions
> and see the problem.  The command line used to translate
> Texinfo files was:  makeinfo --split-size=2000 test.texi

Thanks.  I see the problem now.  It only happens with makeinfo 5.0 and
5.1, and has been fixed since 5.2.  Furthermore, it only rears its ugly
head if the Texinfo source has an @ifnottex block before the Top node;
the other blurbs usually put there, like @copying, @direntry, etc.,
don't trigger the problem even in those 2 versions of makeinfo.
Moreover, when this problem happens, it only affects the 1st subfile;
the rest have their offsets set correctly.  So it's a pretty rare
combination of conditions.

Therefore, I think we should fix the anchor use case by making the
value returned from Info-read-subfile as accurate as possible, and
then cater to the problematic output of makeinfo 5.0 and 5.1 by
attempting another search for a node with a larger slop value.

So any objections to the patch below?  It introduces a new function,
bufferpos-to-filepos, and then uses it to get the file byte offset
corresponding to the first node in a subfile.

--- lisp/international/mule-util.el~0	2015-06-21 06:45:33.000000000 +0300
+++ lisp/international/mule-util.el	2015-07-15 18:00:57.053036400 +0300
@@ -412,6 +412,79 @@
                         (decode-coding-region (point-min)
                                               (min (point-max) (+ pm byte))
                                               coding-system t))))))))))))
+;;;###autoload
+(defun bufferpos-to-filepos (position &optional quality coding-system)
+  "Try to return the file byte corresponding to a particular buffer POSITION.
+Value is the file position given as a (0-based) byte count.
+The function presumes the file is encoded with CODING-SYSTEM, which defaults
+to `buffer-file-coding-system'.
+QUALITY can be:
+  `approximate', in which case we may cut some corners to avoid
+    excessive work.
+  `exact', in which case we may end up re-(en/de)coding a large
+    part of the file/buffer.
+  nil, in which case we may return nil rather than an approximation."
+  (unless coding-system (setq coding-system buffer-file-coding-system))
+  (let* ((eol (coding-system-eol-type coding-system))
+         (lineno (if (= eol 1) (1- (line-number-at-pos position)) 0))
+         (type (coding-system-type coding-system))
+         (base (coding-system-base coding-system))
+         byte)
+    (and (eq type 'utf-8)
+         ;; Any post-read/pre-write conversions mean it's not really UTF-8.
+         (not (null (coding-system-get coding-system :post-read-conversion)))
+         (setq type 'not-utf-8))
+    (and (memq type '(charset raw-text undecided))
+         ;; The following are all of type 'charset', but they are
+         ;; actually variable-width encodings.
+         (not (memq base '(chinese-gbk chinese-gb18030 euc-tw euc-jis-2004
+                                       korean-iso-8bit chinese-iso-8bit
+                                       japanese-iso-8bit chinese-big5-hkscs
+                                       japanese-cp932 korean-cp949)))
+         (setq type 'single-byte))
+    (pcase type
+      (`utf-8
+       (setq byte (position-bytes position))
+       (when (null byte)
+         (if (<= position 0)
+             (setq byte 1)
+           (setq byte (position-bytes (point-max)))))
+       (setq byte (1- byte))
+       (+ byte
+          ;; Account for BOM, if any.
+          (if (coding-system-get coding-system :bom) 3 0)
+          ;; Account for CR in CRLF pairs.
+          lineno))
+      (`single-byte
+       (+ position -1 lineno))
+      ((and `utf-16
+            ;; FIXME: For utf-16, we could use the same approach as used for
+            ;; dos EOLs (counting the number of non-BMP chars instead of the
+            ;; number of lines).
+            (guard (not (eq quality 'exact))))
+       ;; In approximate mode, assume all characters are within the
+       ;; BMP, i.e. each one takes up 2 bytes.
+       (+ (* (1- position) 2)
+          ;; Account for BOM, if any.
+          (if (coding-system-get coding-system :bom) 2 0)
+          ;; Account for CR in CRLF pairs.
+          lineno))
+      (_
+       (pcase quality
+         (`approximate (+ (position-bytes position) -1 lineno))
+         (`exact
+          ;; Rather than assume that the file exists and still holds the right
+          ;; data, we reconstruct its relevant portion.
+          (let ((buf (current-buffer)))
+            (with-temp-buffer
+              (set-buffer-multibyte nil)
+              (let ((tmp-buf (current-buffer)))
+                (with-current-buffer buf
+                  (save-restriction
+                    (widen)
+                    (encode-coding-region (point-min) (min (point-max) position)
+                                          coding-system tmp-buf)))
+                (1- (point-max)))))))))))
 
 (provide 'mule-util)
 
--- lisp/info.el~0	2015-06-16 10:34:22.000000000 +0300
+++ lisp/info.el	2015-07-15 18:08:58.585385400 +0300
@@ -1217,6 +1217,18 @@
                   (goto-char pos)
                   (throw 'foo t)))
 
+            ;; If the Texinfo source had an @ifnottex block of text
+            ;; before the Top node, makeinfo 5.0 and 5.1 mistakenly
+            ;; omitted that block's size from the starting position
+            ;; of the 1st subfile, which makes GUESSPOS overshoot
+            ;; the correct position by the length of that text.  So
+            ;; we try again with a larger slop.
+            (goto-char (max (point-min) (- guesspos 10000)))
+              (let ((pos (Info-find-node-in-buffer regexp strict-case)))
+                (when pos
+                  (goto-char pos)
+                  (throw 'foo t)))
+
             (when (string-match "\\([^.]+\\)\\." nodename)
               (let (Info-point-loc)
                 (Info-find-node-2
@@ -1553,10 +1565,13 @@
     (if (looking-at "\^_")
         (forward-char 1)
       (search-forward "\n\^_"))
-    ;; Don't add the length of the skipped summary segment to
-    ;; the value returned to `Info-find-node-2'.  (Bug#14125)
     (if (numberp nodepos)
-        (- nodepos lastfilepos))))
+        ;; Our caller ('Info-find-node-2') wants the (zero-based) byte
+        ;; offset corresponding to NODEPOS, from the beginning of the
+        ;; subfile.  This is especially important if NODEPOS is for an
+        ;; anchor reference, because for those the position is all we
+        ;; have.
+        (+ (- nodepos lastfilepos) (bufferpos-to-filepos (point) 'exact)))))
 
 (defun Info-unescape-quotes (value)
   "Unescape double quotes and backslashes in VALUE."
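
For illustration only (this is not part of the patch): a minimal sketch
of what bufferpos-to-filepos is meant to return, assuming the definition
from the mule-util.el hunk above has been loaded.  The helper name
my-demo-filepos, the sample text and the positions are all made up for
the example; the point is just that buffer positions count characters
(1-based), while the returned value counts file bytes (0-based).

```elisp
;; Sketch only.  Assumes `bufferpos-to-filepos' as defined in the
;; patch above is available from mule-util.
(require 'mule-util)

(defun my-demo-filepos ()
  "Hypothetical demo: return byte offsets for two positions in sample text."
  (with-temp-buffer
    ;; "ï" is one character in the buffer, but two bytes in UTF-8.
    (insert "naïve\nnode\n")
    (list
     ;; Position 4 is just before "v".  The bytes of "n", "a" and the
     ;; 2-byte "ï" precede it in the file, so the offset is 4.
     (bufferpos-to-filepos 4 'exact 'utf-8-unix)
     ;; Position 7 is the start of the 2nd line.  With DOS EOLs the
     ;; file also holds one CR before it, so 7 bytes become 8.
     (bufferpos-to-filepos 7 'exact 'utf-8-dos))))

(my-demo-filepos)   ; => (4 8)
```

This is the same kind of character-vs-byte difference that matters in
Info-read-subfile: the tag table records byte offsets, so the caller
needs the byte offset of the first node in the subfile, not its buffer
position.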