From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#21055: Info reader fails to follow xrefs to anchors
Date: Tue, 14 Jul 2015 17:57:56 +0300
Message-ID: <83y4ii7pmz.fsf@gnu.org>
References: <87615o2l0e.fsf@gnu.org> <83h9p884w2.fsf@gnu.org> <87twt7x18d.fsf@gnu.org>
In-reply-to: <87twt7x18d.fsf@gnu.org>
To: ludo@gnu.org (Ludovic Courtès), Juri Linkov
Cc: 21055@debbugs.gnu.org

Redirecting the Emacs part to bug-gnu-emacs; see
http://lists.gnu.org/archive/html/bug-texinfo/2015-07/msg00051.html
for the related Texinfo discussion.  CC to Juri, who made the
offending change.

> From: ludo@gnu.org (Ludovic Courtès)
> Cc: bug-texinfo@gnu.org
> Date: Mon, 13 Jul 2015 22:16:02 +0200
>
> The standalone Info reader in Texinfo 6.0 fails to follow
> cross-references to anchors: Following such a link leads to an unrelated
> place in the document.  This is a regression compared to Texinfo 5.2
> (guix.texi is one example that illustrates the bug.)
>
> Unfortunately the Emacs Info reader has had the same problem for a long
> time, but I suppose this one should go to bug-emacs?
>
> That’s with 24.5.1, and I remember experience that with earlier
> versions too.

There are two issues here.  One is that Emacs 24.4 introduced a
change, as part of fixing bug #14125 (see
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14125), which caused the
Emacs Info reader to go to the wrong place when it follows
cross-references to anchors (as opposed to references to nodes).  The
other problem is the generous use of UTF-8 encoded characters in
guix.info, including in the preamble, which makes Emacs's job even
harder, because references in Info files are given in bytes, not
characters.

The second problem needs infrastructure, part of which was introduced
only recently: how to convert a file byte offset to an Emacs buffer
position (which counts characters), accounting correctly for the
file's encoding and EOL format.  It sounds like we would need the
reverse conversion to fix the present problem; see below.

As for the first part: I've read the discussions in bug #14125, and
tried playing with the test file provided there, and I must say that I
understand neither the problem nor its solution.  The analysis of the
problem (see http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14125#11)
was this:

> Makeinfo 4.13 produced the character positions of indirect subfiles
> relative to the beginning of the first node, but Makeinfo 5.0 produces the
> positions relative to the beginning of the subfile.  The Emacs Info reader
> fails when the distance between the beginning of the subfile and
> the beginning of its first node is longer than a thousand characters.
> [...]
> The expression (+ (- nodepos lastfilepos) (point)) in `Info-read-subfile'
> assumes that `lastfilepos' in `Info-read-subfile' is the beginning of the
> first node, so for Info files produced by Makeinfo 4.13 it returns the
> length of the summary segment, but for Makeinfo 5.0 it returns
> two lengths of the summary segment.

Perhaps I don't understand what this says, but the conclusion sounds
incorrect to me.

The actual difference between makeinfo 4.13 and makeinfo 5.0 and later
is that with makeinfo 5 the starting position of the 2nd, 3rd, etc.
subfile includes the length of the preamble text that precedes the
first node in the subfile.  In makeinfo 4, only the beginning of the
first subfile included the preamble, and all the rest excluded it.

But that doesn't matter, IMO, because with both versions of makeinfo,
if a subfile's beginning is recorded in the tag table as byte position
N, the first node in that subfile is also recorded to start at byte
position N.  Therefore, to find the byte offset of a node/anchor from
the beginning of a subfile, one needs to do this:

  (+ (- nodepos lastfilepos) preamble-length)

in both the old and the new versions.  To find the length of the
preamble, one needs to search from the beginning of the subfile for
the start of the first node, and then compute the file's byte number
of that position.  Therefore, the original code in Info-read-subfile,
viz.:

  (+ (- nodepos lastfilepos) (point))

was an approximation that did the right thing for ASCII Info files.
It is easy to extend this to UTF-8 encoded files:

  (+ (- nodepos lastfilepos) (position-bytes (point)))

Other encodings, as well as DOS end-of-line format, will need a
dedicated function similar to filepos-to-bufferpos, but in the reverse
direction.  (We also need to subtract 1 from the above expression,
since we need a zero-based offset.)

Juri, do you see any flaws in the above description?  I couldn't
reproduce the problem reported in bug #14125, so I'm not sure why the
fix you installed was even needed, or where my reasoning is wrong.  I
tried both Emacs 24.3 (for which the bug was filed) and later
versions, and they all work correctly with the Info file produced from
the Texinfo source attached to that bug report, whether I produce the
Info file with makeinfo 4.13 or makeinfo 5.1 or 6.0.
So I'm unsure what problems you saw with the original code in
Info-read-subfile.  Could you describe those problems in more detail
than you did in the bug discussions?

Why are these problems invisible when following references to nodes,
you ask?  Because in that case we search for the node's header line
after going to the recorded position.  So going to a position that
undershoots (which is what that change caused) doesn't do any visible
harm.  But for references to anchors, we don't have any text to search
for, so the position where we place the reader should be reasonably
exact.
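
For illustration only, the byte-offset arithmetic discussed above could be sketched roughly as follows. The helper name my-info-subfile-bufpos, the sample buffer text, and the tag-table value 1000 are all hypothetical and are not taken from info.el. The sketch also assumes that the subfile is ASCII or UTF-8 text with Unix line endings, so that position-bytes agrees with byte offsets in the file on disk; other encodings and DOS end-of-line format would need the dedicated conversion function mentioned earlier.

```elisp
;; Illustration only: `my-info-subfile-bufpos' is a hypothetical helper,
;; not a function in info.el.  It assumes the current buffer holds one
;; subfile as ASCII or UTF-8 text with Unix line endings.

(defun my-info-subfile-bufpos (nodepos lastfilepos)
  "Return the buffer position for tag-table byte position NODEPOS.
LASTFILEPOS is the byte position recorded for the start of this subfile."
  (save-excursion
    (goto-char (point-min))
    ;; Find the first node separator, as `Info-read-subfile' does.
    (search-forward "\n\^_")
    (let ((offset (+ (- nodepos lastfilepos)
                     ;; `position-bytes' is 1-based, so `1-' yields the
                     ;; number of bytes that precede point.
                     (1- (position-bytes (point))))))
      ;; Convert the zero-based byte offset back to a character position.
      (byte-to-position (1+ offset)))))

(with-temp-buffer
  ;; The preamble holds three 2-byte characters, so byte counts and
  ;; character counts differ by 3 at the first node.
  (insert "Préambule « UTF-8 »\n\^_\nFile: demo.info,  Node: Top\n\nText.\n")
  ;; 1000 is a made-up tag-table position for the start of this subfile.
  (my-info-subfile-bufpos 1000 1000))
```

With this sample text, point after the separator is character 22 but byte 25. Using (point) in place of (position-bytes (point)) would therefore give a byte offset that is 3 too small, which is the kind of undershoot that goes unnoticed for nodes but not for anchors.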