* Support double colons in Info index entries
From: Gavin Smith @ 2019-01-09 21:14 UTC
  To: bug-gnu-emacs; +Cc: bug-texinfo

Emacs version checked: 26.1.

In the Info format colons are special, and for this reason, there is
limited support for colons in index entries.  The Emacs Info mode
supports single colons in index entries as long as they are not followed
by a space.  There is this comment at the start of info.el:

;; Note that nowadays we expect Info files to be made using makeinfo.
;; In particular we make these assumptions:
;;  - a menu item MAY contain colons but not colon-space ": "
;;  - a menu item ending with ": " (but not ":: ") is an index entry
;;  - a node name MAY NOT contain a colon
;; This distinction is to support indexing of computer programming
;; language terms that may contain ":" but not ": ".

It doesn't state it, but when I tested it double colons don't work even
if they are not followed by a space.

There is a fairly simple solution to this problem that I haven't seen
suggested in all the messages posted on this topic in the mailing list
archives.  In index nodes only (which have a special marker included,
^@^H[index^@^H]), use a colon to terminate the text of the index entry,
but instead of looking for the first colon in the line, look for the
last.  So this entry:

* a::b: a colon b. (line 129)

would refer to line 129 of the node "a colon b".  This is possible
because node names cannot contain colons.  This restriction is not too
important, whereas the inability to index items containing colons is
quite important.  This is what is implemented in the standalone info
browser (since change on 2017-04-08).

This change shouldn't be made for all nodes, because the comment after
the closing '.' could contain a colon:

* label: node. comment: with a colon.

This shouldn't be interpreted as referring to a node "with a colon".

However, the "(line ...)" comment can't contain a colon.

I'm not familiar enough with Emacs Lisp to propose a patch to implement
this change myself.

The standalone info program also implemented a quoting mechanism
(surrounding the text with a pair of 0x7F bytes) to allow nearly all
characters to be included in node names and index entries.  This has
never been implemented in Emacs Info and has never been used by default
in texi2any's output.  I think my suggestion above would be sufficient
and would work with existing Info files and versions of
texi2any/makeinfo without anything breaking.  The quoting mechanism
could potentially be removed from texi2any and info as nobody has ever
used it and it makes things more complicated for no reason.
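[Editorial illustration, not part of the original message.]  The
last-colon rule proposed above can be sketched outside Emacs as follows.
This is a minimal Python illustration under stated assumptions, not code
from info.el or from the standalone info reader: the name
split_index_entry and the regular expression are hypothetical, and the
sketch assumes the entry has the shape "* TEXT: NODE. (line N)" with an
optional "(line N)" part.

```python
import re

# Hypothetical pattern for what follows the last colon: a node name
# (which cannot contain a colon), a closing period, and an optional
# "(line N)" comment.
_TARGET_RE = re.compile(
    r"\s*(?P<node>[^:]*?)\.\s*(?:\(line\s+(?P<line>\d+)\))?\s*$"
)


def split_index_entry(entry):
    """Split an index entry at its LAST colon.

    Returns (entry_text, node_name, line_number_or_None), or None when
    the line does not look like an index entry.
    """
    if not entry.startswith("* "):
        return None
    # rpartition splits at the last colon, so colons inside the entry
    # text (including "::" and ": ") stay in the entry text.
    text, sep, target = entry[2:].rpartition(":")
    if not sep:
        return None
    m = _TARGET_RE.match(target)
    if m is None:
        return None
    line = int(m.group("line")) if m.group("line") else None
    return text, m.group("node").strip(), line


if __name__ == "__main__":
    print(split_index_entry("* a::b: a colon b. (line 129)"))
    # prints ('a::b', 'a colon b', 129)
```

As the message notes, this only makes sense in index nodes: in an
ordinary menu, a colon in the trailing comment would be mistaken for the
terminator.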
* bug#34023: Support double colons in Info index entries
From: Juri Linkov @ 2019-01-11  0:04 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

Hi Gavin,

> In the Info format colons are special, and for this reason, there is
> limited support for colons in index entries.  The Emacs Info mode
> supports single colons in index entries as long as they are not followed
> by a space.

Thanks for the detailed description.

> It doesn't state it, but when I tested it double colons don't work even
> if they are not followed by a space.
>
> There is a fairly simple solution to this problem that I haven't seen
> suggested in all the messages posted on this topic in the mailing list
> archives.  In index nodes only (which have a special marker included,
> ^@^H[index^@^H]), use a colon to terminate the text of the index entry,
> but instead of looking for the first colon in the line, look for the
> last.  So this entry:
>
> * a::b: a colon b. (line 129)
>
> would refer to line 129 of the node "a colon b".  This is possible
> because node names cannot contain colons.  This restriction is not too
> important, whereas the inability to index items containing colons is
> quite important.  This is what is implemented in the standalone info
> browser (since change on 2017-04-08).

The following patch handles the cases that you presented,
but it's hard to predict what other cases it might break.

Do you have a sample test file that covers different cases?
We could add such a file to Emacs regression tests.

> This change shouldn't be made for all nodes, because the comment after
> the closing '.' could contain a colon:
>
> * label: node. comment: with a colon.
>
> This shouldn't be interpreted as referring to a node "with a colon".
>
> However, the "(line ...)" comment can't contain a colon.

The following change is made only for index nodes.

I have to say that the current regexp-based parsing is
an inherently fragile approach.  Do you think it would be possible
to add more markup to Info files instead of relying on regexps?
Like index nodes having a special marker ^@^H[index^@^H]
maybe adding some markers to identify index entries,
node references, line numbers?

Better yet would be to read Info manuals in HTML format in the Info reader.
That would allow extracting all information unambiguously.

[-- Attachment #2: info.el.support-double-colons-in-Info-index-entries.patch --]
[-- Type: text/x-diff, Size: 2523 bytes --]

diff --git a/lisp/info.el b/lisp/info.el
index 6038273c37..2f7e293297 100644
--- a/lisp/info.el
+++ b/lisp/info.el
@@ -2664,9 +2664,15 @@ Info-menu-entry-name-re
 Because of ambiguities, this should be concatenated with something like
 `:' and `Info-following-node-name-re'.")
 
+(defconst Info-index-entry-name-re "\\(?:[^:]\\|:[^,.;() \t\n]\\)*"
+  "Regexp that matches an index entry name possibly including a colon.")
+
 (defun Info-extract-menu-node-name (&optional multi-line index-node)
   (skip-chars-forward " \t\n")
-  (when (looking-at (concat Info-menu-entry-name-re ":\\(:\\|"
+  (when (looking-at (concat (if index-node
+                                Info-index-entry-name-re
+                              Info-menu-entry-name-re
+                              ) ":\\(:\\|"
                             (Info-following-node-name-re
                              (cond
                               (index-node "^,\t\n")
@@ -2741,7 +2747,9 @@ Info-complete-menu-item
      (t
       (let ((pattern (concat "\n\\* +\\("
                              (regexp-quote string)
-                             Info-menu-entry-name-re "\\):"
+                             (if (Info-index-node)
+                                 Info-index-entry-name-re
+                               Info-menu-entry-name-re) "\\):"
                              Info-node-spec-re))
             completions
             (complete-nodes Info-complete-nodes))
@@ -3966,7 +3974,8 @@ Info-try-follow-nearest-node
                 (setq node t))
             (setq node nil))))
      ;; menu item: node name
-     ((setq node (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::"))
+     ((setq node (unless (Info-index-node)
+                   (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::")))
       (Info-goto-node node fork))
      ;; menu item: node name or index entry
      ((Info-get-token (point) "\\* +" "\\* +\\(.*\\): ")
@@ -4929,7 +4938,9 @@ Info-fontify-node
         (let ((n 0)
               cont)
           (while (re-search-forward
-                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" Info-menu-entry-name-re "\\)\\(:"
+                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" (if (Info-index-node)
+                                                            Info-index-entry-name-re
+                                                          Info-menu-entry-name-re) "\\)\\(:"
                           Info-node-spec-re "\\([ \t]*\\)\\)\\)")
                   nil t)
             (when (match-beginning 1)
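[Editorial illustration, not part of the original message.]  To get a
feel for what the new Info-index-entry-name-re accepts, the regexp can
be transliterated into Python's re syntax and tried on a few entries.
This is only a rough sketch: it assumes Python's matcher behaves like
Emacs's for this particular pattern, it looks at the entry name alone
(the patch concatenates further regexps for the node spec), and
entry_name is a hypothetical helper.

```python
import re

# Transliteration of the Emacs regexp
#   "\\(?:[^:]\\|:[^,.;() \t\n]\\)*"
# into Python syntax: any non-colon character, or a colon that is not
# followed by one of , . ; ( ) space tab newline.
INDEX_ENTRY_NAME = r"(?:[^:]|:[^,.;() \t\n])*"

# Entry name followed by the terminating colon, as in "* NAME:".
ENTRY = re.compile(r"\* +(" + INDEX_ENTRY_NAME + r"):")


def entry_name(line):
    """Return the entry name the pattern extracts, or None."""
    m = ENTRY.match(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for line in (
        "* a::b: a colon b. (line 129)",
        "* aaa::bbb:Node 1. (line 2)",
        "* g: h: One. (line 3)",
    ):
        print(repr(entry_name(line)))
    # prints 'a::b', then 'aaa::bbb', then 'g'
```

Under these assumptions the double-colon entries come out whole, while
an entry containing colon-space ("g: h") is still cut at the first ": ",
unlike with the last-colon rule from the original report.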
* bug#34023: Support double colons in Info index entries
From: Drew Adams @ 2019-01-11  0:28 UTC
  To: Juri Linkov, Gavin Smith; +Cc: 34023, bug-texinfo

> The Emacs Info mode supports single colons in index
> entries as long as they are not followed by a space.

I thought they were verboten altogether.

Does this mean that we can finally have index
entries such as `:type'?  That would be good.
* bug#34023: Support double colons in Info index entries
From: Gavin Smith @ 2019-01-11 19:46 UTC
  To: Juri Linkov; +Cc: 34023, bug-texinfo

On Fri, Jan 11, 2019 at 02:04:32AM +0200, Juri Linkov wrote:
> The following patch handles the cases that you presented,
> but it's hard to predict what other cases it might break.
>
> Do you have a sample test file that covers different cases?
> We could add such a file to Emacs regression tests.

I've attached a file that includes different possibilities.

> I have to say that the current regexp-based parsing is
> an inherently fragile approach.  Do you think it would be possible
> to add more markup to Info files instead of relying on regexps?

I don't understand.  Whatever markup is added has to be read somehow,
with regexp or other.

> Better yet would be to read Info manuals in HTML format in the Info reader.
> That would allow extracting all information unambiguously.

That would be a different project with several unresolved questions; this
could be the way forward in the long term.  I would be opposed to making
the standalone info program read HTML as this would be a complete
rewrite of the program and there are probably better ways of dealing
with it.

> diff --git a/lisp/info.el b/lisp/info.el
> index 6038273c37..2f7e293297 100644
> --- a/lisp/info.el
> +++ b/lisp/info.el
> @@ -2664,9 +2664,15 @@ Info-menu-entry-name-re
>  Because of ambiguities, this should be concatenated with something like
>  `:' and `Info-following-node-name-re'.")
>
> +(defconst Info-index-entry-name-re "\\(?:[^:]\\|:[^,.;() \t\n]\\)*"
> +  "Regexp that matches an index entry name possibly including a colon.")
> +
>  (defun Info-extract-menu-node-name (&optional multi-line index-node)
>    (skip-chars-forward " \t\n")
> -  (when (looking-at (concat Info-menu-entry-name-re ":\\(:\\|"
> +  (when (looking-at (concat (if index-node
> +                                Info-index-entry-name-re
> +                              Info-menu-entry-name-re
> +                              ) ":\\(:\\|"
>                              (Info-following-node-name-re
>                               (cond
>                                (index-node "^,\t\n")
> @@ -2741,7 +2747,9 @@ Info-complete-menu-item
>       (t
>        (let ((pattern (concat "\n\\* +\\("
>                               (regexp-quote string)
> -                             Info-menu-entry-name-re "\\):"
> +                             (if (Info-index-node)
> +                                 Info-index-entry-name-re
> +                               Info-menu-entry-name-re) "\\):"
>                               Info-node-spec-re))
>              completions
>              (complete-nodes Info-complete-nodes))
> @@ -3966,7 +3974,8 @@ Info-try-follow-nearest-node
>                  (setq node t))
>              (setq node nil))))
>       ;; menu item: node name
> -     ((setq node (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::"))
> +     ((setq node (unless (Info-index-node)
> +                   (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::")))
>        (Info-goto-node node fork))
>       ;; menu item: node name or index entry
>       ((Info-get-token (point) "\\* +" "\\* +\\(.*\\): ")
> @@ -4929,7 +4938,9 @@ Info-fontify-node
>          (let ((n 0)
>                cont)
>            (while (re-search-forward
> -                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" Info-menu-entry-name-re "\\)\\(:"
> +                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" (if (Info-index-node)
> +                                                            Info-index-entry-name-re
> +                                                          Info-menu-entry-name-re) "\\)\\(:"
>                            Info-node-spec-re "\\([ \t]*\\)\\)\\)")
>                    nil t)
>              (when (match-beginning 1)
* bug#34023: Support double colons in Info index entries
From: Gavin Smith @ 2019-01-11 19:49 UTC
  To: Juri Linkov, 34023, bug-texinfo

On Fri, Jan 11, 2019 at 07:46:31PM +0000, Gavin Smith wrote:
> I've attached a file that includes different possibilities.

Attaching file.

[-- Attachment #2: index-test-cases.info --]
[-- Type: text/plain, Size: 1066 bytes --]

\x1f
Node: Top

top node

* Menu:

* Node 1::
* Regular node::
* Index without tag::
* Index with tag::

\x1f
Node: Node 1, Up: Top, Next: Regular node

node 1

\x1f
Node: Regular node, Next: Index without tag, Up: Top

This node is not an index.

* Menu:

* a2:Node 1.
* a1:Node 1. :comment
* a1:Node 1. comment
* aaa::bbb:Node 1. (line 2)
* :aaa::bbb:Node 1. (line 2)
* ::Node 1. (line 2)
* a: b:Node 1. (line 2)

\x1f
Node: Index without tag, Next: Index with tag, Prev: Regular node, Up: Top

"Index" in the node name but no tag.

* Menu:

* a2:Node 1.
* a1:Node 1. :comment
* a1:Node 1. comment
* aaa::bbb:Node 1. (line 2)
* :aaa::bbb:Node 1. (line 2)
* ::Node 1. (line 2)
* a: b:Node 1. (line 2)

\x1f
Node: Index with tag, Prev: Index without tag, Up: Top

\0\b[index\0\b]
Note this index tag is needed for the index entry to be properly
parsed.

* Menu:

* a2:Node 1.
* a1:Node 1. :comment
* a1:Node 1. comment
* aaa::bbb:Node 1. (line 2)
* :aaa::bbb:Node 1. (line 2)
* ::Node 1. (line 2)
* a: b:Node 1. (line 2)
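[Editorial illustration, not part of the original message.]  The test
file above distinguishes nodes that merely have "Index" in their name
from nodes that carry the index tag.  A minimal Python sketch of that
check follows.  It assumes nodes are separated by the 0x1F byte and that
the tag is the byte sequence NUL BS "[index" NUL BS "]" (shown as
\0\b[index\0\b] in the attachment); index_nodes is a hypothetical helper
and not part of Emacs or Texinfo.

```python
import re

# Assumed on-disk form of the index tag: NUL, BS, "[index", NUL, BS, "]".
INDEX_TAG = "\x00\x08[index\x00\x08]"


def index_nodes(info_text):
    """Return the names of the nodes in INFO_TEXT that carry the index tag."""
    names = []
    # Nodes are assumed to be separated by the 0x1F (unit separator) byte.
    for chunk in info_text.split("\x1f"):
        header, _, body = chunk.lstrip("\n").partition("\n")
        # The node name runs from "Node:" up to the next comma.
        m = re.search(r"Node:\s*([^,\n]+)", header)
        if m and INDEX_TAG in body:
            names.append(m.group(1).strip())
    return names


if __name__ == "__main__":
    sample = (
        "\x1f\nNode: Index without tag, Up: Top\n\n* Menu:\n* a2:Node 1.\n"
        "\x1f\nNode: Index with tag, Up: Top\n\n"
        + INDEX_TAG
        + "\n* Menu:\n* a2:Node 1.\n"
    )
    print(index_nodes(sample))
    # prints ['Index with tag']
```

A reader following the proposal would apply the last-colon rule only to
the nodes this check reports.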
* bug#34023: Support double colons in Info index entries
From: Juri Linkov @ 2019-01-13  0:55 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

>> The following patch handles the cases that you presented,
>> but it's hard to predict what other cases it might break.
>>
>> Do you have a sample test file that covers different cases?
>> We could add such a file to Emacs regression tests.
>
> I've attached a file that includes different possibilities.

Thanks.

>> I have to say that the current regexp-based parsing is
>> an inherently fragile approach.  Do you think it would be possible
>> to add more markup to Info files instead of relying on regexps?
>
> I don't understand.  Whatever markup is added has to be read somehow,
> with regexp or other.

This is a hint for using more XML-like markup languages
with more reliable parsing.

>> Better yet would be to read Info manuals in HTML format in the Info reader.
>> That would allow extracting all information unambiguously.
>
> That would be a different project with several unresolved questions; this
> could be the way forward in the long term.  I would be opposed to making
> the standalone info program read HTML as this would be a complete
> rewrite of the program and there are probably better ways of dealing
> with it.

Maybe not a rewrite, but just adding an HTML "add-on" to the info program.
* bug#34023: Support double colons in Info index entries
From: Glenn Morris @ 2019-01-11  0:53 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

Gavin Smith wrote:

> This is what is implemented in the standalone info browser (since
> change on 2017-04-08).

"Defining the Entries of an Index" in the Texinfo manual continues to
say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".
* bug#34023: Support double colons in Info index entries
From: Gavin Smith @ 2019-01-11 20:13 UTC
  To: Glenn Morris; +Cc: 34023, bug-texinfo

On Thu, Jan 10, 2019 at 07:53:52PM -0500, Glenn Morris wrote:
> Gavin Smith wrote:
>
> > This is what is implemented in the standalone info browser (since
> > change on 2017-04-08).
>
> "Defining the Entries of an Index" in the Texinfo manual continues to
> say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".

Even if Info mode and the standalone Info browser are changed to
support colons in index entries, people running older versions of these
won't be able to read them.  However, texi2any does output the colon in
the index entry without complaint.  See attached Texinfo input and Info
output.  Newer versions of 'info' can deal with the colons in the index
entries that are output here.

[-- Attachment #2: colon-index.info --]
[-- Type: text/plain, Size: 924 bytes --]

This is colon-index.info, produced by texi2any version 6.5.90 from
colon-index.texi.

\x1f
File: colon-index.info, Node: Top, Next: One, Up: (dir)

* Menu:

* One::
* Concept Index::

\x1f
File: colon-index.info, Node: One, Next: Concept Index, Prev: Top, Up: Top

node one

\x1f
File: colon-index.info, Node: Concept Index, Prev: One, Up: Top

\0\b[index\0\b]
* Menu:

* :: One. (line 3)
* :a: One. (line 3)
* b:c: One. (line 3)
* d::e: One. (line 3)
* f :d: One. (line 3)
* g: h: One. (line 3)

\x1f
Tag Table:
Node: Top\x7f86
Node: One\x7f184
Node: Concept Index\x7f276
\x1f
End Tag Table

[-- Attachment #3: colon-index.texi --]
[-- Type: application/x-texinfo, Size: 203 bytes --]
* bug#34023: Support double colons in Info index entries
From: Gavin Smith @ 2019-01-11 20:14 UTC
  To: Glenn Morris, bug-texinfo, 34023

On Fri, Jan 11, 2019 at 08:13:23PM +0000, Gavin Smith wrote:
> On Thu, Jan 10, 2019 at 07:53:52PM -0500, Glenn Morris wrote:
> > Gavin Smith wrote:
> >
> > > This is what is implemented in the standalone info browser (since
> > > change on 2017-04-08).
> >
> > "Defining the Entries of an Index" in the Texinfo manual continues to
> > say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".
>
> Even if Info mode and the standalone Info browser are changed to
> support colons in index entries, people running older versions of these
> won't be able to read them.  However, texi2any does output the colon in
> the index entry without complaint.  See attached Texinfo input and Info
> output.  Newer versions of 'info' can deal with the colons in the index
> entries that are output here.
>

There should still be a warning about this in the Texinfo manual, but it
could be toned down.
* bug#34023: Support double colons in Info index entries
From: Glenn Morris @ 2019-01-11 20:32 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

Gavin Smith wrote:

> Even if Info mode and the standalone Info browser are changed to
> support colons in index entries, people running older versions of these
> won't be able to read them.

Sure.  However, if Texinfo is intending to support them from version X,
IMO it should document that.

> However, texi2any does output the colon in the index entry without
> complaint.

Personally I think this is a bug, but Texinfo's previous maintainer
disagreed about what warnings were appropriate.

http://lists.gnu.org/r/bug-texinfo/2014-02/msg00029.html
* bug#34023: Support double colons in Info index entries
From: Gavin Smith @ 2019-01-16 19:17 UTC
  To: Glenn Morris; +Cc: 34023, bug-texinfo

On Fri, Jan 11, 2019 at 03:32:35PM -0500, Glenn Morris wrote:
> Gavin Smith wrote:
>
> > Even if Info mode and the standalone Info browser are changed to
> > support colons in index entries, people running older versions of these
> > won't be able to read them.
>
> Sure.  However, if Texinfo is intending to support them from version X,
> IMO it should document that.

I changed the wording a bit in git revision 3381bcb.
Thread overview: 11+ messages

2019-01-09 21:14 Support double colons in Info index entries Gavin Smith
2019-01-11  0:04 ` bug#34023: " Juri Linkov
2019-01-11  0:28   ` Drew Adams
2019-01-11 19:46   ` Gavin Smith
     [not found]   ` <20190111194631.GA14925@darkstar>
2019-01-11 19:49     ` Gavin Smith
2019-01-13  0:55     ` Juri Linkov
2019-01-11  0:53 ` Glenn Morris
2019-01-11 20:13   ` Gavin Smith
2019-01-11 20:14     ` Gavin Smith
2019-01-11 20:32     ` Glenn Morris
     [not found]     ` <hek1jbkk9o.fsf@fencepost.gnu.org>
2019-01-16 19:17       ` Gavin Smith
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git