unofficial mirror of bug-gnu-emacs@gnu.org 
* Support double colons in Info index entries
@ 2019-01-09 21:14 Gavin Smith
  2019-01-11  0:04 ` bug#34023: " Juri Linkov
  2019-01-11  0:53 ` Glenn Morris
  0 siblings, 2 replies; 11+ messages in thread
From: Gavin Smith @ 2019-01-09 21:14 UTC
  To: bug-gnu-emacs; +Cc: bug-texinfo

Emacs version checked: 26.1.

In the Info format colons are special, and for this reason, there is 
limited support for colons in index entries.  The Emacs Info mode 
supports single colons in index entries as long as they are not followed 
by a space.

There is this comment at the start of info.el:

;; Note that nowadays we expect Info files to be made using makeinfo.
;; In particular we make these assumptions:
;;  - a menu item MAY contain colons but not colon-space ": "
;;  - a menu item ending with ": " (but not ":: ") is an index entry
;;  - a node name MAY NOT contain a colon
;; This distinction is to support indexing of computer programming
;; language terms that may contain ":" but not ": ".

It doesn't state this, but when I tested it, double colons don't work even 
if they are not followed by a space.

There is a fairly simple solution to this problem that I haven't seen 
suggested in all the messages posted on this topic in the mailing list 
archives. In index nodes only (which have a special marker included, 
^@^H[index^@^H]), use a colon to terminate the text of the index entry, 
but instead of looking for the first colon in the line, look for the 
last.  So this entry:

* a::b:  a colon b.  (line 129)

would refer to line 129 of the node "a colon b".  This is possible 
because node names cannot contain colons.  This restriction is not too 
important, whereas the inability to index items containing colons is 
quite important.  This is what is implemented in the standalone info 
browser (since change on 2017-04-08).

This change shouldn't be made for all nodes, because the comment after 
the closing '.' could contain a colon:

* label: node.  comment: with a colon.

This shouldn't be interpreted as referring to a node "with a colon".

However, the "(line ...)" comment can't contain a colon.
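
The last-colon rule described above can be sketched in a few lines.  This
is only an illustration in Python, not the code used by the standalone
info browser or by Emacs; the names parse_index_entry, is_index_node and
INDEX_MARKER are hypothetical, and the sketch assumes an entry of the
form "* TEXT: NODE.  (line N)" on a single line.

```python
import re

# The index-node marker mentioned above: ^@^H[index^@^H].
INDEX_MARKER = "\x00\x08[index\x00\x08]"

# Everything after the terminating colon: a node name, a closing period,
# and an optional "(line N)" comment.
_TARGET_RE = re.compile(
    r"\s*(?P<node>.*?)\.\s*(?:\(line\s+(?P<line>\d+)\))?\s*$"
)


def is_index_node(node_text):
    """Return True if the node text carries the index marker."""
    return INDEX_MARKER in node_text


def parse_index_entry(entry):
    """Split an index entry at its LAST colon.

    Returns (entry_text, node_name, line_number_or_None), or None if the
    line does not look like an index entry.  Only meaningful for lines
    taken from a node for which is_index_node() is true.
    """
    if not entry.startswith("* "):
        return None
    body = entry[2:].lstrip(" ")
    last_colon = body.rfind(":")
    if last_colon < 0:
        return None
    match = _TARGET_RE.match(body[last_colon + 1:])
    if match is None:
        return None
    line = int(match.group("line")) if match.group("line") else None
    return body[:last_colon], match.group("node"), line
```

Applied to "* a::b:  a colon b.  (line 129)" this yields the entry text
"a::b", the node "a colon b" and line 129.  Applied to the regular menu
line "* label: node.  comment: with a colon." it would wrongly pick
"with a colon" as the node, which is why the rule has to be limited to
index nodes.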

I'm not familiar enough with Emacs Lisp to propose a patch to implement 
this change myself.

The standalone info program also implemented a quoting mechanism 
(surrounding the text with a pair of 0x7F bytes) to allow nearly all 
characters to be included in node names and index entries.  This has 
never been implemented in Emacs Info and has never been used by default 
in texi2any's output.  I think my suggestion above would be sufficient 
and would work with existing Info files and versions of 
texi2any/makeinfo without anything breaking.  The quoting mechanism could 
potentially be removed from texi2any and info as nobody has ever used it 
and it makes things more complicated for no reason.




* bug#34023: Support double colons in Info index entries
  2019-01-09 21:14 Support double colons in Info index entries Gavin Smith
@ 2019-01-11  0:04 ` Juri Linkov
  2019-01-11  0:28   ` Drew Adams
                     ` (2 more replies)
  2019-01-11  0:53 ` Glenn Morris
  1 sibling, 3 replies; 11+ messages in thread
From: Juri Linkov @ 2019-01-11  0:04 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

[-- Attachment #1: Type: text/plain, Size: 2186 bytes --]

Hi Gavin,

> In the Info format colons are special, and for this reason, there is 
> limited support for colons in index entries.  The Emacs Info mode 
> supports single colons in index entries as long as they are not followed 
> by a space.

Thanks for the detailed description.

> It doesn't state this, but when I tested it, double colons don't work even 
> if they are not followed by a space.
>
> There is a fairly simple solution to this problem that I haven't seen 
> suggested in all the messages posted on this topic in the mailing list 
> archives. In index nodes only (which have a special marker included, 
> ^@^H[index^@^H]), use a colon to terminate the text of the index entry, 
> but instead of looking for the first colon in the line, look for the 
> last.  So this entry:
>
> * a::b:  a colon b.  (line 129)
>
> would refer to line 129 of the node "a colon b".  This is possible 
> because node names cannot contain colons.  This restriction is not too 
> important, whereas the inability to index items containing colons is 
> quite important.  This is what is implemented in the standalone info 
> browser (since change on 2017-04-08).

The following patch handles the cases that you presented,
but it's hard to predict what other cases it might break.

Do you have a sample test file that covers different cases?
We could add such a file to the Emacs regression tests.

> This change shouldn't be made for all nodes, because the comment after 
> the closing '.' could contain a colon:
>
> * label: node.  comment: with a colon.
>
> This shouldn't be interpreted as referring to a node "with a colon".
>
> However, the "(line ...)" comment can't contain a colon.

The following change is made only for index nodes.

I have to say that the current regexp-based parsing is
an inherently fragile approach.  Do you think it would be possible
to add more markup to Info files instead of relying on regexps?

Just as index nodes have the special marker ^@^H[index^@^H],
maybe markers could be added to identify index entries,
node references, and line numbers?

Better yet would be to read Info manuals in HTML format in the Info reader.
That would allow extracting all information unambiguously.


[-- Attachment #2: info.el.support-double-colons-in-Info-index-entries.patch --]
[-- Type: text/x-diff, Size: 2523 bytes --]

diff --git a/lisp/info.el b/lisp/info.el
index 6038273c37..2f7e293297 100644
--- a/lisp/info.el
+++ b/lisp/info.el
@@ -2664,9 +2664,15 @@ Info-menu-entry-name-re
 Because of ambiguities, this should be concatenated with something like
 `:' and `Info-following-node-name-re'.")
 
+(defconst Info-index-entry-name-re "\\(?:[^:]\\|:[^,.;() \t\n]\\)*"
+  "Regexp that matches an index entry name possibly including a colon.")
+
 (defun Info-extract-menu-node-name (&optional multi-line index-node)
   (skip-chars-forward " \t\n")
-  (when (looking-at (concat Info-menu-entry-name-re ":\\(:\\|"
+  (when (looking-at (concat (if index-node
+                                Info-index-entry-name-re
+                                Info-menu-entry-name-re
+                              ) ":\\(:\\|"
 			    (Info-following-node-name-re
                              (cond
                               (index-node "^,\t\n")
@@ -2741,7 +2747,9 @@ Info-complete-menu-item
          (t
           (let ((pattern (concat "\n\\* +\\("
                                  (regexp-quote string)
-                                 Info-menu-entry-name-re "\\):"
+                                 (if (Info-index-node)
+                                     Info-index-entry-name-re
+                                   Info-menu-entry-name-re) "\\):"
                                  Info-node-spec-re))
                 completions
                 (complete-nodes Info-complete-nodes))
@@ -3966,7 +3974,8 @@ Info-try-follow-nearest-node
 	      (setq node t))
 	  (setq node nil))))
      ;; menu item: node name
-     ((setq node (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::"))
+     ((setq node (unless (Info-index-node)
+                   (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::")))
       (Info-goto-node node fork))
      ;; menu item: node name or index entry
      ((Info-get-token (point) "\\* +" "\\* +\\(.*\\): ")
@@ -4929,7 +4938,9 @@ Info-fontify-node
         (let ((n 0)
               cont)
           (while (re-search-forward
-                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" Info-menu-entry-name-re "\\)\\(:"
+                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" (if (Info-index-node)
+                                                            Info-index-entry-name-re
+                                                          Info-menu-entry-name-re) "\\)\\(:"
                           Info-node-spec-re "\\([ \t]*\\)\\)\\)")
                   nil t)
 	    (when (match-beginning 1)


* bug#34023: Support double colons in Info index entries
  2019-01-11  0:04 ` bug#34023: " Juri Linkov
@ 2019-01-11  0:28   ` Drew Adams
  2019-01-11 19:46   ` Gavin Smith
       [not found]   ` <20190111194631.GA14925@darkstar>
  2 siblings, 0 replies; 11+ messages in thread
From: Drew Adams @ 2019-01-11  0:28 UTC
  To: Juri Linkov, Gavin Smith; +Cc: 34023, bug-texinfo

> The Emacs Info mode supports single colons in index
> entries as long as they are not followed by a space.

I thought they were verboten altogether.  Does this
mean that we can finally have index entries such as
`:type'?  That would be good.






* bug#34023: Support double colons in Info index entries
  2019-01-09 21:14 Support double colons in Info index entries Gavin Smith
  2019-01-11  0:04 ` bug#34023: " Juri Linkov
@ 2019-01-11  0:53 ` Glenn Morris
  2019-01-11 20:13   ` Gavin Smith
  1 sibling, 1 reply; 11+ messages in thread
From: Glenn Morris @ 2019-01-11  0:53 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

Gavin Smith wrote:

> This is what is implemented in the standalone info browser (since
> change on 2017-04-08).

"Defining the Entries of an Index" in the Texinfo manual continues to
say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".






* bug#34023: Support double colons in Info index entries
  2019-01-11  0:04 ` bug#34023: " Juri Linkov
  2019-01-11  0:28   ` Drew Adams
@ 2019-01-11 19:46   ` Gavin Smith
       [not found]   ` <20190111194631.GA14925@darkstar>
  2 siblings, 0 replies; 11+ messages in thread
From: Gavin Smith @ 2019-01-11 19:46 UTC
  To: Juri Linkov; +Cc: 34023, bug-texinfo

On Fri, Jan 11, 2019 at 02:04:32AM +0200, Juri Linkov wrote:
> The following patch handles the cases that you presented,
> but it's hard to predict what other cases it might break.
> 
> Do you have a sample test file that covers different cases?
> We could add such a file to the Emacs regression tests.

I've attached a file that includes different possibilities.

> I have to say that the current regexp-based parsing is
> an inherently fragile approach.  Do you think it would be possible
> to add more markup to Info files instead of relying on regexps?

I don't understand.  Whatever markup is added has to be read somehow, 
with regexp or other.

> Better yet would be to read Info manuals in HTML format in the Info reader.
> That would allow extracting all information unambiguously.

That would be a different project with several unresolved questions; this 
could be the way forward in the long term.  I would be opposed to making 
the standalone info program read HTML as this would be a complete 
rewrite of the program and there are probably better ways of dealing 
with it.









* bug#34023: Support double colons in Info index entries
       [not found]   ` <20190111194631.GA14925@darkstar>
@ 2019-01-11 19:49     ` Gavin Smith
  2019-01-13  0:55     ` Juri Linkov
  1 sibling, 0 replies; 11+ messages in thread
From: Gavin Smith @ 2019-01-11 19:49 UTC
  To: Juri Linkov, 34023, bug-texinfo

[-- Attachment #1: Type: text/plain, Size: 140 bytes --]

On Fri, Jan 11, 2019 at 07:46:31PM +0000, Gavin Smith wrote:
> I've attached a file that includes different possibilities.

Attaching file.

[-- Attachment #2: index-test-cases.info --]
[-- Type: text/plain, Size: 1066 bytes --]

\x1f
Node: Top

top node

* Menu:

* Node 1::
* Regular node::
* Index without tag::
* Index with tag::

\x1f
Node: Node 1, Up: Top, Next: Regular node

node 1
\x1f
Node: Regular node, Next: Index without tag, Up: Top

This node is not an index.

* Menu:

* a2:Node 1.
* a1:Node 1. :comment
* a1:Node 1. comment
* aaa::bbb:Node 1. (line 2)
* :aaa::bbb:Node 1. (line 2)
* ::Node 1. (line 2)
* a: b:Node 1. (line 2)

\x1f
Node: Index without tag, Next: Index with tag, Prev: Regular node, Up: Top

"Index" in the node name but no tag.

* Menu:

* a2:Node 1.
* a1:Node 1. :comment
* a1:Node 1. comment
* aaa::bbb:Node 1. (line 2)
* :aaa::bbb:Node 1. (line 2)
* ::Node 1. (line 2)
* a: b:Node 1. (line 2)


\x1f
Node: Index with tag, Prev: Index without tag, Up: Top

\0\b[index\0\b]
Note this index tag is needed for the index entry to be properly parsed.

* Menu:

* a2:Node 1.
* a1:Node 1. :comment
* a1:Node 1. comment
* aaa::bbb:Node 1. (line 2)
* :aaa::bbb:Node 1. (line 2)
* ::Node 1. (line 2)
* a: b:Node 1. (line 2)



* bug#34023: Support double colons in Info index entries
  2019-01-11  0:53 ` Glenn Morris
@ 2019-01-11 20:13   ` Gavin Smith
  2019-01-11 20:14     ` Gavin Smith
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Gavin Smith @ 2019-01-11 20:13 UTC
  To: Glenn Morris; +Cc: 34023, bug-texinfo

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

On Thu, Jan 10, 2019 at 07:53:52PM -0500, Glenn Morris wrote:
> Gavin Smith wrote:
> 
> > This is what is implemented in the standalone info browser (since
> > change on 2017-04-08).
> 
> "Defining the Entries of an Index" in the Texinfo manual continues to
> say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".

Even if Info mode and the standalone Info browser are changed to 
support colons in index entries, people running older versions of these 
won't be able to read them.  However, texi2any does output the colon in 
the index entry without complaint.  See attached Texinfo input and Info 
output.  Newer versions of 'info' can deal with the colons in the index 
entries that are output here.


[-- Attachment #2: colon-index.info --]
[-- Type: text/plain, Size: 924 bytes --]

This is colon-index.info, produced by texi2any version 6.5.90 from
colon-index.texi.

\x1f
File: colon-index.info,  Node: Top,  Next: One,  Up: (dir)

* Menu:

* One::
* Concept Index::

\x1f
File: colon-index.info,  Node: One,  Next: Concept Index,  Prev: Top,  Up: Top

node one

\x1f
File: colon-index.info,  Node: Concept Index,  Prev: One,  Up: Top

\0\b[index\0\b]
* Menu:

* ::                                     One.                   (line 3)
* :a:                                    One.                   (line 3)
* b:c:                                   One.                   (line 3)
* d::e:                                  One.                   (line 3)
* f :d:                                  One.                   (line 3)
* g: h:                                  One.                   (line 3)


\x1f
Tag Table:
Node: Top\x7f86
Node: One\x7f184
Node: Concept Index\x7f276
\x1f
End Tag Table

[-- Attachment #3: colon-index.texi --]
[-- Type: application/x-texinfo, Size: 203 bytes --]


* bug#34023: Support double colons in Info index entries
  2019-01-11 20:13   ` Gavin Smith
@ 2019-01-11 20:14     ` Gavin Smith
  2019-01-11 20:32     ` Glenn Morris
       [not found]     ` <hek1jbkk9o.fsf@fencepost.gnu.org>
  2 siblings, 0 replies; 11+ messages in thread
From: Gavin Smith @ 2019-01-11 20:14 UTC
  To: Glenn Morris, bug-texinfo, 34023

On Fri, Jan 11, 2019 at 08:13:23PM +0000, Gavin Smith wrote:
> On Thu, Jan 10, 2019 at 07:53:52PM -0500, Glenn Morris wrote:
> > Gavin Smith wrote:
> > 
> > > This is what is implemented in the standalone info browser (since
> > > change on 2017-04-08).
> > 
> > "Defining the Entries of an Index" in the Texinfo manual continues to
> > say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".
> 
> Even if Info mode and the standalone Info browser are changed to 
> support colons in index entries, people running older versions of these 
> won't be able to read them.  However, texi2any does output the colon in 
> the index entry without complaint.  See attached Texinfo input and Info 
> output.  Newer versions of 'info' can deal with the colons in the index 
> entries that are output here.
> 

There should still be a warning about this in the Texinfo manual, but it 
could be toned down.






* bug#34023: Support double colons in Info index entries
  2019-01-11 20:13   ` Gavin Smith
  2019-01-11 20:14     ` Gavin Smith
@ 2019-01-11 20:32     ` Glenn Morris
       [not found]     ` <hek1jbkk9o.fsf@fencepost.gnu.org>
  2 siblings, 0 replies; 11+ messages in thread
From: Glenn Morris @ 2019-01-11 20:32 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

Gavin Smith wrote:

> Even if Info mode and the standalone Info browser are changed to 
> support colons in index entries, people running older versions of these 
> won't be able to read them.

Sure. However, if Texinfo is intending to support them from version X,
IMO it should document that.

> However, texi2any does output the colon in the index entry without
> complaint.

Personally I think this is a bug, but Texinfo's previous maintainer
disagreed about what warnings were appropriate.

http://lists.gnu.org/r/bug-texinfo/2014-02/msg00029.html






* bug#34023: Support double colons in Info index entries
       [not found]   ` <20190111194631.GA14925@darkstar>
  2019-01-11 19:49     ` Gavin Smith
@ 2019-01-13  0:55     ` Juri Linkov
  1 sibling, 0 replies; 11+ messages in thread
From: Juri Linkov @ 2019-01-13  0:55 UTC
  To: Gavin Smith; +Cc: 34023, bug-texinfo

>> The following patch handles the cases that you presented,
>> but it's hard to predict what other cases it might break.
>>
>> Do you have a sample test file that covers different cases?
>> We could add such a file to the Emacs regression tests.
>
> I've attached a file that includes different possibilities.

Thanks.

>> I have to say that the current regexp-based parsing is
>> an inherently fragile approach.  Do you think it would be possible
>> to add more markup to Info files instead of relying on regexps?
>
> I don't understand.  Whatever markup is added has to be read somehow,
> with regexp or other.

I was hinting at using a more XML-like markup language, which can be
parsed more reliably.

>> Better yet would be to read Info manuals in HTML format in the Info reader.
>> That would allow extracting all information unambiguously.
>
> That would be a different project with several unresolved questions; this
> could be the way forward in the long term.  I would be opposed to making
> the standalone info program read HTML as this would be a complete
> rewrite of the program and there are probably better ways of dealing
> with it.

Maybe not a rewrite, but just adding an HTML "add-on" to the info program.






* bug#34023: Support double colons in Info index entries
       [not found]     ` <hek1jbkk9o.fsf@fencepost.gnu.org>
@ 2019-01-16 19:17       ` Gavin Smith
  0 siblings, 0 replies; 11+ messages in thread
From: Gavin Smith @ 2019-01-16 19:17 UTC
  To: Glenn Morris; +Cc: 34023, bug-texinfo

On Fri, Jan 11, 2019 at 03:32:35PM -0500, Glenn Morris wrote:
> Gavin Smith wrote:
> 
> > Even if Info mode and the standalone Info browser are changed to 
> > support colons in index entries, people running older versions of these 
> > won't be able to read them.
> 
> Sure. However, if Texinfo is intending to support them from version X,
> IMO it should document that.

I changed the wording a bit in git revision 3381bcb.






