* [PATCH 0/2] Add 'snarfi' guild script to convert SRFI HTML specifications to Texinfo
@ 2023-12-03 16:37 Maxim Cournoyer
2023-12-03 16:37 ` [PATCH 1/2] .dir-locals: Add indentation rule for sxml-match syntax Maxim Cournoyer
2023-12-03 16:37 ` [PATCH 2/2] scripts: Add SRFI documentation HTML -> Texinfo snarfer Maxim Cournoyer
0 siblings, 2 replies; 3+ messages in thread
From: Maxim Cournoyer @ 2023-12-03 16:37 UTC (permalink / raw)
To: guile-devel; +Cc: Maxim Cournoyer
Due to challenges in the unspecified HTML format used by the various
SRFI authors, the result is not perfect, but it's still a great help
when adapting the specification HTML into Texinfo. It requires
guile-lib for its htmlprag HTML parser, which is autoloaded to avoid
introducing a hard dependency on it.
Maxim Cournoyer (2):
.dir-locals: Add indentation rule for sxml-match syntax.
scripts: Add SRFI documentation HTML -> Texinfo snarfer.
.dir-locals.el | 3 +-
NEWS | 11 +
am/bootstrap.am | 1 +
module/scripts/snarfi.scm | 637 ++++++++++++++++++++++++++++++++++++++
4 files changed, 651 insertions(+), 1 deletion(-)
create mode 100644 module/scripts/snarfi.scm
base-commit: d8df317bafcdd9fcfebb636433c4871f2fab28b2
--
2.41.0
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH 1/2] .dir-locals: Add indentation rule for sxml-match syntax.
2023-12-03 16:37 [PATCH 0/2] Add 'snarfi' guild script to convert SRFI HTML specifications to Texinfo Maxim Cournoyer
@ 2023-12-03 16:37 ` Maxim Cournoyer
2023-12-03 16:37 ` [PATCH 2/2] scripts: Add SRFI documentation HTML -> Texinfo snarfer Maxim Cournoyer
1 sibling, 0 replies; 3+ messages in thread
From: Maxim Cournoyer @ 2023-12-03 16:37 UTC (permalink / raw)
To: guile-devel; +Cc: Maxim Cournoyer
* .dir-locals.el (scheme-mode) <sxml-match>: Add indentation rule.
---
.dir-locals.el | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/.dir-locals.el b/.dir-locals.el
index 908670479..a96408dc1 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -48,7 +48,8 @@
(eval . (put '$kclause 'scheme-indent-function 1))
(eval . (put '$fun 'scheme-indent-function 1))
(eval . (put 'record-case 'scheme-indent-function 1))
- (eval . (put 'syntax-parameterize 'scheme-indent-function 1))))
+ (eval . (put 'syntax-parameterize 'scheme-indent-function 1))
+ (eval . (put 'sxml-match 'scheme-indent-function 1))))
(emacs-lisp-mode . ((indent-tabs-mode . nil)))
(texinfo-mode . ((indent-tabs-mode . nil)
(fill-column . 72))))
--
2.41.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH 2/2] scripts: Add SRFI documentation HTML -> Texinfo snarfer.
2023-12-03 16:37 [PATCH 0/2] Add 'snarfi' guild script to convert SRFI HTML specifications to Texinfo Maxim Cournoyer
2023-12-03 16:37 ` [PATCH 1/2] .dir-locals: Add indentation rule for sxml-match syntax Maxim Cournoyer
@ 2023-12-03 16:37 ` Maxim Cournoyer
1 sibling, 0 replies; 3+ messages in thread
From: Maxim Cournoyer @ 2023-12-03 16:37 UTC (permalink / raw)
To: guile-devel; +Cc: Maxim Cournoyer
* module/scripts/snarfi.scm: New file.
* am/bootstrap.am (SOURCES): Register it.
* NEWS: Add news entry.
---
NEWS | 11 +
am/bootstrap.am | 1 +
module/scripts/snarfi.scm | 637 ++++++++++++++++++++++++++++++++++++++
3 files changed, 649 insertions(+)
create mode 100644 module/scripts/snarfi.scm
diff --git a/NEWS b/NEWS
index b319404d7..db8930ff9 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,17 @@ definitely unused---this is notably the case for modules that are only
used at macro-expansion time, such as (srfi srfi-26). In those cases,
the compiler reports it as "possibly unused".
+
+** New guild command: snarfi
+
+The new ~snarfi~ guild command aims to make importing a SRFI
+specification documentation easier, by snarfing relevant documentation
+into Texinfo. It can be invoked, for example, like:
+
+ $ guild snarfi srfi-151.html
+
+Where the last argument is the SRFI specification source HTML file.
+
* Bug fixes
** (ice-9 suspendable-ports) incorrect UTF-8 decoding
diff --git a/am/bootstrap.am b/am/bootstrap.am
index a71946958..8782a7a82 100644
--- a/am/bootstrap.am
+++ b/am/bootstrap.am
@@ -307,6 +307,7 @@ SOURCES = \
scripts/api-diff.scm \
scripts/read-rfc822.scm \
scripts/snarf-guile-m4-docs.scm \
+ scripts/snarfi.scm \
scripts/autofrisk.scm \
scripts/scan-api.scm \
\
diff --git a/module/scripts/snarfi.scm b/module/scripts/snarfi.scm
new file mode 100644
index 000000000..10420d4f0
--- /dev/null
+++ b/module/scripts/snarfi.scm
@@ -0,0 +1,637 @@
+;;; snarfi --- Snarf SRFI HTML specifications into Texinfo doc
+
+;; Copyright 2023 Free Software Foundation, Inc.
+;;
+;; This program is free software; you can redistribute it and/or
+;; modify it under the terms of the GNU Lesser General Public License
+;; as published by the Free Software Foundation; either version 3, or
+;; (at your option) any later version.
+;;
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; Lesser General Public License for more details.
+;;
+;; You should have received a copy of the GNU Lesser General Public
+;; License along with this software; see the file COPYING.LESSER. If
+;; not, write to the Free Software Foundation, Inc., 51 Franklin
+;; Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+;;; Author: Maxim Cournoyer <maxim.cournoyer@gmail.com>
+
+;;; Commentary:
+
+;;; Usage: guild snarfi srfi-spec.html
+
+;;; This script takes the a SRFI HTML source file as input, parses it
+;;; and processes it to output a Texinfo that can be used as a starting
+;;; point to properly document a SRFI into Guile.
+;;;
+;;; Requirements: guile-lib (for htmlprag)
+
+;;; Tested with:
+;;; - srfi-64.html
+;;; - srfi-151.html
+;;; - srfi-160.html
+;;; - srfi-178.html
+;;; - srfi-209.html
+
+;;; Code:
+
+(define-module (scripts snarfi)
+ #:use-module (ice-9 format)
+ #:use-module (ice-9 match)
+ #:use-module (ice-9 pretty-print)
+ #:use-module (ice-9 regex)
+ #:use-module (ice-9 string-fun)
+ #:use-module (srfi srfi-1)
+ #:use-module (srfi srfi-2)
+ #:use-module (srfi srfi-26)
+ #:use-module (sxml apply-templates)
+ #:use-module (sxml fold)
+ #:use-module ((sxml xpath) #:renamer (lambda (s)
+ (if (eq? 'filter s)
+ 'xfilter
+ s)))
+ #:use-module (sxml match)
+ #:autoload (htmlprag) (html->shtml)
+ #:export (snarfi))
+
+(define %summary "Snarf SRFI HTML specifications into Texinfo doc")
+
+(define (usage)
+ "Display usage text."
+ (format #t "Usage: snarfi SRFI-SPEC.HTML
+Snarf Texinfo documentation from SRFI-SPEC.HTML, the HTML source file
+of a SRFI specification.
+
+-h, --help print this help message
+"))
+
+(define (section-of-interest? node entered?)
+ "XPath predicate to check if a h1 header is of interest.
+ENTERED? tracks whether we've already entered a section of interest."
+ (sxml-match node
+ ((h1 ,title)
+ (member title '("Abstract" "Rationale" "Specification")))
+ (,otherwise
+ entered?)))
+
+(define (find-heading-by-text text tree)
+ "Return the heading node by its TEXT."
+ (find (lambda (node)
+ (and (pair? node)
+ (member (car node) '(h1 h2 h3 h4 h5 h6))
+ (string=? text (last node))))
+ tree))
+
+(define (node->level node)
+ "Return the heading level of NODE, else #f."
+ (and-let* ((h (cond ((pair? node) (car node))
+ ((symbol? node) node)
+ (else #f)))
+ (h* (symbol->string h))
+ (m (string-match "^h([0-6])$" h*)))
+ (string->number (match:substring m 1))))
+
+(define (decrement-heading node)
+ "Decrement heading level of NODE, if a heading, else leave it untouched."
+ (sxml-match node
+ ((h2 ,value)
+ `(h1 ,value))
+ ((h3 ,value)
+ `(h2 ,value))
+ ((h4 ,value)
+ `(h3 ,value))
+ ((h5 ,value)
+ `(h4 ,value))
+ ((h6 ,value)
+ `(h5 ,value))
+ (,other other)))
+
+(define (splice-children heading tree)
+ "Discard any HEADING section and its orphaned text (between it and the
+next heading), a symbol like e.g. 'h1' or '(h2 \"Some Section\"),
+promoting its children sections in its place."
+ (unless (or (pair? heading)
+ (symbol? heading))
+ (error "heading must be a symbol or a pair"))
+
+ (define heading-level (node->level heading))
+
+ (let loop ((result '())
+ (rest tree)
+ (discard? #f)
+ (entered? #f))
+ (match rest
+ (() (reverse result))
+ ((node . tail)
+ (let* ((level (node->level node))
+ (at-section? (if (pair? heading)
+ (equal? heading node)
+ ;; heading is a symbol
+ (and (pair? node)
+ (eq? heading (car node)))))
+ (discard? (if discard?
+ (not level) ;stop discarding?
+ at-section?))
+ (leaving-section? (and entered?
+ (and level (<= level heading-level)))))
+ (loop (cond ((or at-section? ;skip section
+ discard?) ;discard orphaned node
+ result)
+ (entered? ;decrement child headings
+ (cons (decrement-heading node) result))
+ (else ;leave untouched
+ (cons node result)))
+ tail
+ discard?
+ (if entered?
+ (not leaving-section?) ;leave section?
+ at-section?))))))) ;enter section
+
+\f
+;;;
+;;; HTML conversion related.
+;;;
+
+(define (dl->@table node)
+ "Transform a <dl> SHTML NODE into a Texinfo @table."
+ (define (dl-node->alist node)
+ (let loop ((result '())
+ (current-terms '())
+ (current-descriptions '())
+ (rest (sxml-match node
+ ((dl . ,rest)
+ rest)
+ (,other (error "expected dl node, got" node)))))
+ (match rest
+ (()
+ ;; Produce last entry.
+ (reverse (cons (cons (reverse current-terms)
+ (reverse current-descriptions))
+ result)))
+ (((? string?) . rest)
+ ;; Disregard any interspersed strings, which are typically used
+ ;; for spacing purposes.
+ (loop result current-terms current-descriptions rest))
+ ((('dt term ...) . rest)
+ (let ((texi-term (string-join (map html->texinfo term) "")))
+ (if (null? current-descriptions)
+ (loop result
+ (cons texi-term current-terms)
+ current-descriptions rest)
+ ;; Produce last table item/description pair, if description was
+ ;; set.
+ (loop (cons (cons (reverse current-terms)
+ (reverse current-descriptions)) result)
+ (list texi-term) '() rest))))
+ ((('dd description ...) . rest)
+ (when (null? current-terms)
+ (error "malformed dl HTML"))
+ (loop result current-terms
+ (cons (string-join (map html->texinfo description) "")
+ current-descriptions)
+ rest)))))
+
+ (format #f "\
+@table @asis
+~{~a~}\
+@end table~%" (map (match-lambda
+ ;; Each dl "row" can have multiple terms and descriptions.
+ ((terms . descriptions)
+ (match terms
+ ((term term* ..1)
+ (format #f "\
+@item ~a
+~{@itemx ~a~^~%~}
+~{~a~}~%~%" term term* descriptions))
+ ((term)
+ (format #f "\
+@item ~a
+~{~a~}~%~%" term descriptions)))))
+ (dl-node->alist node))))
+
+(define* (html->texinfo node #:key srfi deffn?)
+ "A HTML node to Texinfo converter, applied recursively to NODE, with the
+SRFI string prefix, e.g. \"SRFI 151\", used to produce unique @node for
+sections. DEFFN? indicates whether conversion rules for use with
+'@deffn' nodes should be used."
+ (sxml-match node
+ ((h1 ,value)
+ (format #f "
+@node ~a ~a
+@subsubsection ~a ~a~%~%" srfi (string-replace-substring value "," "")
+srfi value))
+ ((h2 ,value)
+ (format #f "~%@subsubheading ~a~%~%"
+ value))
+ ;; Procedure definitions in the style of SRFI 151.
+ ((p (tt ,proc) . ,rest) (guard (and (string-prefix? "(" proc)
+ (member '(tt ")") rest)))
+ (node->definitions node #:style-hint 'srfi-151))
+ ;; Procedure definitions in the '<pre>' style of SRFI 64.
+ ((pre "(" (b ,proc) . ,rest) (guard (or (member ")" rest)
+ (member ")\n" rest)
+ (member "])\n" rest)
+ (member " ...)\n" rest)))
+ (node->definitions node #:style-hint 'srfi-64-pre))
+ ;; Procedure definitions in the '<code>' style of SRFI 64. The
+ ;; signature usually looks like (p (code "(" (b "proc") ...)), but
+ ;; since there can be trailing text describing the code element, we
+ ;; use the following catchall pattern with a guard:
+ ((p . ,rest) (guard (find (match-lambda
+ (('code "(" ('b proc) . rest)
+ #t)
+ (('code "(" ('var proc) . rest)
+ #t)
+ (_ #f))
+ rest))
+ (node->definitions node #:style-hint 'srfi-64-code))
+ ((p . ,rest) (guard (find (match-lambda
+ (('code (and (? string?)
+ (? (cut string-prefix? "(" <>)))
+ . rest)
+ (any (lambda (x)
+ (and (string? x)
+ (string-contains x ") ->")))
+ rest))
+ (_ #f))
+ rest))
+ (node->definitions node #:style-hint 'srfi-160-code))
+ ((p (code ,proc) . ,rest) (guard (and (string-prefix? "(" proc)
+ (member '(code ")") rest)))
+ (node->definitions node #:style-hint 'srfi-178-code))
+ ((p . ,rest) (guard (find (match-lambda
+ (('code proc . rest)
+ (and (string-prefix? "(" proc)
+ (any (cut string-suffix? ") -" <>)
+ (filter string? rest))))
+ (_ #f))
+ rest))
+ (node->definitions node #:style-hint 'srfi-178-code-with-return-type))
+ ((p ,(node) ...)
+ (string-join `("\n" ,@node "\n") ""))
+ ((small ,(node) ...)
+ (format #f "~{~a~}" node))
+ ((pre ,(node) ...)
+ (format #f "~%@lisp
+~{~a~}@end lisp~%" node))
+ ((a (@ (href ,href)) ,title ...)
+ (format #f "@url{~a, ~{~a~^ ~}}" href (map string-trim-both title)))
+ ((b ,value)
+ (format #f "@b{~a}" value))
+ ((br) "\n")
+ ((code (var ,value))
+ (format #f "@var{~a}" value))
+ ((code ,(node) ...)
+ (format #f "@code{~{~a~}}" node))
+ ((*COMMENT* ,value)
+ (string-append "@c " (string-replace-substring
+ (string-trim-both value) "\n" "\n@c ")))
+ ((dfn ,value)
+ (format #f "@dfn{~a}" value))
+ ((*ENTITY* "additional" "copy")
+ "@copyright{}")
+ ((*ENTITY* "additional" "mdash")
+ "---") ;em dash
+ ((*ENTITY* "additional" "nbsp")
+ "@tie{}") ;non-breakable space
+ ((*ENTITY* "additional" "rArr") ;rightwards double arrow
+ "@U{21D2}")
+ ((*ENTITY* "additional-char" ,value)
+ (format #f "@U{~x}" (string->number value)))
+ ((i ,(node) ...)
+ (format #f "@i{~{~a~}}" node))
+ ((q ,value)
+ (format #f "``~a''" value))
+ ((em ,(node) ...)
+ (guard deffn?)
+ (format #f "~{~a~^ ~}" node))
+ ((em ,(node) ...)
+ (guard (not deffn?))
+ (format #f "@emph{~{~a~}}" node))
+ ((dl . ,rest)
+ (dl->@table node))
+ ((ol (li ,(node) ...) ...)
+ (format #f "~%@enumerate
+~{@item~%~{~a~}~^~%~%~}
+@end enumerate~%" node))
+ ((ul (li ,(node) ...) ...)
+ (format #f "~%@itemize
+~{@item~%~{~a~}~^~%~%~}
+@end itemize~%" node))
+ ((sup ,value)
+ (format #f "@sup{~a}" value))
+ ((sub ,value)
+ (format #f "@sub{~a}" value))
+ ((tt ,value)
+ (format #f "@code{~a}" value))
+ ((var ,value)
+ (format #f "@var{~a}" value))
+ (,other other)))
+
+(define (process-args args)
+ "Process the arguments to have them in the format expected by Texinfo's
+@deffn, stripping extraneous metadata. SRFI is a string such as \"SRFI
+64\", used to prefix Texinfo nodes."
+ (reverse
+ ;; Convert SHTML to Texinfo, and coalesce brackets into arguments.
+ (match (fold
+ (lambda (node tree) ;fhere
+ (let ((new (string-replace-substring
+ (html->texinfo node #:deffn? #t)
+ ;; Strip closing quote (SRFI 160).
+ ") -> " "-> ")))
+ (match tree
+ (("[" . rest)
+ (cons (string-append "[" new) rest))
+ (("[[" . rest)
+ (cons (string-append "[[" new) rest))
+ (("]" . (s . tail))
+ (cons* new (string-append s "]") tail))
+ ;; SRFI 160.
+ (((and (? string? s1) (? (cut string-prefix? "]]" <>)))
+ . (s2 . tail))
+ (cons* new (string-trim-both (string-drop s1 2))
+ (string-append s2 "]]") tail))
+ (((and (? string? s1) (? (cut string-prefix? "]" <>)))
+ . (s2 . tail))
+ (cons* new (string-trim-both (string-drop s1 1))
+ (string-append s2 "]") tail))
+ ;; SRFI 178.
+ (((and (? string? s1) (? (cut string-suffix? "[" <>)))
+ . tail)
+ (cons* (string-append "[" new)
+ (string-append (string-trim-both
+ (string-drop-right s1 1)))
+ tail))
+ (() (cons new tree))
+ (other (cons new other)))))
+ '()
+ args)
+ ;; Post-process for a potential trailing ']'.
+ (("]" . (s . tail))
+ (cons (string-append s "]") tail))
+ (((and (? string? s1) (? (cut string-prefix? "]]" <>))) . (s2 . tail))
+ (cons* (string-trim-both (string-drop s1 2))
+ (string-append s2 "]]") tail))
+ (((and (? string? s1) (? (cut string-prefix? "]" <>))) . (s2 . tail))
+ (cons* (string-trim-both (string-drop s1 1))
+ (string-append s2 "]") tail))
+ (other other))))
+
+(define (sanitize-definitions-node node)
+ "Recursively sanitize arguments, removing extraneous white space."
+ (first (foldts
+ (lambda (seed node) ;fdown
+ '())
+ (lambda (seed kid-seed node) ;fup
+ (sxml-match node
+ ((*ENTITY* "additional" "nbsp")
+ seed) ;discard
+ ((i ,value)
+ (cons value seed))
+ ((var ,value)
+ (cons value seed))
+ (,other
+ (cons (reverse kid-seed) seed))))
+ (lambda (seed node) ;fhere
+ (let ((node (if (string? node)
+ (string-trim-both node)
+ node)))
+ (match node
+ ((or "..." "…")
+ (cons "@dots{}" seed))
+ ((and (? string?) (? string-null?))
+ seed) ;discard node
+ (other
+ (cons node seed)))))
+ '()
+ node)))
+
+(define* (node->definitions node #:key style-hint)
+ "Convert NODE, an SHTML expression assumed to contain
+procedure definitions, to Texinfo. STYLE-HINT may be used to provide a
+hint about which HTML scheme is being used to format the definitions."
+ ;; This is messy because SRFIs appear to all use their own special
+ ;; variant to format procedure/syntax definitions.
+ (let ((node (sanitize-definitions-node node)))
+
+ (define (deffn proc args continued?)
+ (unless proc
+ (error "expected a string for 'proc', got" proc))
+ (format #f "~%@deffn~:[~;x~] {Scheme Procedure} ~{~a ~^~}~%"
+ continued? (process-args (cons proc args))))
+
+ (if (eq? 'srfi-160-code style-hint)
+ ;; Definitions in SRFI 160 are special in that no grouping
+ ;; information can be inferred, and a return value and SRFI
+ ;; annotation are added at the end.
+ (match node
+ ((p ('code (and (? string? proc) (? (cut string-prefix? "(" <>)))
+ . rest) srfi ...)
+ (string-append (deffn (string-trim-both (string-drop proc 1))
+ (append rest srfi) #f)
+ "\
+@c FIXME: Check deffn category and adjust '@end deffn' location
+@end deffn\n\n")))
+
+ (let loop ((result '())
+ (doc '())
+ (rest (cdr node))
+ (continued? #f)
+ (proc #f)
+ (args '()))
+
+ (pk 'called-loop-with-proc proc)
+ (match rest
+ (()
+ (pk 'closing-deffn-on-empty-list)
+ (pk 'doc doc)
+ ;; The (p (code (b procedure) args ... doc ...)) style used by
+ ;; SRFI-64 allows to correctly close the definition.
+ (let ((result (if (null? doc) ;srfi-64 <code> trailing doc
+ (cons "\
+@c FIXME: Check deffn category and adjust '@end deffn' location
+@end deffn\n\n" result)
+ (cons* "@end deffn\n"
+ (html->texinfo `(p ,@(reverse doc)))
+ result))))
+ (cons "\n" (reverse result))))
+ ((head . tail)
+ (pk 'GOT-HEAD head 'REST tail)
+ (match head
+ ("(" ;skip opening parenthesis
+ (loop result doc tail continued? proc args)) ;no-op
+ ;; SRFI 64 <code> procedure definition node.
+ (('code "(" ((or 'b 'var) p) . rest)
+ (pk 'setting-proc p)
+ ;; Dump accumulated leading documentation, if any.
+ (let ((result (if (null? doc)
+ result
+ (cons (html->texinfo `(p ,@(reverse doc)))
+ result))))
+ (if proc
+ ;; New continued procedure.
+ (loop result '() (append rest tail) #t p '())
+ (loop result '() (append rest tail) continued? p '()))))
+ (('code (and (? string? p)
+ (? (cut string-prefix? "(" <>))) . rest)
+ ;; SRFI 178 style.
+ (pk 'setting-proc p)
+ ;; Dump accumulated leading documentation, if any.
+ (let ((result (if (null? doc)
+ result
+ (cons (html->texinfo `(p ,@(reverse doc)))
+ result)))
+ (p (string-trim-right (string-drop p 1))))
+ (if proc
+ ;; New continued procedure.
+ (loop result '() (append rest tail) #t p '())
+ (loop result '() (append rest tail) continued? p '()))))
+ ((or ('tt ")") ")" ")\n" "])" "])\n" " ...)" " ...)\n"
+ ('code ")")) ;end
+ (let* ((args (if (and (string? head) (string-index head #\]))
+ (cons "]" args)
+ args))
+ (args* (if (and (string? head)
+ (string-prefix? " ..." head))
+ (cons " ..." args)
+ args)))
+ (pk 'calling-deffn-in-end)
+ (loop (cons (deffn proc (reverse args*) continued?) result)
+ doc
+ tail
+ #t ;mark next definitions as continued
+ #f '())))
+ ((and (? string?) (? (cut string-suffix? ") -" <>))) ;end
+ ;; Definition ended but there is a trailing returned type
+ ;; annotation (e.g. "-> bitvector") to add as an argument.
+ (let* ((tail-length (length tail))
+ (args `(,@(if (>= tail-length 2)
+ (list (second tail))
+ '())
+ "->"
+ ,(string-drop-right head 3)
+ ,@args)))
+ (pk 'calling-deffn-in-end)
+ (loop (cons (deffn proc (reverse args) continued?) result)
+ doc
+ (if (>= tail-length 2)
+ (drop tail 2)
+ tail)
+ #t ;mark next definitions as continued
+ #f '())))
+ (('tt (and (? string?) (? (cut string-prefix? "(" <>)) p))
+ (when proc
+ (error "unexpected proc encountered while already defined" p))
+ (let ((p (string-trim-right (string-drop p 1))))
+ (pk 'setting-proc p)
+ (loop result doc tail continued? p '())))
+ (('b p) ;for SRFI 64 <pre>
+ (pk 'setting-proc p)
+ ;; Dump accumulated leading documentation, if any.
+ (let ((result (if (null? doc)
+ result
+ (cons (html->texinfo `(p ,@doc)) result))))
+ (if proc
+ (loop result '() tail #t p '()) ;new continued procedure
+ (loop result '() tail continued? p '()))))
+ (('br) ;continued?
+ (pk 'got-br)
+ (if (eq? 'srfi-64-code style-hint)
+ ;; For SRFI 64 <code> style definitions, this should be a
+ ;; no-op.
+ (loop result doc tail continued? proc args)
+ (loop result doc tail #t #f '())))
+ (other
+ (pk 'got-arg other)
+ (if proc
+ (loop result doc tail continued? proc (cons other args))
+ ;; The argument was seen after proc definition was
+ ;; closed; preserve it as documentation.
+ (begin (pk 'keeping-as-doc other)
+ (loop result (cons other doc)
+ tail continued? proc args)))))))))))
+
+(define (snarfi . args)
+ "snarf documentation from the first argument, a srfi html spec source
+file assumed to follow the basic structure of the SRFI template (see:
+https://srfi.schemers.org/srfi-template.html)."
+ (match args
+ (((or "-h" "--help"))
+ (usage))
+ ((file)
+ (let* ((shtml (call-with-input-file file
+ (cut html->shtml <> #:strict? #t)))
+ (title (first ((sxpath '(// html head title *text*)) shtml)))
+ (srfi (and=> (string-match "SRFI [[:digit:]]+" title)
+ match:substring))
+ (body (first ((sxpath '(// html body)) shtml)))
+ (content-level (or (node->level
+ (find-heading-by-text "Abstract" body))
+ (error "could not locate 'Abstract' section")))
+ (body* (if (> content-level 1)
+ (splice-children (pk 'splicing-with (string->symbol
+ (format #f "h~a" (1- content-level))))
+ body)
+ body))
+ (content (let loop ((result '())
+ (rest body*)
+ (entered? #f))
+ (match rest
+ (() (reverse result))
+ ((node . tail)
+ (if (section-of-interest? node entered?)
+ (loop (cons node result) tail #t)
+ (loop result tail #f))))))
+ (content* (splice-children '(h1 "Specification") content))
+ ;; Sanitize the SHTML to remove spurious "\r\n" or "\n"
+ ;; strings, which would otherwise prevent some of the
+ ;; template patterns to match. Also make some safe
+ ;; substitutions.
+ (scontent (foldts
+ (lambda (seed node) ; fdown
+ '())
+ (lambda (seed kid-seed node) ; fup
+ (match node
+ (('@ ('id id))
+ ;; IDs may appear anywhere and break pattern
+ ;; matching; skip them.
+ seed)
+ (other (cons (reverse kid-seed) seed))))
+ (lambda (seed tree) ; fhere
+ (let ((tree (if (string? tree)
+ ;; Use @dots{} for ellipses.
+ (string-replace-substring
+ ;; Double all '@'s, for Texinfo.
+ (string-replace-substring
+ tree "@" "@@")
+ "..." "@dots{}")
+ tree)))
+ (match tree
+ ((and (? string?)
+ (? (cut string-match
+ "^(\\.?\r?\n|[[:space:]]+)$" <>)))
+ seed) ;skip white space atoms
+ (other (cons tree seed)))))
+ '()
+ content*))
+ (_ (begin (pk 'scontent) (pretty-print scontent)))
+ (templates `((*any* . ,(cut html->texinfo <> #:srfi srfi))))
+ (texi (apply-templates scontent templates)))
+ (pretty-print (string-replace-substring (string-append "\
+@node " srfi "
+@subsection " title "
+@cindex " srfi "
+
+" (string-join texi "")
+"
+") "\r\n" "\n")
+ #:display? #t)))
+ (_ (usage))))
+
+(define main snarfi)
+
+;;; snarfi ends here
--
2.41.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-12-03 16:37 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-03 16:37 [PATCH 0/2] Add 'snarfi' guild script to convert SRFI HTML specifications to Texinfo Maxim Cournoyer
2023-12-03 16:37 ` [PATCH 1/2] .dir-locals: Add indentation rule for sxml-match syntax Maxim Cournoyer
2023-12-03 16:37 ` [PATCH 2/2] scripts: Add SRFI documentation HTML -> Texinfo snarfer Maxim Cournoyer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).