From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Newsgroups: gmane.lisp.guile.devel
Subject: [PATCH 2/2] scripts: Add SRFI documentation HTML -> Texinfo snarfer.
Date: Sun,  3 Dec 2023 11:37:51 -0500
Message-ID: <20231203170504.7818-3-maxim.cournoyer@gmail.com>
References: <20231203170504.7818-1-maxim.cournoyer@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="29044"; mail-complaints-to="usenet@ciao.gmane.io"
Cc: Maxim Cournoyer <maxim.cournoyer@gmail.com>
To: guile-devel@gnu.org
Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Sun Dec 03 18:05:52 2023
Return-path: <guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org>
Envelope-to: guile-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org>)
	id 1r9puh-0007Ef-CO
	for guile-devel@m.gmane-mx.org; Sun, 03 Dec 2023 18:05:51 +0100
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <guile-devel-bounces@gnu.org>)
	id 1r9puQ-00040y-PI; Sun, 03 Dec 2023 12:05:34 -0500
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <maxim.cournoyer@gmail.com>)
 id 1r9puE-0003zz-Av
 for guile-devel@gnu.org; Sun, 03 Dec 2023 12:05:24 -0500
Original-Received: from mail-qk1-x72e.google.com ([2607:f8b0:4864:20::72e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <maxim.cournoyer@gmail.com>)
 id 1r9puA-0008ST-Je
 for guile-devel@gnu.org; Sun, 03 Dec 2023 12:05:22 -0500
Original-Received: by mail-qk1-x72e.google.com with SMTP id
 af79cd13be357-77f0933adb2so42102185a.1
 for <guile-devel@gnu.org>; Sun, 03 Dec 2023 09:05:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1701623117; x=1702227917; darn=gnu.org;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:cc:to:from:from:to:cc:subject:date
 :message-id:reply-to;
 bh=G0ydXRYn+vrHMQO7hiLI2LhFLn2gX1jIX9rUOQiI0Kw=;
 b=FHEu7/ze97HkCv4hEWk0ouIVbb0OKvjnzlawtRk3Z/eKHqJtGYH9Jd0Aumduy370Vg
 huwWTGDiaQAzxLsd9ciL1GvRQpdnHBojOJxUvb0UUN4JixuZec0qfyGB8ksVXoayRNtv
 sf7AnkWJCDv1BwCzZ6u4AfnUKZ8E4/ZqbEFOGrbc27q/GLNUQtA4rSS+EQird4jyvSoy
 lJkilD1AOcqQzJr93eeIoB0nHS5CfalZG3k7cj7E+iywM6N+nb2kiTptVWWBgZ0GqnEd
 hBnvCCneWuYWXMocwEw4BTvNs81Gopxy4if8GRr7ZvOqUgMGSjC0sXGnTsnuUuL9meWz
 MUCw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1701623117; x=1702227917;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=G0ydXRYn+vrHMQO7hiLI2LhFLn2gX1jIX9rUOQiI0Kw=;
 b=Z5Mgv4AasF8grMOPjalE/Ew66npTRaQvqy/sSaLh/gfxPPJvhVTqw0uTEiFQH99lxT
 A/qVl5dJuZVv/y+PCiLKum257JCXKKjZvBiew5SeXCgiHQT+/bPbDCwsB5mapLngeo9v
 hH+PN+a3Z/yRpsn9fxsG+lLFfqS0FBncdnUYOml7b3/dvH4pqtbZyL/Uz+3HcfYEkZE/
 rdX6gKJ72bOyB1/TvfEPZoDrJobwH4+FcZaufPR/HvDz7LK2goWTJXvsW3LHyP+S5NpW
 VhF/mV+WHmjagbA+Li/5cIg9XSa8Lbu193LMj0xfP1xx0nI7sUrDn0QMLvXsFm7gH4WZ
 3kPg==
X-Gm-Message-State: AOJu0YxKBA9jy1dhNubGJ7kxukZ9ByJE+slt0DD+CT1Wp6mE61AYaX4y
 HrZwCK1B6m+YUUQc6+cTQm86ai4++Nw=
X-Google-Smtp-Source: AGHT+IHx3ll6VdxxGIPdYU5LejFtL8ZgD1C7hYuM8HvE8GE3FmNslFgI7VT2DcbonYdnaJ2OifXQ1w==
X-Received: by 2002:a05:620a:880f:b0:77b:d28d:9324 with SMTP id
 qj15-20020a05620a880f00b0077bd28d9324mr3839385qkn.76.1701623116549; 
 Sun, 03 Dec 2023 09:05:16 -0800 (PST)
Original-Received: from localhost.localdomain (dsl-141-198.b2b2c.ca. [66.158.141.198])
 by smtp.gmail.com with ESMTPSA id
 az12-20020a05620a170c00b0076ce061f44dsm3503513qkb.25.2023.12.03.09.05.15
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Sun, 03 Dec 2023 09:05:16 -0800 (PST)
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231203170504.7818-1-maxim.cournoyer@gmail.com>
Received-SPF: pass client-ip=2607:f8b0:4864:20::72e;
 envelope-from=maxim.cournoyer@gmail.com; helo=mail-qk1-x72e.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Developers list for Guile,
 the GNU extensibility library" <guile-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>,
 <mailto:guile-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/guile-devel>
List-Post: <mailto:guile-devel@gnu.org>
List-Help: <mailto:guile-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>,
 <mailto:guile-devel-request@gnu.org?subject=subscribe>
Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org
Original-Sender: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.lisp.guile.devel:22179
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/22179>

* module/scripts/snarfi.scm: New file.
* am/bootstrap.am (SOURCES): Register it.
* NEWS: Add news entry.

---

 NEWS                      |  11 +
 am/bootstrap.am           |   1 +
 module/scripts/snarfi.scm | 637 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 649 insertions(+)
 create mode 100644 module/scripts/snarfi.scm

diff --git a/NEWS b/NEWS
index b319404d7..db8930ff9 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,17 @@ definitely unused---this is notably the case for modules that are only
 used at macro-expansion time, such as (srfi srfi-26).  In those cases,
 the compiler reports it as "possibly unused".
 
+
+** New guild command: snarfi
+
+The new ~snarfi~ guild command aims to make importing a SRFI
+specification documentation easier, by snarfing relevant documentation
+into Texinfo.  It can be invoked, for example, like:
+
+  $ guild snarfi srfi-151.html
+
+Where the last argument is the SRFI specification source HTML file.
+
 * Bug fixes
 
 ** (ice-9 suspendable-ports) incorrect UTF-8 decoding
diff --git a/am/bootstrap.am b/am/bootstrap.am
index a71946958..8782a7a82 100644
--- a/am/bootstrap.am
+++ b/am/bootstrap.am
@@ -307,6 +307,7 @@ SOURCES =					\
   scripts/api-diff.scm				\
   scripts/read-rfc822.scm			\
   scripts/snarf-guile-m4-docs.scm		\
+  scripts/snarfi.scm				\
   scripts/autofrisk.scm				\
   scripts/scan-api.scm				\
 						\
diff --git a/module/scripts/snarfi.scm b/module/scripts/snarfi.scm
new file mode 100644
index 000000000..10420d4f0
--- /dev/null
+++ b/module/scripts/snarfi.scm
@@ -0,0 +1,637 @@
+;;; snarfi --- Snarf SRFI HTML specifications into Texinfo doc
+
+;; Copyright 2023 Free Software Foundation, Inc.
+;;
+;; This program is free software; you can redistribute it and/or
+;; modify it under the terms of the GNU Lesser General Public License
+;; as published by the Free Software Foundation; either version 3, or
+;; (at your option) any later version.
+;;
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; Lesser General Public License for more details.
+;;
+;; You should have received a copy of the GNU Lesser General Public
+;; License along with this software; see the file COPYING.LESSER.  If
+;; not, write to the Free Software Foundation, Inc., 51 Franklin
+;; Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+;;; Author: Maxim Cournoyer <maxim.cournoyer@gmail.com>
+
+;;; Commentary:
+
+;;; Usage: guild snarfi srfi-spec.html
+
+;;; This script takes the a SRFI HTML source file as input, parses it
+;;; and processes it to output a Texinfo that can be used as a starting
+;;; point to properly document a SRFI into Guile.
+;;;
+;;; Requirements: guile-lib (for htmlprag)
+
+;;; Tested with:
+;;; - srfi-64.html
+;;; - srfi-151.html
+;;; - srfi-160.html
+;;; - srfi-178.html
+;;; - srfi-209.html
+
+;;; Code:
+
+(define-module (scripts snarfi)
+  #:use-module (ice-9 format)
+  #:use-module (ice-9 match)
+  #:use-module (ice-9 pretty-print)
+  #:use-module (ice-9 regex)
+  #:use-module (ice-9 string-fun)
+  #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-2)
+  #:use-module (srfi srfi-26)
+  #:use-module (sxml apply-templates)
+  #:use-module (sxml fold)
+  #:use-module ((sxml xpath) #:renamer (lambda (s)
+                                         (if (eq? 'filter s)
+                                             'xfilter
+                                             s)))
+  #:use-module (sxml match)
+  #:autoload (htmlprag) (html->shtml)
+  #:export (snarfi))
+
+(define %summary "Snarf SRFI HTML specifications into Texinfo doc")
+
+(define (usage)
+  "Display usage text."
+  (format #t "Usage: snarfi SRFI-SPEC.HTML
+Snarf Texinfo documentation from SRFI-SPEC.HTML, the HTML source file
+of a SRFI specification.
+
+-h, --help             print this help message
+"))
+
+(define (section-of-interest? node entered?)
+  "XPath predicate to check if a h1 header is of interest.
+ENTERED? tracks whether we've already entered a section of interest."
+  (sxml-match node
+    ((h1 ,title)
+     (member title '("Abstract" "Rationale" "Specification")))
+    (,otherwise
+     entered?)))
+
+(define (find-heading-by-text text tree)
+  "Return the heading node by its TEXT."
+  (find (lambda (node)
+          (and (pair? node)
+               (member (car node) '(h1 h2 h3 h4 h5 h6))
+               (string=? text (last node))))
+        tree))
+
+(define (node->level node)
+  "Return the heading level of NODE, else #f."
+  (and-let* ((h (cond ((pair? node) (car node))
+                      ((symbol? node) node)
+                      (else #f)))
+             (h* (symbol->string h))
+             (m (string-match "^h([0-6])$" h*)))
+    (string->number (match:substring m 1))))
+
+(define (decrement-heading node)
+  "Decrement heading level of NODE, if a heading, else leave it untouched."
+  (sxml-match node
+    ((h2 ,value)
+     `(h1 ,value))
+    ((h3 ,value)
+     `(h2 ,value))
+    ((h4 ,value)
+     `(h3 ,value))
+    ((h5 ,value)
+     `(h4 ,value))
+    ((h6 ,value)
+     `(h5 ,value))
+    (,other other)))
+
+(define (splice-children heading tree)
+  "Discard any HEADING section and its orphaned text (between it and the
+next heading), a symbol like e.g. 'h1' or '(h2 \"Some Section\"),
+promoting its children sections in its place."
+  (unless (or (pair? heading)
+              (symbol? heading))
+    (error "heading must be a symbol or a pair"))
+
+  (define heading-level (node->level heading))
+
+  (let loop ((result '())
+             (rest tree)
+             (discard? #f)
+             (entered? #f))
+    (match rest
+      (() (reverse result))
+      ((node . tail)
+       (let* ((level (node->level node))
+              (at-section? (if (pair? heading)
+                               (equal? heading node)
+                               ;; heading is a symbol
+                               (and (pair? node)
+                                    (eq? heading (car node)))))
+              (discard? (if discard?
+                            (not level) ;stop discarding?
+                            at-section?))
+              (leaving-section? (and entered?
+                                     (and level (<= level heading-level)))))
+         (loop (cond ((or at-section?   ;skip section
+                          discard?)     ;discard orphaned node
+                      result)
+                     (entered?          ;decrement child headings
+                      (cons (decrement-heading node) result))
+                     (else              ;leave untouched
+                      (cons node result)))
+               tail
+               discard?
+               (if entered?
+                   (not leaving-section?) ;leave section?
+                   at-section?)))))))     ;enter section
+
+
+;;;
+;;; HTML conversion related.
+;;;
+
+(define (dl->@table node)
+  "Transform a <dl> SHTML NODE into a Texinfo @table."
+  (define (dl-node->alist node)
+    (let loop ((result '())
+               (current-terms '())
+               (current-descriptions '())
+               (rest (sxml-match node
+                       ((dl . ,rest)
+                        rest)
+                       (,other (error "expected dl node, got" node)))))
+      (match rest
+        (()
+         ;; Produce last entry.
+         (reverse (cons (cons (reverse current-terms)
+                              (reverse current-descriptions))
+                        result)))
+        (((? string?) . rest)
+        ;; Disregard any interspersed strings, which are typically used
+        ;; for spacing purposes.
+         (loop result current-terms current-descriptions rest))
+        ((('dt term ...) . rest)
+         (let ((texi-term (string-join (map html->texinfo term) "")))
+           (if (null? current-descriptions)
+               (loop result
+                     (cons texi-term current-terms)
+                     current-descriptions rest)
+               ;; Produce last table item/description pair, if description was
+               ;; set.
+               (loop (cons (cons (reverse current-terms)
+                                 (reverse current-descriptions)) result)
+                     (list texi-term) '() rest))))
+        ((('dd description ...) . rest)
+         (when (null? current-terms)
+           (error "malformed dl HTML"))
+         (loop result current-terms
+               (cons (string-join (map html->texinfo description) "")
+                     current-descriptions)
+               rest)))))
+
+  (format #f "\
+@table @asis
+~{~a~}\
+@end table~%" (map (match-lambda
+                     ;; Each dl "row" can have multiple terms and descriptions.
+                     ((terms . descriptions)
+                      (match terms
+                        ((term term* ..1)
+                         (format #f "\
+@item ~a
+~{@itemx ~a~^~%~}
+~{~a~}~%~%" term term* descriptions))
+                        ((term)
+                         (format #f "\
+@item ~a
+~{~a~}~%~%" term descriptions)))))
+                   (dl-node->alist node))))
+
+(define* (html->texinfo node #:key srfi deffn?)
+  "A HTML node to Texinfo converter, applied recursively to NODE, with the
+SRFI string prefix, e.g. \"SRFI 151\", used to produce unique @node for
+sections.  DEFFN? indicates whether conversion rules for use with
+'@deffn' nodes should be used."
+  (sxml-match node
+    ((h1 ,value)
+     (format #f "
+@node ~a ~a
+@subsubsection ~a ~a~%~%" srfi (string-replace-substring value "," "")
+srfi value))
+    ((h2 ,value)
+     (format #f "~%@subsubheading ~a~%~%"
+             value))
+    ;; Procedure definitions in the style of SRFI 151.
+    ((p (tt ,proc) . ,rest) (guard (and (string-prefix? "(" proc)
+                                        (member '(tt ")") rest)))
+     (node->definitions node #:style-hint 'srfi-151))
+    ;; Procedure definitions in the '<pre>' style of SRFI 64.
+    ((pre "(" (b ,proc) . ,rest) (guard (or (member ")" rest)
+                                            (member ")\n" rest)
+                                            (member "])\n" rest)
+                                            (member " ...)\n" rest)))
+     (node->definitions node #:style-hint 'srfi-64-pre))
+    ;; Procedure definitions in the '<code>' style of SRFI 64.  The
+    ;; signature usually looks like (p (code "(" (b "proc") ...)), but
+    ;; since there can be trailing text describing the code element, we
+    ;; use the following catchall pattern with a guard:
+    ((p . ,rest) (guard (find (match-lambda
+                                (('code "(" ('b proc) . rest)
+                                 #t)
+                                (('code "(" ('var proc) . rest)
+                                 #t)
+                                (_ #f))
+                              rest))
+     (node->definitions node #:style-hint 'srfi-64-code))
+    ((p . ,rest) (guard (find (match-lambda
+                                (('code (and (? string?)
+                                             (? (cut string-prefix? "(" <>)))
+                                        . rest)
+                                 (any (lambda (x)
+                                        (and (string? x)
+                                             (string-contains x ") ->")))
+                                      rest))
+                                (_ #f))
+                              rest))
+     (node->definitions node #:style-hint 'srfi-160-code))
+    ((p (code ,proc) . ,rest) (guard (and (string-prefix? "(" proc)
+                                          (member '(code ")") rest)))
+     (node->definitions node #:style-hint 'srfi-178-code))
+    ((p . ,rest) (guard (find (match-lambda
+                                (('code proc . rest)
+                                 (and (string-prefix? "(" proc)
+                                      (any (cut string-suffix? ") -" <>)
+                                           (filter string? rest))))
+                                (_ #f))
+                              rest))
+     (node->definitions node #:style-hint 'srfi-178-code-with-return-type))
+    ((p ,(node) ...)
+     (string-join `("\n" ,@node "\n") ""))
+    ((small ,(node) ...)
+     (format #f "~{~a~}" node))
+    ((pre ,(node) ...)
+     (format #f "~%@lisp
+~{~a~}@end lisp~%" node))
+    ((a (@ (href ,href)) ,title ...)
+     (format #f "@url{~a, ~{~a~^ ~}}" href (map string-trim-both title)))
+    ((b ,value)
+     (format #f "@b{~a}" value))
+    ((br) "\n")
+    ((code (var ,value))
+     (format #f "@var{~a}" value))
+    ((code ,(node) ...)
+     (format #f "@code{~{~a~}}" node))
+    ((*COMMENT* ,value)
+     (string-append "@c " (string-replace-substring
+                           (string-trim-both value) "\n" "\n@c ")))
+    ((dfn ,value)
+     (format #f "@dfn{~a}" value))
+    ((*ENTITY* "additional" "copy")
+     "@copyright{}")
+    ((*ENTITY* "additional" "mdash")
+     "---")                             ;em dash
+    ((*ENTITY* "additional" "nbsp")
+     "@tie{}")                          ;non-breakable space
+    ((*ENTITY* "additional" "rArr")     ;rightwards double arrow
+     "@U{21D2}")
+    ((*ENTITY* "additional-char" ,value)
+     (format #f "@U{~x}" (string->number value)))
+    ((i ,(node) ...)
+     (format #f "@i{~{~a~}}" node))
+    ((q ,value)
+     (format #f "``~a''" value))
+    ((em ,(node) ...)
+     (guard deffn?)
+     (format #f "~{~a~^ ~}" node))
+    ((em ,(node) ...)
+     (guard (not deffn?))
+     (format #f "@emph{~{~a~}}" node))
+    ((dl . ,rest)
+     (dl->@table node))
+    ((ol (li ,(node) ...) ...)
+     (format #f "~%@enumerate
+~{@item~%~{~a~}~^~%~%~}
+@end enumerate~%" node))
+    ((ul (li ,(node) ...) ...)
+     (format #f "~%@itemize
+~{@item~%~{~a~}~^~%~%~}
+@end itemize~%" node))
+    ((sup ,value)
+     (format #f "@sup{~a}" value))
+    ((sub ,value)
+     (format #f "@sub{~a}" value))
+    ((tt ,value)
+     (format #f "@code{~a}" value))
+    ((var ,value)
+     (format #f "@var{~a}" value))
+    (,other other)))
+
+(define (process-args args)
+  "Process the arguments to have them in the format expected by Texinfo's
+@deffn, stripping extraneous metadata.  SRFI is a string such as \"SRFI
+64\", used to prefix Texinfo nodes."
+  (reverse
+   ;; Convert SHTML to Texinfo, and coalesce brackets into arguments.
+   (match (fold
+           (lambda (node tree)        ;fhere
+             (let ((new (string-replace-substring
+                         (html->texinfo node #:deffn? #t)
+                         ;; Strip closing quote (SRFI 160).
+                         ") -> " "-> ")))
+               (match tree
+                 (("[" . rest)
+                  (cons (string-append "[" new) rest))
+                 (("[[" . rest)
+                  (cons (string-append "[[" new) rest))
+                 (("]" . (s . tail))
+                  (cons* new (string-append s "]") tail))
+                 ;; SRFI 160.
+                 (((and (? string? s1) (? (cut string-prefix? "]]" <>)))
+                   . (s2 . tail))
+                  (cons* new (string-trim-both (string-drop s1 2))
+                         (string-append s2 "]]") tail))
+                 (((and (? string? s1) (? (cut string-prefix? "]" <>)))
+                   . (s2 . tail))
+                  (cons* new (string-trim-both (string-drop s1 1))
+                         (string-append s2 "]") tail))
+                 ;; SRFI 178.
+                 (((and (? string? s1) (? (cut string-suffix? "[" <>)))
+                   . tail)
+                  (cons* (string-append "[" new)
+                         (string-append (string-trim-both
+                                         (string-drop-right s1 1)))
+                         tail))
+                 (() (cons new tree))
+                 (other (cons new other)))))
+           '()
+           args)
+     ;; Post-process for a potential trailing ']'.
+     (("]" . (s . tail))
+      (cons (string-append s "]") tail))
+     (((and (? string? s1) (? (cut string-prefix? "]]" <>))) . (s2 . tail))
+      (cons* (string-trim-both (string-drop s1 2))
+             (string-append s2 "]]") tail))
+     (((and (? string? s1) (? (cut string-prefix? "]" <>))) . (s2 . tail))
+      (cons* (string-trim-both (string-drop s1 1))
+             (string-append s2 "]") tail))
+     (other other))))
+
+(define (sanitize-definitions-node node)
+  "Recursively sanitize arguments, removing extraneous white space."
+  (first (foldts
+          (lambda (seed node)               ;fdown
+            '())
+          (lambda (seed kid-seed node)      ;fup
+            (sxml-match node
+              ((*ENTITY* "additional" "nbsp")
+               seed)                        ;discard
+              ((i ,value)
+               (cons value seed))
+              ((var ,value)
+               (cons value seed))
+              (,other
+               (cons (reverse kid-seed) seed))))
+          (lambda (seed node)               ;fhere
+            (let ((node (if (string? node)
+                            (string-trim-both node)
+                            node)))
+              (match node
+                ((or "..." "…")
+                 (cons "@dots{}" seed))
+                ((and (? string?) (? string-null?))
+                 seed)                      ;discard node
+                (other
+                 (cons node seed)))))
+          '()
+          node)))
+
+(define* (node->definitions node #:key style-hint)
+  "Convert NODE, an SHTML expression assumed to contain
+procedure definitions, to Texinfo.  STYLE-HINT may be used to provide a
+hint about which HTML scheme is being used to format the definitions."
+  ;; This is messy because SRFIs appear to all use their own special
+  ;; variant to format procedure/syntax definitions.
+  (let ((node (sanitize-definitions-node node)))
+
+    (define (deffn proc args continued?)
+      (unless proc
+        (error "expected a string for 'proc', got" proc))
+      (format #f "~%@deffn~:[~;x~] {Scheme Procedure} ~{~a ~^~}~%"
+              continued? (process-args (cons proc args))))
+
+    (if (eq? 'srfi-160-code style-hint)
+        ;; Definitions in SRFI 160 are special in that no grouping
+        ;; information can be inferred, and a return value and SRFI
+        ;; annotation are added at the end.
+        (match node
+          ((p ('code (and (? string? proc) (? (cut string-prefix? "(" <>)))
+                     . rest) srfi ...)
+           (string-append (deffn (string-trim-both (string-drop proc 1))
+                            (append rest srfi) #f)
+                          "\
+@c FIXME: Check deffn category and adjust '@end deffn' location
+@end deffn\n\n")))
+
+        (let loop ((result '())
+                   (doc '())
+                   (rest (cdr node))
+                   (continued? #f)
+                   (proc #f)
+                   (args '()))
+
+          (pk 'called-loop-with-proc proc)
+          (match rest
+            (()
+             (pk 'closing-deffn-on-empty-list)
+             (pk 'doc doc)
+             ;; The (p (code (b procedure) args ... doc ...)) style used by
+             ;; SRFI-64 allows to correctly close the definition.
+             (let ((result (if (null? doc) ;srfi-64 <code> trailing doc
+                               (cons "\
+@c FIXME: Check deffn category and adjust '@end deffn' location
+@end deffn\n\n" result)
+                               (cons* "@end deffn\n"
+                                      (html->texinfo `(p ,@(reverse doc)))
+                                      result))))
+               (cons "\n" (reverse result))))
+            ((head . tail)
+             (pk 'GOT-HEAD head 'REST tail)
+             (match head
+               ("("                       ;skip opening parenthesis
+                (loop result doc tail continued? proc args)) ;no-op
+               ;; SRFI 64 <code> procedure definition node.
+               (('code "(" ((or 'b 'var) p) . rest)
+                (pk 'setting-proc p)
+                ;; Dump accumulated leading documentation, if any.
+                (let ((result (if (null? doc)
+                                  result
+                                  (cons (html->texinfo `(p ,@(reverse doc)))
+                                        result))))
+                  (if proc
+                      ;; New continued procedure.
+                      (loop result '() (append rest tail) #t p '())
+                      (loop result '() (append rest tail) continued? p '()))))
+               (('code (and (? string? p)
+                            (? (cut string-prefix? "(" <>))) . rest)
+                ;; SRFI 178 style.
+                (pk 'setting-proc p)
+                ;; Dump accumulated leading documentation, if any.
+                (let ((result (if (null? doc)
+                                  result
+                                  (cons (html->texinfo `(p ,@(reverse doc)))
+                                        result)))
+                      (p (string-trim-right (string-drop p 1))))
+                  (if proc
+                      ;; New continued procedure.
+                      (loop result '() (append rest tail) #t p '())
+                      (loop result '() (append rest tail) continued? p '()))))
+               ((or ('tt ")") ")" ")\n" "])" "])\n" " ...)" " ...)\n"
+                    ('code ")"))          ;end
+                (let* ((args (if (and (string? head) (string-index head #\]))
+                                 (cons "]" args)
+                                 args))
+                       (args* (if (and (string? head)
+                                       (string-prefix? " ..." head))
+                                  (cons " ..." args)
+                                  args)))
+                  (pk 'calling-deffn-in-end)
+                  (loop (cons (deffn proc (reverse args*) continued?) result)
+                        doc
+                        tail
+                        #t             ;mark next definitions as continued
+                        #f '())))
+               ((and (? string?) (? (cut string-suffix? ") -" <>))) ;end
+                ;; Definition ended but there is a trailing returned type
+                ;; annotation (e.g. "-> bitvector") to add as an argument.
+                (let* ((tail-length (length tail))
+                       (args `(,@(if (>= tail-length 2)
+                                     (list (second tail))
+                                     '())
+                               "->"
+                               ,(string-drop-right head 3)
+                               ,@args)))
+                  (pk 'calling-deffn-in-end)
+                  (loop (cons (deffn proc (reverse args) continued?) result)
+                        doc
+                        (if (>= tail-length 2)
+                            (drop tail 2)
+                            tail)
+                        #t             ;mark next definitions as continued
+                        #f '())))
+               (('tt (and (? string?) (? (cut string-prefix? "(" <>)) p))
+                (when proc
+                  (error "unexpected proc encountered while already defined" p))
+                (let ((p (string-trim-right (string-drop p 1))))
+                  (pk 'setting-proc p)
+                  (loop result doc tail continued? p '())))
+               (('b p)                    ;for SRFI 64 <pre>
+                (pk 'setting-proc p)
+                ;; Dump accumulated leading documentation, if any.
+                (let ((result (if (null? doc)
+                                  result
+                                  (cons (html->texinfo `(p ,@doc)) result))))
+                  (if proc
+                      (loop result '() tail #t p '()) ;new continued procedure
+                      (loop result '() tail continued? p '()))))
+               (('br)                     ;continued?
+                (pk 'got-br)
+                (if (eq? 'srfi-64-code style-hint)
+                    ;; For SRFI 64 <code> style definitions, this should be a
+                    ;; no-op.
+                    (loop result doc tail continued? proc args)
+                    (loop result doc tail #t #f '())))
+               (other
+                (pk 'got-arg other)
+                (if proc
+                    (loop result doc tail continued? proc (cons other args))
+                    ;; The argument was seen after proc definition was
+                    ;; closed; preserve it as documentation.
+                    (begin (pk 'keeping-as-doc other)
+                           (loop result (cons other doc)
+                                 tail continued? proc args)))))))))))
+
+(define (snarfi . args)
+  "snarf documentation from the first argument, a srfi html spec source
+file assumed to follow the basic structure of the SRFI template (see:
+https://srfi.schemers.org/srfi-template.html)."
+  (match args
+    (((or "-h" "--help"))
+     (usage))
+    ((file)
+     (let* ((shtml (call-with-input-file file
+                     (cut html->shtml <> #:strict? #t)))
+            (title (first ((sxpath '(// html head title *text*)) shtml)))
+            (srfi (and=> (string-match "SRFI [[:digit:]]+" title)
+                         match:substring))
+            (body (first ((sxpath '(// html body)) shtml)))
+            (content-level (or (node->level
+                                (find-heading-by-text "Abstract" body))
+                               (error "could not locate 'Abstract' section")))
+            (body* (if (> content-level 1)
+                       (splice-children (pk 'splicing-with (string->symbol
+                                                            (format #f "h~a" (1- content-level))))
+                                        body)
+                       body))
+            (content (let loop ((result '())
+                                (rest body*)
+                                (entered? #f))
+                       (match rest
+                         (() (reverse result))
+                         ((node . tail)
+                          (if (section-of-interest? node entered?)
+                              (loop (cons node result) tail #t)
+                              (loop result tail #f))))))
+            (content* (splice-children '(h1 "Specification") content))
+            ;; Sanitize the SHTML to remove spurious "\r\n" or "\n"
+            ;; strings, which would otherwise prevent some of the
+            ;; template patterns to match.  Also make some safe
+            ;; substitutions.
+            (scontent (foldts
+                       (lambda (seed node) ; fdown
+                         '())
+                       (lambda (seed kid-seed node) ; fup
+                         (match node
+                           (('@ ('id id))
+                            ;; IDs may appear anywhere and break pattern
+                            ;; matching; skip them.
+                            seed)
+                           (other (cons (reverse kid-seed) seed))))
+                       (lambda (seed tree) ; fhere
+                         (let ((tree (if (string? tree)
+                                         ;; Use @dots{} for ellipses.
+                                         (string-replace-substring
+                                          ;; Double all '@'s, for Texinfo.
+                                          (string-replace-substring
+                                           tree "@" "@@")
+                                          "..." "@dots{}")
+                                         tree)))
+                           (match tree
+                             ((and (? string?)
+                                   (? (cut string-match
+                                           "^(\\.?\r?\n|[[:space:]]+)$" <>)))
+                              seed)     ;skip white space atoms
+                             (other (cons tree seed)))))
+                       '()
+                       content*))
+            (_ (begin (pk 'scontent) (pretty-print scontent)))
+            (templates `((*any* . ,(cut html->texinfo <> #:srfi srfi))))
+            (texi (apply-templates scontent templates)))
+       (pretty-print (string-replace-substring (string-append "\
+@node " srfi "
+@subsection " title "
+@cindex " srfi "
+
+" (string-join texi "")
+"
+") "\r\n" "\n")
+                     #:display? #t)))
+    (_ (usage))))
+
+(define main snarfi)
+
+;;; snarfi ends here
-- 
2.41.0