From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Maxim Cournoyer <maxim.cournoyer@gmail.com> Newsgroups: gmane.lisp.guile.devel Subject: [PATCH 2/2] scripts: Add SRFI documentation HTML -> Texinfo snarfer. Date: Sun, 3 Dec 2023 11:37:51 -0500 Message-ID: <20231203170504.7818-3-maxim.cournoyer@gmail.com> References: <20231203170504.7818-1-maxim.cournoyer@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="29044"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Maxim Cournoyer <maxim.cournoyer@gmail.com> To: guile-devel@gnu.org Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Sun Dec 03 18:05:52 2023 Return-path: <guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org> Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from <guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org>) id 1r9puh-0007Ef-CO for guile-devel@m.gmane-mx.org; Sun, 03 Dec 2023 18:05:51 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from <guile-devel-bounces@gnu.org>) id 1r9puQ-00040y-PI; Sun, 03 Dec 2023 12:05:34 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <maxim.cournoyer@gmail.com>) id 1r9puE-0003zz-Av for guile-devel@gnu.org; Sun, 03 Dec 2023 12:05:24 -0500 Original-Received: from mail-qk1-x72e.google.com ([2607:f8b0:4864:20::72e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <maxim.cournoyer@gmail.com>) id 1r9puA-0008ST-Je for guile-devel@gnu.org; Sun, 03 Dec 2023 12:05:22 -0500 Original-Received: by mail-qk1-x72e.google.com with SMTP id af79cd13be357-77f0933adb2so42102185a.1 for <guile-devel@gnu.org>; Sun, 03 Dec 2023 09:05:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701623117; x=1702227917; darn=gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=G0ydXRYn+vrHMQO7hiLI2LhFLn2gX1jIX9rUOQiI0Kw=; b=FHEu7/ze97HkCv4hEWk0ouIVbb0OKvjnzlawtRk3Z/eKHqJtGYH9Jd0Aumduy370Vg huwWTGDiaQAzxLsd9ciL1GvRQpdnHBojOJxUvb0UUN4JixuZec0qfyGB8ksVXoayRNtv sf7AnkWJCDv1BwCzZ6u4AfnUKZ8E4/ZqbEFOGrbc27q/GLNUQtA4rSS+EQird4jyvSoy lJkilD1AOcqQzJr93eeIoB0nHS5CfalZG3k7cj7E+iywM6N+nb2kiTptVWWBgZ0GqnEd hBnvCCneWuYWXMocwEw4BTvNs81Gopxy4if8GRr7ZvOqUgMGSjC0sXGnTsnuUuL9meWz MUCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701623117; x=1702227917; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=G0ydXRYn+vrHMQO7hiLI2LhFLn2gX1jIX9rUOQiI0Kw=; b=Z5Mgv4AasF8grMOPjalE/Ew66npTRaQvqy/sSaLh/gfxPPJvhVTqw0uTEiFQH99lxT A/qVl5dJuZVv/y+PCiLKum257JCXKKjZvBiew5SeXCgiHQT+/bPbDCwsB5mapLngeo9v hH+PN+a3Z/yRpsn9fxsG+lLFfqS0FBncdnUYOml7b3/dvH4pqtbZyL/Uz+3HcfYEkZE/ rdX6gKJ72bOyB1/TvfEPZoDrJobwH4+FcZaufPR/HvDz7LK2goWTJXvsW3LHyP+S5NpW VhF/mV+WHmjagbA+Li/5cIg9XSa8Lbu193LMj0xfP1xx0nI7sUrDn0QMLvXsFm7gH4WZ 3kPg== X-Gm-Message-State: AOJu0YxKBA9jy1dhNubGJ7kxukZ9ByJE+slt0DD+CT1Wp6mE61AYaX4y HrZwCK1B6m+YUUQc6+cTQm86ai4++Nw= X-Google-Smtp-Source: AGHT+IHx3ll6VdxxGIPdYU5LejFtL8ZgD1C7hYuM8HvE8GE3FmNslFgI7VT2DcbonYdnaJ2OifXQ1w== X-Received: by 2002:a05:620a:880f:b0:77b:d28d:9324 with SMTP id qj15-20020a05620a880f00b0077bd28d9324mr3839385qkn.76.1701623116549; Sun, 03 Dec 2023 09:05:16 -0800 (PST) Original-Received: from localhost.localdomain (dsl-141-198.b2b2c.ca. [66.158.141.198]) by smtp.gmail.com with ESMTPSA id az12-20020a05620a170c00b0076ce061f44dsm3503513qkb.25.2023.12.03.09.05.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 03 Dec 2023 09:05:16 -0800 (PST) X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231203170504.7818-1-maxim.cournoyer@gmail.com> Received-SPF: pass client-ip=2607:f8b0:4864:20::72e; envelope-from=maxim.cournoyer@gmail.com; helo=mail-qk1-x72e.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" <guile-devel.gnu.org> List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>, <mailto:guile-devel-request@gnu.org?subject=unsubscribe> List-Archive: <https://lists.gnu.org/archive/html/guile-devel> List-Post: <mailto:guile-devel@gnu.org> List-Help: <mailto:guile-devel-request@gnu.org?subject=help> List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>, <mailto:guile-devel-request@gnu.org?subject=subscribe> Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.lisp.guile.devel:22179 Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/22179> * module/scripts/snarfi.scm: New file. * am/bootstrap.am (SOURCES): Register it. * NEWS: Add news entry. --- NEWS | 11 + am/bootstrap.am | 1 + module/scripts/snarfi.scm | 637 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 649 insertions(+) create mode 100644 module/scripts/snarfi.scm diff --git a/NEWS b/NEWS index b319404d7..db8930ff9 100644 --- a/NEWS +++ b/NEWS @@ -21,6 +21,17 @@ definitely unused---this is notably the case for modules that are only used at macro-expansion time, such as (srfi srfi-26). In those cases, the compiler reports it as "possibly unused". + +** New guild command: snarfi + +The new ~snarfi~ guild command aims to make importing a SRFI +specification documentation easier, by snarfing relevant documentation +into Texinfo. It can be invoked, for example, like: + + $ guild snarfi srfi-151.html + +Where the last argument is the SRFI specification source HTML file. + * Bug fixes ** (ice-9 suspendable-ports) incorrect UTF-8 decoding diff --git a/am/bootstrap.am b/am/bootstrap.am index a71946958..8782a7a82 100644 --- a/am/bootstrap.am +++ b/am/bootstrap.am @@ -307,6 +307,7 @@ SOURCES = \ scripts/api-diff.scm \ scripts/read-rfc822.scm \ scripts/snarf-guile-m4-docs.scm \ + scripts/snarfi.scm \ scripts/autofrisk.scm \ scripts/scan-api.scm \ \ diff --git a/module/scripts/snarfi.scm b/module/scripts/snarfi.scm new file mode 100644 index 000000000..10420d4f0 --- /dev/null +++ b/module/scripts/snarfi.scm @@ -0,0 +1,637 @@ +;;; snarfi --- Snarf SRFI HTML specifications into Texinfo doc + +;; Copyright 2023 Free Software Foundation, Inc. +;; +;; This program is free software; you can redistribute it and/or +;; modify it under the terms of the GNU Lesser General Public License +;; as published by the Free Software Foundation; either version 3, or +;; (at your option) any later version. +;; +;; This program is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;; Lesser General Public License for more details. +;; +;; You should have received a copy of the GNU Lesser General Public +;; License along with this software; see the file COPYING.LESSER. If +;; not, write to the Free Software Foundation, Inc., 51 Franklin +;; Street, Fifth Floor, Boston, MA 02110-1301 USA + +;;; Author: Maxim Cournoyer <maxim.cournoyer@gmail.com> + +;;; Commentary: + +;;; Usage: guild snarfi srfi-spec.html + +;;; This script takes the a SRFI HTML source file as input, parses it +;;; and processes it to output a Texinfo that can be used as a starting +;;; point to properly document a SRFI into Guile. +;;; +;;; Requirements: guile-lib (for htmlprag) + +;;; Tested with: +;;; - srfi-64.html +;;; - srfi-151.html +;;; - srfi-160.html +;;; - srfi-178.html +;;; - srfi-209.html + +;;; Code: + +(define-module (scripts snarfi) + #:use-module (ice-9 format) + #:use-module (ice-9 match) + #:use-module (ice-9 pretty-print) + #:use-module (ice-9 regex) + #:use-module (ice-9 string-fun) + #:use-module (srfi srfi-1) + #:use-module (srfi srfi-2) + #:use-module (srfi srfi-26) + #:use-module (sxml apply-templates) + #:use-module (sxml fold) + #:use-module ((sxml xpath) #:renamer (lambda (s) + (if (eq? 'filter s) + 'xfilter + s))) + #:use-module (sxml match) + #:autoload (htmlprag) (html->shtml) + #:export (snarfi)) + +(define %summary "Snarf SRFI HTML specifications into Texinfo doc") + +(define (usage) + "Display usage text." + (format #t "Usage: snarfi SRFI-SPEC.HTML +Snarf Texinfo documentation from SRFI-SPEC.HTML, the HTML source file +of a SRFI specification. + +-h, --help print this help message +")) + +(define (section-of-interest? node entered?) + "XPath predicate to check if a h1 header is of interest. +ENTERED? tracks whether we've already entered a section of interest." + (sxml-match node + ((h1 ,title) + (member title '("Abstract" "Rationale" "Specification"))) + (,otherwise + entered?))) + +(define (find-heading-by-text text tree) + "Return the heading node by its TEXT." + (find (lambda (node) + (and (pair? node) + (member (car node) '(h1 h2 h3 h4 h5 h6)) + (string=? text (last node)))) + tree)) + +(define (node->level node) + "Return the heading level of NODE, else #f." + (and-let* ((h (cond ((pair? node) (car node)) + ((symbol? node) node) + (else #f))) + (h* (symbol->string h)) + (m (string-match "^h([0-6])$" h*))) + (string->number (match:substring m 1)))) + +(define (decrement-heading node) + "Decrement heading level of NODE, if a heading, else leave it untouched." + (sxml-match node + ((h2 ,value) + `(h1 ,value)) + ((h3 ,value) + `(h2 ,value)) + ((h4 ,value) + `(h3 ,value)) + ((h5 ,value) + `(h4 ,value)) + ((h6 ,value) + `(h5 ,value)) + (,other other))) + +(define (splice-children heading tree) + "Discard any HEADING section and its orphaned text (between it and the +next heading), a symbol like e.g. 'h1' or '(h2 \"Some Section\"), +promoting its children sections in its place." + (unless (or (pair? heading) + (symbol? heading)) + (error "heading must be a symbol or a pair")) + + (define heading-level (node->level heading)) + + (let loop ((result '()) + (rest tree) + (discard? #f) + (entered? #f)) + (match rest + (() (reverse result)) + ((node . tail) + (let* ((level (node->level node)) + (at-section? (if (pair? heading) + (equal? heading node) + ;; heading is a symbol + (and (pair? node) + (eq? heading (car node))))) + (discard? (if discard? + (not level) ;stop discarding? + at-section?)) + (leaving-section? (and entered? + (and level (<= level heading-level))))) + (loop (cond ((or at-section? ;skip section + discard?) ;discard orphaned node + result) + (entered? ;decrement child headings + (cons (decrement-heading node) result)) + (else ;leave untouched + (cons node result))) + tail + discard? + (if entered? + (not leaving-section?) ;leave section? + at-section?))))))) ;enter section + + +;;; +;;; HTML conversion related. +;;; + +(define (dl->@table node) + "Transform a <dl> SHTML NODE into a Texinfo @table." + (define (dl-node->alist node) + (let loop ((result '()) + (current-terms '()) + (current-descriptions '()) + (rest (sxml-match node + ((dl . ,rest) + rest) + (,other (error "expected dl node, got" node))))) + (match rest + (() + ;; Produce last entry. + (reverse (cons (cons (reverse current-terms) + (reverse current-descriptions)) + result))) + (((? string?) . rest) + ;; Disregard any interspersed strings, which are typically used + ;; for spacing purposes. + (loop result current-terms current-descriptions rest)) + ((('dt term ...) . rest) + (let ((texi-term (string-join (map html->texinfo term) ""))) + (if (null? current-descriptions) + (loop result + (cons texi-term current-terms) + current-descriptions rest) + ;; Produce last table item/description pair, if description was + ;; set. + (loop (cons (cons (reverse current-terms) + (reverse current-descriptions)) result) + (list texi-term) '() rest)))) + ((('dd description ...) . rest) + (when (null? current-terms) + (error "malformed dl HTML")) + (loop result current-terms + (cons (string-join (map html->texinfo description) "") + current-descriptions) + rest))))) + + (format #f "\ +@table @asis +~{~a~}\ +@end table~%" (map (match-lambda + ;; Each dl "row" can have multiple terms and descriptions. + ((terms . descriptions) + (match terms + ((term term* ..1) + (format #f "\ +@item ~a +~{@itemx ~a~^~%~} +~{~a~}~%~%" term term* descriptions)) + ((term) + (format #f "\ +@item ~a +~{~a~}~%~%" term descriptions))))) + (dl-node->alist node)))) + +(define* (html->texinfo node #:key srfi deffn?) + "A HTML node to Texinfo converter, applied recursively to NODE, with the +SRFI string prefix, e.g. \"SRFI 151\", used to produce unique @node for +sections. DEFFN? indicates whether conversion rules for use with +'@deffn' nodes should be used." + (sxml-match node + ((h1 ,value) + (format #f " +@node ~a ~a +@subsubsection ~a ~a~%~%" srfi (string-replace-substring value "," "") +srfi value)) + ((h2 ,value) + (format #f "~%@subsubheading ~a~%~%" + value)) + ;; Procedure definitions in the style of SRFI 151. + ((p (tt ,proc) . ,rest) (guard (and (string-prefix? "(" proc) + (member '(tt ")") rest))) + (node->definitions node #:style-hint 'srfi-151)) + ;; Procedure definitions in the '<pre>' style of SRFI 64. + ((pre "(" (b ,proc) . ,rest) (guard (or (member ")" rest) + (member ")\n" rest) + (member "])\n" rest) + (member " ...)\n" rest))) + (node->definitions node #:style-hint 'srfi-64-pre)) + ;; Procedure definitions in the '<code>' style of SRFI 64. The + ;; signature usually looks like (p (code "(" (b "proc") ...)), but + ;; since there can be trailing text describing the code element, we + ;; use the following catchall pattern with a guard: + ((p . ,rest) (guard (find (match-lambda + (('code "(" ('b proc) . rest) + #t) + (('code "(" ('var proc) . rest) + #t) + (_ #f)) + rest)) + (node->definitions node #:style-hint 'srfi-64-code)) + ((p . ,rest) (guard (find (match-lambda + (('code (and (? string?) + (? (cut string-prefix? "(" <>))) + . rest) + (any (lambda (x) + (and (string? x) + (string-contains x ") ->"))) + rest)) + (_ #f)) + rest)) + (node->definitions node #:style-hint 'srfi-160-code)) + ((p (code ,proc) . ,rest) (guard (and (string-prefix? "(" proc) + (member '(code ")") rest))) + (node->definitions node #:style-hint 'srfi-178-code)) + ((p . ,rest) (guard (find (match-lambda + (('code proc . rest) + (and (string-prefix? "(" proc) + (any (cut string-suffix? ") -" <>) + (filter string? rest)))) + (_ #f)) + rest)) + (node->definitions node #:style-hint 'srfi-178-code-with-return-type)) + ((p ,(node) ...) + (string-join `("\n" ,@node "\n") "")) + ((small ,(node) ...) + (format #f "~{~a~}" node)) + ((pre ,(node) ...) + (format #f "~%@lisp +~{~a~}@end lisp~%" node)) + ((a (@ (href ,href)) ,title ...) + (format #f "@url{~a, ~{~a~^ ~}}" href (map string-trim-both title))) + ((b ,value) + (format #f "@b{~a}" value)) + ((br) "\n") + ((code (var ,value)) + (format #f "@var{~a}" value)) + ((code ,(node) ...) + (format #f "@code{~{~a~}}" node)) + ((*COMMENT* ,value) + (string-append "@c " (string-replace-substring + (string-trim-both value) "\n" "\n@c "))) + ((dfn ,value) + (format #f "@dfn{~a}" value)) + ((*ENTITY* "additional" "copy") + "@copyright{}") + ((*ENTITY* "additional" "mdash") + "---") ;em dash + ((*ENTITY* "additional" "nbsp") + "@tie{}") ;non-breakable space + ((*ENTITY* "additional" "rArr") ;rightwards double arrow + "@U{21D2}") + ((*ENTITY* "additional-char" ,value) + (format #f "@U{~x}" (string->number value))) + ((i ,(node) ...) + (format #f "@i{~{~a~}}" node)) + ((q ,value) + (format #f "``~a''" value)) + ((em ,(node) ...) + (guard deffn?) + (format #f "~{~a~^ ~}" node)) + ((em ,(node) ...) + (guard (not deffn?)) + (format #f "@emph{~{~a~}}" node)) + ((dl . ,rest) + (dl->@table node)) + ((ol (li ,(node) ...) ...) + (format #f "~%@enumerate +~{@item~%~{~a~}~^~%~%~} +@end enumerate~%" node)) + ((ul (li ,(node) ...) ...) + (format #f "~%@itemize +~{@item~%~{~a~}~^~%~%~} +@end itemize~%" node)) + ((sup ,value) + (format #f "@sup{~a}" value)) + ((sub ,value) + (format #f "@sub{~a}" value)) + ((tt ,value) + (format #f "@code{~a}" value)) + ((var ,value) + (format #f "@var{~a}" value)) + (,other other))) + +(define (process-args args) + "Process the arguments to have them in the format expected by Texinfo's +@deffn, stripping extraneous metadata. SRFI is a string such as \"SRFI +64\", used to prefix Texinfo nodes." + (reverse + ;; Convert SHTML to Texinfo, and coalesce brackets into arguments. + (match (fold + (lambda (node tree) ;fhere + (let ((new (string-replace-substring + (html->texinfo node #:deffn? #t) + ;; Strip closing quote (SRFI 160). + ") -> " "-> "))) + (match tree + (("[" . rest) + (cons (string-append "[" new) rest)) + (("[[" . rest) + (cons (string-append "[[" new) rest)) + (("]" . (s . tail)) + (cons* new (string-append s "]") tail)) + ;; SRFI 160. + (((and (? string? s1) (? (cut string-prefix? "]]" <>))) + . (s2 . tail)) + (cons* new (string-trim-both (string-drop s1 2)) + (string-append s2 "]]") tail)) + (((and (? string? s1) (? (cut string-prefix? "]" <>))) + . (s2 . tail)) + (cons* new (string-trim-both (string-drop s1 1)) + (string-append s2 "]") tail)) + ;; SRFI 178. + (((and (? string? s1) (? (cut string-suffix? "[" <>))) + . tail) + (cons* (string-append "[" new) + (string-append (string-trim-both + (string-drop-right s1 1))) + tail)) + (() (cons new tree)) + (other (cons new other))))) + '() + args) + ;; Post-process for a potential trailing ']'. + (("]" . (s . tail)) + (cons (string-append s "]") tail)) + (((and (? string? s1) (? (cut string-prefix? "]]" <>))) . (s2 . tail)) + (cons* (string-trim-both (string-drop s1 2)) + (string-append s2 "]]") tail)) + (((and (? string? s1) (? (cut string-prefix? "]" <>))) . (s2 . tail)) + (cons* (string-trim-both (string-drop s1 1)) + (string-append s2 "]") tail)) + (other other)))) + +(define (sanitize-definitions-node node) + "Recursively sanitize arguments, removing extraneous white space." + (first (foldts + (lambda (seed node) ;fdown + '()) + (lambda (seed kid-seed node) ;fup + (sxml-match node + ((*ENTITY* "additional" "nbsp") + seed) ;discard + ((i ,value) + (cons value seed)) + ((var ,value) + (cons value seed)) + (,other + (cons (reverse kid-seed) seed)))) + (lambda (seed node) ;fhere + (let ((node (if (string? node) + (string-trim-both node) + node))) + (match node + ((or "..." "…") + (cons "@dots{}" seed)) + ((and (? string?) (? string-null?)) + seed) ;discard node + (other + (cons node seed))))) + '() + node))) + +(define* (node->definitions node #:key style-hint) + "Convert NODE, an SHTML expression assumed to contain +procedure definitions, to Texinfo. STYLE-HINT may be used to provide a +hint about which HTML scheme is being used to format the definitions." + ;; This is messy because SRFIs appear to all use their own special + ;; variant to format procedure/syntax definitions. + (let ((node (sanitize-definitions-node node))) + + (define (deffn proc args continued?) + (unless proc + (error "expected a string for 'proc', got" proc)) + (format #f "~%@deffn~:[~;x~] {Scheme Procedure} ~{~a ~^~}~%" + continued? (process-args (cons proc args)))) + + (if (eq? 'srfi-160-code style-hint) + ;; Definitions in SRFI 160 are special in that no grouping + ;; information can be inferred, and a return value and SRFI + ;; annotation are added at the end. + (match node + ((p ('code (and (? string? proc) (? (cut string-prefix? "(" <>))) + . rest) srfi ...) + (string-append (deffn (string-trim-both (string-drop proc 1)) + (append rest srfi) #f) + "\ +@c FIXME: Check deffn category and adjust '@end deffn' location +@end deffn\n\n"))) + + (let loop ((result '()) + (doc '()) + (rest (cdr node)) + (continued? #f) + (proc #f) + (args '())) + + (pk 'called-loop-with-proc proc) + (match rest + (() + (pk 'closing-deffn-on-empty-list) + (pk 'doc doc) + ;; The (p (code (b procedure) args ... doc ...)) style used by + ;; SRFI-64 allows to correctly close the definition. + (let ((result (if (null? doc) ;srfi-64 <code> trailing doc + (cons "\ +@c FIXME: Check deffn category and adjust '@end deffn' location +@end deffn\n\n" result) + (cons* "@end deffn\n" + (html->texinfo `(p ,@(reverse doc))) + result)))) + (cons "\n" (reverse result)))) + ((head . tail) + (pk 'GOT-HEAD head 'REST tail) + (match head + ("(" ;skip opening parenthesis + (loop result doc tail continued? proc args)) ;no-op + ;; SRFI 64 <code> procedure definition node. + (('code "(" ((or 'b 'var) p) . rest) + (pk 'setting-proc p) + ;; Dump accumulated leading documentation, if any. + (let ((result (if (null? doc) + result + (cons (html->texinfo `(p ,@(reverse doc))) + result)))) + (if proc + ;; New continued procedure. + (loop result '() (append rest tail) #t p '()) + (loop result '() (append rest tail) continued? p '())))) + (('code (and (? string? p) + (? (cut string-prefix? "(" <>))) . rest) + ;; SRFI 178 style. + (pk 'setting-proc p) + ;; Dump accumulated leading documentation, if any. + (let ((result (if (null? doc) + result + (cons (html->texinfo `(p ,@(reverse doc))) + result))) + (p (string-trim-right (string-drop p 1)))) + (if proc + ;; New continued procedure. + (loop result '() (append rest tail) #t p '()) + (loop result '() (append rest tail) continued? p '())))) + ((or ('tt ")") ")" ")\n" "])" "])\n" " ...)" " ...)\n" + ('code ")")) ;end + (let* ((args (if (and (string? head) (string-index head #\])) + (cons "]" args) + args)) + (args* (if (and (string? head) + (string-prefix? " ..." head)) + (cons " ..." args) + args))) + (pk 'calling-deffn-in-end) + (loop (cons (deffn proc (reverse args*) continued?) result) + doc + tail + #t ;mark next definitions as continued + #f '()))) + ((and (? string?) (? (cut string-suffix? ") -" <>))) ;end + ;; Definition ended but there is a trailing returned type + ;; annotation (e.g. "-> bitvector") to add as an argument. + (let* ((tail-length (length tail)) + (args `(,@(if (>= tail-length 2) + (list (second tail)) + '()) + "->" + ,(string-drop-right head 3) + ,@args))) + (pk 'calling-deffn-in-end) + (loop (cons (deffn proc (reverse args) continued?) result) + doc + (if (>= tail-length 2) + (drop tail 2) + tail) + #t ;mark next definitions as continued + #f '()))) + (('tt (and (? string?) (? (cut string-prefix? "(" <>)) p)) + (when proc + (error "unexpected proc encountered while already defined" p)) + (let ((p (string-trim-right (string-drop p 1)))) + (pk 'setting-proc p) + (loop result doc tail continued? p '()))) + (('b p) ;for SRFI 64 <pre> + (pk 'setting-proc p) + ;; Dump accumulated leading documentation, if any. + (let ((result (if (null? doc) + result + (cons (html->texinfo `(p ,@doc)) result)))) + (if proc + (loop result '() tail #t p '()) ;new continued procedure + (loop result '() tail continued? p '())))) + (('br) ;continued? + (pk 'got-br) + (if (eq? 'srfi-64-code style-hint) + ;; For SRFI 64 <code> style definitions, this should be a + ;; no-op. + (loop result doc tail continued? proc args) + (loop result doc tail #t #f '()))) + (other + (pk 'got-arg other) + (if proc + (loop result doc tail continued? proc (cons other args)) + ;; The argument was seen after proc definition was + ;; closed; preserve it as documentation. + (begin (pk 'keeping-as-doc other) + (loop result (cons other doc) + tail continued? proc args))))))))))) + +(define (snarfi . args) + "snarf documentation from the first argument, a srfi html spec source +file assumed to follow the basic structure of the SRFI template (see: +https://srfi.schemers.org/srfi-template.html)." + (match args + (((or "-h" "--help")) + (usage)) + ((file) + (let* ((shtml (call-with-input-file file + (cut html->shtml <> #:strict? #t))) + (title (first ((sxpath '(// html head title *text*)) shtml))) + (srfi (and=> (string-match "SRFI [[:digit:]]+" title) + match:substring)) + (body (first ((sxpath '(// html body)) shtml))) + (content-level (or (node->level + (find-heading-by-text "Abstract" body)) + (error "could not locate 'Abstract' section"))) + (body* (if (> content-level 1) + (splice-children (pk 'splicing-with (string->symbol + (format #f "h~a" (1- content-level)))) + body) + body)) + (content (let loop ((result '()) + (rest body*) + (entered? #f)) + (match rest + (() (reverse result)) + ((node . tail) + (if (section-of-interest? node entered?) + (loop (cons node result) tail #t) + (loop result tail #f)))))) + (content* (splice-children '(h1 "Specification") content)) + ;; Sanitize the SHTML to remove spurious "\r\n" or "\n" + ;; strings, which would otherwise prevent some of the + ;; template patterns to match. Also make some safe + ;; substitutions. + (scontent (foldts + (lambda (seed node) ; fdown + '()) + (lambda (seed kid-seed node) ; fup + (match node + (('@ ('id id)) + ;; IDs may appear anywhere and break pattern + ;; matching; skip them. + seed) + (other (cons (reverse kid-seed) seed)))) + (lambda (seed tree) ; fhere + (let ((tree (if (string? tree) + ;; Use @dots{} for ellipses. + (string-replace-substring + ;; Double all '@'s, for Texinfo. + (string-replace-substring + tree "@" "@@") + "..." "@dots{}") + tree))) + (match tree + ((and (? string?) + (? (cut string-match + "^(\\.?\r?\n|[[:space:]]+)$" <>))) + seed) ;skip white space atoms + (other (cons tree seed))))) + '() + content*)) + (_ (begin (pk 'scontent) (pretty-print scontent))) + (templates `((*any* . ,(cut html->texinfo <> #:srfi srfi)))) + (texi (apply-templates scontent templates))) + (pretty-print (string-replace-substring (string-append "\ +@node " srfi " +@subsection " title " +@cindex " srfi " + +" (string-join texi "") +" +") "\r\n" "\n") + #:display? #t))) + (_ (usage)))) + +(define main snarfi) + +;;; snarfi ends here -- 2.41.0