Re: [PATCH v3] docs/match: pattern matcher example makeover


From: Maxime Devos <maximedevos@telenet.be>
To: Blake Shaw <blake@reproduciblemedia.com>, guile-devel@gnu.org
Subject: Re: [PATCH v3] docs/match: pattern matcher example makeover
Date: Wed, 1 Feb 2023 17:40:23 +0100
Message-ID: <9725d8d9-faf2-3306-94ae-209086aa5a82@telenet.be>
In-Reply-To: <20230201130930.6624-1-blake@reproduciblemedia.com>





On 01-02-2023 14:09, Blake Shaw wrote:
>  [...]
> -
> style: clean-up newlines
> --
> It appears that while the PDF needs additional newlines
> to be presentable, these appear to have a negative effect
> on the presentation of the texinfo doc.
> 
> I don't know how to fix this, but from looking at the PDF,
> it appears that the strategy until now has been to privilege
> texinfo at the expense of PDF readability (the PDF is more
> or less "squished together")
> 
> So in that regard, these edits make my past edits more in sync
> with past Guile docs.

IIRC, Texinfo has an @iftex ... @end iftex construct or such. You could use
this to define a @pdf-newline macro, to only insert newlines in the PDF (TeX
is used for the PDF).
> -
> examples: replace with didactic ex. that can copied & pasted
> --
> The existing example can't be copied and pasted.
> 
> This example both fixes the past one and improves on its relation
> to the text.
> 
> -
> style: switch to "Indiana style", bracketing lets and clauses
> --
> After spending much time looking at the examples in black & white
> to edit the texinfo document, it occurred to me just how much the
> brackets improve legibility. Therefore, I have decided to adopt
> the "Indiana" style of using brackets, used by Kent Dybvig, Dan
> Friedman and Will Byrd at Indiana University.
> 
> Currently the docs use this style in some places but not in others.
> 
> Considering some are color blind, and that few will have rigged
> their texinfo configuration to use rainbow-delimiters in the while
> reading documentation, I think this should be considered a general
> accessibility improvement.

IME, (( )) is quite readable (and I don't use rainbow delimiters).
That might largely be 'due to experience', though.  While I would
expect ([ ] [ ]) to be unconventional for many Guilers, it should be 
readable too though, so I suppose it could be good to just change the 
convention, then.
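
For illustration, here is a minimal sketch of the two notations side by side. It assumes Guile's default reader options, under which square brackets read the same as parentheses; the procedure names `sum-parens` and `sum-brackets` are hypothetical and exist only for this comparison.

```scheme
;; Conventional Guile style: parentheses around each binding.
(define (sum-parens)
  (let ((x 1) (y 2))
    (+ x y)))

;; "Indiana style": square brackets around each binding.
;; With Guile's default reader this reads as the same expression.
(define (sum-brackets)
  (let ([x 1] [y 2])
    (+ x y)))

(sum-parens)   ; => 3
(sum-brackets) ; => 3
```

Both definitions evaluate identically, so the choice between them is purely a matter of convention.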

You are currently making the manual more inconsistent by using this (for 
Guile) mostly non-standard notation though; IIRC the manual mostly does 
(( )) and not ([ ]).  Yet, in the review of the v1, you mentioned

> No, I'm not, I'm being totally boring and normal in this regard because collectively authored documentation is something you should never adopt non-standard writing notation in the course of authoring, just to one up someone on a mailing list
> 
> To be honest, it's this kind of attitude that has resulted in the current docs that so many people find utterly incomprehensible. The core point of my talk that what makes Info Guile so hard to read is the lack of stylistic consistency. Editors and editing exist for a very good reason.

, which is very much against non-standard notation and for consistency. 
As such, I propose:

   a) Before (or after) this patch, change everything in the manual to
      "Indiana style", for consistency.  If you go for 'after this
      patch', I mean immediately afterwards, because Guile contributors
      tend to come and go, and delaying things tends to become never
      doing things.

   b) or: do it in non-Indiana style (likely not the option you will
      take, but it would be more stylistically consistent than the
      current version of the patch ...)

   c) or: don't adjust everything in the manual to Indiana style yet,
      but also make it a rule that the manual (and Guile code in Guile
      proper, I guess) does Indiana style, and that all current
      deviations from Indiana style are old style to be updated in the
      future.

      If this were Guix, you could make this a rule by adding it
      to the "Contributing" section.  Guile does not appear to have
      such a section, but "1.8 Typographical Conventions" might be a good
      place.

Additionally, changing the parenthesis convention in Guile is not just a
change to the 'match' documentation, but the subject line only mentions
'match'.  While Indiana style seems a good thing to me at first sight
now that you mention the benefits, it needs a separate e-mail thread such
that people interested in ()/[] stuff but not in 'match' stuff will have
an opportunity to respond.

> indentation: make consistent according to rule defined below
> 
> If a new paragraph opens onto a new topic, it should naturally
> indent (i.e, no indentation markup is required)
> 
> If a new paragraph is a continuation of the current subject,
> the markup @noident should be applied
> 
> markup: replace @var with @code unless @var is a @defn argument
> 
> The way that it renders in texinfo means that it renders @vars
> in uppercase, the way that is conventionally done for definition
> arguments.

I'm not too familiar with Texinfo PDF output but I'll take your word for
it.  However, this is not the case at least for HTML output, as you can
see at
<https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040var.html>:
in HTML documentation it remains lowercase.

> Therefore I've changed all @vars to @code unless @var is a @defn
> argument

I'm missing what you mean with the 'Therefore'.  How does this relate to
your previous paragraph (I don't get what your point is about
'definition arguments')?  Do you mean that uppercase @var is bad and that
it should be lowercase instead?  If so, it would be better to modify
Texinfo itself to let @var not change the case, then every manual in
Texinfo would benefit instead of only the Guile manual.

Also, you could ask the Texinfo people if there is a reason for 
uppercase @var; maybe they determined that it is more readable to more 
people (I'm just speculating, I don't know the reason)? -- Presumably 
there's some good reason (or maybe not, I don't know, but you could ask 
them first).

Otherwise, if you make this Guile-specific change, you would create
stylistic inconsistencies between projects using Texinfo.  More
specifically, you are creating stylistic inconsistencies between GNU
projects.

Additionally, you are not merely removing the uppercasing thing, you are 
also removing the 'slanted' thing -- the result of @var is slanted 
typewriter, the result of @code is merely typewriter, which makes it 
slightly harder to distinguish metavariables from other code.

You are also only making this stylistic change in the documentation of
'match'; the remainder of the manual still has the old @var.  If you
change things, it would be better to change things for the whole manual.
I think you can do this by redefining the @var macro to whatever you
want in the prelude (at least that can be done in TeX).

> -
> remove: paragraph that referred to a since removed example
> --
> 
> fix: uncomment @xref{sxml-match}
> ---
>   doc/ref/match.texi | 252 ++++++++++++++++++++++++++++++---------------
>   1 file changed, 167 insertions(+), 85 deletions(-)
> 
> diff --git a/doc/ref/match.texi b/doc/ref/match.texi
> index f5ea43118..4e657b976 100644
> --- a/doc/ref/match.texi
> +++ b/doc/ref/match.texi
> @@ -23,71 +23,142 @@ The @code{(ice-9 match)} module provides a @dfn{pattern matcher},
>   written by Alex Shinn, and compatible with Andrew K. Wright's pattern
>   matcher found in many Scheme implementations.
>   
> -@cindex pattern variable
> -A pattern matcher can match an object against several patterns and
> -extract the elements that make it up.  Patterns can represent any Scheme
> -object: lists, strings, symbols, records, etc.  They can optionally contain
> -@dfn{pattern variables}.  When a matching pattern is found, an
> -expression associated with the pattern is evaluated, optionally with all
> -pattern variables bound to the corresponding elements of the object:
> +@noindent A pattern matcher does precisely what the name implies: it
> +matches some arbitrary pattern, and returns some result accordingly.

Again, as I mentioned previously, in the general case it matches 
arbitrary patterns (plural) and returns results (plural) -- the 'match' 
construct is not as limited as you are implying it to be here.
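
A minimal sketch of that generality, assuming the `(ice-9 match)` module: one `match` form tries several patterns in order, and a clause body may return more than one value. The procedure name `split-pair` is hypothetical.

```scheme
(use-modules (ice-9 match))

;; Clauses are tried top to bottom; the first matching pattern wins.
(define (split-pair obj)
  (match obj
    ((x y) (values x y))          ; exactly two elements: return both
    ((x . rest) (values x rest))  ; any other pair: head and tail
    (_ (values #f #f))))          ; anything else

(split-pair '(hello (world)))  ; => hello and (world), as two values
(split-pair '(1 2 3))          ; => 1 and (2 3), as two values
```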

>   
>   @example
> -(let ((l '(hello (world))))
> -  (match l           ;; <- the input object
> -    (('hello (who))  ;; <- the pattern
> -     who)))          ;; <- the expression evaluated upon matching
> -@result{} world
> +(define (english-base-ten->number name)
> +  (match name
> +    ('zero   0)
> +    ('one    1)
> +    ('two    2)
> +    ('three  3)
> +    ('four   4)
> +    ('five   5)
> +    ('six    6)
> +    ('seven  7)
> +    ('eight  8)
> +    ('nine   9)))
> +
> +(english-base-ten->number 'six)
> +@result{} 6

My previous comment still applies:

> This is a suboptimal example; this would be better done with 'case'.
> I propose replacing it with another example, or adding a note that one would normally use 'case' for this. 

What is the reason for not doing something akin to that?
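
For comparison, a sketch of how the same lookup might be written with `case`. The `else` clause and its error message are additions for illustration and are not part of the patch.

```scheme
;; `case` compares the key against literal data, so no pattern
;; matcher is needed for a plain symbol-to-number table.
(define (english-base-ten->number name)
  (case name
    ((zero)  0)
    ((one)   1)
    ((two)   2)
    ((three) 3)
    ((four)  4)
    ((five)  5)
    ((six)   6)
    ((seven) 7)
    ((eight) 8)
    ((nine)  9)
    ;; Illustrative addition: signal unknown names explicitly.
    (else (error "not an English digit name:" name))))

(english-base-ten->number 'six)                                 ; => 6
(apply + (map english-base-ten->number '(one two three four))) ; => 10
```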

> +
> +(apply + (map english-base-ten->number '(one two three four)))
> +@result{} 10
>   @end example
>   
> -In this example, list @var{l} matches the pattern @code{('hello (who))},
> -because it is a two-element list whose first element is the symbol
> -@code{hello} and whose second element is a one-element list.  Here
> -@var{who} is a pattern variable.  @code{match}, the pattern matcher,
> -locally binds @var{who} to the value contained in this one-element
> -list---i.e., the symbol @code{world}.  An error would be raised if
> -@var{l} did not match the pattern.
> +@page
> +@cindex pattern variable
> +@noindent Pattern matchers may contain @dfn{pattern variables},
> +local bindings to all elements that match a pattern.

'Pattern matchers' -> 'pattern' would be more precise here, as it more
precisely states _where_ the pattern variable is.  E.g. if you say
'pattern', it's certainly not the 'ns' in (match ns ...).  If you say
'pattern matcher' (*), then 'pattern matcher' might mean 'match' itself,
or (match ns ...); the former does not contain a pattern variable, the
latter likely does but less is stated about _where_ the pattern variable
is; purely going by your sentence it might be the 'match', which is
incorrect.

(*) While the original text defined 'pattern matcher=match', that part
doesn't contain any pattern variables, and in your new text the notion
of 'pattern matcher' is not exactly defined but rather described, and
not as some kind of precise characterisation.

>   
> -The same object can be matched against a simpler pattern:
> +@example
> +(let re ([ns '(one two three four 9)] [total 0])

The Scheme convention would be to write 'loop' instead of 're' when
using named-let, and something like 'rest' instead of 'ns'.  The exact
word for the loop argument varies a lot, but two letters that don't
appear to mean anything are to be avoided.

> +  (match ns
> +    [(e) (+ total (english-base-ten->number e))]
> +    [(e . es)
> +     (re es (+ total (english-base-ten->number e)))]))

I tried running your example, and it doesn't work:

(define (english-base-ten->number name)
   (match name
     ('zero   0)
     ('one    1)
     ('two    2)
     ('three  3)
     ('four   4)
     ('five   5)
     ('six    6)
     ('seven  7)
     ('eight  8)
     ('nine   9)))
(let re ([ns '(one two three four 9)] [total 0])
   (match ns
     [(e) (+ total (english-base-ten->number e))]
     [(e . es)
      (re es (+ total (english-base-ten->number e)))]))
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" 9)'.

Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.

I think you need to replace (one two three four 9) by (one two three 
four nine).  As you mentioned yourself (in other words), examples in the 
manual should actually work as-is.
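
A sketch of the example with that correction applied, also using the `loop` / `rest` naming suggested above. The wrapper procedure `sum-english-digits` is hypothetical and only there to make the snippet easy to call.

```scheme
(use-modules (ice-9 match))

(define (english-base-ten->number name)
  (match name
    ('zero   0)
    ('one    1)
    ('two    2)
    ('three  3)
    ('four   4)
    ('five   5)
    ('six    6)
    ('seven  7)
    ('eight  8)
    ('nine   9)))

;; Sum a non-empty list of English digit names.
(define (sum-english-digits names)
  (let loop ((rest names) (total 0))
    (match rest
      ;; Last element: add it and return the total.
      ((e) (+ total (english-base-ten->number e)))
      ;; Otherwise: add the head and tail-call `loop` on the tail.
      ((e . es)
       (loop es (+ total (english-base-ten->number e)))))))

(sum-english-digits '(one two three four nine)) ; => 19
```

With the symbol `nine` in place of the number 9, every element matches a clause of `english-base-ten->number`, so no `match-error` is thrown.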

> -@example
> -(let ((l '(hello (world))))
> -  (match l
> -    ((x y)
> -     (values x y))))
> -@result{} hello
> -@result{} (world)
> +@result{} 19
>   @end example
>   
> -Here pattern @code{(x y)} matches any two-element list, regardless of
> -the types of these elements.  Pattern variables @var{x} and @var{y} are
> -bound to, respectively, the first and second element of @var{l}.
> -
> -Patterns can be composed, and nested.  For instance, @code{...}
> +@noindent In this example, the list @code{ns} matches the pattern
> +@code{(e . es)}, where the pattern variable @code{e} corresponds
> +to the metaphoical "car" of @code{ns} and the pattern variable @code{es}
> +corresponds to the "cdr" of @code{ns}.

Typo: metaphoical -> metaphorical.

Also: metaphorical -> literal. -- e is literally the car of ns (or 
‘corresponds to the car of ns in a literal way’ if you go for a 
variable/value distinction); there is nothing figurative here.  I would 
just drop the metaphorical/literal word.  Also, "car" -> `car' and "cdr" 
-> `cdr' -- the manual currently consistently uses the quotation style 
‘car’ / ‘pair?’, ‘SCM’, ..., not "car". For example, in 5.4.1 Dynamic 
Types, there is the paragraph:

> In order to implement standard Scheme functions like ‘pair?’ and
> ‘string?’ and provide garbage collection, the representation of every
> value must contain enough information to accurately determine its type
> at run time.

'Function' -> 'Procedure'. You are introducing a stylistic
inconsistency here. In Guile, the C things are called 'Functions', and
the Scheme things are called 'Procedures'.  To some degree, this ‘in
Scheme it's called a procedure’ also holds for other Schemes IIUC.

Actually, while some GCs do require runtime type information (RTI), RTI
is not needed for garbage collection.  Guile uses Boehm-GC for garbage
collection.  Being a conservative garbage collector, it doesn't need any
type information.  It works a little better if you do give it some type
information, and Guile does give it some information in some cases, but
it's not required.

This information is therefore incorrect and needs to be removed, but the
bits about predicates seem fine to me.

> Often, Scheme systems also use this information to
> determine whether a program has attempted to apply an operation to an
> inappropriately typed value (such as taking the ‘car’ of a string).

IIUC, in Texinfo, we write `stuff' instead of ‘stuff’, and it will get
turned into ‘stuff’.  I dunno why this is still done in the Guile manual
as UTF-8 is an established thing, but I have used ‘’ in Guix stuff in
the past and people changed it into `'.

Additionally, running `git grep -F "car" doc/ref/*.texi' shows that
the manual doesn't actually quote car and cdr -- instead it writes car
and cdr unquoted, or writes @code{car} / @code{cdr} which happens to be
turned into a quoted ‘car’ / ‘cdr’ in the .info documentation by Texinfo.

I think you can guess what I would be saying about stylistic consistency 
here.

> +
> +@noindent A tail call @code{re} is then initiated


‘A tail call @code{re} is then initiated’ -> ‘A tail call to @code{re}
is then initiated’ -- @code{re} is a variable reference, not a tail call.
The tail call is @code{(re es (+ total ...))}.

More simply, you could write ‘The procedure @var{re} is then tail-called’.

> +and we "cdr" down the
> +list by recurring on the tail @code{es}, applying our matcher
> +@code{english-base-ten->number} to each element of @code{ns} until
> +only a single element @code{(e)} remains, causing the @code{total}
> +to be computed.  In modern Scheme programming it is common to use
> +@code{match} in place of the more verbose but familiar combination
> +of @code{cond}, @code{car} and @code{cdr}, so it's important to
> +understand how these idioms translate.
> +
> +Patterns can be composed and nested.  For instance, @code{...}
>   (ellipsis) means that the previous pattern may be matched zero or more
>   times in a list:
>   @example
> -(match lst
> -  (((heads tails ...) ...)
> -   heads))
> +(match '((a.0 b.0 c.0 ((1.0 2.0 3.0) x.0 y.0 z.0))
> +         (a.1 b.1 c.1 ((1.1 2.1 3.1) x.1 y.1 z.1)))
> +  [((heads ... ((tails ...) . rest)) ...)
> +   (begin
> +    (format #t "heads: ~a ~%" heads)
> +    (format #t "tails: ~a ~%" tails)
> +    (format #t "rest:  ~a ~%" rest))])
> +@result{}
> +heads: ((a.0 b.0 c.0) (a.1 b.1 c.1))
> +tails: ((1.0 2.0 3.0) (1.1 2.1 3.1))
> +rest:  ((x.0 y.0 z.0) (x.1 y.1 z.1))
>   @end example
>   
> -@noindent
> -This expression returns the first element of each list within @var{lst}.
> -For proper lists of proper lists, it is equivalent to @code{(map car
> -lst)}.  However, it performs additional checks to make sure that
> -@var{lst} and the lists therein are proper lists, as prescribed by the
> -pattern, raising an error if they are not.
> -
> -Compared to hand-written code, pattern matching noticeably improves
> -clarity and conciseness---no need to resort to series of @code{car} and
> -@code{cdr} calls when matching lists, for instance.  It also improves
> -robustness, by making sure the input @emph{completely} matches the
> -pattern---conversely, hand-written code often trades robustness for
> -conciseness.  And of course, @code{match} is a macro, and the code it
> -expands to is just as efficient as equivalent hand-written code.
> -
> -The pattern matcher is defined as follows:
> +@noindent A pattern matcher can match an object against several
> +patterns and extract the elements that make it up.
> +
> +@example
> +(match '((l1 . r1) (l2 . r2) (l3 . r3))
> +  [((left . right) ...)
> +   (list left right)])
> +
> +@result{} ((l1 l2 l3) (r1 r2 r3))
> +@end example
> +
> +@example
> +(match '((1 . (a . b)) (2 . (c . d)) (3 . (e . f)))
> +  [((key . (left . right)) ...)
> +   (fold-right acons '() key right )])
> +
> +@result{} ((1 . b) (2 . d) (3 . f))
> +@end example
> +
> +@example
> +(match '(((a b c) e f g) 1 2 3)
> +  [(((head ...) . rest) tails ...)
> +   (acons tails head rest )])
> +
> +@result {} (((1 2 3) a b c) e f g)
> +@end example
> +
> +Patterns can represent any Scheme object: lists, strings, symbols,
> +records, etc.
> +
> +@noindent When a matching pattern is found, an expression is evaluated
> +with pattern variables bound to the corresponding elements of the object.
> +
> +@example
> +(let re ([m #(a "b" c "d" e "f" g)])
> +   (match m
> +     [(or (e) #(e)) e]
> +     [(or #(e1 e2 es ...)
> +	   (e1 e2 es ...))
> +      (cons (cons e1 e2)
> + 	    (re es))]))
> +
> +@result{} ((a . "b") (c . "d") (e . "f") . g)
> +@end example
> +
> +@example
> +(let re ([m '(a b c d e f g h i)])
> +   (match m
> +     [(e) e]
> +     [(e1 e2 es ...)
> +      (acons e1 e2 (re es))]))
> +
> +@result{} ((a . b) (c . d) (e . f) (g . h) . i)
> +@end example
> +
> +@noindent Compared to hand-written code, pattern matching noticeably
> +improves clarity and conciseness---no need to resort to series of
> +@code{car} and @code{cdr} calls when matching lists, for instance.
> +It also improves robustness, by making sure the input @emph{completely}
> +matches the pattern---conversely, hand-written code often trades
> +robustness for conciseness.  And of course, @code{match} is a macro,
> +and the code it expands to is just as efficient as equivalent
> +hand-written code.
> +
> +@noindent We define @code{match} as follows: @*

Why did you change this from

      The pattern matcher is defined as follows:

? While the 'we' / 'our' / ... construct is pretty convenient, IMO it is 
better avoided as long as the avoidance doesn't lead to awkward 
constructions.

>   @deffn {Scheme Syntax} match exp clause1 clause2 @dots{}
>   Match object @var{exp} against the patterns in @var{clause1}
> @@ -96,9 +167,9 @@ value produced by the first matching clause.  If no clause matches,
>   throw an exception with key @code{match-error}.
>   
>   Each clause has the form @code{(pattern body1 body2 @dots{})}.  Each
> -@var{pattern} must follow the syntax described below.  Each body is an
> +@code{pattern} must follow the syntax described below.  Each body is an
>   arbitrary Scheme expression, possibly referring to pattern variables of
> -@var{pattern}.
> +@code{pattern}.
>   @end deffn
>   
>   @c FIXME: Document other forms:
> @@ -114,7 +185,7 @@ arbitrary Scheme expression, possibly referring to pattern variables of
>   @c
>   @c clause ::= (pat body) | (pat => exp)
>   
> -The syntax and interpretation of patterns is as follows:
> +@noindent @* The pattern language is specified as follows: @*


The stuff below still defines the interpretation, not only the 
language/grammar.  The change 'syntax -> language' seems fine to me, but 
why remove 'interpretation'?

Additionally, I personally would go for interpretation->semantics, but 
maybe that's too obscure for a general audience.

> [...]
>   @deffn {Scheme Syntax} match-lambda* clause1 clause2 @dots{}
> @@ -264,11 +335,10 @@ and can also be used for recursive functions which match on their
>   arguments as in @code{match-lambda*}.
>   
>   @example
> -(match-let (((x y) (list 1 2))
> -            ((a b) (list 3 4)))
> -  (list a b x y))
> -@result{}
> -(3 4 1 2)
> +(match-let ([(x y ...) (list 1 2 3)]
> +            [(a b ...) (list 3 4 5)])
> +  (list x a y b))
> +@result{} (1 3 (2 3) (4 5))
>   @end example
>   @end deffn
>   
> @@ -287,22 +357,34 @@ Similar to @code{match-let}, but analogously to @code{let*}, match and
>   bind the variables in sequence, with preceding match variables in scope.
>   
>   @example
> -(match-let* (((x y) (list 1 2))
> -             ((a b) (list x 4)))
> -  (list a b x y))
> +(match-let* ([(x . y) (list 1 2 3)]
> +             [(a . b) (list x 4 y)])
> +  (list a b))
>   @equiv{}

The old example was simpler and still fully demonstrated 'match-let*', 
why the change (besides [])?

>[...]
>   
> +@example
> +(define wrap '(((((unnest arbitrary nestings))))))
> +
> +(let unwrap ([peel wrap])
> +  (match-let* ([([core ...]) peel]
> +	       [(wrapper ...) core])
> +    (if (> (length wrapper) 1)
> +	wrapper
> +	(unwrap wrapper))))
> +
> +@result{} (unnest arbitrary nestings)
> +@end example
> +

(Not saying anything about this example TBC.)

Greetings,
Maxime.

