bug#55815: [PATCH] bindat: Improve str, strz documentation

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#55815: [PATCH] bindat: Improve str, strz documentation
@ 2022-06-06  2:22 Richard Hansen
  2022-06-06 10:59 ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Hansen @ 2022-06-06  2:22 UTC
  To: 55815; +Cc: monnier

[-- Attachment #1: Type: text/plain, Size: 2099 bytes --]

X-Debbugs-CC: monnier@iro.umontreal.ca

* doc/lispref/processes.texi (Bindat Types): Expand the documentation
for the `str' and `strz' types to clarify expectations and explain
edge case behavior.
---
  doc/lispref/processes.texi | 26 +++++++++++++++++++++++---
  1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index 668a577870..68621d32a8 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -3479,11 +3479,31 @@ Bindat Types
  @var{bitlen} has to be a multiple of 8.
  
  @item str @var{len}
-String of bytes of length @var{len}.
+String of length @var{len}.  When packing, the first @var{len} bytes
+of the input string are copied to the packed output.  If the input
+string is shorter than @var{len}, the remaining bytes are set to zero.
+The input string must be unibyte (@pxref{Text Representations}).  When
+unpacking, any zero bytes in the packed input string will appear in
+the unpacked output.
  
  @item strz &optional @var{len}
-Zero-terminated string of bytes, can be of arbitrary length or in a fixed-size
-field with length @var{len}.
+If @var{len} is not provided: Variable-length null-terminated string.
+When packing, the entire input string is copied to the packed output
+followed by a zero byte (null terminator).  The input string must be
+unibyte (@pxref{Text Representations}) and must not contain any zero
+bytes.  When unpacking, the resulting string contains all bytes up to
+(but excluding) the null terminator.
+
+If @var{len} is provided: @code{strz} behaves the same as @code{str}
+with one difference. When unpacking, the first zero byte (null
+terminator) encountered in the packed string and all subsequent bytes
+are excluded from the unpacked result.
+
+@quotation Caution
+The packed output will not be null-terminated unless the input string
+is shorter than @var{len} or it contains a zero byte within the first
+@var{len} bytes.
+@end quotation
  
  @item vec @var{len} [@var{type}]
  Vector of @var{len} elements.  The type of the elements is given by
-- 
2.36.1



* bug#55815: [PATCH] bindat: Improve str, strz documentation
  2022-06-06  2:22 bug#55815: [PATCH] bindat: Improve str, strz documentation Richard Hansen
@ 2022-06-06 10:59 ` Eli Zaretskii
  2022-06-06 23:31   ` Richard Hansen
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2022-06-06 10:59 UTC
  To: Richard Hansen; +Cc: 55815, monnier

> Cc: monnier@iro.umontreal.ca
> Date: Sun, 5 Jun 2022 22:22:01 -0400
> From: Richard Hansen <rhansen@rhansen.org>
> 
>   @item str @var{len}
> -String of bytes of length @var{len}.
> +String of length @var{len}.

I think it is better to say

  Unibyte string that is @var{len} bytes long.

>   @item strz &optional @var{len}
> -Zero-terminated string of bytes, can be of arbitrary length or in a fixed-size
> -field with length @var{len}.
> +If @var{len} is not provided: Variable-length null-terminated string.

Same here: it is better to mention the unibyte-ness up front, since
it's important.

> +If @var{len} is provided: @code{strz} behaves the same as @code{str}
> +with one difference. When unpacking, the first zero byte (null
                      ^^
Our conventions are to leave two spaces between sentences.

Also, for consistency, I suggest using "null byte" everywhere, to
avoid potential confusion of non-native English speakers.

Thanks.






* bug#55815: [PATCH] bindat: Improve str, strz documentation
  2022-06-06 10:59 ` Eli Zaretskii
@ 2022-06-06 23:31   ` Richard Hansen
  2022-06-07 16:30     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Hansen @ 2022-06-06 23:31 UTC
  To: Eli Zaretskii; +Cc: 55815, monnier


[-- Attachment #1.1.1: Type: text/plain, Size: 1023 bytes --]

Thanks for the review.  A new revision is attached.

On 6/6/22 06:59, Eli Zaretskii wrote:
> I think it is better to say
> 
>    Unibyte string that is @var{len} bytes long.

Done.  I may have gone overboard though -- I did so because there are three representations that matter:

   1. The input string to be packed.
   2. The packed output.
   3. The result of unpacking.

Right now all three of those are unibyte, but in a future patch I plan on changing the first to accept unibyte-convertible multibyte input strings.

> Our conventions are to leave two spaces between sentences.

Done.

> Also, for consistency, I suggest using "null byte" everywhere, to 
> avoid potential confusion of non-native English speakers.

Done.

I also fixed a flaw in the previous revision: packing to a fixed-length field doesn't actually write a null byte if the input is shorter than the field. This only matters if the caller provided a pre-allocated string that doesn't have null bytes.

Thanks,
Richard
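
The pre-allocated-string case mentioned above can be illustrated with a small model.  This is a Python sketch of the described semantics only, not bindat code: the name pack_str and its signature are hypothetical, and the optional prealloc argument stands in for a pre-allocated string passed to bindat-pack.

```python
from typing import Optional


def pack_str(data: bytes, length: int,
             prealloc: Optional[bytearray] = None) -> bytes:
    """Model of packing a `str LEN' field (hypothetical helper)."""
    # With no pre-allocated buffer, the field starts out as all null bytes.
    out = prealloc if prealloc is not None else bytearray(length)
    if len(out) < length:
        raise ValueError("pre-allocated buffer is shorter than the field")
    # Copy at most LENGTH input bytes; anything past the end of the
    # input keeps whatever the buffer already held.
    chunk = data[:length]
    out[:len(chunk)] = chunk
    return bytes(out[:length])


print(pack_str(b"x", 2))                     # fresh buffer: padded with a null
print(pack_str(b"x", 2, bytearray(b"aa")))   # pre-allocated: old byte survives
```

In this model the second call leaves the trailing "a" in place, which is why a short input does not guarantee a null byte in the packed field.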

[-- Attachment #1.1.2: v2-0001-bindat-Improve-str-strz-documentation.patch --]
[-- Type: text/x-patch, Size: 2718 bytes --]

From acc7717dee85dbaac66c3ddc65ea9dfe62bbdc23 Mon Sep 17 00:00:00 2001
From: Richard Hansen <rhansen@rhansen.org>
Date: Thu, 2 Jun 2022 21:05:40 -0400
Subject: [PATCH v2] bindat: Improve str, strz documentation

* doc/lispref/processes.texi (Bindat Types): Expand the documentation
for the `str' and `strz' types to clarify expectations and explain
edge case behavior.
---
 doc/lispref/processes.texi | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index 668a577870..1f6a9e3a7d 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -3479,11 +3479,42 @@ Bindat Types
 @var{bitlen} has to be a multiple of 8.
 
 @item str @var{len}
-String of bytes of length @var{len}.
+Unibyte string of length @var{len}.  When packing, the first @var{len}
+bytes of the input string are copied to the packed output.  If the
+input string is shorter than @var{len}, the remaining bytes will be
+null (zero) unless a pre-allocated string was provided to
+@code{bindat-pack}, in which case the remaining bytes are left
+unmodified.  The input string must be unibyte (@pxref{Text
+Representations}).  When unpacking, any null bytes in the packed input
+string will appear in the unpacked unibyte output.
 
 @item strz &optional @var{len}
-Zero-terminated string of bytes, can be of arbitrary length or in a fixed-size
-field with length @var{len}.
+If @var{len} is not provided: Variable-length null-terminated unibyte
+string.  When packing, the entire input string is copied to the packed
+output followed by a null byte.  The length of the packed output is
+the length of the input string plus one (for the added null byte).
+The input string must be unibyte (@pxref{Text Representations}) and
+must not contain any null bytes.  When unpacking, the resulting
+unibyte string contains all bytes up to (but excluding) the null byte.
+
+If @var{len} is provided: @code{strz} behaves the same as @code{str}
+with one difference: When unpacking, the first null byte encountered
+in the packed string and all subsequent bytes are excluded from the
+unpacked result.
+
+@quotation Caution
+The packed output will not be null-terminated unless one of the
+following is true:
+@itemize
+@item
+The input string is shorter than @var{len} and either no pre-allocated
+string was provided to @code{bindat-pack} or the appropriate byte in
+the pre-allocated string was already null.
+@item
+The input string contains a null byte within the first @var{len}
+bytes.
+@end itemize
+@end quotation
 
 @item vec @var{len} [@var{type}]
 Vector of @var{len} elements.  The type of the elements is given by
-- 
2.36.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* bug#55815: [PATCH] bindat: Improve str, strz documentation
  2022-06-06 23:31   ` Richard Hansen
@ 2022-06-07 16:30     ` Eli Zaretskii
  2022-06-07 18:17       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-06-08  4:16       ` Richard Hansen
  0 siblings, 2 replies; 7+ messages in thread
From: Eli Zaretskii @ 2022-06-07 16:30 UTC
  To: Richard Hansen; +Cc: 55815, monnier

> Date: Mon, 6 Jun 2022 19:31:35 -0400
> Cc: 55815@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Richard Hansen <rhansen@rhansen.org>
> 
> > I think it is better to say
> > 
> >    Unibyte string that is @var{len} bytes long.
> 
> Done.  I may have gone overboard though -- I did so because there are three representations that matter:
> 
>    1. The input string to be packed.
>    2. The packed output.
>    3. The result of unpacking.
> 
> Right now all three of those are unibyte, but in a future patch I plan on changing the first to accept unibyte-convertible multibyte input strings.

Not sure I understand: what do you mean by "unibyte-convertible
multibyte input strings", and how do they differ from the other kinds?

In any case, you say "unibyte input string" too many times, and that's
unnecessary.  One example:

> +Unibyte string of length @var{len}.  When packing, the first @var{len}
> +bytes of the input string are copied to the packed output.  If the
> +input string is shorter than @var{len}, the remaining bytes will be
> +null (zero) unless a pre-allocated string was provided to
> +@code{bindat-pack}, in which case the remaining bytes are left
> +unmodified.  The input string must be unibyte (@pxref{Text

Why do we need to say the input must be unibyte when we already said
that up front?

(There's more of this redundancy in the patch.)

Stefan, any further comments?






* bug#55815: [PATCH] bindat: Improve str, strz documentation
  2022-06-07 16:30     ` Eli Zaretskii
@ 2022-06-07 18:17       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-06-08  4:16       ` Richard Hansen
  1 sibling, 0 replies; 7+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-06-07 18:17 UTC
  To: Eli Zaretskii; +Cc: 55815, Richard Hansen

> Stefan, any further comments?

Nothing specific, no.  The patch sounds good (it's important to clarify
what kind of "zero-terminated strings" we're supporting).


        Stefan







* bug#55815: [PATCH] bindat: Improve str, strz documentation
  2022-06-07 16:30     ` Eli Zaretskii
  2022-06-07 18:17       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-06-08  4:16       ` Richard Hansen
  2022-06-09  7:30         ` Eli Zaretskii
  1 sibling, 1 reply; 7+ messages in thread
From: Richard Hansen @ 2022-06-08  4:16 UTC
  To: Eli Zaretskii; +Cc: 55815, monnier


[-- Attachment #1.1.1: Type: text/plain, Size: 575 bytes --]

On 6/7/22 12:30, Eli Zaretskii wrote:
>> Right now all three of those are unibyte, but in a future patch I 
>> plan on changing the first to accept unibyte-convertible multibyte 
>> input strings.
> 
> Not sure I understand: what do you mean by "unibyte-convertible 
> multibyte input strings", and how do they differ from the other kinds?

I mean multibyte strings that do not contain characters that will cause string-to-unibyte to signal an error.

> In any case, you say "unibyte input string" too many times, and that's 
> unnecessary.

Done, see attached.

[-- Attachment #1.1.2: v3-0001-bindat-Improve-str-strz-documentation.patch --]
[-- Type: text/x-patch, Size: 2645 bytes --]

From 089b0e54e868e0c28b262d6b09a2d6af322ea31e Mon Sep 17 00:00:00 2001
From: Richard Hansen <rhansen@rhansen.org>
Date: Thu, 2 Jun 2022 21:05:40 -0400
Subject: [PATCH v3] bindat: Improve str, strz documentation

* doc/lispref/processes.texi (Bindat Types): Expand the documentation
for the `str' and `strz' types to clarify expectations and explain
edge case behavior.
---
 doc/lispref/processes.texi | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index 668a577870..a93a617c8a 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -3479,11 +3479,41 @@ Bindat Types
 @var{bitlen} has to be a multiple of 8.
 
 @item str @var{len}
-String of bytes of length @var{len}.
+Unibyte string (@pxref{Text Representations}) of length @var{len}.
+When packing, the first @var{len} bytes of the input string are copied
+to the packed output.  If the input string is shorter than @var{len},
+the remaining bytes will be null (zero) unless a pre-allocated string
+was provided to @code{bindat-pack}, in which case the remaining bytes
+are left unmodified.  When unpacking, any null bytes in the packed
+input string will appear in the unpacked output.
 
 @item strz &optional @var{len}
-Zero-terminated string of bytes, can be of arbitrary length or in a fixed-size
-field with length @var{len}.
+If @var{len} is not provided: Variable-length null-terminated unibyte
+string (@pxref{Text Representations}).  When packing, the entire input
+string is copied to the packed output followed by a null byte.  The
+length of the packed output is the length of the input string plus one
+(for the added null byte).  The input string must not contain any null
+bytes.  When unpacking, the resulting string contains all bytes up to
+(but excluding) the null byte.
+
+If @var{len} is provided: @code{strz} behaves the same as @code{str}
+with one difference: When unpacking, the first null byte encountered
+in the packed string and all subsequent bytes are excluded from the
+unpacked result.
+
+@quotation Caution
+The packed output will not be null-terminated unless one of the
+following is true:
+@itemize
+@item
+The input string is shorter than @var{len} and either no pre-allocated
+string was provided to @code{bindat-pack} or the appropriate byte in
+the pre-allocated string was already null.
+@item
+The input string contains a null byte within the first @var{len}
+bytes.
+@end itemize
+@end quotation
 
 @item vec @var{len} [@var{type}]
 Vector of @var{len} elements.  The type of the elements is given by
-- 
2.36.1
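
The strz behavior documented in the v3 text above can be summarized with a small model.  This is a Python sketch of the described semantics, not bindat itself: the function names are hypothetical, no pre-allocated string is modeled, and raising an error when a variable-length field has no null byte is this sketch's own assumption (the documentation text does not say what happens in that case).

```python
from typing import Optional

NULL = b"\x00"


def pack_strz(data: bytes, length: Optional[int] = None) -> bytes:
    """Model of packing `strz' / `strz LEN' (hypothetical helper)."""
    if length is None:
        # Variable length: whole input plus one terminating null byte.
        if NULL in data:
            raise ValueError("input must not contain null bytes")
        return data + NULL
    # Fixed length: same as `str LEN', i.e. truncate, then pad with nulls.
    return data[:length].ljust(length, NULL)


def unpack_str(packed: bytes, length: int) -> bytes:
    """`str LEN' keeps every byte of the field, null bytes included."""
    return packed[:length]


def unpack_strz(packed: bytes, length: Optional[int] = None) -> bytes:
    """Drop the first null byte and everything after it."""
    field = packed if length is None else packed[:length]
    end = field.find(NULL)
    if end >= 0:
        return field[:end]
    if length is None:
        # Assumption of this sketch only: treat a missing terminator
        # in a variable-length field as an error.
        raise ValueError("no null byte found")
    return field


print(pack_strz(b"ab"))
print(pack_strz(b"abcd", 4))
print(unpack_strz(pack_strz(b"ab", 4), 4))
```

In this model pack_strz(b"abcd", 4) yields a field with no null byte at all, which is the situation the Caution describes: an input of exactly LEN bytes without an embedded null leaves no room for a terminator.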


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* bug#55815: [PATCH] bindat: Improve str, strz documentation
  2022-06-08  4:16       ` Richard Hansen
@ 2022-06-09  7:30         ` Eli Zaretskii
  0 siblings, 0 replies; 7+ messages in thread
From: Eli Zaretskii @ 2022-06-09  7:30 UTC
  To: Richard Hansen; +Cc: 55815-done, monnier

> Date: Wed, 8 Jun 2022 00:16:51 -0400
> Cc: 55815@debbugs.gnu.org, monnier@iro.umontreal.ca
> From: Richard Hansen <rhansen@rhansen.org>
> 
> On 6/7/22 12:30, Eli Zaretskii wrote:
> >> Right now all three of those are unibyte, but in a future patch I 
> >> plan on changing the first to accept unibyte-convertible multibyte 
> >> input strings.
> > 
> > Not sure I understand: what do you mean by "unibyte-convertible 
> > multibyte input strings", and how do they differ from the other kinds?
> 
> I mean multibyte strings that do not contain characters that will cause string-to-unibyte to signal an error.

IOW, multibyte strings that contain only ASCII characters and
characters of the 'eight-bit' charset.

> > In any case, you say "unibyte input string" too many times, and that's 
> > unnecessary.
> 
> Done, see attached.

Thanks, installed.
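
The "unibyte-convertible" condition discussed above (only ASCII characters and characters of the 'eight-bit' charset) can be sketched as a check over code points.  This is a Python model, not Emacs code: the function names are hypothetical, a multibyte string is represented as a plain list of integer code points, and the raw-byte range #x3FFF80..#x3FFFFF is taken from the Emacs Lisp manual's description of text representations.

```python
from typing import Iterable

ASCII_MAX = 0x7F
# Code points Emacs uses for raw eight-bit bytes 0x80..0xFF.
RAW_BYTE_MIN = 0x3FFF80
RAW_BYTE_MAX = 0x3FFFFF


def is_unibyte_convertible(code_points: Iterable[int]) -> bool:
    """True if every character is ASCII or a raw eight-bit byte."""
    return all(
        0 <= cp <= ASCII_MAX or RAW_BYTE_MIN <= cp <= RAW_BYTE_MAX
        for cp in code_points
    )


def to_unibyte(code_points: Iterable[int]) -> bytes:
    """Model of the conversion; raises where the real one would signal."""
    out = bytearray()
    for cp in code_points:
        if 0 <= cp <= ASCII_MAX:
            out.append(cp)
        elif RAW_BYTE_MIN <= cp <= RAW_BYTE_MAX:
            # Map #x3FFF80..#x3FFFFF back to the bytes 0x80..0xFF.
            out.append(cp - 0x3FFF00)
        else:
            raise ValueError(f"cannot convert character {cp:#x} to unibyte")
    return bytes(out)


print(is_unibyte_convertible([0x61, 0x3FFF80]))  # ASCII plus a raw byte
print(is_unibyte_convertible([0xE9]))            # a non-ASCII character
```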






Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
