all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Richard Hansen <rhansen@rhansen.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 55719@debbugs.gnu.org, emacs-devel@gnu.org
Subject: bug#55719: [PATCH] bindat strz fixes
Date: Wed, 1 Jun 2022 16:23:57 -0400	[thread overview]
Message-ID: <bad19553-4118-2e28-4fa1-e25125a03b37@rhansen.org> (raw)
In-Reply-To: <jwv4k144n8t.fsf-monnier+emacs@gnu.org>


[-- Attachment #1.1: Type: text/plain, Size: 4431 bytes --]

On 6/1/22 08:04, Stefan Monnier wrote:
>>    * The documentation says that the packed output is null terminated
>>      so that's what users expect.
>>    * It is safer (packed output is less likely to cause some other
>>      program to run past the end of the field).
>>    * Without this change, there is no difference between `strz N` and
>>      `str N`. So what would be the point of `strz N`?
>>    * If the user selected strz, the application probably requires null
>>      termination in all cases, not just when the string is under a
>>      certain length.
> 
> You're listing advantages of this choice, which we know about already.
> The other choice also has advantages.  These don't count as "particular
> reasons" (e.g. a real-life concrete *need* for it, rather than a mere
> preference).

I don't have a concrete need for it at the moment. I just noticed the bug while I was fixing the other bugs. I can leave that change out of the patch series if that is the only thing impeding merge.

> The particular reason to prefer the current behavior is
> backward compatibility (which we could call "inertia").

Anyone that needs the current behavior should be using `str N`, not `strz N`. If they're using `strz N`, then I would consider that to be a bug in their code. If this change breaks someone, they can fix it easily: just change `strz N` to `str N`. I understand that we should endeavor to maintain compatibility, but keeping the current behavior would be intentionally preserving a bug to accommodate other bugs. I don't think that's a good trade-off.

> Note also that `strz` without a length (or with a nil length) behaves
> the way you want.

No -- strz without a length results in variable length encoding. Without these changes, the only way to get a fixed length output with guaranteed null termination is to do `(substring str 0 (1- N))` before packing. (Or explicitly pack a separate zero byte immediately after the `str N-1` or `strz N-1` entry.)

> Of course, we could add an additional (optional) arg to `strz` to
> specify what should happen when unpacking a string with missing NUL byte
> as well as whether to truncate to N-1 chars rather than to N chars to
> make sure there is a terminating NUL byte.

I don't think that's necessary. I think most use cases are satisfied by the current str behavior and the strz behavior with these patches.

>>> How 'bout
>>>        (bindat-unpack spec "ab")
>>> ?
>>
>> I added some comments explaining why cases like that aren't tested.
> 
> The byte-string to unpack is not necessarily built from our own code
> (usually bindat is used to communicate with some other application), so
> whether our code can generate "ab" is not really relevant: the question
> still comes up about what we should do with "ab" (where valid answers
> could be to return "ab" or to return "a" or to signal an error, ...).
> Of course we can also decide it's "undefined".

Regardless of what other applications produce, packed strings like that are invalid according to the spec provided to `bindat-unpack`. Attempting to do something useful with such invalid strings assumes that Postel's Law is a good thing, which I don't agree with (especially with security-sensitive protocols). I think it is safest to signal an error, which it currently does.

A real example of why it might be undesirable to attempt to process a strz string without a null terminator:

   1. The sender streams multiple packed messages over a reliable, in-order protocol like TCP or pipes.
   2. The sending system breaks up the messages at arbitrary places to fit in packets/buffers, so the receiving side might see an incomplete message that will be completed by a future packet (or multiple future packets).
   3. The receiver attempts to extract a message from the bytes received so far. If the bytes do not form a valid message according to the bindat-unpack spec, then assume the message was truncated. Remember the bytes received so far and wait for the next packet.
   4. When the next packet arrives, concatenate its bytes with the remembered bytes and try again.

If bindat-unpack did not signal an error when expecting a null terminator, then the above protocol would not work without extra effort by the user.

Either way, changing the way `bindat-unpack` handles invalid strings feels out of scope for this particular bug report.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2022-06-01 20:23 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-30  4:11 bug#55719: 29.0.50; various bindat strz bugs Richard Hansen
2022-05-30 16:53 ` bug#55719: [PATCH] bindat strz fixes Richard Hansen
2022-05-31 11:08   ` Eli Zaretskii
2022-05-31 20:08     ` Richard Hansen
2022-05-31 11:08   ` Eli Zaretskii
2022-05-31 23:00   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-06-01  5:28     ` Richard Hansen
2022-06-01 12:04       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-06-01 20:23         ` Richard Hansen [this message]
2022-06-01 20:29           ` Richard Hansen
2022-06-02  2:52           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-06-05 19:30             ` Richard Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bad19553-4118-2e28-4fa1-e25125a03b37@rhansen.org \
    --to=rhansen@rhansen.org \
    --cc=55719@debbugs.gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=monnier@iro.umontreal.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.