On 6/1/22 08:04, Stefan Monnier wrote:
>>    * The documentation says that the packed output is null terminated
>>      so that's what users expect.
>>    * It is safer (packed output is less likely to cause some other
>>      program to run past the end of the field).
>>    * Without this change, there is no difference between `strz N` and
>>      `str N`. So what would be the point of `strz N`?
>>    * If the user selected strz, the application probably requires null
>>      termination in all cases, not just when the string is under a
>>      certain length.
> 
> You're listing advantages of this choice, which we know about already.
> The other choice also has advantages.  These don't count as "particular
> reasons" (e.g. a real-life concrete *need* for it, rather than a mere
> preference).

I don't have a concrete need for it at the moment. I just noticed the bug while I was fixing the other bugs. I can leave that change out of the patch series if that is the only thing impeding merge.

> The particular reason to prefer the current behavior is
> backward compatibility (which we could call "inertia").

Anyone that needs the current behavior should be using `str N`, not `strz N`. If they're using `strz N`, then I would consider that to be a bug in their code. If this change breaks someone, they can fix it easily: just change `strz N` to `str N`. I understand that we should endeavor to maintain compatibility, but keeping the current behavior would be intentionally preserving a bug to accommodate other bugs. I don't think that's a good trade-off.

> Note also that `strz` without a length (or with a nil length) behaves
> the way you want.

No -- strz without a length results in variable length encoding. Without these changes, the only way to get a fixed length output with guaranteed null termination is to do `(substring str 0 (1- N))` before packing. (Or explicitly pack a separate zero byte immediately after the `str N-1` or `strz N-1` entry.)

> Of course, we could add an additional (optional) arg to `strz` to
> specify what should happen when unpacking a string with missing NUL byte
> as well as whether to truncate to N-1 chars rather than to N chars to
> make sure there is a terminating NUL byte.

I don't think that's necessary. I think most use cases are satisfied by the current str behavior and the strz behavior with these patches.

>>> How 'bout
>>>        (bindat-unpack spec "ab")
>>> ?
>>
>> I added some comments explaining why cases like that aren't tested.
> 
> The byte-string to unpack is not necessarily built from our own code
> (usually bindat is used to communicate with some other application), so
> whether our code can generate "ab" is not really relevant: the question
> still comes up about what we should do with "ab" (where valid answers
> could be to return "ab" or to return "a" or to signal an error, ...).
> Of course we can also decide it's "undefined".

Regardless of what other applications produce, packed strings like that are invalid according to the spec provided to `bindat-unpack`. Attempting to do something useful with such invalid strings assumes that Postel's Law is a good thing, which I don't agree with (especially with security-sensitive protocols). I think it is safest to signal an error, which it currently does.

A real example of why it might be undesirable to attempt to process a strz string without a null terminator:

   1. The sender streams multiple packed messages over a reliable, in-order protocol like TCP or pipes.
   2. The sending system breaks up the messages at arbitrary places to fit in packets/buffers, so the receiving side might see an incomplete message that will be completed by a future packet (or multiple future packets).
   3. The receiver attempts to extract a message from the bytes received so far. If the bytes do not form a valid message according to the bindat-unpack spec, then assume the message was truncated. Remember the bytes received so far and wait for the next packet.
   4. When the next packet arrives, concatenate its bytes with the remembered bytes and try again.

If bindat-unpack did not signal an error when expecting a null terminator, then the above protocol would not work without extra effort by the user.

Either way, changing the way `bindat-unpack` handles invalid strings feels out of scope for this particular bug report.