unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Bindat can exhaust memory when unpacking to vector
@ 2023-10-22 19:36 Petteri Hintsanen
  2023-10-23 11:25 ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Petteri Hintsanen @ 2023-10-22 19:36 UTC (permalink / raw)
  To: emacs-devel

It appears that bindat-unpack can exhaust memory when it unpacks to a
vector.  Consider the following Bindat type expression:

  (require 'bindat)
  (defconst foo-bindat-spec
    (bindat-type
     (length uint 32)
     (data vec length)))

If you then, for example, eval this

  (bindat-unpack foo-bindat-spec [255 255 255 255 0 1 2 3 4 5])

it will result in (make-vector 4294967295 0), which makes Emacs hang.

Now it can be argued whether this is a problem or not.

If the data being unpacked cannot be trusted, then some kind of sanity
checking should be done to avoid things like above.  In my application
this is the case, so I embedded validation into the spec itself:

  (defconst foo-bindat-spec
    (bindat-type
     (length uint 32)
     (_ unit (when (> length 1234) (error "malicious input")))
     (data vec length)))

Which is somewhat messy, but it works.

I also played around with the idea of patching bindat.el itself to do
trivial checking against the input data size, like this:

diff --git a/lisp/emacs-lisp/bindat.el b/lisp/emacs-lisp/bindat.el
index 6f2af7f975b..c58f6c8f5c7 100644
--- a/lisp/emacs-lisp/bindat.el
+++ b/lisp/emacs-lisp/bindat.el
@@ -204,6 +204,9 @@ bindat--unpack-item
    ('str (bindat--unpack-str len))
    ('strz (bindat--unpack-strz len))
    ('vec
+    (when (> len (length bindat-raw))
+      (error "Vector length %d is greater than raw data length %d."
+             len (length bindat-raw)))
     (let ((v (make-vector len 0)) (vlen 1))
       (if (consp vectype)
 	  (setq vlen (nth 1 vectype)

but this feels way too intrusive.  Checking should be optional and
somehow programmable, perhaps a separate "checked vec" type?  (I don't
have any good, concrete ideas, sorry.)

Nonetheless, I think it would be prudent to mention this potential
pitfall in the Emacs Lisp manual so that users become aware of it.
I can write a texinfo patch if you agree.

[Side note: vec seems to be the only bindat type that suffers from this.
str (string) is another type that has a varying length, but it is
unpacked by `substring' which will implicitly guard against nonsensical
lengths.]

Thanks,
Petteri




^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2023-10-22 19:36 Bindat can exhaust memory when unpacking to vector Petteri Hintsanen
@ 2023-10-23 11:25 ` Eli Zaretskii
  2023-10-23 19:36   ` Petteri Hintsanen
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2023-10-23 11:25 UTC (permalink / raw)
  To: Petteri Hintsanen; +Cc: emacs-devel

> From: Petteri Hintsanen <petterih@iki.fi>
> Date: Sun, 22 Oct 2023 22:36:36 +0300
> 
> It appears that bindat-unpack can exhaust memory when it unpacks to a
> vector.  Consider the following Bindat type expression:
> 
>   (require 'bindat)
>   (defconst foo-bindat-spec
>     (bindat-type
>      (length uint 32)
>      (data vec length)))
> 
> If you then, for example, eval this
> 
>   (bindat-unpack foo-bindat-spec [255 255 255 255 0 1 2 3 4 5])
> 
> it will result in (make-vector 4294967295 0), which makes Emacs hang.

Hang or just slow down tremendously due to paging?  How much time did
you wait before deciding that Emacs hung?  And how much memory do you
have on that machine?

The above call to make-vector is supposed to signal a memory-full
error if the number of elements is more than the system can support,
but it cannot be smart enough to consider the amount of available VM.

> Now it can be argued whether this is a problem or not.

Can you argue why this should be considered a problem for Emacs to
solve, and not a cockpit error on the part of the Lisp program that
makes such calls?

> Nonetheless, I think it would be prudent to mention this potential
> pitfall in the Emacs Lisp manual so that users become aware of it.
> I can write a texinfo patch if you agree.

What would we say? that unpacking vectors larger than available memory
would cause Emacs to run out of memory?

> [Side note: vec seems to be the only bindat type that suffers from this.
> str (string) is another type that has a varying length, but it is
> unpacked by `substring' which will implicitly guard against nonsensical
> lengths.]

See above: make-vector is also protected.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2023-10-23 11:25 ` Eli Zaretskii
@ 2023-10-23 19:36   ` Petteri Hintsanen
  2023-10-24  4:34     ` tomas
  2023-11-04  7:48     ` Eli Zaretskii
  0 siblings, 2 replies; 10+ messages in thread
From: Petteri Hintsanen @ 2023-10-23 19:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Hang or just slow down tremendously due to paging?  How much time did
> you wait before deciding that Emacs hung?  And how much memory do you
> have on that machine?

Sorry, I was careless with wording; obviously what I meant was "slowed
down due to paging."  I did not wait for long, maybe a minute or so.
This machine has ample memory, 16 GB RAM plus another 16 GB swap space.

> Can you argue why this should be considered a problem for Emacs to
> solve, and not a cockpit error on the part of the Lisp program that
> makes such calls?

No, not really.  Now when I think about it, it is fairly obvious that
the programmer should realize this when using such low level mechanism
as Bindat.

>> Nonetheless, I think it would be prudent to mention this potential
>> pitfall in the Emacs Lisp manual so that users become aware of it.
>> I can write a texinfo patch if you agree.
>
> What would we say? that unpacking vectors larger than available memory
> would cause Emacs to run out of memory?

I was thinking of some kind of note about validation, e.g.:

  Notice that unpacking data to a vector of length LEN will
  unconditionally preallocate memory with at least LEN bytes.  If LEN is
  calculated from the input, it is a good idea to validate its value
  before unpacking.

But, again, users of Bindat can be expected to know that tainted data
should be sanitized.  And what is described above is an implementation
detail.  So it does not make much sense to mention this in the manual.

Sorry for the noise.
Petteri



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2023-10-23 19:36   ` Petteri Hintsanen
@ 2023-10-24  4:34     ` tomas
  2023-11-12 19:11       ` Petteri Hintsanen
  2023-11-04  7:48     ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: tomas @ 2023-10-24  4:34 UTC (permalink / raw)
  To: Petteri Hintsanen; +Cc: Eli Zaretskii, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 702 bytes --]

On Mon, Oct 23, 2023 at 10:36:01PM +0300, Petteri Hintsanen wrote:
> Eli Zaretskii <eliz@gnu.org> writes:

[...]

> > Can you argue why this should be considered a problem for Emacs to
> > solve, and not a cockpit error on the part of the Lisp program that
> > makes such calls?
> 
> No, not really.  Now when I think about it, it is fairly obvious that
> the programmer should realize this when using such low level mechanism
> as Bindat.

OTOH, a maximum length of something might be thought as a (n optional)
part of a type description. Types are predicates.

As could be a set of bounds for a numerical type. Perhaps there's an
extension opportunity for bindat.

Cheers
-- 
t

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2023-10-23 19:36   ` Petteri Hintsanen
  2023-10-24  4:34     ` tomas
@ 2023-11-04  7:48     ` Eli Zaretskii
  2023-11-04  8:21       ` Stefan Monnier
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2023-11-04  7:48 UTC (permalink / raw)
  To: Petteri Hintsanen, Stefan Monnier; +Cc: emacs-devel

> X-Spam-Status: No, score=0.653 tagged_above=-9999 required=6
> 	tests=[SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.652] autolearn=disabled
> From: Petteri Hintsanen <petterih@iki.fi>
> Cc: emacs-devel@gnu.org
> Date: Mon, 23 Oct 2023 22:36:01 +0300
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Hang or just slow down tremendously due to paging?  How much time did
> > you wait before deciding that Emacs hung?  And how much memory do you
> > have on that machine?
> 
> Sorry, I was careless with wording; obviously what I meant was "slowed
> down due to paging."  I did not wait for long, maybe a minute or so.
> This machine has ample memory, 16 GB RAM plus another 16 GB swap space.
> 
> > Can you argue why this should be considered a problem for Emacs to
> > solve, and not a cockpit error on the part of the Lisp program that
> > makes such calls?
> 
> No, not really.  Now when I think about it, it is fairly obvious that
> the programmer should realize this when using such low level mechanism
> as Bindat.
> 
> >> Nonetheless, I think it would be prudent to mention this potential
> >> pitfall in the Emacs Lisp manual so that users become aware of it.
> >> I can write a texinfo patch if you agree.
> >
> > What would we say? that unpacking vectors larger than available memory
> > would cause Emacs to run out of memory?
> 
> I was thinking of some kind of note about validation, e.g.:
> 
>   Notice that unpacking data to a vector of length LEN will
>   unconditionally preallocate memory with at least LEN bytes.  If LEN is
>   calculated from the input, it is a good idea to validate its value
>   before unpacking.
> 
> But, again, users of Bindat can be expected to know that tainted data
> should be sanitized.  And what is described above is an implementation
> detail.  So it does not make much sense to mention this in the manual.

Stefan, any comments?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2023-11-04  7:48     ` Eli Zaretskii
@ 2023-11-04  8:21       ` Stefan Monnier
  2024-03-10 21:58         ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2023-11-04  8:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Petteri Hintsanen, emacs-devel

>> But, again, users of Bindat can be expected to know that tainted data
>> should be sanitized.  And what is described above is an implementation
>> detail.  So it does not make much sense to mention this in the manual.
>
> Stefan, any comments?

Hmm... I'm also tempted to say it's the programmer's fault, yet at the
same time, it's BinDat which does the parsing so the programmer doesn't
really have much opportunity to sanitize the data beforehand (short of
doing a manual pre-parse which kind of defeats the purpose).

With `bindat-type`, it's not too hard to insert ELisp code within the
BinDat parsing, so it *can* be solved from there, but it's indeed not
something that's been seriously considered so far.

Kind of an embarrassing blind spot.  BinDat should instead encourage the
addition of sanity checks within the parsing code (including maybe add
specific support for it).


        Stefan




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2023-10-24  4:34     ` tomas
@ 2023-11-12 19:11       ` Petteri Hintsanen
  0 siblings, 0 replies; 10+ messages in thread
From: Petteri Hintsanen @ 2023-11-12 19:11 UTC (permalink / raw)
  To: tomas; +Cc: Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Kind of an embarrassing blind spot.  BinDat should instead encourage the
> addition of sanity checks within the parsing code (including maybe add
> specific support for it).

This was also something I had in mind, along the lines of what Tomas
wrote below:

<tomas@tuxteam.de> writes:

> OTOH, a maximum length of something might be thought as a (n optional)
> part of a type description. Types are predicates.
>
> As could be a set of bounds for a numerical type. Perhaps there's an
> extension opportunity for bindat.

So yes, I also agree that this could be an extension opportunity, but I
wouldn't push it further until we have concrete use cases.  It is
usually futile to try to generalize from zero examples.  (Well, I do
have one example, but still.)

For the time being using inline checks within the specification suffices
for me.

Thanks,
Petteri



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2023-11-04  8:21       ` Stefan Monnier
@ 2024-03-10 21:58         ` Stefan Monnier
  2024-03-19 20:09           ` Petteri Hintsanen
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier @ 2024-03-10 21:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Petteri Hintsanen, emacs-devel

[ Somehow I can't find my previous answer in the `emacs-devel`
  archive.  In any case, here's an updated (and late) answer :-)  ]

> If you then, for example, eval this
> 
>   (bindat-unpack foo-bindat-spec [255 255 255 255 0 1 2 3 4 5])
> 
> it will result in (make-vector 4294967295 0), which makes Emacs hang.

Yup.  Validation is needed.  And

> If the data being unpacked cannot be trusted, then some kind of sanity
> checking should be done to avoid things like above.  In my application
> this is the case, so I embedded validation into the spec itself:
> 
>   (defconst foo-bindat-spec
>     (bindat-type
>      (length uint 32)
>      (_ unit (when (> length 1234) (error "malicious input")))
>      (data vec length)))
> 
> Which is somewhat messy, but it works.

You should also be able to do

    (defconst foo-bindat-spec
      (bindat-type
       (length uint 32)
       (data vec (if (< length 1234) length (error "malicious input")))))

as well.  But it's still not very elegant.  We could provide helper
functions so you can write things like

    (defconst foo-bindat-spec
      (bindat-type
       (length uint 32)
       (data vec (bounded 1 length 1234))))

so we signal an error if the length is less than 1 or larger than 1234.

> I also played around with the idea of patching bindat.el itself to do
> trivial checking against the input data size, like this:
> 
> diff --git a/lisp/emacs-lisp/bindat.el b/lisp/emacs-lisp/bindat.el
> index 6f2af7f975b..c58f6c8f5c7 100644
> --- a/lisp/emacs-lisp/bindat.el
> +++ b/lisp/emacs-lisp/bindat.el
> @@ -204,6 +204,9 @@ bindat--unpack-item
>     ('str (bindat--unpack-str len))
>     ('strz (bindat--unpack-strz len))
>     ('vec
> +    (when (> len (length bindat-raw))
> +      (error "Vector length %d is greater than raw data length %d."
> +             len (length bindat-raw)))
>      (let ((v (make-vector len 0)) (vlen 1))
>        (if (consp vectype)
>           (setq vlen (nth 1 vectype)
> 
> but this feels way too intrusive.

Actually, this is a nice solution, I think.

It could conceivably be too conservative if someone manages to use `vec`
with a type that occupies less than one byte (say the input bytes is
a bitvector, or a compressed stream, and the vector output is the result of
uncompressing), but currently I can't see how that could be implemented
with `vec`.  I think it can be implemented with a recursive type, like I do in
`bindat-test--LEB128`, but doing it with `vec` would be very cumbersome.

It seems hypothetical enough that I think we should go with your patch.

> Checking should be optional and somehow programmable, perhaps
> a separate "checked vec" type?  (I don't have any good, concrete
> ideas, sorry.)

I don't see the benefit of not-checking, to be honest.


        Stefan




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2024-03-10 21:58         ` Stefan Monnier
@ 2024-03-19 20:09           ` Petteri Hintsanen
  2024-03-19 21:08             ` Stefan Monnier
  0 siblings, 1 reply; 10+ messages in thread
From: Petteri Hintsanen @ 2024-03-19 20:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>   (defconst foo-bindat-spec
>>     (bindat-type
>>      (length uint 32)
>>      (_ unit (when (> length 1234) (error "malicious input")))
>>      (data vec length)))
>> 
>> Which is somewhat messy, but it works.
>
> You should also be able to do
>
>     (defconst foo-bindat-spec
>       (bindat-type
>        (length uint 32)
>        (data vec (if (< length 1234) length (error "malicious input")))))
>
> as well.  But it's still not very elegant.

Ah sure, right.  This is nice I think.

>     (defconst foo-bindat-spec
>       (bindat-type
>        (length uint 32)
>        (data vec (bounded 1 length 1234))))
>
> so we signal an error if the length is less than 1 or larger than 1234.

Sure.  But I'm not sure which kind of helpers would be the most
beneficial for general use.  Like said, I don't have that many examples,
and for my few validation cases I have found the "unit type" to be good
enough.

>> I also played around with the idea of patching bindat.el itself to do
>> trivial checking against the input data size, like this:
>> [...]
> Actually, this is a nice solution, I think.
> It seems hypothetical enough that I think we should go with your patch.

Feel free to apply it in whatever form you want, if you think it is
appropriate.

>> Checking should be optional and somehow programmable, perhaps
>> a separate "checked vec" type?  (I don't have any good, concrete
>> ideas, sorry.)
>
> I don't see the benefit of not-checking, to be honest.

I probably thought about efficiency here.  If bindat-unpack is hammered
many many many times and the checking could be _safely_ ignored, then,
perhaps, there might be a noticeable difference in performance.  Or
maybe not.

Thanks,
Petteri



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Bindat can exhaust memory when unpacking to vector
  2024-03-19 20:09           ` Petteri Hintsanen
@ 2024-03-19 21:08             ` Stefan Monnier
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan Monnier @ 2024-03-19 21:08 UTC (permalink / raw)
  To: Petteri Hintsanen; +Cc: Eli Zaretskii, emacs-devel

>>     (defconst foo-bindat-spec
>>       (bindat-type
>>        (length uint 32)
>>        (data vec (bounded 1 length 1234))))
>>
>> so we signal an error if the length is less than 1 or larger than 1234.
>
> Sure.  But I'm not sure which kind of helpers would be the most
> beneficial for general use.  Like said, I don't have that many examples,
> and for my few validation cases I have found the "unit type" to be good
> enough.

Same here.

>>> I also played around with the idea of patching bindat.el itself to do
>>> trivial checking against the input data size, like this:
>>> [...]
>> Actually, this is a nice solution, I think.
>> It seems hypothetical enough that I think we should go with your patch.
> Feel free to apply it in whatever form you want, if you think it is
> appropriate.

I did, thank you :-)

>>> Checking should be optional and somehow programmable, perhaps
>>> a separate "checked vec" type?  (I don't have any good, concrete
>>> ideas, sorry).
>> I don't see the benefit of not-checking, to be honest.
> I probably thought about efficiency here.

In theory it can have an impact, but I think in practice it's hard to
imagine a case where it will be significant.


        Stefan




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-03-19 21:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-22 19:36 Bindat can exhaust memory when unpacking to vector Petteri Hintsanen
2023-10-23 11:25 ` Eli Zaretskii
2023-10-23 19:36   ` Petteri Hintsanen
2023-10-24  4:34     ` tomas
2023-11-12 19:11       ` Petteri Hintsanen
2023-11-04  7:48     ` Eli Zaretskii
2023-11-04  8:21       ` Stefan Monnier
2024-03-10 21:58         ` Stefan Monnier
2024-03-19 20:09           ` Petteri Hintsanen
2024-03-19 21:08             ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).