From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: Bindat can exhaust memory when unpacking to vector Date: Sun, 10 Mar 2024 17:58:14 -0400 Message-ID: References: <87pm16fnzv.fsf@iki.fi> <83zg09oa1v.fsf@gnu.org> <87a5s9m8ri.fsf@iki.fi> <83pm0q557b.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="1569"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Cc: Petteri Hintsanen , emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Mar 10 22:59:10 2024 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1rjRCH-0000AZ-Kt for ged-emacs-devel@m.gmane-mx.org; Sun, 10 Mar 2024 22:59:09 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rjRBW-000307-KL; Sun, 10 Mar 2024 17:58:22 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rjRBV-0002zj-6H for emacs-devel@gnu.org; Sun, 10 Mar 2024 17:58:21 -0400 Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rjRBT-0002Vi-0Z; Sun, 10 Mar 2024 17:58:20 -0400 Original-Received: from pmg2.iro.umontreal.ca (localhost.localdomain [127.0.0.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id B987D80DB3; Sun, 10 Mar 2024 17:58:16 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1710107895; bh=UJhicdd7IfmzVtXdkgrsMvRezy3SXcFWJVYMZoA1b4I=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=mxcBQfcIxZsw27ep37gIquOmkgLoz3AoR8M2/Xfih87kXHBpdNLHjz6slUEdIIVAh +LSn0DRxkFeP7Gy9JTyefrTeWdmQWhBZ0WhJ55n/6lOU4H1F52briOa2kbzDj+iGQx Bs2VHcvBV5y/FC1vC902PNj9bd3tE9ycDj5EFVucgtxMn/GvBEDSao1Q118gdwXH+0 ByeQ9NISfrTxOW9BIG6vkHnT1umqpQHCgHnXv6c41bgPxuDAztrM7NrCnN/EXZxq/5 1d7SELiYU8nuvvNYKuv2v4mh2Wnf1FYYii6CtiBX96Z1Lp+K5VPzazgr4J58YziR1E mPQFAghf+dXlQ== Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id 84C9A8062A; Sun, 10 Mar 2024 17:58:15 -0400 (EDT) Original-Received: from alfajor (69-165-147-56.dsl.teksavvy.com [69.165.147.56]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 54A3312056B; Sun, 10 Mar 2024 17:58:15 -0400 (EDT) In-Reply-To: (Stefan Monnier's message of "Sat, 04 Nov 2023 04:21:25 -0400") Received-SPF: pass client-ip=132.204.25.50; envelope-from=monnier@iro.umontreal.ca; helo=mailscanner.iro.umontreal.ca X-Spam_score_int: -42 X-Spam_score: -4.3 X-Spam_bar: ---- X-Spam_report: (-4.3 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:316990 Archived-At: [ Somehow I can't find my previous answer in the `emacs-devel` archive. In any case, here's an updated (and late) answer :-) ] > If you then, for example, eval this > > (bindat-unpack foo-bindat-spec [255 255 255 255 0 1 2 3 4 5]) > > it will result in (make-vector 4294967295 0), which makes Emacs hang. Yup. Validation is needed. And > If the data being unpacked cannot be trusted, then some kind of sanity > checking should be done to avoid things like above. In my application > this is the case, so I embedded validation into the spec itself: > > (defconst foo-bindat-spec > (bindat-type > (length uint 32) > (_ unit (when (> length 1234) (error "malicious input"))) > (data vec length))) > > Which is somewhat messy, but it works. You should also be able to do (defconst foo-bindat-spec (bindat-type (length uint 32) (data vec (if (< length 1234) length (error "malicious input"))))) as well. But it's still not very elegant. We could provide helper functions so you can write things like (defconst foo-bindat-spec (bindat-type (length uint 32) (data vec (bounded 1 length 1234)))) so we signal an error if the length is less than 1 or larger than 1234. > I also played around with the idea of patching bindat.el itself to do > trivial checking against the input data size, like this: > > diff --git a/lisp/emacs-lisp/bindat.el b/lisp/emacs-lisp/bindat.el > index 6f2af7f975b..c58f6c8f5c7 100644 > --- a/lisp/emacs-lisp/bindat.el > +++ b/lisp/emacs-lisp/bindat.el > @@ -204,6 +204,9 @@ bindat--unpack-item > ('str (bindat--unpack-str len)) > ('strz (bindat--unpack-strz len)) > ('vec > + (when (> len (length bindat-raw)) > + (error "Vector length %d is greater than raw data length %d." > + len (length bindat-raw))) > (let ((v (make-vector len 0)) (vlen 1)) > (if (consp vectype) > (setq vlen (nth 1 vectype) > > but this feels way too intrusive. Actually, this is a nice solution, I think. It could conceivably be too conservative if someone manages to use `vec` with a type that occupies less than one byte (say the input bytes is a bitvector, or a compressed stream, and the vector output is the result of uncompressing), but currently I can't see how that could be implemented with `vec`. I think it can be implemented with a recursive type, like I do in `bindat-test--LEB128`, but doing it with `vec` would be very cumbersome. It seems hypothetical enough that I think we should go with your patch. > Checking should be optional and somehow programmable, perhaps > a separate "checked vec" type? (I don't have any good, concrete > ideas, sorry.) I don't see the benefit of not-checking, to be honest. Stefan