From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keith Wright
Newsgroups: gmane.lisp.guile.user
Subject: Re: Uploading Word documents, PDFs, PNG files etc
Date: Wed, 13 May 2009 23:47:06 -0400
Message-ID: <200905140347.n4E3l6LB003384@fcs13.keithdiane.us>
References: <87vdo7au56.fsf@ambire.localdomain> <87vdo5qc52.fsf@gnu.org> <7i0kzuog.fsf@vps203.linuxvps.org> <3ae3aa420905131223i3c7b83b0tf5a6ec9b200a8704@mail.gmail.com>
In-reply-to: <3ae3aa420905131223i3c7b83b0tf5a6ec9b200a8704@mail.gmail.com> (message from Linas Vepstas on Wed, 13 May 2009 14:23:21 -0500)
To: guile-user@gnu.org
List-Id: General Guile related discussions
Xref: news.gmane.org gmane.lisp.guile.user:7296

> From: Linas Vepstas
> Cc: guile-user@gnu.org
>
> 2009/5/13 Sebastian Tennant :
>
> > Restricting regexps to actual text is fine... until
> > you need to grep binary data, or, as in this case,
> > a combination of text and binary data.
>
> > in cgi.scm that extracted the uploaded (possibly
> > binary) file, because the pattern identifying the
> > beginning of the file in the raw data string is
> > simple ("\n\r\n\r") -
>
> No, this sounds somehow broken.
> If I remember correctly,
> binary MIME parts should have a Content-Length header
> so you can skip over them. If Content-Length is absent,
> then the part should be ASCII-encoded (e.g. base64).
> Yeah, grepping large blocks of ASCII sucks, which is
> why the Content-Length should be used.
>
> -- linas

If the spec says a length indication followed by a fixed
length of arbitrary binary data, then it is not just sucky,
but incorrect to apply either grep or regexp to the binary.
It will seem to work until it hits binary data that
"by accident" contains the string you are looking for.

The only correct algorithm is to make a preliminary pass
to somehow remove the binary data and pseudo-concatenate
the remaining strings.

   -- Keith
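
To make the failure mode concrete, here is a minimal sketch in Python (not Guile, and not the code in cgi.scm). It assumes a made-up framing in which each part is a block of text headers, a blank line, and then exactly Content-Length bytes of payload. The names split_parts and content_length and the sample data are hypothetical. The delimiter search only ever runs over header text; the payload is skipped by length and never scanned.

```python
DELIM = b"\r\n\r\n"  # blank line that ends a header block (assumed framing)


def content_length(header: bytes) -> int:
    """Return the value of the Content-Length line in a header block."""
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip())
    raise ValueError("no Content-Length header")


def split_parts(raw: bytes) -> list:
    """Split raw data into (header, payload) pairs without searching payloads."""
    parts = []
    pos = 0
    while pos < len(raw):
        # The search starts at the beginning of a header block, so the
        # first match is the end of that header, never payload bytes.
        end = raw.find(DELIM, pos)
        if end == -1:
            raise ValueError("truncated header")
        header = raw[pos:end]
        start = end + len(DELIM)
        length = content_length(header)
        payload = raw[start:start + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        parts.append((header, payload))
        pos = start + length  # skip the binary data by length
    return parts


# A payload that "by accident" contains the delimiter.
payload = b"\x00\x01\r\n\r\nName: fake\xff"
raw = (b"Name: a\r\nContent-Length: %d\r\n\r\n" % len(payload) + payload
       + b"Name: b\r\nContent-Length: 2\r\n\r\nhi")

parts = split_parts(raw)          # two parts, payloads intact
naive = raw.split(DELIM)          # pattern search cuts the first payload in two
```

With this sample data the length-based pass returns two parts with the first payload intact, while the naive split on the delimiter yields four pieces because it also matches inside the binary payload.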