From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) Newsgroups: gmane.lisp.guile.devel Subject: Making custom binary input ports unbuffered Date: Wed, 15 Jan 2014 00:00:04 +0100 Message-ID: <878uuiyxsb.fsf@gnu.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: ger.gmane.org 1389740610 29495 80.91.229.3 (14 Jan 2014 23:03:30 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Tue, 14 Jan 2014 23:03:30 +0000 (UTC) Cc: Andy Wingo , Mark H Weaver To: guile-devel Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Wed Jan 15 00:03:24 2014 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1W3D11-0003Pe-Ki for guile-devel@m.gmane.org; Wed, 15 Jan 2014 00:03:23 +0100 Original-Received: from localhost ([::1]:51129 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W3D11-0001pB-8X for guile-devel@m.gmane.org; Tue, 14 Jan 2014 18:03:23 -0500 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:50105) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W3D0s-0001ov-27 for guile-devel@gnu.org; Tue, 14 Jan 2014 18:03:19 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1W3D0m-0002WJ-6z for guile-devel@gnu.org; Tue, 14 Jan 2014 18:03:14 -0500 Original-Received: from hera.aquilenet.fr ([141.255.128.1]:52601) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W3D0l-0002Vk-PT for guile-devel@gnu.org; Tue, 14 Jan 2014 18:03:08 -0500 Original-Received: from localhost (localhost [127.0.0.1]) by hera.aquilenet.fr (Postfix) with ESMTP id 983DF1B3E; Wed, 15 Jan 2014 00:00:05 +0100 (CET) Original-Received: from hera.aquilenet.fr ([127.0.0.1]) by localhost (hera.aquilenet.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EEscshyGFGrh; Wed, 15 Jan 2014 00:00:05 +0100 (CET) Original-Received: from pluto (reverse-83.fdn.fr [80.67.176.83]) by hera.aquilenet.fr (Postfix) with ESMTPSA id C812D1B34; Wed, 15 Jan 2014 00:00:04 +0100 (CET) X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 25 =?utf-8?Q?Niv=C3=B4se?= an 222 de la =?utf-8?Q?R?= =?utf-8?Q?=C3=A9volution?= X-PGP-Key-ID: 0xEA52ECF4 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 83C4 F8E5 10A3 3B4C 5BEA D15D 77DD 95E2 EA52 ECF4 X-OS: x86_64-unknown-linux-gnu User-Agent: Gnus/5.130007 (Ma Gnus v0.7) Emacs/24.3 (gnu/linux) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x [fuzzy] X-Received-From: 141.255.128.1 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:16802 Archived-At: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hi! As discussed on IRC, our R6 custom binary input ports (CBIPs) are currently buffered. The buffer is hard-coded and setvbuf doesn=E2=80=99t w= ork on non-file ports. Having a buffer can be problematic for several reasons. 1. The user=E2=80=99s =E2=80=98get-position=E2=80=99 will always point pa= st what the port=E2=80=99s user sees. This could be worked around by subtracting read_pos to what =E2=80=98get-position=E2=80=99 returns, but that feels wrong: conceptu= ally, the port=E2=80=99s position is something under the CBIP implementor=E2=80= =99s control. (I wonder how fopencookie/fseek deal with this.) 2. Some applications want no buffering. My use case was that I read from a byte stream, and at some point I want to compute a hash over a delimited part of that stream. To do that, I intuitively wanted to have a CBIP that wraps some other input port, and do the hash computation in that CBIP. But that only works if we can guarantee that the CBIP doesn=E2=80=99t read more= than what was actually asked. Same for the delimited port in (web response). The patch below makes CBIPs unbuffered (see the tests for the properties it gives.) It works thanks to the optimization in =E2=80=98scm_c_read=E2= =80=99 for unbuffered binary ports. This is going to be a performance hit for applications that read things byte by byte, *or* via textual procedures (=E2=80=98scm_getc=E2=80=99, =E2= =80=98get-string=E2=80=99, etc.) But the assumption is that people rather use =E2=80=98get-bytevector= -n=E2=80=99 (or similar) to get a chunk of data. However! There are places in (web ...) where CBIPs are used for mixed binary/textual input. When that happens, all the accesses end up being unbuffered, which really sucks. So, what do we do? :-) Thanks, Ludo=E2=80=99. --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-Make-custom-binary-input-ports-unbuffered.patch Content-Description: the patch >From ef60588d6e76d6ad0ae09197043c6d7371beb1b7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= Date: Tue, 14 Jan 2014 21:54:51 +0100 Subject: [PATCH] Make custom binary input ports unbuffered. * libguile/r6rs-ports.c (make_cbip): Leave read_{pos,end,buf_size} unchanged and call 'scm_port_non_buffer'. (cbip_fill_input): Change to use PORT's associated bytevector as an intermediate copy passed to READ_PROC and then copied back into PORT's own buffer. Reallocate a new bytevector when it's smaller than the current 'read_buf_size'. Don't loop back to 'again' label and remove it. * test-suite/tests/r6rs-ports.test ("7.2.7 Input Ports")["custom binary input & 'port-position'", "custom binary input & 'read!' calls"]: New tests. --- libguile/r6rs-ports.c | 77 ++++++++++++++++++++++++++-------------- test-suite/tests/r6rs-ports.test | 60 +++++++++++++++++++++++++++++++ 2 files changed, 110 insertions(+), 27 deletions(-) diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c index 790c24c..f2a654e 100644 --- a/libguile/r6rs-ports.c +++ b/libguile/r6rs-ports.c @@ -280,13 +280,17 @@ cbp_close (SCM port) static scm_t_bits custom_binary_input_port_type = 0; -/* Size of the buffer embedded in custom binary input ports. */ +/* Initial size of the buffer embedded in custom binary input ports. */ #define CBIP_BUFFER_SIZE 4096 /* Return the bytevector associated with PORT. */ #define SCM_CBIP_BYTEVECTOR(_port) \ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 4) +/* Set BV as the bytevector associated with PORT. */ +#define SCM_SET_CBIP_BYTEVECTOR(_port, _bv) \ + SCM_SIMPLE_VECTOR_SET (SCM_PACK (SCM_STREAM (_port)), 4, (_bv)) + /* Return the various procedures of PORT. */ #define SCM_CBIP_READ_PROC(_port) \ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 0) @@ -297,15 +301,11 @@ make_cbip (SCM read_proc, SCM get_position_proc, SCM set_position_proc, SCM close_proc) { SCM port, bv, method_vector; - char *c_bv; - unsigned c_len; scm_t_port *c_port; const unsigned long mode_bits = SCM_OPN | SCM_RDNG; - /* Use a bytevector as the underlying buffer. */ - c_len = CBIP_BUFFER_SIZE; - bv = scm_c_make_bytevector (c_len); - c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv); + /* Pre-allocate a bytevector to be passed to the 'read!' method. */ + bv = scm_c_make_bytevector (CBIP_BUFFER_SIZE); /* Store the various methods and bytevector in a vector. */ method_vector = scm_c_make_vector (5, SCM_BOOL_F); @@ -326,10 +326,13 @@ make_cbip (SCM read_proc, SCM get_position_proc, /* Attach it the method vector. */ SCM_SETSTREAM (port, SCM_UNPACK (method_vector)); - /* Have the port directly access the buffer (bytevector). */ - c_port->read_pos = c_port->read_buf = (unsigned char *) c_bv; - c_port->read_end = (unsigned char *) c_bv; - c_port->read_buf_size = c_len; + /* Make it unbuffered. This is necessary to guarantee that (1) users can + actually implement GET_POSITION_PROC correctly, and that (2) each + 'get-bytevector-*' call has exactly one corresponding READ_PROC call. + The latter is necessary in some applications, typically when wrapping + another port where we don't want to consume more than what was + actually asked for. */ + scm_port_non_buffer (c_port); /* Mark PORT as open, readable and unbuffered (hmm, how elegant...). */ SCM_SET_CELL_TYPE (port, custom_binary_input_port_type | mode_bits); @@ -346,34 +349,54 @@ cbip_fill_input (SCM port) int result; scm_t_port *c_port = SCM_PTAB_ENTRY (port); - again: if (c_port->read_pos >= c_port->read_end) { /* Invoke the user's `read!' procedure. */ - unsigned c_octets; + size_t c_octets, c_requested; SCM bv, read_proc, octets; - /* Use the bytevector associated with PORT as the buffer passed to the + read_proc = SCM_CBIP_READ_PROC (port); + + /* Attempt to pass the bytevector associated with PORT to the `read!' procedure, thereby avoiding additional allocations. */ bv = SCM_CBIP_BYTEVECTOR (port); - read_proc = SCM_CBIP_READ_PROC (port); - /* The assumption here is that C_PORT's internal buffer wasn't changed - behind our back. */ - assert (c_port->read_buf == - (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv)); - assert ((unsigned) c_port->read_buf_size - == SCM_BYTEVECTOR_LENGTH (bv)); + /* When called via the 'get-bytevector-*' procedures, and thus via + 'scm_c_read', we are passed the caller-provided buffer, so we need + to check its size. */ + c_requested = c_port->read_buf_size; + + if (SCM_BYTEVECTOR_LENGTH (bv) < c_requested) + { + /* Bad luck: we have to make another allocation. Save that + bytevector for later reuse, in the hope that the application + has regular access patterns. */ + bv = scm_c_make_bytevector (c_requested); + SCM_SET_CBIP_BYTEVECTOR (port, bv); + } + + /* READ_PROC must always be called with a strictly positive number of + bytes to read; otherwise it is forced to return 0, which is used + to indicate EOF. */ + if (SCM_LIKELY (c_requested > 0)) + { + octets = scm_call_3 (read_proc, bv, SCM_INUM0, + scm_from_size_t (c_requested)); + c_octets = scm_to_size_t (octets); + } + else + c_octets = 0; - octets = scm_call_3 (read_proc, bv, SCM_INUM0, - SCM_I_MAKINUM (CBIP_BUFFER_SIZE)); - c_octets = scm_to_uint (octets); + if (SCM_UNLIKELY (c_octets > c_requested)) + scm_out_of_range (FUNC_NAME, octets); - c_port->read_pos = (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv); + /* Copy the data back to the original buffer. */ + memcpy ((char *) c_port->read_pos, SCM_BYTEVECTOR_CONTENTS (bv), + c_octets); c_port->read_end = (unsigned char *) c_port->read_pos + c_octets; - if (c_octets > 0) - goto again; + if (c_octets != 0 || c_requested == 0) + result = (int) *c_port->read_pos; else result = EOF; } diff --git a/test-suite/tests/r6rs-ports.test b/test-suite/tests/r6rs-ports.test index eaae29f..41b46b1 100644 --- a/test-suite/tests/r6rs-ports.test +++ b/test-suite/tests/r6rs-ports.test @@ -447,6 +447,66 @@ not `set-port-position!'" (u8-list->bytevector (map char->integer (string->list "Port!"))))))) + (pass-if-equal "custom binary input & 'port-position'" + '(0 2 5 11) + ;; Check that the value returned by 'port-position' is correct, and + ;; that each 'port-position' call leads one call to the + ;; 'get-position' method. + (let* ((str "Hello Port!") + (output (make-bytevector (string-length str))) + (source (with-fluids ((%default-port-encoding "UTF-8")) + (open-string-input-port str))) + (read! (lambda (bv start count) + (let ((r (get-bytevector-n! source bv start count))) + (if (eof-object? r) + 0 + r)))) + (pos '()) + (get-pos (lambda () + (let ((p (port-position source))) + (set! pos (cons p pos)) + p))) + (port (make-custom-binary-input-port "the port" read! + get-pos #f #f))) + + (and (= 0 (port-position port)) + (begin + (get-bytevector-n! port output 0 2) + (= 2 (port-position port))) + (begin + (get-bytevector-n! port output 2 3) + (= 5 (port-position port))) + (let ((bv (string->utf8 (get-string-all port)))) + (bytevector-copy! bv 0 output 5 (bytevector-length bv)) + (= (string-length str) (port-position port))) + (bytevector=? output (string->utf8 str)) + (reverse pos)))) + + (pass-if-equal "custom binary input & 'read!' calls" + `((2 "He") (3 "llo") (42 " Port!")) + (let* ((str "Hello Port!") + (source (with-fluids ((%default-port-encoding "UTF-8")) + (open-string-input-port str))) + (reads '()) + (read! (lambda (bv start count) + (set! reads (cons count reads)) + (let ((r (get-bytevector-n! source bv start count))) + (if (eof-object? r) + 0 + r)))) + (port (make-custom-binary-input-port "the port" read! + #f #f #f))) + + (let ((ret (list (get-bytevector-n port 2) + (get-bytevector-n port 3) + (get-bytevector-n port 42)))) + (zip (reverse reads) + (map (lambda (obj) + (if (bytevector? obj) + (utf8->string obj) + obj)) + ret))))) + (pass-if "custom binary input port `close-proc' is called" (let* ((closed? #f) (read! (lambda (bv start count) 0)) -- 1.8.4 --=-=-=--