From: Andy Wingo
Newsgroups: gmane.lisp.guile.devel
Subject: Re: review/merge request: wip-array-refactor
Date: Tue, 04 Aug 2009 14:21:08 +0200
To: Neil Jerram
Cc: guile-devel
In-Reply-To: <87k51pyj0c.fsf@arudy.ossau.uklinux.net> (Neil Jerram's message of "Thu, 30 Jul 2009 22:10:27 +0100")

Hi Neil,

On Thu 30 Jul 2009 23:10, Neil Jerram writes:

> Andy Wingo writes:
>
>> On Wed 22 Jul 2009 23:48, Neil Jerram writes:
>>
>>> I have two overall questions in mind.
>>>
>>> - What do you have in mind as regards releasing this?  Even though it
>>>   looks good, I think it would be better to let it mature for a while,
>>>   and hence not to put it into 1.9.x/2.0.  (And we're not short of new
>>>   stuff in 1.9.x/2.0!)
>>
>> Personally I would prefer that it come out in 2.0.  I'm fairly (but not
>> entirely) confident of its consistency as it is, and quite confident
>> that it is a more maintainable, and hence fixable, codebase.
>
> I could be wrong, but I don't intuitively feel comfortable with that.
> It just feels too quick/early.
>
> On the other hand, I think this is really valuable work, and
> absolutely don't want an interval of years or months before it gets
> out there.
>
> What is our release plan after 2.0?  I don't know.
> I'd like something more dynamic than the very long intervals between
> major releases that we've had in the past.  But it seems there is a
> conflict between
>
> - major releases being the points at which we can break the API/ABI
>   (with accompanying documentation)
>
> - wanting to have such releases more frequently than in the past, so
>   that good new stuff gets out quicker
>
> - wanting not to create grief for Guile users by changing the API/ABI
>   frequently.
>
> Is there a solution?

I don't know of one, no.

I know of two models that work: one, when you're just starting
developing a library, and downstream users are making the first cuts at
their software too, and everything is in froth and people are willing to
adapt to API or ABI changes.  The library is not widely distributed, so
changes don't affect many people besides the developers, like
distributors or users.

The second model is when you already have a wide deployed base.  You can
make additions to your API and ABI, and deprecate old API or ABI, but
you can't remove old API or change the ABI.  Incompatible breaks are
painful, and the switching-over time is somewhere between a year and
three years.  The right length of a stable series seems to be about 4 or
5 years.

So in the second condition, where Guile seems to be, we need to mostly
preserve API and ABI, though we can remove the deprecated bits every few
years.  But new API or ABI has to be accompanied with lots of thought,
because you have to support it for 5 years or more.

Dunno, I'm babbling, but the thing is that I feel like if there are
changes that need making, we should make them now.  Like Mark's %nil
work.  My perception is that we won't have another chance for another
few years.

Unless of course, distros miss 2.0 altogether, like Python has done with
3.0 and 3.1...  We could do that.  Seems like needless churn, but
perhaps it's necessary to get the wider exposure.
>> The reason that I want it in is that bytevectors are a nice api for I/O,
>> but once you have data in memory, often it's best to treat it as a
>> uniform array -- be it of u8 color components, or s32 audio samples.
>>
>> Uniform vectors are almost by nature "in flight" between two places.
>
> (Not sure I agree.  I'd say uniform vectors are mostly holding numbers
> in a computation, or for plotting on a graph.)

But how do you plot?  If you use some sort of external software, you
have two options: code your plotting in C, and loop over the data with
the C API.  Or do it in Scheme, and... loop over the s16vector, writing
each sample individually?  How do you get at the bits of the s16vector
so they can be written to a port?  Use the impoverished
uniform-vector-write?

(rnrs bytevectors) combined with (rnrs io ports) is the best way to get
numeric data into and out of a process, from Scheme.  But -- the uniform
vector API is the best API for dealing with that data from Scheme.

> That sounds to me like motivation for adding a richer API to
> bytevectors (and which could all be in Scheme), not necessarily for
> the deep unification of uniform and byte vectors that you have coded.

There is e.g.:

  (bytevector-s16-native-ref bv n)

or

  (bytevector-s16-ref bv n (endianness big))

But that `n' is in bytes, not in elements.  If you really want to treat
the bytevector as a numeric array, you're better off with the SRFI-4
API.  It is a better API.

There's no reason why the SRFI-4 API could not apply to bytevectors:

  (s16vector-ref bv n) == (bytevector-s16-native-ref bv (* n 2))

Also, only srfi-4 vectors have a read syntax like #s16(1 2 3).  You
can't express that with bytevectors, because you would have to encode
the endianness into your source file.

> TBH, with your refactoring up to this point, I still don't have the
> overall picture (arrays, uniform vectors, bitvectors, strings etc)
> firmly in my head.  I'd like to do that and then reconsider your
> points above.

There are two things.
One is a generic API for accessing arrays, using array handles.  The
second is a rebase of srfi-4 vectors on top of bytevectors.

>>>> (u8vector-ref #u32(#xffffffff) 0) => 255
>
> Note that using #xffffffff here glosses over the endianness problem.

Of course.  Fortunately there is a sensible interpretation -- that the
u32vector is in native-endianness.  The alternative is this:

  (let ((bv (make-bytevector 4)))
    (bytevector-u32-native-set! bv 0 #xffffffff)
    (bytevector-u8-ref bv 0))

Which is actually less efficient.

You could of course do:

  (bytevector-u8-ref #u32(#xffffffff) 0) => 255
  (bytevector-u8-ref #u32(#x01234567) 0) => ?

if that would be your preference; the latter answer is just as
endianness-dependent as if you used the `let' idiom above to ref the
value.

> (I think my inclination at this point is that I'd prefer explicit
> conversions.)

When it matters, I would think that the bytevector API is sufficiently
explicit for anyone.  Note that accessors for values more than 8 bits
wide come in two flavors:

  bytevector-s16-ref bv n endianness
  bytevector-s16-native-ref bv n

So you have all the power available to you.

Or... we could indeed prohibit (u8vector-ref #u32(0) 0).  But there
doesn't seem to be a point.  Why bother?

>> I ought to be able to get at the bits of a packed (uniform) vector.  The
>> whole point of being a uniform vector is to specify a certain bit layout
>> of your data.
>
> Huh?  I would say it is to be able to store numbers with a given range
> (or precision) efficiently, and to be able to access them efficiently
> from both Scheme and C.

Note that for access from Scheme, an f64vector is almost certainly less
efficient than a Scheme vector of reals, due to the need to
heap-allocate the f64 values as you ref them.

I've written lots of code that deals with srfi-4 vectors.  I have three
kinds of use cases.  First is data being shoved around in a
dynamically-typed system: dbus messages, gconf values, a system we use
at work, etc.
Second, but related, is dealing with chunks of data that come from
elsewhere, like GDK pixbufs, or GStreamer buffers.  Third is hacking
compilers, as in Guile itself, or emitting machine code for other
machines.

In all of these cases, the data doesn't just stay in Guile.  It either
comes from somewhere else or ends up going somewhere else.  The
semantics that are implemented in this patch set actually help all of
these cases, and make Scheme more powerful -- it's not just C any more
that can get at the bits of an array.  It allows me to code less in C
and more in Scheme.

>>>> (u8vector? #u32(#xffffffff)) => #f
>>
>> However, we need to preserve type dispatch:
>>
>>   (cond
>>    ((u8vector? x) ...)
>>    ((s8vector? x) ...)
>>    ...)
>>
>>>> (bytevector? #u32(#xffffffff)) => #t
>>
>> This to me is like allowing (integer? 1) and (integer? 1.0); in
>> /essence/ a #u32 is an array of bytes, interpreted in a certain way.
>
> I think you have in mind that all uniform vectors are filled by
> reading from a port, or are destined for writing out to a port.
>
> That is an important use, but there is another one: preparing
> numerical data for handling in both C and Scheme.  In this use, the
> concept of an underlying array of bytes plays no part.

You are correct.  But the other use cases I mentioned are no less valid.

In summary...  I don't mean to be a bore, but I really don't like the
existing unif.c and srfi-4.c.  They are painful to understand and to
hack on.  I think those bits should be merged.

I also think that srfi-4 vectors should be implemented in terms of
bytevectors, for the reasons above.  If you really want to, we can
prohibit u8vector-ref from operating on u32vectors, but that seems
unnecessary to me.

I also think that the behavior as implemented in wip-array-refactor
should go in, for 2.0 -- because we just won't have another chance in
the next few years.  Not enough testing isn't really a valid concern
IMO, because how else is it going to get testing?
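
For concreteness, here is a minimal sketch of the element-versus-byte
indexing point made earlier.  It assumes a Guile in which srfi-4
vectors are backed by bytevectors, as this branch proposes (Guile 2.0
or later should qualify).  The helper name `s16-ref-via-bytes' is
hypothetical and exists only for illustration.

```scheme
;; Sketch only: assumes srfi-4 vectors are accepted by the bytevector
;; accessors, which is the behavior proposed in this thread.
(use-modules (rnrs bytevectors) (srfi srfi-4))

;; Three signed 16-bit samples in a freshly allocated srfi-4 vector.
(define samples (s16vector 1 -2 300))

;; Hypothetical helper: fetch element N through the bytevector API by
;; scaling the element index to a byte offset (2 bytes per s16).
(define (s16-ref-via-bytes bv n)
  (bytevector-s16-native-ref bv (* n 2)))

;; The srfi-4 accessor counts elements; the bytevector accessor counts
;; bytes.  Both should name the same sample when the storage is shared.
(define same-sample?
  (= (s16vector-ref samples 2)
     (s16-ref-via-bytes samples 2)))
```

The srfi-4 call says "element 2"; the bytevector call has to say "byte
4" to reach the same sample, which is the ergonomic difference argued
for above.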
But I do appreciate your input, and decisions.

Cheers,

Andy
-- 
http://wingolog.org/