From: "Han-Wen Nienhuys"
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Internal visibility
Date: Wed, 11 Jun 2008 13:09:29 -0300
Reply-To: hanwen@xs4all.nl
To: "Clinton Ebadi"
Cc: guile-devel@gnu.org

On Wed, Jun 11, 2008 at 4:24 AM, Clinton Ebadi wrote:

>>> Strings in Guile will eventually be sequences of Unicode code points (as
>>> opposed to "bytes"), which can be represented in a variety of different
>>> ways (UTF-8, UCS-4, etc.). How Guile represents strings and whether
>>> this representation "changes dynamically" (as you suggested) should not
>>> be exposed to the applications in order to leave as much freedom as
>>> possible to Guile's implementation strategy.
>>
>> I think that a sequence of Unicode code points is a somewhat
>> limited view of how strings should be used. Among others, the
>> implication is that programs cannot rely on being able to index a
>> string in O(1) time (since the string might be UTF-x encoded).
>>
>> What do I use if I want to have guaranteed O(1) indexing, that is, if
>> I want to manipulate strings of bytes?
>>
>> How would I read the contents of a binary file without jumping through
>> encoding hoops?
>
> Uniform byte vectors. If you're using C you can just read everything
> into a normal C array and then use
> scm_take_u8_vector()/scm_u8vector_elements().

Are you serious? You want me to run regexes over uniform vectors?
Concatenate uniform vectors? Do a scm_display and be able to make
sense of the result?

What scares me about this idea of doing The Right Thing with Unicode is
that, judging by the signatures of the functions, the char* <-> string
conversions are meant to (in the future, at least) change their
behavior depending on the LC_* locale environment settings.

If I use strings rather than uniform vectors (which seems wise if I
don't want to reimplement half of Guile):

* the performance of my software will depend on what users happen to
  have in their locale. If I am unlucky, every string that passes
  through the C interface will be transcoded from and to UTF-x
  implicitly.

* GUILE is meant to support only one locale at a time. No using GUILE
  to transcode strings, for example.

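To make the O(1) indexing worry from the quoted text concrete, here is a minimal sketch in plain C, assuming the string is stored as valid UTF-8. The helper name utf8_offset_of is hypothetical and is not part of Guile; it only illustrates that finding the n-th code point means scanning every byte before it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Guile API: return the byte offset of the code
   point at position `index` in a valid UTF-8 buffer of `len` bytes, or
   (size_t)-1 if the buffer holds fewer code points than that.
   The cost is linear in the offset, unlike indexing a byte array. */
size_t utf8_offset_of(const char *s, size_t len, size_t index)
{
    size_t seen = 0;   /* number of code points started so far */

    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        /* Bytes of the form 10xxxxxx continue a code point;
           every other byte starts a new one. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (seen == index)
                return i;
            seen++;
        }
    }
    return (size_t)-1;
}
```

For the six bytes "na\xC3\xAFve", code point 3 (the "v") sits at byte offset 4, and the loop has to walk past the two-byte sequence before it to find that out.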
Can we at least have a scm_to_locale_stringn() that takes an explicit
encoding/locale parameter, so that I can have some guarantee of how
GUILE is (not) munging my strings?

-- 
Han-Wen Nienhuys - hanwen@xs4all.nl - http://www.xs4all.nl/~hanwen
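The shape of the interface asked for above could look roughly like the following. This is only a sketch under stated assumptions: the names encode_stringn, enum encoding, ENC_LATIN1 and ENC_UTF8 are all hypothetical, none of them exist in Guile, and the string is modelled as a plain array of code points. The point is that the caller names the encoding, so the result does not depend on the process locale.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of this is Guile API. */
enum encoding { ENC_LATIN1, ENC_UTF8 };

/* Encode `n` code points into a freshly malloc'ed buffer using the
   encoding chosen by the caller, and store the byte count in *lenp.
   Returns NULL on allocation failure or if a code point cannot be
   represented in the requested encoding. */
unsigned char *encode_stringn(const uint32_t *cps, size_t n,
                              enum encoding enc, size_t *lenp)
{
    unsigned char *out;
    size_t len = 0;

    assert(lenp != NULL);
    /* UTF-8 needs at most 4 bytes per code point; +1 avoids malloc(0). */
    out = malloc(4 * n + 1);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        if (enc == ENC_LATIN1) {
            if (c > 0xFF) { free(out); return NULL; }
            out[len++] = (unsigned char)c;
        } else if (c < 0x80) {
            out[len++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[len++] = (unsigned char)(0xC0 | (c >> 6));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            /* Surrogates are not valid code points in UTF-8. */
            if (c >= 0xD800 && c <= 0xDFFF) { free(out); return NULL; }
            out[len++] = (unsigned char)(0xE0 | (c >> 12));
            out[len++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c <= 0x10FFFF) {
            out[len++] = (unsigned char)(0xF0 | (c >> 18));
            out[len++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[len++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            free(out);
            return NULL;
        }
    }
    *lenp = len;
    return out;
}
```

With this shape, the same two code points U+006E U+00EF come out as two bytes when the caller asks for Latin-1 and as three bytes when the caller asks for UTF-8, whatever the user's environment says.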