From: Marius Vollmer
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Unicode and Guile
Date: Wed, 12 Nov 2003 03:30:23 +0100
Message-ID: <87wua6fhds.fsf@zagadka.ping.de>
References: <20031021171534.GA13246@lark> <200310260003.RAA10375@morrowfield.regexps.com> <20031031132525.GB715@lark> <200311032031.MAA19389@morrowfield.regexps.com> <20031106181635.GA9546@lark> <200311111902.LAA25202@morrowfield.regexps.com> <87znf2ig46.fsf@zagadka.ping.de> <200311120140.RAA26670@morrowfield.regexps.com>
In-Reply-To: <200311120140.RAA26670@morrowfield.regexps.com> (Tom Lord's message of "Tue, 11 Nov 2003 17:40:28 -0800 (PST)")
Original-To: Tom Lord
Cc: guile-devel@gnu.org

Tom Lord writes:

> > > ~ (make-text-marker text index) =>
>
> > What about having _only_ markers and not allow integers as
> > indices?
>
> Seems excessive and arbitrary.  How do I implement (Emacs') GOTO-CHAR
> without standing on my head?

Yes, right, there need to be conversions between markers and integers,
but I'm worried that people will write code like

    (do ((i 0 (1+ i)))
        ((>= i (text-length text)))
      (... (text-ref text i) ...))

and we'll have trouble implementing this efficiently for graphemes of
variable sizes.  When people are encouraged to use markers like this

    (do ((i (text-start text) (marker-forward i 1)))
        ((marker-at-end? i))
      (... (marker-ref i) ...))

things should be easier.  (Of course, there should also be things like
'text-map', etc.)

> (I strongly suggest splay trees as an ideal implementation strategy
> for TEXT?.  They would make _both_ mutating and functional
> REPLACE efficient.)

Ok, if there is no cost for making texts mutable, we should of course
do it.

> > > There is no essential difference between a grapheme and a text
> > > object of length 1, and thus the proposal makes GRAPHEME? a
> > > subtype of TYPE.
>
> > Do we need the concept of grapheme at all, then?
>
> Interesting question!  And it ties in with your question about "why
> not just markers and not integer indexes".
>
> I don't see a good way to ground markers _without_ integer indexes.

Yes.  What I'm worried about is that it is expensive to go from an
integer index to the memory location where the indicated grapheme is
stored.  On the other hand, it is easy to increment the marker to the
next grapheme in a text.

> Graphemes are a reasonable "what the user thinks of as a character".

Yep, the concept of graphemes is usable, if only in the
documentation.  What I really had in mind was not the concept, but the
data type.  Is it important to have a new data type, or could we just
have

    (define (grapheme? obj)
      (and (text? obj) (= (text-length obj) 1)))

    (define grapheme=? text=?)
    (define grapheme
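
To make the cost argument about integer indices versus markers concrete,
here is a minimal sketch in Scheme.  It is not a proposed implementation.
It assumes a toy representation: a text is a vector of UTF-8 bytes, and
each code point stands in for a grapheme, purely to get units of variable
width.  All procedure names below (unit-width, index->offset, toy-marker-*,
count-units) are hypothetical, and they take the byte vector explicitly,
unlike the marker procedures used in the loops earlier in this message.

```scheme
;; Toy model (assumption): a "text" is a vector of well-formed UTF-8 bytes,
;; and one code point plays the role of one grapheme.

;; Width in bytes of the unit whose first byte is LEAD.
;; Only meaningful when LEAD really is a lead byte.
(define (unit-width lead)
  (cond ((< lead #x80) 1)
        ((< lead #xE0) 2)
        ((< lead #xF0) 3)
        (else 4)))

;; Integer indexing: the byte offset of unit number INDEX can only be
;; found by scanning from the start, so each access costs O(INDEX).
(define (index->offset bytes index)
  (let loop ((offset 0) (i 0))
    (if (= i index)
        offset
        (loop (+ offset (unit-width (vector-ref bytes offset)))
              (+ i 1)))))

;; Markers: in this toy model a marker is just a byte offset that is
;; known to sit on a unit boundary, so stepping forward is O(1).
(define (toy-marker-start bytes) 0)

(define (toy-marker-at-end? bytes m)
  (>= m (vector-length bytes)))

(define (toy-marker-forward bytes m)
  (+ m (unit-width (vector-ref bytes m))))

;; The unit at marker M, returned as a list of its bytes.
(define (toy-marker-ref bytes m)
  (let ((w (unit-width (vector-ref bytes m))))
    (let loop ((k (- w 1)) (acc '()))
      (if (< k 0)
          acc
          (loop (- k 1) (cons (vector-ref bytes (+ m k)) acc))))))

;; Walk the whole text with a marker: O(n) in total.
(define (count-units bytes)
  (do ((m (toy-marker-start bytes) (toy-marker-forward bytes m))
       (n 0 (+ n 1)))
      ((toy-marker-at-end? bytes m) n)))

;; Sample: "a", U+00E9 and U+20AC encoded as 1 + 2 + 3 bytes.
(define sample (vector #x61 #xC3 #xA9 #xE2 #x82 #xAC))

;; (count-units sample)        => 3
;; (index->offset sample 2)    => 3
;; (toy-marker-ref sample 3)   => (226 130 172)
```

A loop that calls index->offset for every i rescans the prefix each time
and is quadratic overall, while the marker loop touches each byte once.
Real grapheme boundaries are more involved than code point boundaries, but
the asymmetry between the two access styles is the same.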