From: Andy Wingo
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Unicode and Guile
Date: Mon, 17 Nov 2003 18:17:28 +0200
To: guile-devel@gnu.org

On Tue, 11 Nov 2003, Tom Lord wrote:

> Thanks for the pointer to the Python type (on which I won't comment
> :-). Thanks for the excuse to think about this more.

And thanks for thinking this through a lot more properly than I was,
and for caring about the problem, and for having patience with the
ignorant :-)

> ** CHAR? Makes No Sense In Unicode

I think I'm starting to get a clue. Case mapping demonstrates this
pretty clearly... Incidentally, GLib's function for this is evidently
broken:

  gunichar g_unichar_toupper (gunichar c);

Although they do have g_utf8_strup, which operates on a string and does
the correct thing.

> * The Proposal
>
> The proposal has two parts. Part 1 introduces a new type, TEXT?,
> which is a string-like type that is compatible with Unicode, and
> a subtype of TEXT?, GRAPHEME?, to represent "conceptual
> characters".

Wow, you really have thought a lot more about this than I have.

> It is important to note that, in general, EQV? and EQUAL? do _not_
> test for grapheme equality. GRAPHEME=? must be used instead.

I can see why EQV? shouldn't test for equality: a precomposed grapheme
can be the same as one made with combining characters.
But why not overload EQUAL?, given that they would display the same
(with a suitable glyph rendering library)? Perhaps this is not possible
in portable Scheme? If this question is ignorant, my apologies.

> So, texts really need markers that work like those in Emacs:

It does indeed appear so. I withdraw my ridicule of this idea :-P

> * Optional Changes to CHAR? and STRING?
>
> ~ TEXT? values contain an "encoding" attribute, just as strings
>   do (utf-8, etc.)

Why should an implementation support more than one encoding,
internally?

> ~ (string? a-text-value) => #t

Would be difficult with Guile, given the C interface... Perhaps if
there were an abstract string type, with "simple strings" as a subtype,
then C functions wanting a string (just for reading) would not call
SCM_STRING_CHARS but scm_string_chars, or the like...

> [I]f I'm sitting in california and write a portable Scheme program
> that generates anagrams of a name, it'd be awfully swell if (a) My
> code doesn't have to "know" anything special about unicode internals;
> (b) my code works when passed her name as input.

Indeed.

Overall, your proposal is IMHO well thought out, and is of high
quality. I am humbled :). I hope something like this can go into Guile
soon.

Cheers,

wingo.
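
For anyone following along, the points above (case mapping that changes
length, precomposed versus combining forms, and the anagram example) can
be sketched in a few lines. This is only an illustration in Python, not
anything from Tom's proposal or from Guile. The helper names
upper_changes_length, grapheme_equal and rough_graphemes are made up
here, NFC normalization merely stands in for whatever GRAPHEME=? would
do, and the grouping rule is far cruder than real Unicode grapheme
clustering.

```python
import unicodedata


def upper_changes_length(ch):
    """True when uppercasing one code point yields more than one.

    A char -> char function cannot return such a result, which is the
    trouble with a per-character toupper.
    """
    return len(ch.upper()) != 1


def grapheme_equal(a, b):
    """Compare two texts after canonical composition (NFC).

    Assumption: NFC equality is used as a stand-in for grapheme equality.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def rough_graphemes(text):
    """Group each base code point with the combining marks that follow it.

    A simplification: real grapheme clustering has many more rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the previous base
        else:
            clusters.append(ch)
    return clusters


precomposed = "\u00e9"   # e with acute accent, one code point
decomposed = "e\u0301"   # plain e followed by a combining acute accent

print(upper_changes_length("\u00df"))             # True: sharp s uppercases to "SS"
print(precomposed == decomposed)                  # False: code points differ
print(grapheme_equal(precomposed, decomposed))    # True
print(len(rough_graphemes("Ren" + decomposed)))   # 4 clusters from 5 code points
```

An anagram generator that shuffles the clusters returned by
rough_graphemes, rather than raw code points, at least keeps each accent
attached to its base letter, which seems to be the property the anagram
example is asking for.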