From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: Using libunistring for string comparisons et al
Date: Sat, 19 Mar 2011 10:06:51 -0400
Message-ID: <87lj0bi4qs.fsf@netris.org>
References: <336042.33326.qm@web37901.mail.mud.yahoo.com>
 <878vwgmhah.fsf@netris.org>
 <511668.33680.qm@web37902.mail.mud.yahoo.com>
 <87sjuokniq.fsf@netris.org>
 <118142.11911.qm@web37907.mail.mud.yahoo.com>
 <87ipvjlvgj.fsf@netris.org>
In-Reply-To: (Andy Wingo's message of "Sat, 19 Mar 2011 13:31:30 +0100")
To: Andy Wingo
Cc: Ludovic Courtès, guile-devel@gnu.org

Andy Wingo writes:

>> Ludovic, Andy and I discussed this on IRC, and came to the conclusion
>> that UTF-8 should be the encoding assumed by functions such as
>> scm_c_define, scm_c_define_gsubr, scm_c_define_gsubr_with_generic,
>> scm_c_export, scm_c_define_module, scm_c_resolve_module,
>> scm_c_use_module, etc.
>
> Can we step back a little and revisit this decision?
>
> Clearly, we need to specify the encoding for these procedures, and have
> it not be locale encoding. However I don't think we would be breaking
> anyone's code if we simply restricted it to 7-bit ASCII.
>
> I am quite sensitive to the "justice" argument -- that we not restrict
> the names our users give to Scheme identifiers, or the characters they
> use in their strings. But these values typically come from literals in
> C source code, which has no portable superset of ASCII.

Not everyone writes portable code. Who here limits their code to the
R6RS and avoids all Guile-specific features?
Portability may be something to strive for, but when compelling reasons
dictate otherwise, it's not unreasonable to limit your portability to
better compilers like gcc. For those who don't speak English but wish
to hack with Guile, being able to write code in their own language is a
compelling reason.

Anyway, one can only hope that some future C standard supports Unicode,
but if the folks who control those standards don't give a damn about
non-English speakers, that doesn't mean we should follow their example.

> Furthermore, such a default would not restrict our users at all -- they
> can always use the non-_c_ variants with a symbol explicitly constructed
> with (e.g.) scm_from_utf8_symbol.

We have those convenience functions for a reason. You recently proposed
several more convenience functions, so apparently you prefer to save
keystrokes like the rest of us. I'm sure our non-English-speaking
comrades feel the same way.

Let me ask you this: why would you oppose changing the scm_c_ functions
to use UTF-8 by default? If you're comfortable with ASCII-only names,
then UTF-8 will work fine for you, since ASCII strings are unchanged in
UTF-8.

> Finally, users are moving away from these functions anyway. The thing
> to do now is to write Scheme, not C: and in Scheme we do the Right
> Thing.

If you write all your code in Scheme now, then you should care even less
about the scm_c_ functions. So why oppose what you recently agreed to?

As a meta-comment: I've grown rather weary of fighting this battle
alone. My hacking has completely stopped because of this argument.

To those of you out there who care about this issue, please let your
voices be heard. I know you're there, because a few of you stated your
opinions rather strongly on IRC.

If others don't join in soon, I'm likely to give up on this, and be left
with rather less enthusiasm for Guile than when I started, I'm sorry to
say.

    Mark
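
The point that ASCII strings are unchanged in UTF-8 can be checked
mechanically. What follows is only a minimal sketch in plain C, not
anything from Guile: the helper names utf8_decode,
same_in_ascii_and_utf8 and utf8_length are hypothetical, and the
decoder is simplified (it does not reject overlong forms or
surrogates). It decodes a C string as UTF-8 and confirms that a 7-bit
name such as "scm_c_define" yields exactly one code point per byte,
while a name containing a non-ASCII character does not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence from s (n bytes available).
   Stores the code point in *cp and returns the number of bytes
   consumed, or 0 on malformed input.  Simplified for illustration:
   overlong encodings and surrogates are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {              /* 7-bit ASCII: one byte, same value */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead */
    if (n < len)
        return 0;                   /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Return 1 if every byte of name decodes, as UTF-8, to a single code
   point equal to that byte, i.e. the name reads identically whether it
   is treated as ASCII or as UTF-8.  Return 0 otherwise. */
int same_in_ascii_and_utf8(const char *name)
{
    const unsigned char *s = (const unsigned char *) name;
    size_t n = strlen(name), i = 0;

    while (i < n) {
        uint32_t cp;
        size_t used = utf8_decode(s + i, n - i, &cp);
        if (used != 1 || cp != s[i])
            return 0;
        i += used;
    }
    return 1;
}

/* Count the code points in a UTF-8 string; (size_t) -1 if malformed. */
size_t utf8_length(const char *name)
{
    const unsigned char *s = (const unsigned char *) name;
    size_t n = strlen(name), i = 0, count = 0;

    while (i < n) {
        uint32_t cp;
        size_t used = utf8_decode(s + i, n - i, &cp);
        if (used == 0)
            return (size_t) -1;
        i += used;
        count++;
    }
    return count;
}
```

Calling same_in_ascii_and_utf8 on an ASCII-only identifier returns 1,
which is why existing callers that pass ASCII literals would see no
difference under a UTF-8 default. A literal containing UTF-8 bytes
(written with escapes, e.g. "caf\xC3\xA9") returns 0 and has fewer code
points than bytes.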