From: Mark H Weaver
Newsgroups: gmane.lisp.guile.devel
Subject: Re: mutable interfaces - was: Guile: What's wrong with this?
Date: Sat, 07 Jan 2012 13:55:55 -0500
Message-ID: <8739br8hqc.fsf@netris.org>
In-Reply-To: <877h138iwm.fsf@netris.org> (Mark H. Weaver's message of "Sat, 07 Jan 2012 13:30:33 -0500")
To: Bruce Korb
Cc: guile-devel@gnu.org

Replying to myself...

> Again, I stress that this has nothing to do with Guile.  All software,
> if it wishes to be properly internationalized, needs to think about
> where a string came from.  In general, your program's source code (and
> thus the C string literals it contains) will have a different encoding
> than C strings that come from the user.  C strings of different
> encodings are essentially of different types (even though C's type
> system is too crude to distinguish them), and you must treat them as
> such.

In case it wasn't clear: Scheme strings don't have any encoding; they
are a sequence of Unicode characters.  Therefore, you never have to
think about where a Scheme string came from.

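To make the point about bytes versus characters concrete, here is a
minimal sketch in plain C.  The two helper names are hypothetical (they
are not part of Guile or any library), and the UTF-8 counter assumes
well-formed input and does no validation.  It only shows that the same
bytes yield a different number of characters depending on which encoding
you assume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: character count if the bytes are Latin-1.
   Latin-1 maps every byte to exactly one character. */
size_t count_chars_latin1(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    return len;
}

/* Hypothetical helper: code point count if the bytes are UTF-8.
   Assumes well-formed input: counts every byte that is not a
   continuation byte (bit pattern 10xxxxxx). */
size_t count_chars_utf8(const unsigned char *bytes, size_t len)
{
    size_t n = 0;
    assert(bytes != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if ((bytes[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the two bytes 0xC3 0xA9, the Latin-1 reading gives two characters
and the UTF-8 reading gives one (U+00E9).  For bytes that are all below
0x80, both readings give the same count.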
What you need to think about is where a raw sequence of bytes came from,
whether it be a C string (C chars are not characters but merely bytes),
a Scheme bytevector, or the bytes in a command-line argument,
environment variable, or the bytes read from a file descriptor.

Ideally, our code would make these distinctions very clear.  However, if
you're not motivated (or don't have time) to fix that properly right
now, there's one fact that can save you a lot of time: on GNU/Linux and
POSIX systems, every locale encoding is compatible with ASCII.

Therefore, if you know that a string contains only ASCII characters,
then you don't need to think about whether to use scm_from_locale_string
or scm_from_utf8_string, because they'll both be equivalent.

    Mark
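As an illustration of the ASCII shortcut above, here is a minimal sketch
of a check for the "contains only ASCII characters" condition.  The
helper is_pure_ascii is hypothetical and not part of libguile; it is
plain C and makes no Guile calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of libguile): returns true if every
   byte of the NUL-terminated string s is in the ASCII range 0x00..0x7F. */
bool is_pure_ascii(const char *s)
{
    assert(s != NULL);
    /* Walk the bytes as unsigned so values above 0x7F compare correctly
       even where plain char is signed. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p > 0x7F)
            return false;
    }
    return true;
}
```

When this returns true for a given C string, passing it to
scm_from_locale_string or to scm_from_utf8_string should produce the
same Scheme string.  When it returns false, you still have to know
which encoding the bytes are in.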