From: Eli Zaretskii
Newsgroups: gmane.lisp.guile.user
Subject: Re: I'm looking for a method of converting a string's character encoding
Date: Sat, 28 Apr 2012 23:55:32 +0300
Message-ID: <834ns37f0b.fsf@gnu.org>
References: <87obqbwykh.fsf@gnuvola.org>
To: Daniel Krueger
Cc: guile-user@gnu.org, ttn@gnuvola.org, sunjoong@gmail.com

> Date: Sat, 28 Apr 2012 20:29:22 +0200
> From: Daniel Krueger
> Cc: guile-user@gnu.org, Sunjoong Lee
>
> i think there shouldn't be any transcoding of guile's strings, as
> strings are internal representation of characters, no matter how they
> are encoded. So the only time when encoding matters is when it passes
> it's `internal boundarys', i mean if you write the string to a port or
> read from a port or pass it as a string to a foreign library. For the
> ports all transcoding is available, and as said, the real
> representation of guile strings internally is as utf8, which can't be
> changed. The only additional thing i forgot about are bytevectors, if
> you convert a string to an explicit representation, but afaik there
> you also can give the encoding to use.
>
> Am I wrong?

You are mostly right, but only "mostly". Experience teaches that
sometimes you need to change encoding even inside "the boundaries".
One notable example is when the original encoding was determined
incorrectly, and the application wants to "re-decode" the string, when
its external origin is no longer available. Another example is an
application that wants to convert an encoded string into base-64 (or
similar) form -- you'll need to encode the string internally first.

These kinds of rare, but still important, use cases are the reason why Emacs Lisp has primitives to do encoding and decoding of in-memory strings; as much as Emacs maintainers want to get rid of the related need to support "unibyte strings", they are not going to go away any time soon. IOW, Guile needs a way to represent a string encoded in something other than UTF-8, and convert between UTF-8 and other encodings.
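
The two use cases mentioned above can be sketched in a few lines. This is only an illustration, written in Python rather than Guile or Emacs Lisp because the point is language-independent. The helper names `redecode` and `to_base64` are hypothetical, not part of any library, and the sketch assumes the mistaken decoding was lossless (as a Latin-1 decode always is), so the original bytes can be recovered from the string alone.

```python
import base64


def redecode(text, wrong_encoding, right_encoding):
    """Undo a mistaken decode when only the in-memory string is left.

    Encoding with the wrong codec recovers the original bytes (assuming
    that codec round-trips them), which are then decoded correctly.
    """
    raw = text.encode(wrong_encoding)
    return raw.decode(right_encoding)


def to_base64(text, encoding):
    """Base-64 works on bytes, so the string must be encoded first."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")


# UTF-8 bytes for "é" that were wrongly read as Latin-1.
garbled = "é".encode("utf-8").decode("latin-1")
fixed = redecode(garbled, "latin-1", "utf-8")

# The same character gives different base-64 text per encoding.
as_utf8 = to_base64("é", "utf-8")
as_latin1 = to_base64("é", "latin-1")
```

Both helpers need an explicit "string to bytes in encoding X" step that runs entirely in memory, with no port or external boundary involved.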