From: Mark H Weaver
Newsgroups: gmane.lisp.guile.user
Subject: Re: iconv or something like that
Date: Thu, 23 Oct 2014 14:00:31 -0400
Message-ID: <87oat27jk0.fsf@yeeloong.lan>
References: <5448E72A.6030307@konradmakowski.pl>
In-Reply-To: <5448E72A.6030307@konradmakowski.pl> (Konrad Makowski's message of "Thu, 23 Oct 2014 13:31:54 +0200")
To: Konrad Makowski
Cc: Guile User

Konrad Makowski writes:

> Is there any solution to convert charset from one encoding to another?

Yes, but character encodings are only relevant when converting between
a sequence of _bytes_ (a bytevector), and a sequence of _characters_ [*]
(a string). These conversions happen implicitly while performing I/O,
converting Scheme strings to/from C, etc.

[*] More precisely, Scheme strings are sequences of Unicode code points.

It doesn't make sense to talk about the encoding of a Scheme string, or
to convert a Scheme string from one encoding to another, because they
are not byte sequences.

It sounds like you already have a Scheme string that was incorrectly
decoded from bytes, and are asking how to fix it up. Unfortunately,
this won't work, because many valid ISO-8859-2 byte sequences are not
valid UTF-8, and will therefore lead to decoding errors.

> I have database in iso-8859-2 but my script runs in utf-8. I use dbi module.

Having looked at the guile-dbi source code, I see that it always uses
the current locale encoding when talking to databases. Specifically, it
always uses 'scm_from_locale_string' and 'scm_to_locale_string'. For
your purposes, you'd like it to use 'scm_from_stringn' and
'scm_to_stringn' instead, with "ISO-8859-2" as the 'encoding' argument.

My knowledge of modern databases is limited, so I'm not sure how this
problem is normally dealt with.
It seems to me that, ideally, strings in
databases should be sequences of Unicode code points, rather than
sequences of bytes. If that were the case, then this problem wouldn't
arise. It would be good if someone with more knowledge of databases
would chime in here.

In the meantime, I can see a few possible solutions/workarounds:

* Enhance guile-dbi to add an 'encoding' field to its database
  handles, add a new API procedure to set it, and use it in all the
  appropriate places. This only makes sense if database strings are
  conceptually byte sequences, otherwise it should probably be fixed in
  some other way.

* Hack your local copy of guile-dbi to use 'scm_from_stringn' and
  'scm_to_stringn' with a hard-coded "ISO-8859-2" in the appropriate
  places.

* Use 'setlocale' to set an ISO-8859-2 locale temporarily while
  performing database queries.

Which database are you using?

      Mark
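
For illustration only, here is a minimal sketch of the bytes-versus-characters distinction described in the message above. It is not part of the original message. It assumes a Guile recent enough to ship the (ice-9 iconv) module, and the sample word and all variable names are made up for this example. It encodes one string in two encodings, decodes the ISO-8859-2 bytes correctly, and then tries to decode those same bytes as UTF-8, which is the kind of decoding failure the message refers to.

```scheme
;; Sketch only: bytes vs. characters in Guile, using (ice-9 iconv).
;; Assumption: (ice-9 iconv) is available in the installed Guile.
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; A Scheme string is a sequence of code points and has no encoding.
;; Hypothetical sample: z-with-dot, o-acute, l-with-stroke, then "w".
(define word "\u017c\u00f3\u0142w")

;; An encoding only appears when the string is turned into bytes.
(define latin2-bytes (string->bytevector word "ISO-8859-2"))
(define utf8-bytes   (string->bytevector word "UTF-8"))

;; One byte per character in ISO-8859-2; the three non-ASCII
;; characters take two bytes each in UTF-8.
(define latin2-length (bytevector-length latin2-bytes)) ; 4
(define utf8-length   (bytevector-length utf8-bytes))   ; 7

;; Decoding with the encoding the bytes were written in gives the
;; original string back.
(define round-trip-ok?
  (string=? word (bytevector->string latin2-bytes "ISO-8859-2")))

;; Decoding the same bytes as UTF-8 is expected to signal an error
;; under the default conversion strategy, because the first byte
;; (#xBF) cannot start a UTF-8 sequence. Any error is caught here
;; and reported as the symbol 'failed.
(define wrong-decode
  (catch #t
    (lambda () (bytevector->string latin2-bytes "UTF-8"))
    (lambda (key . args) 'failed)))

(display (list latin2-length utf8-length round-trip-ok? wrong-decode))
(newline)
```

The same reasoning applies to the workarounds listed in the message: whichever layer turns database bytes into Scheme strings has to be told that the bytes are ISO-8859-2, since a string that has already been decoded with the wrong encoding cannot in general be repaired afterwards.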