From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Makowski
Newsgroups: gmane.lisp.guile.user
Subject: Re: iconv or something like that
Date: Sat, 25 Oct 2014 10:24:34 +0200
Message-ID: <544B5E42.10107@konradmakowski.pl>
References: <5448E72A.6030307@konradmakowski.pl> <87oat27jk0.fsf@yeeloong.lan> <544B4B4A.8090801@konradmakowski.pl>
In-Reply-To: <544B4B4A.8090801@konradmakowski.pl>
To: guile-user@gnu.org

OK, problem resolved. It was not related to MySQL or the locale; it was
my own mistake.

Konrad

On 25.10.2014 at 09:03, Konrad Makowski wrote:
> I'm using MySQL. I figured out that if I send the query "SET NAMES utf8"
> or "SET NAMES utf8 COLLATE utf8_general_ci" to the database (in a
> terminal, for example), MySQL converts the charset of the returned data
> for me. But if I do the same in my Guile script, it reports an error:
>
> In ice-9/boot-9.scm:
>  157: 9 [catch #t # ...]
> In unknown file:
>    ?: 8 [apply-smob/1 #]
> In ice-9/boot-9.scm:
>   63: 7 [call-with-prompt prompt0 ...]
> In ice-9/eval.scm:
>  432: 6 [eval # #]
>  432: 5 [eval # #]
>  387: 4 [eval # #]
>  387: 3 [eval # #]
>  387: 2 [eval # #]
>  387: 1 [eval # #]
> In unknown file:
>    ?: 0 [utf8->string #vu8(80 65 87 69 163)]
>
> ERROR: In procedure utf8->string:
> ERROR: Throw to key `decoding-error' with args `("scm_from_stringn"
> "input locale conversion error" 84 #vu8(80 65 87 69 163))'.
>
> My locale settings are:
>
> LANG=pl_PL.UTF-8
> LANGUAGE=pl:en
> LC_CTYPE="pl_PL.UTF-8"
> LC_NUMERIC="pl_PL.UTF-8"
> LC_TIME="pl_PL.UTF-8"
> LC_COLLATE="pl_PL.UTF-8"
> LC_MONETARY="pl_PL.UTF-8"
> LC_MESSAGES="pl_PL.UTF-8"
> LC_PAPER="pl_PL.UTF-8"
> LC_NAME="pl_PL.UTF-8"
> LC_ADDRESS="pl_PL.UTF-8"
> LC_TELEPHONE="pl_PL.UTF-8"
> LC_MEASUREMENT="pl_PL.UTF-8"
> LC_IDENTIFICATION="pl_PL.UTF-8"
> LC_ALL=pl_PL.UTF-8
>
> Any idea?
>
> Konrad
>
> On 23.10.2014 at 20:00, Mark H Weaver wrote:
>> Konrad Makowski writes:
>>> Is there any solution to convert charset from one encoding to another?
>> Yes, but character encodings are only relevant when converting between a
>> sequence of _bytes_ (a bytevector), and a sequence of _characters_ [*]
>> (a string). These conversions happen implicitly while performing I/O,
>> converting Scheme strings to/from C, etc.
>>
>> [*] More precisely, Scheme strings are sequences of Unicode code points.
>>
>> It doesn't make sense to talk about the encoding of a Scheme string, or
>> to convert a Scheme string from one encoding to another, because they
>> are not byte sequences.
>>
>> It sounds like you already have a Scheme string that was incorrectly
>> decoded from bytes, and are asking how to fix it up. Unfortunately,
>> this won't work, because many valid ISO-8859-2 byte sequences are not
>> valid UTF-8, and will therefore lead to decoding errors.
>>
>>> I have database in iso-8859-2 but my script runs in utf-8. I use dbi
>>> module.
>> Having looked at the guile-dbi source code, I see that it always uses
>> the current locale encoding when talking to databases. Specifically, it
>> always uses 'scm_from_locale_string' and 'scm_to_locale_string'. For
>> your purposes, you'd like it to use 'scm_from_stringn' and
>> 'scm_to_stringn' instead, with "ISO-8859-2" as the 'encoding' argument.
>>
>> My knowledge of modern databases is limited, so I'm not sure how this
>> problem is normally dealt with. It seems to me that, ideally, strings
>> in databases should be sequences of Unicode code points, rather than
>> sequences of bytes. If that were the case, then this problem wouldn't
>> arise.
>>
>> It would be good if someone with more knowledge of databases would chime
>> in here.
>>
>> In the meantime, I can see a few possible solutions/workarounds:
>>
>> * Enhance guile-dbi to include an 'encoding' field in its database
>>   handles, add a new API procedure to set it, and use it in all the
>>   appropriate places. This only makes sense if database strings are
>>   conceptually byte sequences, otherwise it should probably be fixed in
>>   some other way.
>>
>> * Hack your local copy of guile-dbi to use 'scm_from_stringn' and
>>   'scm_to_stringn' with a hard-coded "ISO-8859-2" in the appropriate
>>   places.
>>
>> * Use 'setlocale' to set an ISO-8859-2 locale temporarily while
>>   performing database queries.
>>
>> Which database are you using?
>>
>> Mark
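
For readers of the archive, here is a minimal sketch of the bytes-versus-characters distinction described in the quoted reply, using the five bytes from the backtrace. It assumes a Guile version that ships the (ice-9 iconv) module. The names raw, name, utf8-bytes and utf8-attempt are illustrative only and do not come from the thread or from guile-dbi.

```scheme
(use-modules (ice-9 iconv)        ; bytevector->string, string->bytevector
             (rnrs bytevectors))  ; utf8->string, string->utf8

;; The five bytes from the backtrace: "PAWE" followed by #xA3,
;; which is a single-byte letter in ISO-8859-2 but not valid UTF-8 here.
(define raw #vu8(80 65 87 69 163))

;; Decode the bytes as ISO-8859-2 into a Scheme string
;; (a sequence of Unicode code points, with no encoding of its own).
(define name (bytevector->string raw "ISO-8859-2"))

;; Encode the string as UTF-8 bytes, for example before handing it
;; to something that expects UTF-8.
(define utf8-bytes (string->utf8 name))

;; Decoding the same raw bytes as UTF-8 is what failed in the backtrace;
;; catch the error instead of letting it abort the script.
(define utf8-attempt
  (catch #t
    (lambda () (utf8->string raw))
    (lambda (key . args) 'decoding-failed)))
```

This only helps when the raw bytes are still available as a bytevector. As noted in the quoted reply, guile-dbi decodes database results with the locale encoding before Scheme code sees them, so a string that was already mis-decoded cannot be repaired this way.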