Re: iconv or something like that

unofficial mirror of guile-user@gnu.org 

From: Konrad Makowski <poczta@konradmakowski.pl>
To: guile-user@gnu.org
Subject: Re: iconv or something like that
Date: Sat, 25 Oct 2014 10:24:34 +0200
Message-ID: <544B5E42.10107@konradmakowski.pl>
In-Reply-To: <544B4B4A.8090801@konradmakowski.pl>

OK, the problem is resolved. It was not related to MySQL or the locale; it was my own mistake.

Konrad

On 25.10.2014 at 09:03, Konrad Makowski wrote:
> I'm using MySQL, and I found that if I send the query "SET NAMES utf8"
> or "SET NAMES utf8 COLLATE utf8_general_ci" to the database (in a
> terminal, for example), MySQL converts the charset of the returned data
> for me. But if I do the same in my Guile script, it reports an error:
> In ice-9/boot-9.scm:
>  157: 9 [catch #t #<catch-closure 1cff400> ...]
> In unknown file:
>    ?: 8 [apply-smob/1 #<catch-closure 1cff400>]
> In ice-9/boot-9.scm:
>   63: 7 [call-with-prompt prompt0 ...]
> In ice-9/eval.scm:
>  432: 6 [eval # #]
>  432: 5 [eval # #]
>  387: 4 [eval # #]
>  387: 3 [eval # #]
>  387: 2 [eval # #]
>  387: 1 [eval # #]
> In unknown file:
>    ?: 0 [utf8->string #vu8(80 65 87 69 163)]
>
> ERROR: In procedure utf8->string:
> ERROR: Throw to key `decoding-error' with args `("scm_from_stringn" 
> "input locale conversion error" 84 #vu8(80 65 87 69 163))'.
>
> My locale settings are:
> LANG=pl_PL.UTF-8
> LANGUAGE=pl:en
> LC_CTYPE="pl_PL.UTF-8"
> LC_NUMERIC="pl_PL.UTF-8"
> LC_TIME="pl_PL.UTF-8"
> LC_COLLATE="pl_PL.UTF-8"
> LC_MONETARY="pl_PL.UTF-8"
> LC_MESSAGES="pl_PL.UTF-8"
> LC_PAPER="pl_PL.UTF-8"
> LC_NAME="pl_PL.UTF-8"
> LC_ADDRESS="pl_PL.UTF-8"
> LC_TELEPHONE="pl_PL.UTF-8"
> LC_MEASUREMENT="pl_PL.UTF-8"
> LC_IDENTIFICATION="pl_PL.UTF-8"
> LC_ALL=pl_PL.UTF-8
>
> Any ideas?
>
> Konrad
>
> On 23.10.2014 at 20:00, Mark H Weaver wrote:
>> Konrad Makowski <poczta@konradmakowski.pl> writes:
>>> Is there any way to convert text from one encoding to another?
>> Yes, but character encodings are only relevant when converting between a
>> sequence of _bytes_ (a bytevector), and a sequence of _characters_ [*]
>> (a string).  These conversions happen implicitly while performing I/O,
>> converting Scheme strings to/from C, etc.
>>
>> [*] More precisely, Scheme strings are sequences of unicode code points.
>>
>> It doesn't make sense to talk about the encoding of a Scheme string, or
>> to convert a Scheme string from one encoding to another, because they
>> are not byte sequences.
>>
>> It sounds like you already have a Scheme string that was incorrectly
>> decoded from bytes, and are asking how to fix it up. Unfortunately,
>> this won't work, because many valid ISO-8859-2 byte sequences are not
>> valid UTF-8, and will therefore lead to decoding errors.
>>
>>> I have a database in ISO-8859-2, but my script runs in UTF-8. I use the
>>> dbi module.
>> Having looked at the guile-dbi source code, I see that it always uses
>> the current locale encoding when talking to databases. Specifically, it
>> always uses 'scm_from_locale_string' and 'scm_to_locale_string'.  For
>> your purposes, you'd like it to use 'scm_from_stringn' and
>> 'scm_to_stringn' instead, with "ISO-8859-2" as the 'encoding' argument.
>>
>> My knowledge of modern databases is limited, so I'm not sure how this
>> problem is normally dealt with.  It seems to me that, ideally, strings
>> in databases should be sequences of Unicode code points, rather than
>> sequences of bytes.  If that were the case, then this problem wouldn't
>> arise.
>>
>> It would be good if someone with more knowledge of databases would chime
>> in here.
>>
>> In the meantime, I can see a few possible solutions/workarounds:
>>
>> * Enhance guile-dbi: add an 'encoding' field to its database
>>    handles, add a new API procedure to set it, and use it in all the
>>    appropriate places.  This only makes sense if database strings are
>>    conceptually byte sequences, otherwise it should probably be fixed in
>>    some other way.
>>
>> * Hack your local copy of guile-dbi to use 'scm_from_stringn' and
>>    'scm_to_stringn' with a hard-coded "ISO-8859-2" in the appropriate
>>    places.
>>
>> * Use 'setlocale' to set an ISO-8859-2 locale temporarily while
>>    performing database queries.
>>
>> Which database are you using?
>>
>>       Mark
>>
>
>
>
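
A minimal Guile sketch of the bytes-versus-characters point made in the thread, assuming Guile 2.0.9 or later (for the `(ice-9 iconv)` module). It reuses the bytes from the traceback: 163 (#xA3) is the letter Ł in ISO-8859-2, but on its own it is not a valid UTF-8 sequence, so decoding with the encoding the bytes were actually written in succeeds where `utf8->string` fails. The helper name `try-utf8` is only for illustration.

```scheme
;; Sketch assuming Guile >= 2.0.9, which provides (ice-9 iconv).
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Raw bytes from the traceback: "PAWE" followed by 163 (#xA3),
;; which is the letter Ł in ISO-8859-2.
(define raw #vu8(80 65 87 69 163))

;; Decode the bytes using the encoding they were actually written in.
(define name (bytevector->string raw "ISO-8859-2"))

;; The result is a sequence of code points; #xA3 became U+0141 (Ł).
(define expected
  (list->string (map integer->char '(80 65 87 69 #x141))))

;; Hypothetical helper: decoding the same bytes as UTF-8 raises the
;; decoding-error seen in the traceback; here it is caught and #f returned.
(define (try-utf8 bv)
  (catch 'decoding-error
    (lambda () (utf8->string bv))
    (lambda (key . args) #f)))

;; Once decoded, the string can be re-encoded as UTF-8 (or any encoding
;; that can represent Ł) whenever it has to become bytes again.
(define utf8-bytes (string->utf8 name))

(display (string=? name expected)) (newline)             ; prints #t
(display (try-utf8 raw)) (newline)
(display (bytevector->u8-list utf8-bytes)) (newline)     ; prints (80 65 87 69 197 129)
```

Note that the conversion happens on the bytevector, before a string exists; once bytes have been wrongly decoded into a string, there is nothing left to "convert".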

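Mark's last workaround, setting an ISO-8859-2 locale only while performing database queries, could look roughly like the sketch below. It changes just the LC_CTYPE category, which is the locale category that determines the character encoding, and restores the previous value afterwards. The helper name `call-with-ctype-locale` and the locale name "pl_PL.ISO-8859-2" are assumptions; the guile-dbi calls appear only in a comment as hypothetical usage.

```scheme
;; Sketch of the temporary-'setlocale' workaround. Assumption: a locale
;; named "pl_PL.ISO-8859-2" is installed; the name varies between systems
;; (see `locale -a`), and setlocale raises an error if it is missing.
(define (call-with-ctype-locale locale thunk)
  ;; With only a category argument, setlocale returns the current setting.
  (let ((saved (setlocale LC_CTYPE)))
    (dynamic-wind
      (lambda () (setlocale LC_CTYPE locale))
      thunk
      ;; Restore the previous setting even if THUNK exits non-locally.
      (lambda () (setlocale LC_CTYPE saved)))))

;; Hypothetical usage with guile-dbi, where `db` is an open handle:
;;
;; (call-with-ctype-locale "pl_PL.ISO-8859-2"
;;   (lambda ()
;;     (dbi-query db "SELECT name FROM people")
;;     (dbi-get_row db)))
```

Because the C library's locale is process-wide state, this is only safe if no other thread depends on the locale while the queries run.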

Thread overview: 9+ messages
2014-10-23 11:31 iconv or something like that Konrad Makowski
2014-10-23 18:00 ` Mark H Weaver
2014-10-23 18:07   ` Greg Troxel
2014-10-25  7:03   ` Konrad Makowski
2014-10-25  8:24     ` Konrad Makowski [this message]
2014-10-25 18:51       ` Thien-Thi Nguyen
2014-10-25 20:25         ` Konrad Makowski
2014-10-26  9:50           ` Thien-Thi Nguyen
2014-10-26 16:21             ` Barry Schwartz
