unofficial mirror of guile-user@gnu.org 
From: Mark H Weaver <mhw@netris.org>
To: Konrad Makowski <poczta@konradmakowski.pl>
Cc: Guile User <guile-user@gnu.org>
Subject: Re: iconv or something like that
Date: Thu, 23 Oct 2014 14:00:31 -0400
Message-ID: <87oat27jk0.fsf@yeeloong.lan>
In-Reply-To: <5448E72A.6030307@konradmakowski.pl> (Konrad Makowski's message of "Thu, 23 Oct 2014 13:31:54 +0200")

Konrad Makowski <poczta@konradmakowski.pl> writes:
> Is there any solution to convert charset from one encoding to another?

Yes, but character encodings are only relevant when converting between a
sequence of _bytes_ (a bytevector) and a sequence of _characters_ [*]
(a string).  These conversions happen implicitly while performing I/O,
converting Scheme strings to/from C, etc.

[*] More precisely, Scheme strings are sequences of Unicode code points.

It doesn't make sense to talk about the encoding of a Scheme string, or
to convert a Scheme string from one encoding to another, because they
are not byte sequences.
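To make the bytes/characters distinction concrete, here is a small sketch
in Python rather than Guile.  Python's `bytes` and `str` play the roles of
Scheme bytevectors and strings.  The Polish sample word is just an
illustrative assumption.  An encoding only comes into play when crossing
between the two:

```python
# Python analogue: `bytes` ~ Scheme bytevector, `str` ~ Scheme string.
text = "Zażółć"  # a string: a sequence of Unicode code points

# Encoding turns characters into bytes; the bytes depend on the encoding chosen.
latin2_bytes = text.encode("iso-8859-2")  # one byte per character here
utf8_bytes = text.encode("utf-8")         # each non-ASCII letter takes two bytes

print(len(text), len(latin2_bytes), len(utf8_bytes))  # 6 6 10

# Decoding with the matching encoding yields the very same string either way,
# so "the encoding" is a property of the byte sequences, not of `text`.
assert latin2_bytes.decode("iso-8859-2") == text
assert utf8_bytes.decode("utf-8") == text
```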

It sounds like you already have a Scheme string that was incorrectly
decoded from bytes, and are asking how to fix it up.  Unfortunately,
this won't work: many valid ISO-8859-2 byte sequences are not valid
UTF-8, so decoding them as UTF-8 leads to decoding errors.
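A quick illustration of why the after-the-fact fix-up fails, again using
Python only as a stand-in.  The byte values are the ISO-8859-2 codes for
the Polish letters in the sample word.  Strict UTF-8 decoding rejects
them.  A lenient error handler "succeeds" only by substituting U+FFFD,
from which the original letters cannot be recovered:

```python
raw = "Zażółć".encode("iso-8859-2")  # bytes as they would sit in an ISO-8859-2 database

# Strict UTF-8 decoding rejects them: 0xBF (the ISO-8859-2 code for "ż")
# is a UTF-8 continuation byte and cannot start a character.
strict_failed = False
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    strict_failed = True
    print("decoding error at byte", err.start)  # decoding error at byte 2

# Lenient decoding replaces the bad bytes with U+FFFD REPLACEMENT CHARACTER;
# the original letters are gone, so no later string conversion can restore them.
mangled = raw.decode("utf-8", errors="replace")
print(repr(mangled))

# The only correct fix is to decode the original bytes with the right encoding.
fixed = raw.decode("iso-8859-2")
print(fixed)  # Zażółć
```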

> I have database in iso-8859-2 but my script runs in utf-8. I use dbi module.

Having looked at the guile-dbi source code, I see that it always uses
the current locale encoding when talking to databases.  Specifically, it
always uses 'scm_from_locale_string' and 'scm_to_locale_string'.  For
your purposes, you'd like it to use 'scm_from_stringn' and
'scm_to_stringn' instead, with "ISO-8859-2" as the 'encoding' argument.

My knowledge of modern databases is limited, so I'm not sure how this
problem is normally dealt with.  It seems to me that, ideally, strings
in databases should be sequences of Unicode code points, rather than
sequences of bytes.  If that were the case, then this problem wouldn't
arise.

It would be good if someone with more knowledge of databases would chime
in here.

In the meantime, I can see a few possible solutions/workarounds:

* Enhance guile-dbi to add an 'encoding' field to its database
  handles, add a new API procedure to set it, and use it in all the
  appropriate places.  This only makes sense if database strings are
  conceptually byte sequences; otherwise, it should probably be fixed in
  some other way.

* Hack your local copy of guile-dbi to use 'scm_from_stringn' and
  'scm_to_stringn' with a hard-coded "ISO-8859-2" in the appropriate
  places.

* Use 'setlocale' to set an ISO-8859-2 locale temporarily while
  performing database queries.
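The first option above amounts to storing an encoding with each database
handle and using it at every bytes/string boundary instead of the locale's.
Here is a rough Python sketch of that design.  `DbHandle`, `set_encoding`,
`to_string` and `to_bytes` are hypothetical names, not guile-dbi API.  In
guile-dbi's C code, the corresponding calls would be 'scm_from_stringn' and
'scm_to_stringn' with the stored encoding:

```python
import locale


class DbHandle:
    """Hypothetical database handle that remembers its database's encoding."""

    def __init__(self, encoding=None):
        # Default to the current locale's encoding, mirroring guile-dbi's
        # current use of scm_from_locale_string / scm_to_locale_string.
        self.encoding = encoding or locale.getpreferredencoding(False)

    def set_encoding(self, encoding):
        # The proposed new API procedure: override the per-handle encoding.
        self.encoding = encoding

    def to_string(self, raw: bytes) -> str:
        # Bytes returned by the database become characters here.
        return raw.decode(self.encoding)

    def to_bytes(self, text: str) -> bytes:
        # Characters sent to the database (queries, parameters) become bytes here.
        return text.encode(self.encoding)


db = DbHandle()
db.set_encoding("iso-8859-2")
row = db.to_bytes("Zażółć")  # what the ISO-8859-2 database would store
print(db.to_string(row))     # Zażółć
```

With the encoding kept on the handle, nothing depends on the process
locale, which is also what makes this preferable to the 'setlocale'
workaround.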

Which database are you using?

     Mark
