unofficial mirror of guile-user@gnu.org 
From: Daniel Krueger <keenbug@googlemail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: guile-user@gnu.org, ttn@gnuvola.org, sunjoong@gmail.com
Subject: Re: I'm looking for a method of converting a string's character encoding
Date: Mon, 30 Apr 2012 12:18:59 +0200
Message-ID: <CAAh5vOOcGxcZt1kWu4d0O+Onu331zoKtTVr9ZwhUBxbfgyPYaw@mail.gmail.com>
In-Reply-To: <834ns37f0b.fsf@gnu.org>

On Sat, Apr 28, 2012 at 10:55 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> One notable example is when the original encoding was determined
> incorrectly, and the application wants to "re-decode" the string, when
> its external origin is no longer available.

Okay, but then I would suggest one of two things. If you know you're
probably not getting the right encoding but can determine it later,
store the input only as a bytevector and decode it correctly once you
know the encoding. Or, if you already have the string, you could
encode it back to a bytevector with the wrongly guessed encoding
(which should emit the original input, I think) and then re-decode it
with the right encoding. Wouldn't that be the same solution as adding
a primitive which does the same thing, only at a lower level?
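
A minimal sketch of that round trip, assuming a Guile version that
ships the (ice-9 iconv) module with its `string->bytevector' and
`bytevector->string' procedures. The helper name `re-decode' and the
sample bytes are made up for illustration. The round trip only gives
back the original bytes if the wrong decoding lost nothing; that holds
for ISO-8859-1, which maps every byte to a character.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; Bytes as they arrived from outside: U+00E9 encoded in UTF-8.
(define original-bytes (u8-list->bytevector '(#xC3 #xA9)))

;; First attempt: decoded with the wrong guess (ISO-8859-1),
;; which yields the two characters U+00C3 U+00A9.
(define misdecoded (bytevector->string original-bytes "ISO-8859-1"))

;; Hypothetical helper: encode back with the wrong guess to recover
;; the original bytes, then decode them with the right encoding.
(define (re-decode str wrong-encoding right-encoding)
  (bytevector->string (string->bytevector str wrong-encoding)
                      right-encoding))

;; The one-character string U+00E9.
(define repaired (re-decode misdecoded "ISO-8859-1" "UTF-8"))
```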

> Another example is an
> application that wants to convert an encoded string into base-64 (or
> similar) form -- you'll need to encode the string internally first.

I don't have enough experience here, but wouldn't you again just
transform the string into a bytevector and work with that from there?

> IOW, Guile needs a way to represent a string encoded in something
> other than UTF-8, and convert between UTF-8 and other encodings.

I think strings should be encoding `independent', so you don't have to
care about encodings unless you need to. If you are working with a
specific encoding, you are working on a representation of the `text'
as a sequence of characters encoded as numbers, so you use a
bytevector.
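
To illustrate the distinction, here is a small sketch using only
(rnrs bytevectors): the string has one length in characters, while
each encoded representation has its own length in bytes. The sample
character is an arbitrary choice.

```scheme
(use-modules (rnrs bytevectors))

;; A one-character string: U+20AC (euro sign).
(define text (string (integer->char #x20AC)))

;; Length of the string itself, counted in characters.
(define char-count (string-length text))

;; Lengths of two encoded representations, counted in bytes.
(define utf8-size  (bytevector-length (string->utf8 text)))
(define utf16-size (bytevector-length (string->utf16 text)))
```

Here `char-count' is 1, while `utf8-size' is 3 and `utf16-size' is 2.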

The only thing I'm not sure about is whether Guile supports encoding a
string (into a bytevector) in some format other than UTF-8. If no such
procedures exist, I would suggest adding a string-to-bytevector
procedure which takes an encoder, plus the encoders themselves (or
just procedures which convert the string directly into a bytevector in
a specific encoding).
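
As a sketch of what such procedures could look like: Guile releases
that include the (ice-9 iconv) module provide `string->bytevector',
which takes the encoding name as an argument. The names `make-encoder'
and `string->latin1' below are hypothetical and only show how
per-encoding procedures might be built on top of it.

```scheme
(use-modules (ice-9 iconv)
             (rnrs bytevectors))

;; The one-character string U+00E9.
(define text (string (integer->char #xE9)))

;; General form: the encoding is passed by name.
(define latin1-bytes
  (bytevector->u8-list (string->bytevector text "ISO-8859-1")))
(define utf8-bytes
  (bytevector->u8-list (string->bytevector text "UTF-8")))

;; Hypothetical helper: build a procedure for one fixed encoding.
(define (make-encoder encoding)
  (lambda (str) (string->bytevector str encoding)))

(define string->latin1 (make-encoder "ISO-8859-1"))
```

With these definitions `latin1-bytes' is (233) and `utf8-bytes' is
(195 169).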

WDYT?


