From: Daniel Hartwig <mandyke@gmail.com>
To: guile-user@gnu.org
Subject: Re: I'm looking for a method of converting a string's character encoding
Date: Wed, 2 May 2012 11:57:32 +0800
Message-ID: <CAN3veRcbNfEJG6ADn7hxFFkJXBG-xMy3mBMoN_bajRtfPzXckg@mail.gmail.com>
In-Reply-To: <CAK93xhpSbFQM-FuPAhQVKJgtZHZKO7b2yO--ciNZPenQA_3xkg@mail.gmail.com>

On 28 April 2012 05:13, Sunjoong Lee <sunjoong@gmail.com> wrote:
>
> Background;
> #:decode-body? keyword of http-get seems not to work properly; I should
> set #:decode-body? to false value and decode the contents body string
> manually. If a web page's charset be utf-8, there be no problem. If not, a
> problem occurs. decode-response-body of (web client) call decode-string with
> web page's charset. But real charset of bytevector is iso-8859-1, not web
> page's charset. If so, you should not let http-get use decode-response-body.

Hello

It seems you later made some headway on this, but just a note to clarify:

Bytevectors are raw data; they do not have an encoding.  Web ports are
set to ISO-8859-1 because it is an 8-bit encoding in which every byte
maps to exactly one character, so the port can be read as raw data.
The output of http-get with '#:decode-body? #f' *should* be a
bytevector of exactly the bytes sent by the server.
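
To illustrate why ISO-8859-1 works as a "raw" encoding, here is a
minimal sketch.  It assumes a Guile that ships the (ice-9 iconv)
module, which may be newer than the release discussed in this thread;
the byte values are arbitrary and chosen only for illustration.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 iconv))

;; Arbitrary bytes; 128, 233 and 255 are not valid UTF-8 on their own.
(define raw (u8-list->bytevector '(65 128 233 255)))

;; Decoding as ISO-8859-1 maps each byte to the code point of the same value.
(define as-latin1 (bytevector->string raw "ISO-8859-1"))

;; One character per byte, and re-encoding gives back the original bytes.
(display (= (string-length as-latin1) (bytevector-length raw)))
(newline)
(display (equal? raw (string->bytevector as-latin1 "ISO-8859-1")))
(newline)
```

Both checks should print #t: nothing is added or dropped when bytes
pass through an ISO-8859-1 string and back.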

This is mentioned in the docstring of read-request:

 > (use-modules (web request))
 > ,d read-request
 Read an HTTP request from @var{port}, optionally attaching the given
 metadata, @var{meta}.

 As a side effect, sets the encoding on @var{port} to
 ISO-8859-1 (latin-1), so that reading one character reads one byte.  See
 the discussion of character sets in "HTTP Requests" in the manual, for
 more information.
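
The "one character reads one byte" behaviour can be tried without a
network connection.  The following is only a sketch: it uses an
in-memory bytevector port as a stand-in for a web port, and the two
bytes are an assumed example (the UTF-8 encoding of U+00E9).

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Stand-in for a web port: two bytes held in memory.
(define port (open-bytevector-input-port (u8-list->bytevector '(195 169))))

;; The same side effect the docstring describes for read-request.
(set-port-encoding! port "ISO-8859-1")

;; Each read-char now consumes exactly one byte.
(define first-char (read-char port))
(define second-char (read-char port))
(define at-end (read-char port))

(display (list (char->integer first-char) (char->integer second-char)))
(newline)
(display (eof-object? at-end))
(newline)
```

With the port set to UTF-8 instead, the same two bytes would be read
as a single character.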

Can you provide us with a couple of sites where http-get or
decode-string does not work properly?  Or was something else at play
here?  This would help us investigate the issue.  (I am too lazy
today to find some myself, and I think you must know of a few :-)

>
> After getting response-body with bytevector form, you should decode it with
> "iso-8859-1" like decode-string's manner. Then you'll get web page's
> contents body string; it's charset is what you see in response header.
>

Note that ISO-8859-1 covers only the first 256 Unicode code points, so
decoding the bytevector as ISO-8859-1 will not give you the intended
text unless the page really is in that encoding.
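
A small sketch of the difference, again assuming (ice-9 iconv) is
available.  The body here is a made-up one-character example, not
output from any real site.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 iconv))

;; The bytes a server would send for U+00E9 in a UTF-8 page: 195 169.
(define body (string->utf8 (string (integer->char 233))))

;; Decoded with the charset the page actually uses: one character.
(define right (utf8->string body))

;; Decoded as ISO-8859-1: each byte becomes its own character.
(define wrong (bytevector->string body "ISO-8859-1"))

(display (string-length right))
(newline)
(display (map char->integer (string->list wrong)))
(newline)

;; The bytes themselves survive, so re-encoding as ISO-8859-1 and then
;; decoding as UTF-8 recovers the intended text.
(define recovered (utf8->string (string->bytevector wrong "ISO-8859-1")))
(display (string=? recovered right))
(newline)
```

The ISO-8859-1 string is only a carrier for the bytes; it becomes
readable text once it is decoded with the page's real charset.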

Regards


