From: Daniel Hartwig
Newsgroups: gmane.lisp.guile.user
Subject: Re: I'm looking for a method of converting a string's character encoding
Date: Wed, 2 May 2012 11:57:32 +0800
To: guile-user@gnu.org

On 28 April 2012 05:13, Sunjoong Lee wrote:
>
> Background:
> The #:decode-body? keyword of http-get seems not to work properly; I have
> to set #:decode-body? to a false value and decode the body string
> manually. If a web page's charset is utf-8, there is no problem. If not, a
> problem occurs. decode-response-body in (web client) calls decode-string
> with the web page's charset, but the real charset of the bytevector is
> iso-8859-1, not the web page's charset. If so, you should not let
> http-get use decode-response-body.

Hello

It seems you later made some headway on this, but just a note to
clarify:

Bytevectors are raw data; they do not have an encoding. Web ports are
set to ISO-8859-1 because it is an 8-bit encoding in which every byte
maps to exactly one character, so it can be read as raw data.
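To illustrate what "read as raw data" means, here is a small Guile sketch. It assumes Guile 2.0 or later with the (ice-9 iconv) module, and the sample bytes are made up. Bytes read through a port set to ISO-8859-1 come back one character per byte, each character's code point equals the byte value, and encoding those characters back as ISO-8859-1 returns exactly the original bytevector.

```scheme
;; Sketch: ISO-8859-1 maps every byte 0-255 to exactly one character,
;; so a port in that encoding can be used to read raw bytes.
(use-modules (rnrs bytevectors)   ; u8-list->bytevector, utf8->string
             (rnrs io ports)      ; open-bytevector-input-port
             (ice-9 iconv))       ; string->bytevector

;; Hypothetical sample data: the UTF-8 encoding of the two characters
;; U+00E9 (e with acute accent) and U+20AC (euro sign).
(define raw (u8-list->bytevector '(#xC3 #xA9 #xE2 #x82 #xAC)))

;; Read every character from PORT into a list.
(define (read-all-chars port)
  (let loop ((acc '()))
    (let ((c (read-char port)))
      (if (eof-object? c)
          (reverse acc)
          (loop (cons c acc))))))

;; Read the bytes through a port whose encoding is ISO-8859-1, which is
;; what read-request does to its port.
(define chars
  (let ((port (open-bytevector-input-port raw)))
    (set-port-encoding! port "ISO-8859-1")
    (read-all-chars port)))

;; One character per byte; each code point equals the byte value.
(define codes (map char->integer chars))

;; Encoding those characters as ISO-8859-1 gives back the exact bytes.
(define round-trip (string->bytevector (list->string chars) "ISO-8859-1"))

;; Only decoding with the charset the data was actually written in
;; (UTF-8 here) yields the two intended characters.
(define text (utf8->string raw))
```

The five "characters" in chars are not meaningful text; they are just the bytes in disguise, which is why the real charset still has to be applied afterwards.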
The output of http-get with '#:decode-body? #f' *should* be a bytevector
of exactly the bytes sent by the server. This is mentioned in the
docstring of read-request:

> (use-modules (web request))
> ,d read-request
Read an HTTP request from @var{port}, optionally attaching the given
metadata, @var{meta}.

As a side effect, sets the encoding on @var{port} to ISO-8859-1
(latin-1), so that reading one character reads one byte. See the
discussion of character sets in "HTTP Requests" in the manual, for
more information.

Could you point us to a couple of sites where http-get or decode-string
does not work properly? Or was something else at play here? That would
help us investigate the issue. (I am too lazy to look for some today; I
think you must know of a few :-)

>
> After getting the response body in bytevector form, you should decode it
> with "iso-8859-1" in the manner of decode-string. Then you'll get the web
> page's body string; its charset is what you see in the response header.
>

Note that ISO-8859-1 covers only the first 256 code points of Unicode,
so decoding the bytevector as ISO-8859-1 will mangle much of the data
whenever the page was sent in another charset, such as UTF-8.

Regards
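The points above can be checked without a network connection. The sketch below is only an illustration under stated assumptions, not necessarily how (web client) handles things internally. It feeds a canned HTTP response (hypothetical data) to read-response from (web response), reads the body as a bytevector with read-response-body, and decodes it with the charset from the Content-Type header. It also shows what happens when the same bytes are decoded as ISO-8859-1, and that ISO-8859-1 cannot encode a character such as the euro sign.

```scheme
;; Sketch: read a canned HTTP response, keep the body as raw bytes, and
;; decode it with the charset named in the Content-Type header.
(use-modules (web response)     ; read-response, read-response-body
             (rnrs bytevectors) ; string->utf8, bytevector-length, bytevector=?
             (rnrs io ports)    ; open-bytevector-input-port
             (ice-9 iconv))     ; bytevector->string, string->bytevector

;; Hypothetical page body: "caf" plus U+00E9, sent as UTF-8 (5 bytes).
(define body-text (string-append "caf" (string (integer->char #xE9))))
(define body-bytes (string->utf8 body-text))

;; Hypothetical raw response as it might arrive from a server.
(define raw-response
  (string->utf8
   (string-append
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Length: " (number->string (bytevector-length body-bytes)) "\r\n"
    "\r\n"
    body-text)))

;; Like read-request, read-response sets the port encoding to ISO-8859-1;
;; read-response-body then returns the body as an untouched bytevector.
(define response (read-response (open-bytevector-input-port raw-response)))
(define body (read-response-body response))

;; The Content-Type value is parsed as (text/plain (charset . "utf-8")).
(define charset (assq-ref (cdr (response-content-type response)) 'charset))

;; Decoding with the declared charset recovers the original text.
(define decoded (bytevector->string body charset))

;; Decoding the same bytes as ISO-8859-1 yields one character per byte,
;; so the two-byte UTF-8 sequence for U+00E9 becomes two characters.
(define latin1-view (bytevector->string body "ISO-8859-1"))

;; ISO-8859-1 has no code for most of Unicode: encoding the euro sign
;; with the 'error conversion strategy raises an exception.
(define euro-encodable?
  (catch #t
    (lambda ()
      (string->bytevector (string (integer->char #x20AC)) "ISO-8859-1" 'error)
      #t)
    (lambda args #f)))
```

With a real server, the response and the bytevector would instead come from (http-get uri #:decode-body? #f), which returns both as two values; the decoding step stays the same.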