From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Sunjoong Lee Newsgroups: gmane.lisp.guile.user Subject: Re: I'm looking for a method of converting a string's character encoding Date: Sun, 29 Apr 2012 01:38:28 +0900 Message-ID: References: NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=f46d0438932bfb71db04bebfda1d X-Trace: dough.gmane.org 1335631142 5970 80.91.229.3 (28 Apr 2012 16:39:02 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Sat, 28 Apr 2012 16:39:02 +0000 (UTC) To: guile-user@gnu.org Original-X-From: guile-user-bounces+guile-user=m.gmane.org@gnu.org Sat Apr 28 18:39:01 2012 Return-path: Envelope-to: guile-user@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SOAfi-0004Ze-HH for guile-user@m.gmane.org; Sat, 28 Apr 2012 18:38:58 +0200 Original-Received: from localhost ([::1]:41876 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SOAfh-0005n6-UW for guile-user@m.gmane.org; Sat, 28 Apr 2012 12:38:57 -0400 Original-Received: from eggs.gnu.org ([208.118.235.92]:39837) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SOAfd-0005mz-Gs for guile-user@gnu.org; Sat, 28 Apr 2012 12:38:54 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SOAfb-0000Av-Mk for guile-user@gnu.org; Sat, 28 Apr 2012 12:38:53 -0400 Original-Received: from mail-we0-f169.google.com ([74.125.82.169]:49173) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SOAfb-0000Ai-Dt for guile-user@gnu.org; Sat, 28 Apr 2012 12:38:51 -0400 Original-Received: by werj55 with SMTP id j55so1376691wer.0 for ; Sat, 28 Apr 2012 09:38:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; 
bh=ur7+7kXPbNprak6Zzqs4uyzxy+2+2JqmbaC3XK8awCU=; b=y/znITyx86Iqo+JGbbAxIxXWICTy5ikIHj5dP9ig9+F1IKPH7n3+9tcJMb3HL/P5D7 QkkDp+v+HS10a9Cui+iwxwB4MbUX01LD6EsUEiAgtqdgesTZr+IY5obUBhHUPC+CBshQ FFAJRQE94K2zpwuF9eZT+CAFeu/zdbow/LJ1WlWSGLtr1e+1tTtt1WeLUupNERhvc8S+ B8wyjZI0a0ZxROv/7CATxe3ACRzmjhZMhTi+KTHAv3iahmGjEidGiGol9VZxiurT3KLw XLvFjfuRnln2ptrBYfzOGqcAheITf8PHyTX0oF7MiqctF8vpAdHcX9ucqFlXsS6j7/qI liEA== Original-Received: by 10.180.93.196 with SMTP id cw4mr15641984wib.19.1335631128785; Sat, 28 Apr 2012 09:38:48 -0700 (PDT) Original-Received: by 10.223.93.206 with HTTP; Sat, 28 Apr 2012 09:38:28 -0700 (PDT) In-Reply-To: X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 74.125.82.169 X-BeenThere: guile-user@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: General Guile related discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-user-bounces+guile-user=m.gmane.org@gnu.org Original-Sender: guile-user-bounces+guile-user=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.user:9415 Archived-At:

--f46d0438932bfb71db04bebfda1d
Content-Type: text/plain; charset=UTF-8

http-get is innocent, but I still need an encoding converter.

After adding the line (set-port-encoding! (current-output-port) "utf-8") at the start of my program, the body string of the web page is displayed correctly. With with-fluids and %default-port-encoding, I can use html->sxml. But the strings in the resulting SXML are still in the original web page's codeset. For example, when you want to compare strings, you must use the web page's codeset. If you want to compare strings from two web pages, a codeset conversion method may be needed.

2012/4/28 Sunjoong Lee

> Background:
> The #:decode-body? keyword of http-get does not seem to work properly; I
> have to set #:decode-body? to a false value and decode the body string
> manually. If a web page's charset is utf-8, there is no problem. If not, a
> problem occurs.
> decode-response-body in (web client) calls decode-string with the web
> page's charset. But the real charset of the bytevector is iso-8859-1,
> not the web page's charset. In that case, you should not let http-get
> use decode-response-body.
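
A minimal sketch of the kind of codeset conversion asked for above, for the case where a body was decoded with the wrong charset. It assumes a Guile release that provides the (ice-9 iconv) module, which may not be true of every 2.0.x installation. recode-string is a hypothetical helper name, not part of Guile, and the "ISO-8859-1" and "UTF-8" charset names are only example values.

```scheme
(use-modules (ice-9 iconv))   ; assumption: this module is available

;; Hypothetical helper, not part of Guile: turn the string back into the
;; bytes it was decoded from, then decode those bytes with the charset
;; the page really uses.
(define (recode-string str wrong-charset real-charset)
  (bytevector->string (string->bytevector str wrong-charset)
                      real-charset))

;; Example input: "caf" followed by U+00E9, built without relying on the
;; encoding of this source file.
(define original
  (string #\c #\a #\f (integer->char #xe9)))

;; Simulate a UTF-8 body that was wrongly decoded as ISO-8859-1.
(define garbled
  (bytevector->string (string->bytevector original "UTF-8")
                      "ISO-8859-1"))

;; Undo the wrong decode and apply the right one.
(define fixed
  (recode-string garbled "ISO-8859-1" "UTF-8"))

;; After recoding, the two strings can be compared directly.
(display (string=? fixed original))
(newline)
```

The round trip through ISO-8859-1 only works here because that charset maps every byte value to a character, so no bytes are lost by the wrong decode. For other wrongly applied charsets the original bytes may not be recoverable, and it would be safer to keep the body as a bytevector (#:decode-body? set to a false value) and decode it once with the right charset.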
--f46d0438932bfb71db04bebfda1d--