* I'm looking for a method of converting a string's character encoding
From: Sunjoong Lee @ 2012-04-27 21:13 UTC
To: guile-user

Hello,

I'm looking for a method of converting a string's character encoding from a certain codeset to UTF-8. I know Guile strings use UTF-8, and (read (open-bytevector-input-port (string->utf8 "hello"))) returns "hello". But what if the string "hello" is not encoded in UTF-8 and you want a UTF-8 converted string? What I want is something like iconv.

Background: the #:decode-body? keyword of http-get seems not to work properly; I have to set #:decode-body? to a false value and decode the body string manually. If a web page's charset is UTF-8, there is no problem. If not, a problem occurs. decode-response-body in (web client) calls decode-string with the web page's charset, but the real charset of the bytevector is ISO-8859-1, not the web page's charset. If so, you should not let http-get use decode-response-body. After getting the response body as a bytevector, you should decode it with "iso-8859-1", the way decode-string does. Then you get the web page's body string; its charset is the one you see in the response header.

Now I need to convert this body string to UTF-8, but I don't know how. I think it would be done with port I/O.

Thanks.
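Because ISO-8859-1 maps each byte value 0 to 255 to the character with the same code point, a body that was decoded that way can be turned back into its original bytes and then decoded once more with the charset from the response header. A minimal sketch of that step, assuming Guile 2.0 or later; `latin-1-string->bytevector` and `redecode-latin-1` are hypothetical helper names, not part of Guile:

```scheme
(use-modules (rnrs bytevectors)   ; u8-list->bytevector
             (rnrs io ports)      ; open-bytevector-input-port
             (ice-9 rdelim))      ; read-delimited

;; Hypothetical helper: undo an ISO-8859-1 decoding.  Each character of
;; STR stands for exactly one byte with the same numeric value, so STR
;; must not contain characters above 255.
(define (latin-1-string->bytevector str)
  (u8-list->bytevector (map char->integer (string->list str))))

;; Hypothetical helper: decode the bytes behind STR as CHARSET instead.
(define (redecode-latin-1 str charset)
  (let ((port (open-bytevector-input-port (latin-1-string->bytevector str))))
    (set-port-encoding! port charset)          ; how bytes become characters
    (let ((result (read-delimited "" port)))   ; no delimiters: read to EOF
      (close-port port)
      (if (eof-object? result) "" result))))
```

For a Korean page, something like `(redecode-latin-1 body "EUC-KR")` would then yield ordinary Guile characters (assuming the system's iconv knows that charset), which can be written to any port whose encoding is UTF-8.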
* Re: I'm looking for a method of converting a string's character encoding
From: Sunjoong Lee @ 2012-04-28 1:40 UTC
To: guile-user

Are file ports and string ports very different? I can convert strings in a file, but I don't want to use a file.

My terminal charset is UTF-8. Suppose there is a text file "a.txt" encoded in "XXX"; it can be converted like this:

  (use-modules (ice-9 rdelim))
  (set-port-encoding! (current-output-port) "utf-8")
  (define port (open-input-file "a.txt"))
  (set-port-encoding! port "XXX")
  (display (read-delimited "" port))
  (close-port port)

I tried the same approach with string ports but failed. In the real case there is an "XXX"-encoded string; here I cannot prepare one, so I read it from a file.

  (use-modules (ice-9 rdelim))
  (define port (open-input-file "a.txt"))
  (set-port-encoding! port "XXX")
  (let ((port1 (open-input-string
                (let ((str (read-delimited "" port)))
                  (close-input-port port)
                  str)))
        (port2 (open-output-string)))
    (set-port-encoding! port1 "XXX")
    (set-port-encoding! port2 "utf-8")
    (display (read-delimited "" port1) port2)
    (close-input-port port1)
    (display (get-output-string port2))
    (close-output-port port2))

2012/4/28 Sunjoong Lee <sunjoong@gmail.com>
>
> Now, I need to convert this contents body string to utf-8 but I don't know
> how. I think it would be with port i/o.
>
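In the second attempt, `str` is already a sequence of decoded characters: the `set-port-encoding!` call on the file port did the conversion, and a Guile string does not carry an encoding of its own. Setting "XXX" again on `port1` most likely asks Guile to decode the string port's contents a second time, which garbles them. A sketch of the alternative, decoding once on the way in and encoding once on the way out, with the UTF-8 bytes collected in a bytevector rather than a file; `transcode-port->bytevector` is a hypothetical name:

```scheme
(use-modules (rnrs io ports)   ; open-bytevector-output-port
             (ice-9 rdelim))   ; read-delimited

;; Hypothetical helper: read all text from IN (decoded according to IN's
;; port encoding) and return it encoded as TO-ENCODING in a bytevector.
(define (transcode-port->bytevector in to-encoding)
  (call-with-values open-bytevector-output-port
    (lambda (out get-bytes)
      (set-port-encoding! out to-encoding)   ; how characters become bytes
      (let ((text (read-delimited "" in)))   ; whole input as one string
        (unless (eof-object? text)
          (display text out)))
      (force-output out)                     ; flush buffered output
      (get-bytes))))
```

With the file from the message, calling `(transcode-port->bytevector port "UTF-8")` after `(set-port-encoding! port "XXX")` would return the UTF-8 bytes. If the goal is only to display the text, no conversion step is needed once the current output port's encoding is UTF-8.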
* Re: I'm looking for a method of converting a string's character encoding
From: Sunjoong Lee @ 2012-04-28 16:38 UTC
To: guile-user

http-get is innocent, but I still need an encoding converter.

After adding the line

  (set-port-encoding! (current-output-port) "utf-8")

at the top of my program, the body string of the web page was displayed correctly. With with-fluids and %default-port-encoding, I can use html->sxml. But the contents of the output SXML are in the original web page's codeset. For example, when you want to compare strings, you must use the web page's codeset. If you want to compare strings from two web pages, a codeset conversion method may be needed.

2012/4/28 Sunjoong Lee <sunjoong@gmail.com>
>
> Background;
> #:decode-body? keyword of http-get seems not to work properly; I should
> set #:decode-body? to false value and decode the contents body string
> manually. If a web page's charset be utf-8, there be no problem. If not, a
> problem occurs. decode-response-body of (web client) call decode-string
> with web page's charset. But real charset of bytevector is iso-8859-1,
> not web page's charset. If so, you should not let http-get
> use decode-response-body.
>
* Re: I'm looking for a method of converting a string's character encoding
From: Thien-Thi Nguyen @ 2012-04-28 17:33 UTC
To: Sunjoong Lee; +Cc: guile-user

() Sunjoong Lee <sunjoong@gmail.com>
() Sun, 29 Apr 2012 01:38:28 +0900

   http-get is innocent but I need encoding converter yet.

A good exercise (one that would flush out bugs and raise confidence in the infrastructure) would be to implement an iconv-workalike program in Scheme. Maybe one already exists?
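One way to sketch this exercise uses only the public API: read the file as raw bytes, decode them with the source encoding, and encode the result with the target encoding. This sketch relies on the `(ice-9 iconv)` module (`bytevector->string`, `string->bytevector`) found in later Guile releases; it may not be available in the Guile version discussed in this thread. `iconv-file` is a hypothetical name, and unlike iconv(1) it reads the whole file into memory:

```scheme
(use-modules (ice-9 iconv)      ; bytevector->string, string->bytevector
             (rnrs io ports))   ; get-bytevector-all, put-bytevector

;; Hypothetical iconv(1)-like procedure: convert FROM-FILE, encoded as
;; FROM-ENC, into TO-FILE encoded as TO-ENC.
(define (iconv-file from-enc to-enc from-file to-file)
  (let* ((in (open-file from-file "rb"))   ; binary mode: raw bytes only
         (bytes (get-bytevector-all in)))
    (close-port in)
    (let ((text (if (eof-object? bytes)    ; empty input file
                    ""
                    (bytevector->string bytes from-enc)))
          (out (open-file to-file "wb")))
      (put-bytevector out (string->bytevector text to-enc))
      (close-port out))))
```

Checking correctness is then as easy as Thien-Thi suggests: compare the output file with what iconv(1) produces for the same input.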
* Re: I'm looking for a method of converting a string's character encoding
From: Daniel Krueger @ 2012-04-28 18:29 UTC
To: Thien-Thi Nguyen; +Cc: guile-user, Sunjoong Lee

Hi,

I think there shouldn't be any transcoding of Guile's strings, as strings are an internal representation of characters, no matter how they are encoded. So the only time encoding matters is when a string passes its `internal boundaries', I mean if you write the string to a port, read it from a port, or pass it as a string to a foreign library. For ports all transcoding is available, and as said, the real representation of Guile strings internally is UTF-8, which can't be changed. The only additional thing I forgot about is bytevectors, if you convert a string to an explicit representation, but afaik there you can also give the encoding to use.

Am I wrong?

- Daniel

On Sat, Apr 28, 2012 at 7:33 PM, Thien-Thi Nguyen <ttn@gnuvola.org> wrote:
> () Sunjoong Lee <sunjoong@gmail.com>
> () Sun, 29 Apr 2012 01:38:28 +0900
>
>    http-get is innocent but I need encoding converter yet.
>
> It sounds like a good exercise (that would flush out bugs and
> raise confidence in the infrastructure) would be to implement
> an iconv-workalike program in Scheme.  Maybe one already exists?
>
* Re: I'm looking for a method of converting a string's character encoding
From: Thien-Thi Nguyen @ 2012-04-28 19:54 UTC
To: Daniel Krueger; +Cc: guile-user, Sunjoong Lee

() Daniel Krueger <keenbug@googlemail.com>
() Sat, 28 Apr 2012 20:29:22 +0200

   i think there shouldn't be any transcoding of guile's strings, as
   strings are internal representation of characters, no matter how they
   are encoded.  So the only time when encoding matters is when it passes
   it's `internal boundarys', i mean if you write the string to a port or
   read from a port or pass it as a string to a foreign library.

Indeed, iconv(1) converts external representations (files). How it does that internally is an implementation detail. That's the main reason why i suggested it as a model for exercising Guile's internals -- it's very easy to check correctness.

   For the ports all transcoding is available, and as said, the real
   representation of guile strings internally is as utf8, which can't be
   changed.

IIUC, the internal representation of strings is not UTF-8 (at least, not all the time), but anyway, that doesn't matter at all. The proposed task is to use procedures and features provided by Guile (i.e., its public API) to mimic iconv.

   The only additional thing i forgot about are bytevectors, if you
   convert a string to an explicit representation, but afaik there you
   also can give the encoding to use.  Am I wrong?

I don't know.
* Re: I'm looking for a method of converting a string's character encoding
From: Eli Zaretskii @ 2012-04-28 20:55 UTC
To: Daniel Krueger; +Cc: guile-user, ttn, sunjoong

> Date: Sat, 28 Apr 2012 20:29:22 +0200
> From: Daniel Krueger <keenbug@googlemail.com>
> Cc: guile-user@gnu.org, Sunjoong Lee <sunjoong@gmail.com>
>
> i think there shouldn't be any transcoding of guile's strings, as
> strings are internal representation of characters, no matter how they
> are encoded. So the only time when encoding matters is when it passes
> it's `internal boundarys', i mean if you write the string to a port or
> read from a port or pass it as a string to a foreign library. For the
> ports all transcoding is available, and as said, the real
> representation of guile strings internally is as utf8, which can't be
> changed. The only additional thing i forgot about are bytevectors, if
> you convert a string to an explicit representation, but afaik there
> you also can give the encoding to use.
>
> Am I wrong?

You are mostly right, but only "mostly". Experience teaches that sometimes you need to change encoding even inside "the boundaries". One notable example is when the original encoding was determined incorrectly, and the application wants to "re-decode" the string, when its external origin is no longer available. Another example is an application that wants to convert an encoded string into base-64 (or similar) form -- you'll need to encode the string internally first.

These kinds of rare, but still important, use cases are the reason why Emacs Lisp has primitives to do encoding and decoding of in-memory strings; as much as Emacs maintainers want to get rid of the related need to support "unibyte strings", they are not going to go away any time soon.

IOW, Guile needs a way to represent a string encoded in something other than UTF-8, and convert between UTF-8 and other encodings.
* Re: I'm looking for a method of converting a string's character encoding
From: Sunjoong Lee @ 2012-04-28 22:42 UTC
To: Eli Zaretskii; +Cc: guile-user, Daniel Krueger, ttn

Thanks Thien-Thi, Daniel and Eli.

Eli pointed out a good example; I'll give another one. In countries whose characters are encoded as multibyte sequences, like China, Japan and Korea (the CJK countries), converting between codesets is a common issue. In Korea, one web page may be written in the EUC-KR codeset and another in UTF-8. In Japan, Shift-JIS, EUC-JP, ISO-2022-JP and UTF-8. In China, GBK, GB18030, Big5, Big5-HKSCS and UTF-8. That is, Koreans use 2 different codesets on the net, Japanese 4, and Chinese 5. Comparing a Chinese web page with a Korean one in the same program seems unlikely, but suppose you want to write a program that monitors web pages: a codeset converter would be needed.

Only in the CJK countries? Greeks use 3 codesets, Vietnamese 2, Arabs 3, and so on. It looks like Russians use many codesets, like the Chinese.
* Re: I'm looking for a method of converting a string's character encoding
From: Sunjoong Lee @ 2012-04-29 0:25 UTC
To: Daniel Krueger; +Cc: guile-user

Supporting only UTF-8 still seems strange to me, but now I understand why Daniel said so. After appending these two lines, most of my problems with http-get were solved:

  (set-port-encoding! (current-output-port) "UTF-8")
  (fluid-set! %default-port-encoding "UTF-8")

This is like magic! I think it would be good to add this information to the Guile manual.

My first problem was that the body of the web page was not displayed correctly. The second was that html->sxml from guile-lib did not work. After reading htmlprag.scm, I realized that html->sxml calls htmlprag-internal:parse-html, and htmlprag-internal:parse-html uses string ports. I remembered this sentence: "When string ports are created, they do not inherit a character encoding from the current locale." Most people would not know how a utility like html->sxml is implemented, or that they need to use fluid-set!.
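`fluid-set!` changes the default for every port created afterwards, anywhere in the program. If the UTF-8 default is only needed while a library such as html->sxml builds its string ports, `with-fluids` (which Sunjoong mentions in an earlier message) limits the setting to the dynamic extent of one call. A minimal sketch; `call-with-default-port-encoding` is a hypothetical helper name:

```scheme
;; Hypothetical helper: call THUNK with ENCODING as the default encoding
;; for ports created during the call.  Unlike fluid-set!, with-fluids
;; restores the previous default when THUNK returns or exits non-locally.
(define (call-with-default-port-encoding encoding thunk)
  (with-fluids ((%default-port-encoding encoding))
    (thunk)))
```

Something like `(call-with-default-port-encoding "UTF-8" (lambda () (html->sxml body)))` would then leave the global default untouched. Whether string ports consult `%default-port-encoding` at all may vary between Guile versions, so that is worth checking on the version in use.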
* Re: I'm looking for a method of converting a string's character encoding
From: Daniel Krueger @ 2012-04-30 10:18 UTC
To: Eli Zaretskii; +Cc: guile-user, ttn, sunjoong

On Sat, Apr 28, 2012 at 10:55 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> One notable example is when the original encoding was determined
> incorrectly, and the application wants to "re-decode" the string, when
> its external origin is no longer available.

Okay, but then I would suggest one of two things. If you know you're probably not getting the right encoding but can determine it later, store the input only as a bytevector and decode it correctly later. Or, if you already have the string, encode it back to a bytevector with the wrongly guessed encoding (which should reproduce the original input, I think) and then re-decode it with the right encoding. Wouldn't that be the same solution as adding a primitive which does the same thing at a lower level?

> Another example is an
> application that wants to convert an encoded string into base-64 (or
> similar) form -- you'll need to encode the string internally first.

Here I don't have enough experience, but wouldn't you then again just transform the string into a bytevector and keep working with that?

> IOW, Guile needs a way to represent a string encoded in something
> other than UTF-8, and convert between UTF-8 and other encodings.

I think strings should be encoding `independent', so you don't have to think about encodings if you don't need to. If you're working with a specific encoding, you're working on a representation of the `text' as a sequence of characters encoded as numbers, so you use a bytevector. The only thing I'm not sure about is whether Guile supports encoding a string (into a bytevector) in some format other than UTF-8. If no such procedures exist, I would suggest adding a string-to-bytevector converter which takes an encoder, plus the encoders themselves (or just procedures which convert a string directly into a bytevector in a specific encoding).

WDYT?
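Daniel's second suggestion, encoding back with the wrongly guessed encoding and decoding again, fits in a few lines. It only recovers the original input when the wrong decoding lost nothing: single-byte encodings such as ISO-8859-1 qualify, while a decoder that substituted or rejected invalid sequences would not. The sketch assumes the `(ice-9 iconv)` module of later Guile releases; `redecode` is a hypothetical name:

```scheme
(use-modules (ice-9 iconv))   ; string->bytevector, bytevector->string

;; Hypothetical helper: STR was produced by decoding some bytes with
;; WRONG-ENCODING.  Encode it back into those bytes, then decode them
;; again with RIGHT-ENCODING.
(define (redecode str wrong-encoding right-encoding)
  (bytevector->string (string->bytevector str wrong-encoding)
                      right-encoding))
```

For example, `(redecode "Ã©" "ISO-8859-1" "UTF-8")` returns "é": the two characters stand for the bytes C3 A9, which is the UTF-8 encoding of é.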
* Re: I'm looking for a method of converting a string's character encoding
From: Eli Zaretskii @ 2012-04-30 12:21 UTC
To: Daniel Krueger; +Cc: guile-user, ttn, sunjoong

> Date: Mon, 30 Apr 2012 12:18:59 +0200
> From: Daniel Krueger <keenbug@googlemail.com>
> Cc: ttn@gnuvola.org, guile-user@gnu.org, sunjoong@gmail.com
>
> I think strings should be encoding `independent', so you don't have to
> mind that if you don't need to, and if you're working with a special
> encoding you're working on a representation of the `text' as a number
> of characters encoded in some numbers, so you use a bytevector.

That would do, I think.

> The only thing I'm not sure about is whether guile supports encoding a
> string (into a bytevector) in some other format than UTF-8, so if
> there don't exist other procedures I would suggest adding a string to
> bytevector decoder which takes an encoder and the encoders (or just
> procedures which convert the string directly into a bytevector in a
> specific encoding).
>
> WDYT?

Sounds like a plan to me ;-)
* Re: I'm looking for a method of converting a string's character encoding
From: Ludovic Courtès @ 2012-05-03 22:34 UTC
To: guile-user

Hi,

Daniel Krueger <keenbug@googlemail.com> skribis:

> The only thing I'm not sure about is whether guile supports encoding a
> string (into a bytevector) in some other format than UTF-8

It does, by virtue of mixed binary/textual ports (see my previous message.)

Thanks,
Ludo’.
* Re: I'm looking for a method of converting a string's character encoding
From: Daniel Hartwig @ 2012-05-02 3:57 UTC
To: guile-user

On 28 April 2012 05:13, Sunjoong Lee <sunjoong@gmail.com> wrote:
>
> Background;
> #:decode-body? keyword of http-get seems not to work properly; I should
> set #:decode-body? to false value and decode the contents body string
> manually. If a web page's charset be utf-8, there be no problem. If not, a
> problem occurs. decode-response-body of (web client) call decode-string with
> web page's charset. But real charset of bytevector is iso-8859-1, not web
> page's charset. If so, you should not let http-get use decode-response-body.

Hello

It seems you later made some headway on this, but just a note to clarify:

Bytevectors are raw data; they do not have an encoding. Web ports are set to ISO-8859-1 because it is an 8-bit encoding that can be read as raw data. The output of http-get with '#:decode-body? #f' *should* be a bytevector of exactly the bytes sent by the server.

This is mentioned in the comments for read-request:

  > (use-modules (web request))
  > ,d read-request
  Read an HTTP request from @var{port}, optionally attaching the given
  metadata, @var{meta}.

  As a side effect, sets the encoding on @var{port} to ISO-8859-1
  (latin-1), so that reading one character reads one byte.  See the
  discussion of character sets in "HTTP Requests" in the manual, for
  more information.

Can you provide us with a couple of sites where http-get or decode-string does not work properly? Or was something else at play here? This would help to investigate what the issue is. (I am too lazy today to find some; I think you must know of a few :-)

>
> After getting response-body with bytevector form, you should decode it with
> "iso-8859-1" like decode-string's manner. Then you'll get web page's
> contents body string; it's charset is what you see in response header.
>

Note that ISO-8859-1 does not cover much of Unicode, so decoding the bytevector as that will lose much data.

Regards
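Following Daniel's point, with #:decode-body? #f the body is just the server's bytes, so a single decoding step with the charset from the Content-Type header should be enough. A minimal sketch, assuming the header has already been parsed into the shape Guile's web modules use, a media type followed by an alist of parameters such as `(text/html (charset . "EUC-KR"))`; the helper names and the ISO-8859-1 fallback are choices made for this sketch:

```scheme
(use-modules (rnrs io ports)   ; open-bytevector-input-port
             (ice-9 rdelim))   ; read-delimited

;; Hypothetical helper: pick the charset parameter out of a parsed
;; Content-Type value such as (text/html (charset . "EUC-KR")).
(define (content-type-charset content-type)
  (or (and (pair? content-type)
           (assq-ref (cdr content-type) 'charset))
      "ISO-8859-1"))                   ; fallback chosen for this sketch

;; Hypothetical helper: decode a raw response body (a bytevector) once,
;; using the charset named in CONTENT-TYPE.
(define (decode-body-bytes bv content-type)
  (let ((in (open-bytevector-input-port bv)))
    (set-port-encoding! in (content-type-charset content-type))
    (let ((text (read-delimited "" in)))
      (close-port in)
      (if (eof-object? text) "" text))))
```

With http-get this would be used roughly as `(decode-body-bytes body (response-content-type response))`, where `response-content-type` comes from (web response). Pages in different charsets then all end up as ordinary Guile strings that can be compared with `string=?`.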
* Re: I'm looking for a method of converting a string's character encoding
From: Sunjoong Lee @ 2012-05-03 5:14 UTC
To: Daniel Hartwig; +Cc: guile-user

Hi, Daniel;

2012/4/28 Daniel Hartwig <mandyke@gmail.com>
>
> Can you provide us with a couple of sites where http-get or
> decode-string does not work properly? Or was something else at play
> here? This would help to investigate what the issue is. (I am lazy
> today to find some, I think you must know of a few :-)
>

To cut a long story short, http-get is innocent; I had misunderstood it, sorry. I summarized some issues with http-get in the post http://lists.gnu.org/archive/html/guile-user/2012-05/msg00005.html. With "Example 1 - working", my problem was mostly resolved. "Example 3 - not working" is an unsolved issue, but it is not an encoding problem; I think it may be a design issue of declare-uri-header! or string->uri.

Thank you for your comments.
* Re: I'm looking for a method of converting a string's character encoding
From: Ludovic Courtès @ 2012-05-03 22:31 UTC
To: guile-user

Hi,

Sunjoong Lee <sunjoong@gmail.com> skribis:

> I'm looking for a method of converting a string's character encoding from a
> certain codeset to utf-8. I know the string of Guile uses utf-8 and (read
> (open-bytevector-input-port (string->utf8 "hello"))) returns "hello" . But
> what if the string "hello" be encoded not utf-8 and you want to get utf-8
> converted string? What I want is like iconv.

Ports in Guile are both binary and textual. This allows for things like:

  scheme@(guile-user)> (use-modules (rnrs io ports))
  scheme@(guile-user)> (define (string->enc s e)
                         (let ((p (with-fluids ((%default-port-encoding e))
                                    (open-input-string s))))
                           (get-bytevector-all p)))
  scheme@(guile-user)> (string->enc "hello" "UTF-16BE")
  $1 = #vu8(0 104 0 101 0 108 0 108 0 111)
  scheme@(guile-user)> (string->enc "hello" "ISO-8859-3")
  $2 = #vu8(104 101 108 108 111)
  scheme@(guile-user)> (use-modules (rnrs bytevectors))
  scheme@(guile-user)> (utf16->string $1)
  $3 = "hello"

You may also want to look at ‘string->pointer’ in (system foreign).

Does it answer your question?

Thanks,
Ludo’.
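Ludovic's `string->enc` gets its encoding from `open-input-string` honoring `%default-port-encoding`, a detail of string ports that may not hold in later Guile releases. A variant that names the encoding directly on a bytevector output port, sketched under that assumption and keeping his procedure name for easy comparison:

```scheme
(use-modules (rnrs io ports))   ; open-bytevector-output-port

;; Encode string S into a bytevector using encoding E.  The port is both
;; binary and textual: characters are written, encoded bytes come out.
(define (string->enc s e)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytes)
      (set-port-encoding! port e)   ; chosen explicitly, not via a fluid
      (display s port)
      (force-output port)           ; flush buffered output
      (get-bytes))))
```

`(string->enc "hello" "UTF-16BE")` should give the same `#vu8(0 104 0 101 0 108 0 108 0 111)` as in the transcript above, and `utf16->string` turns it back into "hello".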