* string-ports issue on Windows
@ 2019-04-16  4:13 Christopher Lam
  2019-04-16 14:34 ` Eli Zaretskii
  2019-04-16 17:28 ` Mark H Weaver
  0 siblings, 2 replies; 14+ messages in thread
From: Christopher Lam @ 2019-04-16  4:13 UTC
To: guile-user

Dear All

I'm struggling with string-ports on Windows.

Last para of
https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
"With string ports, the port-encoding is treated differently than other
types of ports. When string ports are created, they do not inherit a
character encoding from the current locale. They are given a default locale
that allows them to handle all valid string characters."

This causes a string-sanitize function to not run correctly in Windows.
(locale-encoding) says "CP1252" no matter what LANG or setlocale I try.

The use case is to sanitize string for html, but on Windows it munges
extended-unicode.

So I've had to resort to this uglier code:
https://pastebin.com/raw/ys4QrhMh which does work, but raises the question
of why we have to do it in the first place. It means *any* string-ports must
be avoided to avoid cross-platform string issues, right? It's a shame
because (format) is quite useful.

Chris
* Re: string-ports issue on Windows
From: Eli Zaretskii @ 2019-04-16 14:34 UTC
To: Christopher Lam; +Cc: guile-user

> From: Christopher Lam <christopher.lck@gmail.com>
> Date: Tue, 16 Apr 2019 04:13:14 +0000
>
> I'm struggling with string-ports on Windows.

Which version of Guile are you using, and where/how did you obtain the
Windows binary?

> Last para of
> https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
> "With string ports, the port-encoding is treated differently than other
> types of ports. When string ports are created, they do not inherit a
> character encoding from the current locale. They are given a default locale
> that allows them to handle all valid string characters."
>
> This causes a string-sanitize function to not run correctly in Windows.
> (locale-encoding) says "CP1252" no matter what LANG or setlocale I try.
>
> The use case is to sanitize string for html, but on Windows it munges
> extended-unicode.

Can you show the code that doesn't work for you as you expected?  At
least the example in "String Ports" seems to work for me as I'd
expect, with Guile 2.0.11.
* Re: string-ports issue on Windows
From: Mark H Weaver @ 2019-04-16 17:15 UTC
To: Eli Zaretskii; +Cc: guile-user

Hi,

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Christopher Lam <christopher.lck@gmail.com>
>> Date: Tue, 16 Apr 2019 04:13:14 +0000
>>
>> I'm struggling with string-ports on Windows.
>
> Which version of Guile are you using, and where/how did you obtain the
> Windows binary?
>
>> Last para of
>> https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
>> "With string ports, the port-encoding is treated differently than other
>> types of ports. When string ports are created, they do not inherit a
>> character encoding from the current locale. They are given a default locale
>> that allows them to handle all valid string characters."
>>
>> This causes a string-sanitize function to not run correctly in Windows.
>> (locale-encoding) says "CP1252" no matter what LANG or setlocale I try.
>>
>> The use case is to sanitize string for html, but on Windows it munges
>> extended-unicode.
>
> Can you show the code that doesn't work for you as you expected?  At
> least the example in "String Ports" seems to work for me as I'd
> expect, with Guile 2.0.11.

The change in string ports that Christopher is referring to was first
introduced in Guile 2.2.0.

      Mark
* Re: string-ports issue on Windows
From: Mark H Weaver @ 2019-04-16 17:28 UTC
To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> I'm struggling with string-ports on Windows.
>
> Last para of
> https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
> "With string ports, the port-encoding is treated differently than other
> types of ports. When string ports are created, they do not inherit a
> character encoding from the current locale. They are given a default locale
> that allows them to handle all valid string characters."
>
> This causes a string-sanitize function to not run correctly in Windows.
> (locale-encoding) says "CP1252" no matter what LANG or setlocale I try.
>
> The use case is to sanitize string for html, but on Windows it munges
> extended-unicode.

Can you explain more fully what the problem is?  I know a fair amount
about Unicode, but my knowledge of Windows is extremely weak.

What exactly is "extended-unicode" in this context?  References welcome.

     Thanks,
       Mark
* Re: string-ports issue on Windows
From: Christopher Lam @ 2019-04-16 23:26 UTC
To: Mark H Weaver; +Cc: guile-user

Thank you Mark

The problem is rather obscure and may have been fixed in 2.2.

I've taken the reins of handling the Guile code in GnuCash. For reasons I
can't fathom, the Windows build includes Guile 2.0.14 rather than Guile 2.2.

I've checked NEWS and there was a change in SRFI-6 string-ports to make
them Unicode-capable in 2.0.6.

Bear in mind that the majority of string code in GnuCash handles Unicode
just fine. However, there are some currencies, e.g. TRY
https://en.wikipedia.org/wiki/Turkish_lira, that need extended Unicode and
are misprinted as ? in the reports.

I've delved down and figure there are only 2 offending functions:
(format #f "~a bla" str) and (with-output-to-string), as described above.

After much experimentation I can fix this by changing (format) to
(string-append), and changing (with-output-to-string) to (open-string-port)
and importing srfi-6 as described in the original post; these changes fix
the TRY symbol display.

Hence my suspicion that string-ports on Windows are munging Unicode.

To try to elucidate this I've also tried removing (setlocale LC_ALL "") and
dumping (locale-encoding), which is "CP1252". There are also other places
where UTF-8 is being interpreted as CP1252, but these are outside the scope
of this post.

So, I'm rather late to this game (I started diving into Scheme 18 months
ago) and have probably missed many controversial changes in the past years,
but the issue above seems weird to me: why is the Windows port munging
Unicode? :)
* Re: string-ports issue on Windows
From: Mark H Weaver @ 2019-04-17 19:30 UTC
To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> Bear in mind that the majority of string code in GnuCash handles Unicode
> just fine. However, there are some currencies, e.g. TRY
> https://en.wikipedia.org/wiki/Turkish_lira, that need extended Unicode and
> are misprinted as ? in the reports.

I looked into Turkish lira, which according to the web page above was
recently (2012) assigned the Unicode code point U+20BA, included in
Unicode 6.2.  Is that the same code point that's getting munged on your
end?

I tried passing this character through Guile 2.2's string ports and also
through 'format' on my system, and encountered no difficulties:

--8<---------------cut here---------------start------------->8---
mhw@jojen ~/guile-stable-2.2$ guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (define lira #\x20BA)
scheme@(guile-user)> lira
$1 = #\₺
scheme@(guile-user)> (string lira)
$2 = "₺"
scheme@(guile-user)> (call-with-output-string (lambda (port) (display lira port)))
$3 = "₺"
scheme@(guile-user)> (call-with-output-string (lambda (port) (display (string lira) port)))
$4 = "₺"
scheme@(guile-user)> (call-with-input-string (string lira) read-char)
$5 = #\₺
scheme@(guile-user)> (call-with-input-string (string lira) read)
$6 = ₺
scheme@(guile-user)> ,use (ice-9 rdelim)
scheme@(guile-user)> (call-with-input-string (string lira) read-line)
$7 = "₺"
scheme@(guile-user)> (number->string (char->integer (call-with-input-string (string lira) read-char)) 16)
$8 = "20ba"
scheme@(guile-user)> (format #f "~a bla" lira)
$9 = "₺ bla"
scheme@(guile-user)> (format #f "~a bla" (string lira))
$10 = "₺ bla"
scheme@(guile-user)> ,use (ice-9 format)
scheme@(guile-user)> (format #f "~a bla" lira)
$11 = "₺ bla"
scheme@(guile-user)> (format #f "~a bla" (string lira))
$12 = "₺ bla"
scheme@(guile-user)> (simple-format #f "~a bla" lira)
$13 = "₺ bla"
scheme@(guile-user)> (simple-format #f "~a bla" (string lira))
$14 = "₺ bla"
scheme@(guile-user)>
--8<---------------cut here---------------end--------------->8---

I'm not sure if you'll see it correctly in your mail client, but in the
above transcript, I see unmunged Turkish liras in every response from
Guile except $8.

If you type the same commands above into a fresh Guile session on your
system, does it behave correctly, or are the results munged?

If you're able to produce a small self-contained example that
demonstrates the problem, e.g. something along the lines of the
transcript above but which misbehaves on your system, that would be very
helpful.

Also, can you tell me precisely which version of Guile is misbehaving,
and which versions of libunistring and/or iconv Guile is linked with?

     Thanks,
       Mark
* Re: string-ports issue on Windows
From: Christopher Lam @ 2019-04-18 16:22 UTC
To: Mark H Weaver; +Cc: guile-user

Hi Mark

Thank you so much for looking into this.

I'm reviewing the GnuCash for Windows package (v3.5 released April 2019)
which contains the following libraries:
- guile 2.0.14
- libunistring 0.9.7.0
- libiconv 1.15.0.0

I've managed to run the included guile.exe by copying the libraries to
%temp%\lib and adding that to GUILE_LOAD_PATH.

Here's the equivalent transcript, run from the Windows 10 Command Prompt:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define lira #\x20BA)
scheme@(guile-user)> lira
$3 = #\20272
scheme@(guile-user)> (string lira)
$4 = "\u20ba"
scheme@(guile-user)> (define lira #\x20BA)
scheme@(guile-user)> lira
$5 = #\20272
scheme@(guile-user)> (string lira)
$6 = "\u20ba"
scheme@(guile-user)> (call-with-output-string (lambda (port) (display lira port)))
$7 = "?"
scheme@(guile-user)> (call-with-output-string (lambda (port) (display (string lira) port)))
$8 = "?"
scheme@(guile-user)> (locale-encoding)
$9 = "CP1252"
...
scheme@(guile-user)> (setlocale LC_ALL "")
$15 = "English_Australia.1252"
scheme@(guile-user)> (locale-encoding)
$16 = "CP1252"
scheme@(guile-user)> (setlocale LC_ALL "C")
$17 = "C"
scheme@(guile-user)> (locale-encoding)
$18 = "CP1252"
--8<---------------cut here---------------end--------------->8---
* Re: string-ports issue on Windows
From: Eli Zaretskii @ 2019-04-18 18:51 UTC
To: Christopher Lam; +Cc: guile-user

> From: Christopher Lam <christopher.lck@gmail.com>
> Date: Thu, 18 Apr 2019 16:22:24 +0000
> Cc: guile-user <guile-user@gnu.org>
>
> scheme@(guile-user)> (setlocale LC_ALL "")
> $15 = "English_Australia.1252"
> scheme@(guile-user)> (locale-encoding)
> $16 = "CP1252"
> scheme@(guile-user)> (setlocale LC_ALL "C")
> $17 = "C"
> scheme@(guile-user)> (locale-encoding)
> $18 = "CP1252"

The above is expected, if codepage 1252 is your system codepage.
Here's what I get, FWIW:

  scheme@(guile-user)> (use-modules (ice-9 i18n))
  scheme@(guile-user)> (locale-encoding)
  $1 = "CP1252"
  scheme@(guile-user)> (setlocale LC_ALL "French_France.1250")
  $2 = "French_France.1250"
  scheme@(guile-user)> (locale-encoding)
  $3 = "CP1250"
  scheme@(guile-user)> (setlocale LC_ALL "")
  $4 = "English_United States.1252"
  scheme@(guile-user)> (locale-encoding)
  $5 = "CP1252"
  scheme@(guile-user)> (setlocale LC_ALL "C")
  $6 = "C"
  scheme@(guile-user)> (locale-encoding)
  $7 = "CP1255"

My system locale is 1255, as you might have guessed.

HTH
* Re: string-ports issue on Windows
From: Mark H Weaver @ 2019-04-18 19:29 UTC
To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> Hi Mark
> Thank you so much for looking into this.
> I'm reviewing the GnuCash for Windows package (v3.5 released April 2019)
> which contains the following libraries:
> - guile 2.0.14

Ah, for some reason I thought you were using Guile 2.2.  That explains
the problem.

In Guile 2.0, string ports internally used the locale encoding by
default, which meant that any characters not supported by the locale
encoding would be munged.

Guile 2.2 changed the behavior of string ports to always use UTF-8
internally, which ensures that all valid Guile strings can pass through
unmunged.

So, this problem would almost certainly be fixed by updating to
Guile 2.2.

      Mark
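The difference between the two internal encodings can be sketched without string ports at all. This is only an illustration, assuming the (ice-9 iconv) module is available and that the local iconv knows the name "CP1252"; the variable names are made up for the example. Encoding U+20BA to CP1252 has to lose the character, while UTF-8 round-trips it.

```scheme
(use-modules (ice-9 iconv)          ; string->bytevector, bytevector->string
             (rnrs bytevectors))    ; bytevector->u8-list

(define lira (string #\x20BA))      ; TURKISH LIRA SIGN as a one-character string

;; CP1252 has no slot for U+20BA, so with the 'substitute strategy the
;; encoder writes a replacement byte instead of the character.
(define as-cp1252 (string->bytevector lira "CP1252" 'substitute))

;; UTF-8 can represent every valid Guile character, so nothing is lost.
(define as-utf8 (string->bytevector lira "UTF-8"))
(define round-trip (bytevector->string as-utf8 "UTF-8"))

(display (bytevector->u8-list as-cp1252)) (newline)
(display (bytevector->u8-list as-utf8)) (newline)   ; prints (226 130 186)
(display (string=? lira round-trip)) (newline)      ; prints #t
```

The first line printed is the lossy result, comparable to the "?" in the Windows transcript earlier in the thread.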
* Re: string-ports issue on Windows
From: Mark H Weaver @ 2019-04-18 21:18 UTC
To: Christopher Lam; +Cc: guile-user

Hi again,

Earlier, I wrote:

> Christopher Lam <christopher.lck@gmail.com> writes:
>
>> Hi Mark
>> Thank you so much for looking into this.
>> I'm reviewing the GnuCash for Windows package (v3.5 released April 2019)
>> which contains the following libraries:
>> - guile 2.0.14
>
> Ah, for some reason I thought you were using Guile 2.2.  That explains
> the problem.
>
> In Guile 2.0, string ports internally used the locale encoding by
> default, which meant that any characters not supported by the locale
> encoding would be munged.
>
> Guile 2.2 changed the behavior of string ports to always use UTF-8
> internally, which ensures that all valid Guile strings can pass through
> unmunged.
>
> So, this problem would almost certainly be fixed by updating to
> Guile 2.2.

It's probably a good idea to update to Guile 2.2 anyway, but I'd like to
also offer the following workaround, which monkey patches the string
port procedures in Guile 2.0 to behave more like Guile 2.2.

Note that it only patches the Scheme APIs for string ports, and not the
underlying C functions.  It might be that some code, possibly within
Guile itself, creates a string port using the C functions, and such
string ports may still munge characters.

Anyway, if you want to try it, arrange for GnuCash to evaluate the code
below, after initializing Guile.

      Mark


(when (string=? (effective-version) "2.0")
  ;; When using Guile 2.0.x, use monkey patching to change the
  ;; behavior of string ports to use UTF-8 as the internal encoding.
  ;; Note that this is the default behavior in Guile 2.2 or later.
  (let* ((mod                     (resolve-module '(guile)))
         (orig-open-input-string  (module-ref mod 'open-input-string))
         (orig-open-output-string (module-ref mod 'open-output-string))
         (orig-object->string     (module-ref mod 'object->string))
         (orig-simple-format      (module-ref mod 'simple-format)))

    (define (open-input-string str)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (orig-open-input-string str)))

    (define (open-output-string)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (orig-open-output-string)))

    (define (object->string . args)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (apply orig-object->string args)))

    (define (simple-format . args)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (apply orig-simple-format args)))

    (define (call-with-input-string str proc)
      (proc (open-input-string str)))

    (define (call-with-output-string proc)
      (let ((port (open-output-string)))
        (proc port)
        (get-output-string port)))

    (module-set! mod 'open-input-string       open-input-string)
    (module-set! mod 'open-output-string      open-output-string)
    (module-set! mod 'object->string          object->string)
    (module-set! mod 'simple-format           simple-format)
    (module-set! mod 'call-with-input-string  call-with-input-string)
    (module-set! mod 'call-with-output-string call-with-output-string)

    (when (eqv? (module-ref mod 'format) orig-simple-format)
      (module-set! mod 'format simple-format))))
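Where patching the (guile) module globally is not wanted, the same fluid binding can be scoped to a single call. The following is a minimal sketch of that idea, not a replacement for the workaround above; `call-with-utf8-output-string` is a hypothetical helper name, not part of Guile or GnuCash, and the behavior on Guile 2.0 is assumed rather than tested.

```scheme
;; Hypothetical helper: collect output through a string port that is
;; created while %default-port-encoding is bound to "UTF-8".
(define (call-with-utf8-output-string proc)
  (with-fluids ((%default-port-encoding "UTF-8"))
    (let ((port (open-output-string)))
      (proc port)
      (get-output-string port))))

(define lira (string #\x20BA))      ; TURKISH LIRA SIGN

;; Build "<lira> bla" without going through `format'.
(define result
  (call-with-utf8-output-string
   (lambda (port)
     (display lira port)
     (display " bla" port))))

;; Prints #t on Guile 2.2 or later, where string ports already use UTF-8.
(display (string=? result (string-append lira " bla")))
(newline)
```

The trade-off is that only call sites using the helper are covered; code that calls `format` or `call-with-output-string` directly keeps the default behavior of the running Guile version.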
* Re: string-ports issue on Windows
From: Christopher Lam @ 2019-04-19 10:26 UTC
To: Mark H Weaver; +Cc: guile-user

Hi,

The patch *does* work and handles Unicode properly :)

There are unintended consequences however, whereby other (probably C-based)
string code on Windows is now reading the lira symbol into unexpected chars
(e.g. lira-symbol -> "â‚º" i.e. #xe2 #x201a #xba), but this is now outside
the scope of this post.

Thank you again!
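The three code points reported here are what one would get if the UTF-8 bytes of U+20BA were decoded as CP1252. A small sketch of that reading, assuming (ice-9 iconv) is available and that the local iconv accepts the name "CP1252"; it does not show where in the C code the misreading happens.

```scheme
(use-modules (ice-9 iconv))         ; string->bytevector, bytevector->string

(define lira (string #\x20BA))      ; TURKISH LIRA SIGN

;; The UTF-8 encoding of U+20BA is the three bytes E2 82 BA.
(define utf8-bytes (string->bytevector lira "UTF-8"))

;; Decoding those bytes with the wrong codec yields three characters.
(define misread (bytevector->string utf8-bytes "CP1252"))

;; Show each resulting code point in hexadecimal.
(display (map (lambda (c) (number->string (char->integer c) 16))
              (string->list misread)))
(newline)                           ; prints (e2 201a ba)
```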
* Re: string-ports issue on Windows
From: Christopher Lam @ 2019-05-14  4:42 UTC
To: Mark H Weaver; +Cc: guile-user

Hi Mark

Final update - first, we've reused your efficient substring-replace
function in
https://github.com/Gnucash/gnucash/commit/7d15e6e4e727c87fb4a501e924c4ae02276e508d
from a few years ago.

Second, the email thread
https://lists.gnu.org/archive/html/guile-devel/2014-03/msg00060.html
confirmed a lot of issues in guile-2.0 could be solved in Windows by
upgrading to guile-2.2. So, GnuCash has now upgraded to guile-2.2 on
Windows and the string-ports are now behaving.

Thank you (twice) :)
* Re: string-ports issue on Windows 2019-05-14 4:42 ` Christopher Lam @ 2019-05-26 10:52 ` Christopher Lam 2019-05-26 20:48 ` Mark H Weaver 0 siblings, 1 reply; 14+ messages in thread From: Christopher Lam @ 2019-05-26 10:52 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-user Addendum - wish to confirm if guile bug (guile-2.2 on Windows): - set locale to non-Anglo so that (setlocale LC_ALL) returns "French_France.1252" - call (strftime "%B" 4000000) - that's 4x10^6 -- this should return "février 1970" but the following error arises: Throw to key `decoding-error' with args `("scm_from_utf8_stringn" "input locale conversion error" 0 #vu8(102 233 118 114 105 101 114 32 49 57 55 48))'. Is this a bug? On Tue, 14 May 2019 at 12:42, Christopher Lam <christopher.lck@gmail.com> wrote: > Hi Mark > Final update - first, we've reused your efficient substring-replace > function in > https://github.com/Gnucash/gnucash/commit/7d15e6e4e727c87fb4a501e924c4ae02276e508d > from a few years ago. > Second, the email thread > https://lists.gnu.org/archive/html/guile-devel/2014-03/msg00060.html > confirmed a lot of issues in guile-2.0 could be solved in Windows by > upgrading to guile-2.2. So, GnuCash has now upgraded to guile-2.2 on > Windows and the string-ports are now behaving. > Thank you (twice) > :) > > On Fri, 19 Apr 2019 at 10:26, Christopher Lam <christopher.lck@gmail.com> > wrote: > >> Hi, >> The patch *does* work and handles unicode properly :) There are >> unintended consequences however, whereby other (probably C-based) >> string-code in Windows are now reading the lira-symbol into unexpected >> chars (eg lira-symbol -> "â‚°" i.e. #xe2 #x201a #xba) but this is now >> outside the scope of this post. >> Thank you again! >> >> On Thu, 18 Apr 2019 at 21:20, Mark H Weaver <mhw@netris.org> wrote: >> >>> Hi again, >>> >>> Earlier, I wrote: >>> >>> > Christopher Lam <christopher.lck@gmail.com> writes: >>> > >>> >> Hi Mark >>> >> Thank you so much for looking into this. 
>>> >> I'm reviewing the GnuCash for Windows package (v3.5 released April 2019)
>>> >> which contains the following libraries:
>>> >> - guile 2.0.14
>>> >
>>> > Ah, for some reason I thought you were using Guile 2.2.  That explains
>>> > the problem.
>>> >
>>> > In Guile 2.0, string ports internally used the locale encoding by
>>> > default, which meant that any characters not supported by the locale
>>> > encoding would be munged.
>>> >
>>> > Guile 2.2 changed the behavior of string ports to always use UTF-8
>>> > internally, which ensures that all valid Guile strings can pass through
>>> > unmunged.
>>> >
>>> > So, this problem would almost certainly be fixed by updating to
>>> > Guile 2.2.
>>>
>>> It's probably a good idea to update to Guile 2.2 anyway, but I'd like to
>>> also offer the following workaround, which monkey patches the string
>>> port procedures in Guile 2.0 to behave more like Guile 2.2.
>>>
>>> Note that it only patches the Scheme APIs for string ports, and not the
>>> underlying C functions.  It might be that some code, possibly within
>>> Guile itself, creates a string port using the C functions, and such
>>> string ports may still munge characters.
>>>
>>> Anyway, if you want to try it, arrange for GnuCash to evaluate the code
>>> below, after initializing Guile.
>>>
>>>       Mark
>>>
>>>
>>> (when (string=? (effective-version) "2.0")
>>>   ;; When using Guile 2.0.x, use monkey patching to change the
>>>   ;; behavior of string ports to use UTF-8 as the internal encoding.
>>>   ;; Note that this is the default behavior in Guile 2.2 or later.
>>>   (let* ((mod                     (resolve-module '(guile)))
>>>          (orig-open-input-string  (module-ref mod 'open-input-string))
>>>          (orig-open-output-string (module-ref mod 'open-output-string))
>>>          (orig-object->string     (module-ref mod 'object->string))
>>>          (orig-simple-format      (module-ref mod 'simple-format)))
>>>
>>>     (define (open-input-string str)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (orig-open-input-string str)))
>>>
>>>     (define (open-output-string)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (orig-open-output-string)))
>>>
>>>     (define (object->string . args)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (apply orig-object->string args)))
>>>
>>>     (define (simple-format . args)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (apply orig-simple-format args)))
>>>
>>>     (define (call-with-input-string str proc)
>>>       (proc (open-input-string str)))
>>>
>>>     (define (call-with-output-string proc)
>>>       (let ((port (open-output-string)))
>>>         (proc port)
>>>         (get-output-string port)))
>>>
>>>     (module-set! mod 'open-input-string open-input-string)
>>>     (module-set! mod 'open-output-string open-output-string)
>>>     (module-set! mod 'object->string object->string)
>>>     (module-set! mod 'simple-format simple-format)
>>>     (module-set! mod 'call-with-input-string call-with-input-string)
>>>     (module-set! mod 'call-with-output-string call-with-output-string)
>>>
>>>     (when (eqv? (module-ref mod 'format) orig-simple-format)
>>>       (module-set! mod 'format simple-format))))
>>>
>>
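[Editorial sketch, not part of the original messages.] The two effects described in the message above can be tried in isolation. The snippet below is a minimal illustration, assuming a Guile with the (ice-9 iconv) module and an iconv that recognizes the encoding name "CP1252"; the names `lira`, `round-trip` and `misread` are made up for this example. The first part creates a string port while %default-port-encoding is "UTF-8", as the patched open-output-string in the workaround does, and checks that a character with no CP1252 equivalent survives. The second part reproduces the kind of mojibake reported for the lira symbol by reading its UTF-8 bytes as CP1252.

```scheme
(use-modules (ice-9 iconv))   ; string->bytevector, bytevector->string

;; Turkish lira sign (U+20BA), built from its code point so that this
;; file's own encoding does not matter.  It has no CP1252 equivalent.
(define lira (string (integer->char #x20ba)))

;; Hypothetical helper with the same shape as the patched
;; open-output-string in the workaround: the string port is created
;; while %default-port-encoding is bound to "UTF-8".
(define (round-trip str)
  (with-fluids ((%default-port-encoding "UTF-8"))
    (let ((port (open-output-string)))
      (display str port)
      (get-output-string port))))

;; Expected to print #t when the string passes through unmunged.
(write (string=? lira (round-trip lira)))
(newline)

;; The UTF-8 bytes of the lira sign, misread as CP1252, turn into
;; three separate characters.
(define misread
  (bytevector->string (string->bytevector lira "UTF-8") "CP1252"))

;; Print the resulting code points in hex, for comparison with the
;; values quoted in the thread (#xe2 #x201a #xba).
(write (map (lambda (c) (number->string (char->integer c) 16))
            (string->list misread)))
(newline)
```

On Guile 2.2 or later the with-fluids wrapper should be redundant, since string ports there use UTF-8 internally according to the explanation quoted above; it is kept so the same snippet can be tried on 2.0.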
* Re: string-ports issue on Windows
  2019-05-26 10:52 ` Christopher Lam
@ 2019-05-26 20:48 ` Mark H Weaver
  0 siblings, 0 replies; 14+ messages in thread
From: Mark H Weaver @ 2019-05-26 20:48 UTC
  To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> Addendum - wish to confirm if guile bug (guile-2.2 on Windows):
> - set locale to non-Anglo so that (setlocale LC_ALL) returns
> "French_France.1252"
> - call (strftime "%B" 4000000) - that's 4x10^6 -- this should return
> "février 1970"
>
> but the following error arises:
> Throw to key `decoding-error' with args `("scm_from_utf8_stringn" "input
> locale conversion error" 0 #vu8(102 233 118 114 105 101 114 32 49 57 55
> 48))'.
>
> Is this a bug?

Yes, it is.  I've filed a bug report and provided more information here:

  https://bugs.gnu.org/35920

Thanks for letting us know about this.

    Regards,
      Mark
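[Editorial sketch, not part of the original messages.] The bytevector in the error message can be inspected directly, which makes the failure easier to see. The snippet below is only an illustration of the encoding mismatch, not of Guile's internal strftime code; it assumes the (ice-9 iconv) module and an iconv that recognizes "CP1252", and the name `month-bytes` is made up for this example. Read as CP1252 (the code page named in "French_France.1252"), byte 233 is "é" and the bytes spell the expected text. Read as UTF-8, byte 233 (#xE9) starts a multi-byte sequence but is followed by byte 118 ("v"), which is not a continuation byte, so decoding fails.

```scheme
(use-modules (ice-9 iconv))   ; bytevector->string

;; Bytes copied from the decoding-error message in the report above.
(define month-bytes
  #vu8(102 233 118 114 105 101 114 32 49 57 55 48))

;; Decoding as CP1252 should give the text the report expected.
(display (bytevector->string month-bytes "CP1252"))
(newline)

;; Decoding the same bytes as UTF-8 is expected to fail.  The handler
;; catches any key (#t) so the sketch does not depend on the exact
;; error key, and prints whichever key was thrown.
(catch #t
  (lambda ()
    (display (bytevector->string month-bytes "UTF-8"))
    (newline))
  (lambda (key . args)
    (display "could not decode as UTF-8: ")
    (write key)
    (newline)))
```

This is consistent with the function name in the error, scm_from_utf8_stringn: locale-encoded output appears to be handed to a UTF-8 decoder. The bug report linked above is the authoritative description.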
end of thread, other threads:[~2019-05-26 20:48 UTC]

Thread overview: 14+ messages
2019-04-16  4:13 string-ports issue on Windows Christopher Lam
2019-04-16 14:34 ` Eli Zaretskii
2019-04-16 17:15 ` Mark H Weaver
2019-04-16 17:28 ` Mark H Weaver
2019-04-16 23:26 ` Christopher Lam
2019-04-17 19:30 ` Mark H Weaver
2019-04-18 16:22 ` Christopher Lam
2019-04-18 18:51 ` Eli Zaretskii
2019-04-18 19:29 ` Mark H Weaver
2019-04-18 21:18 ` Mark H Weaver
2019-04-19 10:26 ` Christopher Lam
2019-05-14  4:42 ` Christopher Lam
2019-05-26 10:52 ` Christopher Lam
2019-05-26 20:48 ` Mark H Weaver