unofficial mirror of guile-user@gnu.org 
* string-ports issue on Windows
@ 2019-04-16  4:13 Christopher Lam
  2019-04-16 14:34 ` Eli Zaretskii
  2019-04-16 17:28 ` Mark H Weaver
  0 siblings, 2 replies; 14+ messages in thread
From: Christopher Lam @ 2019-04-16  4:13 UTC (permalink / raw)
  To: guile-user

Dear All

I'm struggling with string-ports on Windows.

Last para of
https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
"With string ports, the port-encoding is treated differently than other
types of ports. When string ports are created, they do not inherit a
character encoding from the current locale. They are given a default locale
that allows them to handle all valid string characters."

This causes a string-sanitize function to not run correctly in Windows.
(locale-encoding) says "CP1252" no matter what LANG or setlocale I try.

The use case is to sanitize string for html, but on Windows it munges
extended-unicode. So I've had to resort to this uglier code:
https://pastebin.com/raw/ys4QrhMh which does work, but raises the question
of why we have to do it in the first place.

It means *any* string port must be avoided to prevent cross-platform string
issues, right? It's a shame because (format) is quite useful.

Chris



* Re: string-ports issue on Windows
  2019-04-16  4:13 string-ports issue on Windows Christopher Lam
@ 2019-04-16 14:34 ` Eli Zaretskii
  2019-04-16 17:15   ` Mark H Weaver
  2019-04-16 17:28 ` Mark H Weaver
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2019-04-16 14:34 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-user

> From: Christopher Lam <christopher.lck@gmail.com>
> Date: Tue, 16 Apr 2019 04:13:14 +0000
> 
> I'm struggling with string-ports on Windows.

Which version of Guile are you using, and where/how did you obtain the
Windows binary?

> Last para of
> https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
> "With string ports, the port-encoding is treated differently than other
> types of ports. When string ports are created, they do not inherit a
> character encoding from the current locale. They are given a default locale
> that allows them to handle all valid string characters."
> 
> This causes a string-sanitize function to not run correctly in Windows.
> (locale-encoding) says "CP1252" no matter what LANG or setlocale I try.
> 
> The use case is to sanitize string for html, but on Windows it munges
> extended-unicode.

Can you show the code that doesn't work for you as you expected?  At
least the example in "String Ports" seems to work for me as I'd
expect, with Guile 2.0.11.




* Re: string-ports issue on Windows
  2019-04-16 14:34 ` Eli Zaretskii
@ 2019-04-16 17:15   ` Mark H Weaver
  0 siblings, 0 replies; 14+ messages in thread
From: Mark H Weaver @ 2019-04-16 17:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: guile-user

Hi,

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Christopher Lam <christopher.lck@gmail.com>
>> Date: Tue, 16 Apr 2019 04:13:14 +0000
>> 
>> I'm struggling with string-ports on Windows.
>
> Which version of Guile are you using, and where/how did you obtain the
> Windows binary?
>
>> Last para of
>> https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
>> "With string ports, the port-encoding is treated differently than other
>> types of ports. When string ports are created, they do not inherit a
>> character encoding from the current locale. They are given a default locale
>> that allows them to handle all valid string characters."
>> 
>> This causes a string-sanitize function to not run correctly in Windows.
>> (locale-encoding) says "CP1252" no matter what LANG or setlocale I try.
>> 
>> The use case is to sanitize string for html, but on Windows it munges
>> extended-unicode.
>
> Can you show the code that doesn't work for you as you expected?  At
> least the example in "String Ports" seems to work for me as I'd
> expect, with Guile 2.0.11.

The change in string ports that Christopher is referring to was first
introduced in Guile 2.2.0.

       Mark




* Re: string-ports issue on Windows
  2019-04-16  4:13 string-ports issue on Windows Christopher Lam
  2019-04-16 14:34 ` Eli Zaretskii
@ 2019-04-16 17:28 ` Mark H Weaver
  2019-04-16 23:26   ` Christopher Lam
  1 sibling, 1 reply; 14+ messages in thread
From: Mark H Weaver @ 2019-04-16 17:28 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> I'm struggling with string-ports on Windows.
>
> Last para of
> https://www.gnu.org/software/guile/manual/html_node/String-Ports.html
> "With string ports, the port-encoding is treated differently than other
> types of ports. When string ports are created, they do not inherit a
> character encoding from the current locale. They are given a default locale
> that allows them to handle all valid string characters."
>
> This causes a string-sanitize function to not run correctly in Windows.
> (locale-encoding) says "CP1252" no matter what LANG or setlocale I try.
>
> The use case is to sanitize string for html, but on Windows it munges
> extended-unicode.

Can you explain more fully what the problem is?  I know a fair amount
about Unicode, but my knowledge of Windows is extremely weak.

What exactly is "extended-unicode" in this context?  References welcome.

      Thanks,
        Mark




* Re: string-ports issue on Windows
  2019-04-16 17:28 ` Mark H Weaver
@ 2019-04-16 23:26   ` Christopher Lam
  2019-04-17 19:30     ` Mark H Weaver
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Lam @ 2019-04-16 23:26 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

Thank you Mark

The problem is rather obscure and may have been fixed in 2.2.

I've taken the reins of handling the Guile code in GnuCash. For various
reasons I can't fathom, the Windows build includes Guile 2.0.14 rather than
Guile 2.2. I've checked NEWS and there was a change in SRFI-6 string ports
to make them Unicode-capable in 2.0.6.

Bear in mind that the majority of string code in GnuCash handles Unicode
just fine. However, some currencies, e.g. TRY
(https://en.wikipedia.org/wiki/Turkish_lira), need extended Unicode and are
misprinted as ? in the reports.

I've delved down and figured out that there are only 2 offending functions:
(format #f "~a bla" str) and (with-output-to-string), as described above.
After much experimentation I can fix this by changing (format) to
(string-append), and changing (with-output-to-string) to (open-string-port)
and importing srfi-6 as described in the original post, and these fix the
TRY symbol display. Hence my suspicion that string ports on Windows are
munging Unicode. To try to elucidate this I've also tried removing
(setlocale LC_ALL "") and dumping (locale-encoding), which is "CP1252".
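
A minimal sketch of the substitution described above, using hypothetical
helper names that do not come from GnuCash. It assumes the difference
matters because (format #f ...) collects its output in a string port,
whereas (string-append) builds the result directly from its arguments.

```scheme
;; Hypothetical helper names, for illustration only.

(define (label-via-format str)
  ;; (format #f ...) writes into a string port and returns its contents,
  ;; so the result depends on that port's encoding.
  (format #f "~a bla" str))

(define (label-via-append str)
  ;; string-append never touches a port, so no encoding step is involved.
  (string-append str " bla"))

;; U+20BA written as a character escape, so the source file stays ASCII.
(define lira (string #\x20BA))

(write (string=? (label-via-format lira) (label-via-append lira)))
(newline)
```

Where string ports pass every character through, both helpers should return
the same string; they would differ only where the string port munges the
character.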

There are also other bits where UTF8 is being interpreted as CP1252 but
these are outside the scope of this post.

So, I'm rather late in this game (started diving into scheme 18 months ago)
and have probably missed many controversial changes in the past years, but
the issue above seems weird to me, why the Windows port is munging Unicode
:)


* Re: string-ports issue on Windows
  2019-04-16 23:26   ` Christopher Lam
@ 2019-04-17 19:30     ` Mark H Weaver
  2019-04-18 16:22       ` Christopher Lam
  0 siblings, 1 reply; 14+ messages in thread
From: Mark H Weaver @ 2019-04-17 19:30 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> Bear in mind that the majority of string code in GnuCash handles Unicode
> just fine. However, some currencies, e.g. TRY
> (https://en.wikipedia.org/wiki/Turkish_lira), need extended Unicode and are
> misprinted as ? in the reports.

I looked into the Turkish lira, which according to the web page above was
recently (2012) assigned the Unicode code point U+20BA, included in
Unicode 6.2.  Is that the same code point that's getting munged on your
end?

I tried passing this character through Guile 2.2's string ports and also
through 'format' on my system, and encountered no difficulties:

--8<---------------cut here---------------start------------->8---
mhw@jojen ~/guile-stable-2.2$ guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (define lira #\x20BA)
scheme@(guile-user)> lira
$1 = #\₺
scheme@(guile-user)> (string lira)
$2 = "₺"
scheme@(guile-user)> (call-with-output-string
                       (lambda (port) (display lira port)))
$3 = "₺"
scheme@(guile-user)> (call-with-output-string
                       (lambda (port) (display (string lira) port)))
$4 = "₺"
scheme@(guile-user)> (call-with-input-string (string lira) read-char)
$5 = #\₺
scheme@(guile-user)> (call-with-input-string (string lira) read)
$6 = ₺
scheme@(guile-user)> ,use (ice-9 rdelim)
scheme@(guile-user)> (call-with-input-string (string lira) read-line)
$7 = "₺"
scheme@(guile-user)> (number->string (char->integer (call-with-input-string (string lira) read-char)) 16)
$8 = "20ba"
scheme@(guile-user)> (format #f "~a bla" lira)
$9 = "₺ bla"
scheme@(guile-user)> (format #f "~a bla" (string lira))
$10 = "₺ bla"
scheme@(guile-user)> ,use (ice-9 format)
scheme@(guile-user)> (format #f "~a bla" lira)
$11 = "₺ bla"
scheme@(guile-user)> (format #f "~a bla" (string lira))
$12 = "₺ bla"
scheme@(guile-user)> (simple-format #f "~a bla" lira)
$13 = "₺ bla"
scheme@(guile-user)> (simple-format #f "~a bla" (string lira))
$14 = "₺ bla"
scheme@(guile-user)> 
--8<---------------cut here---------------end--------------->8---

I'm not sure if you'll see it correctly in your mail client, but in the
above transcript, I see unmunged Turkish liras in every response from
Guile except $8.

If you type the same commands above into a fresh Guile session on your
system, does it behave correctly, or are the results munged?

If you're able to produce a small self-contained example that
demonstrates the problem, e.g. something along the lines of the
transcript above but which misbehaves on your system, that would be very
helpful.
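
One possible shape for such a self-contained example, offered only as a
sketch: it round-trips U+20BA through a string port and through 'format',
then prints the resulting code points in hexadecimal so that the report is
pure ASCII and does not depend on the console or mail client. The helper
name 'code-points' is made up for this illustration.

```scheme
;; U+20BA written as a character escape, so the source file stays ASCII.
(define lira (string #\x20BA))

;; Hypothetical helper: list the code points of STR as hex strings.
(define (code-points str)
  (map (lambda (c) (number->string (char->integer c) 16))
       (string->list str)))

;; Round trip through an output string port.
(define via-port
  (call-with-output-string
    (lambda (port) (display lira port))))

;; Round trip through format with a #f destination.
(define via-format (format #f "~a" lira))

(for-each
 (lambda (label str)
   (display label)
   (display ": ")
   (write (code-points str))
   (newline))
 '("string port" "format")
 (list via-port via-format))
```

A result of "20ba" on both lines means the character survived. A result of
"3f" would be the code point of "?", i.e. the character was replaced on the
way through.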

Also, can you tell me precisely which version of Guile is misbehaving,
and what versions of libunistring and/or iconv Guile is linked with?

      Thanks,
        Mark




* Re: string-ports issue on Windows
  2019-04-17 19:30     ` Mark H Weaver
@ 2019-04-18 16:22       ` Christopher Lam
  2019-04-18 18:51         ` Eli Zaretskii
  2019-04-18 19:29         ` Mark H Weaver
  0 siblings, 2 replies; 14+ messages in thread
From: Christopher Lam @ 2019-04-18 16:22 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

Hi Mark
Thank you so much for looking into this.
I'm reviewing the GnuCash for Windows package (v3.5 released April 2019)
which contains the following libraries:
- guile 2.0.14
- libunistring 0.9.7.0
- libiconv 1.15.0.0
I've managed to run the included guile.exe by copying the libraries to
%temp%\lib and adding that to GUILE_LOAD_PATH.
Here's the equivalent transcript, run from Windows 10 Command Prompt:
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define lira #\x20BA)
scheme@(guile-user)> lira
$3 = #\20272
scheme@(guile-user)> (string lira)
$4 = "\u20ba"
scheme@(guile-user)> (define lira #\x20BA)
scheme@(guile-user)> lira
$5 = #\20272
scheme@(guile-user)> (string lira)
$6 = "\u20ba"
scheme@(guile-user)> (call-with-output-string
                       (lambda (port) (display lira port)))
$7 = "?"
scheme@(guile-user)> (call-with-output-string
                       (lambda (port) (display (string lira) port)))
$8 = "?"
scheme@(guile-user)> (locale-encoding)
$9 = "CP1252"
...
scheme@(guile-user)> (setlocale LC_ALL "")
$15 = "English_Australia.1252"
scheme@(guile-user)> (locale-encoding)
$16 = "CP1252"
scheme@(guile-user)> (setlocale LC_ALL "C")
$17 = "C"
scheme@(guile-user)> (locale-encoding)
$18 = "CP1252"
--8<---------------cut here---------------end--------------->8---



* Re: string-ports issue on Windows
  2019-04-18 16:22       ` Christopher Lam
@ 2019-04-18 18:51         ` Eli Zaretskii
  2019-04-18 19:29         ` Mark H Weaver
  1 sibling, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2019-04-18 18:51 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-user

> From: Christopher Lam <christopher.lck@gmail.com>
> Date: Thu, 18 Apr 2019 16:22:24 +0000
> Cc: guile-user <guile-user@gnu.org>
> 
> scheme@(guile-user)> (setlocale LC_ALL "")
> $15 = "English_Australia.1252"
> scheme@(guile-user)> (locale-encoding)
> $16 = "CP1252"
> scheme@(guile-user)> (setlocale LC_ALL "C")
> $17 = "C"
> scheme@(guile-user)> (locale-encoding)
> $18 = "CP1252"

The above is expected, if codepage 1252 is your system codepage.
Here's what I get, FWIW:

  scheme@(guile-user)> (use-modules (ice-9 i18n))
  scheme@(guile-user)> (locale-encoding)
  $1 = "CP1252"
  scheme@(guile-user)> (setlocale LC_ALL "French_France.1250")
  $2 = "French_France.1250"
  scheme@(guile-user)> (locale-encoding)
  $3 = "CP1250"
  scheme@(guile-user)> (setlocale LC_ALL "")
  $4 = "English_United States.1252"
  scheme@(guile-user)> (locale-encoding)
  $5 = "CP1252"
  scheme@(guile-user)> (setlocale LC_ALL "C")
  $6 = "C"
  scheme@(guile-user)> (locale-encoding)
  $7 = "CP1255"

My system locale is 1255, as you might have guessed.
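
The checks in the transcript above can be bundled into a small helper. This
is only a sketch: 'locale-report' is a hypothetical name, and it assumes
(ice-9 i18n) is loaded so that (locale-encoding) is available, as in the
transcript.

```scheme
(use-modules (ice-9 i18n))

;; Hypothetical helper: return the current locale name and the encoding
;; Guile reports, without changing the locale.
(define (locale-report)
  (list (setlocale LC_ALL)   ; with no locale argument this only queries
        (locale-encoding)))

(write (locale-report))
(newline)
```

Calling it before and after each (setlocale LC_ALL ...) shows whether the
reported encoding follows the locale or stays at the system codepage.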

HTH




* Re: string-ports issue on Windows
  2019-04-18 16:22       ` Christopher Lam
  2019-04-18 18:51         ` Eli Zaretskii
@ 2019-04-18 19:29         ` Mark H Weaver
  2019-04-18 21:18           ` Mark H Weaver
  1 sibling, 1 reply; 14+ messages in thread
From: Mark H Weaver @ 2019-04-18 19:29 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> Hi Mark
> Thank you so much for looking into this.
> I'm reviewing the GnuCash for Windows package (v3.5 released April 2019)
> which contains the following libraries:
> - guile 2.0.14

Ah, for some reason I thought you were using Guile 2.2.  That explains
the problem.

In Guile 2.0, string ports internally used the locale encoding by
default, which meant that any characters not supported by the locale
encoding would be munged.

Guile 2.2 changed the behavior of string ports to always use UTF-8
internally, which ensures that all valid Guile strings can pass through
unmunged.

So, this problem would almost certainly be fixed by updating to
Guile 2.2.
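
Where updating is not immediately possible, a narrower alternative can be
sketched for individual call sites. This assumes that Guile 2.0 consults the
%default-port-encoding fluid when a string port is created; the procedure
name below is made up for the illustration.

```scheme
;; Hypothetical helper: like call-with-output-string, but the port is
;; created while %default-port-encoding is bound to UTF-8.
(define (call-with-utf8-output-string proc)
  ;; Bind the fluid only around port creation, so that any ports PROC
  ;; opens itself still get the caller's default encoding.
  (let ((port (with-fluids ((%default-port-encoding "UTF-8"))
                (open-output-string))))
    (proc port)
    (get-output-string port)))

;; U+20BA written as a character escape, so the source file stays ASCII.
(define lira (string #\x20BA))

(define result
  (call-with-utf8-output-string
    (lambda (port) (display lira port))))

(write (string=? result lira))
(newline)
```

On Guile 2.2 or later the wrapper should make no difference, since string
ports there already use UTF-8 internally.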

       Mark




* Re: string-ports issue on Windows
  2019-04-18 19:29         ` Mark H Weaver
@ 2019-04-18 21:18           ` Mark H Weaver
  2019-04-19 10:26             ` Christopher Lam
  0 siblings, 1 reply; 14+ messages in thread
From: Mark H Weaver @ 2019-04-18 21:18 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-user

Hi again,

Earlier, I wrote:

> Christopher Lam <christopher.lck@gmail.com> writes:
>
>> Hi Mark
>> Thank you so much for looking into this.
>> I'm reviewing the GnuCash for Windows package (v3.5 released April 2019)
>> which contains the following libraries:
>> - guile 2.0.14
>
> Ah, for some reason I thought you were using Guile 2.2.  That explains
> the problem.
>
> In Guile 2.0, string ports internally used the locale encoding by
> default, which meant that any characters not supported by the locale
> encoding would be munged.
>
> Guile 2.2 changed the behavior of string ports to always use UTF-8
> internally, which ensures that all valid Guile strings can pass through
> unmunged.
>
> So, this problem would almost certainly be fixed by updating to
> Guile 2.2.

It's probably a good idea to update to Guile 2.2 anyway, but I'd like to
also offer the following workaround, which monkey patches the string
port procedures in Guile 2.0 to behave more like Guile 2.2.

Note that it only patches the Scheme APIs for string ports, and not the
underlying C functions.  It might be that some code, possibly within
Guile itself, creates a string port using the C functions, and such
string ports may still munge characters.

Anyway, if you want to try it, arrange for GnuCash to evaluate the code
below, after initializing Guile.

      Mark


(when (string=? (effective-version) "2.0")
  ;; When using Guile 2.0.x, use monkey patching to change the
  ;; behavior of string ports to use UTF-8 as the internal encoding.
  ;; Note that this is the default behavior in Guile 2.2 or later.
  (let* ((mod                     (resolve-module '(guile)))
         (orig-open-input-string  (module-ref mod 'open-input-string))
         (orig-open-output-string (module-ref mod 'open-output-string))
         (orig-object->string     (module-ref mod 'object->string))
         (orig-simple-format      (module-ref mod 'simple-format)))

    (define (open-input-string str)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (orig-open-input-string str)))

    (define (open-output-string)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (orig-open-output-string)))

    (define (object->string . args)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (apply orig-object->string args)))

    (define (simple-format . args)
      (with-fluids ((%default-port-encoding "UTF-8"))
        (apply orig-simple-format args)))

    (define (call-with-input-string str proc)
      (proc (open-input-string str)))

    (define (call-with-output-string proc)
      (let ((port (open-output-string)))
        (proc port)
        (get-output-string port)))

    (module-set! mod 'open-input-string       open-input-string)
    (module-set! mod 'open-output-string      open-output-string)
    (module-set! mod 'object->string          object->string)
    (module-set! mod 'simple-format           simple-format)
    (module-set! mod 'call-with-input-string  call-with-input-string)
    (module-set! mod 'call-with-output-string call-with-output-string)

    (when (eqv? (module-ref mod 'format) orig-simple-format)
      (module-set! mod 'format simple-format))))




* Re: string-ports issue on Windows
  2019-04-18 21:18           ` Mark H Weaver
@ 2019-04-19 10:26             ` Christopher Lam
  2019-05-14  4:42               ` Christopher Lam
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Lam @ 2019-04-19 10:26 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

Hi,
The patch *does* work and handles Unicode properly :) There are unintended
consequences however, whereby other (probably C-based) string code on
Windows is now reading the lira symbol as unexpected chars (e.g.
lira-symbol -> "â‚°" i.e. #xe2 #x201a #xba) but this is now outside the
scope of this post.
Thank you again!
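
The unexpected characters are consistent with UTF-8 bytes being read as
CP1252. A small sketch of that effect, assuming (ice-9 iconv) is available
and that the underlying iconv knows the name "CP1252"; 'code-points' is a
hypothetical helper:

```scheme
(use-modules (rnrs bytevectors) (ice-9 iconv))

;; U+20BA written as a character escape, so the source file stays ASCII.
(define lira (string #\x20BA))

;; UTF-8 encodes U+20BA as the three bytes #xE2 #x82 #xBA.
(define utf8-bytes (string->utf8 lira))

;; Reading those bytes as CP1252 yields one character per byte.
(define misread (bytevector->string utf8-bytes "CP1252"))

;; Hypothetical helper: list the code points of STR as hex strings.
(define (code-points str)
  (map (lambda (c) (number->string (char->integer c) 16))
       (string->list str)))

(write (bytevector->u8-list utf8-bytes))
(newline)
(write (code-points misread))
(newline)
```

CP1252 maps byte #x82 to U+201A, which would account for the #x201a in the
middle of the sequence reported above.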


* Re: string-ports issue on Windows
  2019-04-19 10:26             ` Christopher Lam
@ 2019-05-14  4:42               ` Christopher Lam
  2019-05-26 10:52                 ` Christopher Lam
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Lam @ 2019-05-14  4:42 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

Hi Mark
Final update - first, we've reused your efficient substring-replace
function in
https://github.com/Gnucash/gnucash/commit/7d15e6e4e727c87fb4a501e924c4ae02276e508d
from a few years ago.
Second, the email thread
https://lists.gnu.org/archive/html/guile-devel/2014-03/msg00060.html
confirmed a lot of issues in guile-2.0 could be solved in Windows by
upgrading to guile-2.2. So, GnuCash has now upgraded to guile-2.2 on
Windows and the string-ports are now behaving.
Thank you (twice)
:)


* Re: string-ports issue on Windows
  2019-05-14  4:42               ` Christopher Lam
@ 2019-05-26 10:52                 ` Christopher Lam
  2019-05-26 20:48                   ` Mark H Weaver
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Lam @ 2019-05-26 10:52 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-user

Addendum: I'd like to confirm whether this is a Guile bug (guile-2.2 on Windows):
- set the locale to a non-English one, so that (setlocale LC_ALL) returns
  "French_France.1252"
- call (strftime "%B %Y" (localtime 4000000)), where 4000000 is 4x10^6
  seconds after the epoch; this should return "février 1970"

but the following error arises:
Throw to key `decoding-error' with args `("scm_from_utf8_stringn" "input
locale conversion error" 0 #vu8(102 233 118 114 105 101 114 32 49 57 55
48))'.

Is this a bug?

On Tue, 14 May 2019 at 12:42, Christopher Lam <christopher.lck@gmail.com>
wrote:

> Hi Mark
> Final update - first, we've reused your efficient substring-replace
> function in
> https://github.com/Gnucash/gnucash/commit/7d15e6e4e727c87fb4a501e924c4ae02276e508d
> from a few years ago.
> Second, the email thread
> https://lists.gnu.org/archive/html/guile-devel/2014-03/msg00060.html
> confirmed a lot of issues in guile-2.0 could be solved in Windows by
> upgrading to guile-2.2. So, GnuCash has now upgraded to guile-2.2 on
> Windows and the string-ports are now behaving.
> Thank you (twice)
> :)
>
> On Fri, 19 Apr 2019 at 10:26, Christopher Lam <christopher.lck@gmail.com>
> wrote:
>
>> Hi,
>> The patch *does* work and handles Unicode properly :) There are
>> unintended consequences, however: other (probably C-based) string code
>> on Windows now reads the lira symbol as unexpected characters (e.g.
>> lira symbol -> "â‚º", i.e. #xe2 #x201a #xba), but that is outside the
>> scope of this post.
>> Thank you again!
>>
>> On Thu, 18 Apr 2019 at 21:20, Mark H Weaver <mhw@netris.org> wrote:
>>
>>> Hi again,
>>>
>>> Earlier, I wrote:
>>>
>>> > Christopher Lam <christopher.lck@gmail.com> writes:
>>> >
>>> >> Hi Mark
>>> >> Thank you so much for looking into this.
>>> >> I'm reviewing the GnuCash for Windows package (v3.5, released
>>> >> April 2019), which contains the following libraries:
>>> >> - guile 2.0.14
>>> >
>>> > Ah, for some reason I thought you were using Guile 2.2.  That explains
>>> > the problem.
>>> >
>>> > In Guile 2.0, string ports internally used the locale encoding by
>>> > default, which meant that any characters not supported by the locale
>>> > encoding would be munged.
>>> >
>>> > Guile 2.2 changed the behavior of string ports to always use UTF-8
>>> > internally, which ensures that all valid Guile strings can pass through
>>> > unmunged.
>>> >
>>> > So, this problem would almost certainly be fixed by updating to
>>> > Guile 2.2.
>>>
>>> It's probably a good idea to update to Guile 2.2 anyway, but I'd like to
>>> also offer the following workaround, which monkey patches the string
>>> port procedures in Guile 2.0 to behave more like Guile 2.2.
>>>
>>> Note that it only patches the Scheme APIs for string ports, and not the
>>> underlying C functions.  It might be that some code, possibly within
>>> Guile itself, creates a string port using the C functions, and such
>>> string ports may still munge characters.
>>>
>>> Anyway, if you want to try it, arrange for GnuCash to evaluate the code
>>> below, after initializing Guile.
>>>
>>>       Mark
>>>
>>>
>>> (when (string=? (effective-version) "2.0")
>>>   ;; When using Guile 2.0.x, use monkey patching to change the
>>>   ;; behavior of string ports to use UTF-8 as the internal encoding.
>>>   ;; Note that this is the default behavior in Guile 2.2 or later.
>>>   (let* ((mod                     (resolve-module '(guile)))
>>>          (orig-open-input-string  (module-ref mod 'open-input-string))
>>>          (orig-open-output-string (module-ref mod 'open-output-string))
>>>          (orig-object->string     (module-ref mod 'object->string))
>>>          (orig-simple-format      (module-ref mod 'simple-format)))
>>>
>>>     (define (open-input-string str)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (orig-open-input-string str)))
>>>
>>>     (define (open-output-string)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (orig-open-output-string)))
>>>
>>>     (define (object->string . args)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (apply orig-object->string args)))
>>>
>>>     (define (simple-format . args)
>>>       (with-fluids ((%default-port-encoding "UTF-8"))
>>>         (apply orig-simple-format args)))
>>>
>>>     (define (call-with-input-string str proc)
>>>       (proc (open-input-string str)))
>>>
>>>     (define (call-with-output-string proc)
>>>       (let ((port (open-output-string)))
>>>         (proc port)
>>>         (get-output-string port)))
>>>
>>>     (module-set! mod 'open-input-string       open-input-string)
>>>     (module-set! mod 'open-output-string      open-output-string)
>>>     (module-set! mod 'object->string          object->string)
>>>     (module-set! mod 'simple-format           simple-format)
>>>     (module-set! mod 'call-with-input-string  call-with-input-string)
>>>     (module-set! mod 'call-with-output-string call-with-output-string)
>>>
>>>     (when (eqv? (module-ref mod 'format) orig-simple-format)
>>>       (module-set! mod 'format simple-format))))
>>>
>>
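
The code points #xe2 #x201a #xba reported in the quoted 19 Apr message
are what one would expect if the UTF-8 bytes of the lira sign are later
read as CP1252. Below is a small sketch to reproduce that by hand. It
assumes the iconv available to Guile knows the name "CP1252";
`misread-utf8-as' and `code-points' are hypothetical helpers written for
this example.

```scheme
(use-modules (rnrs bytevectors)   ; string->utf8, bytevector->u8-list
             (ice-9 iconv))       ; bytevector->string

;; U+20BA TURKISH LIRA SIGN, built from its code point so the example
;; does not depend on the encoding of the source file.
(define lira (string (integer->char #x20ba)))

;; Hypothetical helper: encode STR as UTF-8, then decode those bytes
;; as if they had been written in ENCODING.
(define (misread-utf8-as str encoding)
  (bytevector->string (string->utf8 str) encoding))

;; Hypothetical helper: list the code points of STR.
(define (code-points str)
  (map char->integer (string->list str)))

;; The UTF-8 encoding of U+20BA is the three bytes E2 82 BA.
(write (bytevector->u8-list (string->utf8 lira)))   ; prints (226 130 186)
(newline)

;; Reading those bytes as CP1252 should give #xe2 #x201a #xba.
;; Prints #f instead if iconv does not know "CP1252".
(write (false-if-exception
        (code-points (misread-utf8-as lira "CP1252"))))
(newline)
```

This only shows that the reported characters are consistent with a
UTF-8/CP1252 mismatch; it does not identify which code performs the
misreading.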


* Re: string-ports issue on Windows
  2019-05-26 10:52                 ` Christopher Lam
@ 2019-05-26 20:48                   ` Mark H Weaver
  0 siblings, 0 replies; 14+ messages in thread
From: Mark H Weaver @ 2019-05-26 20:48 UTC (permalink / raw)
  To: Christopher Lam; +Cc: guile-user

Hi Christopher,

Christopher Lam <christopher.lck@gmail.com> writes:

> Addendum: I'd like to confirm whether this is a Guile bug (guile-2.2 on Windows):
> - set the locale to a non-English one, so that (setlocale LC_ALL) returns
>   "French_France.1252"
> - call (strftime "%B %Y" (localtime 4000000)), where 4000000 is 4x10^6
>   seconds after the epoch; this should return "février 1970"
>
> but the following error arises:
> Throw to key `decoding-error' with args `("scm_from_utf8_stringn" "input
> locale conversion error" 0 #vu8(102 233 118 114 105 101 114 32 49 57 55
> 48))'.
>
> Is this a bug?

Yes, it is.  I've filed a bug report and provided more information here:

  https://bugs.gnu.org/35920

Thanks for letting us know about this.
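
The bytes in the error message can be inspected directly. Byte 233 (#xE9)
is "é" in CP1252, but it is not valid at that position in UTF-8, which
suggests the month name came back in the locale encoding and was then
decoded as UTF-8. This is a minimal sketch, not a fix; `month-bytes',
`decodes-as-utf8?' and `decoded' are names made up for this example, and
ISO-8859-1 is used in place of CP1252 only because the two agree on
these particular bytes.

```scheme
(use-modules (ice-9 iconv))   ; bytevector->string

;; Bytes copied from the decoding-error message in this thread.
(define month-bytes
  #vu8(102 233 118 114 105 101 114 32 49 57 55 48))

;; Hypothetical helper: #t if BV decodes as UTF-8, #f if decoding
;; raises an exception.
(define (decodes-as-utf8? bv)
  (if (false-if-exception (bytevector->string bv "UTF-8")) #t #f))

;; ISO-8859-1 stands in for CP1252 here; byte 233 is e-acute in both.
;; DECODED is the 12-character string "février 1970".
(define decoded (bytevector->string month-bytes "ISO-8859-1"))

(write (decodes-as-utf8? month-bytes))   ; prints #f
(newline)
(write (string-length decoded))          ; prints 12
(newline)
```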

     Regards,
       Mark



Thread overview: 14+ messages
2019-04-16  4:13 string-ports issue on Windows Christopher Lam
2019-04-16 14:34 ` Eli Zaretskii
2019-04-16 17:15   ` Mark H Weaver
2019-04-16 17:28 ` Mark H Weaver
2019-04-16 23:26   ` Christopher Lam
2019-04-17 19:30     ` Mark H Weaver
2019-04-18 16:22       ` Christopher Lam
2019-04-18 18:51         ` Eli Zaretskii
2019-04-18 19:29         ` Mark H Weaver
2019-04-18 21:18           ` Mark H Weaver
2019-04-19 10:26             ` Christopher Lam
2019-05-14  4:42               ` Christopher Lam
2019-05-26 10:52                 ` Christopher Lam
2019-05-26 20:48                   ` Mark H Weaver
