unofficial mirror of bug-guile@gnu.org 
* bug#24186: setlocale can't be localised
@ 2016-08-08 16:32 Zefram
  2016-08-08 20:33 ` Andy Wingo
  0 siblings, 1 reply; 7+ messages in thread
From: Zefram @ 2016-08-08 16:32 UTC
  To: 24186

In Guile 1.8 it was possible to localise the effect of a setlocale
operation, but in Guile 2.0 it's no longer possible by natural use of the
locale API.  This loss of a useful facility is either a bug or something
that needs to be discussed in the documentation.

In Guile 1.8 one could perform a temporary setlocale for the execution of
some piece of code, and revert its effect by another setlocale on unwind.
This looks like:

    (define (call-with-locale cat newval body)
      (let ((oldval #f))
        (dynamic-wind
          (lambda () (set! oldval (setlocale cat)) (setlocale cat newval))
          body
          (lambda () (setlocale cat oldval)))))

Some difficulty arises from this being temporally scoped, where dynamic
or lexical scoping would be nicer, but in single-threaded programs it
works pretty well.  The C setlocale(3) API, after which Guile's setlocale
is modelled, is obviously designed to enable this kind of mechanism: the
read operation reports all relevant state, and the write operation with
the old value sets it all back as it was.  It is critical to this ability
that the read operation does indeed report all the state that will be set.

In Guile 2.0, the setlocale function no longer corresponds so closely to
the C setlocale(3), and this critical guarantee has been lost.  I have
previously reported in bug#22910 that the setlocale read operation
has a side effect on port encoding, and obviously that interferes with
the above code, but actually there's still a problem if that's fixed.
The setlocale *write* operation also affects port encoding (actually
the default port encoding fluid and the encodings of the three
currently-selected ports: input, output, and error), and that seems to
be an intentional change, but it also breaks the above code.  The
setlocale read operation doesn't report the encoding of the
currently-selected ports, so it doesn't represent everything that
setlocale will set.  The setlocale write operation is not even capable
of setting these encodings independently: it sets the fluid and all
three ports to the encoding nominated by the locale selected for
LC_CTYPE purposes.

I think adding this extra effect to setlocale was a mistake.  It doesn't
fit the locale API.  If the extra effect is removed, that would resolve
this problem.

If you really want setlocale to have this effect, then something needs to
be done to address the ability that has been lost.  The documentation
certainly needs to describe the effect on port encoding, which it
currently doesn't.  (There is a mention of some interaction with the
%default-port-encoding fluid in the documentation of that fluid, but it
doesn't match reality: it doesn't say that setlocale writes to the fluid.)
It also ought to specifically warn that the setlocale save-and-restore
dance that works in C doesn't work here.  It should explain what needs
to be done by library functions that want to achieve a localised locale
change.  Are they entirely forbidden to use setlocale?  Are they expected
to manually save and restore port encodings around setlocale calls?
(This is complicated by set-port-encoding! not accepting #f as an encoding
value, despite it actually being a permitted value for the encoding slot.)
Some example code equivalent to the above call-with-locale would be
useful.
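
As a starting point for such example code, here is a minimal sketch of a
workaround, assuming the only extra state setlocale touches is the
%default-port-encoding fluid and the encodings of the current input,
output, and error ports.  The name call-with-locale/preserving-encodings
and its helper are hypothetical, not part of Guile.  Because
set-port-encoding! does not accept #f, a port whose encoding slot was #f
is not restored by this sketch.

```scheme
;; Sketch only: wrap every setlocale call so that the port encodings
;; it may overwrite as a side effect are put back afterwards.
(define (call-with-locale/preserving-encodings cat newval body)
  (let ((oldval #f))
    (define (setlocale/keep-encodings . args)
      ;; Snapshot the state that setlocale may clobber.
      (let* ((ports (list (current-input-port)
                          (current-output-port)
                          (current-error-port)))
             (encodings (map port-encoding ports))
             (default (fluid-ref %default-port-encoding))
             (result (apply setlocale args)))
        ;; Restore the default encoding for new ports (#f is allowed here).
        (fluid-set! %default-port-encoding default)
        ;; Restore each port; skip non-string values, which
        ;; set-port-encoding! would reject.
        (for-each (lambda (port enc)
                    (if (string? enc) (set-port-encoding! port enc)))
                  ports encodings)
        result))
    (dynamic-wind
      (lambda ()
        (set! oldval (setlocale/keep-encodings cat))
        (setlocale/keep-encodings cat newval))
      body
      (lambda () (setlocale/keep-encodings cat oldval)))))
```

It is called like the original, for example
(call-with-locale/preserving-encodings LC_TIME "C" thunk).  It remains
temporally scoped and is no safer than setlocale itself in a
multi-threaded program.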

-zefram

* bug#24186: setlocale can't be localised
  2016-08-08 16:32 bug#24186: setlocale can't be localised Zefram
@ 2016-08-08 20:33 ` Andy Wingo
  2016-08-08 22:30   ` Zefram
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Wingo @ 2016-08-08 20:33 UTC
  To: Zefram; +Cc: 24186

Greets :)

Firstly, just to make sure that we are getting things right in 2.2 (and
if not we need to know), would you mind testing with the latest 2.1.x
release?

Also, do you have a concrete program whose behavior you expect to be
different?  I appreciate the call-with-locale example but I'm not sure
what is going wrong so having an example invocation would be useful.

On Mon 08 Aug 2016 18:32, Zefram <zefram@fysh.org> writes:

> The setlocale *write* operation also affects port encoding (actually
> the default port encoding fluid and the encoding of currently-selected
> ports), and that seems to be an intentional change, but it also breaks
> the above code.

I believe that the intention (for better or for worse) is that calling
`setlocale' with 2 arguments changes the "default port encoding".  Like,
the next port you open will have the encoding specified by the
`setlocale', if you don't change it explicitly later.  But I don't think
it should change the encoding of already-open ports, should it?

Andy

* bug#24186: setlocale can't be localised
  2016-08-08 20:33 ` Andy Wingo
@ 2016-08-08 22:30   ` Zefram
  2016-08-09 12:15     ` Zefram
  2016-10-11  8:06     ` Ludovic Courtès
  0 siblings, 2 replies; 7+ messages in thread
From: Zefram @ 2016-08-08 22:30 UTC
  To: Andy Wingo; +Cc: 24186

Andy Wingo wrote:
>Firstly, just to make sure that we are getting things right in 2.2 (and
>if not we need to know), would you mind testing with the latest 2.1.x
>release?

Inspection of the 2.1.3 code shows that, like 2.0, it sets the default
port encoding fluid and the encoding of the three currently-selected
ports, as a side effect of every setlocale call (both read and write).

I'm afraid I'm having difficulty compiling it.  I mostly install software
via the Debian packages, which is how I have used 1.8 and 2.0, so this
is my first time compiling a Guile myself.  It's failing on a missing
library for which Debian supplies no package.  I may sort this out later,
but right now I can't run 2.1.3.

>Also, do you have a concrete program whose behavior you expect to be
>different?

Anything I show you would be quite artificial.  Let's have a go at
getting closer to a real program.

A likely use of a temporary locale change is to format a numeric or
time string through a function that uses the currently-selected locale.
A temporary locale change would be required if the program needs to
format it in someone else's locale, or needs this string to be in a
predictable form for a particular file format regardless of user locale.
For example, strftime is such a function, and a web service might need
to format a time string for the user who made a particular request.
We can have users all over the world, so we identify each user's locale,
while the program as a whole uses either the "C" locale or the locale
of whoever is hosting the service.

    (define (call-with-locale cat val body)
      (let ((oldval #f))
        (dynamic-wind
          (lambda () (set! oldval (setlocale cat)) (setlocale cat val))
          body
          (lambda () (setlocale cat oldval)))))

    (define (day-of-week-string)
      (strftime "%A" (localtime (current-time))))

    (define (day-of-week-string-for-locale loc)
      (call-with-locale LC_TIME loc day-of-week-string))

    ;; user-locale is application-specific code defined elsewhere
    (define (day-of-week-string-for-user user)
      (day-of-week-string-for-locale (user-locale user)))

This much of the usage works fine:

scheme@(guile-user)> (day-of-week-string)
$1 = "Monday"
scheme@(guile-user)> (day-of-week-string-for-locale "de_DE")
$2 = "Montag"
scheme@(guile-user)> (day-of-week-string)
$3 = "Monday"

Observe that calling day-of-week-string-for-locale doesn't change the
prevailing locale of the program.  Thus the subsequent day-of-week-string
call uses the same locale that the first one did.  The above works
identically on Guile 1.8 and 2.0.

But things are different when we look at port encoding.  (Obviously now
we're on 2.0-specific code.)  Suppose that we have a currently-selected
input that is encoded in UTF-8.  Suppose further that this choice of
encoding is specific to this part of the application, not reflecting any
locale choice, and the program generally runs in the default "C" locale.
Now we get:

scheme@(guile-user)> (set-port-encoding! (current-input-port) "UTF-8")
scheme@(guile-user)> (day-of-week-string)
$5 = "Monday"
scheme@(guile-user)> (port-encoding (current-input-port))
$6 = "UTF-8"
scheme@(guile-user)> (day-of-week-string-for-locale "de_DE")
$7 = "Montag"
scheme@(guile-user)> (port-encoding (current-input-port))
$8 = "ANSI_X3.4-1968"

The locale-restoring part of call-with-locale, called via
day-of-week-string-for-locale, now has the side effect of setting the
input's encoding to the nominal encoding of the "C" locale, namely ASCII.
If not worked around, input processing breaks.

Is that sketch close enough to a concrete example?
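
The same effect can be checked from a script rather than the REPL.  The
following is a minimal sketch; the procedure name is made up for
illustration.  It reports the input port's encoding before and after the
"restore" half of the save-and-restore dance, and it deliberately changes
the encoding of the current input port as part of the check.

```scheme
;; Returns (before . after): the current input port's encoding around a
;; save-and-restore of LC_TIME.  If the two differ, setlocale has reset
;; the port encoding as a side effect.
(define (input-encoding-around-setlocale)
  (set-port-encoding! (current-input-port) "UTF-8")
  (let ((before (port-encoding (current-input-port)))
        (saved (setlocale LC_TIME)))      ; read the current LC_TIME value
    (setlocale LC_TIME saved)             ; write it straight back
    (cons before (port-encoding (current-input-port)))))
```

No locale other than the prevailing one needs to be installed for this
check, since it only writes back the value it has just read.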

>I believe that the intention (for better or for worse) is that calling
>`setlocale' with 2 arguments changes the "default port encoding".

(Aside:) *any* two-argument call, even if not relevant to encoding?
The encoding thing is only derived from LC_CTYPE, so even if one is
expecting something like this it's a bit surprising for an LC_TIME call
to affect encoding.

>the next port you open will have the encoding specified by the
>`setlocale', if you don't change it explicitly later.

To achieve the effect you've stated there, there is potentially a better
way.  You have quite sensibly described the effect at a higher user-story
kind of level, rather than say exactly what happens to the fluid.
You've put the fluid there, and documented it, as a perfectly sensible
way for the user to control the default port encoding.  As things stand,
the setlocale side effect is interfering with that control.

Suppose that instead the default port encoding fluid can take a special
value #:locale-at-open, which has the effect that when a port is opened
it will get its encoding set from the current locale.  You then have
the fluid default to that value, and have setlocale not touch the fluid
at all.  This way, if the user doesn't touch the fluid but does call
setlocale then the locale controls the encoding of new ports.  But if
the user does set the fluid (to something other than #:locale-at-open),
indicating a desire to specifically control default port encoding, then
setlocale doesn't clobber the user's choice.  How does this sound to you?
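
To make the proposal concrete, here is a rough sketch of the lookup that
port-opening code would perform.  Everything in it is hypothetical: the
fluid sketch-default-port-encoding stands in for %default-port-encoding,
and encoding-for-new-port is an invented name.  Only locale-encoding,
from (ice-9 i18n), is an existing procedure.

```scheme
(use-modules (ice-9 i18n))

;; Hypothetical stand-in for %default-port-encoding, defaulting to the
;; proposed special value.
(define sketch-default-port-encoding (make-fluid #:locale-at-open))

;; The encoding a newly opened port would receive under the proposal.
(define (encoding-for-new-port)
  (let ((setting (fluid-ref sketch-default-port-encoding)))
    (if (eq? setting #:locale-at-open)
        (locale-encoding)   ; consult the global locale at open time
        setting)))          ; an explicit user choice always wins
```

With this arrangement, (with-fluids ((sketch-default-port-encoding
"ISO-8859-1")) (encoding-for-new-port)) yields the user's choice whatever
setlocale has been asked to do in the meantime.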

>                                                       But I don't think
>it should change the encoding of already-open ports, should it?

In a situation where setlocale is expected to deliberately side-effect
the default port encoding fluid, I can't figure out whether to expect it
to do more.  I suppose on general principle it's less surprising for it
to do less.  It's certainly less work to work around it, where the side
effects are unwanted.

If you go with the #:locale-at-open plan that I described above, then
setlocale should definitely not touch the encoding of already-open ports.
Just so that it is localisable as originally designed.

There's another way to get the best of both worlds.  In addition to the
#:locale-at-open value for the default port encoding fluid, there could
also be some special encoding value for a port, #:locale-at-io, meaning
to use whatever locale is in effect at the time of an I/O operation.
#:locale-at-io is also a valid value for the fluid, which will be copied
into a new port in the regular way.  The stdin, stdout, and stderr ports
that are automatically opened at program initialisation can be set to
#:locale-at-io, and setlocale now doesn't directly set the encoding of
any port.  If the user calls setlocale without otherwise controlling port
encoding then the locale controls the encoding of the primordial ports.
I expect that's the effect that the setlocale code was aiming for,
given that when setlocale is called it's too late to affect the opening
of the primordial ports.

-zefram

* bug#24186: setlocale can't be localised
  2016-08-08 22:30   ` Zefram
@ 2016-08-09 12:15     ` Zefram
  2016-08-09 17:43       ` Andy Wingo
  2016-10-11  8:06     ` Ludovic Courtès
  1 sibling, 1 reply; 7+ messages in thread
From: Zefram @ 2016-08-09 12:15 UTC
  To: Andy Wingo; +Cc: 24186

I wrote:
>is my first time compiling a Guile myself.  It's failing on a missing
>library for which Debian supplies no package.

Turns out there was a package.  It was complaining about a lack of
"bdw-gc", and Debian doesn't have anything of that name, but it does
have it under the name "libgc".  So I've now got 2.1.3 running.

All of the code in my day-of-week-string-for-locale sketch works exactly
the same on 2.1.3 as it did on 2.0.

-zefram

* bug#24186: setlocale can't be localised
  2016-08-09 12:15     ` Zefram
@ 2016-08-09 17:43       ` Andy Wingo
  0 siblings, 0 replies; 7+ messages in thread
From: Andy Wingo @ 2016-08-09 17:43 UTC
  To: Zefram; +Cc: 24186

On Tue 09 Aug 2016 14:15, Zefram <zefram@fysh.org> writes:

> I wrote:
>>is my first time compiling a Guile myself.  It's failing on a missing
>>library for which Debian supplies no package.
>
> Turns out there was a package.  It was complaining about a lack of
> "bdw-gc", and Debian doesn't have anything of that name, but it does
> have it under the name "libgc".  So I've now got 2.1.3 running.
>
> All of the code in my day-of-week-string-for-locale sketch works exactly
> the same on 2.1.3 as it did on 2.0.

Tx for this testing and thanks again for these excellent bug reports :)

Andy

* bug#24186: setlocale can't be localised
  2016-08-08 22:30   ` Zefram
  2016-08-09 12:15     ` Zefram
@ 2016-10-11  8:06     ` Ludovic Courtès
  2016-10-11 12:49       ` Zefram
  1 sibling, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2016-10-11  8:06 UTC
  To: Zefram; +Cc: 24186

Hi,

Zefram <zefram@fysh.org> skribis:

>     (define (call-with-locale cat val body)
>       (let ((oldval #f))
> 	(dynamic-wind
> 	  (lambda () (set! oldval (setlocale cat)) (setlocale cat val))
> 	  body (lambda () (setlocale cat oldval)))))
>
>     (define (day-of-week-string)
>       (strftime "%A" (localtime (current-time)))) 
>
>     (define (day-of-week-string-for-locale loc)
>       (call-with-locale LC_TIME loc day-of-week-string))
>
>     ;; user-locale is application-specific code defined elsewhere
>     (define (day-of-week-string-for-user user)
>       (day-of-week-string-for-locale (user-locale user)))

This does not really answer your question, but (ice-9 i18n) provides
first-class locale objects, which avoid the whole global locale issue
(info "(guile) Internationalization").

Currently important procedures such as ‘strftime’ or SRFI-19’s
‘date->string’ cannot use such locale objects, though.  I think it would
make sense to add an optional locale object argument to ‘strftime’ and
‘date->string’ (though we should create a (srfi srfi-19 gnu) module for
that to make it clear that this is a GNU extension.)

That wouldn’t help with the ‘setlocale’ issue you describe per se, but
this would address such use cases in a different way.
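
For the day-of-week case specifically, something along these lines
already seems possible with locale objects.  This is only a sketch: the
procedure name is invented, and it uses ‘locale-day’ from (ice-9 i18n)
in place of ‘strftime’.  ‘make-locale’ raises an error when the named
locale is not installed.

```scheme
(use-modules (ice-9 i18n))

;; Name of today's weekday in the locale called LOC, without touching
;; the global locale.  tm:wday counts from 0 (Sunday), whereas
;; locale-day counts from 1 (Sunday), hence the adjustment.
(define (day-of-week-string-for-locale/i18n loc)
  (locale-day (+ 1 (tm:wday (localtime (current-time))))
              (make-locale LC_TIME loc)))
```

Since no ‘setlocale’ call is involved, port encodings are left alone.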

WDYT?

Ludo’.

* bug#24186: setlocale can't be localised
  2016-10-11  8:06     ` Ludovic Courtès
@ 2016-10-11 12:49       ` Zefram
  0 siblings, 0 replies; 7+ messages in thread
From: Zefram @ 2016-10-11 12:49 UTC
  To: Ludovic Courtes; +Cc: 24186

Ludovic Courtes wrote:
>That wouldn't help with the "setlocale" issue you describe per se, but
>this would address such use cases in a different way.
>
>WDYT?

Yes, explicit locale objects and locale parameters to relevant functions
are a good thing.  In general, the model of a global locale state is
broken, at least by threading, so some advance beyond the setlocale system
is necessary.  Note the new(er) "uselocale" system in libc, which gives
a per-thread locale state, fixing the biggest problem with setlocale.
Some form of that could also be mapped into Guile; it would be reasonable
to have a fluid that determines the locale to use where not overridden
by an explicit parameter.
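
A rough sketch of what the fluid-plus-parameter arrangement could look
like, assuming locale objects from (ice-9 i18n).  The fluid
current-locale-fluid and the procedure day-name are hypothetical names,
not existing Guile bindings.

```scheme
(use-modules (ice-9 i18n))

;; Hypothetical fluid naming the locale to use when none is given.
;; It starts out as the object denoting the global locale.
(define current-locale-fluid (make-fluid %global-locale))

;; An explicit LOCALE argument overrides the fluid.
(define* (day-name day #:optional (locale (fluid-ref current-locale-fluid)))
  (locale-day day locale))

;; Dynamically scoped override, confined to this extent:
(with-fluids ((current-locale-fluid (make-locale LC_ALL "C")))
  (day-name 2))
```

Because with-fluids is dynamically scoped, the override ends with the
body and does not leak into the rest of the program.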

All of that is welcome, but, as you say, doesn't deal with the actual
problem I identified with setlocale.  One can expect that setlocale will
continue to be used for the foreseeable future, and it needs to be shorn
of its unwanted side effects.

-zefram