From: Zefram
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#24186: setlocale can't be localised
Date: Mon, 8 Aug 2016 23:30:37 +0100
Message-ID: <20160808223037.GI24721@fysh.org>
References: <20160808163214.GF24721@fysh.org> <87invbm25x.fsf@pobox.com>
To: Andy Wingo
Cc: 24186@debbugs.gnu.org

Andy Wingo wrote:
>Firstly, just to make sure that we are getting things right in 2.2 (and
>if not we need to know), would you mind testing with the latest 2.1.x
>release?

Inspection of the 2.1.3 code shows that, like 2.0, it sets the default
port encoding fluid and the encoding of the three currently-selected
ports, as a side effect of every setlocale call (both read and write).
I'm afraid I'm having difficulty compiling it.  I mostly install software
via the Debian packages, which is how I have used 1.8 and 2.0, so this
is my first time compiling a Guile myself.  It's failing on a missing
library for which Debian supplies no package.  I may sort this out later,
but right now I can't run 2.1.3.

>Also, do you have a concrete program whose behavior you expect to be
>different?

Anything I show you would be quite artificial.  Let's have a go at
getting closer to a real program.  A likely use of a temporary locale
change is to format a numeric or time string through a function that
uses the currently-selected locale.  A temporary locale change would be
required if the program needs to format it in someone else's locale,
or needs this string to be in a predictable form for a particular file
format regardless of user locale.

For example, strftime is such a function, and a web service might need
to format a time string for the user who made a particular request.
We can have users all over the world, so we identify each user's locale,
while the program as a whole uses either the "C" locale or the locale
of whoever is hosting the service.

    (define (call-with-locale cat val body)
      (let ((oldval #f))
        (dynamic-wind
          (lambda ()
            (set! oldval (setlocale cat))
            (setlocale cat val))
          body
          (lambda () (setlocale cat oldval)))))

    (define (day-of-week-string)
      (strftime "%A" (localtime (current-time))))

    (define (day-of-week-string-for-locale loc)
      (call-with-locale LC_TIME loc day-of-week-string))

    ;; user-locale is application-specific code defined elsewhere
    (define (day-of-week-string-for-user user)
      (day-of-week-string-for-locale (user-locale user)))

This much of the usage works fine:

    scheme@(guile-user)> (day-of-week-string)
    $1 = "Monday"
    scheme@(guile-user)> (day-of-week-string-for-locale "de_DE")
    $2 = "Montag"
    scheme@(guile-user)> (day-of-week-string)
    $3 = "Monday"

Observe that calling day-of-week-string-for-locale doesn't change the
prevailing locale of the program.
Thus the subsequent day-of-week-string call uses the same locale that
the first one did.  The above works identically on Guile 1.8 and 2.0.

But things are different when we look at port encoding.  (Obviously now
we're on 2.0-specific code.)  Suppose that we have a currently-selected
input that is encoded in UTF-8.  Suppose further that this choice of
encoding is specific to this part of the application, not reflecting any
locale choice, and the program generally runs in the default "C" locale.
Now we get:

    scheme@(guile-user)> (set-port-encoding! (current-input-port) "UTF-8")
    scheme@(guile-user)> (day-of-week-string)
    $5 = "Monday"
    scheme@(guile-user)> (port-encoding (current-input-port))
    $6 = "UTF-8"
    scheme@(guile-user)> (day-of-week-string-for-locale "de_DE")
    $7 = "Montag"
    scheme@(guile-user)> (port-encoding (current-input-port))
    $8 = "ANSI_X3.4-1968"

The locale-restoring part of call-with-locale, called via
day-of-week-string-for-locale, now has the side effect of setting the
input's encoding to the nominal encoding of the "C" locale, namely ASCII.
If not worked around, input processing breaks.

Is that sketch close enough to a concrete example?

>I believe that the intention (for better or for worse) is that calling
>`setlocale' with 2 arguments changes the "default port encoding".

(Aside:) *any* two-argument call, even if not relevant to encoding?
The encoding thing is only derived from LC_CTYPE, so even if one is
expecting something like this it's a bit surprising for an LC_TIME call
to affect encoding.

>the next port you open will have the encoding specified by the
>`setlocale', if you don't change it explicitly later.

To achieve the effect you've stated there, there is potentially a better
way.  You have quite sensibly described the effect at a higher user-story
kind of level, rather than say exactly what happens to the fluid.
You've put the fluid there, and documented it, as a perfectly sensible
way for the user to control the default port encoding.
As things stand, the setlocale side effect is interfering with that
control.

Suppose that instead the default port encoding fluid can take a special
value #:locale-at-open, which has the effect that when a port is opened
it will get its encoding set from the current locale.  You then have the
fluid default to that value, and have setlocale not touch the fluid
at all.  This way, if the user doesn't touch the fluid but does call
setlocale then the locale controls the encoding of new ports.  But if
the user does set the fluid (to something other than #:locale-at-open),
indicating a desire to specifically control default port encoding,
then setlocale doesn't clobber the user's choice.  How does this sound
to you?

>But I don't think
>it should change the encoding of already-open ports, should it?

In a situation where setlocale is expected to deliberately side-effect
the default port encoding fluid, I can't figure out whether to expect it
to do more.  I suppose on general principle it's less surprising for it
to do less.  It's certainly less work to work around it, where the side
effects are unwanted.

If you go with the #:locale-at-open plan that I described above, then
setlocale should definitely not touch the encoding of already-open ports.
Just so that it is localisable as originally designed.

There's another way to get the best of both worlds.  In addition to
the #:locale-at-open value for the default port encoding fluid, there
could also be some special encoding value for a port, #:locale-at-io,
meaning to use whatever locale is in effect at the time of an I/O
operation.  #:locale-at-io is also a valid value for the fluid, which
will be copied into a new port in the regular way.  The stdin, stdout,
and stderr ports that are automatically opened at program initialisation
can be set to #:locale-at-io, and setlocale now doesn't directly set
the encoding of any port.
If the user calls setlocale without otherwise controlling port encoding
then the locale controls the encoding of the primordial ports.  I expect
that's the effect that the setlocale code was aiming for, given that
when setlocale is called it's too late to affect the opening of the
primordial ports.

-zefram
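
For illustration, the kind of workaround alluded to above ("If not
worked around, input processing breaks") might look roughly like the
following.  This is only a sketch under stated assumptions: the name
call-with-locale/preserving-encodings is made up here, and it assumes
that setlocale clobbers nothing beyond the %default-port-encoding fluid
and the encodings of the three current ports, as the inspection of the
2.0 and 2.1.3 code described above suggests.  It has not been checked
against every Guile version.

```scheme
;; Hypothetical variant of call-with-locale that undoes the port
;; encoding side effects of setlocale.  The helper name is an
;; assumption for this sketch, not part of Guile.
(define (call-with-locale/preserving-encodings cat val body)
  (let* ((oldval #f)
         ;; The three ports whose encoding setlocale is assumed to touch.
         ;; They are captured once, when this procedure is called.
         (ports (list (current-input-port)
                      (current-output-port)
                      (current-error-port)))
         (encodings (map port-encoding ports))
         (default (fluid-ref %default-port-encoding)))
    (define (restore-encodings!)
      (for-each (lambda (port enc)
                  ;; A #f encoding is documented as equivalent to
                  ;; "ISO-8859-1"; pass a string to be safe.
                  (set-port-encoding! port (or enc "ISO-8859-1")))
                ports encodings)
      (fluid-set! %default-port-encoding default))
    (dynamic-wind
      (lambda ()
        (set! oldval (setlocale cat))
        (setlocale cat val)
        ;; Both setlocale calls above may have changed the encodings.
        (restore-encodings!))
      body
      (lambda ()
        (setlocale cat oldval)
        (restore-encodings!)))))
```

With this in place, day-of-week-string-for-locale could call
(call-with-locale/preserving-encodings LC_TIME loc day-of-week-string)
instead of call-with-locale, and the encoding set earlier on the
current input port should then survive the temporary locale change.
It remains a workaround: any port other than the three saved here is
outside its scope.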