From: ludo@gnu.org (Ludovic Courtès)
To: guile-devel@gnu.org
Subject: Re: make check fails if no en_US.iso88591 locale
Date: Thu, 10 Sep 2009 17:33:02 +0200
Message-ID: <8763bqu85t.fsf@gnu.org>
In-Reply-To: <1252586696.24639.116.camel@localhost.localdomain>

Mike Gran <spk121@yahoo.com> writes:

> I could fix the test by testing only characters 0 to 127 in a C locale
> if a Latin-1 locale can't be found.

Yes, that'd be nice.

> I can also fix the test by using the 'setbinary' function

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (help setbinary)
`setbinary' is a primitive procedure in the (guile) module.

 -- Scheme Procedure: setbinary
     Sets the encoding for the current input, output, and error ports
     to ISO-8859-1.  That character encoding allows ports to operate on
     binary data.

     It also sets the default encoding for newly created ports to
     ISO-8859-1.

     The previous default encoding for new ports is returned
--8<---------------cut here---------------end--------------->8---

It seems to do a lot of things, which aren't clear from the name.  ;-)

What can be done about it?

At the very least, it should be renamed to `set-port-binary-mode!' or similar.

Then it'd be nice if that functionality could be split into several
functions, some operating on a per-port basis.  After all, one can
already do:

  (for-each (lambda (p)
              (set-port-encoding! p "ISO-8859-1"))
            (list (current-input-port) (current-output-port)
                  (current-error-port)))

So we just lack:

  ;; encoding for newly created ports
  (set-default-port-encoding! "ISO-8859-1")

With that, `setbinary' could be implemented in Scheme.

> to force the encodings on stdin and stdout to a default value that
> will pass through binary data, instead of calling 'setlocale'.

Hmm, I think I'd still prefer `setlocale'.

regexec(3) doesn't say anything about the string encoding.  Do libc
implementations actually expect plain ASCII or Latin-1?  Or do they
adapt to the current locale's encoding?

> I looked in the POSIX spec on Regex for specific advice using 128-255 in
> regex in the C locale.  I didn't see anything offhand.  The spec does
> spend a lot of time talking about the interaction between the locale and
> regular expressions.  I get the impression from the spec that using
> regex on 128-255 in the C locale is an unexpected use of regular
> expressions.

http://www.opengroup.org/onlinepubs/9699919799/functions/regexec.html
reads:

  If, when regexec() is called, the locale is different from when the
  regular expression was compiled, the result is undefined.

This suggests that if a process runs with a UTF-8 locale and passes raw
UTF-8 bytes to both regcomp(3) and regexec(3), matching may work.

Hmm, the program below, with UTF-8-encoded source, works with both a
Latin-1 and a UTF-8 locale:

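#include <stdlib.h>
#include <regex.h>
#include <locale.h>

int
main (int argc, char *argv[])
{
  regex_t rx;
  regmatch_t match;
  int matched;

  setlocale (LC_ALL, "fr_FR.utf8");

  /* Both the pattern and the subject are raw UTF-8 bytes taken from
     the source file.  */
  if (regcomp (&rx, "ça", REG_EXTENDED) != 0)
    return EXIT_FAILURE;

  matched = regexec (&rx, "ça va ?", 1, &match, 0) == 0;
  regfree (&rx);

  return matched ? EXIT_SUCCESS : EXIT_FAILURE;
}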

Do you think it would work to just leave `regexp.test' as it is in 1.8?

Thanks,
Ludo'.
