From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nala Ginrut
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH] fix locale string reading
Date: Wed, 9 Nov 2011 18:46:51 +0800
To: Peter Brett, guile-devel
List-Id: "Developers list for Guile, the GNU extensibility library"

On Wed, Nov 9, 2011 at 6:20 PM, Peter Brett wrote:

> Nala Ginrut writes:
>
> > But Guile will break in (command-line) proc, because Chinese string as
> > command arguments can not get valid result from
> > "u32_conv_from_encoding" called by "scm_from_stringn", and raised an
> > error.
>
> Probably what should happen is that Guile's command-line parsing code
> should use the environment-provided locale by taking the following
> steps:
>
> 1) Save current locale.
> 2) Set locale from environment.
> 3) Call scm_*_locale_string() functions.
> 4) Restore original locale.
>
> However, note that this still may cause decoding errors, because there's
> no guarantee that argv is in the same encoding as the environment
> specifies, or indeed in any valid encoding at all.  So consider *also*
> adding e.g. a (command-line-bv) function to return the command line
> without attempting to decode it.

This can't be the final solution. Even if we add (command-line-bv), we
can still get an encoding-error, because (command-line) reads argv too
and raises the error. The only way around that is to use
(command-line-bv) and delete (command-line).

> > So we don't have any chance to convert it or change locale from
> > environment in the users' code because Guile has already crashed by
> > "decoding-error".
>
> Hang on -- are you saying that if you run Guile with badly-encoded argv
> then it will die before running any user code?  That would obviously be a
> bug.

I think so. I mentioned it in the first mail of this thread.

For badly-encoded argv, "u32_conv_from_encoding" returns NULL instead of
a valid result, so scm_from_locale_stringn raises an encoding-error
directly and shows the argv as a bytevector.

But even correctly-encoded argv fails to decode. I checked the code:
current_charset() does not return the correct current locale. I have to
call setlocale(LC_ALL, "") first to pick up the locale from the
environment. But I think what you mean is *not* to query the locale from
the environment.

If this is not a bug, and locale strings can't be decoded using the
environment locale, then the solution may be to get rid of
(command-line) and use (command-line-bv). That's the easiest way, but I
don't think it's the best way.
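For reference, the four steps quoted in this message (save the locale, set it from the environment, decode, restore) can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions, not Guile's actual code: decodes_in_env_locale() is a hypothetical helper, and the standard mbsrtowcs() stands in for the scm_*_locale_string() call so that the sketch builds without libguile.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of Guile. Returns 1 if `arg` decodes
   cleanly under the locale named by the environment, 0 if it contains
   an invalid multibyte sequence, and -1 if the locale could not be
   saved or switched. */
int decodes_in_env_locale(const char *arg)
{
    /* 1) Save the current locale. setlocale(LC_ALL, NULL) only queries,
       and its result may be overwritten by later calls, so copy it. */
    const char *cur = setlocale(LC_ALL, NULL);
    if (cur == NULL)
        return -1;
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    /* 2) Set the locale from the environment (LC_ALL, LANG, ...). */
    if (setlocale(LC_ALL, "") == NULL) {
        free(saved);
        return -1;
    }

    /* 3) Try to decode. With a NULL destination, mbsrtowcs() only
       validates and counts; it returns (size_t)-1 on a bad sequence.
       This is the point where the steps in the thread would call one
       of the scm_*_locale_string() functions instead. */
    mbstate_t state;
    memset(&state, 0, sizeof state);
    const char *src = arg;
    int ok = mbsrtowcs(NULL, &src, 0, &state) != (size_t)-1;

    /* 4) Restore the original locale. */
    setlocale(LC_ALL, saved);
    free(saved);
    return ok;
}
```

Calling this on each argv[i] before decoding would tell the caller which arguments are safe to turn into strings. As noted in the quoted text, a 0 result is still possible, because argv is not guaranteed to match the environment's encoding.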