From: Nala Ginrut
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH] Add "scandir" procedure
Date: Tue, 20 Dec 2011 11:25:29 +0800
To: Ludovic Courtès
Cc: guile-devel@gnu.org

Oh, I forgot to mention that the C wrapper version doesn't have the locale
problem. And according to the commit log, it seems we can expect it to be
completely fixed in 2.0.4?

On Tue, Dec 20, 2011 at 11:23 AM, Nala Ginrut wrote:

> On Tue, Dec 20, 2011 at 5:38 AM, Ludovic Courtès wrote:
>
>> Hi Nala!
>>
>> Thanks for testing!
>>
>> Nala Ginrut skribis:
>>
>> > 1. I think the file-system-fold based scandir tries to traverse the
>> > whole directory, including sub-directories. That is rather slow for a
>> > deep tree if I just want a list of the files at the top level;
>>
>> The code had initially approximately 1 typo per line, and I think I’ve
>> fixed most of them now.  ;-)
>>
>> So ‘scandir’ does not enter sub-directories.  If it does, that’s another
>> bug.  :-)
>>
>> > 2. The new scandir crashes when it encounters a Chinese file name.
>> > This can be avoided by using (setlocale LC_ALL "zh_CN.UTF-8").
>> > I think it's the same problem we faced in another thread. There's
>> > some locale problem in Guile. Of course, we have a temporary
>> > solution in a recent commit;
>>
>> Yes, Guile views file names as strings and decodes them from the current
>> locale encoding.  So if there are file names encoded differently, then
>> scm_from_locale_string, called by ‘readdir’, throws a decoding-error.
>>
>> That’s unfortunate, I’m not sure what to do.  I think GLib/GIO issues a
>> warning in such cases, while still being able to handle the file.  We
>> could imagine ‘readdir’ returning a raw bytevector when decoding fails,
>> and ‘open-file’ & co. could accept it as input.  But that’s really ugly.
>>
>> I think Mark had some ideas about it, which would be worth checking.
>>
>> > 3. It returns a weird result. E.g. (scandir "mmr")
>> > ==> ("." "." "." ".." ".." ".." "aa.c" "exclude" "ml" "myecl")
>>
>> One of the typos that got fixed, hopefully.  :-)
>>
>> Can you check again?
>>
>> > Anyway, I think the new scandir's cool, though it's a little slower
>> > than my C wrapper version
>>
>> Because it uses ‘file-system-fold’, it does one ‘stat’ call for each
>> file, which the C version doesn’t do.  That should be the only
>> efficiency difference.
>>
>> Maybe we could change our ‘scandir’ to return a list of file name/stat
>> pairs since we have the info anyway?
>
> Well, that's a good idea I never thought about. And I've tested the new
> commit, it's wonderful.
> Maybe you should consider this name/stat idea, since the new scandir is
> almost 4 times slower than the C wrapper version.
> Anyway, a name/stat list would be a smart and easy solution. ;-)
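To make the name/stat idea from the thread concrete, here is a minimal sketch in Guile Scheme. It is not the patch under discussion and does not use file-system-fold; it reads one directory level with opendir/readdir and keeps the result of one lstat call per entry. The procedure name scandir/stat is hypothetical, chosen only for this illustration.

```scheme
;; Hypothetical helper, not part of Guile or of the patch under review.
;; Returns the entries of DIR as a sorted list of (NAME . STAT) pairs,
;; without descending into sub-directories.
(define (scandir/stat dir)
  (let ((stream (opendir dir)))
    (let loop ((entry (readdir stream))
               (result '()))
      (if (eof-object? entry)
          (begin
            (closedir stream)
            ;; Sort by name so that the order is deterministic.
            (sort result (lambda (a b) (string<? (car a) (car b)))))
          ;; One `lstat' call per entry; its result is kept, not discarded.
          (let ((info (lstat (string-append dir "/" entry))))
            (loop (readdir stream)
                  (cons (cons entry info) result)))))))

;; Usage: print the name, type and size of each entry of the current
;; directory.
(for-each (lambda (pair)
            (display (car pair))
            (display " ")
            (display (stat:type (cdr pair)))
            (display " ")
            (display (stat:size (cdr pair)))
            (newline))
          (scandir/stat "."))
```

As described in the thread, readdir decodes file names using the current locale encoding, so a program calling this on directories with non-ASCII names would probably want to call setlocale first. The "." and ".." entries are kept here; a caller can filter them out if they are not wanted.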