* mingw runtime patches
@ 2011-02-15 15:34 Jan Nieuwenhuizen
  2011-02-15 15:34 ` [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name Jan Nieuwenhuizen
                   ` (4 more replies)
  0 siblings, 5 replies; 47+ messages in thread
From: Jan Nieuwenhuizen @ 2011-02-15 15:34 UTC (permalink / raw)
  To: guile-devel

I have now successfully cross-built guile-1.9 for mingw and used it to
run a simple guile-gnome GUI [http://lilypond.org/schikkers-list].

Earlier I sent a couple of configure and build-time patches; what
follows here are patches that I'm using to fix runtime problems.

The most obvious thing still lacking is the relocation patch that we
discussed. It needs more work before it can be used outside of our
cross build system (GUB).

Greetings,
Jan.

-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl

^ permalink raw reply	[flat|nested] 47+ messages in thread
* [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name.
  2011-02-15 15:34 mingw runtime patches Jan Nieuwenhuizen
@ 2011-02-15 15:34 ` Jan Nieuwenhuizen
  2011-04-29 16:33   ` Andy Wingo
  2011-02-15 15:35 ` [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names Jan Nieuwenhuizen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Jan Nieuwenhuizen @ 2011-02-15 15:34 UTC (permalink / raw)
  To: guile-devel

From: Jan Nieuwenhuizen <janneke@gnu.org>

It does not look like this will be fixed any time soon in gnulib.

2011-02-04  Jan Nieuwenhuizen  <janneke@gnu.org>

	* libguile/filesys.h:
	* libguile/filesys.c (mingw_canonicalize_file_name)[__MINGW32__]: Add
	minimal implementation of canonicalize_file_name for Mingw.
---
 libguile/filesys.c |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 libguile/filesys.h |    6 ++++
 2 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/libguile/filesys.c b/libguile/filesys.c
index 68d90d9..93b0ce2 100644
--- a/libguile/filesys.c
+++ b/libguile/filesys.c
@@ -1631,6 +1631,74 @@ SCM_DEFINE (scm_basename, "basename", 1, 1, 0,
 }
 #undef FUNC_NAME
 
+#ifdef __MINGW32__
+/* gnulib's canonicalize_file_name silently fails on Mingw.  */
+#include <ctype.h>
+#include <direct.h>
+#include <windows.h>
+
+static char const *
+slashify (char const *str)
+{
+  char *p = (char*)str;
+
+  while (*p)
+    {
+      if (*p == '\\')
+        *p = '/';
+      p++;
+    }
+  return str;
+}
+
+static char const *
+strlower (char const *str)
+{
+  char *p = str;
+  while (*p)
+    {
+      *p = (char)tolower (*p);
+      p++;
+    }
+  return str;
+}
+
+static char *
+mingw_realpath (char const *name, char *resolved)
+{
+  char *rpath = NULL;
+
+  if (name == NULL || name[0] == '\0')
+    return NULL;
+
+  if (resolved == NULL)
+    {
+      rpath = malloc (PATH_MAX + 1);
+      if (rpath == NULL)
+        return NULL;
+    }
+  else
+    rpath = resolved;
+
+  GetFullPathName (name, PATH_MAX, rpath, NULL);
+  strlower (slashify (rpath));
+  struct stat st;
+  if (lstat (rpath, &st) < 0)
+    {
+      if (resolved == NULL)
+        free (rpath);
+      return NULL;
+    }
+  return rpath;
+}
+
+char *
+mingw_canonicalize_file_name (char const *name)
+{
+  return mingw_realpath (name, NULL);
+}
+#endif /* __MINGW32__ */
+
 SCM_DEFINE (scm_canonicalize_path, "canonicalize-path", 1, 0, 0,
             (SCM path),
 	    "Return the canonical path of @var{path}. A canonical path has\n"
diff --git a/libguile/filesys.h b/libguile/filesys.h
index 967ce74..2f11e85 100644
--- a/libguile/filesys.h
+++ b/libguile/filesys.h
@@ -27,6 +27,12 @@
 \f
+#ifdef __MINGW32__
+extern char *mingw_canonicalize_file_name (char const *name);
+#undef canonicalize_file_name
+#define canonicalize_file_name mingw_canonicalize_file_name
+#endif /* __MINGW32__ */
+
 SCM_API scm_t_bits scm_tc16_dir;
 #define SCM_DIR_FLAG_OPEN (1L << 0)
-- 
1.7.1

-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl
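[Editorial sketch] The portable part of what the patch above does to a resolved name (backslashes to forward slashes, letters folded to lower case) can be tried without any Windows API. The function name `normalize_path` is hypothetical and not part of the patch; unlike `slashify` and `strlower` it takes a mutable buffer instead of casting away `const`, and it passes `unsigned char` values to `tolower`. Whether lower-casing is appropriate depends on the file system being case-insensitive, which is an assumption here.

```c
#include <ctype.h>

/* Normalize PATH in place: turn every backslash into a forward slash
   and fold letters to lower case (as tolower sees them in the current
   locale).  Returns PATH so calls can be chained.  Hypothetical
   helper, not taken from the patch.  */
char *
normalize_path (char *path)
{
  for (char *p = path; *p != '\0'; p++)
    {
      if (*p == '\\')
        *p = '/';
      else
        *p = (char) tolower ((unsigned char) *p);
    }
  return path;
}
```

Called on a writable copy of `C:\Foo\Bar.SCM`, this yields `c:/foo/bar.scm`.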
* Re: [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name. 2011-02-15 15:34 ` [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name Jan Nieuwenhuizen @ 2011-04-29 16:33 ` Andy Wingo 2011-05-20 13:56 ` Jan Nieuwenhuizen 0 siblings, 1 reply; 47+ messages in thread From: Andy Wingo @ 2011-04-29 16:33 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel On Tue 15 Feb 2011 16:34, Jan Nieuwenhuizen <janneke-list@xs4all.nl> writes: > From: Jan Nieuwenhuizen <janneke@gnu.org> > > It does not look like this will be fixed any time soon in gnulib. > > 2011-02-04 Jan Nieuwenhuizen <janneke@gnu.org> > > * libguile/filesys.h: > * libguile/filesys.c (mingw_canonicalize_file_name)[__MINGW32__]: Add > minimal implementation of canonicalize_file_name for Mingw. I read the thread and it looked like this patch was close to being accepted, and that the real issue was a lack of rm.exe, which could be worked around, so I am going to forget about this patch on the Guile side. http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/25451 Regards, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name.
  2011-04-29 16:33   ` Andy Wingo
@ 2011-05-20 13:56     ` Jan Nieuwenhuizen
  2011-05-20 14:54       ` Andy Wingo
  0 siblings, 1 reply; 47+ messages in thread
From: Jan Nieuwenhuizen @ 2011-05-20 13:56 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo writes:

> I read the thread and it looked like this patch was close to being
> accepted

Yes, it looks like that.

My first simplistic patch to just make guile canonicalize_file_name work
on mingw took me an hour to write and test.

I spent four full working days arguing with gnulib about the necessity
of the patch, writing tests, writing, talking, rewriting a new patch.

It's my estimate that it will only take two to four working days to get
this in gnulib. I'm not planning to do that without sponsoring, sorry.

Greetings,
Jan

-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl
* Re: [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name.
  2011-05-20 13:56     ` Jan Nieuwenhuizen
@ 2011-05-20 14:54       ` Andy Wingo
  0 siblings, 0 replies; 47+ messages in thread
From: Andy Wingo @ 2011-05-20 14:54 UTC (permalink / raw)
  To: Jan Nieuwenhuizen; +Cc: bug-gnulib, Bruno Haible, guile-devel

Hello Jan,

On Fri 20 May 2011 15:56, Jan Nieuwenhuizen <janneke@gnu.org> writes:

> Andy Wingo writes:
>
>> I read the thread and it looked like this patch was close to being
>> accepted
>
> Yes, it looks like that.

Heh ;)

> My first simplistic patch to just make guile canonicalize_file_name work
> on mingw took me an hour to write and test.
>
> I spent four full working days arguing with gnulib about the necessity
> of the patch, writing tests, writing, talking, rewriting a new patch.
>
> It's my estimate that it will only take two to four working days to get
> this in gnulib. I'm not planning to do that without sponsoring, sorry.

Understood. I have copied Bruno and the gnulib list for an update on
the status of this bug, and on what, if anything, is needed to fix
canonicalize_path on mingw. The context:

  http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/25451

I would appreciate it if Gnulib folk would have another look at this
one.

Thanks,

Andy
-- 
http://wingolog.org/
* [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names.
  2011-02-15 15:34 mingw runtime patches Jan Nieuwenhuizen
  2011-02-15 15:34 ` [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name Jan Nieuwenhuizen
@ 2011-02-15 15:35 ` Jan Nieuwenhuizen
  2011-04-29 17:16   ` Andy Wingo
  2011-02-15 15:35 ` [PATCH 3/5] [mingw]: Do not export opendir, readdir etc., as dirents differ Jan Nieuwenhuizen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Jan Nieuwenhuizen @ 2011-02-15 15:35 UTC (permalink / raw)
  To: guile-devel

From: Jan Nieuwenhuizen <janneke@gnu.org>

2011-02-04  Jan Nieuwenhuizen  <janneke@gnu.org>

	* module/system/base/compile.scm (compiled-file-name): Add
	directory separator and remove colon for Mingw.  Fixes
	compilation on Windows.
---
 module/system/base/compile.scm |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/module/system/base/compile.scm b/module/system/base/compile.scm
index 7d46713..8c72e54 100644
--- a/module/system/base/compile.scm
+++ b/module/system/base/compile.scm
@@ -100,11 +100,16 @@
                  ".go")
                 (else (car %load-compiled-extensions))))
   (and %compile-fallback-path
-       (let ((f (string-append
+       (let* ((c (canonicalize-path file))
+              (f (string-append
                  %compile-fallback-path
                  ;; no need for '/' separator here, canonicalize-path
                  ;; will give us an absolute path
-                 (canonicalize-path file)
+                 (if (eq? (string-ref c 1) #\:)
+                     ;; on Mingw remove drive-letter separator `:' to
+                     ;; obtain valid file name
+                     (substring c 2)
+                     c)
                  (compiled-extension))))
          (and (false-if-exception (ensure-writable-dir (dirname f)))
               f))))
-- 
1.7.1

-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-02-15 15:35 ` [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names Jan Nieuwenhuizen @ 2011-04-29 17:16 ` Andy Wingo 2011-04-29 17:30 ` Noah Lavine 2011-05-20 13:47 ` Jan Nieuwenhuizen 0 siblings, 2 replies; 47+ messages in thread From: Andy Wingo @ 2011-04-29 17:16 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel Hi Jan, On Tue 15 Feb 2011 16:35, Jan Nieuwenhuizen <janneke-list@xs4all.nl> writes: > From: Jan Nieuwenhuizen <janneke@gnu.org> > > 2011-02-04 Jan Nieuwenhuizen <janneke@gnu.org> > > * module/system/base/compile.scm (compiled-file-name): Add > directory separator and remove colon for Mingw. Fixes > compilation on Windows. > --- > module/system/base/compile.scm | 9 +++++++-- > 1 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/module/system/base/compile.scm b/module/system/base/compile.scm > index 7d46713..8c72e54 100644 > --- a/module/system/base/compile.scm > +++ b/module/system/base/compile.scm > @@ -100,11 +100,16 @@ > ".go") > (else (car %load-compiled-extensions)))) > (and %compile-fallback-path > - (let ((f (string-append > + (let* ((c (canonicalize-path file)) > + (f (string-append > %compile-fallback-path > ;; no need for '/' separator here, canonicalize-path > ;; will give us an absolute path > - (canonicalize-path file) > + (if (eq? (string-ref c 1) #\:) > + ;; on Mingw remove drive-letter separator `:' to > + ;; obtain valid file name > + (substring c 2) > + c) > (compiled-extension)))) > (and (false-if-exception (ensure-writable-dir (dirname f))) > f)))) I don't much like this approach. Besides mixing in a heuristic on all machines that is win32-specific, it makes c:/foo.scm collide with d:/foo.scm in the cache, and fails to also modify load.c which also does autocompilation in other contexts. 
I think we need a proper path library, and unfortunately I think it needs to be implemented at least partly in C, due to circularity issues. See http://docs.racket-lang.org/reference/pathutils.html for an example of what I'm talking about. Is anyone interested in implementing a path library? Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
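[Editorial sketch] The cache collision Andy describes can be reproduced in a few lines. The C function below mirrors the heuristic in the Scheme patch (drop the first two characters when the second one is a colon); `strip_drive` is a hypothetical name used only for this illustration, and the extra check on the first character is an addition so that an empty string is handled safely.

```c
/* Mirror of the patch's heuristic: if the second character of PATH is
   ':', skip the drive letter and the colon.  Hypothetical helper, for
   illustration only.  */
const char *
strip_drive (const char *path)
{
  if (path[0] != '\0' && path[1] == ':')
    return path + 2;
  return path;
}
```

With this rule both `c:/foo.scm` and `d:/foo.scm` are reduced to `/foo.scm`, so their compiled files would land on the same name under the fallback cache directory.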
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-04-29 17:16 ` Andy Wingo @ 2011-04-29 17:30 ` Noah Lavine 2011-05-01 11:30 ` Andy Wingo 2011-05-20 13:47 ` Jan Nieuwenhuizen 1 sibling, 1 reply; 47+ messages in thread From: Noah Lavine @ 2011-04-29 17:30 UTC (permalink / raw) To: Andy Wingo; +Cc: Jan Nieuwenhuizen, guile-devel > Is anyone interested in implementing a path library? > > Andy I might be able to work on it. I haven't done much for Guile lately, but I expect to have a lot more free time once my semester ends on May 7th. However, I don't know much about how Windows paths work. Are there any special considerations beyond the directory separator? Also, are there any characters that are valid in filenames on some systems but invalid on other systems? Noah ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-04-29 17:30 ` Noah Lavine @ 2011-05-01 11:30 ` Andy Wingo 2011-05-01 19:23 ` Noah Lavine ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: Andy Wingo @ 2011-05-01 11:30 UTC (permalink / raw) To: Noah Lavine; +Cc: Jan Nieuwenhuizen, guile-devel On Fri 29 Apr 2011 19:30, Noah Lavine <noah.b.lavine@gmail.com> writes: >> Is anyone interested in implementing a path library? > > I might be able to work on it. Super! > However, I don't know much about how Windows paths work. Are there any > special considerations beyond the directory separator? Yep! Check that racket web page I linked to. You don't have to implement all of it, but it should be possible to implement, given the path abstraction. > Also, are there any characters that are valid in filenames on some > systems but invalid on other systems? Ah, I see you are under the delusion that paths are composed of characters :) This is not the case. To the OS, paths are NUL-terminated byte arrays, with some constraints about their composition, but which are not necessarily representable as strings. It is nice to offer the ability to convert to and from strings, when that is possible, but we must not assume that it is indeed possible. Basically I think the plan should be to add scm_from_locale_path, scm_from_raw_path, etc to filesys.[ch], and change any pathname-accepting procedure in Guile to accept path objects, producing them from strings when given strings, and pass the bytevector representation to the raw o/s procedures like `open' et al. Then for a lot of the utilities, we can add (ice-9 paths) or something, and implement most of the utility functions in Scheme. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
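[Editorial sketch] To make the point about byte arrays concrete: on a POSIX system a file name such as the four bytes `c a f 0xE9` (Latin-1 for "café") is perfectly legal, yet it is not well-formed UTF-8 and so cannot be decoded to a string under a UTF-8 locale. The checker below is a minimal, self-written validator; the name `is_valid_utf8` is an assumption of this sketch and not a Guile or libc function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the LEN bytes at S are well-formed UTF-8: correct
   lead and continuation bytes, no overlong forms, no surrogates, and
   no code point above U+10FFFF.  */
bool
is_valid_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      unsigned long cp, min;
      size_t n;                 /* number of continuation bytes */

      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { n = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { n = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { n = 3; cp = c & 0x07; min = 0x10000; }
      else
        return false;           /* stray continuation or invalid lead */

      if (len - i <= n)
        return false;           /* sequence is truncated */

      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += n + 1;
    }
  return true;
}
```

The UTF-8 bytes `c a f 0xC3 0xA9` pass this check, while the Latin-1 name above does not, although both are acceptable to `open`.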
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-01 11:30 ` Andy Wingo @ 2011-05-01 19:23 ` Noah Lavine 2011-05-01 21:12 ` Andy Wingo 2011-05-01 21:48 ` Mark H Weaver 2011-05-02 20:58 ` Ludovic Courtès 2 siblings, 1 reply; 47+ messages in thread From: Noah Lavine @ 2011-05-01 19:23 UTC (permalink / raw) To: Andy Wingo; +Cc: Jan Nieuwenhuizen, guile-devel > Yep! Check that racket web page I linked to. You don't have to > implement all of it, but it should be possible to implement, given the > path abstraction. Okay, I've read it. It doesn't seem very complicated. Should we strive for API compatibility? I don't see any programs needing it right now, but maybe there would be in the future if we made them compatible. > Ah, I see you are under the delusion that paths are composed of > characters :) This is not the case. To the OS, paths are > NUL-terminated byte arrays, with some constraints about their > composition, but which are not necessarily representable as strings. It > is nice to offer the ability to convert to and from strings, when that > is possible, but we must not assume that it is indeed possible. Thanks! However, I'm also under a somewhat different delusion, which the Racket docs disagree with. I think of a path as a vector of "path elements", each of which represents a directory except that the last one might represent a file. I notice the Racket path library makes their path object distinct from this - you can build a path from a list of path elements with build-path, and turn a path into a list of path elements with explode-path, but you can't take an actual path object and manipulate its components (unless I've missed something). Do you think this is the right way to think of it? I'd say that my way of thinking makes more sense if you think that a filesystem is really just a directed acyclic graph (well, usually acyclic), and a path is a list of graph nodes. 
I can't quite see what the alternative model is, but I have a feeling there's another way of thinking where Racket's path library makes more sense. > Basically I think the plan should be to add scm_from_locale_path, > scm_from_raw_path, etc to filesys.[ch], and change any > pathname-accepting procedure in Guile to accept path objects, producing > them from strings when given strings, and pass the bytevector > representation to the raw o/s procedures like `open' et al. > > Then for a lot of the utilities, we can add (ice-9 paths) or something, > and implement most of the utility functions in Scheme. Sounds like a plan. Noah ^ permalink raw reply [flat|nested] 47+ messages in thread
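[Editorial sketch] Noah's "vector of path elements" view can be layered on top of a byte-string path, roughly in the spirit of an explode operation. The function below is a hypothetical helper (`path_element` is not an existing Guile or Racket API) that assumes `/` is the only separator and ignores the distinction between absolute and relative paths, which a real path type would need to keep.

```c
#include <stddef.h>
#include <string.h>

/* Copy element number INDEX (0-based) of PATH into BUF, where elements
   are the non-empty runs between '/' separators.  Return the element's
   length, or -1 if there is no such element or it does not fit in
   BUFSIZE bytes including the terminating NUL.  */
int
path_element (const char *path, size_t index, char *buf, size_t bufsize)
{
  const char *p = path;
  size_t seen = 0;

  while (*p != '\0')
    {
      while (*p == '/')         /* skip one or more separators */
        p++;
      if (*p == '\0')
        break;

      const char *start = p;
      while (*p != '\0' && *p != '/')
        p++;

      if (seen == index)
        {
          size_t n = (size_t) (p - start);
          if (n + 1 > bufsize)
            return -1;
          memcpy (buf, start, n);
          buf[n] = '\0';
          return (int) n;
        }
      seen++;
    }
  return -1;
}
```

For `/usr/share/guile`, element 1 is `share` and asking for element 3 fails, which matches the list-of-nodes picture while keeping the underlying representation a plain byte array.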
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-01 19:23 ` Noah Lavine @ 2011-05-01 21:12 ` Andy Wingo 0 siblings, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-05-01 21:12 UTC (permalink / raw) To: Noah Lavine; +Cc: Jan Nieuwenhuizen, guile-devel Hi, On Sun 01 May 2011 21:23, Noah Lavine <noah.b.lavine@gmail.com> writes: >> Yep! Check that racket web page I linked to. You don't have to >> implement all of it, but it should be possible to implement, given the >> path abstraction. > > Okay, I've read it. It doesn't seem very complicated. Should we strive > for API compatibility? I don't see any programs needing it right now, > but maybe there would be in the future if we made them compatible. I don't think we need to be compatible, no. That said it does look pretty good. > I think of a path as a vector of "path elements", each of which > represents a directory except that the last one might represent a > file. I notice the Racket path library makes their path object > distinct from this - you can build a path from a list of path elements > with build-path, and turn a path into a list of path elements with > explode-path, but you can't take an actual path object and manipulate > its components (unless I've missed something). Do you think this is > the right way to think of it? I think that might be what you want sometimes, but it doesn't correspond to the underlying OS path concept. You could build such a thing on top of the byte arrays, but I don't think a vector (or list, ...) is always going to be what you want. I don't know. I would say to stick with byte arrays and strings on the lowest level. Cheers, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-01 11:30 ` Andy Wingo 2011-05-01 19:23 ` Noah Lavine @ 2011-05-01 21:48 ` Mark H Weaver 2011-05-02 7:45 ` Andy Wingo 2011-05-02 20:58 ` Ludovic Courtès 2 siblings, 1 reply; 47+ messages in thread From: Mark H Weaver @ 2011-05-01 21:48 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel Andy Wingo <wingo@pobox.com> writes: > On Fri 29 Apr 2011 19:30, Noah Lavine <noah.b.lavine@gmail.com> writes: >> Also, are there any characters that are valid in filenames on some >> systems but invalid on other systems? > > Ah, I see you are under the delusion that paths are composed of > characters :) This is not the case. To the OS, paths are > NUL-terminated byte arrays, with some constraints about their > composition, but which are not necessarily representable as strings. This is the case on POSIX, but we should keep in mind that on some systems (e.g. Windows NT) filenames are considered character data, or at least so says PEP 383 <http://www.python.org/dev/peps/pep-0383/> IMHO, it would be best to avoid embedding into Guile the assumption that filenames, environment variables, command-line arguments, etc, are really bytevectors. > Basically I think the plan should be to add scm_from_locale_path, > scm_from_raw_path, etc to filesys.[ch], and change any > pathname-accepting procedure in Guile to accept path objects, producing > them from strings when given strings, and pass the bytevector > representation to the raw o/s procedures like `open' et al. I like this idea, but we should keep in mind that we face the same problem with things like environment variables, command-line arguments, etc. Ideally, we should try to come up with a coherent story and set of APIs for dealing with all of these data that are string-like, but actually bytevectors on some systems. Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-01 21:48 ` Mark H Weaver @ 2011-05-02 7:45 ` Andy Wingo 0 siblings, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-05-02 7:45 UTC (permalink / raw) To: Mark H Weaver; +Cc: guile-devel On Sun 01 May 2011 23:48, Mark H Weaver <mhw@netris.org> writes: > on some systems (e.g. Windows NT) filenames are considered character > data, or at least so says PEP 383 > <http://www.python.org/dev/peps/pep-0383/> Ah, interesting, I was blissfully ignorant; not the desired state when one is hacking file-name encoding :) Still, though, I think the basic point stands: copy what Racket does, because they actually do run well on windows and are happy with their abstraction. It's the sincerest form of flattery :) > Ideally, we should try to come up with a coherent story and set of > APIs for dealing with all of these data that are string-like, but > actually bytevectors on some systems. Environment variables and command-line arguments being the other ones that you mentioned; and yes, some common conventions here would be good. I still think, though, that path objects need their own data type. Peace, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names.
  2011-05-01 11:30       ` Andy Wingo
  2011-05-01 19:23         ` Noah Lavine
  2011-05-01 21:48         ` Mark H Weaver
@ 2011-05-02 20:58         ` Ludovic Courtès
  2011-05-02 21:58           ` Andy Wingo
  2 siblings, 1 reply; 47+ messages in thread
From: Ludovic Courtès @ 2011-05-02 20:58 UTC (permalink / raw)
  To: guile-devel

Hi,

Andy Wingo <wingo@pobox.com> writes:

> Basically I think the plan should be to add scm_from_locale_path,
> scm_from_raw_path, etc to filesys.[ch], and change any
> pathname-accepting procedure in Guile to accept path objects, producing
> them from strings when given strings, and pass the bytevector
> representation to the raw o/s procedures like `open' et al.

Seems to me like a disjoint type “just for Windows” would be overkill,
no?

MIT/GNU Scheme has something this overkill [0]. Bigloo has just one
variable, ‘file-separator’, which is either #\/ or #\\ [1].

Vicinities in SLIB/SCM are similar, with ‘vicinity:suffix?’ abstracting
over slash vs. backslash [2]. I’m not sure how they handle MS-DOS
volume names.

Thanks,
Ludo’.

[0] http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Pathnames.html
[1] http://www-sop.inria.fr/mimosa/fp/Bigloo/doc/bigloo-7.html#System-Programming
[2] http://people.csail.mit.edu/jaffer/slib_2.html
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-02 20:58 ` Ludovic Courtès @ 2011-05-02 21:58 ` Andy Wingo 2011-05-02 22:18 ` Ludovic Courtès 2011-05-02 23:16 ` Eli Barzilay 0 siblings, 2 replies; 47+ messages in thread From: Andy Wingo @ 2011-05-02 21:58 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Eli Barzilay, guile-devel On Mon 02 May 2011 22:58, ludo@gnu.org (Ludovic Courtès) writes: > Andy Wingo <wingo@pobox.com> writes: > >> Basically I think the plan should be to add scm_from_locale_path, >> scm_from_raw_path, etc to filesys.[ch], and change any >> pathname-accepting procedure in Guile to accept path objects, producing >> them from strings when given strings, and pass the bytevector >> representation to the raw o/s procedures like `open' et al. > > Seems to like a disjoint type “just for Windows” would be overkill, no? Maybe you're right; hummm! I have added a kind racketeer on Cc; perhaps if he has time, he might have some thoughts in this regard. :-) > Bigloo has just one variable, ‘file-separator’, which is either #\/ or > #\\ [1]. The funny thing is that this doesn't matter at all. Well, I mean that it's valid to construct pathnames with / as the separator on Windows, as / and \ are equivalent there. I still think that we need at least the ability to pass a bytevector as a path name, on GNU systems; and that if we can do so, then any routine that needs to deal with a path name would then need to deal in byte vectors in addition to strings, and at that point perhaps it is indeed useful to have a path library. > Vicinities in SLIB/SCM are similar, with ‘vicinity:suffix?’ > abstracting over slash vs. backslash [2]. I’m not sure how they handle > MS-DOS volume names. I don't think that they do handle volume names; at least, from what I could see in the API description there. Good questions! A -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-02 21:58 ` Andy Wingo @ 2011-05-02 22:18 ` Ludovic Courtès 2011-05-03 7:44 ` Andy Wingo 2011-05-02 23:16 ` Eli Barzilay 1 sibling, 1 reply; 47+ messages in thread From: Ludovic Courtès @ 2011-05-02 22:18 UTC (permalink / raw) To: Andy Wingo; +Cc: Eli Barzilay, guile-devel Hello! Andy Wingo <wingo@pobox.com> writes: > On Mon 02 May 2011 22:58, ludo@gnu.org (Ludovic Courtès) writes: [...] > The funny thing is that this doesn't matter at all. Well, I mean that > it's valid to construct pathnames with / as the separator on Windows, as > / and \ are equivalent there. Oh, good. > I still think that we need at least the ability to pass a bytevector as > a path name, on GNU systems; and that if we can do so, then any routine > that needs to deal with a path name would then need to deal in byte > vectors in addition to strings, and at that point perhaps it is indeed > useful to have a path library. To accommodate various file name encodings, right? Then yes. I think GLib and the like expect UTF-8 as the file name encoding and complain otherwise, so UTF-8 might be a better default than locale encoding (and it’s certainly wiser to be locale-independent.) >> Vicinities in SLIB/SCM are similar, with ‘vicinity:suffix?’ >> abstracting over slash vs. backslash [2]. I’m not sure how they handle >> MS-DOS volume names. > > I don't think that they do handle volume names; at least, from what I > could see in the API description there. So volumes matter in the file name canonicalization of the .go cache right? Couldn’t we mimic /cygdrive/c, etc.? Thanks, Ludo’. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-02 22:18 ` Ludovic Courtès @ 2011-05-03 7:44 ` Andy Wingo 2011-05-03 8:38 ` Ludovic Courtès ` (2 more replies) 0 siblings, 3 replies; 47+ messages in thread From: Andy Wingo @ 2011-05-03 7:44 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Eli Barzilay, guile-devel On Tue 03 May 2011 00:18, ludo@gnu.org (Ludovic Courtès) writes: >> I still think that we need at least the ability to pass a bytevector as >> a path name, on GNU systems; and that if we can do so, then any routine >> that needs to deal with a path name would then need to deal in byte >> vectors in addition to strings, and at that point perhaps it is indeed >> useful to have a path library. > > To accommodate various file name encodings, right? Then yes. That's the crazy thing: file names on GNU aren't in any encoding! They are byte strings that may or may not decode to a string, given some encoding. Granted, they're mostly UTF-8 these days, but users have the darndest files... > I think GLib and the like expect UTF-8 as the file name encoding and > complain otherwise, so UTF-8 might be a better default than locale > encoding (and it’s certainly wiser to be locale-independent.) It's more complicated than that. Here's the old interface that they used, which attempted to treat paths as utf-8: http://developer.gnome.org/glib/unstable/glib-Character-Set-Conversion.html (search for "file name encoding") The new API is abstract, so it allows operations like "get-display-name" and "get-bytes": http://developer.gnome.org/gio/2.28/GFile.html (search for "encoding" in that page) "All GFiles have a basename (get with g_file_get_basename()). These names are byte strings that are used to identify the file on the filesystem (relative to its parent directory) and there is no guarantees that they have any particular charset encoding or even make any sense at all. 
If you want to use filenames in a user interface you should use the display name that you can get by requesting the G_FILE_ATTRIBUTE_STANDARD_DISPLAY_NAME attribute with g_file_query_info(). This is guaranteed to be in utf8 and can be used in a user interface. But always store the real basename or the GFile to use to actually access the file, because there is no way to go from a display name to the actual name." > So volumes matter in the file name canonicalization of the .go cache > right? > > Couldn’t we mimic /cygdrive/c, etc.? Is that what cygwin does? We certainly could, yes; though for the purposes of joining the cache dir to an absolute filename, I guess we could simply change c:/foo to /c/foo... Hum! Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-03 7:44 ` Andy Wingo @ 2011-05-03 8:38 ` Ludovic Courtès 2011-05-04 3:59 ` Mark H Weaver 2011-06-16 22:29 ` [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names Andy Wingo 2 siblings, 0 replies; 47+ messages in thread From: Ludovic Courtès @ 2011-05-03 8:38 UTC (permalink / raw) To: Andy Wingo; +Cc: Eli Barzilay, guile-devel Hi, Andy Wingo <wingo@pobox.com> writes: > On Tue 03 May 2011 00:18, ludo@gnu.org (Ludovic Courtès) writes: > >>> I still think that we need at least the ability to pass a bytevector as >>> a path name, on GNU systems; and that if we can do so, then any routine >>> that needs to deal with a path name would then need to deal in byte >>> vectors in addition to strings, and at that point perhaps it is indeed >>> useful to have a path library. >> >> To accommodate various file name encodings, right? Then yes. > > That's the crazy thing: file names on GNU aren't in any encoding! Yes, that’s POSIX. >> I think GLib and the like expect UTF-8 as the file name encoding and >> complain otherwise, so UTF-8 might be a better default than locale >> encoding (and it’s certainly wiser to be locale-independent.) > > It's more complicated than that. Here's the old interface that they > used, which attempted to treat paths as utf-8: > > http://developer.gnome.org/glib/unstable/glib-Character-Set-Conversion.html > (search for "file name encoding") > > The new API is abstract, so it allows operations like "get-display-name" > and "get-bytes": > > http://developer.gnome.org/gio/2.28/GFile.html (search for "encoding" > in that page) Interesting. But when I launch Geeqie there’s a GLib warning when it encounters a non-UTF-8-encoded name, which basically makes me feel guilty for not using UTF-8. >> So volumes matter in the file name canonicalization of the .go cache >> right? >> >> Couldn’t we mimic /cygdrive/c, etc.? > > Is that what cygwin does? 
We certainly could, yes; though for the > purposes of joining the cache dir to an absolute filename, I guess we > could simply change c:/foo to /c/foo... Hum! Yes, that should be good enough (but that’s really just for Guile on MinGW since Guile on Cygwin cannot have this problem, AIUI.) Thanks, Ludo’. ^ permalink raw reply [flat|nested] 47+ messages in thread
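[Editorial sketch] The `c:/foo` to `/c/foo` idea discussed here can be written as a small mapping that keeps the drive letter as a directory component, so files on different drives no longer collide in the cache. `drive_to_dir` is a hypothetical name; the sketch assumes the input is already an absolute name with forward slashes, and it lower-cases the drive letter as a design choice of its own.

```c
#include <ctype.h>
#include <stdio.h>

/* Write to OUT a cache-friendly form of PATH: "c:/foo" becomes
   "/c/foo", anything without a drive prefix is copied unchanged.
   Return 0 on success, -1 if OUT (of OUTSIZE bytes) is too small.  */
int
drive_to_dir (const char *path, char *out, size_t outsize)
{
  int n;

  if (isalpha ((unsigned char) path[0]) && path[1] == ':')
    n = snprintf (out, outsize, "/%c%s",
                  tolower ((unsigned char) path[0]), path + 2);
  else
    n = snprintf (out, outsize, "%s", path);

  return (n >= 0 && (size_t) n < outsize) ? 0 : -1;
}
```

Appending the result to the fallback cache directory then gives distinct names for `c:/foo.scm` and `d:/foo.scm`, and leaves POSIX-style names untouched.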
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-03 7:44 ` Andy Wingo 2011-05-03 8:38 ` Ludovic Courtès @ 2011-05-04 3:59 ` Mark H Weaver 2011-05-04 4:13 ` Noah Lavine 2011-06-16 22:29 ` [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names Andy Wingo 2 siblings, 1 reply; 47+ messages in thread From: Mark H Weaver @ 2011-05-04 3:59 UTC (permalink / raw) To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel Andy Wingo <wingo@pobox.com> writes: > That's the crazy thing: file names on GNU aren't in any encoding! They > are byte strings that may or may not decode to a string, given some > encoding. Granted, they're mostly UTF-8 these days, but users have the > darndest files... [...] > On Tue 03 May 2011 00:18, ludo@gnu.org (Ludovic Courtès) writes: >> I think GLib and the like expect UTF-8 as the file name encoding and >> complain otherwise, so UTF-8 might be a better default than locale >> encoding (and it’s certainly wiser to be locale-independent.) > > It's more complicated than that. Here's the old interface that they > used, which attempted to treat paths as utf-8: > > http://developer.gnome.org/glib/unstable/glib-Character-Set-Conversion.html > (search for "file name encoding") > > The new API is abstract, so it allows operations like "get-display-name" > and "get-bytes": > > http://developer.gnome.org/gio/2.28/GFile.html (search for "encoding" > in that page) > > "All GFiles have a basename (get with g_file_get_basename()). These > names are byte strings that are used to identify the file on the > filesystem (relative to its parent directory) and there is no > guarantees that they have any particular charset encoding or even make > any sense at all. If you want to use filenames in a user interface you > should use the display name that you can get by requesting the > G_FILE_ATTRIBUTE_STANDARD_DISPLAY_NAME attribute with > g_file_query_info(). This is guaranteed to be in utf8 and can be used > in a user interface. 
But always store the real basename or the GFile > to use to actually access the file, because there is no way to go from > a display name to the actual name." In my opinion, this is a bad approach to take in Guile. When developers are careful to robustly handle filenames with invalid encoding, it will lead to overly complex code. More often, when developers write more straightforward code, it will lead to code that works most of the time but fails badly when confronted with weird filenames. This is the same type of problem that plagues Bourne shell scripts. Let's please not go down that road. There is a better way. We can do a variant of what Python 3 does, described in PEP 383 <http://www.python.org/dev/peps/pep-0383/>. Basically, the idea is to provide alternative versions of scm_{to,from}_stringn that allow arbitrary bytevectors to be turned into strings and back again without any lossage. These alternative versions would be used for operations involving filenames et al, and should probably also be made available to users. Basically the idea is that "invalid bytes" are mapped to code points that will never appear in any valid encoding. PEP 383 maps such bytes to a range of surrogate code points that are reserved for use in UTF-16 surrogate pairs, and are otherwise considered invalid by Unicode. There are other possible mapping schemes as well. See section 3.7 of Unicode Technical Report #36 <http://www.unicode.org/reports/tr36/> for more discussion on this. I can understand why some say that filenames in GNU are not really strings but rather bytevectors. I respectfully disagree. Filenames, environment variables, command-line arguments, etc, are _conceptually_ strings. Let's not muddle that concept just because the transition to Unicode has not yet been completed in the GNU world. Hopefully in the future, these old-style POSIX byte strings will once again become true strings in concept. 
All that's required for this to happen is for popular software to agree to standardize on the use of UTF-8 for all of these things. This is reasonably likely to happen at some point. In practice, I see no advantage to calling them bytevectors other than to allow lossless storage of oddball filenames. It's not as if any sane user interface is going to display them in hex. Think about it. What are you really going to do with the bytevector version, other than to store it in case you want to convert it back into a filename, environment variable, or command-line argument? Think about the mess that this will make to otherwise simple code. Also think about the obscure bugs that will arise from programmers who balk at this and simply pass around the strings instead. Let's keep things simple. Let's use plain strings for everything that is _conceptually_ a string. Let's instead deal with the occasional ill-encoded-filename by allowing strings to represent these oddballs. Best, Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-04 3:59 ` Mark H Weaver @ 2011-05-04 4:13 ` Noah Lavine 2011-05-04 9:24 ` Ludovic Courtès 0 siblings, 1 reply; 47+ messages in thread From: Noah Lavine @ 2011-05-04 4:13 UTC (permalink / raw) To: Mark H Weaver; +Cc: Andy Wingo, Ludovic Courtès, guile-devel Hello all, I have another issue to raise. I think this is actually parallel to some of the stuff in the (web) module, as you will see. I've always thought it was ridiculous and hackish that I had to escape spaces in path strings. For instance, I have a folder called "Getting a Job" on my desktop, whose path is ~/Desktop/Getting\ a\ Job. The reason this strangeness enters is that path strings are actually lists (or vectors) encoded as strings. Conceptually, the path ~/Desktop/Getting\ a\ Job is the list ("~" "Desktop" "Getting a Job"). In this representation, there are no escapes and no separators. It always seemed cleaner to me to think about it that way. I think there should be some mechanism by which Guile users never have to think about escaping spaces (and any other characters they want in their paths). We don't have to represent them with lists or vectors, but there should be some mechanism for avoiding this. I said this is similar to the (web) module because of all of the discussion there of how HTTP encodes data types in text, and how it's better to think of a URI as URI type rather than a special string, etc. I think the same issue applies here - you've got a list (or a list of lists, if you have a whole command-line with arguments) encoded as a string using ' ' and '/' as separators, and then you have to escape those characters when you want to use them in a different way, and the whole thing gets unnecessarily complicated because the right way to think about this is as lists of strings. 
Noah On Tue, May 3, 2011 at 11:59 PM, Mark H Weaver <mhw@netris.org> wrote: > Andy Wingo <wingo@pobox.com> writes: >> That's the crazy thing: file names on GNU aren't in any encoding! They >> are byte strings that may or may not decode to a string, given some >> encoding. Granted, they're mostly UTF-8 these days, but users have the >> darndest files... > [...] >> On Tue 03 May 2011 00:18, ludo@gnu.org (Ludovic Courtès) writes: >>> I think GLib and the like expect UTF-8 as the file name encoding and >>> complain otherwise, so UTF-8 might be a better default than locale >>> encoding (and it’s certainly wiser to be locale-independent.) >> >> It's more complicated than that. Here's the old interface that they >> used, which attempted to treat paths as utf-8: >> >> http://developer.gnome.org/glib/unstable/glib-Character-Set-Conversion.html >> (search for "file name encoding") >> >> The new API is abstract, so it allows operations like "get-display-name" >> and "get-bytes": >> >> http://developer.gnome.org/gio/2.28/GFile.html (search for "encoding" >> in that page) >> >> "All GFiles have a basename (get with g_file_get_basename()). These >> names are byte strings that are used to identify the file on the >> filesystem (relative to its parent directory) and there is no >> guarantees that they have any particular charset encoding or even make >> any sense at all. If you want to use filenames in a user interface you >> should use the display name that you can get by requesting the >> G_FILE_ATTRIBUTE_STANDARD_DISPLAY_NAME attribute with >> g_file_query_info(). This is guaranteed to be in utf8 and can be used >> in a user interface. But always store the real basename or the GFile >> to use to actually access the file, because there is no way to go from >> a display name to the actual name." > > In my opinion, this is a bad approach to take in Guile. When developers > are careful to robustly handle filenames with invalid encoding, it will > lead to overly complex code. 
More often, when developers write more > straightforward code, it will lead to code that works most of the time > but fails badly when confronted with weird filenames. This is the same > type of problem that plagues Bourne shell scripts. Let's please not go > down that road. > > There is a better way. We can do a variant of what Python 3 does, > described in PEP 383 <http://www.python.org/dev/peps/pep-0383/>. > > Basically, the idea is to provide alternative versions of > scm_{to,from}_stringn that allow arbitrary bytevectors to be turned into > strings and back again without any lossage. These alternative versions > would be used for operations involving filenames et al, and should > probably also be made available to users. > > Basically the idea is that "invalid bytes" are mapped to code points > that will never appear in any valid encoding. PEP 383 maps such bytes > to a range of surrogate code points that are reserved for use in UTF-16 > surrogate pairs, and are otherwise considered invalid by Unicode. There > are other possible mapping schemes as well. See section 3.7 of Unicode > Technical Report #36 <http://www.unicode.org/reports/tr36/> for more > discussion on this. > > I can understand why some say that filenames in GNU are not really > strings but rather bytevectors. I respectfully disagree. Filenames, > environment variables, command-line arguments, etc, are _conceptually_ > strings. Let's not muddle that concept just because the transition to > Unicode has not yet been completed in the GNU world. > > Hopefully in the future, these old-style POSIX byte strings will once > again become true strings in concept. All that's required for this to > happen is for popular software to agree to standardize on the use of > UTF-8 for all of these things. This is reasonably likely to happen at > some point. > > In practice, I see no advantage to calling them bytevectors other than > to allow lossless storage of oddball filenames. 
It's not as if any sane > user interface is going to display them in hex. Think about it. What > are you really going to do with the bytevector version, other than to > store it in case you want to convert it back into a filename, > environment variable, or command-line argument? Think about the mess > that this will make to otherwise simple code. Also think about the > obscure bugs that will arise from programmers who balk at this and > simply pass around the strings instead. > > Let's keep things simple. Let's use plain strings for everything that > is _conceptually_ a string. Let's instead deal with the occasional > ill-encoded-filename by allowing strings to represent these oddballs. > > Best, > Mark > > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-04 4:13 ` Noah Lavine @ 2011-05-04 9:24 ` Ludovic Courtès 2011-05-17 16:59 ` Noah Lavine 0 siblings, 1 reply; 47+ messages in thread From: Ludovic Courtès @ 2011-05-04 9:24 UTC (permalink / raw) To: Noah Lavine; +Cc: Andy Wingo, Mark H Weaver, guile-devel Hi Noah, Noah Lavine <noah.b.lavine@gmail.com> writes: > The reason this strangeness enters is that path strings are actually > lists (or vectors) encoded as strings. Conceptually, the path > ~/Desktop/Getting\ a\ Job is the list ("~" "Desktop" "Getting a Job"). > In this representation, there are no escapes and no separators. It > always seemed cleaner to me to think about it that way. Agreed. However, POSIX procedures deal with strings, so you still need to convert to a string at some point. So I think there are few places where you could really use anything other than strings to represent file names—unless all of libguile is changed to deal with that, which seems unreasonable to me. MIT Scheme’s API goes this route, but that’s heavyweight and can hardly be retrofitted in a file-name-as-strings implementation, I think: <http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Pathnames.html>. > I said this is similar to the (web) module because of all of the > discussion there of how HTTP encodes data types in text, and how it's > better to think of a URI as URI type rather than a special string, > etc. Yes. Thanks, Ludo’. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-04 9:24 ` Ludovic Courtès @ 2011-05-17 16:59 ` Noah Lavine 2011-05-17 19:26 ` Mark H Weaver ` (3 more replies) 0 siblings, 4 replies; 47+ messages in thread From: Noah Lavine @ 2011-05-17 16:59 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Andy Wingo, Mark H Weaver, guile-devel Hello all, I've been scanning some file api documentation and wondering what we could do that would translate across platforms reliably. I've been thinking of sort of concentric circles of operations, where the inner circles can easily be supported in a cross-platform way, and the outer ones require more and more hackery. What do you think of the following? Group 1: Treat pathnames as opaque objects that come from outside APIs and can only be used by passing them to APIs. We can support these in a way that will be compatible everywhere. Operations: open file, close file, stat file. In order to be useful, we might also provide a "command-line-argument->file" operation, but probably no reverse operation. Group 2: treat pathnames as vectors of opaque path components Operations: list items in a directory Group 3: now we need to care about encoding Operations: string->path, path->string. This will be much harder than groups 1 and 2. I think group 1 by itself would allow for most command-line programs that people want to write. If you add group 2, you could write find, ls, cat, and probably others. You need group 3 to write grep and a web server. My thought right now is that group 3 is going to have a complex API if we really want to get encodings right. Our goal should be that this complexity doesn't affect group 1 and group 2, which really should have very simple APIs. Now, some thoughts on group 3: Mark is right that paths are basically just strings, even though occasionally they're not. 
I sort of like the idea of the PEP-383 encoding (making paths strings that can potentially contain unused codepoints, which represent non-character bytes), but would that make path strings break under some Guile string operations? Also, when we convert strings to paths, we need to know what encoding the local filesystem uses. That will usually be UTF-8, but potentially might not be, correct? If we can auto-discover the correct encoding, we might be able to keep all of that in the background and just pretend that we can convert Guile strings to file system paths in a clean way. Noah On Wed, May 4, 2011 at 5:24 AM, Ludovic Courtès <ludo@gnu.org> wrote: > Hi Noah, > > Noah Lavine <noah.b.lavine@gmail.com> writes: > >> The reason this strangeness enters is that path strings are actually >> lists (or vectors) encoded as strings. Conceptually, the path >> ~/Desktop/Getting\ a\ Job is the list ("~" "Desktop" "Getting a Job"). >> In this representation, there are no escapes and no separators. It >> always seemed cleaner to me to think about it that way. > > Agreed. > > However, POSIX procedures deal with strings, so you still need to > convert to a string at some point. So I think there are few places > where you could really use anything other than strings to represent file > names—unless all of libguile is changed to deal with that, which seems > unreasonable to me. > > MIT Scheme’s API goes this route, but that’s heavyweight and can hardly > be retrofitted in a file-name-as-strings implementation, I think: > <http://www.gnu.org/software/mit-scheme/documentation/mit-scheme-ref/Pathnames.html>. > >> I said this is similar to the (web) module because of all of the >> discussion there of how HTTP encodes data types in text, and how it's >> better to think of a URI as URI type rather than a special string, >> etc. > > Yes. > > Thanks, > Ludo’. > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-17 16:59 ` Noah Lavine @ 2011-05-17 19:26 ` Mark H Weaver 2011-05-17 20:03 ` Mark H Weaver ` (2 subsequent siblings) 3 siblings, 0 replies; 47+ messages in thread From: Mark H Weaver @ 2011-05-17 19:26 UTC (permalink / raw) To: Noah Lavine; +Cc: Andy Wingo, Ludovic Courtès, guile-devel Hi Noah, Thanks for thinking about this thorny issue. Noah Lavine <noah.b.lavine@gmail.com> writes: > Group 1: Treat pathnames as opaque objects that come from outside APIs > and can only be used by passing them to APIs. We can support these in > a way that will be compatible everywhere. > Operations: open file, close file, stat file. > In order to be useful, we might also provide a > "command-line-argument->file" operation, but probably no reverse > operation. Unfortunately, we'd need more than just that one operation. What if you need to run an external command on a filename received from readdir? For this you need `file->command-line-argument'. What if you need to put that filename into an environment variable? Then you need `file->environment-variable-value'. What if you want to use an environment variable's value (which contains a filename) to either open the file directly or call an external command on it? For this you need `environment-value->file' or `environment-variable-value->command-line-argument'. What if you want to put a command-line argument into an environment variable? For this you need `command-line-argument->environment-variable-value'. What if you want to split the PATH environment variable (or another one like it) up into components, and then use those components to either read those component directories from scheme, or run external commands on those components, or put the components into environment variables? Also, what are we to do about backward-compatibility for all of the existing POSIX interfaces in Guile which have always returned strings? 
What are we to do with procedures like `program-arguments', `command-line', `environ', `getenv', `readdir' and `passwd:dir'? What can we pass to `main' that would both incorporate this new distinct command-line-argument type and maintain backward compatibility with scripts that expect strings? Best, Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-17 16:59 ` Noah Lavine 2011-05-17 19:26 ` Mark H Weaver @ 2011-05-17 20:03 ` Mark H Weaver 2011-05-23 19:42 ` Filenames and other POSIX byte strings as SCM strings without loss Mark H Weaver 2011-05-23 20:14 ` Paths as sequences of path components Mark H Weaver 3 siblings, 0 replies; 47+ messages in thread From: Mark H Weaver @ 2011-05-17 20:03 UTC (permalink / raw) To: Noah Lavine; +Cc: Andy Wingo, Ludovic Courtès, guile-devel Noah Lavine <noah.b.lavine@gmail.com> writes: > Mark is right that paths are basically just strings, even though > occasionally they're not. I sort of like the idea of the PEP-383 > encoding (making paths strings that can potentially contain unused > codepoints, which represent non-character bytes), but would that make > path strings break under some Guile string operations? Yes, this is indeed a problem. Instead of using isolated surrogate code points as recommended by PEP-383, I think we should instead use one of the alternative mappings proposed in section 3.7.4 of Unicode Technical Report #36 <http://www.unicode.org/reports/tr36/>: 1. Use 256 private-use code points, somewhere in the ranges F0000..FFFFD or 100000..10FFFD. This would probably cause the fewest security and interoperability problems. There is, however, some possibility of collision with other uses of private-use characters. 2. Use pairs of noncharacter code points in the range FDD0..FDEF. These are "super" private-use characters, and are discouraged for general interchange. The transformation would take each nibble of a byte Y, and add to FDD0 and FDE0, respectively. However, noncharacter code points may be replaced by U+FFFD ( � ) REPLACEMENT CHARACTER by some implementations, especially when they use them internally. (Again, incoming characters must never be deleted, because that can cause security problems.) 
> Also, when we convert strings to paths, we need to know what encoding > the local filesystem uses. That will usually be UTF-8, but potentially > might not be, correct? Yes, that is correct. I haven't looked deeply into this, but clearly a lot of software uses the current locale encoding to interpret these POSIX byte strings, and I suspect at least some software uses UTF-8 to interpret filenames. Fortunately, most popular modern distributions of GNU are now using UTF-8 locales by default, which basically makes the problem disappear. Regardless, this method of mapping ill-formed byte sequences to private-use code points can be used with _any_ encoding, not just UTF-8. Best, Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
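The two mappings quoted from TR #36 in this message can be sketched in C as follows. This is only an illustration: the function names are made up, the private-use base `0x109700` is an arbitrary pick inside 100000..10FFFD (the quoted text does not choose one), and reading "respectively" as "high nibble to FDD0, low nibble to FDE0" is an assumption.

```c
#include <stdint.h>

/* Arbitrary illustrative base inside the private-use range
   100000..10FFFD; an assumption, not a value fixed by TR #36.  */
#define PRIVATE_USE_BASE 0x109700u

/* Option 1: one private-use code point per ill-formed byte.  */
uint32_t
escape_byte_private_use (uint8_t y)
{
  return PRIVATE_USE_BASE + y;
}

/* Option 2: two noncharacters per ill-formed byte.  The high nibble is
   added to FDD0 and the low nibble to FDE0 (assumed ordering).  */
void
escape_byte_nonchar_pair (uint8_t y, uint32_t out[2])
{
  out[0] = 0xFDD0u + (y >> 4);
  out[1] = 0xFDE0u + (y & 0x0Fu);
}

/* Inverse of option 2.  Returns 1 and stores the byte in *Y if PAIR is
   a valid escape pair, 0 otherwise.  */
int
unescape_nonchar_pair (const uint32_t pair[2], uint8_t *y)
{
  if (pair[0] < 0xFDD0u || pair[0] > 0xFDDFu
      || pair[1] < 0xFDE0u || pair[1] > 0xFDEFu)
    return 0;
  *y = (uint8_t) (((pair[0] - 0xFDD0u) << 4) | (pair[1] - 0xFDE0u));
  return 1;
}
```

Option 1 keeps one code point per byte, which makes string lengths easier to reason about; option 2 doubles the length of every escaped byte but avoids colliding with other uses of private-use characters.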
* Filenames and other POSIX byte strings as SCM strings without loss 2011-05-17 16:59 ` Noah Lavine 2011-05-17 19:26 ` Mark H Weaver 2011-05-17 20:03 ` Mark H Weaver @ 2011-05-23 19:42 ` Mark H Weaver 2011-07-01 10:51 ` Andy Wingo 2011-05-23 20:14 ` Paths as sequences of path components Mark H Weaver 3 siblings, 1 reply; 47+ messages in thread From: Mark H Weaver @ 2011-05-23 19:42 UTC (permalink / raw) To: Noah Lavine; +Cc: Andy Wingo, Ludovic Courtès, guile-devel Hello all, Andy and I have been discussing how to deal with pathnames on IRC. The tentative plan is to use normal strings to represent pathnames, command-line arguments, environmental variable values, and other such POSIX byte strings. We'd need to implement alternative conversions between POSIX byte strings and SCM strings which would implement a bijective (one-to-one) mapping between the set of all byte vectors and a subset of SCM strings. For purposes of this email, suppose they are called scm_to_permissive_stringn and scm_from_permissive_stringn. On top of these we would implement scm_to_permissive_locale_stringn, scm_from_permissive_locale_stringn, and some other convenience functions. These alternative mappings would be used to convert between POSIX byte strings and SCM strings. We'd reserve 256 private-use code points (somewhere in the ranges U+F0000..U+FFFFD or U+100000..U+10FFFD) which would represent bytes of ill-formed byte sequences. For purposes of this email, suppose we choose the range U+109700..U+1097FF. scm_from_permissive_locale_stringn would be used to convert filenames et al to SCM strings. Ill-formed byte sequences in the filename would be mapped to a sequence of Unicode characters in that range. For example, when using a UTF-8 locale, the filename 0x46 0x6F 0x6F 0xC0 0x80 0x41 would become a SCM string containing the characters: F, o, o, U+1097C0, U+109780, A. 
A few details: it is important for security reasons that the mapping be bijective (one-to-one) between all byte vectors and a subset of SCM strings. The subset would include all SCM strings that do not include characters within the reserved range U+109700..U+1097FF. Since scm_from_permissive_stringn maps invalid bytes to private-use code points in the range U+109700..U+1097FF, we must ensure that properly encoded code points in that range are mapped to something else. Otherwise, two distinct POSIX byte strings might map to the same SCM string. The simplest solution is to consider any byte sequence which would map to our reserved range to be invalid, and thus mapped one byte at a time using this scheme. For example, U+1097FF is represented in UTF-8 as 0xF4 0x89 0x9F 0xBF. Although scm_from_stringn would map this sequence of bytes to the single code point U+1097FF (when using UTF-8), scm_from_permissive_stringn would instead consider this entire byte sequence to be invalid, and instead map it to the 4 code points U+1097F4, U+109789, U+10979F, U+1097BF. We must also make sure that scm_to_permissive_stringn never maps two distinct SCM strings to the same POSIX byte string. In particular, we must make sure that the U+1097xx code points are only used to generate _invalid_ byte sequences, and never valid ones. The simplest way to do this is to apply scm_from_permissive_stringn to the result and make sure that it yields the original SCM string. If not, an exception would be thrown. So the tentative plan is to provide this alternative mapping, and use it whenever accessing POSIX byte strings, whether they be filenames, command-line arguments, environment variable values, fields within a passwd, group, wtmp, or utmp file, system information (e.g. the hostname or information from uname), etc. 
We should allow the user to access this mapping directly, via scm_{to,from}_permissive_stringn, scm_{to,from}_permissive_locale_stringn, scm_{to,from}_permissive_utf8_stringn, and also between strings and bytevectors in both Scheme and C: permissive-string->utf8, permissive-utf8->string, scm_permissive_string_to_utf8, scm_permissive_utf8_to_string, and we should probably add procedures to convert between strings and bytevectors using other encodings as well, most importantly the locale encoding. We'd also need permissive-string->pointer and permissive-pointer->string. I'm not sure about the names. Suggestions welcome. Regarding Noah's proposal to allow handling pathnames as sequences of path components: both Andy and I like this idea. However, as always, the devil's in the details. I'll write more about this in another email. Best, Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
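The decoding half of this proposal can be sketched in C, assuming UTF-8 as the source encoding and the example range U+109700..U+1097FF from the message. `permissive_decode` and `decode_utf8` are hypothetical names, not libguile functions; a real `scm_from_permissive_stringn` would build an SCM string rather than fill an array of code points.

```c
#include <stddef.h>
#include <stdint.h>

#define RESERVED_BASE 0x109700u   /* example range from the proposal */

/* Decode one well-formed UTF-8 sequence starting at S, with N bytes
   available.  Returns its length (1..4) and stores the code point in
   *CP, or returns 0 if the bytes at S are ill-formed.  */
static size_t
decode_utf8 (const uint8_t *s, size_t n, uint32_t *cp)
{
  static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t len, i;
  uint32_t c;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
  else
    return 0;                     /* stray continuation or invalid lead */

  if (n < len)
    return 0;                     /* truncated sequence */
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min_for_len[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  *cp = c;
  return len;
}

/* Map N bytes to code points.  Ill-formed bytes, and well-formed
   sequences that would land in the reserved range, are escaped one byte
   at a time as RESERVED_BASE + byte.  OUT must have room for N entries.
   Returns the number of code points written.  */
size_t
permissive_decode (const uint8_t *bytes, size_t n, uint32_t *out)
{
  size_t i = 0, count = 0;

  while (i < n)
    {
      uint32_t cp = 0;
      size_t len = decode_utf8 (bytes + i, n - i, &cp);

      if (len == 0 || (cp >= RESERVED_BASE && cp <= RESERVED_BASE + 0xFF))
        {
          out[count++] = RESERVED_BASE + bytes[i];
          i += 1;
        }
      else
        {
          out[count++] = cp;
          i += len;
        }
    }
  return count;
}
```

The opposite direction is not shown. Following the message, it would encode the string, run the result back through the decoder, and signal an error if the original string does not come back.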
* Re: Filenames and other POSIX byte strings as SCM strings without loss 2011-05-23 19:42 ` Filenames and other POSIX byte strings as SCM strings without loss Mark H Weaver @ 2011-07-01 10:51 ` Andy Wingo 0 siblings, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-07-01 10:51 UTC (permalink / raw) To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel Hi Mark! On Mon 23 May 2011 21:42, Mark H Weaver <mhw@netris.org> writes: > The tentative plan is to use normal strings to represent pathnames, > command-line arguments, environmental variable values, and other such > POSIX byte strings. Apologies for not giving you prompt feedback on this idea. Basically I think it sounds like a great, workable plan. > For purposes of this email, suppose they are called > scm_to_permissive_stringn and scm_from_permissive_stringn. On top of > these we would implement scm_to_permissive_locale_stringn, > scm_from_permissive_locale_stringn, and some other convenience > functions. Sounds good. "Permissive" sounds a bit odd but I can't think of another name. "Foreign"? "Corrupt"? "Possibly invalid"? "Nonsense"? "Raw"? "Cooked"? "Bytes"? "scm_from_utf8_byte_string"? > Since scm_from_permissive_stringn maps invalid bytes to private-use code > points in the range U+109700..U+1097FF, we must ensure that properly > encoded code points in that range are mapped to something else. > Otherwise, two distinct POSIX byte strings might map to the same SCM > string. The simplest solution is to consider any byte sequence which > would map to our reserved range to be invalid, and thus mapped one byte > at a time using this scheme. For example, U+1097FF is represented in > UTF-8 as 0xF4 0x89 0x9F 0xBF. Although scm_from_stringn would map this > sequence of bytes to the single code point U+1097FF (when using UTF-8), > scm_from_permissive_stringn would instead consider this entire byte > sequence to be invalid, and instead map it to the 4 code points > U+1097F4, U+109789, U+10979F, U+1097BF. 
Works for me. > So the tentative plan is to provide this alternative mapping, and use it > whenever accessing POSIX byte strings, whether they be filenames, > command-line arguments, environment variable values, fields within a > passwd, group, wtmp, or utmp file, system information (e.g. the hostname > or information from uname), etc. Cool. > We should allow the user to access this mapping directly, via > > scm_{to,from}_permissive_stringn, > scm_{to,from}_permissive_locale_stringn, > scm_{to,from}_permissive_utf8_stringn, > > and also between strings and bytevectors in both Scheme and C: > > permissive-string->utf8, > permissive-utf8->string, > scm_permissive_string_to_utf8, > scm_permissive_utf8_to_string, > > and we should probably add procedures to convert between strings and > bytevectors using other encodings as well, most importantly the locale > encoding. > > We'd also need permissive-string->pointer and > permissive-pointer->string. > > I'm not sure about the names. Suggestions welcome. I'm liking "bytes". scm_from_locale_byte_stringn. byte-string->utf8. Perhaps not clear enough though. WDYT? > Regarding Noah's proposal to allow handling pathnames as sequences of > path components: both Andy and I like this idea. However, as always, > the devil's in the details. I'll write more about this in another > email. Sure, let's get this lowest level in first. Are you on it? :-) There is no hurry of course, just so we know... Cheers, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Paths as sequences of path components 2011-05-17 16:59 ` Noah Lavine ` (2 preceding siblings ...) 2011-05-23 19:42 ` Filenames and other POSIX byte strings as SCM strings without loss Mark H Weaver @ 2011-05-23 20:14 ` Mark H Weaver 2011-05-24 10:51 ` Hans Aberg 2011-11-23 22:15 ` Andy Wingo 3 siblings, 2 replies; 47+ messages in thread From: Mark H Weaver @ 2011-05-23 20:14 UTC (permalink / raw) To: Noah Lavine; +Cc: Andy Wingo, Ludovic Courtès, guile-devel Hello all, I really like the basic gist behind Noah's proposal, to allow programs to optionally represent paths (roughly) as sequences of path components. I haven't worked out all the details, and I'm glad to leave that job to someone else, but I do have a few comments to add: First of all, I think that the paths-as-components layer should be _above_ the POSIX-bytestrings-as-SCM-strings layer. In other words, the pathnames-as-components code should represent both complete pathnames and path components as SCM strings. In addition, I hope that the paths-as-components layer will allow code to conveniently manipulate paths while avoiding some of the common security problems that can arise. For example, a web application should be able to easily and safely use a user-supplied string to construct a pathname, without having to search the user-supplied string for things like "../../../../etc/passwd". When constructing paths from components, I think we should prevent a single component from being interpreted by the OS as multiple components. In other words, we should make sure that components do not contain path separators or other characters which are illegal in filenames (e.g. NUL). Either an exception should be thrown or they should be escaped somehow. If escaped, I think the transformation should be bijective. Also, I think there should be a very simple way to exclude "special" path components such as "." and "..", in a platform-neutral way. On the other hand, sometimes you really do need to include "." or ".." 
in a path, and so it ought to be possible to include them if needed. Apart from this, I wish to raise some questions for which I don't have answers: Should we provide a way to represent paths with multiple consecutive path separators? How should things like drive letters in DOS filenames be handled? How should the distinction between absolute and relative paths be handled? Should our existing POSIX interfaces which accept pathnames be extended to optionally accept these higher-level path objects? Best, Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
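A rough sketch of the component check described above, for illustration only. The name path_component_is_safe and its allow_dots flag are hypothetical and not part of Guile or of any proposal in this thread. It takes the "throw an exception" route rather than escaping, and it conservatively treats both '/' and '\' as separators on every platform, which is an assumption rather than something the thread settled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Guile: return true if the LEN bytes
   at COMPONENT can be used as exactly one path component.  */
static bool
path_component_is_safe (const char *component, size_t len, bool allow_dots)
{
  size_t i;

  /* An empty component would produce consecutive separators.  */
  if (len == 0)
    return false;

  for (i = 0; i < len; i++)
    {
      char c = component[i];

      /* NUL ends a C string early; '/' and '\\' would let the OS see
         several components where the caller intended one.  */
      if (c == '\0' || c == '/' || c == '\\')
        return false;
    }

  /* "." and ".." are rejected unless the caller opts in explicitly.  */
  if (!allow_dots
      && ((len == 1 && component[0] == '.')
          || (len == 2 && component[0] == '.' && component[1] == '.')))
    return false;

  return true;
}
```

A caller building a pathname from user input would run every component through such a check before joining, and signal an error on the first failure. That way a string like "../../../../etc/passwd" is refused as a whole instead of being searched for suspicious substrings.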
* Re: Paths as sequences of path components 2011-05-23 20:14 ` Paths as sequences of path components Mark H Weaver @ 2011-05-24 10:51 ` Hans Aberg 2011-11-23 22:15 ` Andy Wingo 1 sibling, 0 replies; 47+ messages in thread From: Hans Aberg @ 2011-05-24 10:51 UTC (permalink / raw) To: Mark H Weaver; +Cc: Andy Wingo, Ludovic Courtès, guile-devel On 23 May 2011, at 22:14, Mark H Weaver wrote: > I really like the basic gist behind Noah's proposal, to allow programs > to optionally represent paths (roughly) as sequences of path components. > I haven't worked out all the details, and I'm glad to leave that job to > someone else, but I do have a few comments to add: ... > Should our existing POSIX interfaces which accept pathnames be extended > to optionally accept these higher-level path objects? It might be a part of POSIX in the future. I mentioned a similar thing on the standardization list, where it was discussed, and I think somebody is working on it. Hans https://www.opengroup.org/sophocles/show_mail.tpl?CALLER=show_archive.tpl&source=L&listname=austin-group-l&id=13889 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Paths as sequences of path components 2011-05-23 20:14 ` Paths as sequences of path components Mark H Weaver 2011-05-24 10:51 ` Hans Aberg @ 2011-11-23 22:15 ` Andy Wingo 2011-11-25 2:51 ` Mark H Weaver 1 sibling, 1 reply; 47+ messages in thread From: Andy Wingo @ 2011-11-23 22:15 UTC (permalink / raw) To: Mark H Weaver; +Cc: Ludovic Courtès, guile-devel Hi Mark! Some wise man once said, "I want you back". Was it Justin Timberlake in the Backstreet Boys? Am I mixing up the boy-bands of yore? In any case, we miss your hack-energy in Guile :-) On Mon 23 May 2011 22:14, Mark H Weaver <mhw@netris.org> writes: > I really like the basic gist behind Noah's proposal, to allow programs > to optionally represent paths (roughly) as sequences of path components. This sounds cool. Want to work on it? Do you still have the same proposal regarding bytestrings? Cheers, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Paths as sequences of path components 2011-11-23 22:15 ` Andy Wingo @ 2011-11-25 2:51 ` Mark H Weaver 0 siblings, 0 replies; 47+ messages in thread From: Mark H Weaver @ 2011-11-25 2:51 UTC (permalink / raw) To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel Andy Wingo <wingo@pobox.com> writes: > we miss your hack-energy in Guile :-) Thanks for the encouragement, and also for forgiving my frequently poor social skills :) Free time has been scarce in the last 8 months (thanks to my 8-month-old nephew) but I would indeed like to finally get to work on improving Guile's string handling. >> I really like the basic gist behind Noah's proposal, to allow programs >> to optionally represent paths (roughly) as sequences of path components. > > This sounds cool. Want to work on it? Before embarking on paths as sequences of path components, I would first like to figure out how best to handle posix byte strings. > Do you still have the same proposal regarding bytestrings? I've had second thoughts about my last proposal. I will outline my recent thoughts on this subject in another email, hopefully in the next few days. Mark ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-03 7:44 ` Andy Wingo 2011-05-03 8:38 ` Ludovic Courtès 2011-05-04 3:59 ` Mark H Weaver @ 2011-06-16 22:29 ` Andy Wingo 2 siblings, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-06-16 22:29 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel Hi, This discussion strayed a bit far from the initial need to concatenate "/foo/bar" with "c:/baz/qux". On Tue 03 May 2011 09:44, Andy Wingo <wingo@pobox.com> writes: > On Tue 03 May 2011 00:18, ludo@gnu.org (Ludovic Courtès) writes: > >> So volumes matter in the file name canonicalization of the .go cache >> right? >> >> Couldn’t we mimic /cygdrive/c, etc.? > > Is that what cygwin does? We certainly could, yes; though for the > purposes of joining the cache dir to an absolute filename, I guess we > could simply change c:/foo to /c/foo... Hum! MSYS apparently does this as well. Probably it's what we should do in the case of caches. But this sort of thing is very nasty without a path library :-/ Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-02 21:58 ` Andy Wingo 2011-05-02 22:18 ` Ludovic Courtès @ 2011-05-02 23:16 ` Eli Barzilay 1 sibling, 0 replies; 47+ messages in thread From: Eli Barzilay @ 2011-05-02 23:16 UTC (permalink / raw) To: Andy Wingo; +Cc: ludo, guile-devel [Second attempt, my Emacs has unfortunate issues with Ludovic's name...] An hour ago, Andy Wingo wrote: > On Mon 02 May 2011 22:58, ludo@gnu.org (Ludovic Courtès) writes: > > > Andy Wingo <wingo@pobox.com> writes: > > > >> Basically I think the plan should be to add scm_from_locale_path, > >> scm_from_raw_path, etc to filesys.[ch], and change any > >> pathname-accepting procedure in Guile to accept path objects, > >> producing them from strings when given strings, and pass the > >> bytevector representation to the raw o/s procedures like `open' > >> et al. > > > > Seems to me like a disjoint type “just for Windows” would be > > overkill, no? > > Maybe you're right; hummm! I have added a kind racketeer on Cc; perhaps > if he has time, he might have some thoughts in this regard. :-) I don't think that I can contribute much -- I'm mostly looking at these things from a user's point of view... Roughly speaking (mostly because I don't know what the issues are that you're up against), our path values are "just paths" for whatever the OS wants -- so on Windows they might have either backslashes or slashes (since Racket accepts both). To write portable code we don't have a `file-separator' thing; instead, we have `build-path' that combines two paths with the right separator. Similarly, we have `split-path' to split up a path into the directory part and the last part. I think that it's generally better this way, since it represents the higher-level operation rather than fiddling with the semantics of where and how to put separators directly (but this is not some religious issue, just seems to me like it would be more convenient).
Also, we have cases where we want something that looks like a portable path (for example, naming relative file names in `require') -- for those we use /-separated strings that are limited to "safe" characters. And related, in cases where we want to encode path in code (for example, some macro that wants to generate a path), we'll use strings or byte strings, with the latter more common for lower level things. (But I'm just rambling now, I haven't slept in N days -- so feel free to ignore me...) -- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! ^ permalink raw reply [flat|nested] 47+ messages in thread
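To make the `build-path' idea above concrete, here is a much-simplified C analogue. The function build_path and its signature are made up for this illustration; Racket's real `build-path' handles many more cases (absolute second arguments, Windows volume prefixes, and so on), none of which are attempted here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified analogue of Racket's build-path: join DIR
   and NAME with exactly one SEP between them.  Returns 0 on success,
   -1 if BUF (of SIZE bytes) is too small.  */
static int
build_path (const char *dir, const char *name, char sep,
            char *buf, size_t size)
{
  size_t len = strlen (dir);
  int n;

  /* Do not add a separator when DIR is empty or already ends in one.  */
  if (len == 0 || dir[len - 1] == sep)
    n = snprintf (buf, size, "%s%s", dir, name);
  else
    n = snprintf (buf, size, "%s%c%s", dir, sep, name);

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

The point of the design is that callers never concatenate separators by hand; they pass the two parts and let one routine decide where the separator goes.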
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-04-29 17:16 ` Andy Wingo 2011-04-29 17:30 ` Noah Lavine @ 2011-05-20 13:47 ` Jan Nieuwenhuizen 2011-05-20 14:01 ` Andy Wingo 2011-06-30 14:11 ` Andy Wingo 1 sibling, 2 replies; 47+ messages in thread From: Jan Nieuwenhuizen @ 2011-05-20 13:47 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel Andy Wingo writes: > I don't much like this approach. Besides mixing in a heuristic on all > machines that is win32-specific, it makes c:/foo.scm collide with > d:/foo.scm in the cache, and fails to also modify load.c which also does > autocompilation in other contexts. Yes, a newer version of this patch also includes load.c and boot-9.scm. Of course the drive letter should be kept. > Is anyone interested in implementing a path library? What's the status/estimate on this -- of course I agree this would be nicer, otoh, a patch to these three files is available that makes guile run on mingw right now. Greetings, Jan -- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-20 13:47 ` Jan Nieuwenhuizen @ 2011-05-20 14:01 ` Andy Wingo 2011-06-30 14:11 ` Andy Wingo 1 sibling, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-05-20 14:01 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel Hi Jan, On Fri 20 May 2011 15:47, Jan Nieuwenhuizen <janneke@gnu.org> writes: >> Is anyone interested in implementing a path library? > > What's the status/estimate on this -- of course I agree this would be > nicer, otoh, a patch to these three files is available that makes guile > run on mingw right now. Unclear :) I think the thing you did for your autobuilder was the right strategy for you and for lilypond, but that we can do even better in Guile if we give ourselves a bit of time. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names. 2011-05-20 13:47 ` Jan Nieuwenhuizen 2011-05-20 14:01 ` Andy Wingo @ 2011-06-30 14:11 ` Andy Wingo 1 sibling, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-06-30 14:11 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel On Fri 20 May 2011 15:47, Jan Nieuwenhuizen <janneke@gnu.org> writes: > Andy Wingo writes: > >> I don't much like this approach. Besides mixing in a heuristic on all >> machines that is win32-specific, it makes c:/foo.scm collide with >> d:/foo.scm in the cache, and fails to also modify load.c which also does >> autocompilation in other contexts. > > Yes, a newer version of this patch also includes load.c and boot-9.scm. > Of course the drive letter should be kept. I have applied a similar patch to Guile, before realizing you had a newer version. Sorry for the huge delay here. [I said:] >> Is anyone interested in implementing a path library? I don't think this would have helped very much in this case, given that taking an absolute path and turning it into a path suffix is not something that a path library is really good for. In reality all we need is a key that corresponds in a 1-to-1 relationship with a source file -- a SHA1 hash would have done as well. But oh well, c:/foo -> /c/foo it is! Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
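For readers following along, the c:/foo -> /c/foo mapping mentioned above could look roughly like the sketch below. This is not the code that was applied to Guile; cache_suffix is a hypothetical name, and lowercasing the drive letter is an assumption made here so that C:/foo and c:/foo yield the same key. Keeping the drive letter in the suffix is what stops c:/foo.scm and d:/foo.scm from colliding in the cache.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: write into BUF a suffix that can be appended to
   the cache directory for the absolute file name NAME.  "c:/foo"
   becomes "/c/foo"; names without a drive prefix are copied unchanged.
   Returns 0 on success, -1 if BUF (of SIZE bytes) is too small.  */
static int
cache_suffix (const char *name, char *buf, size_t size)
{
  int n;

  if (isalpha ((unsigned char) name[0]) && name[1] == ':'
      && (name[2] == '/' || name[2] == '\\'))
    /* Drop the colon and keep the (lowercased) drive letter as the
       first component.  The rest of the name is left as it is.  */
    n = snprintf (buf, size, "/%c%s",
                  tolower ((unsigned char) name[0]), name + 2);
  else
    n = snprintf (buf, size, "%s", name);

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```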
* [PATCH 3/5] [mingw]: Do not export opendir, readdir etc., as dirents differ. 2011-02-15 15:34 mingw runtime patches Jan Nieuwenhuizen 2011-02-15 15:34 ` [PATCH 1/5] [mingw]: Add implementation of canonicalize_file_name Jan Nieuwenhuizen 2011-02-15 15:35 ` [PATCH 2/5] [mingw]: Have compiled-file-name produce valid names Jan Nieuwenhuizen @ 2011-02-15 15:35 ` Jan Nieuwenhuizen 2011-05-01 11:37 ` Andy Wingo 2011-02-15 15:35 ` [PATCH 4/5] [mingw]: Delete existing target file before attempting rename Jan Nieuwenhuizen 2011-02-15 15:35 ` [PATCH 5/5] [mingw]: Use $LOCALAPPDATA as a possible root for cachedir Jan Nieuwenhuizen 4 siblings, 1 reply; 47+ messages in thread From: Jan Nieuwenhuizen @ 2011-02-15 15:35 UTC (permalink / raw) To: guile-devel [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 2477 bytes --] From: Jan Nieuwenhuizen <janneke@gnu.org> Without this patch, libguile exports symbols such as opendir, readdir, which expect and use guile's struct dirent that differs from mingw's dirent. Linking to libguile when using mingw's dirent gives unexpected results. 2011-02-15 Jan Nieuwenhuizen <janneke@gnu.org> * libguile/win32-dirent.c: * libguile/filesys.c [MINGW32]: Include win32-dirent.h early, to pick up defines. * libguile/win32-dirent.h (opendir, readdir, closedir, rewinddir, seekdir, telldir, dirfd): #define to guile_opendir etc.
--- libguile/filesys.c | 5 ++++- libguile/win32-dirent.c | 4 ++-- libguile/win32-dirent.h | 8 ++++++++ 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/libguile/filesys.c b/libguile/filesys.c index 93b0ce2..880ee86 100644 --- a/libguile/filesys.c +++ b/libguile/filesys.c @@ -35,6 +35,10 @@ #include <stdio.h> #include <errno.h> +#if defined (__MINGW32__) || defined (_MSC_VER) || defined (__BORLANDC__) +# include "win32-dirent.h" +#endif /* __MINGW32__ || _MSC_VER || __BORLANDC__ */ + #include "libguile/_scm.h" #include "libguile/smob.h" #include "libguile/feature.h" @@ -94,7 +98,6 @@ #if defined (__MINGW32__) || defined (_MSC_VER) || defined (__BORLANDC__) -# include "win32-dirent.h" # define NAMLEN(dirent) strlen((dirent)->d_name) /* The following bits are per AC_HEADER_DIRENT doco in the autoconf manual */ #elif HAVE_DIRENT_H diff --git a/libguile/win32-dirent.c b/libguile/win32-dirent.c index de170c7..b5b2c60 100644 --- a/libguile/win32-dirent.c +++ b/libguile/win32-dirent.c @@ -20,14 +20,14 @@ # include <config.h> #endif +#include "win32-dirent.h" + #include "libguile/__scm.h" #include <windows.h> #include <stdio.h> #include <string.h> -#include "win32-dirent.h" - DIR * opendir (const char * name) { diff --git a/libguile/win32-dirent.h b/libguile/win32-dirent.h index 578db49..f9f8fe9 100644 --- a/libguile/win32-dirent.h +++ b/libguile/win32-dirent.h @@ -27,6 +27,14 @@ #include <sys/types.h> +#define opendir guile_opendir +#define readdir guile_readdir +#define closedir guile_closedir +#define rewinddir guile_rewinddir +#define seekdir guile_seekdir +#define telldir guile_telldir +#define dirfd guile_dirfd + struct dirstream { int fd; /* File descriptor. */ -- 1.7.1 -- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH 3/5] [mingw]: Do not export opendir, readdir etc., as dirents differ. 2011-02-15 15:35 ` [PATCH 3/5] [mingw]: Do not export opendir, readdir etc., as dirents differ Jan Nieuwenhuizen @ 2011-05-01 11:37 ` Andy Wingo 2011-05-20 13:57 ` Jan Nieuwenhuizen 0 siblings, 1 reply; 47+ messages in thread From: Andy Wingo @ 2011-05-01 11:37 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel Hi Jan, On Tue 15 Feb 2011 16:35, Jan Nieuwenhuizen <janneke-list@xs4all.nl> writes: > From: Jan Nieuwenhuizen <janneke@gnu.org> > > Without this patch, libguile exports symbols such as opendir, readdir, > which expect and use guile's struct dirent that differs from mingw's > dirent. Linking to libguile when using mingw's dirent gives unexpected > results. > > 2011-02-15 Jan Nieuwenhuizen <janneke@gnu.org> > > * libguile/win32-dirent.c: > * libguile/filesys.c [MINGW32]: Include win32-dirent.h early, > to pick up defines. > > * libguile/win32-dirent.h (opendir, readdir, closedir, rewinddir, > seekdir, telldir, dirfd): #define to guile_opendir etc. If mingw defines variants of these routines, why are we not using them directly? Sorry for being obtuse :) I would like to get Guile working easily on Windows, and if I can do so by removing code from Guile, that would be great :-) Cheers, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 3/5] [mingw]: Do not export opendir, readdir etc., as dirents differ. 2011-05-01 11:37 ` Andy Wingo @ 2011-05-20 13:57 ` Jan Nieuwenhuizen 2011-06-16 22:22 ` Andy Wingo 0 siblings, 1 reply; 47+ messages in thread From: Jan Nieuwenhuizen @ 2011-05-20 13:57 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel Andy Wingo writes: > If mingw defines variants of these routines, why are we not using them > directly? Good question. That may well be a better approach. Jan. -- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 3/5] [mingw]: Do not export opendir, readdir etc., as dirents differ. 2011-05-20 13:57 ` Jan Nieuwenhuizen @ 2011-06-16 22:22 ` Andy Wingo 0 siblings, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-06-16 22:22 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel On Fri 20 May 2011 15:57, Jan Nieuwenhuizen <janneke@gnu.org> writes: > Andy Wingo writes: > >> If mingw defines variants of these routines, why are we not using them >> directly? > > Good question. That may well be a better approach. I have removed these files from Guile. We can revert that commit if it does not work. Would you mind trying current git on your build system? We would like to release 2.0.2 shortly. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* [PATCH 4/5] [mingw]: Delete existing target file before attempting rename. 2011-02-15 15:34 mingw runtime patches Jan Nieuwenhuizen ` (2 preceding siblings ...) 2011-02-15 15:35 ` [PATCH 3/5] [mingw]: Do not export opendir, readdir etc., as dirents differ Jan Nieuwenhuizen @ 2011-02-15 15:35 ` Jan Nieuwenhuizen 2011-05-01 11:40 ` Andy Wingo 2011-02-15 15:35 ` [PATCH 5/5] [mingw]: Use $LOCALAPPDATA as a possible root for cachedir Jan Nieuwenhuizen 4 siblings, 1 reply; 47+ messages in thread From: Jan Nieuwenhuizen @ 2011-02-15 15:35 UTC (permalink / raw) To: guile-devel [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1533 bytes --] From: Jan Nieuwenhuizen <janneke@gnu.org> 2011-02-15 Jan Nieuwenhuizen <janneke@gnu.org> * libguile/filesys.c [MINGW32] (my_rename): Add implementation that deletes target if it exists. Fixes rename behaviour. --- libguile/filesys.c | 24 +++++++++++++++++++++--- 1 files changed, 21 insertions(+), 3 deletions(-) diff --git a/libguile/filesys.c b/libguile/filesys.c index 880ee86..a2be2d5 100644 --- a/libguile/filesys.c +++ b/libguile/filesys.c @@ -680,9 +680,10 @@ SCM_DEFINE (scm_link, "link", 2, 0, 0, #undef FUNC_NAME #endif /* HAVE_LINK */ -#ifdef HAVE_RENAME +#if defined (HAVE_RENAME) && !defined (__MINGW32__) #define my_rename rename -#else +#else /* !HAVE_RENAME || __MINGW32__ */ +#ifndef __MINGW32__ static int my_rename (const char *oldname, const char *newname) { @@ -698,7 +699,24 @@ my_rename (const char *oldname, const char *newname) } return rv; } -#endif +#else /* __MINGW32__ */ +static int +my_rename (const char *oldname, const char *newname) +{ + int rv; + struct stat stat; + + SCM_SYSCALL (rv = !stat (newname, &stat)); + if (rv != 0) + SCM_SYSCALL (rv = unlink (newname)); + if (rv == 0) + rv = rename (oldname, newname); + + return rv; +} +#endif /* __MINGW32__ */ +#endif /* !HAVE_RENAME || __MINGW32__ */ + SCM_DEFINE (scm_rename, "rename-file", 2, 0, 0, (SCM 
oldname, SCM newname), -- 1.7.1 -- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH 4/5] [mingw]: Delete existing target file before attempting rename. 2011-02-15 15:35 ` [PATCH 4/5] [mingw]: Delete existing target file before attempting rename Jan Nieuwenhuizen @ 2011-05-01 11:40 ` Andy Wingo 2011-05-20 14:05 ` Jan Nieuwenhuizen 2011-06-16 21:45 ` Andy Wingo 0 siblings, 2 replies; 47+ messages in thread From: Andy Wingo @ 2011-05-01 11:40 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel On Tue 15 Feb 2011 16:35, Jan Nieuwenhuizen <janneke-list@xs4all.nl> writes: > From: Jan Nieuwenhuizen <janneke@gnu.org> > > 2011-02-15 Jan Nieuwenhuizen <janneke@gnu.org> > > * libguile/filesys.c [MINGW32] (my_rename): Add implementation > that deletes target if it exists. Fixes rename behaviour. This patch has the obvious race condition. Why does the `rename' library routine not work on Win32? The man page says CONFORMING TO 4.3BSD, C89, C99, POSIX.1-2001. so I am surprised about this behavior. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 4/5] [mingw]: Delete existing target file before attempting rename. 2011-05-01 11:40 ` Andy Wingo @ 2011-05-20 14:05 ` Jan Nieuwenhuizen 2011-06-16 21:45 ` Andy Wingo 1 sibling, 0 replies; 47+ messages in thread From: Jan Nieuwenhuizen @ 2011-05-20 14:05 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel Andy Wingo writes: > This patch has the obvious race condition. Yes. > Why does the `rename' library routine not work on Win32? Good question. > The man page says > > CONFORMING TO > 4.3BSD, C89, C99, POSIX.1-2001. > > so I am surprised about this behavior. Yes, that's interesting. Now what's broken here, the documentation, the implementation, some specific implementation, the one windows box that I tested on? Greetings, Jan. -- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 4/5] [mingw]: Delete existing target file before attempting rename. 2011-05-01 11:40 ` Andy Wingo 2011-05-20 14:05 ` Jan Nieuwenhuizen @ 2011-06-16 21:45 ` Andy Wingo 1 sibling, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-06-16 21:45 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel Hi Jan, On Sun 01 May 2011 13:40, Andy Wingo <wingo@pobox.com> writes: > On Tue 15 Feb 2011 16:35, Jan Nieuwenhuizen <janneke-list@xs4all.nl> writes: > >> From: Jan Nieuwenhuizen <janneke@gnu.org> >> >> 2011-02-15 Jan Nieuwenhuizen <janneke@gnu.org> >> >> * libguile/filesys.c [MINGW32] (my_rename): Add implementation >> that deletes target if it exists. Fixes rename behaviour. > > This patch has the obvious race condition. Why does the `rename' > library routine not work on Win32? The man page says > > CONFORMING TO > 4.3BSD, C89, C99, POSIX.1-2001. > > so I am surprised about this behavior. C99 actually doesn't specify what happens if the destination file exists. I have added the `rename' gnulib module, which should fix this particular issue (modulo canonicalize-lgpl, of course). Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
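As an aside on the race discussed in this subthread: one way to avoid the separate unlink step is to ask the system to replace the target in a single call. The sketch below is not what gnulib's `rename' module does internally (that is not shown in this thread); rename_replacing is a hypothetical name, and the Windows branch assumes the Win32 MoveFileExA call with MOVEFILE_REPLACE_EXISTING is acceptable for the files in question. Note that this branch does not set errno on failure.

```c
#include <stdio.h>
#ifdef _WIN32
# include <windows.h>
#endif

/* Hypothetical helper: rename OLDNAME to NEWNAME, replacing NEWNAME if
   it already exists.  Returns 0 on success, -1 on failure.  */
static int
rename_replacing (const char *oldname, const char *newname)
{
#ifdef _WIN32
  /* MOVEFILE_REPLACE_EXISTING asks the system to replace the target
     itself, so there is no window between a delete and a rename.  */
  return MoveFileExA (oldname, newname, MOVEFILE_REPLACE_EXISTING)
         ? 0 : -1;
#else
  /* POSIX specifies that rename replaces an existing target.  */
  return rename (oldname, newname);
#endif
}
```

With the stat/unlink/rename sequence from the patch, another process can create or remove the target between the steps. A single replacing call leaves that decision to the operating system.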
* [PATCH 5/5] [mingw]: Use $LOCALAPPDATA as a possible root for cachedir. 2011-02-15 15:34 mingw runtime patches Jan Nieuwenhuizen ` (3 preceding siblings ...) 2011-02-15 15:35 ` [PATCH 4/5] [mingw]: Delete existing target file before attempting rename Jan Nieuwenhuizen @ 2011-02-15 15:35 ` Jan Nieuwenhuizen 2011-05-01 11:42 ` Andy Wingo 4 siblings, 1 reply; 47+ messages in thread From: Jan Nieuwenhuizen @ 2011-02-15 15:35 UTC (permalink / raw) To: guile-devel [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1079 bytes --] From: Jan Nieuwenhuizen <janneke@gnu.org> 2011-02-15 Jan Nieuwenhuizen <janneke@gnu.org> * libguile/load.c (scm_init_load_path) [MINGW32]: Use $LOCALAPPDATA to avoid having a NULL cachedir, while still allowing override by using $XDG_CACHE_HOME. --- libguile/load.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/libguile/load.c b/libguile/load.c index c2380b9..48a28fe 100644 --- a/libguile/load.c +++ b/libguile/load.c @@ -283,6 +283,10 @@ scm_init_load_path () if ((e = getenv ("XDG_CACHE_HOME"))) snprintf (cachedir, sizeof(cachedir), "%s/" FALLBACK_DIR, e); +#ifdef __MINGW32__ + else if ((e = getenv ("LOCALAPPDATA"))) + snprintf (cachedir, sizeof (cachedir), "%s/.cache/" FALLBACK_DIR, e); +#endif /* __MINGW32__ */ else if ((e = getenv ("HOME"))) snprintf (cachedir, sizeof(cachedir), "%s/.cache/" FALLBACK_DIR, e); #ifdef HAVE_GETPWENT -- 1.7.1 -- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl ^ permalink raw reply related [flat|nested] 47+ messages in thread
* Re: [PATCH 5/5] [mingw]: Use $LOCALAPPDATA as a possible root for cachedir. 2011-02-15 15:35 ` [PATCH 5/5] [mingw]: Use $LOCALAPPDATA as a possible root for cachedir Jan Nieuwenhuizen @ 2011-05-01 11:42 ` Andy Wingo 2011-05-20 14:03 ` Jan Nieuwenhuizen 0 siblings, 1 reply; 47+ messages in thread From: Andy Wingo @ 2011-05-01 11:42 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel On Tue 15 Feb 2011 16:35, Jan Nieuwenhuizen <janneke-list@xs4all.nl> writes: > 2011-02-15 Jan Nieuwenhuizen <janneke@gnu.org> > > * libguile/load.c (scm_init_load_path) [MINGW32]: Use $LOCALAPPDATA > to avoid having a NULL cachedir, while still allowing override by > using $XDG_CACHE_HOME. What sets LOCALAPPDATA? If it is the right thing on Windows then I am OK with applying this patch. Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 5/5] [mingw]: Use $LOCALAPPDATA as a possible root for cachedir. 2011-05-01 11:42 ` Andy Wingo @ 2011-05-20 14:03 ` Jan Nieuwenhuizen 2011-06-16 22:02 ` Andy Wingo 0 siblings, 1 reply; 47+ messages in thread From: Jan Nieuwenhuizen @ 2011-05-20 14:03 UTC (permalink / raw) To: Andy Wingo; +Cc: guile-devel Andy Wingo writes: >> * libguile/load.c (scm_init_load_path) [MINGW32]: Use $LOCALAPPDATA >> to avoid having a NULL cachedir, while still allowing override by >> using $XDG_CACHE_HOME. > > What sets LOCALAPPDATA? If it is the right thing on Windows then I am > OK with applying this patch. Asking google, it seems that this is a newer Windows versions thing. Quite probably using plain $APPDATA is better. Whether either one is the right thing on windows -- if there's even such a thing -- I really don't have a clue, you would need to ask a windows guru for that. Greetings, Jan -- Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.nl ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [PATCH 5/5] [mingw]: Use $LOCALAPPDATA as a possible root for cachedir. 2011-05-20 14:03 ` Jan Nieuwenhuizen @ 2011-06-16 22:02 ` Andy Wingo 0 siblings, 0 replies; 47+ messages in thread From: Andy Wingo @ 2011-06-16 22:02 UTC (permalink / raw) To: Jan Nieuwenhuizen; +Cc: guile-devel On Fri 20 May 2011 16:03, Jan Nieuwenhuizen <janneke@gnu.org> writes: > Andy Wingo writes: > >>> * libguile/load.c (scm_init_load_path) [MINGW32]: Use $LOCALAPPDATA >>> to avoid having a NULL cachedir, while still allowing override by >>> using $XDG_CACHE_HOME. >> >> What sets LOCALAPPDATA? If it is the right thing on Windows then I am >> OK with applying this patch. > > Asking google, it seems that this is a newer Windows versions thing. > Quite probably using plain $APPDATA is better. Whether either one is > the right thing on windows -- if there's even such a thing -- I really > don't have a clue, you would need to ask a windows guru for that. OK. Applied your patch with some tweaks. Thanks! Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 47+ messages in thread
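The lookup order from the patch above, pulled out into a standalone sketch. This is not the code as applied (Andy mentions unspecified tweaks), the getpwent fallback from load.c is left out, and choose_cachedir and EXAMPLE_FALLBACK_DIR are hypothetical names; the real value of FALLBACK_DIR is not shown in this thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for Guile's FALLBACK_DIR.  */
#define EXAMPLE_FALLBACK_DIR "guile/ccache"

/* Fill BUF with a cache directory name.  The order follows the patch:
   $XDG_CACHE_HOME first, then $LOCALAPPDATA (MinGW only), then $HOME.
   Returns 0 on success, -1 if no variable is set or BUF (of SIZE
   bytes) is too small.  */
static int
choose_cachedir (char *buf, size_t size)
{
  const char *e;
  int n;

  if ((e = getenv ("XDG_CACHE_HOME")) != NULL)
    n = snprintf (buf, size, "%s/" EXAMPLE_FALLBACK_DIR, e);
#ifdef __MINGW32__
  else if ((e = getenv ("LOCALAPPDATA")) != NULL)
    n = snprintf (buf, size, "%s/.cache/" EXAMPLE_FALLBACK_DIR, e);
#endif /* __MINGW32__ */
  else if ((e = getenv ("HOME")) != NULL)
    n = snprintf (buf, size, "%s/.cache/" EXAMPLE_FALLBACK_DIR, e);
  else
    return -1;

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Checking $XDG_CACHE_HOME first is what keeps the user override working on Windows, and the -1 return makes the "no cachedir" case explicit instead of leaving the buffer unset.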