* Re: Fix 'dirname' and 'basename' on MS-Windows
@ 2014-07-09 15:16 Nelson H. F. Beebe
2014-07-09 16:49 ` Eli Zaretskii
0 siblings, 1 reply; 5+ messages in thread
From: Nelson H. F. Beebe @ 2014-07-09 15:16 UTC (permalink / raw)
To: guile-devel; +Cc: beebe

Eli Zaretskii <eliz@gnu.org> comments on misbehavior (or unexpected
behavior) of guile's (basename ...) function:

> (basename ".foo" ".foo") => "."
> (basename "_foo" "_foo") => "."
>
> Also, isn't the following result wrong as well?
>
> (basename "/") => "/"

According to built-in documentation:

guile> (help basename)
`basename' is a primitive procedure in the (guile) module.

 -- Scheme Procedure: basename filename [suffix]
     Return the base name of the file name FILENAME. The base name is
     the file name without any directory components. If SUFFIX is
     provided, and is equal to the end of BASENAME, it is removed also.

So, let us see what these produce:

guile> (basename ".foo" ".foo")
"."
guile> (basename "_foo" "_foo")
"."

The documentation clearly indicates that the matching suffix is
removed, in which case the result should be an empty string. The
function therefore does not follow its documentation, and one or the
other is wrong.

However, the Unix (and POSIX) basename and dirname commands have been
around since at least 1979 (I found them in my Unix 7th edition
manuals from that year), and I think it would be wise to follow the
POSIX standard for their implementation:

% basename /tmp/x/y/z/foo.bar
foo.bar

% basename /tmp/x/y/z/foo.bar .bar
foo

% basename /tmp/x/y/z/foo.bar bar
foo.

% basename foo.bar .bar
foo

% basename .bar .bar
.bar

The possibly-surprising behaviour of that last example is due to the
wording in POSIX (IEEE Std 1003.1-2001):

>> ...
>> 6. If the suffix operand is present, is not identical to the
>> characters remaining in string, and is identical to a suffix of the
>> characters remaining in string, the suffix suffix shall be removed
>> from string. Otherwise, string is not modified by this step. It
>> shall not be considered an error if suffix is not found in string.
>> ...

The phrase `is not identical to the characters remaining in string'
means that ".bar" is the result, rather than "".

Also notice that POSIX defines a basename() library function, but it
takes only one argument, and thus does not have the same behavior as
the basename command when the latter has two arguments. Because guile
offers a choice of 1 or 2 arguments, its basename function was
presumably modeled on the POSIX command, rather than the POSIX library
function.

Also, in guile documentation, would it not be better to replace "file
name", "base name", FILENAME, and BASENAME with the standard POSIX
terminology "pathname" and "filename"?

/tmp/x/y/z/foo.bar    # a pathname
/tmp/x/y/z            # the path to (or directory of) that pathname
foo.bar               # the filename of that pathname

POSIX says this about those names:

>> ...
>> 3.2 Absolute Pathname
>>
>> A pathname beginning with a single or more than two
>> slashes; see also Section 3.266
>> ...
>> ...
>> 3.40 Basename
>>
>> The final, or only, filename in a pathname.
>> ...
>> ...
>> 3.169 Filename
>>
>> A name consisting of 1 to {NAME_MAX} bytes used to name a
>> file. The characters composing the name may be selected
>> from the set of all character values excluding the slash
>> character and the null byte. The filenames dot and dot-dot
>> have special meaning. A filename is sometimes referred to
>> as a ``pathname component''.
>> ...
>>
>> ...
>> 3.266 Pathname
>>
>> A character string that is used to identify a file. In the
>> context of IEEE Std 1003.1-2001, a pathname consists of, at
>> most, {PATH_MAX} bytes, including the terminating null
>> byte. It has an optional beginning slash, followed by zero or
>> more filenames separated by slashes. A pathname may
>> optionally contain one or more trailing slashes. Multiple
>> successive slashes are considered to be the same as one
>> slash.
>> ...
>> ...
>> 3.319 Relative Pathname
>>
>> A pathname not beginning with a slash.
>> ...
>>
>> ...
>> 4.11 Pathname Resolution
>>
>> ... long complex text omitted ...
>>
>> A pathname consisting of a single slash shall resolve to the root
>> directory of the process. A null pathname shall not be successfully
>> resolved. A pathname that begins with two successive slashes may be
>> interpreted in an implementation-defined manner, although more than
>> two leading slashes shall be treated as a single slash.
>> ...

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

^ permalink raw reply [flat|nested] 5+ messages in thread
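The POSIX wording quoted in the message above can be made concrete with a small C sketch of the `basename` command's steps, including the step 6 exception. This is only an illustration under stated assumptions: the function name `basename_cmd` and its buffer-based signature are hypothetical (they are not Guile's or any libc's API), only '/' is treated as a separator, an empty operand yields "." (POSIX leaves the choice between "." and "" unspecified), and "//" is treated like "/" (POSIX makes that case implementation-defined).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Guile or libc: apply the steps of the
   POSIX `basename string [suffix]` utility to STRING and store the result
   in OUT (capacity OUTSZ).  SUFFIX may be NULL.  Returns 0 on success,
   -1 if OUT is too small.  */
int
basename_cmd (const char *string, const char *suffix, char *out, size_t outsz)
{
  size_t len = strlen (string);
  const char *start;
  size_t n;

  if (len == 0)
    {
      /* Assumption: POSIX allows "." or "" here; this sketch picks ".".  */
      start = ".";
      n = 1;
    }
  else
    {
      /* Remove trailing slashes.  */
      while (len > 0 && string[len - 1] == '/')
        len--;

      if (len == 0)
        {
          /* The operand consisted entirely of slashes.  */
          start = "/";
          n = 1;
        }
      else
        {
          /* Drop everything up to and including the last slash.  */
          size_t i = len;
          while (i > 0 && string[i - 1] != '/')
            i--;
          start = string + i;
          n = len - i;

          /* Step 6: remove SUFFIX only if it matches the tail of the
             remaining characters and is strictly shorter than them,
             i.e. "not identical to the characters remaining".  */
          if (suffix != NULL)
            {
              size_t slen = strlen (suffix);
              if (slen > 0 && slen < n
                  && memcmp (start + n - slen, suffix, slen) == 0)
                n -= slen;
            }
        }
    }

  if (n + 1 > outsz)
    return -1;
  memcpy (out, start, n);
  out[n] = '\0';
  return 0;
}
```

With a sufficiently large buffer, calling `basename_cmd (".bar", ".bar", buf, sizeof buf)` leaves ".bar" in `buf`, because the `slen < n` test refuses to strip a suffix that is the whole remaining name, whereas "/tmp/x/y/z/foo.bar" with suffix ".bar" gives "foo".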
* Re: Fix 'dirname' and 'basename' on MS-Windows
2014-07-09 15:16 Fix 'dirname' and 'basename' on MS-Windows Nelson H. F. Beebe
@ 2014-07-09 16:49 ` Eli Zaretskii
0 siblings, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2014-07-09 16:49 UTC (permalink / raw)
To: Nelson H. F. Beebe; +Cc: guile-devel

> Date: Wed, 9 Jul 2014 09:16:35 -0600 (MDT)
> From: "Nelson H. F. Beebe" <beebe@math.utah.edu>
> Cc: beebe@math.utah.edu
>
> >> 6. If the suffix operand is present, is not identical to the
> >> characters remaining in string, and is identical to a suffix of the
> >> characters remaining in string, the suffix suffix shall be removed
> >> from string. Otherwise, string is not modified by this step. It
> >> shall not be considered an error if suffix is not found in string.
> >> ...
>
> The phrase `is not identical to the characters remaining in string'
> means that ".bar" is the result, rather than "".

Any idea why this exception makes sense?

> Also, in guile documentation, would it not be better to replace "file
> name", "base name", FILENAME, and BASENAME with the standard POSIX
> terminology "pathname" and "filename"?

GNU Coding Standards frown upon using "path" for anything that is not
a PATH-style directory list. In GNU terminology, "filename" can refer
to both full (a.k.a. "absolute") and relative file names.

^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Windows file name separators
@ 2014-07-01 15:38 Ludovic Courtès
2014-07-02 16:13 ` Fix 'dirname' and 'basename' on MS-Windows Eli Zaretskii
0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2014-07-01 15:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: guile-devel
Eli Zaretskii <eliz@gnu.org> skribis:
>> From: ludo@gnu.org (Ludovic Courtès)
>> Cc: guile-devel@gnu.org
>> Date: Tue, 01 Jul 2014 11:36:32 +0200
>>
>> Eli Zaretskii <eliz@gnu.org> skribis:
>>
>> > In Emacs, some of the file and directory names recorded during the
>> > build and startup come from argv[0] and from prefix-relative directory
>> > names computed by configure. Is there something similar in Guile, and
>> > if so, where do I find that?
>>
>> The default %load-path uses absolute directory names based on what
>> ./configure computed.
>
> Thanks. Where do I find the code which does that? I'd like to review
> it with the issue at hand in mind.
You can look at load.c, and in particular scm_init_load_path.
Ludo’.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Fix 'dirname' and 'basename' on MS-Windows
2014-07-01 15:38 Windows file name separators Ludovic Courtès
@ 2014-07-02 16:13 ` Eli Zaretskii
2014-07-09 14:22 ` Ludovic Courtès
0 siblings, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2014-07-02 16:13 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel

These 2 functions don't deal correctly with Windows file names with
drive letters and with UNCs. The patch below fixes that.

Incidentally, isn't the line in scm_basename marked below wrong?

  if (i == end)
    {
      if (len > 0 && is_file_name_separator (scm_c_string_ref (filename, 0)))
        return scm_c_substring (filename, 0, 1);
      else
        return scm_dot_string;   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    }
  else
    return scm_c_substring (filename, i+1, end+1);

It is responsible for the following strange results:

  (basename ".foo" ".foo") => "."
  (basename "_foo" "_foo") => "."

Also, isn't the following result wrong as well?

  (basename "/") => "/"

I think all of these should return the empty string, "".

Here's the proposed patch for supporting Windows file names.

--- libguile/filesys.c~1	2014-06-29 16:13:30 +0300
+++ libguile/filesys.c	2014-07-02 14:03:08 +0300
@@ -448,6 +448,18 @@ is_file_name_separator (SCM c)
   return 0;
 }
 
+static int
+is_drive_letter (SCM c)
+{
+#ifdef __MINGW32__
+  if (SCM_CHAR (c) >= 'a' && SCM_CHAR (c) <= 'z')
+    return 1;
+  else if (SCM_CHAR (c) >= 'A' && SCM_CHAR (c) <= 'Z')
+    return 1;
+#endif
+  return 0;
+}
+
 SCM_DEFINE (scm_stat, "stat", 1, 1, 0,
             (SCM object, SCM exception_on_error),
 	    "Return an object containing various information about the file\n"
@@ -1518,24 +1530,60 @@ SCM_DEFINE (scm_dirname, "dirname", 1, 0
 {
   long int i;
   unsigned long int len;
+  /* Length of prefix before the top-level slash. Always zero on
+     Posix hosts, but may be non-zero on Windows. */
+  long prefix_len = 0;
+  int is_unc = 0;
+  unsigned long unc_end = 0;
 
   SCM_VALIDATE_STRING (1, filename);
 
   len = scm_i_string_length (filename);
+  if (len >= 2
+      && is_drive_letter (scm_c_string_ref (filename, 0))
+      && scm_is_eq (scm_c_string_ref (filename, 1), SCM_MAKE_CHAR (':')))
+    {
+      prefix_len = 1;
+      if (len > 2 && is_file_name_separator (scm_c_string_ref (filename, 2)))
+        prefix_len++;
+    }
+#ifdef __MINGW32__
+  if (len > 1
+      && is_file_name_separator (scm_c_string_ref (filename, 0))
+      && is_file_name_separator (scm_c_string_ref (filename, 1)))
+    {
+      is_unc = 1;
+      prefix_len = 1;
+    }
+#endif
 
   i = len - 1;
 
-  while (i >= 0 && is_file_name_separator (scm_c_string_ref (filename, i)))
+  while (i >= prefix_len
+         && is_file_name_separator (scm_c_string_ref (filename, i)))
     --i;
-  while (i >= 0 && !is_file_name_separator (scm_c_string_ref (filename, i)))
+  if (is_unc)
+    unc_end = i + 1;
+  while (i >= prefix_len
+         && !is_file_name_separator (scm_c_string_ref (filename, i)))
     --i;
-  while (i >= 0 && is_file_name_separator (scm_c_string_ref (filename, i)))
+  while (i >= prefix_len
+         && is_file_name_separator (scm_c_string_ref (filename, i)))
     --i;
 
-  if (i < 0)
+  if (i < prefix_len)
     {
-      if (len > 0 && is_file_name_separator (scm_c_string_ref (filename, 0)))
-        return scm_c_substring (filename, 0, 1);
+      if (is_unc)
+        return scm_c_substring (filename, 0, unc_end);
+      else if (len > prefix_len
+               && is_file_name_separator (scm_c_string_ref (filename, prefix_len)))
+        return scm_c_substring (filename, 0, prefix_len + 1);
+#ifdef __MINGW32__
+      else if (len > prefix_len
+               && scm_is_eq (scm_c_string_ref (filename, 1),
+                             SCM_MAKE_CHAR (':')))
+        return scm_c_substring (filename, 0, prefix_len + 1);
+#endif
       else
         return scm_dot_string;
     }
@@ -1553,6 +1601,9 @@ SCM_DEFINE (scm_basename, "basename", 1,
 #define FUNC_NAME s_scm_basename
 {
   int i, j, len, end;
+  /* Length of prefix before the top-level slash. Always zero on
+     Posix hosts, but may be non-zero on Windows. */
+  long prefix_len = 0;
 
   SCM_VALIDATE_STRING (1, filename);
   len = scm_i_string_length (filename);
@@ -1564,11 +1615,17 @@ SCM_DEFINE (scm_basename, "basename", 1,
       SCM_VALIDATE_STRING (2, suffix);
       j = scm_i_string_length (suffix) - 1;
     }
+  if (len >= 2
+      && is_drive_letter (scm_c_string_ref (filename, 0))
+      && scm_is_eq (scm_c_string_ref (filename, 1), SCM_MAKE_CHAR (':')))
+    prefix_len = 2;
+
   i = len - 1;
-  while (i >= 0 && is_file_name_separator (scm_c_string_ref (filename, i)))
+  while (i >= prefix_len
+         && is_file_name_separator (scm_c_string_ref (filename, i)))
     --i;
   end = i;
-  while (i >= 0 && j >= 0
+  while (i >= prefix_len && j >= 0
          && (scm_i_string_ref (filename, i)
              == scm_i_string_ref (suffix, j)))
     {
@@ -1577,12 +1634,20 @@ SCM_DEFINE (scm_basename, "basename", 1,
     }
   if (j == -1)
     end = i;
-  while (i >= 0 && !is_file_name_separator (scm_c_string_ref (filename, i)))
+  while (i >= prefix_len
+         && !is_file_name_separator (scm_c_string_ref (filename, i)))
     --i;
   if (i == end)
     {
-      if (len > 0 && is_file_name_separator (scm_c_string_ref (filename, 0)))
-        return scm_c_substring (filename, 0, 1);
+      if (len > prefix_len
+          && is_file_name_separator (scm_c_string_ref (filename, prefix_len)))
+        return scm_c_substring (filename, 0, prefix_len + 1);
+#ifdef __MINGW32__
+      else if (len > prefix_len
+               && scm_is_eq (scm_c_string_ref (filename, 1),
+                             SCM_MAKE_CHAR (':')))
+        return scm_c_substring (filename, 0, prefix_len + 1);
+#endif
       else
         return scm_dot_string;
     }

^ permalink raw reply [flat|nested] 5+ messages in thread
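The central idea of the patch above, never scanning backwards past a drive-letter prefix, can be illustrated on plain C strings. This is a sketch only, not the patch itself: the helper names `is_sep`, `drive_prefix_len`, and `last_component_offset` are hypothetical, it works on `char *` rather than Guile's SCM strings, it applies the Windows rules unconditionally instead of under `#ifdef __MINGW32__`, and it does not attempt the UNC handling.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: both '/' and '\\' separate components,
   as in Windows file names.  */
static int
is_sep (char c)
{
  return c == '/' || c == '\\';
}

/* Return 2 if NAME starts with an ASCII drive letter followed by ':',
   otherwise 0.  The short-circuit keeps name[1] unread for "".  */
static size_t
drive_prefix_len (const char *name)
{
  char c = name[0];
  int letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

  return (letter && name[1] == ':') ? 2 : 0;
}

/* Return the offset in NAME at which its last component begins.
   Both backward scans stop at the drive prefix, so "C:foo" yields
   the offset of "foo" rather than treating "C:foo" as one component.  */
size_t
last_component_offset (const char *name)
{
  size_t prefix = drive_prefix_len (name);
  size_t i = strlen (name);

  /* Skip trailing separators.  */
  while (i > prefix && is_sep (name[i - 1]))
    i--;
  /* Skip the last component itself.  */
  while (i > prefix && !is_sep (name[i - 1]))
    i--;
  return i;
}
```

For example, `last_component_offset ("C:foo")` is 2, so the component is "foo"; a scan bounded only by 0 would run through the colon and report the whole string as the base name.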
* Re: Fix 'dirname' and 'basename' on MS-Windows
2014-07-02 16:13 ` Fix 'dirname' and 'basename' on MS-Windows Eli Zaretskii
@ 2014-07-09 14:22 ` Ludovic Courtès
2014-07-09 14:53 ` Eli Zaretskii
0 siblings, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2014-07-09 14:22 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: guile-devel

Eli Zaretskii <eliz@gnu.org> skribis:

> These 2 functions don't deal correctly with Windows file names with
> drive letters and with UNCs. The patch below fixes that.
>
> Incidentally, isn't the line in scm_basename marked below wrong?
>
>   if (i == end)
>     {
>       if (len > 0 && is_file_name_separator (scm_c_string_ref (filename, 0)))
>         return scm_c_substring (filename, 0, 1);
>       else
>         return scm_dot_string;   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>     }
>   else
>     return scm_c_substring (filename, i+1, end+1);
>
> It is responsible for the following strange results:
>
>   (basename ".foo" ".foo") => "."
>   (basename "_foo" "_foo") => "."
>
> Also, isn't the following result wrong as well?
>
>   (basename "/") => "/"
>
> I think all of these should return the empty string, "".

(I think I forgot about this message, sorry.)

It seems that Gnulib’s dirname-lgpl and basename-lgpl modules do what
you want. Could you confirm?

If that’s the case, I’ll import them. If you want to commit
Windows-specific tests, that’s even better.

Thanks,
Ludo’.

^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Fix 'dirname' and 'basename' on MS-Windows
2014-07-09 14:22 ` Ludovic Courtès
@ 2014-07-09 14:53 ` Eli Zaretskii
0 siblings, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2014-07-09 14:53 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guile-devel

> From: ludo@gnu.org (Ludovic Courtès)
> Cc: guile-devel@gnu.org
> Date: Wed, 09 Jul 2014 16:22:02 +0200
>
> It seems that Gnulib’s dirname-lgpl and basename-lgpl modules do what
> you want. Could you confirm?

Yes, they do the job. But since we want to support UNCs, we need to
define DOUBLE_SLASH_IS_DISTINCT_ROOT to a non-zero value, because
Gnulib doesn't.

> If that’s the case, I’ll import them. If you want to commit
> Windows-specific tests, that’s even better.

There are no tests of these functions currently. Do you mean you want
a Windows-only test?

Thanks.

^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-07-09 16:49 UTC | newest]

Thread overview: 5+ messages
2014-07-09 15:16 Fix 'dirname' and 'basename' on MS-Windows Nelson H. F. Beebe
2014-07-09 16:49 ` Eli Zaretskii
-- strict thread matches above, loose matches on Subject: below --
2014-07-01 15:38 Windows file name separators Ludovic Courtès
2014-07-02 16:13 ` Fix 'dirname' and 'basename' on MS-Windows Eli Zaretskii
2014-07-09 14:22 ` Ludovic Courtès
2014-07-09 14:53 ` Eli Zaretskii