unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#57102: 29.0.50; Peculiar file-name-split edge case
@ 2022-08-10  8:24 Philip Kaludercic
  2022-08-12 15:35 ` Lars Ingebrigtsen
  2022-08-13 17:08 ` Mattias Engdegård
  0 siblings, 2 replies; 23+ messages in thread
From: Philip Kaludercic @ 2022-08-10  8:24 UTC (permalink / raw)
  To: 57102

I am not sure if this is intentional, but the new `file-name-split'
is a bit unintuitive in this edge-case:

   (file-name-split "/")        → ("" "" "")

while

   (file-name-split "/a")       → ("" "a")
   (file-name-split "a/")       → ("a" "")
   (file-name-split "a/b")      → ("a" "b")

another few peculiar cases might be

   (file-name-split "//")       → ("/" "" "")
   (file-name-split "///")      → ("" "" "")
   (file-name-split "////")     → ("" "" "")

as I'd expect '/' (in the first case) to never be part of a file name
(at least on a *nix system).

This all appears to happen only if there is no actual file name to be
found:

   (file-name-split "a/")       → ("a" "")
   (file-name-split "a//")      → ("a" "")
   (file-name-split "a///")     → ("a" "")

One simple solution might just be to remove all empty strings from the
return value of `file-name-split', as to my knowledge empty file names
are not allowed (?):


diff --git a/lisp/files.el b/lisp/files.el
index 65f9039b33..c5817be8da 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -5170,7 +5170,7 @@ file-name-split
                   (substring dir 0 -1))
                 components)
           (setq filename nil))))
-    components))
+    (delq "" components)))
 
 (defun file-parent-directory (filename)
   "Return the directory name of the parent directory of FILENAME.

That should take care of all the inconsistencies I see here, and it
would also preserve the usual intuition that "path///to//file"
designates the same path as "path/to/file".
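
The idea of dropping empty components can be sketched outside of
files.el as follows.  This is only an illustration: the helper name
`my-nonempty-components' is hypothetical, and `split-string' stands in
for `file-name-split' so that the result does not depend on the Emacs
version.  The sketch uses `delete' (which compares with `equal') on the
assumption that `eq'-based removal is not a dependable way to match
strings.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-nonempty-components (components)
  "Return a copy of COMPONENTS with every empty string removed."
  ;; `delete' compares with `equal'; copy first so the caller's list
  ;; is left untouched.
  (delete "" (copy-sequence components)))

(my-nonempty-components (split-string "path///to//file" "/"))
;; ⇒ ("path" "to" "file")
```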

In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.34, cairo version 1.17.6)
 of 2022-08-01 built on rhea
Repository revision: 47f1cae83c269ea43d6b208e055ce536c017856f
Repository branch: feature/package+vc
System Description: Fedora Linux 36 (Workstation Edition)

Configured using:
 'configure --with-pgtk --with-native-compilation --with-imagemagick'

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ IMAGEMAGICK
JPEG JSON LIBSELINUX LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY
INOTIFY PDUMPER PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS XIM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-10  8:24 bug#57102: 29.0.50; Peculiar file-name-split edge case Philip Kaludercic
@ 2022-08-12 15:35 ` Lars Ingebrigtsen
  2022-08-12 15:56   ` Philip Kaludercic
  2022-08-13 17:08 ` Mattias Engdegård
  1 sibling, 1 reply; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-12 15:35 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: 57102

Philip Kaludercic <philipk@posteo.net> writes:

> I am not sure if this is intentional, but the new `file-name-split'
> is a bit unintuitive in this edge-case:
>
>    (file-name-split "/")        → ("" "" "")
>
> while
>
>    (file-name-split "/a")       → ("" "a")
>    (file-name-split "a/")       → ("a" "")
>    (file-name-split "a/b")      → ("a" "b")

The logic is that

(equal (string-join (file-name-split foo) "/") foo)

is supposed to be always true.  I see that's not the case in the "/"
case, so that needs fixing.
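
A minimal sketch for checking this invariant on a given input.  The
predicate name `my-split-round-trips-p' and its optional SPLITTER
argument are assumptions made for illustration; what `file-name-split'
itself returns depends on the Emacs version in use, so no results are
claimed for it here.

```elisp
(require 'subr-x)   ; `string-join' lives here in older Emacs versions

;; Hypothetical helper, not part of Emacs.
(defun my-split-round-trips-p (name &optional splitter)
  "Return non-nil if splitting NAME and re-joining with \"/\" yields NAME.
SPLITTER is a function of one string argument and defaults to
`file-name-split'."
  (equal (string-join (funcall (or splitter #'file-name-split) name) "/")
         name))

;; A purely textual splitter satisfies the invariant by construction:
(my-split-round-trips-p "/" (lambda (s) (split-string s "/")))   ; ⇒ t
```

Calling `(my-split-round-trips-p "/")' without a splitter tests the
installed `file-name-split' directly.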






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-12 15:35 ` Lars Ingebrigtsen
@ 2022-08-12 15:56   ` Philip Kaludercic
  2022-08-12 15:59     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 23+ messages in thread
From: Philip Kaludercic @ 2022-08-12 15:56 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 57102

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Philip Kaludercic <philipk@posteo.net> writes:
>
>> I am not sure if this is intentional, but the new `file-name-split'
>> is a bit unintuitive in this edge-case:
>>
>>    (file-name-split "/")        → ("" "" "")
>>
>> while
>>
>>    (file-name-split "/a")       → ("" "a")
>>    (file-name-split "a/")       → ("a" "")
>>    (file-name-split "a/b")      → ("a" "b")
>
> The logic is that
>
> (equal (string-join (file-name-split foo) "/") foo)

How sensible is this in the first place?  Shouldn't it rather be
something like

(file-equal-p (apply #'file-name-concat (file-name-split filename)) filename)

[ which is currently likewise not given ]

Or to put it differently, who does the preceding empty string benefit if
we ignore the condition mentioned in the docstring?  Are there any
real-world use-cases?

> is supposed to be always true.  I see that's not the case in the "/"
> case, so that needs fixing.





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-12 15:56   ` Philip Kaludercic
@ 2022-08-12 15:59     ` Lars Ingebrigtsen
  2022-08-12 16:29       ` Philip Kaludercic
  0 siblings, 1 reply; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-12 15:59 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: 57102

Philip Kaludercic <philipk@posteo.net> writes:

> How sensible is this in the first place?  Shouldn't it rather be
> something like
>
> (file-equal-p (apply #'file-name-concat (file-name-split filename)) filename)
>
> [ which is currently likewise not given ]
>
> Or to put it differently, who does the preceding empty string benefit if
> we ignore the condition mentioned in the docstring?  Are there any
> real-world use-cases?

You need to be able to tell (file-name-split "a/b") => ("a" "b") and
(file-name-split "/a/b") => ("" "a" "b") apart.






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-12 15:59     ` Lars Ingebrigtsen
@ 2022-08-12 16:29       ` Philip Kaludercic
  2022-08-13 11:44         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 23+ messages in thread
From: Philip Kaludercic @ 2022-08-12 16:29 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 57102

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Philip Kaludercic <philipk@posteo.net> writes:
>
>> How sensible is this in the first place?  Shouldn't it rather be
>> something like
>>
>> (file-equal-p (apply #'file-name-concat (file-name-split filename)) filename)
>>
>> [ which is currently likewise not given ]
>>
>> Or to put it differently, who does the preceding empty string benefit if
>> we ignore the condition mentioned in the docstring?  Are there any
>> real-world use-cases?
>
> You need to be able to tell (file-name-split "a/b") => ("a" "b") and
> (file-name-split "/a/b") => ("" "a" "b") apart.

Could one instead prefix the list with a symbol (either `absolute' or
`relative') to distinguish the two cases.  Or do you think that would
just make it more complicated.





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-12 16:29       ` Philip Kaludercic
@ 2022-08-13 11:44         ` Lars Ingebrigtsen
  2022-08-13 13:24           ` Philip Kaludercic
  2022-08-13 18:06           ` Augusto Stoffel
  0 siblings, 2 replies; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-13 11:44 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: 57102

Philip Kaludercic <philipk@posteo.net> writes:

> Could one instead prefix the list with a symbol (either `absolute' or
> `relative') to distinguish the two cases.  Or do you think that would
> just make it more complicated.

I think that just makes it way more complicated to use.






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-13 11:44         ` Lars Ingebrigtsen
@ 2022-08-13 13:24           ` Philip Kaludercic
  2022-08-15  5:54             ` Lars Ingebrigtsen
  2022-08-13 18:06           ` Augusto Stoffel
  1 sibling, 1 reply; 23+ messages in thread
From: Philip Kaludercic @ 2022-08-13 13:24 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 57102

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Philip Kaludercic <philipk@posteo.net> writes:
>
>> Could one instead prefix the list with a symbol (either `absolute' or
>> `relative') to distinguish the two cases.  Or do you think that would
>> just make it more complicated.
>
> I think that just makes it way more complicated to use.

Grepping through emacs.git to see where file-name-split is used, it
seems that in most cases all it would change is that you'd require an
additional (cdr ...).  At the same time, there is already an instance in
gnus-search.el that deals with the issue I brought up with the empty
strings.

As another alternative, how about file-name-split takes an optional
argument, and only does what I suggested in that case?  From how I see
file-name-split being used up until now (which doesn't have to mean a
lot), that could be a good compromise.





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-10  8:24 bug#57102: 29.0.50; Peculiar file-name-split edge case Philip Kaludercic
  2022-08-12 15:35 ` Lars Ingebrigtsen
@ 2022-08-13 17:08 ` Mattias Engdegård
  2022-09-25 12:06   ` Stefan Kangas
  1 sibling, 1 reply; 23+ messages in thread
From: Mattias Engdegård @ 2022-08-13 17:08 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: Lars Ingebrigtsen, 57102

The current behaviour of file-name-split is based on a purely textual splitting on "/" which isn't as useful as basing it on actual file name components. For instance, the root component of a Posix file name is "/", not "". Looking at other languages and libraries is very much encouraged; they vary a lot in the amount of thought that has gone into their design.

Ideally we'd have a split function (the name is a placeholder for now) where:

(split "/a/b/c") -> ("/" "a" "b" "c")
(split "a/b/c/") -> ("a" "b" "c" "")
(split "/") -> ("/" "")

and, because repeated slashes mean the same thing as a single one in Posix except at the beginning,
(split "//a//b//") -> ("//" "a" "b" "")
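
The split behaviour listed above could be sketched as follows for
Posix-style names only.  The function name `my-split-file-name' is a
placeholder, and treating exactly two leading slashes as a separate
"//" root is an assumption taken from the examples; Windows drive
letters and UNC paths are not handled.

```elisp
;; Hypothetical sketch, not part of Emacs.
(defun my-split-file-name (name)
  "Split the Posix-style file NAME into a list of components.
A leading \"/\" or \"//\" becomes its own root component, runs of
slashes elsewhere count as one separator, and a trailing slash
is represented by a final empty string."
  (let* ((n (if (string-match "\\`/+" name) (match-end 0) 0))
         ;; Exactly two leading slashes are kept as \"//\"; one, or
         ;; three and more, become \"/\".
         (root (cond ((= n 0) nil)
                     ((= n 2) "//")
                     (t "/")))
         (rest (substring name n))
         (parts (split-string rest "/+" t))
         (trailing (or (string-suffix-p "/" rest)
                       (and root (string= rest "")))))
    (append (and root (list root))
            parts
            (and trailing (list "")))))

(my-split-file-name "/a/b/c")     ; ⇒ ("/" "a" "b" "c")
(my-split-file-name "//a//b//")   ; ⇒ ("//" "a" "b" "")
```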

An accompanying join operation would be the inverse, sort of:
(join "/" "a" "b" "c") -> "/a/b/c"
(join "a" "b" "") -> "a/b/"

where empty strings are ignored except at the end:
(join "" "a" "" "" "b") -> "a/b"
(join "a" "b" "" "") -> "a/b/"

Pre-joined chunks can be joined too:
(join "/a/b" "c/d/" "e") -> "/a/b/c/d/e"

Maybe components with a leading slash start over from the root:
(join "/" "a" "/b" "c") -> "/a/b/c" ?
(join "/" "a" "/b" "c") -> "/b/c" ?

Python's os.path.join does the latter; it's probably a good idea.
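
A sketch of a join with these properties, again for Posix-style names
only.  The name `my-join-file-name' is a placeholder, and the sketch
picks the second alternative above (a component with a leading slash
starts over from the root); that choice is an assumption, not a settled
design.

```elisp
;; Hypothetical sketch, not part of Emacs.
(defun my-join-file-name (&rest components)
  "Join COMPONENTS into one Posix-style file name.
Empty strings are ignored except in final position, where they
request a trailing slash.  A component that starts with \"/\"
discards everything accumulated before it."
  (let ((acc ""))
    (dolist (comp components)
      (cond
       ((string-prefix-p "/" comp) (setq acc comp))   ; start over from root
       ((string= comp ""))                            ; ignore empty strings
       ((string= acc "") (setq acc comp))
       ((string-suffix-p "/" acc) (setq acc (concat acc comp)))
       (t (setq acc (concat acc "/" comp)))))
    ;; A final empty component asks for a trailing slash.
    (if (and components
             (string= (car (last components)) "")
             (not (string= acc ""))
             (not (string-suffix-p "/" acc)))
        (concat acc "/")
      acc)))

(my-join-file-name "/a/b" "c/d/" "e")   ; ⇒ "/a/b/c/d/e"
(my-join-file-name "a" "b" "" "")       ; ⇒ "a/b/"
```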

Now `file-name-concat` almost works like `join` above but not quite. Adding a new function is likely better than making compromises.

On Windows I'd expect that

(split "c:\\a\\b") -> ("c:\\" "a" "b")
(split "c:\\") -> ("c:\\" "")

but it's a bit complicated and then we have all the UNC path variants to deal with; a platform expert should be consulted.






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-13 11:44         ` Lars Ingebrigtsen
  2022-08-13 13:24           ` Philip Kaludercic
@ 2022-08-13 18:06           ` Augusto Stoffel
  2022-08-14  6:24             ` Philip Kaludercic
  1 sibling, 1 reply; 23+ messages in thread
From: Augusto Stoffel @ 2022-08-13 18:06 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Philip Kaludercic, 57102

On Sat, 13 Aug 2022 at 13:44, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Philip Kaludercic <philipk@posteo.net> writes:
>
>> Could one instead prefix the list with a symbol (either `absolute' or
>> `relative') to distinguish the two cases.  Or do you think that would
>> just make it more complicated.
>
> I think that just makes it way more complicated to use.

If instead of a symbol `absolute' or `relative' the first element was
one of the strings "/" or ".", then it would probably be pretty handy to
use.  But then of course the logic would be that

  (file-equal-p (apply #'file-name-concat (file-name-split filename)) filename)





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-13 18:06           ` Augusto Stoffel
@ 2022-08-14  6:24             ` Philip Kaludercic
  0 siblings, 0 replies; 23+ messages in thread
From: Philip Kaludercic @ 2022-08-14  6:24 UTC (permalink / raw)
  To: Augusto Stoffel; +Cc: Lars Ingebrigtsen, 57102

Augusto Stoffel <arstoffel@gmail.com> writes:

> On Sat, 13 Aug 2022 at 13:44, Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
>> Philip Kaludercic <philipk@posteo.net> writes:
>>
>>> Could one instead prefix the list with a symbol (either `absolute' or
>>> `relative') to distinguish the two cases.  Or do you think that would
>>> just make it more complicated.
>>
>> I think that just makes it way more complicated to use.
>
> If instead of a symbol `absolute' or `relative' the first element was
> one of the strings "/" or ".", then it would probably be pretty handy to
> use.  But then of course the logic would be that
>
>   (file-equal-p (apply #'file-name-concat (file-name-split filename)) filename)

"/" or "." sounds a lot better than using symbols (at least on POSIX
systems).





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-13 13:24           ` Philip Kaludercic
@ 2022-08-15  5:54             ` Lars Ingebrigtsen
  2022-08-15 11:28               ` Lars Ingebrigtsen
  2022-08-15 11:35               ` Eli Zaretskii
  0 siblings, 2 replies; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-15  5:54 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: 57102

Philip Kaludercic <philipk@posteo.net> writes:

> Grepping through emacs.git to see where file-name-split is used, it
> seems that in most cases all it would change is that you'd require an
> additional (cdr ...).  At the same time, there is already an instance in
> gnus-search.el that deals with the issue I brought up with the empty
> strings.

Stepping back a bit, the use case for file-name-split is basically "give
me the components of this file name", but it's trying to return
something that can unambiguously be put back together again to get to
the original string.  Perhaps that's misguided, and the interpretation
of the result should be left up to the caller.

That is, perhaps what we want is

(file-name-split "a/b")
=> '("a" "b")

(file-name-split "a/b/")
=> '("a" "b")

(file-name-split "/a/b")
=> '("a" "b")

(file-name-split "//a////b////")
=> '("a" "b")

I'm not sure what

(file-name-split "c:/a/b")

on Windows should return in that case, though.






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-15  5:54             ` Lars Ingebrigtsen
@ 2022-08-15 11:28               ` Lars Ingebrigtsen
  2022-08-15 15:57                 ` Philip Kaludercic
  2022-08-15 11:35               ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-15 11:28 UTC (permalink / raw)
  To: Philip Kaludercic; +Cc: 57102

Lars Ingebrigtsen <larsi@gnus.org> writes:

> That is, perhaps what we want is
>
> (file-name-split "a/b")
> => '("a" "b")
>
> (file-name-split "a/b/")
> => '("a" "b")

But in that case, this is just a synonym for (split-string foo "/" t).
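
For illustration, a small wrapper around that call (the name
`my-file-name-bits' is hypothetical) shows that relative and absolute
names become indistinguishable once empty strings are dropped:

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-file-name-bits (name)
  "Return the non-empty \"/\"-separated parts of NAME."
  (split-string name "/" t))

(my-file-name-bits "a/b")            ; ⇒ ("a" "b")
(my-file-name-bits "/a/b")           ; ⇒ ("a" "b")
(my-file-name-bits "//a////b////")   ; ⇒ ("a" "b")
```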






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-15  5:54             ` Lars Ingebrigtsen
  2022-08-15 11:28               ` Lars Ingebrigtsen
@ 2022-08-15 11:35               ` Eli Zaretskii
  2022-08-17 10:55                 ` Lars Ingebrigtsen
  1 sibling, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2022-08-15 11:35 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: philipk, 57102

> Cc: 57102@debbugs.gnu.org
> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Mon, 15 Aug 2022 07:54:34 +0200
> 
> That is, perhaps what we want is
> 
> (file-name-split "a/b")
> => '("a" "b")
> 
> (file-name-split "a/b/")
> => '("a" "b")
> 
> (file-name-split "/a/b")
> => '("a" "b")

This should return '("/" "a" "b"), or '("/a" "b")  I think.

> (file-name-split "//a////b////")
> => '("a" "b")

And this should return '("/" "a" "b" "/"), or '("/a" "b/").

> I'm not sure what
> 
> (file-name-split "c:/a/b")
> 
> on Windows should return in that case, though.

If you agree with the above, then '("c:/" "a" "b").

If you don't care about the root directory, then '("a" "b")
(a.k.a. "leave the interpretation to the caller").





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-15 11:28               ` Lars Ingebrigtsen
@ 2022-08-15 15:57                 ` Philip Kaludercic
  0 siblings, 0 replies; 23+ messages in thread
From: Philip Kaludercic @ 2022-08-15 15:57 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 57102

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
>> That is, perhaps what we want is
>>
>> (file-name-split "a/b")
>> => '("a" "b")
>>
>> (file-name-split "a/b/")
>> => '("a" "b")
>
> But in that case, this is just a synonym for (split-string foo "/" t).

Plus the ability to do the right thing on DOS/Windows.





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-15 11:35               ` Eli Zaretskii
@ 2022-08-17 10:55                 ` Lars Ingebrigtsen
  0 siblings, 0 replies; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-08-17 10:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: philipk, 57102

Eli Zaretskii <eliz@gnu.org> writes:

>> (file-name-split "/a/b")
>> => '("a" "b")
>
> This should return '("/" "a" "b"), or '("/a" "b")  I think.
>
>> (file-name-split "//a////b////")
>> => '("a" "b")
>
> And this should return '("/" "a" "b" "/"), or '("/a" "b/").
>
>> I'm not sure what
>> 
>> (file-name-split "c:/a/b")
>> 
>> on Windows should return in that case, though.
>
> If you agree with the above, then '("c:/" "a" "b").
>
> If you don't care about the root directory, then '("a" "b")
> (a.k.a. "leave the interpretation to the caller").

I'm not at all sure, and perhaps we should have two different functions
here for the two different use cases that I think this function has.

The use cases, as I see it, are

1) "I just want to know which bits are in the path".  This is used by
dabbrev, which wants to add "a" and "b" here to the abbrevs.  In that
case, "c:/" is not what it wants.

2) "I need to perform some operation on each segment and then put the
file name back together again".  This is what browse-url does -- it
wants to %-encode each segment, but not the "/"s, so it wants to
preserve the absolute/relative distinction.  (But it needs to know that
the "c:/" part is not something to be encoded anyway.)  In that case, I
think we want '("/" "a" "b") and '("c:/" "a" "b").

So perhaps file-name-split should keep the current semantics (but the
bugs identified should be fixed) for 2), but we could add a new function
(or optional parameter) for 1) which would leave "/"/"c:/" off.
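
Use case 2) can be sketched with a purely textual split, which keeps
the leading empty string and therefore the leading slash.  The helper
name `my-encode-path-segments' is hypothetical, this is not how
browse-url is actually implemented, and a Windows drive prefix such as
"c:/" is not treated specially here.

```elisp
(require 'url-util)   ; provides `url-hexify-string'

;; Hypothetical helper, for illustration only.
(defun my-encode-path-segments (name)
  "Percent-encode each \"/\"-separated segment of NAME, keeping the slashes."
  (mapconcat #'url-hexify-string (split-string name "/") "/"))

(my-encode-path-segments "/a b/c")   ; ⇒ "/a%20b/c"
(my-encode-path-segments "a b/c")    ; ⇒ "a%20b/c"
```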





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-08-13 17:08 ` Mattias Engdegård
@ 2022-09-25 12:06   ` Stefan Kangas
  2022-09-26 10:41     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Kangas @ 2022-09-25 12:06 UTC (permalink / raw)
  To: Mattias Engdegård, Philip Kaludercic; +Cc: Lars Ingebrigtsen, 57102

Mattias Engdegård <mattiase@acm.org> writes:

> The current behaviour of file-name-split is based on a purely textual
> splitting on "/" which isn't as useful as basing it on actual file
> name components. For instance, the root component of a Posix file name
> is "/", not "". Looking at other languages and libraries is very much
> encouraged; they vary a lot in the amount of thought that has gone
> into their design.

I also note that we have `string-split' and `string-join', but
`file-name-split' and `file-name-concat'.  I think `file-name-join'
would be a better name.

We should leave behind an alias if we rename it of course, perhaps
forever.





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-09-25 12:06   ` Stefan Kangas
@ 2022-09-26 10:41     ` Lars Ingebrigtsen
  2022-09-26 11:14       ` Stefan Kangas
  0 siblings, 1 reply; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-09-26 10:41 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Mattias Engdegård, Philip Kaludercic, 57102

Stefan Kangas <stefankangas@gmail.com> writes:

> I also note that we have `string-split' and `string-join', but
> `file-name-split' and `file-name-concat'.  I think `file-name-join'
> would be a better name.

I think that makes sense...

> We should leave behind an alias if we rename it of course, perhaps
> forever.

An obsolete alias?





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-09-26 10:41     ` Lars Ingebrigtsen
@ 2022-09-26 11:14       ` Stefan Kangas
  2022-09-26 11:59         ` Lars Ingebrigtsen
  2022-09-26 12:07         ` Mattias Engdegård
  0 siblings, 2 replies; 23+ messages in thread
From: Stefan Kangas @ 2022-09-26 11:14 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Mattias Engdegård, Philip Kaludercic, 57102

Lars Ingebrigtsen <larsi@gnus.org> writes:

> Stefan Kangas <stefankangas@gmail.com> writes:
>
>> I also note that we have `string-split' and `string-join', but
>> `file-name-split' and `file-name-concat'.  I think `file-name-join'
>> would be a better name.
>
> I think that makes sense...
>
>> We should leave behind an alias if we rename it of course, perhaps
>> forever.
>
> An obsolete alias?

Sure.

But, thinking more about this, maybe we would want a new function.
Because this is not fun:

    (apply #'file-name-concat (file-name-split "/foo/bar"))
    => "foo/bar"

I'd like it better if it worked like this:

    (file-name-join (file-name-split "/foo/bar"))
    => "/foo/bar"

And then there are the issues Mattias has pointed out.  One small step
in the right direction would be to make sure that:

    (equal (file-name-split "/foo")
           (file-name-split "//foo"))





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-09-26 11:14       ` Stefan Kangas
@ 2022-09-26 11:59         ` Lars Ingebrigtsen
  2022-09-26 12:07         ` Mattias Engdegård
  1 sibling, 0 replies; 23+ messages in thread
From: Lars Ingebrigtsen @ 2022-09-26 11:59 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Mattias Engdegård, Philip Kaludercic, 57102

Stefan Kangas <stefankangas@gmail.com> writes:

> But, thinking more about this, maybe we would want a new function.
> Because this is not fun:
>
>     (apply #'file-name-concat (file-name-split "/foo/bar"))
>     => "foo/bar"

Uhm...  is that a bug, though?

> I'd like it better if it worked like this:
>
>     (file-name-join (file-name-split "/foo/bar"))
>     => "/foo/bar"
>
> And then there are the issues Mattias has pointed out.  One small step
> in the right direction would be to make sure that:
>
>     (equal (file-name-split "/foo")
>            (file-name-split "//foo"))

That seems like a bug.  :-/





* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-09-26 11:14       ` Stefan Kangas
  2022-09-26 11:59         ` Lars Ingebrigtsen
@ 2022-09-26 12:07         ` Mattias Engdegård
  2022-09-26 12:27           ` Gregory Heytings
  1 sibling, 1 reply; 23+ messages in thread
From: Mattias Engdegård @ 2022-09-26 12:07 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: Lars Ingebrigtsen, 57102, Philip Kaludercic

On 26 Sep 2022, at 13:14, Stefan Kangas <stefankangas@gmail.com> wrote:

> One small step
> in the right direction would be to make sure that:
> 
>    (equal (file-name-split "/foo")
>           (file-name-split "//foo"))

There's the Posix peculiarity that /abc and //abc are potentially distinct, but ///abc should be equivalent to /abc if I understood it right.
Presumably all the Apollo Domain/OS users insist on it.






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-09-26 12:07         ` Mattias Engdegård
@ 2022-09-26 12:27           ` Gregory Heytings
  2022-09-29  3:02             ` Richard Stallman
  0 siblings, 1 reply; 23+ messages in thread
From: Gregory Heytings @ 2022-09-26 12:27 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: Philip Kaludercic, Lars Ingebrigtsen, 57102, Stefan Kangas


>> One small step in the right direction would be to make sure that:
>>
>>    (equal (file-name-split "/foo")
>>           (file-name-split "//foo"))
>
> There's the Posix peculiarity that /abc and //abc are potentially 
> distinct, but ///abc should be equivalent to /abc if I understood it 
> right.
>

Indeed, POSIX says that "A pathname that begins with two successive 
<slash> characters may be interpreted in an implementation-defined manner, 
although more than two leading <slash> characters shall be treated as a 
single <slash> character."
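
As a minimal sketch of that rule, the leading slashes could be
normalized before any splitting is done.  The function name
`my-collapse-leading-slashes' is hypothetical and is not something
proposed in this thread or present in Emacs; it only illustrates the
quoted POSIX wording.

```elisp
;; Hypothetical helper illustrating the POSIX rule on leading slashes.
(defun my-collapse-leading-slashes (name)
  "Return NAME with three or more leading slashes collapsed to one.
Exactly two leading slashes are left untouched, since POSIX lets an
implementation interpret them specially."
  (if (string-match "\\`///+" name)
      ;; Three or more leading slashes: keep a single one.
      (concat "/" (substring name (match-end 0)))
    name))

(my-collapse-leading-slashes "/abc")     ; => "/abc"
(my-collapse-leading-slashes "//abc")    ; => "//abc"
(my-collapse-leading-slashes "///abc")   ; => "/abc"
```

With such a step, "/abc" and "///abc" would split identically, while
"//abc" would remain distinguishable.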






* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-09-26 12:27           ` Gregory Heytings
@ 2022-09-29  3:02             ` Richard Stallman
  2022-09-29  6:20               ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Richard Stallman @ 2022-09-29  3:02 UTC (permalink / raw)
  To: Gregory Heytings; +Cc: mattiase, philipk, 57102, larsi, stefankangas

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > There's the Posix peculiarity that /abc and //abc are potentially 
  > > distinct, but ///abc should be equivalent to /abc if I understood it 
  > > right.

We don't have to handle them that way in Emacs;
Emacs has its own rules about what double slashes mean.

In the GNU Project we do not "obey" standards such as POSIX -- we
follow them when that seems good for users, and we diverge from them
when there is a reason to.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)








* bug#57102: 29.0.50; Peculiar file-name-split edge case
  2022-09-29  3:02             ` Richard Stallman
@ 2022-09-29  6:20               ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2022-09-29  6:20 UTC (permalink / raw)
  To: rms; +Cc: philipk, 57102, mattiase, gregory, stefankangas, larsi

> Cc: mattiase@acm.org, philipk@posteo.net, 57102@debbugs.gnu.org, larsi@gnus.org,
>  stefankangas@gmail.com
> From: Richard Stallman <rms@gnu.org>
> Date: Wed, 28 Sep 2022 23:02:29 -0400
> 
>   > > There's the Posix peculiarity that /abc and //abc are potentially 
>   > > distinct, but ///abc should be equivalent to /abc if I understood it 
>   > > right.
> 
> We don't have to handle them that way in Emacs,
> Emacs has its own rules about what double slashes mean.
> 
> In the GNU Project we do not "obey" standards such as POSIX -- we
> follow them when that seems good for users, and we diverge from them
> when there is a reason to.

That is true, but in this case following POSIX _is_ good for the
users, since these file names are used in Real Life.  Emacs's own
rules about multiple consecutive slashes don't contradict the
above, because they don't have to apply when the slashes are at the
beginning of a file name.







end of thread, other threads:[~2022-09-29  6:20 UTC | newest]

Thread overview: 23+ messages
2022-08-10  8:24 bug#57102: 29.0.50; Peculiar file-name-split edge case Philip Kaludercic
2022-08-12 15:35 ` Lars Ingebrigtsen
2022-08-12 15:56   ` Philip Kaludercic
2022-08-12 15:59     ` Lars Ingebrigtsen
2022-08-12 16:29       ` Philip Kaludercic
2022-08-13 11:44         ` Lars Ingebrigtsen
2022-08-13 13:24           ` Philip Kaludercic
2022-08-15  5:54             ` Lars Ingebrigtsen
2022-08-15 11:28               ` Lars Ingebrigtsen
2022-08-15 15:57                 ` Philip Kaludercic
2022-08-15 11:35               ` Eli Zaretskii
2022-08-17 10:55                 ` Lars Ingebrigtsen
2022-08-13 18:06           ` Augusto Stoffel
2022-08-14  6:24             ` Philip Kaludercic
2022-08-13 17:08 ` Mattias Engdegård
2022-09-25 12:06   ` Stefan Kangas
2022-09-26 10:41     ` Lars Ingebrigtsen
2022-09-26 11:14       ` Stefan Kangas
2022-09-26 11:59         ` Lars Ingebrigtsen
2022-09-26 12:07         ` Mattias Engdegård
2022-09-26 12:27           ` Gregory Heytings
2022-09-29  3:02             ` Richard Stallman
2022-09-29  6:20               ` Eli Zaretskii
