unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
@ 2021-06-07 13:32 Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-06-07 14:08 ` Lars Ingebrigtsen
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-07 13:32 UTC (permalink / raw)
  To: 48902

This is my first Emacs bug report.

DESCRIPTION:

I use EMMS to listen to music, and recently, I have noticed that the
EMMS Browser does not show covers for some of my albums. Today, I
decided to investigate the problem, and I found the following.

REPRODUCTION STEPS:

1. Create a directory called "Dubmood - C’etait Mieux En Rda (DATA001)".
2. Copy a picture, say "cover.png" into the newly created directory.
3. Launch "emacs -Q".
3. Open the newly created directory in Dired.
4. Press RET on the picture.

EXPECTED RESULTS:

Emacs shows the picture.

ACTUAL RESULTS:

Emacs shows an empty window.

NOTES:

The same problem applies to a folder called "Ultrasyd - L'Épidemie
Dansante". It seems that both an apostrophe (') and a backtick (`)
cause problems in both EMMS and Dired.

-- Rudy

In GNU Emacs 28.0.50 (build 2, x86_64-apple-darwin20.3.0, NS appkit-2022.30 Version 11.2.3 (Build 20D91))
 of 2021-06-07 built on Workstation.local
Repository revision: 82ccc3afcf9ed1f8b22ed5634e788879cd1af279
Repository branch: master
Windowing system distributor 'Apple', version 10.3.2022
System Description:  macOS 11.2.3

Configured using:
 'configure --with-ns --with-native-compilation'

Configured features:
ACL DBUS GIF GLIB GMP GNUTLS JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY KQUEUE NS PDUMPER PNG RSVG THREADS TIFF TOOLKIT_SCROLL_BARS XIM
ZLIB

Important settings:
  value of $LANG: en_SK.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Group

Minor modes in effect:
  gnus-undo-mode: t
  shell-dirtrack-mode: t
  dumb-jump-mode: t
  global-hl-todo-mode: t
  show-paren-mode: t
  global-flycheck-mode: t
  recentf-mode: t
  counsel-projectile-mode: t
  projectile-mode: t
  global-subword-mode: t
  subword-mode: t
  all-the-icons-ivy-rich-mode: t
  ivy-prescient-mode: t
  prescient-persist-mode: t
  ivy-rich-project-root-cache-mode: t
  ivy-rich-mode: t
  ivy-mode: t
  guru-global-mode: t
  guru-mode: t
  which-key-mode: t
  save-place-mode: t
  global-auto-revert-mode: t
  delete-selection-mode: t
  override-global-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  size-indication-mode: t
  column-number-mode: t
  line-number-mode: t
  global-visual-line-mode: t
  visual-line-mode: t
  transient-mark-mode: t

Load-path shadows:
/Users/salutis/.emacs.d/elpa/transient-20210530.2252/transient hides /Users/salutis/Repositories/emacs/nextstep/Emacs.app/Contents/Resources/lisp/transient

Features:
(shadow emacsbug sendmail smiley gnus-cite mail-extr gnus-async
gnus-bcklg sort gnus-ml mm-archive gnutls network-stream url-http url-gw
nsm url-cache url-auth qp nnrss mm-url nndraft nnmh nnmaildir nnfolder
nnnil gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg
gnus-art mm-uu mml2015 mm-view mml-smime smime dig nntp gnus-cache
gnus-sum shr kinsoku svg dom gnus-group gnus-undo gnus-start gnus-dbus
dbus xml gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo gnus-spec
gnus-int gnus-range message rmc puny rfc822 mml mml-sec epa derived epg
epg-config mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader gnus-win em-unix em-term em-script em-prompt em-ls
em-hist em-pred em-glob em-dirs esh-var em-cmpl em-basic em-banner
em-alias esh-mode bookmark pp writegood-mode two-column password-store
auth-source-pass with-editor vterm face-remap vterm-module term/xterm
xterm term disp-table ehelp eshell esh-cmd esh-ext esh-opt esh-proc
esh-io esh-arg esh-module esh-groups esh-util server amx vc-git
diff-mode vc-dispatcher ffap tramp tramp-loaddefs trampver
tramp-integration files-x tramp-compat shell parse-time iso8601 ls-lisp
pcase flyspell ispell dumb-jump popup hl-todo gnus nnheader gnus-util
rmail rmail-loaddefs rfc2047 rfc2045 ietf-drums mail-utils mm-util
mail-prsvr cus-load paren flycheck recentf tree-widget wid-edit
counsel-projectile ag vc-svn find-dired s dash ripgrep projectile grep
ibuf-ext ibuffer ibuffer-loaddefs thingatpt cap-words superword subword
all-the-icons-ivy-rich all-the-icons all-the-icons-faces data-material
data-weathericons data-octicons data-fileicons data-faicons
data-alltheicons ivy-prescient prescient counsel xdg xref project dired
dired-loaddefs compile text-property-search swiper ivy-rich ivy flx
ivy-faces ivy-overlay colir color guru-mode which-key
modus-vivendi-theme modus-operandi-theme modus-themes comp comp-cstr
warnings rx saveplace autorevert filenotify delsel exec-path-from-shell
diminish use-package-ensure-system-package system-packages use-package
use-package-ensure use-package-delight use-package-diminish
use-package-bind-key bind-key use-package-core finder-inf cl-extra
help-mode org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro
org-footnote org-src ob-comint org-pcomplete pcomplete comint ansi-color
ring org-list org-faces org-entities time-date noutline outline
easy-mmode org-version ob-emacs-lisp ob-core ob-eval org-table ol
org-keys org-compat advice org-macs org-loaddefs format-spec find-func
cal-menu calendar cal-loaddefs tex-site edmacro kmacro info package
browse-url url url-proxy url-privacy url-expand url-methods url-history
url-cookie url-domsuf url-util mailcap url-handlers url-parse
auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache json subr-x map url-vars seq byte-opt gv bytecomp
byte-compile cconv cl-loaddefs cl-lib iso-transl tooltip eldoc electric
uniquify ediff-hook vc-hooks lisp-float-type mwheel term/ns-win ns-win
ucs-normalize mule-util term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch
easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
kqueue cocoa ns lcms2 multi-tty make-network-process native-compile
emacs)

Memory information:
((conses 16 862734 56170)
 (symbols 48 42507 4)
 (strings 32 248048 23357)
 (string-bytes 1 10036170)
 (vectors 16 87234)
 (vector-slots 8 2207490 61351)
 (floats 8 1197 526)
 (intervals 56 735 540)
 (buffers 992 25))





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-07 13:32 bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2021-06-07 14:08 ` Lars Ingebrigtsen
  2021-06-07 14:15   ` Eli Zaretskii
  2021-06-07 14:24   ` Lars Ingebrigtsen
  2021-06-07 14:13 ` Eli Zaretskii
  2021-06-08 10:39 ` naofumi
  2 siblings, 2 replies; 29+ messages in thread
From: Lars Ingebrigtsen @ 2021-06-07 14:08 UTC (permalink / raw)
  To: 48902; +Cc: Rudolf Adamkovič

Rudolf Adamkovič via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

> 1. Create a directory called "Dubmood - C’etait Mieux En Rda (DATA001)".
> 2. Copy a picture, say "cover.png" into the newly created directory.
> 3. Launch "emacs -Q".
> 3. Open the newly created directory in Dired.
> 4. Press RET on the picture.
>
> EXPECTED RESULTS:
>
> Emacs shows the picture.
>
> ACTUAL RESULTS:
>
> Emacs shows an empty window.

I'm unable to reproduce this problem on Debian/bullseye...

> In GNU Emacs 28.0.50 (build 2, x86_64-apple-darwin20.3.0, NS appkit-2022.30 Version 11.2.3 (Build 20D91))
>  of 2021-06-07 built on Workstation.local

... so perhaps it's only a problem for the Macos build?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-07 13:32 bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-06-07 14:08 ` Lars Ingebrigtsen
@ 2021-06-07 14:13 ` Eli Zaretskii
  2021-06-08 22:21   ` Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-06-08 10:39 ` naofumi
  2 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-07 14:13 UTC (permalink / raw)
  To: Rudolf Adamkovič; +Cc: 48902

> Date: Mon, 07 Jun 2021 15:32:10 +0200
> From:  Rudolf Adamkovič via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> 1. Create a directory called "Dubmood - C’etait Mieux En Rda (DATA001)".
> 2. Copy a picture, say "cover.png" into the newly created directory.
> 3. Launch "emacs -Q".
> 3. Open the newly created directory in Dired.
> 4. Press RET on the picture.
> 
> EXPECTED RESULTS:
> 
> Emacs shows the picture.
> 
> ACTUAL RESULTS:
> 
> Emacs shows an empty window.
> 
> NOTES:
> 
> The same problem applies to a folder called "Ultrasyd - L'Épidemie
> Dansante". It seems that both an apostrophe (') and a backtick (`)
> cause problems in both EMMS and Dired.

When you visit that directory in Dired, what is the value of
buffer-file-coding-system in the Dired buffer?

Also, you mention backtick, but there's no such character in the
example you show.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-07 14:08 ` Lars Ingebrigtsen
@ 2021-06-07 14:15   ` Eli Zaretskii
  2021-06-07 14:24   ` Lars Ingebrigtsen
  1 sibling, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-07 14:15 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 48902, salutis

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Mon, 07 Jun 2021 16:08:31 +0200
> Cc: Rudolf Adamkovič <salutis@me.com>
> 
> I'm unable to reproduce this problem on Debian/bullseye...
> 
> > In GNU Emacs 28.0.50 (build 2, x86_64-apple-darwin20.3.0, NS appkit-2022.30 Version 11.2.3 (Build 20D91))
> >  of 2021-06-07 built on Workstation.local
> 
> ... so perhaps it's only a problem for the Macos build?

I suspect it has to do with the special way macOS encodes non-ASCII
file names.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-07 14:08 ` Lars Ingebrigtsen
  2021-06-07 14:15   ` Eli Zaretskii
@ 2021-06-07 14:24   ` Lars Ingebrigtsen
  2021-06-07 14:36     ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Lars Ingebrigtsen @ 2021-06-07 14:24 UTC (permalink / raw)
  To: 48902; +Cc: Alan Third, Rudolf Adamkovič

Lars Ingebrigtsen <larsi@gnus.org> writes:

> ... so perhaps it's only a problem for the Macos build?

Yup; the current trunk isn't able to view images under Macos from dired
if the path contains a non-ASCII char.

Test case:

larsi@emkay trunk % mkdir /tmp/fóo 
larsi@emkay trunk % cp etc/images/gnus/gnus.png /tmp/fóo/
larsi@emkay trunk % ./src/emacs /tmp/fóo/

And then RET the file to test.  I then get an error message about
failed output type 'public.tiff' on the terminal.

I've added Alan to the CCs.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-07 14:24   ` Lars Ingebrigtsen
@ 2021-06-07 14:36     ` Eli Zaretskii
  0 siblings, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-07 14:36 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 48902, alan, salutis

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Mon, 07 Jun 2021 16:24:44 +0200
> Cc: Alan Third <alan@idiocy.org>,
>  Rudolf Adamkovič <salutis@me.com>
> 
> larsi@emkay trunk % mkdir /tmp/fóo 
> larsi@emkay trunk % cp etc/images/gnus/gnus.png /tmp/fóo/
> larsi@emkay trunk % ./src/emacs /tmp/fóo/
> 
> And then RET the file to test.  I then get an error message about
> failed output type 'public.tiff' on the terminal.

What happens if you type

  larsi@emkay trunk % ./src/emacs /tmp/fóo/gnus.png

does it show the PNG image then?

And what are the values of file-name-coding-system and
default-file-name-coding-system?





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-07 13:32 bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2021-06-07 14:08 ` Lars Ingebrigtsen
  2021-06-07 14:13 ` Eli Zaretskii
@ 2021-06-08 10:39 ` naofumi
  2021-06-08 11:57   ` Lars Ingebrigtsen
  2 siblings, 1 reply; 29+ messages in thread
From: naofumi @ 2021-06-08 10:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 48902, Rudolf Adamkovič

[-- Attachment #1: Type: text/plain, Size: 2110 bytes --]

On Monday, June 07, 2021 16:15 CEST, Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > ... so perhaps it's only a problem for the Macos build?
>
> I suspect it has to do with the special way macOS encodes non-ASCII
> file names.
>

I found the same issue with images which has Japanese filename on macOS.
Message buffer says:
------------------------------------------------------------------------
Unable to load image (image :type png :file /Users/naofumi/_git/git.sv.gnu.org/emacs_png/いーまっくす.png :scale 1 :max-width 480 :max-height 781 :format nil :transform-smoothing t) [5 times]
------------------------------------------------------------------------

At least, attached small patch could fix this.

attachments:
0001-Fix-to-show-images-with-non-ascii-filename-on-macOS.patch
ns_load_image-error-with-non-ascii-filename.png
revert-filename-NSString-in-nsimage.png
emacs_png.tar.gz

This [EmacsImage allocInitFromFile:] change was introduced by the
following commit:
------------------------------------------------------------------------
commit 747a923b9a35533f98573ad5b01fccf096195079
Author: Alan Third <alan@idiocy.org>
Date:   Tue Dec 22 23:28:25 2020 +0000

    Use new NSString lisp methods

    * src/nsfont.m (ns_otf_to_script):
    (ns_registry_to_script):
    (ns_get_req_script): Use NSString conversion methods.
    * src/nsimage.m ([EmacsImage allocInitFromFile:]): Use NSString
    conversion methods.
    * src/nsmenu.m (ns_menu_show): Use NSString conversion methods.    * src/nsselect.m (symbol_to_nsstring):
    (ns_string_to_pasteboard_internal): Use NSString conversion methods.
    * src/nsterm.m (ns_term_init):
    ([EmacsView initFrameFromEmacs:]): Use NSString conversion methods.    * src/nsxwidget.m (nsxwidget_webkit_uri):
    (nsxwidget_webkit_title):
    (js_to_lisp): Use NSString conversion methods.
    (build_string_with_nsstr): Functionality replaced by NSString    extensions.

------------------------------------------------------------------------

Regards,
--Naofumi

[-- Attachment #2: emacs_png.tar.gz --]
[-- Type: application/x-gzip, Size: 14482 bytes --]

[-- Attachment #3: ns_load_image-error-with-non-ascii-filename.png --]
[-- Type: image/png, Size: 366374 bytes --]

[-- Attachment #4: 0001-Fix-to-show-images-with-non-ascii-filename-on-macOS.patch --]
[-- Type: application/octet-stream, Size: 921 bytes --]

From bb9250b67bf887224059c337d117a720fcf79dd7 Mon Sep 17 00:00:00 2001
From: Naofumi Yasufuku <naofumi@yasufuku.dev>
Date: Tue, 8 Jun 2021 19:04:24 +0900
Subject: [PATCH] Fix to show images with non-ascii filename on macOS

* src/nsimage.m ([EmacsImage allocInitFromFile:]): Revert filename
NSString to use stringWithUTF8String instead of stringWithLispString.
(Bug#48902)
---
 src/nsimage.m | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/nsimage.m b/src/nsimage.m
index fa81a41a51..8c7a3d9a09 100644
--- a/src/nsimage.m
+++ b/src/nsimage.m
@@ -262,7 +262,7 @@ + (instancetype)allocInitFromFile: (Lisp_Object)file
   found = ENCODE_FILE (found);
 
   image = [[EmacsImage alloc] initByReferencingFile:
-                     [NSString stringWithLispString: found]];
+                     [NSString stringWithUTF8String: SSDATA (found)]];
 
   image->bmRep = nil;
 #ifdef NS_IMPL_COCOA
-- 
2.31.1


[-- Attachment #5: revert-filename-NSString-in-nsimage.png --]
[-- Type: image/png, Size: 232034 bytes --]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 10:39 ` naofumi
@ 2021-06-08 11:57   ` Lars Ingebrigtsen
  2021-06-08 12:12     ` Alan Third
  0 siblings, 1 reply; 29+ messages in thread
From: Lars Ingebrigtsen @ 2021-06-08 11:57 UTC (permalink / raw)
  To: naofumi; +Cc: 48902, Rudolf Adamkovič, Alan Third

naofumi@yasufuku.dev <naofumi@yasufuku.dev> writes:

> diff --git a/src/nsimage.m b/src/nsimage.m
> index fa81a41a51..8c7a3d9a09 100644
> --- a/src/nsimage.m
> +++ b/src/nsimage.m
> @@ -262,7 +262,7 @@ + (instancetype)allocInitFromFile: (Lisp_Object)file
>    found = ENCODE_FILE (found);
>  
>    image = [[EmacsImage alloc] initByReferencingFile:
> -                     [NSString stringWithLispString: found]];
> +                     [NSString stringWithUTF8String: SSDATA (found)]];

Hm...  I'm not very familiar at all with the Objective C code here...
but shouldn't "found" here be a Lisp string so that stringWithLispString
would do the right thing?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 11:57   ` Lars Ingebrigtsen
@ 2021-06-08 12:12     ` Alan Third
  2021-06-08 12:14       ` Lars Ingebrigtsen
  2021-06-08 12:37       ` Eli Zaretskii
  0 siblings, 2 replies; 29+ messages in thread
From: Alan Third @ 2021-06-08 12:12 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 48902, Rudolf Adamkovič, naofumi

On Tue, Jun 08, 2021 at 01:57:01PM +0200, Lars Ingebrigtsen wrote:
> naofumi@yasufuku.dev <naofumi@yasufuku.dev> writes:
> 
> > diff --git a/src/nsimage.m b/src/nsimage.m
> > index fa81a41a51..8c7a3d9a09 100644
> > --- a/src/nsimage.m
> > +++ b/src/nsimage.m
> > @@ -262,7 +262,7 @@ + (instancetype)allocInitFromFile: (Lisp_Object)file
> >    found = ENCODE_FILE (found);
> >  
> >    image = [[EmacsImage alloc] initByReferencingFile:
> > -                     [NSString stringWithLispString: found]];
> > +                     [NSString stringWithUTF8String: SSDATA (found)]];
> 
> Hm...  I'm not very familiar at all with the Objective C code here...
> but shouldn't "found" here be a Lisp string so that stringWithLispString
> would do the right thing?

It's always possible that stringWithLispString isn't doing the right
thing. It's implemented at nsfns.m:3026. I know almost nothing about
UTF8/UTF16 so while it looks like it's doing the right thing to me, I
could be entirely wrong.
-- 
Alan Third





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 12:12     ` Alan Third
@ 2021-06-08 12:14       ` Lars Ingebrigtsen
  2021-06-08 17:45         ` Mattias Engdegård
  2021-06-08 18:17         ` Mattias Engdegård
  2021-06-08 12:37       ` Eli Zaretskii
  1 sibling, 2 replies; 29+ messages in thread
From: Lars Ingebrigtsen @ 2021-06-08 12:14 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, Mattias Engdegård, Rudolf Adamkovič, naofumi

Alan Third <alan@idiocy.org> writes:

>> >    image = [[EmacsImage alloc] initByReferencingFile:
>> > -                     [NSString stringWithLispString: found]];
>> > +                     [NSString stringWithUTF8String: SSDATA (found)]];
>> 
>> Hm...  I'm not very familiar at all with the Objective C code here...
>> but shouldn't "found" here be a Lisp string so that stringWithLispString
>> would do the right thing?
>
> It's always possible that stringWithLispString isn't doing the right
> thing. It's implemented at nsfns.m:3026. I know almost nothing about
> UTF8/UTF16 so while it looks like it's doing the right thing to me, I
> could be entirely wrong.

Right -- and that was written by Mattias, so I've added him to the CCs.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 12:12     ` Alan Third
  2021-06-08 12:14       ` Lars Ingebrigtsen
@ 2021-06-08 12:37       ` Eli Zaretskii
  2021-06-08 13:00         ` Alan Third
  1 sibling, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-08 12:37 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, larsi, salutis, naofumi

> Date: Tue, 8 Jun 2021 13:12:56 +0100
> From: Alan Third <alan@idiocy.org>
> Cc: 48902@debbugs.gnu.org, Rudolf Adamkovič <salutis@me.com>,
>  naofumi@yasufuku.dev
> 
> On Tue, Jun 08, 2021 at 01:57:01PM +0200, Lars Ingebrigtsen wrote:
> > naofumi@yasufuku.dev <naofumi@yasufuku.dev> writes:
> > 
> > > diff --git a/src/nsimage.m b/src/nsimage.m
> > > index fa81a41a51..8c7a3d9a09 100644
> > > --- a/src/nsimage.m
> > > +++ b/src/nsimage.m
> > > @@ -262,7 +262,7 @@ + (instancetype)allocInitFromFile: (Lisp_Object)file
> > >    found = ENCODE_FILE (found);
> > >  
> > >    image = [[EmacsImage alloc] initByReferencingFile:
> > > -                     [NSString stringWithLispString: found]];
> > > +                     [NSString stringWithUTF8String: SSDATA (found)]];
> > 
> > Hm...  I'm not very familiar at all with the Objective C code here...
> > but shouldn't "found" here be a Lisp string so that stringWithLispString
> > would do the right thing?
> 
> It's always possible that stringWithLispString isn't doing the right
> thing. It's implemented at nsfns.m:3026. I know almost nothing about
> UTF8/UTF16 so while it looks like it's doing the right thing to me, I
> could be entirely wrong.

It looks like stringWithLispString encodes into UTF-16?  But file
names on macOS should be encoded in UTF-8, and in fact
allocInitFromFile already does TRT when it calls ENCODE_FILE, just
before stringWithLispString is called.  So I think the patch is
correct.

(UTF-16 encoding on macOS is for ENCODE_SYSTEM, right?)





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 12:37       ` Eli Zaretskii
@ 2021-06-08 13:00         ` Alan Third
  2021-06-08 14:02           ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Third @ 2021-06-08 13:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 48902, larsi, salutis, naofumi

On Tue, Jun 08, 2021 at 03:37:51PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 8 Jun 2021 13:12:56 +0100
> > From: Alan Third <alan@idiocy.org>
> > Cc: 48902@debbugs.gnu.org, Rudolf Adamkovič <salutis@me.com>,
> >  naofumi@yasufuku.dev
> > 
> > On Tue, Jun 08, 2021 at 01:57:01PM +0200, Lars Ingebrigtsen wrote:
> > > naofumi@yasufuku.dev <naofumi@yasufuku.dev> writes:
> > > 
> > > > diff --git a/src/nsimage.m b/src/nsimage.m
> > > > index fa81a41a51..8c7a3d9a09 100644
> > > > --- a/src/nsimage.m
> > > > +++ b/src/nsimage.m
> > > > @@ -262,7 +262,7 @@ + (instancetype)allocInitFromFile: (Lisp_Object)file
> > > >    found = ENCODE_FILE (found);
> > > >  
> > > >    image = [[EmacsImage alloc] initByReferencingFile:
> > > > -                     [NSString stringWithLispString: found]];
> > > > +                     [NSString stringWithUTF8String: SSDATA (found)]];
> > > 
> > > Hm...  I'm not very familiar at all with the Objective C code here...
> > > but shouldn't "found" here be a Lisp string so that stringWithLispString
> > > would do the right thing?
> > 
> > It's always possible that stringWithLispString isn't doing the right
> > thing. It's implemented at nsfns.m:3026. I know almost nothing about
> > UTF8/UTF16 so while it looks like it's doing the right thing to me, I
> > could be entirely wrong.
> 
> It looks like stringWithLispString encodes into UTF-16?  But file
> names on macOS should be encoded in UTF-8, and in fact
> allocInitFromFile already does TRT when it calls ENCODE_FILE, just
> before stringWithLispString is called.  So I think the patch is
> correct.
> 
> (UTF-16 encoding on macOS is for ENCODE_SYSTEM, right?)

I think you're right. But confusingly initByReferencingFile takes an
NSString which is a UTF-16 format string, so if I remove all the calls
to ENCODE_FILE, stringWithLispString works fine.

I guess we just need to make a note that stringWithLispString cannot
handle UTF-8 encoded filenames, unless someone has a smarter solution.

-- 
Alan Third





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 13:00         ` Alan Third
@ 2021-06-08 14:02           ` Eli Zaretskii
  2021-06-08 16:19             ` Alan Third
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-08 14:02 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, larsi, salutis, naofumi

> Date: Tue, 8 Jun 2021 14:00:17 +0100
> From: Alan Third <alan@idiocy.org>
> Cc: larsi@gnus.org, naofumi@yasufuku.dev, 48902@debbugs.gnu.org,
> 	salutis@me.com
> 
> > It looks like stringWithLispString encodes into UTF-16?  But file
> > names on macOS should be encoded in UTF-8, and in fact
> > allocInitFromFile already does TRT when it calls ENCODE_FILE, just
> > before stringWithLispString is called.  So I think the patch is
> > correct.
> > 
> > (UTF-16 encoding on macOS is for ENCODE_SYSTEM, right?)
> 
> I think you're right. But confusingly initByReferencingFile takes an
> NSString which is a UTF-16 format string, so if I remove all the calls
> to ENCODE_FILE, stringWithLispString works fine.
> 
> I guess we just need to make a note that stringWithLispString cannot
> handle UTF-8 encoded filenames, unless someone has a smarter solution.

If you do need a UTF-16 encoded string, then instead of ENCODE_FILE
you can call code_convert_string_norecord with Qutf_16.  There's no
need to invent or use a private UTF-16 encoder there, and you also get
rid of an unnecessary extra UTF-8 encoding as a bonus.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 14:02           ` Eli Zaretskii
@ 2021-06-08 16:19             ` Alan Third
  2021-06-08 18:09               ` Eli Zaretskii
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Third @ 2021-06-08 16:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 48902, larsi, salutis, naofumi

On Tue, Jun 08, 2021 at 05:02:25PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 8 Jun 2021 14:00:17 +0100
> > From: Alan Third <alan@idiocy.org>
> > Cc: larsi@gnus.org, naofumi@yasufuku.dev, 48902@debbugs.gnu.org,
> > 	salutis@me.com
> > 
> > > It looks like stringWithLispString encodes into UTF-16?  But file
> > > names on macOS should be encoded in UTF-8, and in fact
> > > allocInitFromFile already does TRT when it calls ENCODE_FILE, just
> > > before stringWithLispString is called.  So I think the patch is
> > > correct.
> > > 
> > > (UTF-16 encoding on macOS is for ENCODE_SYSTEM, right?)
> > 
> > I think you're right. But confusingly initByReferencingFile takes an
> > NSString which is a UTF-16 format string, so if I remove all the calls
> > to ENCODE_FILE, stringWithLispString works fine.
> > 
> > I guess we just need to make a note that stringWithLispString cannot
> > handle UTF-8 encoded filenames, unless someone has a smarter solution.
> 
> If you do need a UTF-16 encoded string, then instead of ENCODE_FILE
> you can call code_convert_string_norecord with Qutf_16.  There's no
> need to invent or use a private UTF-16 encoder there, and you also get
> rid of an unnecessary extra UTF-8 encoding as a bonus.

In this case the call to ENCODE_FILE in allocInitFromFile is actually
redundant because image_find_image_fd already calls ENCODE_FILE on the
filename before passing it back. So we get a UTF-8 string no matter
what.

NSString can read in almost anything, and Mattias extended it to read
in multibyte (and ascii) lisp strings, so we don't need a UTF-16 input
specifically. It would probably be nice if NSString was also able to
recognise that a lisp string is UTF-8 and handle that itself, but I
don't think that's really possible, unless we make the assumption that
any unibyte string it's passed will already be ascii or UTF-8.

I don't know if that's a reasonable assumption.

-- 
Alan Third





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 12:14       ` Lars Ingebrigtsen
@ 2021-06-08 17:45         ` Mattias Engdegård
  2021-06-08 18:18           ` Eli Zaretskii
  2021-06-08 19:10           ` Alan Third
  2021-06-08 18:17         ` Mattias Engdegård
  1 sibling, 2 replies; 29+ messages in thread
From: Mattias Engdegård @ 2021-06-08 17:45 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 48902, Alan Third, Rudolf Adamkovič, naofumi

8 juni 2021 kl. 14.14 skrev Lars Ingebrigtsen <larsi@gnus.org>:

>> It's always possible that stringWithLispString isn't doing the right
>> thing. It's implemented at nsfns.m:3026. I know almost nothing about
>> UTF8/UTF16 so while it looks like it's doing the right thing to me, I
>> could be entirely wrong.
> 
> Right -- and that was written by Mattias, so I've added him to the CCs.

Thank you, but stringWithLispString: is actually fine, unless you count its inability to produce useful results from wrong input!

The image code seems to be quite confused with respect to whether the file names being passed around are in encoded form. Until recently it seems to have worked by pure chance since as long as the file name encoding is UTF-8 and the low-level code accesses the raw string data we do get the same result, but at least since 747a923b9a35 that's no longer the case.

Concretely, we have:

1. image_find_image_file probably expects its `file` argument to be non-encoded, but the string it returns is always encoded.

2. native_image_load calls image_find_image_file and passes its return value to ns_load_image.

3. ns_load_image calls [EmacsImage allocInitFromFile:] with its file argument.

4. [EmacsImage allocInitFromFile: file] can apparently be called with both a non-encoded or an encoded `file` argument (clearly not ideal), and it does:

  found = image_find_image_file (file);
  // This is dubious when `file` is already encoded.

  found = ENCODE_FILE (found);
  // This is completely useless since `found` is already encoded! Apparently ENCODE_FILE is idempotent, at least on macOS...

  [NSString stringWithLispString: found]
  // This produces nonsense as `found` is a string of raw bytes, so any Unicode will be converted to stretches of U+FFFD REPLACEMENT CHAR.
  [NSString stringWithLispString: file]
  // Same problem again, with a different variable.

The quick fix of reverting to stringWithUTF8String: will work, but the real problem is that we have no control of the encodedness of lisp strings being passed around. Comments would help, and I'd even go as far to suggest

typedef struct { Lisp_Object string; } encoded_file_name_t;

with the appropriate constructors and accessors, to get C's static type checking to work for us.






^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 16:19             ` Alan Third
@ 2021-06-08 18:09               ` Eli Zaretskii
  2021-06-08 19:24                 ` Alan Third
  0 siblings, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-08 18:09 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, larsi, salutis, naofumi

> Date: Tue, 8 Jun 2021 17:19:44 +0100
> From: Alan Third <alan@idiocy.org>
> Cc: larsi@gnus.org, naofumi@yasufuku.dev, 48902@debbugs.gnu.org,
> 	salutis@me.com
> 
> In this case the call to ENCODE_FILE in allocInitFromFile is actually
> redundant because image_find_image_fd already calls ENCODE_FILE on the
> filename before passing it back. So we get a UTF-8 string no matter
> what.

Then why was the code re-encoding t in UTF-16?  A bug?

> NSString can read in almost anything, and Mattias extended it to read
> in multibyte (and ascii) lisp strings, so we don't need a UTF-16 input
> specifically. It would probably be nice if NSString was also able to
> recognise that a lisp string is UTF-8 and handle that itself, but I
> don't think that's really possible, unless we make the assumption that
> any unibyte string it's passed will already be ascii or UTF-8.
> 
> I don't know if that's a reasonable assumption.

Any file name passed through ENCODE_FILE should be in UTF-8 on macOS,
as I understand that's how the macOS filesystems work.  Am I mistaken?
Can the value of default-file-name-coding-system on macOS be anything
other than UTF-8?





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 12:14       ` Lars Ingebrigtsen
  2021-06-08 17:45         ` Mattias Engdegård
@ 2021-06-08 18:17         ` Mattias Engdegård
  1 sibling, 0 replies; 29+ messages in thread
From: Mattias Engdegård @ 2021-06-08 18:17 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, Lars Ingebrigtsen, Rudolf Adamkovič, naofumi

[Replying to a couple of previous messages]

> I guess we just need to make a note that stringWithLispString cannot
> handle UTF-8 encoded filenames, unless someone has a smarter solution.

This is not restricted to file names but yes, we should definitely clarify that it expects Unicode (or ASCII) strings as input, since raw bytes are interpreted as, well, raw bytes.

> NSString can read in almost anything, and Mattias extended it to read
> in multibyte (and ascii) lisp strings, so we don't need a UTF-16 input
> specifically. It would probably be nice if NSString was also able to
> recognise that a lisp string is UTF-8 and handle that itself, but I
> don't think that's really possible, unless we make the assumption that
> any unibyte string it's passed will already be ascii or UTF-8.
> 
> I don't know if that's a reasonable assumption.

No, I don't think it's reasonable either -- we should not put dwimmery into our string conversion logic just because we are too sloppy to document whether an argument or return value is encoded or not. stringWithLispString: appears to work as designed.






^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 17:45         ` Mattias Engdegård
@ 2021-06-08 18:18           ` Eli Zaretskii
  2021-06-08 19:13             ` naofumi
  2021-06-08 19:10           ` Alan Third
  1 sibling, 1 reply; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-08 18:18 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 48902, larsi, salutis, alan, naofumi

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Tue, 8 Jun 2021 19:45:22 +0200
> Cc: 48902@debbugs.gnu.org, Alan Third <alan@idiocy.org>,
>  Rudolf Adamkovič <salutis@me.com>, naofumi@yasufuku.dev
> 
> The quick fix of reverting to stringWithUTF8String: will work, but the real problem is that we have no control of the encodedness of lisp strings being passed around.

The usual technique in these cases is to keep Lisp strings unencoded,
encode them when passing to the low-level C functions, and pass to
those functions only the pointer to the string's data.

In those rare cases when you really need to pass a Lisp string that is
an encoded file name, call the argument "encoded_file" or somesuch.
But these cases should be rare exceptions.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 17:45         ` Mattias Engdegård
  2021-06-08 18:18           ` Eli Zaretskii
@ 2021-06-08 19:10           ` Alan Third
  2021-06-08 19:52             ` Mattias Engdegård
  1 sibling, 1 reply; 29+ messages in thread
From: Alan Third @ 2021-06-08 19:10 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 48902, Lars Ingebrigtsen, Rudolf Adamkovič, naofumi

[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]

On Tue, Jun 08, 2021 at 07:45:22PM +0200, Mattias Engdegård wrote:
> 8 juni 2021 kl. 14.14 skrev Lars Ingebrigtsen <larsi@gnus.org>:
> 
> >> It's always possible that stringWithLispString isn't doing the right
> >> thing. It's implemented at nsfns.m:3026. I know almost nothing about
> >> UTF8/UTF16 so while it looks like it's doing the right thing to me, I
> >> could be entirely wrong.
> > 
> > Right -- and that was written by Mattias, so I've added him to the CCs.
> 
> Thank you, but stringWithLispString: is actually fine, unless you
> count its inability to produce useful results from wrong input!

In my defence it wasn't entirely clear to me that a lisp string
returned from ENCODE_FILE was incompatible with stringWithLispString. ;)

> The image code seems to be quite confused with respect to whether
> the file names being passed around are in encoded form. Until
> recently it seems to have worked by pure chance since as long as the
> file name encoding is UTF-8 and the low-level code accesses the raw
> string data we do get the same result, but at least since
> 747a923b9a35 that's no longer the case.

Hmm, and as you point out we use "file" further down and it may or may
not be encoded, but will probably have the same contents as found,
which we know is encoded. Plus it's setting the "name" field in the
image, which we probably want to keep as uniform as possible for
caching purposes but is otherwise irrelevant.

I think the attached should solve this.
-- 
Alan Third

[-- Attachment #2: 0001-Fix-image-filename-encoding-issues-in-NS-port-bug-48.patch --]
[-- Type: text/x-diff, Size: 2175 bytes --]

From c2902846c57c55576b1612bf11afaf240994ca70 Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Tue, 8 Jun 2021 20:08:34 +0100
Subject: [PATCH] Fix image filename encoding issues in NS port (bug#48902)

* src/nsfns.m ([NSString stringWithLispString:]): Clarify usage in comment.
* src/nsimage.m ([EmacsImage allocInitFromFile:]): Clarify that the
filename is UTF-8 encoded and handle it correctly.
---
 src/nsfns.m   |  3 ++-
 src/nsimage.m | 11 +++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/nsfns.m b/src/nsfns.m
index d14f7b51ea..98801d8526 100644
--- a/src/nsfns.m
+++ b/src/nsfns.m
@@ -3024,7 +3024,8 @@ - (NSString *)panel: (id)sender userEnteredFilename: (NSString *)filename
 }
 
 @implementation NSString (EmacsString)
-/* Make an NSString from a Lisp string.  */
+/* Make an NSString from a Lisp string.  STRING must not be in an
+   encoded form (e.g. UTF-8).  */
 + (NSString *)stringWithLispString:(Lisp_Object)string
 {
   /* Shortcut for the common case.  */
diff --git a/src/nsimage.m b/src/nsimage.m
index fa81a41a51..b0bd52111b 100644
--- a/src/nsimage.m
+++ b/src/nsimage.m
@@ -252,17 +252,16 @@ @implementation EmacsImage
 + (instancetype)allocInitFromFile: (Lisp_Object)file
 {
   NSImageRep *imgRep;
-  Lisp_Object found;
+  Lisp_Object utf8_filename;
   EmacsImage *image;
 
   /* Search bitmap-file-path for the file, if appropriate.  */
-  found = image_find_image_file (file);
-  if (!STRINGP (found))
+  utf8_filename = image_find_image_file (file);
+  if (!STRINGP (utf8_filename))
     return nil;
-  found = ENCODE_FILE (found);
 
   image = [[EmacsImage alloc] initByReferencingFile:
-                     [NSString stringWithLispString: found]];
+                     [NSString stringWithUTF8String: SSDATA (utf8_filename)]];
 
   image->bmRep = nil;
 #ifdef NS_IMPL_COCOA
@@ -278,7 +277,7 @@ + (instancetype)allocInitFromFile: (Lisp_Object)file
 
   [image setSize: NSMakeSize([imgRep pixelsWide], [imgRep pixelsHigh])];
 
-  [image setName: [NSString stringWithLispString: file]];
+  [image setName: [NSString stringWithUTF8String: SSDATA (utf8_filename)]];
 
   return image;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 18:18           ` Eli Zaretskii
@ 2021-06-08 19:13             ` naofumi
  2021-06-08 20:08               ` Mattias Engdegård
  0 siblings, 1 reply; 29+ messages in thread
From: naofumi @ 2021-06-08 19:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 48902, Mattias Engdegård, larsi, salutis, alan

[-- Attachment #1: Type: text/plain, Size: 1494 bytes --]

On Tuesday, June 08, 2021 20:18 CEST, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Mattias Engdegård <mattiase@acm.org>
> > Date: Tue, 8 Jun 2021 19:45:22 +0200
> > Cc: 48902@debbugs.gnu.org, Alan Third <alan@idiocy.org>,
> >  Rudolf Adamkovič <salutis@me.com>, naofumi@yasufuku.dev
> >
> > The quick fix of reverting to stringWithUTF8String: will work, but the real problem is that we have no control of the encodedness of lisp strings being passed around.
>
> The usual technique in these cases is to keep Lisp strings unencoded,
> encode them when passing to the low-level C functions, and pass to
> those functions only the pointer to the string's data.
>
> In those rare cases when you really need to pass a Lisp string that is
> an encoded file name, call the argument "encoded_file" or somesuch.
> But these cases should be rare exceptions.
>
>

I agree that [NSString stringWithLispString:] is working as expected,
and it is not the real problem.

For example, another patch using STRING_SET_MULTIBYTE() is here.
This looks still just a quick fix and bit weired, though..

attachments:
0001-Fix-to-show-images-with-non-ascii-filename-STRING_SET_MULTIBYTE.patch
image-non-ascii-filename-STRING_SET_MULTIBYTE-macos.png
image-non-ascii-filename-STRING_SET_MULTIBYTE-linux.png

On the other hand, I cannot find out that non-UTF-8 filename coding is
really needed on macOS.  It might be over-engineered thing, and just an overhead.

Regards,
--Naofumi

[-- Attachment #2: image-non-ascii-filename-STRING_SET_MULTIBYTE-macos.png --]
[-- Type: image/png, Size: 234036 bytes --]

[-- Attachment #3: 0001-Fix-to-show-images-with-non-ascii-filename-STRING_SET_MULTIBYTE.patch --]
[-- Type: application/octet-stream, Size: 1676 bytes --]

From 0b18100ada3a16e667684383150c4f4d7848e5ce Mon Sep 17 00:00:00 2001
From: Naofumi Yasufuku <naofumi@yasufuku.dev>
Date: Wed, 9 Jun 2021 02:10:43 +0900
Subject: [PATCH] Fix to show images with non-ascii filename on macOS

* src/image.c (image_find_image_fd): Indicate 'file_found' Lisp_String
as multibyte if 'file' or 'search_path' is multibyte Lisp_String.
Without this special care, string_to_multibyte() call in
[NSString stringWithLispString:]  attempts to convert the multibyte
filename (UTF-8 by default) to multybyte string unintentionally.
(Bug#48902)
* src/nsimage.m ([EmacsImage allocInitFromFile:]): Remove redundant
ENCODE_FILE() which is done by image_find_image_fd().
---
 src/image.c   | 2 ++
 src/nsimage.m | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/image.c b/src/image.c
index b34dc3e916..f7204af873 100644
--- a/src/image.c
+++ b/src/image.c
@@ -3156,6 +3156,8 @@ image_find_image_fd (Lisp_Object file, int *pfd)
   if (fd >= 0 || fd == -2)
     {
       file_found = ENCODE_FILE (file_found);
+      if (STRING_MULTIBYTE (search_path) || STRING_MULTIBYTE (file))
+	STRING_SET_MULTIBYTE (file_found);
       if (fd == -2)
 	{
 	  /* The file exists locally, but has a file name handler.
diff --git a/src/nsimage.m b/src/nsimage.m
index fa81a41a51..5e5960de90 100644
--- a/src/nsimage.m
+++ b/src/nsimage.m
@@ -259,7 +259,6 @@ + (instancetype)allocInitFromFile: (Lisp_Object)file
   found = image_find_image_file (file);
   if (!STRINGP (found))
     return nil;
-  found = ENCODE_FILE (found);
 
   image = [[EmacsImage alloc] initByReferencingFile:
                      [NSString stringWithLispString: found]];
-- 
2.31.1


[-- Attachment #4: image-non-ascii-filename-STRING_SET_MULTIBYTE-linux.png --]
[-- Type: image/png, Size: 82170 bytes --]

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 18:09               ` Eli Zaretskii
@ 2021-06-08 19:24                 ` Alan Third
  0 siblings, 0 replies; 29+ messages in thread
From: Alan Third @ 2021-06-08 19:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 48902, larsi, salutis, naofumi

On Tue, Jun 08, 2021 at 09:09:51PM +0300, Eli Zaretskii wrote:
> > Date: Tue, 8 Jun 2021 17:19:44 +0100
> > From: Alan Third <alan@idiocy.org>
> > Cc: larsi@gnus.org, naofumi@yasufuku.dev, 48902@debbugs.gnu.org,
> > 	salutis@me.com
> > 
> > In this case the call to ENCODE_FILE in allocInitFromFile is actually
> > redundant because image_find_image_fd already calls ENCODE_FILE on the
> > filename before passing it back. So we get a UTF-8 string no matter
> > what.
> 
> Then why was the code re-encoding t in UTF-16?  A bug?

No, sorry, I'm not being clear. The internal format of NSString is
UTF-16. We can load practically anything into it, as long as we know
ahead of time what the encoding is.

> > NSString can read in almost anything, and Mattias extended it to read
> > in multibyte (and ascii) lisp strings, so we don't need a UTF-16 input
> > specifically. It would probably be nice if NSString was also able to
> > recognise that a lisp string is UTF-8 and handle that itself, but I
> > don't think that's really possible, unless we make the assumption that
> > any unibyte string it's passed will already be ascii or UTF-8.
> > 
> > I don't know if that's a reasonable assumption.
> 
> Any file name passed through ENCODE_FILE should be in UTF-8 on macOS,
> as I understand that's how the macOS filesystems work.  Am I mistaken?
> Can the value of default-file-name-coding-system on macOS be anything
> other than UTF-8?

Not as far as I'm aware. But NSString is used all over the place in
the NS port code base (and all through the toolkit). Any time we pass
a string to or from the toolkit it has to be converted to or from
NSString. I think most of the actual file access code in Emacs is low
level C code which won't go near NSString, though, so it's not worth
changing C code to make ObjC code cleaner.

I guess I'm just being lazy and would like our extensions to NSString
to just DTRT, but it seems that's impractical. :)
-- 
Alan Third





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 19:10           ` Alan Third
@ 2021-06-08 19:52             ` Mattias Engdegård
  2021-06-08 20:33               ` Alan Third
  0 siblings, 1 reply; 29+ messages in thread
From: Mattias Engdegård @ 2021-06-08 19:52 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, Lars Ingebrigtsen, Rudolf Adamkovič, naofumi

8 juni 2021 kl. 21.10 skrev Alan Third <alan@idiocy.org>:

> In my defence it wasn't entirely clear to me that a lisp string
> returned from ENCODE_FILE was incompatible with stringWithLispString. ;)

Oh it's compatible all right, it just takes it job description very literally!
That's typical of them computers -- no imagination at all.

> Hmm, and as you point out we use "file" further down and it may or may
> not be encoded, but will probably have the same contents as found,
> which we know is encoded. Plus it's setting the "name" field in the
> image, which we probably want to keep as uniform as possible for
> caching purposes but is otherwise irrelevant.
> 
> I think the attached should solve this.

Thank you, that would work and I don't mind you pushing that right away.
We probably should clear up the encodedness of `file` in allocInitFromFile: -- as Eli said, the convention is keeping strings unencoded until needed by low-level operations and it really makes the most sense.






^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory  names containing apostrophes and backticks cause problems
  2021-06-08 19:13             ` naofumi
@ 2021-06-08 20:08               ` Mattias Engdegård
  0 siblings, 0 replies; 29+ messages in thread
From: Mattias Engdegård @ 2021-06-08 20:08 UTC (permalink / raw)
  To: naofumi@yasufuku.dev; +Cc: 48902, salutis, larsi, alan

8 juni 2021 kl. 21.13 skrev naofumi@yasufuku.dev:

> For example, another patch using STRING_SET_MULTIBYTE() is here.
> This looks still just a quick fix and bit weired, though..

Thank you, but it's probably better to always return an unencoded string from image_find_image_fd in that case. The current code looks like a premature optimisation.

> On the other hand, I cannot find out that non-UTF-8 filename coding is
> really needed on macOS.  It might be over-engineered thing, and just an overhead.

Maybe and in practice you are probably right, but the NS port is not exclusive to macOS.
There is the quasi-NFD normalisation step but I'm not sure how important that is today. There's no need to convert to NFD for accessing files; it only matters when comparing names (much like case folding on many file systems).






^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 19:52             ` Mattias Engdegård
@ 2021-06-08 20:33               ` Alan Third
  2021-06-09 11:40                 ` Mattias Engdegård
  2021-06-09 11:56                 ` Eli Zaretskii
  0 siblings, 2 replies; 29+ messages in thread
From: Alan Third @ 2021-06-08 20:33 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 48902, Lars Ingebrigtsen, Rudolf Adamkovič, naofumi

[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]

On Tue, Jun 08, 2021 at 09:52:51PM +0200, Mattias Engdegård wrote:
> 8 juni 2021 kl. 21.10 skrev Alan Third <alan@idiocy.org>:
> 
> > I think the attached should solve this.
> 
> Thank you, that would work and I don't mind you pushing that right
> away. We probably should clear up the encodedness of `file` in
> allocInitFromFile: -- as Eli said, the convention is keeping strings
> unencoded until needed by low-level operations and it really makes
> the most sense.

It's not just allocInitFromFile, I'm looking at the other callers of
image_find_image_file and they all call ENCODE_FILE after it too.

The only direct caller of image_find_image_fd that actually uses the
contents of the returned string (svg_load) also encodes the file name.

So I think we could restrict the use of the encoded filename within
image_find_image_fd to *only* when it actually opens the file.

Patch attached. I've tested it here but I only have a couple of images
to try it with.



I've been looking at the other changes I made in
747a923b9a35533f98573ad5b01fccf096195079 and I'm not sure they're
correct. They clearly work now, but most of the time it's probably
simple ASCII which should pass easily.

Before they *all* seem to have assumed the data was UTF8 encoded,
which is surely wrong since most of the time it's coming from Emacs.
It's things like menu item titles.

These are the use cases stringWithLispString was designed for, right?
The only odd one is image filenames because they're explicitly encoded?
-- 
Alan Third

[-- Attachment #2: v2-0001-Fix-image-filename-encoding-issues-bug-48902.patch --]
[-- Type: text/x-diff, Size: 2186 bytes --]

From 48763cfe123955173ad82085125b2f08295daa7d Mon Sep 17 00:00:00 2001
From: Alan Third <alan@idiocy.org>
Date: Tue, 8 Jun 2021 20:08:34 +0100
Subject: [PATCH v2] Fix image filename encoding issues (bug#48902)

* src/image.c (image_find_image_fd): Don't return an encoded filename
string.
* src/nsfns.m: ([NSString stringWithLispString:]): Clarify usage
comment.
---
 src/image.c | 19 ++++++++-----------
 src/nsfns.m |  3 ++-
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/image.c b/src/image.c
index b34dc3e916..d1aaaf8f53 100644
--- a/src/image.c
+++ b/src/image.c
@@ -3153,19 +3153,16 @@ image_find_image_fd (Lisp_Object file, int *pfd)
   /* Try to find FILE in data-directory/images, then x-bitmap-file-path.  */
   fd = openp (search_path, file, Qnil, &file_found,
 	      pfd ? Qt : make_fixnum (R_OK), false, false);
-  if (fd >= 0 || fd == -2)
+  if (fd == -2)
     {
-      file_found = ENCODE_FILE (file_found);
-      if (fd == -2)
-	{
-	  /* The file exists locally, but has a file name handler.
-	     (This happens, e.g., under Auto Image File Mode.)
-	     'openp' didn't open the file, so we should, because the
-	     caller expects that.  */
-	  fd = emacs_open (SSDATA (file_found), O_RDONLY, 0);
-	}
+      /* The file exists locally, but has a file name handler.
+	 (This happens, e.g., under Auto Image File Mode.)
+	 'openp' didn't open the file, so we should, because the
+	 caller expects that.  */
+      Lisp_Object encoded_name = ENCODE_FILE (file_found);
+      fd = emacs_open (SSDATA (encoded_name), O_RDONLY, 0);
     }
-  else	/* fd < 0, but not -2 */
+  else if (fd < 0)
     return Qnil;
   if (pfd)
     *pfd = fd;
diff --git a/src/nsfns.m b/src/nsfns.m
index d14f7b51ea..98801d8526 100644
--- a/src/nsfns.m
+++ b/src/nsfns.m
@@ -3024,7 +3024,8 @@ - (NSString *)panel: (id)sender userEnteredFilename: (NSString *)filename
 }
 
 @implementation NSString (EmacsString)
-/* Make an NSString from a Lisp string.  */
+/* Make an NSString from a Lisp string.  STRING must not be in an
+   encoded form (e.g. UTF-8).  */
 + (NSString *)stringWithLispString:(Lisp_Object)string
 {
   /* Shortcut for the common case.  */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-07 14:13 ` Eli Zaretskii
@ 2021-06-08 22:21   ` Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 29+ messages in thread
From: Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-08 22:21 UTC (permalink / raw)
  To: 48902

Eli Zaretskii <eliz@gnu.org> writes:

> When you visit that directory in Dired, what is the value of
> buffer-file-coding-system in the Dired buffer?

The value is “utf-8-unix”.

> Also, you mention backtick, but there's no such character in the
> example you show.

There is no backtick. My apologies!

-- Rudy





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 20:33               ` Alan Third
@ 2021-06-09 11:40                 ` Mattias Engdegård
  2021-06-09 15:19                   ` Alan Third
  2021-06-09 11:56                 ` Eli Zaretskii
  1 sibling, 1 reply; 29+ messages in thread
From: Mattias Engdegård @ 2021-06-09 11:40 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, Lars Ingebrigtsen, Rudolf Adamkovič, naofumi

8 juni 2021 kl. 22.33 skrev Alan Third <alan@idiocy.org>:

> It's not just allocInitFromFile, I'm looking at the other callers of
> image_find_image_file and they all call ENCODE_FILE after it too.
> 
> The only direct caller of image_find_image_fd that actually uses the
> contents of the returned string (svg_load) also encodes the file name.
> 
> So I think we could restrict the use of the encoded filename within
> image_find_image_fd to *only* when it actually opens the file.

Thank you, and I arrived at the same conclusion.

> Patch attached. I've tested it here but I only have a couple of images
> to try it with.

Looks fine, but the image_find_image_file comment needs to be amended since it says that it returns an encoded string.

> I've been looking at the other changes I made in
> 747a923b9a35533f98573ad5b01fccf096195079 and I'm not sure they're
> correct. They clearly work now, but most of the time it's probably
> simple ASCII which should pass easily.
> 
> Before they *all* seem to have assumed the data was UTF8 encoded,
> which is surely wrong since most of the time it's coming from Emacs.
> It's things like menu item titles.
> 
> These are the use cases stringWithLispString was designed for, right?
> The only odd one is image filenames because they're explicitly encoded?

I should think so -- stringWithLispString: was designed as a general-purpose method to convert from a lisp string to NSString without changing the contents. Non-Unicode values (which includes raw bytes) become U+FFFD except surrogates as they can be represented (in a manner of speaking) in UTF-16, and it turns out to be more useful that way.

Furthermore, if we use stringWithLispString: for file names, no special file name encoding step should be needed on our side, since the NS libs will perform any needed normalisation (at least if I've understood it right).






^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-08 20:33               ` Alan Third
  2021-06-09 11:40                 ` Mattias Engdegård
@ 2021-06-09 11:56                 ` Eli Zaretskii
  1 sibling, 0 replies; 29+ messages in thread
From: Eli Zaretskii @ 2021-06-09 11:56 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, mattiase, larsi, salutis, naofumi

> Date: Tue, 8 Jun 2021 21:33:01 +0100
> From: Alan Third <alan@idiocy.org>
> Cc: 48902@debbugs.gnu.org, Lars Ingebrigtsen <larsi@gnus.org>,
>  Rudolf Adamkovič <salutis@me.com>, naofumi@yasufuku.dev
> 
> It's not just allocInitFromFile, I'm looking at the other callers of
> image_find_image_file and they all call ENCODE_FILE after it too.

Encoding an already encoded file name is a no-op.  But I agree it's
unclean and we had better fixed that.

> Patch attached. I've tested it here but I only have a couple of images
> to try it with.

Thanks, it LGTM.  But please also adjust the comment of
image_find_image_fd.





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-09 11:40                 ` Mattias Engdegård
@ 2021-06-09 15:19                   ` Alan Third
  2021-06-11 22:09                     ` Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 29+ messages in thread
From: Alan Third @ 2021-06-09 15:19 UTC (permalink / raw)
  To: Mattias Engdegård
  Cc: 48902, Lars Ingebrigtsen, Rudolf Adamkovič, naofumi

On Wed, Jun 09, 2021 at 01:40:18PM +0200, Mattias Engdegård wrote:
> 8 juni 2021 kl. 22.33 skrev Alan Third <alan@idiocy.org>:
> 
> > Patch attached. I've tested it here but I only have a couple of images
> > to try it with.
> 
> Looks fine, but the image_find_image_file comment needs to be
> amended since it says that it returns an encoded string.

I've made a change to the comment.

> Furthermore, if we use stringWithLispString: for file names, no
> special file name encoding step should be needed on our side, since
> the NS libs will perform any needed normalisation (at least if I've
> understood it right).

Yes, I believe that's right, so I've made that change too and pushed
to master.

As far as I can see this fixes the problem so I'll close the bug
report. If it's still broken in some way, please reply to this email
and we'll reopen the bug.

Thanks everyone!
-- 
Alan Third





^ permalink raw reply	[flat|nested] 29+ messages in thread

* bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems
  2021-06-09 15:19                   ` Alan Third
@ 2021-06-11 22:09                     ` Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 29+ messages in thread
From: Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2021-06-11 22:09 UTC (permalink / raw)
  To: Alan Third; +Cc: 48902, Mattias Engdegård, Lars Ingebrigtsen, naofumi

I have pulled, recompiled, updated my EMMS music library, and now, 
every album has artwork showing, including Japanese and other 
non-English albums. Thank you Alan, Eli, Lars, Mattias, and 
Naofumi. This was my first GNU bug report, and, as a fresh 
GNU/Emacs/DRM-free convert, I found the experience rather 
impressive!

R+

Alan Third <alan@idiocy.org> writes:

> On Wed, Jun 09, 2021 at 01:40:18PM +0200, Mattias Engdegård 
> wrote:
>> 8 juni 2021 kl. 22.33 skrev Alan Third <alan@idiocy.org>:
>>
>> > Patch attached. I've tested it here but I only have a couple 
>> > of images
>> > to try it with.
>>
>> Looks fine, but the image_find_image_file comment needs to be
>> amended since it says that it returns an encoded string.
>
> I've made a change to the comment.
>
>> Furthermore, if we use stringWithLispString: for file names, no
>> special file name encoding step should be needed on our side, 
>> since
>> the NS libs will perform any needed normalisation (at least if 
>> I've
>> understood it right).
>
> Yes, I believe that's right, so I've made that change too and 
> pushed
> to master.
>
> As far as I can see this fixes the problem so I'll close the bug
> report. If it's still broken in some way, please reply to this 
> email
> and we'll reopen the bug.
>
> Thanks everyone!





^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-06-11 22:09 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-07 13:32 bug#48902: 28.0.50; Directory names containing apostrophes and backticks cause problems Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-07 14:08 ` Lars Ingebrigtsen
2021-06-07 14:15   ` Eli Zaretskii
2021-06-07 14:24   ` Lars Ingebrigtsen
2021-06-07 14:36     ` Eli Zaretskii
2021-06-07 14:13 ` Eli Zaretskii
2021-06-08 22:21   ` Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-08 10:39 ` naofumi
2021-06-08 11:57   ` Lars Ingebrigtsen
2021-06-08 12:12     ` Alan Third
2021-06-08 12:14       ` Lars Ingebrigtsen
2021-06-08 17:45         ` Mattias Engdegård
2021-06-08 18:18           ` Eli Zaretskii
2021-06-08 19:13             ` naofumi
2021-06-08 20:08               ` Mattias Engdegård
2021-06-08 19:10           ` Alan Third
2021-06-08 19:52             ` Mattias Engdegård
2021-06-08 20:33               ` Alan Third
2021-06-09 11:40                 ` Mattias Engdegård
2021-06-09 15:19                   ` Alan Third
2021-06-11 22:09                     ` Rudolf Adamkovič via Bug reports for GNU Emacs, the Swiss army knife of text editors
2021-06-09 11:56                 ` Eli Zaretskii
2021-06-08 18:17         ` Mattias Engdegård
2021-06-08 12:37       ` Eli Zaretskii
2021-06-08 13:00         ` Alan Third
2021-06-08 14:02           ` Eli Zaretskii
2021-06-08 16:19             ` Alan Third
2021-06-08 18:09               ` Eli Zaretskii
2021-06-08 19:24                 ` Alan Third

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).