From: Eli Zaretskii <eliz@gnu.org>
To: Po Lu <luangruo@yahoo.com>
Cc: emacs-devel@gnu.org
Subject: Re: master 6011d39b6a: Fix drag-and-drop of files with multibyte filenames
Date: Sun, 05 Jun 2022 15:54:18 +0300
Message-ID: <83h74z9sp1.fsf@gnu.org>
In-Reply-To: <87v8tfz686.fsf@yahoo.com> (message from Po Lu on Sun, 05 Jun 2022 19:42:49 +0800)

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Sun, 05 Jun 2022 19:42:49 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Then why not encode in UTF-8, for example?
> 
> How about (or file-name-coding-system default-file-name-coding-system)
> instead?  AFAICT, that's what ENCODE_FILE does.

Yes.  Sorry, I forgot that the code was in Lisp, not C.
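
A minimal Lisp sketch of that suggestion, approximating what ENCODE_FILE does in C. The helper name `my-dnd-encode-file-name' is hypothetical and not part of Emacs; only the two coding-system variables and `encode-coding-string' come from Emacs itself.

```elisp
;; Hypothetical helper for illustration; the name is not part of Emacs.
(defun my-dnd-encode-file-name (name)
  "Return NAME encoded with the file-name coding system.
Use the first non-nil of `file-name-coding-system' and
`default-file-name-coding-system'.  Return NAME unchanged if it is
already unibyte or if neither variable is set."
  (let ((coding (or file-name-coding-system
                    default-file-name-coding-system)))
    (if (and coding (multibyte-string-p name))
        (encode-coding-string name coding)
      name)))
```

Binding `file-name-coding-system' around the call, for example to `utf-8', is one way to see which bytes a drop target would receive.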

> > If some program other than Emacs is the target of the drop, raw bytes
> > produced from raw-text will not be meaningful for it.
> 
> Why not?  Aren't those bytes equivalent to a C string describing a file
> name that can be passed to `open'?

Not necessarily.  First, non-ASCII characters can be encoded in
different ways, and the other program might not necessarily support
more than just the locale's encoding.  And second, any characters to
which Emacs gives codepoints beyond the Unicode codespace (something
that is rare, but it does happen) will not be understood by other
programs at all, because those codepoints are completely private to
Emacs.
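
Both points can be checked from Lisp. The sketch below is only an illustration; the helper name `my-hex-bytes' is hypothetical, and the sample file name is an assumption chosen for the example.

```elisp
;; Hypothetical helper for illustration; the name is not part of Emacs.
(defun my-hex-bytes (name coding)
  "Return the bytes of NAME encoded with CODING, as a hex string."
  (mapconcat (lambda (byte) (format "%02x" byte))
             (encode-coding-string name coding)
             " "))

;; The same file name yields different byte sequences per encoding.
(my-hex-bytes "日本" 'utf-8)        ; "e6 97 a5 e6 9c ac"
(my-hex-bytes "日本" 'euc-jp)       ; different bytes
(my-hex-bytes "日本" 'iso-2022-jp)  ; different again, with escape sequences

;; Emacs character codes extend beyond the Unicode codespace.
(> (max-char) #x10FFFF)             ; t
```

A program that decodes only with its locale's encoding would misread at least two of the three byte sequences above.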

> I wrote that code according to how C_STRINGs are already encoded in
> select.el:
> 
>            ((eq type 'C_STRING)
>             ;; According to ICCCM Protocol v2.0 (para 2.7.1), C_STRING
>             ;; is a zero-terminated sequence of raw bytes that
>             ;; shouldn't be interpreted as text in any encoding.
>             ;; Therefore, if STR is unibyte (the normal case), we use
>             ;; it as-is; otherwise we assume some of the characters
>             ;; are eight-bit and ensure they are converted to their
>             ;; single-byte representation.
>             (or (null (multibyte-string-p str))
>                 (setq str (encode-coding-string str 'raw-text-unix))))

See the comment: it explicitly talks about "strings" that aren't text.
File names are always human-readable text, or at least they should be.

> > I actually don't understand why you don't use ENCODE_FILE for files
> > and ENCODE_SYSTEM for everything else -- this is the only encoding
> > which we know to be generally suitable for any operation that calls
> > low-level C APIs whose implementation is not in Emacs.  Bonus points
> > for adhering to selection-coding-system when that is non-nil.
> >
> > Are there any known problems with using these two system encodings in
> > this case?
> 
> Yes: the entire selection mechanism is implemented in Lisp, and moving
> parts to C specifically would require some rethinking of the C code
> involved, and wouldn't be backwards-compatible.

No need to move anything to C: you can do the same in Lisp.  See
above.
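
For text other than file names, a Lisp counterpart of ENCODE_SYSTEM that also honors selection-coding-system might look like the sketch below. The helper name `my-selection-encode-text' is hypothetical; the fallback to `locale-coding-system' mirrors what ENCODE_SYSTEM uses in C.

```elisp
(require 'select)  ; defines `selection-coding-system'

;; Hypothetical helper for illustration; the name is not part of Emacs.
(defun my-selection-encode-text (text)
  "Encode TEXT for a selection, honoring `selection-coding-system'.
Fall back to `locale-coding-system' when that variable is nil.
Return TEXT unchanged if it is unibyte or neither variable is set."
  (let ((coding (or selection-coding-system locale-coding-system)))
    (if (and coding (multibyte-string-p text))
        (encode-coding-string text coding)
      text)))
```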

> The FILE_NAME target has existed for decades in Lisp for programs that
> comply with the ICCCM and also deals with all kinds of file name
> encodings (see the call to `xselect--encode-string' in
> `xselect-convert-to-filename'), so I don't see why this code cannot.

<Shrug> I guess that other code is also incorrect, and was never
seriously tested with non-ASCII file names outside of UTF-8 locales.
Try an Emacs whose file-name-coding-system is iso-2022-jp or somesuch.




Thread overview: 6+ messages
2022-06-05  9:21 master 6011d39b6a: Fix drag-and-drop of files with multibyte filenames Eli Zaretskii
2022-06-05 10:00 ` Po Lu
2022-06-05 10:31   ` Eli Zaretskii
2022-06-05 11:42     ` Po Lu
2022-06-05 12:54       ` Eli Zaretskii [this message]
2022-06-05 13:07         ` Po Lu
