From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: master 6011d39b6a: Fix drag-and-drop of files with multibyte filenames
Date: Sun, 05 Jun 2022 15:54:18 +0300
Message-ID: <83h74z9sp1.fsf@gnu.org>
References: <83r143a2j3.fsf@gnu.org> <87y1ybzaz9.fsf@yahoo.com> <83mter9zbl.fsf@gnu.org> <87v8tfz686.fsf@yahoo.com>
In-Reply-To: <87v8tfz686.fsf@yahoo.com> (message from Po Lu on Sun, 05 Jun 2022 19:42:49 +0800)
To: Po Lu
Cc: emacs-devel@gnu.org

> From: Po Lu
> Cc: emacs-devel@gnu.org
> Date: Sun, 05 Jun 2022 19:42:49 +0800
>
> Eli Zaretskii writes:
>
> > Then why not encode in UTF-8, for example?
>
> How about (or file-name-coding-system default-file-name-coding-system)
> instead?  AFAICT, that's what ENCODE_FILE does.

Yes.  Sorry, I forgot that the code was in Lisp, not C.

> > If some program other than Emacs is the target of the drop, raw bytes
> > produced from raw-text will not be meaningful for it.
>
> Why not?  Aren't those bytes equivalent to a C string describing a file
> name that can be passed to `open'?

Not necessarily.  First, non-ASCII characters can be encoded in
different ways, and the other program might not necessarily support
more than just the locale's encoding.  And second, any characters to
which Emacs gives codepoints beyond the Unicode codespace (something
that is rare, but it does happen) will not be understood by the other
programs at all, because their codepoints are completely private to
Emacs.

> I wrote that code according to how C_STRINGs are already encoded in
> select.el:
>
>           ((eq type 'C_STRING)
>            ;; According to ICCCM Protocol v2.0 (para 2.7.1), C_STRING
>            ;; is a zero-terminated sequence of raw bytes that
>            ;; shouldn't be interpreted as text in any encoding.
>            ;; Therefore, if STR is unibyte (the normal case), we use
>            ;; it as-is; otherwise we assume some of the characters
>            ;; are eight-bit and ensure they are converted to their
>            ;; single-byte representation.
>            (or (null (multibyte-string-p str))
>                (setq str (encode-coding-string str 'raw-text-unix))))

See the comment: it explicitly tells about "strings" that aren't text.
File names are always human-readable text, or at least they should be.

> > I actually don't understand why you don't use ENCODE_FILE for files
> > and ENCODE_SYSTEM for everything else -- this is the only encoding
> > which we know to be generally suitable for any operation that calls
> > low-level C APIs whose implementation is not in Emacs.  Bonus points
> > for adhering to selection-coding-system when that is non-nil.
> >
> > Are there any known problems with using these two system encodings in
> > this case?
>
> Yes: the entire selection mechanism is implemented in Lisp, and moving
> parts to C specifically would require some rethinking of the C code
> involved, and wouldn't be backwards-compatible.

No need to move anything to C: you can do the same in Lisp.  See
above.

> The FILE_NAME target has existed for decades in Lisp for programs that
> comply with the ICCCM and also deals with all kinds of file name
> encodings (see the call to `xselect--encode-string' in
> `xselect-convert-to-filename'), so I don't see why this code cannot.

I guess that other code is also incorrect, and was never seriously
tested with non-ASCII file names outside of UTF-8 locales.  Try Emacs
whose file-name-coding-system is iso-2022-jp or somesuch.
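
For illustration only, here is a minimal sketch of what encoding a file
name "in Lisp" along the lines discussed above might look like.  The
helper name `my-dnd-encode-file-name' is hypothetical and is not part of
select.el or x-dnd.el; the sketch assumes that falling back from
file-name-coding-system to default-file-name-coding-system is the
desired behavior, as suggested earlier in this thread, and it does not
consult selection-coding-system.

```elisp
;; Hypothetical helper, shown only to illustrate the idea; it is not
;; code from Emacs itself.
(defun my-dnd-encode-file-name (name)
  "Return NAME encoded with the file-name coding system.
Use `file-name-coding-system' if it is non-nil, otherwise
`default-file-name-coding-system'.  If NAME is already unibyte,
or if neither variable is set, return NAME unchanged."
  (let ((coding (or file-name-coding-system
                    default-file-name-coding-system)))
    (if (and coding (multibyte-string-p name))
        ;; Produce the bytes that a non-Emacs program running with
        ;; the same file-name encoding would expect.
        (encode-coding-string name coding)
      name)))
```

Unlike encoding with raw-text-unix, this yields bytes in an encoding
that other programs can plausibly interpret as text.  Binding
file-name-coding-system to a non-UTF-8 value such as iso-2022-jp around
a call is one way to try the case mentioned above.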