From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: master 6011d39b6a: Fix drag-and-drop of files with multibyte filenames
Date: Sun, 05 Jun 2022 13:31:10 +0300
Message-ID: <83mter9zbl.fsf@gnu.org>
References: <83r143a2j3.fsf@gnu.org> <87y1ybzaz9.fsf@yahoo.com>
In-Reply-To: <87y1ybzaz9.fsf@yahoo.com> (message from Po Lu on Sun, 05 Jun 2022 18:00:10 +0800)
To: Po Lu
Cc: emacs-devel@gnu.org

> From: Po Lu
> Cc: emacs-devel@gnu.org
> Date: Sun, 05 Jun 2022 18:00:10 +0800
>
> Eli Zaretskii writes:
>
> > I don't think I understand this change.  raw-text basically doesn't do
> > any conversion, except if the text includes raw bytes.  Is that the
> > problem here, and if so, how come a file name can include raw bytes in
> > its name?
>
> Encoding it as `raw-text-unix' is to satisfy the requirement in
> xselect.c that strings returned by selection converters must be
> unibyte.  IOW, it's the same as
>
>   (string-as-unibyte (expand-file-name value))
>
> except that we can't use `string-as-unibyte', because it's obsolete.

Then why not encode in UTF-8, for example?

> > And what does "Motif expects this to be STRING, but it treats the data
> > as a sequence of bytes instead of a Latin-1 string" mean in this
> > context?  The difference between raw bytes and Latin-1 strings is only
> > meaningful to Emacs; how does Motif distinguish between them?
>
> The selection property type STRING means a Latin-1 string, with some
> minor extensions.
> See this paragraph under "TEXT Properties" in the
> ICCCM:
>
>   STRING as a type or a target specifies the ISO Latin-1 character set
>   plus the control characters TAB (octal 11) and NEWLINE (octal
>   12).  The spacing interpretation of TAB is context dependent.  Other
>   ASCII control characters are explicitly not included in STRING at the
>   present time.
>
> But Motif doesn't comply with the ICCCM meaning of STRING or use the
> generic TEXT type when converting a drag-and-drop selection to
> FILE_NAME.  It instead expects the type of the selection property to be
> STRING, but the data is treated as raw bytes.

If some program other than Emacs is the target of the drop, raw bytes
produced from raw-text will not be meaningful for it.

I actually don't understand why you don't use ENCODE_FILE for files
and ENCODE_SYSTEM for everything else -- this is the only encoding
which we know to be generally suitable for any operation that calls
low-level C APIs whose implementation is not in Emacs.

Bonus points for adhering to selection-coding-system when that is
non-nil.

Are there any known problems with using these two system encodings in
this case?
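
For illustration, the encoding suggested above could be sketched in
Lisp roughly as follows.  This is only a sketch under stated
assumptions: `my-dnd-encode-file-name' is a hypothetical helper that
exists nowhere in Emacs, the Lisp variables stand in for what
ENCODE_FILE consults on the C side, giving `selection-coding-system'
precedence is one possible reading of the "bonus points" remark, and
the final `utf-8-unix' fallback is an assumption of the sketch rather
than something proposed in this thread.

```elisp
;; Sketch only: `my-dnd-encode-file-name' is a hypothetical helper,
;; not an existing Emacs function.
(defun my-dnd-encode-file-name (file)
  "Return FILE, expanded, encoded for use by a selection converter.
Prefer `selection-coding-system' when it is non-nil; otherwise use
the file-name coding system, as ENCODE_FILE does in C."
  (let ((coding (or (bound-and-true-p selection-coding-system)
                    file-name-coding-system
                    default-file-name-coding-system
                    'utf-8-unix)))      ; assumed last-resort fallback
    (encode-coding-string (expand-file-name file) coding)))

;; Check that the encoded name is unibyte, which is what xselect.c
;; requires of strings returned by selection converters.
(multibyte-string-p (my-dnd-encode-file-name "~/héllo.txt"))
```

Unlike `raw-text-unix', which emits the bytes of Emacs's internal
representation, this produces bytes in an encoding that a drop target
other than Emacs has a chance of interpreting.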