From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: fixing url-unhex-string for unicode/multi-byte charsets
Date: Fri, 06 Nov 2020 17:04:05 +0200
Message-ID: <83eel68r2y.fsf@gnu.org>
References: <20201106074742.jq3h4uujm7oce7af@E15-2016.optimum.net>
 <83wnyy9akw.fsf@gnu.org>
 <20201106102756.e2ctvpjruenatud5@E15-2016.optimum.net>
 <83pn4q8zdz.fsf@gnu.org>
 <20201106122846.unoizvad53blgncf@E15-2016.optimum.net>
 <83lffe8v92.fsf@gnu.org>
To: Stefan Monnier
Cc: boruch_baum@gmx.com, emacs-devel@gnu.org
In-Reply-To: (message from Stefan Monnier on Fri, 06 Nov 2020 09:59:02 -0500)

> From: Stefan Monnier
> Cc: Boruch Baum, emacs-devel@gnu.org
> Date: Fri, 06 Nov 2020 09:59:02 -0500
>
> >   (dolist (str '("hello%20world"
> >                  "%d7%a9%d7%9c%d7%95%d7%9d%20%d7%a2%d7%95%d7%9c%d7%9d"))
> >     (insert (decode-coding-string (url-unhex-string str)
> >                                   (or file-name-coding-system
> >                                       default-file-name-coding-system))
> >             "\n"))
> >
> > The result of evaluating this is two lines inserted into the current
> > buffer:
> >
> >   hello world
> >   שלום עולם
> >
> > If this doesn't work for you, or if you tried something slightly
> > different, I'd like to hear the details, perhaps there's some
> > subtlety I'm missing.
>
> My guess is that his `file-name-coding-system` is set to something
> different from utf-8.
> [ BTW, I wouldn't be surprised to hear that the Freedesktop spec
>   documents that the file names in the Trash should use utf-8, in which
>   case the code should hard-code utf-8 rather than use
>   `file-name-coding-system` ;-) ]

If the trash spec says it must be UTF-8, then yes, TRT is to use that
unconditionally.  But if the spec says nothing, I'd expect the file
names to be in whatever encoding they were on disk, which usually
should coincide with file-name-coding-system.
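For illustration only, the two alternatives could be sketched roughly as
below.  The helper name `my-trash-decode-name' and its optional CODING
argument are hypothetical, not existing Emacs code.  Whether utf-8 should
really be hard-coded depends on what the trash spec turns out to say.

```elisp
(require 'url-util)                     ; provides `url-unhex-string'

;; Hypothetical helper, not part of Emacs: percent-decode STR into raw
;; bytes, then decode those bytes into characters.
(defun my-trash-decode-name (str &optional coding)
  "Percent-decode STR and decode the resulting bytes using CODING.
If CODING is nil, fall back to `file-name-coding-system', then to
`default-file-name-coding-system'.  Pass `utf-8' to force UTF-8."
  (decode-coding-string
   (url-unhex-string str)
   (or coding
       file-name-coding-system
       default-file-name-coding-system)))

;; Alternative 1: the spec mandates UTF-8, so hard-code it.
(my-trash-decode-name "%d7%a9%d7%9c%d7%95%d7%9d" 'utf-8)

;; Alternative 2: the spec is silent, so assume the on-disk encoding.
(my-trash-decode-name "hello%20world")
```

With the second form, the result for non-ASCII names depends on the
user's file-name-coding-system, which is the situation Stefan guesses
at above.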