From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: fixing url-unhex-string for unicode/multi-byte charsets
Date: Fri, 06 Nov 2020 14:04:40 +0200
Message-ID: <83pn4q8zdz.fsf@gnu.org>
References: <20201106074742.jq3h4uujm7oce7af@E15-2016.optimum.net>
 <83wnyy9akw.fsf@gnu.org>
 <20201106102756.e2ctvpjruenatud5@E15-2016.optimum.net>
Cc: emacs-devel@gnu.org
To: Boruch Baum
In-Reply-To: <20201106102756.e2ctvpjruenatud5@E15-2016.optimum.net>
 (message from Boruch Baum on Fri, 6 Nov 2020 05:27:56 -0500)

> Date: Fri, 6 Nov 2020 05:27:56 -0500
> From: Boruch Baum
> Cc: emacs-devel@gnu.org
>
> > I made a suggestion in that discussion, I will repeat some of them
> > here:
>
> Yeah, but they don't work.

I said "something like that", because I don't know the full context.
If "don't work" means "needs minor adaptations", the suggestions are
still valid.

> > So, for file names, something like the below should do the job
> > simpler:
> >
> >   (decode-coding-string (url-unhex-string STR)
> >                         (or file-name-coding-system
> >                             (default-value 'file-name-coding-system)))
>
> Try it.

I can't, not in full: I don't have a Freedesktop trash anywhere I have
access to. I did try the 2 file names you posted, including the one
with Hebrew characters, and it did work for me, on the assumption that
file-name-coding-system is UTF-8.

> To reproduce, touch and then trash a file named some two Hebrew
> words delimited by a space. Navigate to the trash directory's 'info'
> sub-directory and extract the 'path' value from the file's meta-data
> .info file. That's the string we need to decode. Apply the string to
> your solution and see that you do not get the space-delimited two
> Hebrew words.

A stand-alone test case, which doesn't require an actual trash, would
be appreciated, so I could see which part doesn't work, and how to fix
it.

Alternatively, maybe you could explain why you needed to insert the
text into a temporary buffer and then extract it from there? AFAIK,
we have the same primitives that work on decoding strings as we have
for decoding buffer text.
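For illustration only, a stand-alone test case of the kind asked for above might look like the following minimal sketch. It assumes url-util.el as shipped with Emacs 27 and a UTF-8 file-name encoding; the helpers my-decode-trash-path and my-decode-via-buffer are hypothetical names, not part of Emacs. It builds the percent-encoded form of two space-delimited Hebrew words with url-hexify-string, standing in for the 'path' value of a trash .info file, and decodes it once with decode-coding-string and once through a temporary buffer with decode-coding-region. The fallback uses default-file-name-coding-system, which Emacs consults when file-name-coding-system is nil.

```emacs-lisp
;;; -*- lexical-binding: t; coding: utf-8 -*-
(require 'url-util)   ; provides url-hexify-string and url-unhex-string

(defun my-decode-trash-path (str &optional coding)
  "Decode STR, a percent-encoded file name, into a multibyte string.
CODING defaults to the file-name encoding in effect.  Hypothetical
helper, for illustration only."
  (decode-coding-string
   ;; "%D7%A9" -> the raw byte string "\327\251"
   (url-unhex-string str)
   (or coding
       file-name-coding-system
       default-file-name-coding-system)))

(defun my-decode-via-buffer (str coding)
  "Decode STR like `my-decode-trash-path', but on buffer text.
Hypothetical helper, shown only to compare with the string primitive."
  (with-temp-buffer
    ;; Inserting a unibyte string into this multibyte buffer turns
    ;; bytes above 127 into raw-byte (eight-bit) characters.
    (insert (url-unhex-string str))
    (decode-coding-region (point-min) (point-max) coding)
    (buffer-substring-no-properties (point-min) (point-max))))

;; Stand-alone test case: two Hebrew words delimited by a space,
;; percent-encoded from UTF-8 as in a trash .info "path" value.
(let ((encoded (url-hexify-string "שלום עולם")))
  ;; encoded is "%D7%A9%D7%9C%D7%95%D7%9D%20%D7%A2%D7%95%D7%9C%D7%9D"
  (message "string: %s" (my-decode-trash-path encoded 'utf-8))
  (message "buffer: %s" (my-decode-via-buffer encoded 'utf-8)))
```

Evaluating the last form should print the two Hebrew words on both lines, which illustrates that the string and buffer primitives decode the same bytes the same way. If real trash data still fails, comparing its raw 'path' value with the encoded string above may show where the difference lies.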