From: Boruch Baum
Newsgroups: gmane.emacs.devel
Subject: fixing url-unhex-string for unicode/multi-byte charsets
Date: Fri, 6 Nov 2020 02:47:42 -0500
Message-ID: <20201106074742.jq3h4uujm7oce7af@E15-2016.optimum.net>
To: Emacs-Devel List
Xref: news.gmane.io gmane.emacs.devel:258779

In the thread "Friendlier dired experience", Michael Albinus noted that
the new Emacs feature that places remote files in the local trash
hex-encodes remote file names as if they were URLs. That led me to
discover that the same thing happens for local files whose names use
multi-byte (e.g. Unicode) character-set encodings. Neither case is
handled properly by the current Emacs function `url-unhex-string'.

We noticed this when restoring a trashed file, but it can be expected to
show up in other cases.

I've solved the problem for diredc, using code from the emacs-w3m
project (thanks). Whether, for the general Emacs case, it should be
handled by altering `url-unhex-string' or by creating a second function
isn't for me to decide, so here's my fix for you to discuss, decide, and
apply.

--8<--cut here-(start)------------------------------------------- >8
(defun diredc--decode-hexlated-string (str)
  "Convert hexlated string to human-readable, with charset coding support.
This function improves upon `url-unhex-string' by handling hexlated
multi-byte and unicode characters. Credit to the `emacs-w3m' project
for the core code, at `w3m-url-decode-string'."
  ;; NOTE: This technique should be used by `url-unhex-string' itself,
  ;;       or integrated otherwise into emacs.
  (let ((start 0)
        (case-fold-search t)
        (regexp "%\\(?:\\([0-9a-f][0-9a-f]\\)\\|0d%0a\\)"))
    (with-temp-buffer
      ;; Collect the decoded octets in a unibyte buffer.
      (set-buffer-multibyte nil)
      (while (string-match regexp str start)
        (insert (substring str start (match-beginning 0))
                (if (match-beginning 1)
                    ;; %XX: insert the byte with that hex value.
                    (string-to-number (match-string 1 str) 16)
                  ?\n))
        (setq start (match-end 0)))
      (insert (substring str start))
      ;; Guess the coding system from the bytes, then decode them.
      (decode-coding-string
       (buffer-string)
       (with-coding-priority nil
         (car (detect-coding-region (point-min) (point-max))))))))
--8<--cut here-(end)--------------------------------------------- >8

-- 
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0
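
To illustrate the behavior described in this message, here is a minimal
sketch. It makes one assumption that the function above does not: that
the hex-encoded bytes are always UTF-8 (the function above detects the
coding system instead). The name `my-unhex-utf8' is hypothetical and is
not part of Emacs, diredc, or emacs-w3m.

```elisp
;; Sketch only: assumes the percent-encoded octets are UTF-8.
(require 'url-util)

(defun my-unhex-utf8 (str)
  "Hypothetical helper: percent-decode STR, then decode the bytes as UTF-8."
  ;; `url-unhex-string' turns each %XX into a raw byte; it does not
  ;; interpret those bytes in any character encoding, so a second
  ;; decoding step is needed to get readable text.
  (decode-coding-string (url-unhex-string str) 'utf-8))

;; "%C3%A9" is the UTF-8 encoding of "é".
(my-unhex-utf8 "caf%C3%A9.txt")
```

Hard-coding `utf-8' keeps the sketch short, but it would presumably
mis-decode file names stored in other encodings, which is the case the
detection step in the function above is meant to cover.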