From: Boruch Baum <boruch_baum@gmx.com>
To: Emacs-Devel List <emacs-devel@gnu.org>
Subject: fixing url-unhex-string for unicode/multi-byte charsets
Date: Fri, 6 Nov 2020 02:47:42 -0500
Message-ID: <20201106074742.jq3h4uujm7oce7af@E15-2016.optimum.net>

In the thread "Friendlier dired experience", Michael Albinus noted that
the new Emacs feature to place remote files in the local trash
hex-encodes remote file names as if they were URLs. That led me to
discover that the same thing happens to local file names encoded in
multi-byte (e.g. Unicode) character sets. The current Emacs function
`url-unhex-string' does not properly decode either of these cases. We
noticed this when restoring a trashed file, but it can be expected to
show up in other situations as well.

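For illustration, a minimal sketch of the failure mode, assuming the name was hex-encoded from its UTF-8 bytes: `url-unhex-string' turns each %XX into one raw byte and returns a unibyte string, so the bytes of a multi-byte character are never reassembled into that character. Decoding them with a coding system recovers it; choosing `utf-8' here is an assumption, not something the trash code guarantees.

```elisp
(require 'url-util)

;; "%E2%82%AC" is the UTF-8 encoding of EURO SIGN (U+20AC).
;; `url-unhex-string' yields three raw bytes, not one character.
(length (url-unhex-string "%E2%82%AC"))              ; => 3
(multibyte-string-p (url-unhex-string "%E2%82%AC"))  ; => nil

;; Decoding those bytes, assuming UTF-8, yields the single character.
(decode-coding-string (url-unhex-string "%E2%82%AC") 'utf-8)  ; => "€"
```
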
I've solved the problem for diredc, using code from the emacs-w3m
project (thanks). Whether, for the general Emacs case, it should be
handled by altering the function `url-unhex-string' or by creating a
second function isn't for me to decide, so here's my fix for you to
discuss, decide on, and apply.

--8<--cut here-(start)------------------------------------------- >8
(defun diredc--decode-hexlated-string (str)
  "Convert hexlated string STR to human-readable text, with charset support.
This function improves upon `url-unhex-string' by handling
hexlated multi-byte and unicode characters.  Credit to the
`emacs-w3m' project for the core code, at
`w3m-url-decode-string'."
  ;; NOTE: This technique should be used by `url-unhex-string' itself,
  ;; or integrated otherwise into emacs.
  (let ((start 0)
        (case-fold-search t)
        ;; Try "%0d%0a" before the generic "%XX" alternative; otherwise
        ;; the first alternative always matches and CRLF is never
        ;; folded into a single newline.
        (regexp "%\\(?:0d%0a\\|\\([0-9a-f][0-9a-f]\\)\\)"))
    (with-temp-buffer
      ;; Collect raw bytes in a unibyte buffer, so each %XX becomes
      ;; exactly one byte.
      (set-buffer-multibyte nil)
      (while (string-match regexp str start)
        (insert (substring str start (match-beginning 0))
                (if (match-beginning 1)
                    (string-to-number (match-string 1 str) 16)
                  ?\n))
        (setq start (match-end 0)))
      (insert (substring str start))
      ;; Guess the coding system of the collected bytes and decode them.
      (decode-coding-string
       (buffer-string)
       (with-coding-priority nil
         (car (detect-coding-region (point-min) (point-max))))))))
--8<--cut here-(end)--------------------------------------------- >8
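Where the coding system is known in advance, the detection step can be skipped: `detect-coding-region' is heuristic, and on a short byte sequence it may guess a different charset than the one actually used. The following is a minimal sketch under that assumption, defaulting to UTF-8 (which is what `url-hexify-string' produces from multibyte input). The name `my-unhex-and-decode' is hypothetical and not part of Emacs, diredc or emacs-w3m; like `url-unhex-string', it assumes STR is plain ASCII, as URL-escaped text normally is.

```elisp
(require 'url-util)

(defun my-unhex-and-decode (str &optional coding)
  "Hex-decode STR and decode the resulting bytes with CODING.
CODING defaults to `utf-8'.  STR is assumed to be ASCII.
Hypothetical helper, for illustration only."
  ;; Non-nil ALLOW-NEWLINES keeps %0D and %0A as real CR/LF bytes
  ;; instead of having `url-unhex-string' replace them with spaces.
  (decode-coding-string (url-unhex-string str t)
                        (or coding 'utf-8)))

;; Usage:
(my-unhex-and-decode "caf%C3%A9")                ; => "café"
(my-unhex-and-decode "caf%E9" 'iso-latin-1)      ; => "café"
;; Round trip: `url-hexify-string' encodes multibyte input as UTF-8
;; before escaping it.
(my-unhex-and-decode (url-hexify-string "caf\u00e9"))  ; => "café"
```
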
--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1 7286 0036 9E45 1595 8BC0