unofficial mirror of emacs-devel@gnu.org 
* fixing url-unhex-string for unicode/multi-byte charsets
@ 2020-11-06  7:54 Boruch Baum
From: Boruch Baum @ 2020-11-06  7:54 UTC
  To: Emacs-Devel List

Katsumi Yamaoka at the emacs-w3m project points out that emacs has a
function `eww-decode-url-file-name' that solves this issue. Maybe that
function should become the canonical emacs solution?

--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0

* fixing url-unhex-string for unicode/multi-byte charsets
@ 2020-11-06  7:47 Boruch Baum
From: Boruch Baum @ 2020-11-06  7:47 UTC
  To: Emacs-Devel List

In the thread "Friendlier dired experience", Michael Albinus noted that
the new emacs feature to place remote files in the local trash performs
hex-encoding on remote file-names as if they were URLs, which led me to
discover that this was also happening for local files whose names use
multi-byte (e.g. unicode) character-set encodings. Neither of these
cases was being properly handled by the current emacs function
`url-unhex-string'. We noticed this when restoring a trashed file, but
it can be expected to show up in other cases.

I've solved the problem for diredc, using code from the emacs-w3m
project (thanks). Whether for the general emacs case it should be
handled by altering function `url-unhex-string', or whether a second
function should be created, isn't for me to decide, so here's my fix for
you to discuss, decide, and apply.

--8<--cut here-(start)------------------------------------------- >8
(defun diredc--decode-hexlated-string (str)
  "Convert hexlated string to human-readable, with charset coding support.
This function improves upon `url-unhex-string' by handling
hexlated multi-byte and unicode characters. Credit to the
`emacs-w3m' project for the core-code, at
`w3m-url-decode-string'."
  ;; NOTE: This technique should be used by `url-unhex-string' itself,
  ;;       or integrated otherwise into emacs.
  (let ((start 0)
        (case-fold-search t)
        (regexp "%\\(?:\\([0-9a-f][0-9a-f]\\)\\|0d%0a\\)"))
    (with-temp-buffer
      ;; Collect raw bytes in a unibyte buffer; decode them at the end.
      (set-buffer-multibyte nil)
      (while (string-match regexp str start)
        ;; Copy the literal text before the match, then the decoded byte.
        (insert (substring str start (match-beginning 0))
                (if (match-beginning 1)
                    (string-to-number (match-string 1 str) 16)
                  ?\n))
        (setq start (match-end 0)))
      ;; Copy whatever follows the last escape sequence.
      (insert (substring str start))
      ;; Guess the coding system from the collected bytes and decode.
      (decode-coding-string
       (buffer-string)
       (with-coding-priority nil
         (car (detect-coding-region (point-min) (point-max))))))))
--8<--cut here-(end)--------------------------------------------- >8
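
For comparison, here is a minimal sketch of the same idea when the
coding system is already known rather than detected: percent-decode with
`url-unhex-string', which returns raw bytes, and then decode those bytes
explicitly. The helper name `my-unhex-and-decode' and its UTF-8 default
are assumptions made for illustration only; they are not part of Emacs,
diredc, or emacs-w3m.

```elisp
;; Hypothetical helper (not part of Emacs or diredc): percent-decode STR
;; and interpret the resulting bytes in an explicitly chosen coding system.
(require 'url-util)

(defun my-unhex-and-decode (str &optional coding-system)
  "Percent-decode STR, then decode the bytes as CODING-SYSTEM.
CODING-SYSTEM defaults to `utf-8' (an assumption for this sketch)."
  ;; `url-unhex-string' yields raw bytes; a non-nil second argument
  ;; keeps CR and LF bytes instead of replacing them with spaces.
  (decode-coding-string (url-unhex-string str t)
                        (or coding-system 'utf-8)))

;; "%E2%9C%93" is the UTF-8 encoding of U+2713 (CHECK MARK).
(my-unhex-and-decode "%E2%9C%93.txt")

;; The same approach with a single-byte encoding, chosen explicitly.
(my-unhex-and-decode "caf%E9" 'latin-1)
```

Passing the coding system explicitly avoids guessing, at the cost of
requiring the caller to know how the name was encoded; the function
above instead relies on `detect-coding-region' to make that choice.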

--
hkp://keys.gnupg.net
CA45 09B5 5351 7C11 A9D1  7286 0036 9E45 1595 8BC0

Thread overview: 16+ messages
2020-11-06  7:54 fixing url-unhex-string for unicode/multi-byte charsets Boruch Baum
2020-11-06  8:05 ` Eli Zaretskii
2020-11-06 10:34   ` Boruch Baum
2020-11-06 12:06     ` Eli Zaretskii
  -- strict thread matches above, loose matches on Subject: below --
2020-11-06  7:47 Boruch Baum
2020-11-06  8:02 ` Eli Zaretskii
2020-11-06 10:27   ` Boruch Baum
2020-11-06 12:04     ` Eli Zaretskii
2020-11-06 12:28       ` Boruch Baum
2020-11-06 13:34         ` Eli Zaretskii
2020-11-06 14:59           ` Stefan Monnier
2020-11-06 15:04             ` Eli Zaretskii
2020-11-08  9:12               ` Boruch Baum
2020-11-08 13:39                 ` Stefan Monnier
2020-11-08 15:07                 ` Eli Zaretskii
2020-11-06 14:38     ` Stefan Monnier