From: Eli Zaretskii <eliz@gnu.org>
To: Boruch Baum <boruch_baum@gmx.com>
Cc: emacs-devel@gnu.org
Subject: Re: fixing url-unhex-string for unicode/multi-byte charsets
Date: Fri, 06 Nov 2020 15:34:01 +0200
Message-ID: <83lffe8v92.fsf@gnu.org>
In-Reply-To: <20201106122846.unoizvad53blgncf@E15-2016.optimum.net> (message from Boruch Baum on Fri, 6 Nov 2020 07:28:46 -0500)

> Date: Fri, 6 Nov 2020 07:28:46 -0500
> From: Boruch Baum <boruch_baum@gmx.com>
> Cc: emacs-devel@gnu.org
> 
> > A stand-alone test case, which doesn't require an actual trash, would
> > be appreciated, so I could see which part doesn't work, and how to
> > fix it.
> 
> That would be the two file names that I previously posted. You say that
> they succeeded for you, but they didn't for me. The result I got was
> good for the first case (English two words), and garbage for the second
> case (Hebrew two words).

I tried that before posting the suggestion.  FTR, the below works for
me on the current emacs-27 branch and on master, both on MS-Windows
(where I used a literal 'utf-8 instead of file-name-coding-system)
and on GNU/Linux:

 (dolist (str '("hello%20world"
                "%d7%a9%d7%9c%d7%95%d7%9d%20%d7%a2%d7%95%d7%9c%d7%9d"))
   (insert (decode-coding-string (url-unhex-string str)
                                 (or file-name-coding-system
                                     default-file-name-coding-system))
           "\n"))

The result of evaluating this is two lines inserted into the current
buffer:

  hello world
  שלום עולם

If this doesn't work for you, or if you tried something slightly
different, I'd like to hear the details; perhaps there's some
subtlety I'm missing.
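
One way to narrow down where things go wrong is to look at the two
stages separately.  The sketch below is only an illustration, not
something taken from the thread: the function name is made up, and it
assumes the percent-encoded bytes are UTF-8 (hence the literal
'utf-8).  It returns the length before and after decoding, plus the
decoded text.

```elisp
(require 'url-util)

;; Illustration only: `my-unhex-stages' is a hypothetical name, and
;; UTF-8 is an assumption of this sketch.
(defun my-unhex-stages (encoded)
  "Return (BYTE-COUNT CHAR-COUNT TEXT) for percent-encoded ENCODED."
  (let* (;; Stage 1: percent-decoding yields one element per byte.
         (bytes (url-unhex-string encoded))
         ;; Stage 2: the byte sequence is decoded into characters.
         (text (decode-coding-string bytes 'utf-8)))
    (list (length bytes) (length text) text)))

;; The first Hebrew word from the test case above.
(my-unhex-stages "%d7%a9%d7%9c%d7%95%d7%9d")
```

If stage 1 yields 8 elements but the decoded text is not 4 characters
long, the coding system passed to decode-coding-string is the more
likely culprit, rather than url-unhex-string.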

> > Alternatively, maybe you could explain why you needed to insert the
> > text into a temporary buffer and then extract it from there?  AFAIK,
> > we have the same primitives that work on decoding strings as we have
> > for decoding buffer text.
> 
> I don't need to. It's the implementation done in emacs-w3m. I also pointed
> out that eww does it differently. I think the need in emacs-w3m is to
> mix the ASCII characters and selected binary output, which can't be done
> with, say, replace-regexp-in-string. So what they do is use a temporary
> buffer, set `buffer-multibyte' to nil, and instead of
> replace-regexp-in-string build the result in the temporary buffer.

As a rule of thumb, any Lisp code that needs to do something with a
string and does that by inserting it into a temporary buffer and
working on that instead, should raise the "missing primitive" alarm.
In this case, I see no missing primitives for decoding a string, so
using a temp buffer looks like an unnecessary complication to me.
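
To make that concrete, here is a rough sketch of the string-only
approach wrapped in a small helper.  The function name and the UTF-8
default are assumptions for illustration, not anything that exists in
Emacs or emacs-w3m; the body uses only url-unhex-string and
decode-coding-string, with no temporary buffer involved.

```elisp
(require 'url-util)

;; Hypothetical helper, not part of Emacs or emacs-w3m.
(defun my-url-decode-string (str &optional coding-system)
  "Percent-decode STR, then decode the bytes using CODING-SYSTEM.
CODING-SYSTEM defaults to UTF-8, which is an assumption of this sketch."
  (decode-coding-string (url-unhex-string str)
                        (or coding-system 'utf-8)))

;; ASCII text and percent-encoded bytes can be mixed freely.
(my-url-decode-string "hello%20world")
```

A caller that deals with file names could pass
file-name-coding-system as the second argument instead of relying on
the UTF-8 default.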



