unofficial mirror of help-gnu-emacs@gnu.org
From: <tomas@tuxteam.de>
To: Eli Zaretskii <eliz@gnu.org>
Cc: help-gnu-emacs@gnu.org
Subject: Re: url-retrieve and encoding
Date: Mon, 12 Feb 2024 06:30:11 +0100
Message-ID: <Zcms4+J8wnsb16vj@tuxteam.de>
In-Reply-To: <86eddist6k.fsf@gnu.org>

On Sun, Feb 11, 2024 at 09:21:39PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 11 Feb 2024 18:49:25 +0100
> > From: tomas@tuxteam.de
> > Cc: help-gnu-emacs@gnu.org
> > 
> > > > Yes: decode-coding-region.
> > > 
> > > Ahhh -- thanks a bunch for this one! How could I have missed it.
> > > 
> > > > > (...) But that feels
> > > > > a bit... gross:
> > > > 
> > > > Indeed.  Why didn't you try decoding to begin with?
> > 
> > OK, now I can answer this question more precisely: actually, I'd
> > been there already and was confused that the function did... nothing.
> > 
> > Now at least I know why: the buffer is unibyte.
> 
> The solution is (quite obviously) not to do that in-place.

I guessed so, thanks for the clarification.
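
[Editor's sketch, not part of the original message.] One way to decode
without touching the unibyte buffer is to pass t as the optional fourth
argument (DESTINATION) of decode-coding-region, so that the decoded text
comes back as a string. The helper name below is hypothetical, and the
temporary buffer is only a stand-in for a response buffer; in real use
the region would start after the HTTP headers, and the coding system
would be whatever those headers claim.

```elisp
;; Hypothetical helper; the name is not from the thread.
(defun my-decode-region-to-string (start end coding-system)
  "Return the bytes between START and END decoded as CODING-SYSTEM.
The current buffer is left untouched."
  ;; A DESTINATION of t makes `decode-coding-region' return the
  ;; decoded text instead of replacing the region in place.
  (decode-coding-region start end coding-system t))

;; Stand-in for a response buffer: unibyte, holding UTF-8 bytes.
(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert (encode-coding-string "naïve café" 'utf-8))
  (my-decode-region-to-string (point-min) (point-max) 'utf-8))
```

The result is a multibyte string, while the source buffer stays unibyte
with its bytes unchanged.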

> Alternatively, you could make the buffer multibyte in advance, but
> that's tricky, so I don't recommend that.

If url-retrieve had a "callback interface", as processes have, with
their filters, then one could arrange things for the decoding to happen
there. Actually, that's what's going on in the background, I guess.

> > Its content /is/ utf-8.
> 
> That's not really 100% accurate, although it's close.  If the unibyte
> buffer includes byte sequences that are not valid UTF-8, decoding does
> change the byte stream in those places.

Of course, you are right. The HTTP headers /state/ it to be utf-8. It's
like trusting the label on the bottle :-)

> > But then... I can do things "in buffer" by simply invoking
> > (toggle-enable-multibyte-characters t). At least, it seems to
> > work. But... is it a good idea?
> 
> No.  Always call the decode function, never play with
> multi-uni-byteness, because the latter will eventually surprise (or
> bite) you.

I guessed so. Thanks for your patience (and for helping me learn).

Cheers
-- 
t


Thread overview: 8+ messages
2024-02-10 19:31 url-retrieve and encoding tomas
2024-02-10 19:41 ` Eli Zaretskii
2024-02-10 19:49   ` tomas
2024-02-11 17:49     ` tomas
2024-02-11 19:21       ` Eli Zaretskii
2024-02-12  5:30         ` tomas [this message]
2024-02-10 20:51 ` Tim Landscheidt
2024-02-11  6:30   ` tomas
