all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: Joseph Turner <joseph@ushin.org>
Cc: emacs-devel@gnu.org, schwab@suse.de, adam@alphapapa.net
Subject: Re: How to get buffer byte length (not number of characters)?
Date: Thu, 22 Aug 2024 07:06:32 +0300
Message-ID: <86plq1td4n.fsf@gnu.org>
In-Reply-To: <871q2hfn7c.fsf@ushin.org> (message from Joseph Turner on Wed, 21 Aug 2024 16:52:39 -0700)

> From: Joseph Turner <joseph@ushin.org>
> Cc: emacs-devel@gnu.org,  schwab@suse.de,  adam@alphapapa.net
> Date: Wed, 21 Aug 2024 16:52:39 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Currently, plz.el always creates the curl subprocess like so:
> >> 
> >> (make-process :coding 'binary ...)
> >> 
> >> https://git.savannah.gnu.org/cgit/emacs/elpa.git/tree/plz.el?h=externals-release/plz#n519
> >> 
> >> Does this DTRT?
> >
> > It could be TRT if plz.el encodes the buffer text "by hand" before
> > sending the results to curl and decodes it when it receives text from
> > curl.  Which I think is what happens there.
> 
> plz.el does not manually encode buffer text *within Emacs* when sending
> requests to curl, but by default, plz.el sends data to curl with --data,
> which tells curl to strip CR and newlines.  With the :body-type 'binary
> argument, plz.el instead uses --data-binary, which does no conversion.

Newlines are a relatively minor issue (although they, too, need to be
considered).  My main concern is with the text encoding.  How can it
be TRT to use 'binary when sending buffer text to curl?  That would
mean we are more-or-less always sending the internal representation of
characters, which is a superset of UTF-8.  If the data was originally
encoded in anything but UTF-8, reading it into Emacs and then sending
it back will change the byte sequences from that other encoding to
UTF-8.  Moreover, 'binary does not guarantee that the result is valid
UTF-8.
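
To make the concern concrete, here is a minimal sketch of how the byte
sequence depends on the coding system chosen at encoding time.  The
helper names my/byte-length-as and my/reencode are hypothetical and
are not part of plz.el or hyperdrive.el; only the built-in
encode-coding-string and decode-coding-string are assumed.

```elisp
;; Hypothetical helpers, for illustration only.

(defun my/byte-length-as (text coding-system)
  "Return the number of bytes TEXT occupies when encoded with CODING-SYSTEM."
  (length (encode-coding-string text coding-system)))

(defun my/reencode (bytes from to)
  "Decode unibyte BYTES using coding system FROM, then encode using TO.
The result is a unibyte string whose bytes generally differ from
BYTES when FROM and TO are different coding systems."
  (encode-coding-string (decode-coding-string bytes from) to))

;; "caf\u00e9" is four characters, but its byte length depends on the
;; coding system used to encode it.
(my/byte-length-as "caf\u00e9" 'utf-8)        ; 5 bytes
(my/byte-length-as "caf\u00e9" 'iso-latin-1)  ; 4 bytes

;; Latin-1 bytes that pass through Emacs and are written back out as
;; UTF-8 no longer have the original byte sequence.
(my/reencode "caf\351" 'iso-latin-1 'utf-8)   ; bytes 63 61 66 c3 a9
```

If the caller wants the original bytes to survive a round trip, it has
to encode explicitly with the coding system the data arrived in.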

So maybe I misunderstand how these plz.el facilities are used, but up
front this sounds like a mistake.

> We don't want to strip newlines from hyperdrive files, so we always use
> :body-type 'binary when sending buffer contents.  Should hyperdrive.el
> encode data with `buffer-file-coding-system' before passing to plz.el?

I would think so, but maybe we should bring the plz.el developers
into this discussion.

> When receiving text from curl, plz.el optionally decodes the text
> according to the charset in the 'Content-Type' header, e.g., "text/html;
> charset=utf-8" or utf-8 if no charset is found.

By "optionally" you mean that it happens only if the caller
requests it?  If so, the caller of plz.el should decode the
text manually before using it in user-facing features.
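
As a rough sketch of such manual decoding, assuming the caller has
already extracted the charset name from the Content-Type header (the
function my/decode-body is hypothetical, and how plz.el exposes the
header is not shown here):

```elisp
;; Hypothetical helper, for illustration only.

(defun my/decode-body (bytes charset)
  "Decode unibyte BYTES according to CHARSET, a string or nil.
Fall back to UTF-8 when CHARSET is nil or does not name a coding
system known to this Emacs."
  (let ((cs (and charset (intern (downcase charset)))))
    ;; `coding-system-p' returns non-nil for nil, so test CS itself first.
    (decode-coding-string bytes
                          (if (and cs (coding-system-p cs)) cs 'utf-8))))

;; Body declared as "charset=ISO-8859-1":
(my/decode-body "caf\351" "ISO-8859-1")
;; No charset found, so UTF-8 is assumed:
(my/decode-body "caf\303\251" nil)
```

Both calls should yield the same four-character string.  MIME charset
names do not always coincide with Emacs coding-system names, so a real
implementation would need a more careful mapping than `intern'.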

> Perhaps hyperdrive.el should check the 'Content-Type' header charset,
> then fall back to guessing the coding system based on filename and file
> contents with `set-auto-coding' (to avoid decoding images, etc.), and
> then finally fall back to something else?

Probably.  But then I don't know anything about hyperdrive.el, either.
If it copies text between files or URLs without showing it to the
user, then the best strategy is indeed not to decode and encode stuff,
but handle it as a stream of raw bytes.  (In that case, my suggestion
would be to use unibyte buffers and strings for temporarily storing
and processing these raw bytes in Emacs.)  But if the text is somehow
shown to the user, it must be decoded to be displayed correctly by
Emacs.  And then it must be encoded back when writing it back to the
external storage.
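
The two strategies can be sketched as follows.  This is only an
illustration under the assumption that the data arrives as a unibyte
string; the names my/raw-byte-count, my/decode-for-display and
my/encode-for-storage are hypothetical.

```elisp
;; Hypothetical helpers, for illustration only.

(defun my/raw-byte-count (bytes)
  "Return the number of bytes in unibyte string BYTES.
The data is held in a unibyte temporary buffer, where each buffer
position corresponds to exactly one byte."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert bytes)
    (buffer-size)))

(defun my/decode-for-display (bytes coding-system)
  "Decode unibyte BYTES with CODING-SYSTEM so the text can be shown."
  (decode-coding-string bytes coding-system))

(defun my/encode-for-storage (text coding-system)
  "Encode TEXT with CODING-SYSTEM before writing it to external storage."
  (encode-coding-string text coding-system))

;; Pass-through case: count and copy raw bytes without decoding.
(my/raw-byte-count "caf\303\251")                       ; 5

;; Display case: decode on the way in, encode on the way out.
(let ((text (my/decode-for-display "caf\303\251" 'utf-8)))
  (list (length text)                                   ; 4 characters
        (my/encode-for-storage text 'utf-8)))           ; original 5 bytes
```

In the pass-through case nothing is ever decoded, so the byte count and
the byte sequence are preserved by construction.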


