unofficial mirror of emacs-devel@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: emacs-devel@gnu.org
Subject: Re: creating unibyte strings
Date: Fri, 22 Mar 2019 15:27:17 +0200
Message-ID: <837ecrrqdm.fsf@gnu.org>
In-Reply-To: <jwvh8bvhzhs.fsf-monnier+emacs@gnu.org> (message from Stefan Monnier on Fri, 22 Mar 2019 08:33:02 -0400)

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: emacs-devel@gnu.org
> Date: Fri, 22 Mar 2019 08:33:02 -0400
> 
> >> Which reminds me: could someone add to the module API a primitive to
> >> build a *unibyte* string?
> > I don't like adding such a primitive.  We don't want to proliferate
> > unibyte strings in Emacs through that back door, because manipulating
> > unibyte strings involves subtle issues many Lisp programmers are not
> > aware of.
> 
> I don't see what's subtle about "unibyte" strings, as long as you
> understand that these are strings of *bytes* instead of strings
> of *characters* (i.e. they're `int8[]` rather than `wchar_t[]`).

That's the subtlety, right there.  Handling such "strings" in Emacs
Lisp can produce strange and unexpected results for someone who is not
aware of the difference and its implications.
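
To make the byte/character distinction concrete, here is a minimal C
sketch, not taken from Emacs itself. The helper name `utf8_char_count`
is hypothetical, and the input is assumed to be valid UTF-8. The same
buffer yields one length when read as bytes and another when read as
characters:

```c
#include <stddef.h>
#include <string.h>

/* Count the code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (those of the form 10xxxxxx).
   Assumption: the input is valid UTF-8; no validation is done.  */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the UTF-8 encoding of "café" ("caf\xC3\xA9"), `strlen` reports 5
bytes while `utf8_char_count` reports 4 characters.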

> "Multibyte" strings are just as subtle (maybe more so even), yet we
> rightly don't hesitate to offer a primitive way to construct them.

Because we succeed in hiding the subtleties in that case, so the
multibyte nature is not really visible on the Lisp level, unless you
try very hard to make it so.

> > Instead, how about doing that via vectors of byte values?
> 
> What's the advantage?  That seems even more convoluted: create a Lisp
> vector of the right size (i.e. 8x the size of your string on a 64bit
> system), loop over your string turning each byte into a Lisp integer
> (with the reverted API, this involves allocation of an `emacs_value`
> box), then pass that to `concat`?

That's one way, but I'm sure I can come up with a simpler one. ;-)
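
The "8x" figure in the quoted text can be checked with a little
arithmetic. The sketch below assumes, for illustration only, that each
vector slot is one 64-bit word and ignores the vector header; the names
`lisp_word` and `vector_payload_bytes` are hypothetical and are not
part of Emacs or the module API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one vector slot on a 64-bit system.  */
typedef uint64_t lisp_word;

/* Bytes occupied by the slots of a vector that holds one element per
   input byte.  The vector header is deliberately ignored.  */
size_t vector_payload_bytes(size_t nbytes)
{
    return nbytes * sizeof(lisp_word);
}
```

Under that assumption a 1 MiB byte string needs 8 MiB of vector slots.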

> It's probably going to be even less efficient than going through utf-8
> and back.

I doubt that.  It's just an assignment.  And it's a rare situation
anyway.

> Think about cases where the module receives byte strings from the disk
> or the network and needs to pass that to `decode-coding-string`.
> And consider that we might be talking about megabytes of strings.

They don't need to decode; they just need to arrange for it to be
UTF-8.
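
One possible reading of "arrange for it to be UTF-8" is that the module
transcodes the bytes itself before building a string. The sketch below
assumes the incoming bytes are ISO-8859-1 text, which the thread does
not say; the function name `latin1_to_utf8` is hypothetical. It covers
only the case where the bytes are text in a known encoding.

```c
#include <stddef.h>

/* Transcode LEN bytes of ISO-8859-1 from SRC into UTF-8 in DST.
   DST must have room for at least 2 * LEN bytes.
   Returns the number of bytes written; no NUL terminator is added.  */
size_t latin1_to_utf8(const unsigned char *src, size_t len,
                      unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII maps to itself.  */
            dst[out++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence.  */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

The resulting buffer and length could then be handed to the module
API's `make_string`, which expects UTF-8 input.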



