Re: creating unibyte strings

From: Stefan Monnier <monnier@iro.umontreal.ca>
To: emacs-devel@gnu.org
Subject: Re: creating unibyte strings
Date: Fri, 22 Mar 2019 11:37:59 -0400
Message-ID: <jwvd0mjdjfa.fsf-monnier+emacs@gnu.org>
In-Reply-To: 83sgvfq6yv.fsf@gnu.org

[ Boy this discussion is really frustrating.  I should have just added
  the damn feature and moved on.  Now I'm stuck in this morass!  ]

>> But this has nothing to do with the modules API: it's not more tricky
>> than when doing it purely in Elisp.  Are you seriously suggesting we
>> deprecate unibyte strings altogether?
> We won't deprecate unibyte strings, but we decided long ago to
> minimize their use.

Minimizing their use doesn't mean that the places where they are used are
less important.  Sometimes what you need is a unibyte string and nothing
else will do.

It also doesn't explain why you want to make it extra cumbersome for
modules whereas Elisp can still do it conveniently.

>> Then I don't know what subtleties you're talking about.
>> Can you give some examples of the kinds of things you're thinking of?
> String concatenation, for one.  Regular expression search for another.
> And those are just the ones I thought about in the first 5 seconds.

I don't see in which way these are better hidden for multibyte strings
than they are for unibyte strings.

>> >> > Instead, how about doing that via vectors of byte values?
>> >> What's the advantage?  That seems even more convoluted: create a Lisp
>> >> vector of the right size (i.e. 8x the size of your string on a 64bit
>> >> system), loop over your string turning each byte into a Lisp integer
>> >> (with the reverted API, this involves allocation of an `emacs_value`
>> >> box), then pass that to `concat`?
>> > That's one way, but I'm sure I can come up with a simpler one. ;-)
>> I'm all ears.
> Provide an Emacs primitive for that, then at least some of the
> awkwardness is gone.

No matter which primitive you provide, it means that to build a unibyte
Elisp string out of a C char[], you're suggesting we go through an
extra copy that uses up 8x the memory.
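
To put a rough number on that, here is a minimal sketch of the
arithmetic.  It assumes one vector slot is as wide as intptr_t and
ignores object headers; the function names are made up for
illustration and are not part of Emacs or the module API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Payload of a unibyte string holding NBYTES bytes: one byte per
   element.  Hypothetical helper, for illustration only.  */
size_t
bytestring_payload (size_t nbytes)
{
  return nbytes;
}

/* Payload of a Lisp vector holding the same bytes as integers.
   Assumption: each slot is one machine word (sizeof (intptr_t));
   headers and any per-value boxing are not counted.  */
size_t
vector_payload (size_t nbytes)
{
  /* Guard against overflow in the multiplication below.  */
  assert (nbytes <= SIZE_MAX / sizeof (intptr_t));
  return nbytes * sizeof (intptr_t);
}
```

Under those assumptions, on a system with 8-byte words a 1 MB PNG
needs about 8 MB for the intermediate vector, on top of the final
string.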

With such inefficient interfaces, the whole idea of writing modules
becomes completely unattractive: better write a separate application and
communicate via pipes (then you can get unibyte strings in the natural
way).

> And/or use records.

I don't understand what you mean by "use records".

>> >> It's probably going to be even less efficient than going through utf-8
>> >> and back.
>> > I doubt that.  It's just an assignment.  And it's a rare situation
>> > anyway.
>> Why do you think it's rare?
> Because the number of Emacs features that require you to submit a
> unibyte string is very small.

Maybe rare in terms of the number of lines of code that will want to do it.
But that doesn't mean rare in terms of the number of times it'll be executed
for a specific user, so performance considerations should apply.

>> 2- the C side string contains text in latin-1, big5, younameit.
>>    The module API provides nothing convenient.  Should we force our
>>    module to link to C-side coding-system libraries to convert to utf-8
>>    before passing it on to the Elisp, even though Emacs already has all
>>    the needed facilities?  Really?
>
> Yes, really.  Why is that a problem?  libiconv exists on every
> platform we support, and is easy to use.  Moreover, if you just want
> to convert a native string into another native string, using Emacs
> built-in en/decoding machinery is inconvenient, because it involves
> more copying than necessary.

The idea is not to use Emacs as a C library for text conversion, but
that if you receive a latin-1 string and want to pass it to Emacs, it
makes a lot more sense to do:

    make_bytestring (s)

and later

    (decode-coding-string s 'latin-1)

than having to link with libiconv.
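
For a sense of what the C-side alternative involves even in the
easiest case, here is a minimal sketch of a hand-written latin-1 to
UTF-8 conversion.  The name latin1_to_utf8 is hypothetical (it is
neither an Emacs nor a libiconv function), and the caller is assumed
to provide an output buffer of at least 2 * LEN bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Convert LEN bytes of Latin-1 text in SRC to UTF-8 in DST and
   return the number of bytes written.  DST must have room for
   2 * LEN bytes.  Hypothetical helper, for illustration only.  */
size_t
latin1_to_utf8 (const unsigned char *src, size_t len, unsigned char *dst)
{
  size_t out = 0;

  assert (src != NULL && dst != NULL);
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      if (c < 0x80)
        dst[out++] = c;                   /* ASCII: copied unchanged.  */
      else
        {
          dst[out++] = 0xC0 | (c >> 6);   /* Lead byte: 110000xx.  */
          dst[out++] = 0x80 | (c & 0x3F); /* Trail byte: 10xxxxxx.  */
        }
    }
  return out;
}
```

This only works because latin-1 maps directly onto the first 256
Unicode code points.  Anything like big5 needs real conversion
tables, i.e. a library, which is exactly the extra dependency in
question.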

>> 3- The C side string contains binary data (say PNG images).
>>    What does "arrange for it to be UTF-8" even mean?
> Nothing, since in this case there's no meaning to "decoding".

My point exactly: what should be done instead?

The solution currently used for this existing case is to call make_string
on it (even though it's not a utf-8 string) and then pass it through
(encode-coding-string s 'utf-8) which is ridiculously inefficient
compared to what make_bytestring would do.

        Stefan
