From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: creating unibyte strings
Date: Fri, 22 Mar 2019 15:27:17 +0200
Message-ID: <837ecrrqdm.fsf@gnu.org>
To: Stefan Monnier
Cc: emacs-devel@gnu.org

> From: Stefan Monnier
> Cc: emacs-devel@gnu.org
> Date: Fri, 22 Mar 2019 08:33:02 -0400
>
> >> Which reminds me: could someone add to the module API a primitive to
> >> build a *unibyte* string?
>
> > I don't like adding such a primitive. We don't want to proliferate
> > unibyte strings in Emacs through that back door, because manipulating
> > unibyte strings involves subtle issues many Lisp programmers are not
> > aware of.
>
> I don't see what's subtle about "unibyte" strings, as long as you
> understand that these are strings of *bytes* instead of strings
> of *characters* (i.e. they're `int8[]` rather than `w_char_t[]`).

That's the subtlety, right there. Handling such "strings" in Emacs
Lisp can produce strange and unexpected results for someone who is
not aware of the difference and its implications.

> "Multibyte" strings are just as subtle (maybe more so even), yet we
> rightly don't hesitate to offer a primitive way to construct them.

Because in that case we succeed in hiding the subtleties, so the
multibyte nature is not really visible at the Lisp level, unless you
try very hard to make it visible.

> > Instead, how about doing that via vectors of byte values?
>
> What's the advantage? That seems even more convoluted: create a Lisp
> vector of the right size (i.e. 8x the size of your string on a 64bit
> system), loop over your string turning each byte into a Lisp integer
> (with the reverted API, this involves allocation of an `emacs_value`
> box), then pass that to `concat`?

That's one way, but I'm sure I can come up with a simpler one. ;-)

> It's probably going to be even less efficient than going through utf-8
> and back.

I doubt that. It's just an assignment.
And it's a rare situation anyway.

> Think about cases where the module receives byte strings from the disk
> or the network and need to pass that to `decode-coding-string`.
> And consider that we might be talking about megabytes of strings.

They don't need to decode; they just need to arrange for the data to
be UTF-8.
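For comparison, here is a minimal C sketch of one possible "through UTF-8 and back" route mentioned in the thread, assuming a module wants to hand arbitrary bytes to the module API's `make_string`, which expects UTF-8 text. The helper name `bytes_to_utf8` is hypothetical and is not part of Emacs. Each byte is treated as a Latin-1 code point (U+0000..U+00FF) and re-encoded as UTF-8: ASCII bytes are copied unchanged and bytes 0x80..0xFF become two-byte sequences, so the output is at most twice the input size, compared with the 8x Stefan estimates for a vector of Lisp integers on a 64-bit system.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not an Emacs API): re-encode arbitrary bytes as
   UTF-8 by treating each byte as a Latin-1 code point U+0000..U+00FF.
   Bytes below 0x80 are copied as-is; bytes 0x80..0xFF become two-byte
   sequences (lead byte 0xC2 or 0xC3).  Returns a malloc'ed buffer that
   the caller must free and stores its length in *out_len, or returns
   NULL if the size would overflow or allocation fails.  */
char *
bytes_to_utf8 (const unsigned char *in, size_t len, size_t *out_len)
{
  /* Worst case: every input byte needs two output bytes.  The +1 keeps
     malloc from being asked for 0 bytes when len == 0.  */
  if (len > (SIZE_MAX - 1) / 2)
    return NULL;
  char *out = malloc (2 * len + 1);
  if (!out)
    return NULL;

  size_t j = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char b = in[i];
      if (b < 0x80)
        out[j++] = (char) b;                    /* ASCII: one byte */
      else
        {
          out[j++] = (char) (0xC0 | (b >> 6));   /* 0xC2 or 0xC3 */
          out[j++] = (char) (0x80 | (b & 0x3F)); /* low 6 bits */
        }
    }
  *out_len = j;
  return out;
}
```

For example, the input bytes `41 E9 FF` come out as `41 C3 A9 C3 BF`. A module could then pass the buffer and its length to `env->make_string`, and Lisp code should be able to recover the original bytes as a unibyte string with `(encode-coding-string s 'latin-1)`. That extra transcoding and copy on each side is the kind of overhead being weighed for megabyte-sized inputs.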