From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: creating unibyte strings
Date: Fri, 22 Mar 2019 08:33:02 -0400
References: <83y3b4wdw9.fsf@gnu.org> <83tvhal45r.fsf@gnu.org> <83h8bwt1on.fsf@gnu.org> <83bm24t0hv.fsf@gnu.org> <83wokrs6en.fsf@gnu.org>
In-Reply-To: <83wokrs6en.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 22 Mar 2019 10:41:04 +0300")
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

>> Which reminds me: could someone add to the module API a primitive to
>> build a *unibyte* string?

> I don't like adding such a primitive.
> We don't want to proliferate
> unibyte strings in Emacs through that back door, because manipulating
> unibyte strings involves subtle issues many Lisp programmers are not
> aware of.

I don't see what's subtle about "unibyte" strings, as long as you
understand that these are strings of *bytes* instead of strings of
*characters* (i.e. they're `int8[]` rather than `wchar_t[]`).
"Multibyte" strings are just as subtle (maybe more so even), yet we
rightly don't hesitate to offer a primitive way to construct them.

> Instead, how about doing that via vectors of byte values?

What's the advantage?  That seems even more convoluted: create a Lisp
vector of the right size (i.e. 8x the size of your string on a 64-bit
system), loop over your string turning each byte into a Lisp integer
(with the reverted API, this involves allocation of an `emacs_value`
box), then pass that to `concat`?  It's probably going to be even less
efficient than going through utf-8 and back.

Think about cases where the module receives byte strings from the disk
or the network and needs to pass them to `decode-coding-string`.
And consider that we might be talking about megabytes of strings.


        Stefan