From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Why does using aset sometimes output raw bytes?
Date: Sun, 09 Dec 2018 19:47:03 +0200
Message-ID: <83pnua384o.fsf@gnu.org>
To: help-gnu-emacs@gnu.org
In-reply-to: <87wooimwr9.fsf@gmx.net> (message from Stephen Berman on Sun, 09 Dec 2018 18:32:26 +0100)

> From: Stephen Berman
> Cc: help-gnu-emacs@gnu.org
> Date: Sun, 09 Dec 2018 18:32:26 +0100
>
> >> why are raw bytes inserted only with some
> >> multibyte strings (e.g. with "äöüß" but not with "ſðđŋ")?
> >
> > Because ſ doesn't fit in a single byte, so when you insert it, the
> > entire string is made multibyte, and then the other characters are
> > inserted into a multibyte string.
>
> This seems to imply that ä, ö, ü and ß do fit in a single byte?  Yet
> (multibyte-string-p "äöüß") returns t.  So I still don't understand.

Look at the codepoints: the above are all less than FF hex, so they
can fit in a single byte.  By contrast, ſ is 17F hex, more than a
single byte can hold.  So inserting ſ into a unibyte string _must_
first make that string multibyte, whereas inserting ä etc. can leave
it unibyte.

Why (multibyte-string-p "äöüß") returns t is an unrelated issue: it
has to do with how the Lisp reader reads the string.  The result is a
multibyte string, where ä is represented by its UTF-8 sequence and not
by its single-byte codepoint E4 hex.  If you want a unibyte string
with these bytes, write it as "\344\366\374\337" instead; then
(multibyte-string-p "\344\366\374\337") returns nil.
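
To make the codepoint argument above concrete, here is a minimal sketch.
The function name my-aset-demo and the variables s0 and s1 are
hypothetical, chosen only for illustration, and the characters are
given as codepoints so the example does not depend on the file's
encoding.

```elisp
;; Illustrative sketch; `my-aset-demo' is a made-up name, not from the thread.
(defun my-aset-demo ()
  "Return multibyteness of three strings, per the explanation above."
  (let ((s0 (make-string 3 32))   ; three spaces: ASCII-only, unibyte
        (s1 (make-string 3 32)))
    ;; ä is codepoint E4 hex, which fits in one byte: the string stays
    ;; unibyte and the element is stored as the byte E4.
    (aset s0 0 #xE4)
    ;; ſ is codepoint 17F hex, which does not fit in one byte: the
    ;; all-ASCII string is converted to multibyte first.
    (aset s1 0 #x17F)
    (list (multibyte-string-p s0)
          (multibyte-string-p s1)
          ;; Octal escapes below 256 make the reader build a unibyte string.
          (multibyte-string-p "\344\366\374\337"))))

(my-aset-demo)   ; => (nil t nil)
```

The first element being nil is what leads to the raw bytes: s0 is
still unibyte, so its E4 element is a byte rather than the
character ä.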

> >> is there some way in Lisp to say "treat the value of s0 as multibyte
> >> (regardless of what characters it contains)"?
> >
> > Not that I know of, no.  And I don't really understand how could such
> > a thing exist: how do you "treat as multibyte" an arbitrary byte that
> > is beyond 127 decimal?
>
> Actually, for the code I was experimenting with, it seems to suffice to
> use (make-string len 128) as the input to aset (before, I had used
> (make-string len 32), which led to raw bytes being displayed).

Not sure I understand what you mean by "suffice".  Feel free to ask
questions if there are some left.
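
For reference, a small sketch of how the two make-string calls quoted
above differ.  The function name my-make-string-demo is hypothetical,
and the length 3 is an arbitrary stand-in for len.  The likely reason
the second call behaves differently is that character 128 is not
ASCII, so make-string returns a multibyte string to begin with.

```elisp
;; Illustrative sketch; `my-make-string-demo' is a made-up name.
(defun my-make-string-demo ()
  "Return multibyteness of strings filled with char 32 and char 128."
  (list
   ;; Filled with spaces (ASCII): the result is unibyte.
   (multibyte-string-p (make-string 3 32))
   ;; Filled with character 128 (non-ASCII): the result is multibyte.
   (multibyte-string-p (make-string 3 128))))

(my-make-string-demo)   ; => (nil t)
```

With a multibyte string as the target, aset stores ä as a character
rather than as a single byte.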