From: Harald Hanche-Olsen
Newsgroups: gmane.emacs.devel
Subject: Re: (aset UNIBYTE-STRING MULTIBYTE-CHAR)
Date: Thu, 15 May 2008 08:11:46 +0200 (CEST)
Message-ID: <20080515.081146.48179041.hanche@math.ntnu.no>
References: <20080514.145043.228449419.hanche@math.ntnu.no>
To: monnier@iro.umontreal.ca
Cc: emacs-devel@gnu.org
X-Mailer: Mew version 5.2.51 on Emacs 23.0.0 / Mule 6.0 (HANACHIRUSATO)

+ Stefan Monnier:

> I just want to see more
> examples to better understand the context and try to figure out what's
> the right way to fix the problem.  Notice that in your example,
>
>   (setq foo (make-string 4 ?a))
>   (aset foo 1 ?å)
>   (aset foo 1 ?€) ; => Error: args out of range
>
> the problem comes from the fact that now that we use Unicode, ?å = 229.
> So this integer is also the code of a byte, which is why the first aset
> succeeds.

Right.  Or perhaps more accurately, it is why the first aset succeeds
without automagically converting foo to a multibyte string.

> Maybe the better answer is for `make-string' to always create
> multibyte strings, just like `string' now does.

Hmm.  Except it doesn't, quite:

  (multibyte-string-p (string ?a ?b ?c ?d)) => nil
  (multibyte-string-p (string ?a ?b ?c ?å)) => t

It seems to be the presence of non-ASCII that triggers the creation of
a multibyte string, even though in this case a unibyte string could
also hold the result.
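
For anyone wanting to poke at the two representations, here is a
minimal sketch that reports whether a string is multibyte, how many
characters it has, and how many bytes it occupies.  The helper name
`my-string-info' is made up for illustration; `multibyte-string-p',
`length' and `string-bytes' are the standard functions.  The
characters are written as ?\uNNNN escapes only to keep the snippet
ASCII.

```elisp
;; Hypothetical helper (not part of Emacs or Mew): summarize how a
;; string is stored.
(defun my-string-info (s)
  "Return (MULTIBYTE-P CHARS BYTES) for the string S."
  (list (multibyte-string-p s) (length s) (string-bytes s)))

;; ASCII only: unibyte, one byte per character.
(my-string-info (make-string 3 ?a))            ; => (nil 3 3)

;; The euro sign (U+20AC): multibyte, three bytes per character.
(my-string-info (make-string 3 ?\u20ac))       ; => (t 3 9)

;; Three ASCII characters plus a-ring (U+00E5, code 229): multibyte,
;; and the last character takes two bytes.
(my-string-info (string ?a ?b ?c ?\u00e5))     ; => (t 4 5)
```

Comparing the second and third elements shows at a glance whether
replacing one character by another would change the byte length of
the string.
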
In fact, the current behaviours of string and make-string are
quite similar:

  (multibyte-string-p (make-string 3 ?a)) => nil
  (multibyte-string-p (make-string 3 ?å)) => t

> In any case if you stay far away from `aset on strings' your life will
> be generally better, the birds will sing and the sun will shine.

8)  I am willing to believe that.

> > The most basic way to alter the contents of an existing string is with
> > `aset' (*note Array Functions::).  `(aset STRING IDX CHAR)' stores CHAR
> > into STRING at index IDX.  Each character occupies one or more bytes,
> > and if CHAR needs a different number of bytes from the character
> > already present at that index, `aset' signals an error.
> >
> > That last bit actually seems to be outdated: An error is not ALWAYS
> > signaled in the indicated situation, only sometimes.
>
> I hope the text is correct, if not, please report it as a bug.

Okay.  I'll run it past you here first, though, since my understanding
of multibyte strings is still patchy.  This succeeds and returns
"€a€":

  (let ((str (make-string 3 ?€)))
    (aset str 1 ?a)
    str)

If I am not mistaken ?€ needs two bytes (or more?) while ?a needs one,
right?  And since two (or more) is different from one, the above text
claims that aset signals an error?  Or is my understanding wrong?
There is code in aset to shuffle the contents of a multibyte string
around in case of a size mismatch, however:

  if (prev_bytes != new_bytes)
    {
      /* We must relocate the string data.  */

> > (defun mew-addrstr-parse-syntax-list (str sep addrp &optional depth allow-spc)
> >   (when str
> >     (let* ((i 0) (len (length str))
> >            (par-cnt 0) (tmp-cnt 0) (sep-cnt 0)
> >            (tmp (mew-make-string len))
> >            c ret prevc)
> >       (catch 'max
> >         (while (< i len)
> >           (setq c (aref str i)) ; <= problem occurs here
> >           ... deleted ...)))))
>
> Hmm... I don't see any `aset'.

Rats.  Not enough caffeine, too much work.
The deleted code is a big (cond ...), about 80 lines long, that I
didn't want to burden the list with (it performs parsing after all).
I assure you that it contains (aset tmp tmp-cnt c) in multiple places.

It could have achieved the same result by consing up a list of the
characters and using (apply 'string (nreverse char-list)), or perhaps
by appending chars to a temporary buffer, but it didn't.

- Harald
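
The two aset-free alternatives mentioned above could look roughly
like this.  It is only a sketch: the function names and the KEEP-P
predicate are invented for illustration and have nothing to do with
the actual Mew parser, which does far more than filter characters.
The point is merely that the result string is built in one go, so no
character is ever stored into an existing string.

```elisp
;; Hypothetical example 1: collect the characters in a list and build
;; the string once at the end.
(defun my-collect-chars (str keep-p)
  "Return a new string of the characters in STR that satisfy KEEP-P."
  (let (chars)
    (dotimes (i (length str))
      (let ((c (aref str i)))
        (when (funcall keep-p c)
          (push c chars))))
    ;; `string' takes the characters as separate arguments, hence `apply'.
    (apply #'string (nreverse chars))))

;; Hypothetical example 2: the same idea, appending to a temporary buffer.
(defun my-collect-chars-buffer (str keep-p)
  "Like `my-collect-chars', but accumulate in a temporary buffer."
  (with-temp-buffer
    (dotimes (i (length str))
      (let ((c (aref str i)))
        (when (funcall keep-p c)
          (insert c))))
    (buffer-string)))

;; Usage: drop the spaces from a string.
(my-collect-chars "a b c" (lambda (c) (/= c ?\s)))         ; => "abc"
(my-collect-chars-buffer "a b c" (lambda (c) (/= c ?\s)))  ; => "abc"
```

Either way, whether the result ends up unibyte or multibyte is
decided when the string is created, rather than by whichever
character happens to be stored first.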