From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Compiling Elisp to a native code with a GCC plugin
Date: Fri, 17 Sep 2010 22:57:13 +0200
Message-ID: <83wrqkytp2.fsf@gnu.org>
References: <87bp805ecr.fsf@gmail.com>
 <87iq26z97e.fsf@uwakimon.sk.tsukuba.ac.jp>
 <87y6b0yi8o.fsf@uwakimon.sk.tsukuba.ac.jp>
 <87sk18bioh.fsf@lola.goethe.zz>
 <87fwx8bhkq.fsf@lola.goethe.zz>
 <8739t8bepl.fsf@lola.goethe.zz>
 <87tylo9vou.fsf@lola.goethe.zz>
 <87pqwc9tnm.fsf@lola.goethe.zz>
 <87hbhoxkuw.fsf@uwakimon.sk.tsukuba.ac.jp>
In-reply-to: <87hbhoxkuw.fsf@uwakimon.sk.tsukuba.ac.jp>
Reply-To: Eli Zaretskii
To: "Stephen J. Turnbull"
Cc: dak@gnu.org, emacs-devel@gnu.org

> From: "Stephen J. Turnbull"
> Date: Sat, 18 Sep 2010 03:53:27 +0900
> Cc: emacs-devel@gnu.org
>
> Actually, there's an exceptional case: if both strings are pure ASCII.
> In that case it might be possible that one string is multibyte and the
> other unibyte, while the numbers of characters and of bytes are equal.

A unibyte string in Emacs has its `size_byte' member set to a negative
value:

  /* Mark STR as a unibyte string.  */
  #define STRING_SET_UNIBYTE(STR)  \
    do { if (EQ (STR, empty_multibyte_string))  \
        (STR) = empty_unibyte_string;  \
      else XSTRING (STR)->size_byte = -1; } while (0)

By contrast, a multibyte string holds there the number of bytes in its
internal representation.  So a pure ASCII string could be unibyte or
multibyte, and the `size_byte' member will be negative in the former
case and positive in the latter case.

However, AFAIK Emacs always makes a unibyte string if all the
characters are pure ASCII.  So this does not matter in practice.

> The example you gave proves nothing, however.
> In fact, when that string is presented by `string-as-multibyte', ?\351
> will be converted to a private space character in Unicode and therefore
> will have more than one byte in its representation.  Thus the length in
> bytes of the string (as multibyte) will be 7 (or maybe more, I forget
> which private space naked bytes live in).  Here's one way to get byte
> length of a string:
>
> (defun string-byte-count (s)
>   (length (if (string-multibyte-p s) (encode-coding-string s 'utf-8) s)))

See above: this is not accurate.
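
The `size_byte' convention described above can be modelled in a few
lines of C.  This is only a sketch under stated assumptions:
`struct model_string' and the `model_*' helpers are hypothetical names
made up for illustration, not Emacs internals, and they mirror just the
two size fields discussed here.  It shows how a byte count can be read
from the string header itself, without re-encoding the text:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a string header; NOT the real Emacs struct.
   Only the two size fields discussed in this message are represented.  */
struct model_string {
    ptrdiff_t size;            /* number of characters */
    ptrdiff_t size_byte;       /* byte count if multibyte, negative if unibyte */
    const unsigned char *data; /* the string's bytes */
};

/* Build a unibyte string: one byte per character, size_byte marked -1.  */
static struct model_string
model_make_unibyte(const unsigned char *data, ptrdiff_t nbytes)
{
    struct model_string s = { nbytes, -1, data };
    return s;
}

/* Build a multibyte string: size_byte holds the real byte count.  */
static struct model_string
model_make_multibyte(const unsigned char *data, ptrdiff_t nchars,
                     ptrdiff_t nbytes)
{
    /* A character never occupies less than one byte.  */
    assert(nbytes >= nchars);
    struct model_string s = { nchars, nbytes, data };
    return s;
}

/* Nonzero if S is multibyte, i.e. size_byte is not negative.  */
static int
model_string_multibyte_p(const struct model_string *s)
{
    return s->size_byte >= 0;
}

/* Byte length of S: for a unibyte string it equals the character count,
   for a multibyte string it is whatever size_byte records.  */
static ptrdiff_t
model_string_bytes(const struct model_string *s)
{
    return model_string_multibyte_p(s) ? s->size_byte : s->size;
}
```

In this model a pure ASCII string yields the same byte count whether it
is marked unibyte or multibyte; only the sign of `size_byte' differs.
At the Lisp level, `string-bytes' is the function that reports a
string's byte count.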