From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Re: Inadequate documentation of silly characters on screen.
Date: Fri, 20 Nov 2009 12:37:13 +0900
Message-ID: <87vdh57tp2.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20091118191258.GA2676@muc.de> <20091119082040.GA1720@muc.de> <87aayitvoy.fsf@wanchan.jasonrumney.net> <87ocmyf6so.fsf@catnip.gol.com>
In-Reply-To: <87ocmyf6so.fsf@catnip.gol.com>
To: Miles Bader
Cc: Alan Mackenzie, Jason Rumney, Stefan Monnier, emacs-devel@gnu.org

Miles Bader writes:
 > Stefan Monnier writes:
 > > many strings start as unibyte even though they really should start
 > > right away as multibyte.
 >
 > That seems the fundamental problem here.
 >
 > It seems better to make unibyte strings something that can only be
 > created with some explicit operation.

I don't see why you *need* them at all. Both pre-Emacs-integration
Mule and XEmacs do fine with a multibyte representation for binary.
Nobody has complained about performance of stream operations since
Kyle Jones and Hrvoje Niksic bitched and we did some measurements in
1998 or so. It turns out that (as you'd expect) multibyte stream
operations (except Boyer-Moore, which takes no performance hit :-) are
about 50% slower because the representation is about 50% bigger. But
this is rarely noticeable to users. The noticeable performance problems
turned out to be a problem with Unix interfaces, not multibyte.

The performance problem is in array operations, since (without
caching) finding a particular character position is O(position).

If you want to turn Emacs into an engine for general network
programming and the like, yes, it would be good to have a separate
unibyte type. This is what Python does, but Emacs would not have to go
through the agony of switching from a unibyte representation for
human-readable text to a multibyte representation the way Python does
for Python 3. In that case, Emacs should not create them without an
explicit operation, and there should be a separate notation such as
#b"this is a unibyte string" (although #b may already be taken?) for
literals.
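
To illustrate the O(position) remark about array operations, here is a rough sketch in Python. It is only an illustration under stated assumptions: UTF-8 stands in for the Mule/Emacs internal multibyte representation (which is not identical to UTF-8), and the function name char_to_byte_offset is made up for this example. Without a cache, finding the byte offset of the Nth character means walking the buffer from the start.

```python
def char_to_byte_offset(buf: bytes, char_index: int) -> int:
    """Return the byte offset of character number `char_index` in `buf`.

    Assumes `buf` holds valid UTF-8 (a stand-in for a real multibyte
    representation). The scan starts at byte 0, so the cost grows
    linearly with the requested position.
    """
    if char_index < 0:
        raise IndexError("negative character index")
    seen = 0
    for offset, byte in enumerate(buf):
        # Bytes of the form 10xxxxxx continue a character;
        # every other byte starts a new one.
        if byte & 0xC0 != 0x80:
            if seen == char_index:
                return offset
            seen += 1
    if seen == char_index:
        return len(buf)  # position just past the last character
    raise IndexError("character index out of range")


text = "naïve café"
multibyte = text.encode("utf-8")    # variable width: 1 or 2 bytes per character here
unibyte = text.encode("latin-1")    # fixed width: exactly 1 byte per character

v_offset = char_to_byte_offset(multibyte, 3)  # offset of "v", after the 2-byte "ï"
```

In the fixed-width (unibyte) buffer, character N is simply byte N, so indexing needs no scan. That is the trade-off between the two representations: stream operations only pay for the larger size, while random access into multibyte text pays for the walk.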