From: Camm Maguire
Newsgroups: gmane.lisp.gcl.devel,gmane.emacs.devel
Subject: utf8 and emacs text/string multibyte representation
Date: Wed, 29 Oct 2014 10:04:58 -0400
Message-ID: <87zjcfx985.fsf_-_@maguirefamily.org>
References: <87wq7jxc7d.fsf@gnu.org>
In-reply-to: <87wq7jxc7d.fsf@gnu.org> (Jose E. Marchesi's message of "Wed, 29 Oct 2014 14:00:38 +0100")
To: emacs-devel@gnu.org,gcl-devel@gnu.org

Greetings!

I've recently been considering supporting Unicode in gcl by representing
strings internally in utf8. It appears that emacs does the same or
something similar. Apart from the obvious memory footprint benefits, what
other advantages and disadvantages have been discovered?

Much of the utf8 literature emphasizes that most algorithms can proceed
conventionally in byte-wise fashion, including lexicographical ordering
comparisons, given that almost all jobs are sequential, at least
initially. A cached internal pointer storing the last referenced
codepoint offset makes such sequential access essentially O(1). Yet
setting string elements can trigger reallocations/memmove operations
whenever the new character's encoded length differs from the old one's.
While these can be aggregated over the setting of multiple elements,
operations like nreverse look ridiculous if left in terms of calls to
aref and aset.
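
To make the last two points concrete, here is a minimal sketch in C. It is not taken from gcl or emacs: the `ustr` type, its one-entry cache fields, and the functions `seq_len`, `ustr_offset` and `ustr_nreverse` are all hypothetical names, and the code assumes the buffer already holds valid utf8. It shows a cached codepoint-to-byte offset that keeps forward sequential indexing cheap, and an nreverse done directly on the bytes (reverse each multi-byte sequence, then reverse the whole buffer) rather than through aref and aset.

```c
#include <stddef.h>

/* Hypothetical string type: utf8 bytes plus a one-entry cache that maps
   the last referenced codepoint index to its byte offset. */
typedef struct {
    unsigned char *data;  /* assumed to be valid utf8 */
    size_t nbytes;        /* length in bytes */
    size_t cache_char;    /* codepoint index of the cached position */
    size_t cache_byte;    /* byte offset of that codepoint */
} ustr;

/* Length in bytes of the sequence whose lead byte is b.
   Assumes valid utf8, so b is never a continuation byte. */
static size_t seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Byte offset of codepoint i. Scans forward from the cached position when
   i is at or past it, otherwise restarts from the beginning. An index past
   the end yields nbytes. */
static size_t ustr_offset(ustr *s, size_t i)
{
    size_t c = s->cache_char, b = s->cache_byte;
    if (i < c) { c = 0; b = 0; }
    while (c < i && b < s->nbytes) {
        b += seq_len(s->data[b]);
        c++;
    }
    s->cache_char = c;
    s->cache_byte = b;
    return b;
}

/* Reverse n bytes in place. */
static void reverse_bytes(unsigned char *p, size_t n)
{
    if (n < 2) return;
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        unsigned char t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}

/* Reverse the codepoints in place without any reallocation: first reverse
   the bytes inside each sequence, then reverse the whole buffer, which
   puts every sequence back in its original byte order. */
static void ustr_nreverse(ustr *s)
{
    for (size_t b = 0; b < s->nbytes; ) {
        size_t n = seq_len(s->data[b]);
        if (n > s->nbytes - b) n = s->nbytes - b;  /* guard a truncated tail */
        reverse_bytes(s->data + b, n);
        b += n;
    }
    reverse_bytes(s->data, s->nbytes);
    s->cache_char = 0;  /* cached offset is stale after the reversal */
    s->cache_byte = 0;
}
```

For the six bytes 61 C3 A9 E2 82 AC ("a", U+00E9, U+20AC), ustr_offset returns 0, 1 and 3 for indices 0, 1 and 2, and ustr_nreverse leaves E2 82 AC C3 A9 61 in the same buffer. A backward scan from the cache (skipping continuation bytes) would avoid the restart from zero, but is left out here for brevity.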

Thoughts, advice and experiences most appreciated.

Take care,
-- 
Camm Maguire			camm@maguirefamily.org
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah