From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Camm Maguire <camm@maguirefamily.org>
Newsgroups: gmane.lisp.gcl.devel,gmane.emacs.devel
Subject: Re: utf8 and emacs text/string multibyte representation
Date: Fri, 31 Oct 2014 14:05:20 -0400
Message-ID: <87h9ykazdr.fsf@maguirefamily.org>
References: <jwvioj39hx0.fsf-monnier+emacs@gnu.org> <87wq7jxc7d.fsf@gnu.org>
	<87zjcfx985.fsf_-_@maguirefamily.org> <83mw8f0w08.fsf@gnu.org>
	<87oasu3m72.fsf@maguirefamily.org> <83bnou26is.fsf@gnu.org>
	<87bnotwsqn.fsf@maguirefamily.org> <83y4rxzgmm.fsf@gnu.org>
	<87lhnxo73l.fsf@maguirefamily.org> <83wq7hzf9t.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: ger.gmane.org 1414785215 21896 80.91.229.3 (31 Oct 2014 19:53:35 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 31 Oct 2014 19:53:35 +0000 (UTC)
Cc: gcl-devel@gnu.org, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: gcl-devel-bounces+gnu-gcl-devel=m.gmane.org@gnu.org Fri Oct 31 20:53:29 2014
Return-path: <gcl-devel-bounces+gnu-gcl-devel=m.gmane.org@gnu.org>
Envelope-to: gnu-gcl-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by plane.gmane.org with esmtp (Exim 4.69)
	(envelope-from <gcl-devel-bounces+gnu-gcl-devel=m.gmane.org@gnu.org>)
	id 1XkIGG-00047P-Uy
	for gnu-gcl-devel@m.gmane.org; Fri, 31 Oct 2014 20:53:29 +0100
Original-Received: from localhost ([::1]:42176 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <gcl-devel-bounces+gnu-gcl-devel=m.gmane.org@gnu.org>)
	id 1XkIGG-0008P1-Hv
	for gnu-gcl-devel@m.gmane.org; Fri, 31 Oct 2014 15:53:28 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:32798)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <camm@maguirefamily.org>) id 1XkGaf-0005Hh-Ix
	for gcl-devel@gnu.org; Fri, 31 Oct 2014 14:06:30 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <camm@maguirefamily.org>) id 1XkGaa-0005kk-Bd
	for gcl-devel@gnu.org; Fri, 31 Oct 2014 14:06:25 -0400
Original-Received: from vms173019pub.verizon.net ([206.46.173.19]:34164)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <camm@maguirefamily.org>)
	id 1XkGaI-0005gy-7q; Fri, 31 Oct 2014 14:06:02 -0400
Original-Received: from localhost.m.enhanced.com ([173.61.191.70])
	by vms173019.mailsrvcs.net
	(Oracle Communications Messaging Server 7.0.5.32.0 64bit (built Jul 16
	2014))
	with ESMTPA id <0NEB00FKCM901R70@vms173019.mailsrvcs.net>; Fri,
	31 Oct 2014 13:05:43 -0500 (CDT)
X-CMAE-Score: 0
X-CMAE-Analysis: v=2.1 cv=GLe/yVJP c=1 sm=1 tr=0	a=/u9AJkq9Lu4W7WiJwJyTEw==:117
	a=1r3tstjE1_UA:10 a=LdTvEE7h3esA:10	a=kj9zAlcOel0A:10 a=9N09Ue-cAAAA:8
	a=85uBIQG4AAAA:8 a=oR5dmqMzAAAA:8	a=-9mUelKeXuEA:10 a=mDV3o1hIAAAA:8
	a=ZAGHukeXuiKxStLX9qkA:9	a=PzxVPkERg0I16-ZP:21 a=ZFWHTWuafX_3CQRF:21
	a=CjuIK1q_8ugA:10
Original-Received: from camm by localhost.m.enhanced.com with local (Exim 4.80)
	(envelope-from <camm@maguirefamily.org>)	id 1XkGZc-0000h5-BD; Fri,
	31 Oct 2014 14:05:20 -0400
In-reply-to: <83wq7hzf9t.fsf@gnu.org>
	(Eli Zaretskii's message of "Thu, 30 Oct	2014 18:35:58 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux)
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-Received-From: 206.46.173.19
X-BeenThere: gcl-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: <gcl-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/gcl-devel>,
	<mailto:gcl-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/gcl-devel>
List-Post: <mailto:gcl-devel@gnu.org>
List-Help: <mailto:gcl-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/gcl-devel>,
	<mailto:gcl-devel-request@gnu.org?subject=subscribe>
Errors-To: gcl-devel-bounces+gnu-gcl-devel=m.gmane.org@gnu.org
Original-Sender: gcl-devel-bounces+gnu-gcl-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.lisp.gcl.devel:8800 gmane.emacs.devel:176148
Archived-At: <http://permalink.gmane.org/gmane.lisp.gcl.devel/8800>

Thanks so much!

I have been discussing this elsewhere, and it's come to my attention
that not only do not all Unicode code points fit into a single 16-bit
UTF-16 code unit, but not all Unicode characters fit into a single code
point :-).  Presumably this is why Emacs expanded to 22 bits?  In any
case, it bears out what one correspondent said: Unicode must be
processed sequentially, so there is no real reason to struggle for O(1)
random access to Unicode characters.
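
To make the two mismatches concrete, here is a minimal Python sketch
(Python only because it is easy to run; nothing here is Emacs or GCL
code, and the helper name utf16_units is made up for illustration).  It
shows one code point that needs two UTF-16 code units, and one
user-visible character that needs two code points even after
normalization:

```python
import unicodedata

def utf16_units(s: str) -> int:
    # Hypothetical helper: count 16-bit UTF-16 code units.
    # "utf-16-le" emits no byte-order mark, so bytes // 2 is the unit count.
    return len(s.encode("utf-16-le")) // 2

emoji = "\U0001F600"    # a single code point above U+FFFF
q_tilde = "q\u0303"     # 'q' + COMBINING TILDE: one visible character

# One code point, but it needs a surrogate pair in UTF-16.
print(len(emoji), utf16_units(emoji))

# NFC cannot merge this pair, since no precomposed q-with-tilde exists,
# so the "character" still spans two code points.
composed = unicodedata.normalize("NFC", q_tilde)
print(len(composed))
```

So even a fixed-width array of code points does not give O(1) access to
what a user would call a character.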

If this is indeed the case, all these encodings have the same problem,
varying only in degree, and UTF-8 is clearly the most compact and the
most ASCII-compatible.  The question then arises as to whether Lisp
characters, which by definition do offer random access in strings, need
be the same as or close to Unicode characters.
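
A rough way to check the size and compatibility point, again as a
Python sketch with a made-up helper name (encoded_sizes).  Note the
assumption: the comparison favours UTF-8 for ASCII-heavy text such as
source code; for some scripts, CJK for instance, UTF-16 takes fewer
bytes per character than UTF-8.

```python
def encoded_sizes(s: str) -> dict:
    # Hypothetical helper: byte length of s in each encoding, without a BOM.
    return {enc: len(s.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

ascii_text = "(defun f (x) x)"
cjk_text = "\u4e2d"     # one CJK ideograph

ascii_sizes = encoded_sizes(ascii_text)
cjk_sizes = encoded_sizes(cjk_text)

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

print(ascii_sizes)
print(cjk_sizes)
print(same_bytes)
```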

Did you consider leaving aref, char-code and code-char alone and writing
Unicode functions on top of these (so that, e.g., unicode-length !=
length), as opposed to making aref itself do this translation under the
hood?  The latter violates the expectation of O(1) access, which other
kinds of arrays certainly offer, though it is questionable whether real
users actually expect it for strings.  With the layered approach, one
would know that aref is random-access and unicode-??? is sequential
only.
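
A minimal sketch of that layering, in Python rather than Lisp so that
it is easy to try.  The assumptions are mine: the string is stored as
valid UTF-8 bytes, plain indexing plays the role of aref, and
unicode_length and unicode_ref are hypothetical names for the
sequential functions written on top.

```python
def unicode_length(buf: bytes) -> int:
    # Count code points: every byte that is not a continuation byte
    # (bit pattern 10xxxxxx) starts a new one.  O(n) scan.
    return sum(1 for b in buf if (b & 0xC0) != 0x80)

def unicode_ref(buf: bytes, n: int) -> str:
    # Return the n-th code point by walking from the start.  O(n).
    if n < 0:
        raise IndexError(n)
    i = 0
    for _ in range(n):
        i += 1
        # Skip the continuation bytes of the code point just passed.
        while i < len(buf) and (buf[i] & 0xC0) == 0x80:
            i += 1
    if i >= len(buf):
        raise IndexError(n)
    j = i + 1
    while j < len(buf) and (buf[j] & 0xC0) == 0x80:
        j += 1
    return buf[i:j].decode("utf-8")

buf = "a\u00e9\U0001F600".encode("utf-8")   # 1 + 2 + 4 bytes

print(len(buf), unicode_length(buf))   # byte length vs. code-point length
print(buf[0])                          # "aref": O(1), returns a byte
print(unicode_ref(buf, 2))             # sequential access to a code point
```

Here length and aref keep their array semantics, and only the unicode-
functions pay the sequential cost.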

Take care,

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Camm Maguire <camm@maguirefamily.org>
>> Cc: emacs-devel@gnu.org,  gcl-devel@gnu.org
>> Date: Thu, 30 Oct 2014 12:27:58 -0400
>> 
>> > I'm not sure what you mean by a "boxed character".  A character in
>> > Emacs is just an int.
>> >
>> 
>> Then how do you distinguish integers from characters at the lisp level?
>
> We don't -- except that a valid character's value must fit the Unicode
> range.
>
> There's no character data type in Emacs.  (XEmacs does have it.)
>
>
>
>

-- 
Camm Maguire			     		    camm@maguirefamily.org
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah