From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Thomas Lord <lord@emf.net>
Newsgroups: gmane.emacs.devel
Subject: Re: Release plans
Date: Wed, 27 Aug 2008 18:10:41 -0700
Message-ID: <48B5FB11.6090708@emf.net>
References: <878wusz0v9.fsf@uwakimon.sk.tsukuba.ac.jp>	<E1KVrn3-000497-Rp@fencepost.gnu.org>	<87vdxp27z6.fsf@uwakimon.sk.tsukuba.ac.jp>	<87prnxe5hc.fsf@rattlesnake.com>	<873aktck5d.fsf@uwakimon.sk.tsukuba.ac.jp>	<87k5e5dsvq.fsf@rattlesnake.com>	<E1KY1Xc-0002bG-4j@fencepost.gnu.org>
	<48B44802.1080302@emf.net>	<E1KYQ9q-0000mH-1o@fencepost.gnu.org>
	<48B5D5EF.2030501@emf.net> <20080827220906.GB5374@muc.de>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: ger.gmane.org 1219882815 29910 80.91.229.12 (28 Aug 2008 00:20:15 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 28 Aug 2008 00:20:15 +0000 (UTC)
Cc: bob@rattlesnake.com, rms@gnu.org, emacs-devel@gnu.org
To: Alan Mackenzie <acm@muc.de>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Aug 28 02:21:08 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KYVG4-0003hi-OT
	for ged-emacs-devel@m.gmane.org; Thu, 28 Aug 2008 02:21:05 +0200
Original-Received: from localhost ([127.0.0.1]:41240 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KYVF6-0004KX-ED
	for ged-emacs-devel@m.gmane.org; Wed, 27 Aug 2008 20:20:04 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KYVF1-0004I2-NX
	for emacs-devel@gnu.org; Wed, 27 Aug 2008 20:19:59 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KYVF0-0004Ge-Pf
	for emacs-devel@gnu.org; Wed, 27 Aug 2008 20:19:59 -0400
Original-Received: from [199.232.76.173] (port=58059 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KYVF0-0004G9-F0
	for emacs-devel@gnu.org; Wed, 27 Aug 2008 20:19:58 -0400
Original-Received: from mail.42inc.com ([205.149.0.25]:45236)
	by monty-python.gnu.org with esmtps (SSL 3.0:RSA_3DES_EDE_CBC_SHA1:24)
	(Exim 4.60) (envelope-from <lord@emf.net>)
	id 1KYVEv-0000qn-Bs; Wed, 27 Aug 2008 20:19:53 -0400
X-TFF-CGPSA-Version: 1.5
X-TFF-CGPSA-Filter-42inc: Scanned
X-42-Virus-Scanned: by 42 Antivirus -- Found to be clean.
Original-Received: from [69.236.75.128] (account lord@emf.net HELO [192.168.1.64])
	by mail.42inc.com (CommuniGate Pro SMTP 5.0.13)
	with ESMTPA id 37799841; Wed, 27 Aug 2008 17:19:46 -0700
User-Agent: Thunderbird 1.5.0.5 (X11/20060808)
In-Reply-To: <20080827220906.GB5374@muc.de>
X-detected-kernel: by monty-python.gnu.org: Linux 2.6, seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:103061
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/103061>

Alan Mackenzie wrote:
> If you were to fix a bug, or start implementing a cool new feature - you
> know, make some sort of contribution, no matter how small - it would be
> really appreciated.
>   


I'm working on a new implementation of strings and buffers,
including support for undo, markers, properties, and overlays.
That's going well.

If it finishes well, then I might make a new implementation of
interact loops and redisplay primitives, add a dynamic loader,
and call it a day.   (After which I might work on a dynamically
loadable extension language interpreter.)

There is a fork in the road coming up, though.  I started on the
strings and buffers project for a different purpose, originally:  I
want them for an extended XML parser with SAX and DOM
models.   After I get strings and buffers working I'll have to consider
what to work on next out of the set: XML foo, Emacs interact loop,
or Emacs-style redisplay.   I'm not sure which I'll pick.

The new strings and buffers implementation is inspired by
several needs including but not limited to:

1) to have such as a stand-alone C library
2) to be useful for a lisp implementation without requiring all
    the baggage of a lisp implementation
3) to fix some performance problems that gap buffers have
4) to handle unicode very cleanly
5) to make string-handling C code easier to get right

The string representation is based on splay trees: a string
is a splay tree of gap buffers.   There are various invariants
(that I won't get into here) that make these trees more than
ordinary splay trees -- they are a new data structure (afaict).

The string representation is (semantically) functional: strings
are never modified.     Instead, strings share state (i.e., share
sub-trees) and linear styles of programming are rewarded (e.g.,
sometimes new strings are constructed by cheaply modifying
existing strings *if* there are no other references left to the unmodified
string).

Because strings are functional and share state, "undo" is pretty
trivial.   To snapshot the state of a buffer at some instant,
just make a copy of the string it contains.   That copy
will share state (so it is space efficient and cheap to create). 
That copy won't change (since the string representation is
functional).

The splay tree of gap buffers representation means that the
amortized cost of shifting characters from one side of the
conceptual gap to another (moving the point) is O(log n) (compared
to gap buffers which are O(n)).  At the
same time, the space inefficiency of the tree and time inefficiency
of scanning it are no worse than O(n) (in the length of the string) --
the same as gap buffers.   So the new data structure is at least
asymptotically faster.

Another interesting advantage of this string/buffer representation
is that it is very good at representing *sparse* buffers -- buffers that
are mostly just a string of a single character repeated, only in a few
places interrupted with others.    As a consequence, these new 
strings/buffers
can be used for non-traditional purposes, such as huge sparse bitsets, or
large hash tables.   (A buffer as a hash table?    Roughly speaking: compute
an N-bit hash value, then go to that line of the buffer -- that line of 
the buffer
contains the corresponding (possibly empty) hash bucket.   This 
splay-of-gap-buffers
thing can handle that quite efficiently -- O (log N) for the "goto step" 
with amortized
cost much lower if access patterns display locality.  It's not really 
*quite* that
simple for a fast hash table but it's close.)


This kind of project is very much Emacs related and, also, is slower
than just looking up a minor bug with a quick fix.  I like to get some 
orientation
to the terrain along the way, hence the bursts of verbosity (which, really,
aren't *that* much in the overall ebb and flow).

Oh, the meat of the new string data structure looks like it will weigh in
far under 5K lines of code -- the whole buffer support perhaps doubling it.
It's shaping up as pretty tight code that is pretty and simple to follow 
(another
of the goals).   The main logic for editing primitives and splaying look to
come in under 1K lines of code -- maybe 2K.

Thinking of the possible follow-on projects of interact loop, redisplay, and
dynamic loader:

Interact loops are easy enough and I can certainly make a "stackless"
version (so, no more "recursive minibuffers" hair).

Redisplay is the intimidating one, still, because it just looks hopelessly
tedious.   Conceptually easy, sure, but tedious and requiring too much code.
I'm still thinking about that part.

The combination of the buffers, interact loop, redisplay and loader would
be "an Emacs", stripped of lisp, but ready to dynamically load a lisp
(or any other language) interpreter.

Since you asked so nicely,
-t