From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Thomas Lord Newsgroups: gmane.emacs.devel Subject: Re: Release plans Date: Wed, 27 Aug 2008 18:10:41 -0700 Message-ID: <48B5FB11.6090708@emf.net> References: <878wusz0v9.fsf@uwakimon.sk.tsukuba.ac.jp> <87vdxp27z6.fsf@uwakimon.sk.tsukuba.ac.jp> <87prnxe5hc.fsf@rattlesnake.com> <873aktck5d.fsf@uwakimon.sk.tsukuba.ac.jp> <87k5e5dsvq.fsf@rattlesnake.com> <48B44802.1080302@emf.net> <48B5D5EF.2030501@emf.net> <20080827220906.GB5374@muc.de> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Trace: ger.gmane.org 1219882815 29910 80.91.229.12 (28 Aug 2008 00:20:15 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Thu, 28 Aug 2008 00:20:15 +0000 (UTC) Cc: bob@rattlesnake.com, rms@gnu.org, emacs-devel@gnu.org To: Alan Mackenzie Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Aug 28 02:21:08 2008 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1KYVG4-0003hi-OT for ged-emacs-devel@m.gmane.org; Thu, 28 Aug 2008 02:21:05 +0200 Original-Received: from localhost ([127.0.0.1]:41240 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1KYVF6-0004KX-ED for ged-emacs-devel@m.gmane.org; Wed, 27 Aug 2008 20:20:04 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1KYVF1-0004I2-NX for emacs-devel@gnu.org; Wed, 27 Aug 2008 20:19:59 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1KYVF0-0004Ge-Pf for emacs-devel@gnu.org; Wed, 27 Aug 2008 20:19:59 -0400 Original-Received: from [199.232.76.173] (port=58059 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1KYVF0-0004G9-F0 for emacs-devel@gnu.org; Wed, 27 Aug 2008 20:19:58 -0400 Original-Received: from mail.42inc.com ([205.149.0.25]:45236) by monty-python.gnu.org with esmtps (SSL 3.0:RSA_3DES_EDE_CBC_SHA1:24) (Exim 4.60) (envelope-from ) id 1KYVEv-0000qn-Bs; Wed, 27 Aug 2008 20:19:53 -0400 X-TFF-CGPSA-Version: 1.5 X-TFF-CGPSA-Filter-42inc: Scanned X-42-Virus-Scanned: by 42 Antivirus -- Found to be clean. Original-Received: from [69.236.75.128] (account lord@emf.net HELO [192.168.1.64]) by mail.42inc.com (CommuniGate Pro SMTP 5.0.13) with ESMTPA id 37799841; Wed, 27 Aug 2008 17:19:46 -0700 User-Agent: Thunderbird 1.5.0.5 (X11/20060808) In-Reply-To: <20080827220906.GB5374@muc.de> X-detected-kernel: by monty-python.gnu.org: Linux 2.6, seldom 2.4 (older, 4) X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:103061 Archived-At: Alan Mackenzie wrote: > If you were to fix a bug, or start implementing a cool new feature - you > know, make some sort of contribution, no matter how small - it would be > really appreciated. > I'm working on a new implementation of strings and buffers, including support for undo, markers, properties, and overlays. That's going well. If it finishes well, then I might make a new implementation of interact loops and redisplay primitives, add a dynamic loader, and call it a day. (After which I might work on a dynamically loadable extension language interpreter.) There is a fork in the road coming up, though. I started on the strings and buffers project for a different purpose, originally: I want them for an extended XML parser with SAX and DOM models. After I get strings and buffers working I'll have to consider what to work on next out of the set: XML foo, Emacs interact loop, or Emacs-style redisplay. I'm not sure which I'll pick. The new strings and buffers implementation is inspired by several needs including but not limited to: 1) to have such as a stand-alone C library 2) to be useful for a lisp implementation without requiring all the baggage of a lisp implementation 3) to fix some performance problems that gap buffers have 4) to handle unicode very cleanly 5) to make string-handling C code easier to get right The string representation is based on splay trees: a string is a splay tree of gap buffers. There are various invariants (that I won't get into here) that make these trees more than ordinary splay trees -- they are a new data structure (afaict). The string representation is (semantically) functional: strings are never modified. Instead, strings share state (i.e., share sub-trees) and linear styles of programming are rewarded (e.g., sometimes new strings are constructed by cheaply modifying existing strings *if* there are no other references left to the unmodified string). Because strings are functional and share state, "undo" is pretty trivial. To snapshot the state of a buffer at some instant, just make a copy of the string it contains. That copy will share state (so it is space efficient and cheap to create). That copy won't change (since the string representation is functional). The splay tree of gap buffers representation means that the amortized cost of shifting characters from one side of the conceptual gap to another (moving the point) is O(log n) (compared to gap buffers which are O(n)). At the same time, the space inefficiency of the tree and time inefficiency of scanning it are no worse than O(n) (in the length of the string) -- the same as gap buffers. So the new data structure is at least asymptotically faster. Another interesting advantage of this string/buffer representation is that it is very good at representing *sparse* buffers -- buffers that are mostly just a string of a single character repeated, only in a few places interrupted with others. As a consequence, these new strings/buffers can be used for non-traditional purposes, such as huge sparse bitsets, or large hash tables. (A buffer as a hash table? Roughly speaking: compute an N-bit hash value, then go to that line of the buffer -- that line of the buffer contains the corresponding (possibly empty) hash bucket. This splay-of-gap-buffers thing can handle that quite efficiently -- O (log N) for the "goto step" with amortized cost much lower if access patterns display locality. It's not really *quite* that simple for a fast hash table but it's close.) This kind of project is very much Emacs related and, also, is slower than just looking up a minor bug with a quick fix. I like to get some orientation to the terrain along the way, hence the bursts of verbosity (which, really, aren't *that* much in the overall ebb and flow). Oh, the meat of the new string data structure looks like it will weigh in far under 5K lines of code -- the whole buffer support perhaps doubling it. It's shaping up as pretty tight code that is pretty and simple to follow (another of the goals). The main logic for editing primitives and splaying look to come in under 1K lines of code -- maybe 2K. Thinking of the possible follow-on projects of interact loop, redisplay, and dynamic loader: Interact loops are easy enough and I can certainly make a "stackless" version (so, no more "recursive minibuffers" hair). Redisplay is the intimidating one, still, because it just looks hopelessly tedious. Conceptually easy, sure, but tedious and requiring too much code. I'm still thinking about that part. The combination of the buffers, interact loop, redisplay and loader would be "an Emacs", stripped of lisp, but ready to dynamically load a lisp (or any other language) interpreter. Since you asked so nicely, -t