From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Alin Soare Newsgroups: gmane.emacs.devel Subject: Re: line buffer as Red Black Trees instead of linear Date: Fri, 16 May 2014 11:53:25 +0300 Message-ID: References: <83mwein6w1.fsf@gnu.org> <83fvkamg9v.fsf@gnu.org> <837g5mm8ke.fsf@gnu.org> NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=047d7b2e12dbf0e39704f9808bc5 X-Trace: ger.gmane.org 1400230421 17693 80.91.229.3 (16 May 2014 08:53:41 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 16 May 2014 08:53:41 +0000 (UTC) Cc: Stefan Monnier , emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri May 16 10:53:36 2014 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WlDtX-0000r9-GN for ged-emacs-devel@m.gmane.org; Fri, 16 May 2014 10:53:35 +0200 Original-Received: from localhost ([::1]:34169 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WlDtX-0004tg-77 for ged-emacs-devel@m.gmane.org; Fri, 16 May 2014 04:53:35 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:50108) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WlDtS-0004tP-JX for emacs-devel@gnu.org; Fri, 16 May 2014 04:53:32 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WlDtQ-000327-Pt for emacs-devel@gnu.org; Fri, 16 May 2014 04:53:30 -0400 Original-Received: from mail-ie0-x22b.google.com ([2607:f8b0:4001:c03::22b]:44770) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WlDtO-00031i-KW; Fri, 16 May 2014 04:53:26 -0400 Original-Received: by mail-ie0-f171.google.com with SMTP id rl12so2222325iec.16 for ; Fri, 16 May 2014 01:53:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=TAZYCkitWAO3HaOcoEIbePWJo4Cz5bHt6L644tC0CYk=; b=a39UaEB3JfshHuem1BCj0EElHsRzxlme3TxU9G0hH64nQ1CQ/GzfaJImWvA2KUjwm/ 8Uc5YlBY4ds4Dc7a5u7uuQPbU5FBw1Y7IGHH73ef+76qzq3jBz+73cTFisvEmSIIWDIE HlCRjClTVw/8UbchUXLexlE5BrUd+bJ/Wjs+cB/jRx9PeMzXsaHWG8S9F0D4dfRVI3Pu KeywfLfrpuVggdtUU5wHzWcRNL0p20uDMpfxj3AkVBWkhExjhASMz6nV0JnMeCEnR5PP OsZODyGqWV4uF7dRkIL6aaaWdF/b39weYyk0wcluIDEkVYekLakYmYOshIx1Y7jlEC81 0W7g== X-Received: by 10.50.131.138 with SMTP id om10mr51878255igb.25.1400230405791; Fri, 16 May 2014 01:53:25 -0700 (PDT) Original-Received: by 10.43.142.70 with HTTP; Fri, 16 May 2014 01:53:25 -0700 (PDT) In-Reply-To: <837g5mm8ke.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2607:f8b0:4001:c03::22b X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:171874 Archived-At: --047d7b2e12dbf0e39704f9808bc5 Content-Type: text/plain; charset=UTF-8 > > The values that need to be precomputed are the metrics of each > "display element" (character glyph, stretch glyph, image glyph, etc.) > that will (or would) be displayed when/if the line is shown on the > screen. Your suggestion calls for precomputing those in advance, > while the current implementation computes them only when they are > actually needed. I very much doubt that your suggestion, if > implemented, will be a net win. > > There are many possibilities. In the moment when we need to access them (lazy evaluation) or when we modify them, one can split a node in many nodes, such that all the block of text at a given node will have the same properties, so one can compute any property quickly. In the worse case, when each character of a buffer has a distinct property than its neighbours, a buffer will have a rbt with 1 charater on each node, in which case the timing will be log(length(buffer->text)) for all operations. > It might make sense to cache the values, once they are computed, so > that any subsequent routines that need to access the same portions of > buffer text won't need to recompute the same metrics. But before this > kind of caching is attempted, Someone(TM) should demonstrate that the > same portions of the long lines are traversed by the display engine > more than once, because otherwise what you suggest will be pure > overhead with no hope of any gains. > One can modify accordingly the rbt at the first transversal, or at modification, and at a 2nd nothing more to do, apart from reading the result in log(N). > > > The solution of this problem is the red-black trees, by inserting at each > > node all the pre-computed characteristics of the block of text kept at > the > > given node. Any modification will pre-compute a finite number of nodes, > and > > when need to access them , only a logarithmic # of nodes will be taken in > > consideration for all possible operations. > > "Premature optimization is the root of all evil." (Donald Knuth) > > I would suggest to study the relevant code and, more importantly, time > it to find the _real_ (as opposed to imaginary) hot spots. Then > present that data, and we could thereafter discuss whether your > suggested solution might indeed speed up the code which takes up the > lion's share of the time when Emacs needs to display very long lines. > > I know the theory, but I wanted to report the bug, and provide the solution. I thought that those who have fresh in mind the place in emacs where it slows will say exactly why it slows, and indeed they answered. > If indeed these techniques could help, patches will be very welcome, > as this is an annoying problem, and there's a bug report for it (see > http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13675). > This technique can help for sure, but I will not have time to implement it now. I did not know about bug reports, but this problem is annoying for me , as I am using emacs all the time. I will solve it myself, if nobody solves it before. --047d7b2e12dbf0e39704f9808bc5 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
=C2=A0

The values that need to be precomputed are the metrics of each
"display element" (character glyph, stretch glyph, image glyph, e= tc.)
that will (or would) be displayed when/if the line is shown on the
screen. =C2=A0Your suggestion calls for precomputing those in advance,
while the current implementation computes them only when they are
actually needed. =C2=A0I very much doubt that your suggestion, if
implemented, will be a net win.


There are many possibilities.

In the moment when we need to access them (lazy evaluation= ) or when we modify them, one can split a node in many nodes, such that all= the block of text at a given node will have the same properties, so one ca= n compute any property quickly.


In the worse case, when each character o= f a buffer has a distinct property than its neighbours, a buffer will have = a rbt with 1 charater on each node, in which case the timing will be log(le= ngth(buffer->text)) for all operations.


=C2=A0
It might make sense to cache the values, once they are computed, so
that any subsequent routines that need to access the same portions of
buffer text won't need to recompute the same metrics. =C2=A0But before = this
kind of caching is attempted, Someone(TM) should demonstrate that the
same portions of the long lines are traversed by the display engine
more than once, because otherwise what you suggest will be pure
overhead with no hope of any gains.

One= can modify accordingly the rbt at the first transversal, or at modificatio= n, and at a 2nd nothing more to do, apart from reading the result in log(N)= .

=C2=A0

> The solution of this problem is the red-black trees, by inserting at e= ach
> node all the pre-computed characteristics of the block of text kept at= the
> given node. Any modification will pre-compute a finite number of nodes= , and
> when need to access them , only a logarithmic # of nodes will be taken= in
> consideration for all possible operations.

"Premature optimization is the root of all evil." =C2=A0(Do= nald Knuth)

I would suggest to study the relevant code and, more importantly, time
it to find the _real_ (as opposed to imaginary) hot spots. =C2=A0Then
present that data, and we could thereafter discuss whether your
suggested solution might indeed speed up the code which takes up the
lion's share of the time when Emacs needs to display very long lines.

I know the theory, but I wanted to rep= ort the bug, and provide the solution. I thought that those who have fresh = in mind the place in emacs where it slows will say exactly why it slows, an= d indeed they answered.

=C2=A0
If indeed these techniques could help, patches will be very welcome,
as this is an annoying problem, and there's a bug report for it (see http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D13675).

This technique can = help for sure, but I will not have time to implement it now. I did not know= about bug reports, but this problem is annoying for me , as I am using ema= cs all the time.

I will solv= e it myself, if nobody solves it before.

--047d7b2e12dbf0e39704f9808bc5--