From mboxrd@z Thu Jan 1 00:00:00 1970
From: Beni Cherniavsky
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Mixed L2R and R2L paragraphs and horizontal scroll
Date: Thu, 11 Feb 2010 23:40:03 +0200
Message-ID: <30fb12601002111340m26c80bcfi69906ac90d887684@mail.gmail.com>
References: <83tyu3iu6b.fsf@gnu.org> <83vdeghfqg.fsf@gnu.org>
 <201002012205.o11M5Sci011809@beta.mvs.co.il> <83k4uvh09o.fsf@gnu.org>
 <201002031310.o13DAqXd019253@beta.mvs.co.il>
 <40314.130.55.118.19.1265230948.squirrel@webmail.lanl.gov>
 <201002041621.o14GL6w5006928@beta.mvs.co.il> <833a1ghjrj.fsf@gnu.org>
 <83tytwf1tp.fsf@gnu.org>
In-Reply-To: <83tytwf1tp.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-bidi@gnu.org, Stefan Monnier, emacs-devel@gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
List-Id: "Discussion of Emacs support for multi-directional text."
Xref: news.gmane.org gmane.emacs.bidi:553 gmane.emacs.devel:121064

On Thu, Feb 4, 2010 at 18:21, Ehud Karni wrote:
> I wish more users who uses Hebrew routinely will take part in this
> discussion.
>
That'd be my cue ;-).  Hi.  New here, sufficiently retro-lurked.

[Sorry, long mail.  In the first half I'm whining about why I don't
like Eli's solution; but I also reply with technical ideas below...]

First, I want to draw attention to the distinction between line
wrapping and truncation/scrolling.

- Line wrapping (aka continuation lines) in visual order is bad!
  It violates the deep axiom that reading order between *lines* is
  downward regardless of bidi.

- Truncation in visual order is OK!
  It fits the rigid scrolling model of a small window horizontally
  moving over a wide page.
  This is the only model of truncation that I've seen in other
  programs.  You either have the "Web" model where lines wrap at the
  window boundary and there is no truncation, or the "Page" model
  where lines are laid out onto an underlying page, and you see a
  physical window onto it.
  (It does require a lot of horizontal scrolling to read mixed
  direction text, but that's the user's problem.)

- Truncation in logical order might(?) be OK if coupled with
  logical-order "mirrored" scrolling.
  I've never seen such a program, so I don't know if it would be
  usable.  I believe we can easily try it out by running a plain L2R
  emacs in a bidi terminal, e.g. mlterm.  I'll try to work with that
  a bit to see how it feels...

In all the following, I'm only talking about wrapped continuation
lines.  I got the impression Ehud is also mostly concerned about
continuation lines - correct me if I'm wrong.

Second, allow me to sum up Ehud's arguments.

- It's the Right Thing to do.
  Books, papers, and correct software have always done it this way,
  and that's what the Unicode standard says.

- Convenience: Doing it the wrong way requires discontinuous reading,
  which is annoying.

and add 2 more angles to the issue:

- Mental model, or why imperfect bidi is painful:
  As an R2L user, I constantly maintain a mental model of the logical
  order.  I've got some deep habits and assumptions about the mapping
  from logical to visual.
  *Any* deviation will completely confuse my poor brain about the
  logical order of the buffer.
  Worse yet, if I now proceed to *edit* the buffer, I'll modify it in
  completely wrong places, and even when I realize that, fixing it
  will be even harder!  I'll need to *simultaneously* reverse-engineer
  your deviant bidi algorithm and figure out the real logical order,
  and then very carefully fix my edits, all the time getting strangely
  permuted feedback for my actions.  This involves concentration, a
  lot of forward/backward-char movement to visualize the logical
  order, and expletives under my breath :-(
  This is the *real* reason we hate broken bidi support.  No bidi at
  all is frequently better - ain't pretty but at least has a 1:1
  mental model.

- Emotional: this kind of broken bottom-up line wrapping is precisely
  the problem with visual-order Hebrew.  Reduce the browser window
  width on any visual-order site, and you'll see it.
  We (Hebrew readers) had to live with it until logical-order support
  arrived in browsers, have cursed too many sites that use
  visual-order to this very day, and by now we hate it with a burning
  passion!
  It's not your fault, but getting line wrapping wrong will step on
  very sore spots with many users...

To be fair, we're talking about rare situations where embedded text is
broken across lines.  But note that a wrong base direction can inflict
this on whole paragraphs (more on that below).

On Fri, Feb 5, 2010 at 11:50, Eli Zaretskii wrote:
> Like Ehud, I think that it would be swell to have what he wants.
> But, possibly unlike Ehud, I think that what I have now it not a
> disaster, and we can live with it for the time being, maybe even
> longer.
>
> The reasons for my decision to implement truncation and continuation
> as I did are:
>
>  . It is the only reasonable way to go that does not call for a very
>    serious surgery, perhaps even a total rewrite, of the display
>    engine code.
>
>  . I saw no other editor that supports truncation and behaves
>    otherwise.  (I don't know about any editors that support
>    continuation lines like Emacs does.)  See below.
>
Truncation is OK, but the issue is continuation.
I don't follow your claim about editors that support continuation -
all of these do, and behave otherwise (i.e. as Ehud wants): Notepad,
gedit, Firefox/WebKit, OpenOffice.

>  . The issue pops up only in relatively rare situations: mixed
>    L2R/R2L text that gets truncated/continued within a stretch of
>    text whose directionality is against the paragraph direction.
>
Indeed, embedded text tends to be short.
But I'm afraid it's bigger than you think, because if the base
direction of a paragraph is incorrect, *the whole paragraph* will wrap
in this broken bottom-up manner.
Since base direction guessing is never perfect, and users don't always
have the option - or patience - to fix it manually, this makes the
otherwise minor problem more visible.

Also, changing the base direction of any paragraph will behave funny:
Instead of (mostly) just jumping horizontally, it'll also reverse the
order of lines!

                                                            !of lines
Instead of (mostly) just jumping horizontally, it'll also reverse the
:Also, changing the base direction of any paragraph will behave funny

See?
[estimated, some punctuation might be off]

This also means that forcing all paragraphs to R2L or L2R base
direction (which would be a handy way to momentarily work around
imperfect guessing) would break line order in half the paragraphs in a
mixed buffer!

>> If it's just "difficult", then (just like rigid scrolling), it can be
>> kept as a known shortcoming.
>
> It is either VERY difficult or very slow.
>
> The current display code lays out glyphs in each ``glyph row'' one by
> one, in the visual order.  Thus, for the portion of text that is
> reversed from its logical order, the bidi reordering code effectively
> delivers the characters backwards to this glyph layout code, in the
> decreasing order of buffer positions.  That is, the glyphs assembled
> first are the last ones to be read.  Then you hit the window margin,
> and know that there isn't enough place for the whole line.  Only then
> you know how many characters will fit on this line.  But you know that
> in terms of the last portion of the text in the reading order, which
> tells you very little about how many characters at the beginning of
> this stretch of text you could display instead.  (Remember that Emacs
> supports variable size characters and different fonts on the same
> line, so just counting characters will not do.)
>
> What would be nice is to scan the text to be reversed in the logical
> order, and find the part of it that will fit on this screen line.
> Then we could reorder only that part.

Right.  Line breaking must be done in logical order.

> But to do that, we need to try
> every possibility by actually doing most of the display work behind
> the scenes, because of the complications with different font sizes,
> faces, composite characters and issues like ligatures and the like,
> which change the amount of screen estate taken by a portion of a line,
> even if you just juxtapose the same two characters.
>
Right, this is a known annoying property of bidi interacting with
typographic features.
Note however that you have a new trade-off here: if you could
compromise precision of line breaking to get correct bidi behaviour
(with fast redisplay), users would be happy.  See below for a concrete
attempt.

> With a newline marking the end of the line, it's easy: the bidi
> reordering ends at the newline, then restarts after it.

So if only the line breaking points were static, you'd have no
performance problem!
=> Could you maybe cache this information and recompute it only when
the line is edited?
I understand part of the whole point of your implementation was to
avoid any caching of bidi ordering; but caching of line breaking
points sounds much less intrusive...

[XEmacs already has a "Line Start Cache" according to its Internals
Manual.  I didn't find a similar overview for Emacs.  Is there
anything I can read to understand Emacs redisplay before I attempt to
approach the source?]

> By contrast,
> to support ``bidi-smart continuation'', we need to find the place
> where to break the line, and that is impossible without actually
> trying to display it.
>
> In the example below
>
>  word1 word2 WORD1 WORD2
>
> to be displayed as
>
>  word1 word2 2DROW 1DROW
>
> if the window is only wide enough to display
>
>  word1 word2  1DROW
>
> we need to try displaying in order
>
>  word1 word2 1
>  word1 word2 1D
>  word1 word2 1DR
>  word1 word2 1DRO
>  word1 word2 1DROW
>  word1 word2  1DROW
>  word1 word2 W 1DROW
>
> until we discover where we should stop.  (We could do a binary search,
> of course, but that's details.)  I don't think that's reasonable, and
> I have no idea what will this do to the redisplay speed.
>
Binary search is a big improvement!  In 10 attempts you can handle
lines of 1K chars, in 20 - 1M.  On my computer Emacs presently handles
100k smoothly, 1M already feels sluggish.
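
[Illustration, not Emacs code: a small Python sketch of where the
"10 attempts for 1K, 20 for 1M" figures come from.  The `fits`
callback is a hypothetical stand-in for one dry layout run of the
first k characters in logical order.  The sketch assumes fitting is
monotone, i.e. if k characters fit then any fewer also fit; ligatures
and shaping could violate that in practice.]

```python
from itertools import accumulate
from typing import Callable, Tuple


def longest_fitting_prefix(n_chars: int,
                           fits: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (k, probes): the largest k in 0..n_chars for which fits(k)
    holds, and how many times fits() was called to find it.

    Assumption: fits(0) is true and fits is monotone (see lead-in).
    """
    lo, hi = 0, n_chars
    probes = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so the loop always shrinks
        probes += 1
        if fits(mid):
            lo = mid                  # mid chars fit, try more
        else:
            hi = mid - 1              # mid chars overshoot, try fewer
    return lo, probes


# Usage with made-up per-character pixel widths (variable-width font).
widths = [7, 7, 4, 9, 7, 12, 7, 4, 7, 9]
prefix_px = [0] + list(accumulate(widths))   # prefix_px[k] = width of k chars
window_px = 40
k, probes = longest_fitting_prefix(len(widths),
                                   lambda n: prefix_px[n] <= window_px)
print(k, probes)
```

Each probe halves the range of candidate break points, so the number
of dry layout runs grows with the logarithm of the line length.
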
By crude (and probably wrong) computation, binary search would still
be fast enough up to 10K...

Also, I presume that the heavy part of a redisplay is normally the
actual output to screen (if not, why do such a complex job minimizing
it?).  This means that "dry" running the engine without actual output
10 times should result in much less than a 10x slowdown.

To top this, I think you can do several times better if you allow some
imprecision in line breaking of mixed-direction paragraphs.
Naturally, you must not overshoot the screen, but some undershooting
is OK.
So it seems to me that you could reasonably do it with a greedy
approach:

(1) Add characters in *logical order* as long as they fit.
(2) Try it in visual order to account for precise typographic stuff.
(3) As long as it doesn't fit, strip one char and retry (2).
(4) When OK, repeat with actual output display to the screen.

If (1) overestimates, you do extra iterations; if it underestimates,
you're left with a shorter line than ideal.  But I guess that it
normally won't be off by more than one character, so it will look OK
and run fast.

[One pathological case that springs to mind is Arabic shaping.  Doing
(1) in logical order would result in all the wrong ligatures, risking
the estimation being seriously off.  It's still much better than wrong
line order, so I'd ignore that for now; but an Arabic expert opinion
would be welcome...]

Note that this scheme runs the display engine at least 3 times, even
for pure-L2R short lines!
We'd have to optimize the common cases before a release; I can see how
it might work, but I don't want to complicate the picture at this
stage.
As long as we conclude that SOME such scheme is workable, we can leave
the detailed implementation for the future.

Finally, I want to propose a feature that I think will be handy, and
also happens to support efficient wrapping.
The truth is that any way to wrap an embedding across lines is ugly!
I'd like a mode where any embedding either fits completely on a line
or starts and ends on lines by itself:

+----------------------------------------+
|some latin text followed by            \|
|\          ROF TXET GNOL TAHWEMOS WERBEH|
|\                     SIHT GNITARTSNOMED|
|followed by latin tail                  |
+----------------------------------------+

This is relatively easy to implement efficiently - you add embedded
characters in *visual* order as you propose, but if the embedding
doesn't fit entirely, you just fall back to the break point where the
embedding started!
You don't even need a stack - I'm talking about one "primary" level
for each visual line.

If you don't like any of the other ideas, this seems like a minimally
intrusive way to make your approach more usable.

-- 
Beni Cherniavsky-Paskin
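
[Illustration of the last proposal, as a rough Python sketch rather
than Emacs display code.  Everything here is a simplifying
assumption: a paragraph is modelled as a hypothetical list of
(text, is_embedded) runs in logical order, every character is one
column wide, and lines break only at spaces.  The sketch decides line
breaks only; visual reordering of each line would happen afterwards.]

```python
from typing import List, Tuple

Run = Tuple[str, bool]          # (text, is_embedded) - hypothetical model
Line = Tuple[bool, str]         # (holds_only_embedded_text, logical text)


def wrap_words(words: List[str], width: int) -> List[str]:
    """Greedy word wrap.  A word longer than width gets a line of its own
    (and overshoots); a real implementation would have to split it."""
    lines, cur = [], ""
    for word in words:
        cand = word if not cur else cur + " " + word
        if len(cand) <= width or not cur:
            cur = cand
        else:
            lines.append(cur)
            cur = word
    if cur:
        lines.append(cur)
    return lines


def wrap_paragraph(runs: List[Run], width: int) -> List[Line]:
    """Break a paragraph so that each embedding either fits whole on the
    current line or occupies lines by itself."""
    out: List[Line] = []
    cur = ""                                    # line being filled
    for text, embedded in runs:
        if embedded:
            cand = text if not cur else cur + " " + text
            if len(cand) <= width:
                cur = cand                      # whole embedding fits here
                continue
            if cur:                             # fall back to the point
                out.append((False, cur))        # where the embedding began
                cur = ""
            out.extend((True, l) for l in wrap_words(text.split(), width))
        else:
            for word in text.split():
                cand = word if not cur else cur + " " + word
                if len(cand) <= width or not cur:
                    cur = cand
                else:
                    out.append((False, cur))
                    cur = word
    if cur:
        out.append((False, cur))
    return out


demo = [("some latin text followed by", False),
        ("HEBREW SOMEWHAT LONG TEXT FOR DEMONSTRATING THIS", True),
        ("followed by latin tail", False)]
for only_embedded, line in wrap_paragraph(demo, 40):
    print("R2L" if only_embedded else "L2R", line)
```

With the 40-column window from the diagram, the demo paragraph breaks
into the same four lines: the latin head, two lines holding only the
embedding, and the latin tail on a line of its own.
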