From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Beni Cherniavsky <cben@users.sf.net>
Newsgroups: gmane.emacs.bidi,gmane.emacs.devel
Subject: Re: Mixed L2R and R2L paragraphs and horizontal scroll
Date: Thu, 11 Feb 2010 23:40:03 +0200
Message-ID: <30fb12601002111340m26c80bcfi69906ac90d887684@mail.gmail.com>
References: <83tyu3iu6b.fsf@gnu.org> <83vdeghfqg.fsf@gnu.org> 
	<201002012205.o11M5Sci011809@beta.mvs.co.il> <83k4uvh09o.fsf@gnu.org> 
	<201002031310.o13DAqXd019253@beta.mvs.co.il>
	<40314.130.55.118.19.1265230948.squirrel@webmail.lanl.gov> 
	<201002041621.o14GL6w5006928@beta.mvs.co.il> <833a1ghjrj.fsf@gnu.org> 
	<jwvsk9gr6zg.fsf-monnier+emacs@gnu.org> <83tytwf1tp.fsf@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: ger.gmane.org 1265924458 2254 80.91.229.12 (11 Feb 2010 21:40:58 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 11 Feb 2010 21:40:58 +0000 (UTC)
Cc: emacs-bidi@gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>,
	emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
In-Reply-To: <83tytwf1tp.fsf@gnu.org>
X-Google-Sender-Auth: 2fc55565bef96a2c
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2)
X-BeenThere: emacs-bidi@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussion of Emacs support for multi-directional text."
	<emacs-bidi.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-bidi>,
	<mailto:emacs-bidi-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-bidi>
List-Post: <mailto:emacs-bidi@gnu.org>
List-Help: <mailto:emacs-bidi-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-bidi>,
	<mailto:emacs-bidi-request@gnu.org?subject=subscribe>
Original-Sender: emacs-bidi-bounces+gnu-emacs-bidi=m.gmane.org@gnu.org
Errors-To: emacs-bidi-bounces+gnu-emacs-bidi=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.bidi:553 gmane.emacs.devel:121064
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/121064>

On Thu, Feb 4, 2010 at 18:21, Ehud Karni <ehud@unix.mvs.co.il> wrote:
> I wish more users who uses Hebrew routinely will take part in this
> discussion.
>
That'd be my cue ;-).  Hi.  New here, sufficiently retro-lurked.

[Sorry, long mail.  In the first half I'm whining about why I don't
like Eli's solution; but I also reply with technical ideas below...]


First, I want to draw attention to the distinction between line
wrapping and truncation/scrolling.

- Line wrapping (aka continuation lines) in visual order is bad!
  It violates the deep axiom that reading order between *lines*
  is downward regardless of bidi.

- Truncation in visual order is OK!  It fits the rigid scrolling
  model of a small window horizontally moving over a wide page.

  This is the only model of truncation that I've seen in other
  programs.  You either have the "Web" model where lines wrap at
  window boundary and there is no truncation, or the "Page" model
  where lines are laid out onto an underlying page, and you see
  a physical window onto it.

  (It does require a lot of horizontal scrolling to read mixed
  direction text, but that's the user's problem.)

- Truncation in logical order might(?) be OK if coupled with
  logical-order "mirrored" scrolling.

  I've never seen such a program, so I don't know if it would be
  usable.  I believe we can easily try it out by running a plain
  L2R emacs in a bidi terminal, e.g. mlterm.  I'll try to work with
  that a bit to see how it feels...

In all the following, I'm only talking about wrapped continuation
lines.  I got the impression Ehud is also mostly concerned about
continuation lines - correct me if I'm wrong.


Second, allow me to sum up Ehud's arguments.

- It's the Right Thing to do.  Books, papers, and correct software have
  always done it this way, and that's what the Unicode standard says.

- Convenience: Doing it the wrong way forces discontinuous reading,
  which is annoying.

and add 2 more angles to the issue:

- Mental model, or why imperfect bidi is painful:

  As an R2L user, I constantly maintain a mental model of the logical
  order.  I've got some deep habits and assumptions about the mapping
  from logical to visual.  *Any* deviation will completely confuse my
  poor brain about the logical order of the buffer.

  Worse yet, if I now proceed to *edit* the buffer, I'll modify it in
  completely wrong places, and even when I realize that, fixing it will
  be even harder!  I'll need to *simultaneously* reverse-engineer your
  deviant bidi algorithm and figure out the real logical order, and
  then very carefully fix my edits, all the time getting strangely
  permuted feedback for my actions.

  This involves concentration, a lot of forward/backward-char movement
  to visualize the logical order, and expletives under my breath :-(

  This is the *real* reason we hate broken bidi support.  No bidi at all
  is frequently better - it ain't pretty, but at least it keeps a 1:1
  mental model.

- Emotional: this kind of broken bottom-up line wrapping is precisely
  the problem with visual-order Hebrew.  Reduce the browser window width
  on any visual-order site, and you'll see it.  We (Hebrew readers)
  had to live with it until logical-order support arrived in browsers,
  have cursed too many sites that use visual-order to this very day,
  and by now we hate it with a burning passion!

  None of this is your fault, but getting line wrapping wrong will step
  on very sore spots with many users...

To be fair, we're talking about rare situations where embedded text is
broken across lines.  But note that a wrong base direction can inflict
this on whole paragraphs (more on that below).


On Fri, Feb 5, 2010 at 11:50, Eli Zaretskii <eliz@gnu.org> wrote:
> Like Ehud, I think that it would be swell to have what he wants.  But,
> possibly unlike Ehud, I think that what I have now is not a disaster,
> and we can live with it for the time being, maybe even longer.
>
> The reasons for my decision to implement truncation and continuation
> as I did are:
>
>  . It is the only reasonable way to go that does not call for a very
>    serious surgery, perhaps even a total rewrite, of the display
>    engine code.
>
>  . I saw no other editor that supports truncation and behaves
>    otherwise.  (I don't know about any editors that support
>    continuation lines like Emacs does.)  See below.
>
Truncation is OK, but the issue is continuation.

Not following your claim about editors that support continuation -
all these do and behave otherwise (i.e. as Ehud wants):
Notepad, gedit, firefox/webkit, OpenOffice.

>  . The issue pops up only in relatively rare situations: mixed
>    L2R/R2L text that gets truncated/continued within a stretch of
>    text whose directionality is against the paragraph direction.
>
Indeed, embedded text tends to be short.

But I'm afraid it's bigger than you think, because if the base direction
of a paragraph is incorrect, *the whole paragraph* will wrap in this
broken bottom-up manner.  Since base direction guessing is never perfect,
and users don't always have the option - or patience - to fix it manually,
this makes the otherwise minor problem more visible.

Also, changing the base direction of any paragraph will behave funny:
Instead of (mostly) just jumping horizontally, it'll also reverse the
order of lines!

                                                            !of lines
Instead of (mostly) just jumping horizontally, it'll also reverse the
:Also, changing the base direction of any paragraph will behave funny

See?  [estimated, some punctuation might be off]

This also means that forcing all paragraphs to R2L or L2R base direction
(which would be a handy way to momentarily work around imperfect
guessing) would break line order in half the paragraphs in a mixed buffer!
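
[A toy model of the reversal, not Emacs code.  Assumptions: monospace
display, one column per character, uppercase text stands in for an R2L
run inside an L2R paragraph, and all function names are made up for
this sketch.  It compares wrapping *after* reordering (what produces
the bottom-up lines) with wrapping in logical order and reordering
each line separately.]

```python
def wrap_words(text, width):
    """Greedy word wrap; returns a list of lines, in the order given."""
    lines, cur = [], ""
    for word in text.split():
        cand = word if not cur else cur + " " + word
        if len(cand) <= width or not cur:
            cur = cand
        else:
            lines.append(cur)
            cur = word
    if cur:
        lines.append(cur)
    return lines


def wrap_after_reordering(rtl_text, width):
    """Reorder the whole run to visual order first, then chop it up.
    The top screen line ends up holding the END of the logical text."""
    return wrap_words(rtl_text[::-1], width)


def wrap_before_reordering(rtl_text, width):
    """Break in logical order, then reorder each screen line by itself.
    Lines read top to bottom, as a reader expects."""
    return [line[::-1] for line in wrap_words(rtl_text, width)]


if __name__ == "__main__":
    logical = "ONE TWO THREE FOUR"
    for line in wrap_after_reordering(logical, 9):
        print(line.rjust(9))
    print("---")
    for line in wrap_before_reordering(logical, 9):
        print(line.rjust(9))
```

In the first variant the word FOUR lands on the top line; in the second
it lands on the bottom line, where the logical order puts it.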


>> If it's just "difficult", then (just like rigid scrolling), it can be
>> kept as a known shortcoming.
>
> It is either VERY difficult or very slow.
>
> The current display code lays out glyphs in each ``glyph row'' one by
> one, in the visual order.  Thus, for the portion of text that is
> reversed from its logical order, the bidi reordering code effectively
> delivers the characters backwards to this glyph layout code, in the
> decreasing order of buffer positions.  That is, the glyphs assembled
> first are the last ones to be read.  Then you hit the window margin,
> and know that there isn't enough place for the whole line.  Only then
> you know how many characters will fit on this line.  But you know that
> in terms of the last portion of the text in the reading order, which
> tells you very little about how many characters at the beginning of
> this stretch of text you could display instead.  (Remember that Emacs
> supports variable size characters and different fonts on the same
> line, so just counting characters will not do.)
>
> What would be nice is to scan the text to be reversed in the logical
> order, and find the part of it that will fit on this screen line.
> Then we could reorder only that part.

Right.  Line breaking must be done in logical order.

> But to do that, we need to try
> every possibility by actually doing most of the display work behind
> the scenes, because of the complications with different font sizes,
> faces, composite characters and issues like ligatures and the like,
> which change the amount of screen estate taken by a portion of a line,
> even if you just juxtapose the same two characters.
>
Right, this is a known annoying property of bidi interacting with
typographic features.  Note however that you have a new trade-off here:
if you could compromise precision of line breaking to get correct bidi
behaviour (with fast redisplay), users would be happy.

See below for a concrete attempt.

> With a newline marking the end of the line, it's easy: the bidi
> reordering ends at the newline, then restarts after it.

So if only the line breaking points were static, you'd have no
performance problem!

=> Could you maybe cache this information and recompute it only when
the line is edited?  I understand part of the whole point of your
implementation was to avoid any caching of bidi ordering; but caching
of line breaking points sounds much less intrusive...

[XEmacs already has a "Line Start Cache" according to its Internals Manual.
I didn't find a similar overview for Emacs.  Is there anything I can read
to understand Emacs redisplay before I attempt to approach the source?]
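
[To make the caching idea concrete, here is a minimal sketch.  It is
not how Emacs or XEmacs does anything; the class name, the per-line
keying and the `compute_breaks` callback are all assumptions of this
example.  The point is only that the expensive break computation runs
once per line and again only after that line is edited or the window
width changes.]

```python
class BreakCache:
    """Remembers line-breaking points per logical line (hypothetical)."""

    def __init__(self, compute_breaks):
        # compute_breaks(text, width) -> list of break offsets; stands in
        # for the expensive "try to display it" work.
        self._compute = compute_breaks
        self._cache = {}      # line number -> (width, breaks)
        self.misses = 0       # how many times we really recomputed

    def breaks(self, line_no, text, width):
        hit = self._cache.get(line_no)
        if hit is not None and hit[0] == width:
            return hit[1]
        self.misses += 1
        result = self._compute(text, width)
        self._cache[line_no] = (width, result)
        return result

    def invalidate(self, line_no):
        """Call this whenever the given line is edited."""
        self._cache.pop(line_no, None)


def fixed_width_breaks(text, width):
    """Trivial stand-in: break every `width` characters."""
    return list(range(width, len(text), width))


if __name__ == "__main__":
    cache = BreakCache(fixed_width_breaks)
    print(cache.breaks(0, "a" * 25, 10))   # computed
    print(cache.breaks(0, "a" * 25, 10))   # served from the cache
    cache.invalidate(0)                     # the line was edited
    print(cache.breaks(0, "a" * 15, 10))   # recomputed
```

A real implementation would also have to renumber or drop entries when
lines are inserted or deleted; that is left out here.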

> By contrast,
> to support ``bidi-smart continuation'', we need to find the place
> where to break the line, and that is impossible without actually
> trying to display it.
>
> In the example below
>
>  word1 word2 WORD1 WORD2
>
> to be displayed as
>
>  word1 word2 2DROW 1DROW
>
> if the window is only wide enough to display
>
>  word1 word2  1DROW
>
> we need to try displaying in order
>
>  word1 word2 1
>  word1 word2 1D
>  word1 word2 1DR
>  word1 word2 1DRO
>  word1 word2 1DROW
>  word1 word2  1DROW
>  word1 word2 W 1DROW
>
> until we discover where we should stop.  (We could do a binary search,
> of course, but that's details.)  I don't think that's reasonable, and
> I have no idea what this will do to the redisplay speed.
>
Binary search is a big improvement!  In 10 attempts you can handle lines
of 1K chars, in 20 - 1M.  On my computer Emacs presently handles 100k
smoothly, 1M already feels sluggish.  By crude (and probably wrong)
computation, binary search would still be fast enough up to 10K...

Also, I presume that the heavy part of a redisplay is normally the actual
output to screen (if not, why do such a complex job minimizing it?).  This
means that "dry" running the engine without actual output 10 times should
result in much less than 10x slowdown.
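
[A sketch of the binary search, to back up the "10 attempts for 1K"
arithmetic.  The `fits` callback is hypothetical: it stands for a dry
run of the display engine on the first k characters in logical order.
The search assumes `fits` is monotonic (if k characters fit, any
shorter prefix fits too), which ligatures and the like could violate,
so treat this as an estimate of the probe count rather than a design.]

```python
from typing import Callable, Tuple


def longest_fitting_prefix(n_chars: int,
                           fits: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (k, probes): the largest k in [0, n_chars] for which
    fits(k) is true, and how many times fits() had to be called.
    Assumes fits(0) is true and fits is monotonic."""
    lo, hi = 0, n_chars        # invariant: fits(lo) holds, answer <= hi
    probes = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the range always shrinks
        probes += 1
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo, probes


if __name__ == "__main__":
    # Pretend 80 characters fit on the screen line.
    print(longest_fitting_prefix(1000, lambda k: k <= 80))
    print(longest_fitting_prefix(1_000_000, lambda k: k <= 80))
```

Each probe halves the candidate range, so a line of 1000 characters
needs at most 10 dry runs and a line of a million at most 20.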

To top this, I think you can do several times better if you allow some
imprecision in line breaking of mixed-direction paragraphs.  Naturally,
you must not overshoot the screen, but some undershooting is OK.  So it
seems to me that you could reasonably do it with a greedy approach:

(1) Add characters in *logical order* as long as they fit.
(2) Try it in visual order to account for precise typographic stuff.
(3) As long as it doesn't fit, strip one char and retry (2).
(4) When OK, repeat with actual output display to the screen.

If (1) overestimates, you're left with a shorter line than ideal; if
it underestimates, you do extra iterations.  But I guess that it
normally won't be off by more than one character, so it will look OK
and run fast.
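
[Steps (1)-(3) can be sketched as follows.  Everything here is an
assumption made for illustration: `est_widths` are per-character width
estimates taken in logical order, and `visual_width(k)` is a
hypothetical dry run returning the true width of the first k
characters once reordered and shaped.  Step (4), the real output, is
left to the caller.]

```python
from typing import Callable, Sequence


def greedy_break(est_widths: Sequence[int],
                 visual_width: Callable[[int], int],
                 window: int) -> int:
    """Return how many characters (in logical order) go on this line."""
    # (1) Add characters in logical order as long as the estimate fits.
    k, total = 0, 0
    while k < len(est_widths) and total + est_widths[k] <= window:
        total += est_widths[k]
        k += 1
    # (2) Measure the precise visual-order layout, and
    # (3) strip one character and retry while it overshoots the window.
    while k > 0 and visual_width(k) > window:
        k -= 1
    return k


if __name__ == "__main__":
    est = [10] * 20
    # Pretend that shaping adds 5 units once 10 or more chars are joined.
    true_width = lambda k: 10 * k + (5 if k >= 10 else 0)
    print(greedy_break(est, true_width, 100))
```

With these numbers step (1) picks 10 characters, the precise check
finds them too wide, and one retry settles on 9.  The loop never grows
the line again, so a too-cautious estimate just leaves the line a
little short.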

[One pathological case that springs to mind is Arabic shaping.  Doing
(1) in logical order would result in all the wrong ligatures, risking
the estimation being seriously off.  It's still much better than wrong
line order, so I'd ignore that for now; but an Arabic expert opinion
would be welcome...]

Note that this scheme runs the display engine at least 3 times, even
for pure-L2R short lines!  We'd have to optimize the common cases before
a release; I can see how it might work, but I don't want to complicate
the picture at this stage.

As long as we conclude that SOME such scheme is workable, we can
leave the detailed implementation for the future.


Finally, I want to propose a feature that I think will be handy,
and also happens to support efficient wrapping.  The truth is that any
way to wrap an embedding across lines is ugly!  I'd like a mode where
any embedding either fits completely on a line or starts and ends on
lines by itself:

+----------------------------------------+
|some latin text followed by            \|
|\          ROF TXET GNOL TAHWEMOS WERBEH|
|\                     SIHT GNITARTSNOMED|
|followed by latin tail                  |
+----------------------------------------+

This is relatively easy to implement efficiently - you add embedded
characters in *visual* order as you propose, but if the embedding
doesn't fit entirely, you just fall back to the breaking where the
embedding started!  You don't even need a stack - I'm talking one
"primary" level for each visual line.

If you don't like any of the other ideas, this seems like a minimally
intrusive way to make your approach more usable.
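
[A toy version of this "embedding on its own lines" mode, reproducing
the picture above.  Assumptions: L2R base direction, monospace
columns, a single embedding level, reversal of the string standing in
for R2L display, and breaking only at spaces.  The function name and
the `(text, is_rtl)` run format are invented for this sketch.]

```python
def wrap_atomic(runs, width):
    """runs: list of (text, is_rtl) in logical order.
    Returns the visual lines, top to bottom."""
    lines, current = [], ""
    for text, is_rtl in runs:
        if not is_rtl:
            # Base-direction text: ordinary greedy word wrap.
            for word in text.split():
                cand = word if not current else current + " " + word
                if len(cand) <= width or not current:
                    current = cand
                else:
                    lines.append(current)
                    current = word
            continue
        shown = text[::-1]            # visual order of the embedding
        cand = shown if not current else current + " " + shown
        if len(cand) <= width:
            current = cand            # the whole embedding fits here
            continue
        # It doesn't fit: fall back to breaking where the embedding
        # started, and give it lines of its own, right-aligned.
        if current:
            lines.append(current)
            current = ""
        chunk = ""
        for word in text.split():     # break in logical order
            cand = word if not chunk else chunk + " " + word
            if len(cand) <= width or not chunk:
                chunk = cand
            else:
                lines.append(chunk[::-1].rjust(width))
                chunk = word
        if chunk:
            lines.append(chunk[::-1].rjust(width))
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    demo = [("some latin text followed by", False),
            ("HEBREW SOMEWHAT LONG TEXT FOR DEMONSTRATING THIS", True),
            ("followed by latin tail", False)]
    for line in wrap_atomic(demo, 40):
        print("|" + line.ljust(40) + "|")
```

Note that the fallback only needs to remember one position per visual
line (where the embedding began), which is the "no stack" point made
above.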

-- 
Beni Cherniavsky-Paskin <cben@users.sf.net>